Writing an LLM from scratch, part 19 -- wrapping up Chapter 4

Posted on 29 August 2025 in AI, LLM from scratch, TIL deep dives

I've now finished Chapter 4 of Sebastian Raschka's book "Build a Large Language Model (from Scratch)", having worked through shortcut connections in my last post. The remainder of the chapter doesn't introduce any new concepts -- instead, it shows how to assemble all of the code we've worked through so far into a full GPT-style LLM. You can see my code here, in the file gpt.py -- though if you're also working through the book, I strongly recommend typing it in yourself: I found that even the mechanical process of typing really helped to solidify the concepts.
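To give a sense of the scale of the model the chapter builds, here's a back-of-the-envelope parameter count. The config values are the ones the book uses for its GPT-2-sized model (`GPT_CONFIG_124M`); the arithmetic below is my own rough sketch and ignores bias terms, so the totals come out slightly below the exact figures the book prints.

```python
# Approximate parameter count for the Chapter 4 GPT-style model.
# Config values follow the book's GPT_CONFIG_124M; bias terms are ignored.
cfg = {
    "vocab_size": 50257,     # BPE vocabulary size
    "context_length": 1024,  # maximum sequence length
    "emb_dim": 768,          # embedding / hidden dimension
    "n_heads": 12,           # attention heads per block
    "n_layers": 12,          # transformer blocks
}

d = cfg["emb_dim"]

tok_emb = cfg["vocab_size"] * d          # token embedding table
pos_emb = cfg["context_length"] * d      # learned positional embeddings

attn = 4 * d * d                         # W_q, W_k, W_v plus the output projection
ff = 2 * d * (4 * d)                     # two linear layers with a 4x expansion
norms = 2 * (2 * d)                      # two LayerNorms (scale + shift each)
per_block = attn + ff + norms

final_norm = 2 * d
out_head = d * cfg["vocab_size"]         # projection back to the vocabulary

total = tok_emb + pos_emb + cfg["n_layers"] * per_block + final_norm + out_head
print(f"total (untied): {total:,}")                # ~163M
print(f"with weight tying: {total - out_head:,}")  # ~124M -- the "124M" in the name
```

The gap between the two printed numbers is the output head: GPT-2 ties it to the token embedding table, which is where the "124M" figure comes from even though the untied module has roughly 163M parameters.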

So instead of writing a post about the rather boring process of typing in code, I decided to put together something in the spirit of the post I wish I'd found when I started reading the book: a summary of everything I've learned, with links back to the other posts in this series. As I wrote it, I realised that the best way to describe it all was to try to explain it to myself as I was before ChatGPT came out, say in mid-2022 -- a techie, yes, but with minimal understanding of how modern AI works.

Some 6,000 words in, I started to think that perhaps I was packing a little too much into one post. So, coming up next: three "state of play" posts, targeting people with 2022-Giles' level of knowledge.

After that, it will be time to move on to the next chapter: training. Hopefully all the time I spent fine-tuning LLMs last year will turn out to be useful there!

If you want to jump straight to that, here's the first post on training.