Writing an LLM from scratch, part 19 -- wrapping up Chapter 4
I've now finished chapter 4 in Sebastian Raschka's book
"Build a Large Language Model (from Scratch)",
having worked through shortcut connections in my last post.
The remainder of the chapter doesn't introduce any new concepts; instead, it shows how to assemble all of the code we've worked through so far into a full GPT-style
LLM. You can see my code here, in
the file gpt.py, though if you're also working through
the book, I strongly recommend typing it in yourself: I found that even the mechanical process of typing really helped me to solidify the concepts.
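To give a feel for what that assembly looks like, here's a minimal sketch of how the chapter's pieces fit together. This is NumPy rather than the book's PyTorch, the transformer block is a stand-in stub (the real attention and feed-forward layers are covered in the earlier posts), and all the names and sizes are illustrative, not the ones in gpt.py:

```python
# Illustrative sketch: token + positional embeddings, a stack of
# transformer blocks, a final layer norm, and an output head that
# maps back to vocabulary logits.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, context_len, emb_dim, n_layers = 100, 8, 16, 2

tok_emb = rng.normal(size=(vocab_size, emb_dim)) * 0.02   # token embeddings
pos_emb = rng.normal(size=(context_len, emb_dim)) * 0.02  # positional embeddings
out_head = rng.normal(size=(emb_dim, vocab_size)) * 0.02  # output projection

def layer_norm(x):
    # Normalise each token's vector to zero mean / unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + 1e-5)

def transformer_block(x):
    # Stub standing in for attention + feed-forward; note the shortcut
    # (residual) connection from the previous post: input added to output.
    return x + layer_norm(x)

def gpt_forward(token_ids):
    # Embed tokens, add position information, run the block stack,
    # then project to one logit per vocabulary entry for each position.
    x = tok_emb[token_ids] + pos_emb[np.arange(token_ids.shape[-1])]
    for _ in range(n_layers):
        x = transformer_block(x)
    return layer_norm(x) @ out_head

tokens = np.array([[1, 5, 9, 2]])  # shape (batch=1, seq_len=4)
logits = gpt_forward(tokens)
print(logits.shape)                # (1, 4, 100): logits per position
```

The real model differs in the details, of course (learned parameters, causal self-attention, GELU feed-forward layers), but the overall shape flow is the same: token IDs in, one set of vocabulary logits per position out.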
So instead of writing a post about the rather boring process of typing in code, I decided to write the post I wished I'd found when I started reading the book: a summary of everything I've learned, with links back to the other posts in this series. As I wrote it, I realised that the best way to describe things was to explain them to myself as I was before ChatGPT came out, say in mid-2022: a techie, yes, but with minimal understanding of how modern AI works.
Some 6,000 words in, I started thinking that perhaps I was trying to pack a little too much into a single post. So, coming up next: three "state of play" posts, targeting people with 2022-Giles' level of knowledge.
- "What AI chatbots are actually doing under the hood". This is a high-level post, covering material I already knew when I started the book, and giving enough background for anyone, hopefully even non-techies, to understand how we get from next-word completion to something you can have a conversation with.
- "The maths you need to start understanding LLMs". This one is more techie-focused. To understand LLMs reasonably well, you don't need much beyond high-school maths, so this is a bridging post that fills in the mathematical concepts they don't teach at school. Nothing particularly difficult, though, at least if you remember matrices and similar material from your schooldays.
- "How do LLMs work?". This one actually explains how these AIs work, starting with a high-level description and then zooming in on the building blocks that make them up.
With that done, it's time to move on to the next chapter: training. Hopefully all the time I spent fine-tuning LLMs last year will turn out to be useful there!
If you want to jump straight ahead to that, here's the first post on training.