Fine-tuning LLMs
From April until December 2024, I explored how to fine-tune a 7B base model to handle chat. I started by training a smaller model locally, then worked out how to train on cloud GPU instances, including multi-GPU training and training on machines where even a server-grade H100 GPU didn't have enough memory to train the model.
Here are the posts in this series:
- Messing around with fine-tuning LLMs (27 April 2024). In the first post in the series, I scope out the task and fine-tune a 0.5B model on my own machine (a minimal sketch of this kind of small-model fine-tune follows the list below).
- Messing around with fine-tuning LLMs, part 2 -- to the cloud! (28 April 2024). Next, I take a look at cloud GPU providers and pick Lambda Labs. As a sanity check, I replicate my fine-tune of the 0.5B model on a single-GPU instance there.
- Messing around with fine-tuning LLMs, part 3 -- moar GPUs (15 May 2024). I then work out how to train the 0.5B model faster by using multiple GPUs in parallel (see the data-parallel sketch after the list).
- Messing around with fine-tuning LLMs, part 4 -- training cross-GPU (21 May 2024). The first successful fine-tune of a 7B model -- but I have to offload the optimizer to the CPU (sketched below). I'll need to find out why.
- Messing around with fine-tuning LLMs, part 5 -- exploring memory usage (5 July 2024). Some initial local experiments into memory usage for the 0.5B model, to get some ideas as to why I had to offload the optimizer.
- Messing around with fine-tuning LLMs, part 6 -- measuring memory usage more systematically (10 July 2024). Measuring memory usage more systematically for the 0.5B model, also locally, to find out how it behaves at different sequence lengths (the measurement approach is sketched after the list).
- Messing around with fine-tuning LLMs, part 7 -- detailed memory usage across sequence lengths for an 8B model (16 August 2024). Making similar measurements of memory usage at different sequence lengths for the 8B model.
- Messing around with fine-tuning LLMs, part 8 -- detailed memory usage across batch sizes (25 August 2024). Measuring the effect of batch size on memory usage, with a sidetrack looking into Liger Kernel, a new and easy-to-use replacement for the default CUDA kernels used for training that promises (and delivers) better memory usage and performance (see the sketch below).
- Messing around with fine-tuning LLMs, part 9 -- gradient checkpointing (3 September 2024). Investigating how gradient checkpointing works, in the hope that it might let me trade GPU compute for memory and so use a larger batch size (each training iteration would be slower, but the overall training run would take less time). Sadly, those hopes were dashed. The one-line version of the technique is sketched below.
- Messing around with fine-tuning LLMs, part 10 -- finally training the model! (22 December 2024). The last in the series -- a deep dive into fine-tuning the 8B-parameter LLM on instruction data, exploring memory usage, training strategies, and model deployment to Hugging Face (pushing to the Hub is sketched below).
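For a flavour of the techniques the series covers, here are a few code sketches. First, the kind of small-model fine-tune from parts 1 and 2. This is a minimal sketch using the Hugging Face Trainer, not the code from the posts; the model name, toy dataset, and hyperparameters are all placeholders.

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "Qwen/Qwen1.5-0.5B"  # placeholder 0.5B base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Toy stand-in for the real chat/instruction dataset used in the posts.
dataset = Dataset.from_dict({"text": [
    "User: What is the capital of France?\nAssistant: Paris.",
    "User: What is 2 + 2?\nAssistant: 4.",
]})
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=2,
                           num_train_epochs=1),
    train_dataset=dataset,
    # mlm=False gives standard causal-LM labels (inputs shifted by one).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```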
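Part 3's speed-up is, at heart, data parallelism. One standard way to get it (not necessarily the exact mechanism the post used) is PyTorch's DistributedDataParallel under torchrun; the GPU count and model name below are placeholders.

```python
# Launch with: torchrun --nproc-per-node=4 train_ddp.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from transformers import AutoModelForCausalLM

dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
torch.cuda.set_device(local_rank)

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-0.5B")  # placeholder
model = DDP(model.cuda(), device_ids=[local_rank])

# ...normal training loop here. Each process trains on its own shard of
# every batch, and DDP all-reduces the gradients during backward() so
# all replicas stay in sync.
```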
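The optimizer offload from part 4 can be done with DeepSpeed's ZeRO (an assumption on my part, not a detail confirmed above): the optimizer state, which for Adam-style optimizers means two extra fp32 tensors per parameter, lives in CPU RAM instead of on the GPU. Model name and batch size are placeholders.

```python
import deepspeed
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")  # placeholder

ds_config = {
    "train_micro_batch_size_per_gpu": 1,  # placeholder
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,                 # shard gradients and optimizer state...
        "offload_optimizer": {      # ...and keep that state in CPU RAM,
            "device": "cpu",        # trading slower optimizer steps over
            "pin_memory": True,     # PCIe for a big chunk of GPU memory
        },
    },
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
}

# Run under the `deepspeed` launcher; the returned engine's .backward()
# and .step() handle the offloaded state transparently.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
```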
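The measurements in parts 5 through 8 boil down to bracketing a forward/backward pass with PyTorch's allocator statistics while sweeping sequence length (and, in part 8, batch size). A minimal sketch, again with placeholder model and sizes:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-0.5B",            # placeholder model
    torch_dtype=torch.bfloat16,
).cuda()

for seq_len in (256, 512, 1024, 2048):   # the kind of sweep from parts 6-8
    batch = torch.randint(0, model.config.vocab_size, (4, seq_len),
                          device="cuda")
    torch.cuda.reset_peak_memory_stats()
    loss = model(input_ids=batch, labels=batch).loss
    loss.backward()                       # activation memory peaks here
    model.zero_grad(set_to_none=True)     # drop gradients between runs
    print(f"seq_len={seq_len}: "
          f"{torch.cuda.max_memory_allocated() / 2**30:.2f} GiB peak")
```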
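Liger Kernel, the sidetrack in part 8, works by monkey-patching a Hugging Face model's modules with fused Triton kernels. A sketch of the usual way to apply it, using its Llama-specific patch function (the model name is a placeholder):

```python
from liger_kernel.transformers import apply_liger_kernel_to_llama
from transformers import AutoModelForCausalLM

# Patches the transformers Llama classes in place, swapping in fused
# Triton kernels for RoPE, RMSNorm, SwiGLU and the loss; call it
# *before* instantiating the model.
apply_liger_kernel_to_llama(
    rope=True,
    rms_norm=True,
    swiglu=True,
    fused_linear_cross_entropy=True,  # avoids materialising the full logits
)

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")  # placeholder
```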
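Gradient checkpointing, from part 9, stores only each block's inputs during the forward pass and recomputes the intermediate activations on demand during the backward pass: less memory per example, at the cost of roughly an extra forward pass of compute per step. With a Hugging Face model it's essentially a one-liner:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-0.5B")  # placeholder

model.gradient_checkpointing_enable()  # recompute activations in backward
model.config.use_cache = False         # the KV cache is incompatible with it
```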
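Finally, the Hugging Face deployment in part 10 is, at its simplest, a push to the Hub. The repo name here is hypothetical, and this reuses the model and tokenizer objects from the sketches above.

```python
# Uploads the weights, config and tokenizer files to a (hypothetical)
# Hub repo; assumes a prior `huggingface-cli login`.
model.push_to_hub("your-username/your-fine-tuned-chat-model")
tokenizer.push_to_hub("your-username/your-fine-tuned-chat-model")
```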
Copyright (c) 2006-2025 by Giles Thomas.
This work is licensed under a Creative Commons Attribution 4.0 International License.