Retro Language Models: Rebuilding Karpathy’s RNN in PyTorch

Posted on 24 October 2025 in AI, Retro Language Models, Python, TIL deep dives

I recently posted about Andrej Karpathy's classic 2015 essay, "The Unreasonable Effectiveness of Recurrent Neural Networks". In that post, I went through what the essay said, and gave a few hints on how the RNNs he was working with at the time differ from the Transformer-based LLMs I've been learning about.

This post is a bit more hands-on. To understand how these RNNs really work, it's best to write some actual code, so I've implemented a version of Karpathy's original code using PyTorch's built-in LSTM class -- here's the repo. I've tried to stay as close as possible to the original, but I believe it's reasonably PyTorch-native in style too. (Which is maybe not all that surprising, given that he wrote it using Torch, the Lua-based predecessor to PyTorch.)
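
To give a flavour of what that looks like before the full walkthrough, here's a minimal sketch of the general shape of a character-level model built around PyTorch's nn.LSTM. It's purely illustrative -- the class name, layer sizes and hyperparameters below are assumptions made for this summary, not the actual code from the repo:

    from torch import nn

    class CharLSTM(nn.Module):
        """Embedding -> multi-layer LSTM -> linear projection back onto the vocabulary."""

        def __init__(self, vocab_size, embed_dim=64, hidden_dim=256, num_layers=2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
            self.head = nn.Linear(hidden_dim, vocab_size)

        def forward(self, x, state=None):
            # x is a (batch, seq_len) tensor of character indices
            emb = self.embed(x)                 # (batch, seq_len, embed_dim)
            out, state = self.lstm(emb, state)  # (batch, seq_len, hidden_dim)
            return self.head(out), state        # logits over the vocabulary, plus the LSTM state

Training is then just next-character prediction: feed in chunks of text, and minimise the cross-entropy between the logits at each position and the character that actually comes next.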

In this post, I'll walk through how it works, as of commit daab2e1. In follow-up posts, I'll dig in further, actually implementing my own RNNs rather than relying on PyTorch's.

All set?

[ Read more ]


Revisiting Karpathy’s 'The Unreasonable Effectiveness of Recurrent Neural Networks'

Posted on 11 October 2025 in AI, Retro Language Models, TIL deep dives

Being on a sabbatical means having a bit more time on my hands than I'm used to, and I wanted to broaden my horizons a little. I've been learning how current LLMs work by going through Sebastian Raschka's book "Build a Large Language Model (from Scratch)", but how about the history -- where did this design come from? What did people do before Transformers?

Back when it was published in 2015, Andrej Karpathy's blog post "The Unreasonable Effectiveness of Recurrent Neural Networks" went viral.

It's easy to see why. While interesting stuff had been coming out of AI labs for some time, for those of us in the broader tech community it still felt like we were in an AI winter. Karpathy's post made it clear that things were in fact moving pretty fast: he could train recurrent neural networks (RNNs) on text and get them to generate surprisingly readable results.

For example, he trained one on the complete works of Shakespeare, and got output like this:

KING LEAR:
O, if you were a feeble sight, the courtesy of your law,
Your sight and several breath, will wear the gods
With his heads, and my hands are wonder'd at the deeds,
So drop upon your lordship's head, and your opinion
Shall be against your honour.

As he says, you could almost (if not quite) mistake it for a real quote! And this is from a network that had to learn everything from scratch -- no tokenising, just bytes. It went from generating random junk like this:

bo.+\x94G5YFM,}Hx'E{*T]v>>,2pw\nRb/f{a(3n.\xe2K5OGc

...to learning that there was such a thing as words, to learning English words, to learning the rules of layout required for a play.

This was amazing enough that it even hit the mainstream. A meme template you still see everywhere is "I forced a bot to watch 10,000 episodes of $TV_SHOW and here's what it came up with" -- followed by some crazy parody of the TV show in question. (A personal favourite is this one by Keaton Patti for "Queer Eye".)

That meme template actually traces back to a real thing -- a developer called Andy Herd trained an RNN on scripts from "Friends", and generated an almost-coherent but delightfully quirky script fragment. Sadly I can't find it on the Internet any more (if anyone has a copy, please share!) -- Herd is no longer on X/Twitter, and there seems to be no trace of the fragment itself, just news stories about it. But that was in early 2016, just after Karpathy's blog post. People saw it, thought it was funny, and (slightly ironically) discovered that humans could write even funnier "bot-generated" parodies themselves.

So, this was a post that showed the broader tech community just how impressive the results from then-recent AI could be, and that had a viral impact on Internet culture. It came out in 2015, two years before "Attention Is All You Need", which introduced the Transformer architecture that powers essentially all mainstream AI these days. (It's worth mentioning that the underlying idea wasn't exactly unknown, though -- near the end of the post, Karpathy explicitly highlights that the "concept of attention is the most interesting recent architectural innovation in neural networks".)

I didn't have time to go through it and try to play with the code when it came out, but now that I'm on sabbatical, it's the perfect time to fix that! I've implemented my own version using PyTorch, and you can clone and run it. Some sample output after training on the Project Gutenberg Complete Works of Shakespeare:

SOLANIO.
Not anything
With her own calling bids me, I look down,
That we attend for letters—are a sovereign,
And so, that love have so as yours; you rogue.
We are hax on me but the way to stop.

[_Stabs John of London. But fearful, Mercutio as the Dromio sleeps
fallen._]

ANTONIO.
Yes, then, it stands, and is the love in thy life.
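
For the curious: output like this is generated one character at a time -- you feed the network what it has produced so far, sample the next character from the probability distribution it predicts, and repeat. A rough sketch of what such a loop looks like is below; the model interface, the stoi/itos character mappings and the temperature value are illustrative assumptions, not the repo's actual API:

    import torch

    @torch.no_grad()
    def sample(model, stoi, itos, prompt="ANTONIO.\n", length=500, temperature=0.8):
        """Generate `length` characters, one at a time, starting from `prompt`."""
        model.eval()
        x = torch.tensor([[stoi[ch] for ch in prompt]])   # (1, prompt_len)
        state = None
        chars = list(prompt)
        for _ in range(length):
            logits, state = model(x, state)               # carry the recurrent state forward
            probs = torch.softmax(logits[0, -1] / temperature, dim=-1)
            next_idx = torch.multinomial(probs, num_samples=1)
            chars.append(itos[next_idx.item()])
            x = next_idx.view(1, 1)                       # feed the sampled character back in
        return "".join(chars)

Lower temperatures give more conservative, repetitive text; higher ones give more adventurous output (and more made-up words like "hax").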

There's a README.md in the repo with full instructions on how to use it -- I wrote the code myself (with some AI guidance on how to use the APIs), but Claude was invaluable for looking over the codebase and generating much better and more useful instructions than I would have written on my own :-)

This code is actually "cheating" a bit: Karpathy's original repo has full implementations of several kinds of RNN (in Lua, the language the original Torch framework was built on), while I'm using PyTorch's built-in LSTM class, which implements a Long Short-Term Memory network -- the specific kind of RNN used to generate the samples in the post (though not the ones in the code snippets, which come from "vanilla" RNNs).
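
As a quick illustration of that distinction (this is neither repo's code, just the bare PyTorch APIs): the two are drop-in replacements for each other at the call site, with the LSTM carrying an extra "cell state" alongside the hidden state -- roughly speaking, that extra state and the gating around it are what let it hold on to information over longer stretches of text:

    import torch
    from torch import nn

    vanilla = nn.RNN(input_size=64, hidden_size=256, num_layers=2, batch_first=True)
    lstm = nn.LSTM(input_size=64, hidden_size=256, num_layers=2, batch_first=True)

    x = torch.randn(8, 100, 64)       # (batch, seq_len, features)
    out, h = vanilla(x)               # a vanilla RNN returns its outputs plus a hidden state
    out, (h, c) = lstm(x)             # an LSTM also returns a cell state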

Over the next few posts in this series (which I'll interleave with "LLM from scratch" ones), I'll cover:

  1. A writeup of the PyTorch code as it currently is.
  2. Implementation of a regular RNN in PyTorch, showing why it's not as good as an LSTM.
  3. Implementation of an LSTM in PyTorch, which (hopefully) will work as well as the built-in one.

However, in this first post I want to talk about the original article and highlight how the techniques differ from what I've seen while learning about modern LLMs.

If you're interested (and haven't already zoomed off to start generating your own version of "War and Peace" using that repo), then read on!

[ Read more ]