Dropout and mandatory vacation
As I was dozing off the other night, after my post on dropout, it popped into my mind that it's not dissimilar to something many financial firms do. They require certain key employees to take at least two consecutive weeks of holiday every year -- not because they're kind employers who believe in a healthy work-life balance (source: I worked for one) but because it makes sure the firm is functioning safely and effectively, at a small cost in performance.
There are two reasons this helps:
- It reduces key-person risk. By enforcing vacation like this, they make absolutely sure that the business can continue to operate even if some people are out. If stuff goes wrong while they're out, then obviously processes are broken or other people don't have the knowledge they need to pick up the slack. So long as it's well-managed, those problems can be fixed, which means that if the key people quit, there's less damage done. Think of it as being like increasing the bus factor of a dev team.
- It can uncover misbehaviour. Let's imagine a trader is doing something they shouldn't -- maybe fraud, or perhaps just covering up for their mistakes so that they don't get fired. They might be able to manage that by shuffling balances around if they're in the office every day, but two weeks out should mean that whoever is covering for them will work out that something isn't right.
Now, I should admit that the second of those (a) doesn't really apply to dropout[1] and (b) is probably the more important of the two from a bank's perspective.
But the first, I think, is a great metaphor for dropout during training. What we want to do is make sure that no particular parameter is "key"; we want the knowledge and intelligence to be spread across the model as a whole.
That also clears up a couple of questions I had about dropout:
- It slows down training. Yes, if you're doing dropout, you'll see your error falling more slowly than if you don't -- just like the trading desk sees their performance drop a bit when their top trader is on mandatory vacation. But that's a cost you pay to gain performance at other times -- the rest of the year for the bank, or at inference time for the model.
- Do you keep gradients for, and back-prop to, the dropped-out parameters? No, just like the bank wouldn't put the people who were out of the office through training for issues that came up during their absence. They'd train the people or fix the systems that had problems instead.
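To make that last point concrete, here's a minimal sketch (I've used PyTorch for illustration; any framework behaves the same way): activations zeroed by the dropout mask get no gradient in the backward pass, so only the "people still in the office" receive a training signal for that step, and at inference time dropout is switched off entirely.

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 8, requires_grad=True)

# Inverted dropout in training mode: zero each activation with probability p
# and scale the survivors by 1/(1-p), so expected activations match inference.
dropout = torch.nn.Dropout(p=0.5)
dropout.train()

y = dropout(x)
y.sum().backward()

dropped = (y == 0)
print("fraction dropped:", dropped.float().mean().item())   # roughly 0.5
print("max gradient at dropped positions:",
      x.grad[dropped].abs().max().item())                   # 0.0 -- no back-prop to them

# At inference time, dropout is a no-op and the whole "team" is used.
dropout.eval()
print(torch.equal(dropout(x), x))                            # True
```

The 1/(1-p) rescaling during training is what lets you skip any special handling at inference: the model simply uses all of its parameters, just as the bank uses its full staff for the rest of the year.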
Now, is this a perfect metaphor, or even a great one? Maybe not. But it works for me, and I thought I'd share it in case it's useful for anyone else. And I'm going to be looking for similar organisational metaphors for other ML techniques -- I think they are a useful way to clarify things, especially for those of us who (for better or for worse) have spent time in the trenches of actual organisations.
[1] There might be some applicability to alignment training of multi-agent models, though?
It's still worth blogging in the age of AI
My post about blogging as writing the tutorial that you wished you'd found really took off on Hacker News. There were a lot of excellent comments, but one thing kept coming up: what's the point in blogging if people are using ChatGPT, Claude and DeepSeek to spoon-feed them answers? Who, apart from the AIs, will read what you write?
I was asking myself the same question when I started blogging semi-regularly again last year, and this post is an attempt to summarise why I decided that it was worthwhile. The TL;DR: blogging isn't just about being read -- it's about learning and thinking, and having a durable proof that you can do both.
On the benefits of learning in public
While laid up with a minor but annoying medical issue over the last week, I've blogged more than usual. I've also spent some time reading through the archives here, and come to the conclusion that the best posts I've made -- at least from my perspective -- follow a similar pattern. They're posts where I've been learning how to do something, or how something worked, and presented what I've found as a summary, often as a tutorial.
I think of these as writing the post that I wished I'd found when I started learning whatever it was.
On the perils of AI-first debugging -- or, why Stack Overflow still matters in 2025
"My AI hype/terror level is directly proportional to my ratio of reading news about it to actually trying to get things done with it."
This post may not age well, as AI-assisted coding is progressing at an absurd rate. But I think that this is an important thing to remember right now: current LLMs can not only hallucinate, but also misweight the evidence available to them and make mistakes when debugging that human developers would not. If you don't allow for this, you can waste quite a lot of time!
Do reasoning LLMs need their own Philosophical Language?
A few days ago, I saw a cluster of tweets about OpenAI's o1 randomly switching to Chinese while reasoning -- here's a good example. I think I've seen it switch languages a few times as well. Thinking about it, Chinese -- or any other language written in a non-Latin alphabet -- would be particularly noticeable, because those notes describing what it's thinking about flash by pretty quickly, and you're only really likely to notice something weird if it's immediately visibly different to what you expect. So perhaps it's spending a lot of its time switching from language to language depending on what it's thinking about, and then it translates back to the language of the conversation for the final output.
Why would it do that? Presumably certain topics are covered better in its training set in specific languages -- it will have more on Chinese history in Chinese, Russian history in Russian, and so on. But equally possibly, some languages are easier for it to reason about certain topics in. Tiezhen Wang, a bilingual AI developer, tweeted that he preferred doing maths in Chinese "because each digit is just one syllable, which makes calculations crisp and efficient". Perhaps there's something similar there for LLMs.
That got me thinking about the 17th-century idea of a Philosophical Language. If you've read Neal Stephenson's Baroque Cycle books, you'll maybe remember it from there -- that's certainly where I heard about it. The idea was that natural human languages were not very good for reasoning about things, and the solution would be to create an ideal, consciously-designed language that was more rational. Then philosophers (or scientists as we'd say these days) could work in it and get better results.
There are echoes of that in E' (E-Prime), another one I picked up on from fiction (this time from The Illuminatus! Trilogy). It's English without the verb "to be", the idea being that most uses of the verb are unnecessarily foggy and would be better replaced. "Mary is a doctor" implies that her job is the important thing about her, whereas "Mary practices medicine" is specific that it's just one aspect of her. What I like about it is that it -- in theory -- gets you a more "Philosophical" language with a really small tweak rather than a complete redesign.
What I'm wondering is, are human languages really the right way for LLMs to be reasoning if we want accurate results quickly? We all know how easy it is to be bamboozled by words, either our own or other people's. Is there some way we could construct a language that would be better?
The baroque philosophers ultimately failed, and modern scientists tend to switch to mathematics when they need to be precise ("physics is a system for translating the Universe into maths so that you can reason about it" -- discuss).
But perhaps by watching which languages o1 is choosing for different kinds of reasoning we could identify pre-existing (grammatical/morphological/etc) structures that just seem to work better for different kinds of tasks, and then use that as a framework to build something on top of. That feels like something that could be done much more easily now than it could in the pre-LLM world.
Or maybe a reasoning language is something that could be learned as part of a training process; perhaps each LLM could develop its own, after pre-training with human languages to get it to understand the underlying concept of "language". Then it might better mirror how LLMs work -- its structures might map more directly to the way transformers process information. It might have ways of representing things that you literally could not describe in human languages.
Think of it as a machine code for LLMs, perhaps. Is it a dumb idea? As always, comments are open :-)
An aside: SEO for restaurants
The other day, we got an ad through our letterbox for a new Thai restaurant. We'd become fed up with the other neighbourhood Thais, so decided to try this one this evening. We could remember the name, "Cafe de Thai", and the street, All Saints Road, but nothing more -- but hey, no problem: let's Google it!
The results were odd; I won't link to them because they'll change rapidly enough, but what we found was that the front page results had two links to aggregators of celebrity Twitter accounts (because someone who is apparently semi-famous tweeted about the place), but everything else was about other places on the same street, or with vaguely similar names. By contrast, a search for their competitors came up with a bunch of random London restaurant listing sites, many of which I'd never heard of -- but all of which had the information I was looking for, to wit the telephone number and the precise address.
What's interesting to me is that (a) neither restaurant's own web page was on the first page of the listings, and (b) this didn't matter. All that mattered was that the contact details were at the front of the list; the more established place had loads of listings sites giving contact details for them, but the newer place was nowhere to be found. So perhaps, while software companies spend money to make as sure as possible that their own website is at the top of the search results for their name and industry segment, SEO for restaurants is much more nuanced: you don't need your own website to come first, just that of a decent listings site. Ideally, one would assume, a listings site where you get a good rating...
Anyway, just in case anyone has wound up on this page looking for details of the restaurant:
Cafe de Thai
29 All Saints Road
London
020 7243 3001
I recommend the scallops and the weeping tiger; Lola liked her dim sum and red curry with prawns. Alan Carr recommends the green curry, apparently...
Making a fool of yourself in public
On the Business of Software Blog, Neil Davidson recommends using your fear of making yourself look stupid by failing publicly as a way to motivate yourself to work as hard as you need to work on your startup. Sounds right to me. When I was in my early 20s I saw the mortality rates for smokers and decided that I would give up at the age of 30. In order to make sure that I stuck to that, over the years I told pretty much every one of my friends that I was going to quit then, which meant that I really could not back down. The result is that on the night of my 30th birthday party I quit, and (bar one or two particularly drunken evenings) I've not touched a cigarette since.
Why should the government fund space exploration?
This article (via /.) is meant to discuss whether space exploration is worth the cost, but discusses government-funded space exploration almost exclusively. This makes sense; the discussion of whether commercial and other private space exploration is worth the cost is more one for the boardroom than for the New York Times. And it's an interesting question; I'm pretty libertarian, and government-funded anything tends to raise my hackles -- and to be perfectly honest, many of the arguments mentioned by the contributors to the article sound pretty weak.
But one does stand out.
I asked guests on The Space Show, students, and people in space-related fields what inspired or motivated them to start a space business or pursue their science education, over 80 percent said they were inspired and motivated because of our having gone to the moon.
When I was a kid, like most boys then, I wanted to be an astronaut. I grew out of it, but my interest in science -- which eventually led to my career in technology -- started then.
It's hardly scientific to point at the decline in space exploration in the West and the decline in the number of science graduates, and the contrasting rises in both in, say, China, and claim some kind of correlation. But it does make you think.
If space exploration increases children's interest in science, and causes long-term benefits to the economy that are not directly captured (or, I think capturable) by the explorers, then perhaps that's a good reason for state spending in that area.
Of course -- as you might have realised from my use of the word "West" above -- it's not directly captured by the funding country either. British children like me were inspired by American space exploration. Would they be inspired by Chinese space exploration?
I'll leave that one open.