- June 2026 (2)
- May 2026 (2)
- April 2026 (11)
- March 2026 (3)
- February 2026 (4)
- January 2026 (4)
- December 2025 (1)
- November 2025 (3)
- October 2025 (9)
- September 2025 (3)
- August 2025 (5)
- July 2025 (1)
- June 2025 (2)
- May 2025 (3)
- April 2025 (2)
- March 2025 (7)
- February 2025 (10)
- January 2025 (6)
- December 2024 (7)
- September 2024 (1)
- August 2024 (2)
- July 2024 (2)
- May 2024 (2)
- April 2024 (2)
- February 2024 (2)
- April 2023 (1)
- March 2023 (2)
- September 2022 (1)
- February 2022 (1)
- November 2021 (1)
- March 2021 (1)
- February 2021 (2)
- August 2019 (1)
- November 2018 (1)
- May 2017 (1)
- December 2016 (1)
- April 2016 (1)
- August 2015 (1)
- December 2014 (1)
- August 2014 (1)
- March 2014 (1)
- December 2013 (1)
- October 2013 (3)
- September 2013 (4)
- August 2013 (2)
- July 2013 (1)
- June 2013 (1)
- February 2013 (1)
- October 2012 (1)
- June 2012 (1)
- May 2012 (1)
- April 2012 (1)
- February 2012 (1)
- October 2011 (1)
- June 2011 (1)
- May 2011 (1)
- April 2011 (1)
- March 2011 (1)
- February 2011 (1)
- January 2011 (1)
- December 2010 (3)
- November 2010 (1)
- October 2010 (1)
- September 2010 (1)
- August 2010 (1)
- July 2010 (1)
- May 2010 (3)
- April 2010 (1)
- March 2010 (2)
- February 2010 (3)
- January 2010 (4)
- December 2009 (2)
- November 2009 (5)
- October 2009 (2)
- September 2009 (2)
- August 2009 (3)
- July 2009 (1)
- May 2009 (1)
- April 2009 (1)
- March 2009 (5)
- February 2009 (5)
- January 2009 (5)
- December 2008 (3)
- November 2008 (7)
- October 2008 (4)
- September 2008 (2)
- August 2008 (1)
- July 2008 (1)
- June 2008 (1)
- May 2008 (1)
- April 2008 (1)
- January 2008 (4)
- December 2007 (3)
- March 2007 (3)
- February 2007 (1)
- January 2007 (2)
- December 2006 (4)
- November 2006 (18)
- AI (83)
- TIL deep dives (75)
- Python (71)
- LLM from scratch (46)
- Resolver One (34)
- PyTorch (21)
- Blogkeeping (18)
- TIL (18)
- PythonAnywhere (17)
- Linux (16)
- Startups (15)
- Hugging Face (13)
- NSLU2 offsite backup project (13)
- Funny (11)
- Gadgets (11)
- Musings (11)
- Finance (10)
- Fine-tuning LLMs (10)
- C (9)
- Personal (8)
- Robotics (8)
- Website design (8)
- 3D (5)
- Rants (5)
- Cryptography (4)
- JavaScript (4)
- Music (4)
- Oddities (4)
- Quick links (4)
- Talks (4)
- Dirigible (3)
- Eee (3)
- JAX (3)
- Memes (3)
- Politics (3)
- Django (2)
- GPU Computing (2)
- LaTeX (2)
- MathML (2)
- OLPC XO (2)
- Retro Language Models (2)
- Space (2)
- VoIP (2)
- Copyright (1)
- Golang (1)
- Microprojects (1)
- Raspberry Pi (1)
- Software development tools (1)
- Agile Abstractions
- Astral Codex Ten
- :: (Bloggable a) => a -> IO ()
- David Friedman's Substack
- Econ & Energy
- Entrepreneurial Geekiness
- For some value of "Magic"
- Hackaday
- kaleidic.ai newsletter
- Knowing.NET
- Language Log
- Millennium Hand
- ntoll.org
- Obey the Testing Goat!
- PK
- PythonAnywhere News
- Simon Willison's Weblog
- Societive
- Software Deviser
- Some opinions, held with varying degrees of certainty
- tartley.com
An odd crontab problem
This took a little while to work out, so it's worth sharing here just in case anyone else has the same problems and is googling for solutions. We had a problem on one of our web servers at Resolver which manifested itself in some (but not all) cron jobs being run twice, which was causing all kinds of problems. Here's how we tracked it down and solved it.
The main symptom of the problem was that something was going wrong with Apache logfile rotation. The files appeared to be being rotated twice, so each week we'd get a properly rotated one and then a zero-length one created immediately after:
-rw-r--r-- 1 root root 861 2010-05-16 06:23 access.log.2.gz
-rw-r----- 1 root adm 0 2010-05-16 06:25 access.log.1
-rw-r----- 1 root adm 5590 2010-05-18 11:20 access.log
This was annoying, and it was making it unnecessarily difficult to measure some of our website metrics. It also made us worry that data was being lost; there were occasionally gaps in the logfiles, where it looked like a week's worth of data had been lost while rotating.
Our first thought was that because something seemed to be going wrong with the log rotate script, it was odd that we weren't receiving any email about it. Normally if a cron job writes output, it gets emailed to root. A bit of investigation revealed a problem with the mail setup (which I won't go into now), and fixing led to some interesting information. Once we'd fixed the email problem, we started getting messages like this at 17 minutes past every hour (when the hourly jobs were scheduled to run):
Subject: Cron <root@machine-name> root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
/bin/sh: root: command not found
It appeared that it was prepending the name of the user account that a cron job
should be run as onto the job's command. Now, the hourly jobs are triggered by a
line in /etc/crontab that looks like this:
17 * * * * root cd / && run-parts --report /etc/cron.hourly
So our first guess was that it was some kind of whitespace thing; the format is
MM HH DD MM WW username command
So perhaps there were spaces separating the day-of-week (WW) field from the username, when there should have been a single space or a tab? On first examination, this looked like it might be it: every other line in the crontab used a tab to separate the two fields, but the hourly cron job line used a number of spaces. We fixed that, and waited for the next 17 minutes past the hour.
But it didn't work -- we got the same error. By this time, it was getting quite late in the evening, so we left it to run overnight to see if we got any more useful information.
The next morning, we found that (as you'd expect) we'd got an error message at 17 minutes past each hour. However, more usefully, we got this pair of emails:
Time: 06:25
Subject: Cron <root@machine-name> root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
/bin/sh: root: command not found
/etc/cron.daily/logrotate:
error: Failed to remove old log /var/log/mysql.log.8.gz: No such file or directory
run-parts: /etc/cron.daily/logrotate exited with return code 1
/etc/cron.daily/sysklogd:
mv: cannot stat `/var/log/syslog.new': No such file or directory
Time: 06:25
Subject: Cron <root@machine-name> test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
/etc/cron.daily/standard:
mv: cannot stat `.//dpkg.status.5.gz': No such file or directory
/etc/cron.daily/sysklogd:
gzip: /var/log//syslog.0: No such file or directory
mv: cannot stat `/var/log//syslog.0.gz': No such file or directory
So, the problem seemed clear. Each line in the crontab was being run twice; once with the username being mistakenly taken as part of the command, and once without. This was what was causing the double log-rotation -- and perhaps other problems besides.
It didn't take long to work out what had happened. The format of the file /etc/crontab
is unusual in having a username field between the timing information and the
command to run; there are also separate crontabs for each user, which omit that
field. You set up your own per-user crontab by running crontabfilename; this
installs the specified file as your personal crontab. However, there's no need
to do that with /etc/crontab -- it's always run, and there's no need to install it.
Clearly, someone had been unaware of that last point, and after modifying /etc/crontab,
had run crontab /etc/crontab as root to install it. We checked this by running
as root crontab -l, which outputs the installed crontab for the current user --
as we suspected, it displayed an out-of-date version of the contents of /etc/crontab.
Running crontab -r to remove it fixed the problem; jobs were being run from
/etc/crontab only, and things started working again.