This took a little while to work out, so it’s worth sharing here just in case anyone else has the same problems and is googling for solutions. We had a problem on one of our web servers at Resolver which manifested itself in some (but not all) cron jobs being run twice, which was causing all kinds of problems. Here’s how we tracked it down and solved it.
The main symptom of the problem was that something was going wrong with Apache logfile rotation. The files appeared to be being rotated twice, so each week we’d get a properly rotated one and then a zero-length one created immediately after:
-rw-r--r-- 1 root root 861 2010-05-16 06:23 access.log.2.gz -rw-r----- 1 root adm 0 2010-05-16 06:25 access.log.1 -rw-r----- 1 root adm 5590 2010-05-18 11:20 access.log
This was annoying, and it was making it unnecessarily difficult to measure some of our website metrics. It also made us worry that data was being lost; there were occasionally gaps in the logfiles, where it looked like a week’s worth of data had been lost while rotating.
Our first thought was that because something seemed to be going wrong with the log rotate script, it was odd that we weren’t receiving any email about it. Normally if a cron job writes output, it gets emailed to root. A bit of investigation revealed a problem with the mail setup (which I won’t go into now), and fixing led to some interesting information. Once we’d fixed the email problem, we started getting messages like this at 17 minutes past every hour (when the hourly jobs were scheduled to run):
root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ) /bin/sh: root: command not found
It appeared that it was prepending the name of the user account that a cron job should be run as onto the job’s command. Now, the hourly jobs are triggered by a line in
/etc/crontab that looks like this:
17 * * * * root cd / && run-parts --report /etc/cron.hourly
So our first guess was that it was some kind of whitespace thing; the format is
MM HH DD MM WW username command
So perhaps there were spaces separating the day-of-week (WW) field from the username, when there should have been a single space or a tab? On first examination, this looked like it might be it: every other line in the crontab used a tab to separate the two fields, but the hourly cron job line used a number of spaces. We fixed that, and waited for the next 17 minutes past the hour.
But it didn’t work — we got the same error. By this time, it was getting quite late in the evening, so we left it to run overnight to see if we got any more useful information.
The next morning, we found that (as you’d expect) we’d got an error message at 17 minutes past each hour. However, more usefully, we got this pair of emails:
Time: 06:25 Subject: Cron
root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ) /bin/sh: root: command not found /etc/cron.daily/logrotate: error: Failed to remove old log /var/log/mysql.log.8.gz: No such file or directory run-parts: /etc/cron.daily/logrotate exited with return code 1 /etc/cron.daily/sysklogd: mv: cannot stat `/var/log/syslog.new': No such file or directory
Time: 06:25 Subject: Cron
test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ) /etc/cron.daily/standard: mv: cannot stat `.//dpkg.status.5.gz': No such file or directory /etc/cron.daily/sysklogd: gzip: /var/log//syslog.0: No such file or directory mv: cannot stat `/var/log//syslog.0.gz': No such file or directory
So, the problem seemed clear. Each line in the crontab was being run twice; once with the username being mistakenly taken as part of the command, and once without. This was what was causing the double log-rotation — and perhaps other problems besides.
It didn’t take long to work out what had happened. The format of the file
/etc/crontab is unusual in having a username field between the timing information and the command to run; there are also separate crontabs for each user, which omit that field. You set up your own per-user crontab by running
crontab filename; this installs the specified file as your personal crontab. However, there’s no need to do that with
/etc/crontab — it’s always run, and there’s no need to install it.
Clearly, someone had been unaware of that last point, and after modifying
/etc/crontab, had run
crontab /etc/crontab as root to install it. We checked this by running as root
crontab -l, which outputs the installed crontab for the current user — as we suspected, it displayed an out-of-date version of the contents of
crontab -r to remove it fixed the problem; jobs were being run from
/etc/crontab only, and things started working again.