People are surprisingly interested in the technical issues at play here. :o I am more than happy to oblige.
Two things you need to know first:
- We recently moved servers, partly to get more disk space (2x250GB HDDs instead of 1x160GB)
- FreeBSD (and *nix in general) stores web logs and databases on a /var partition, which has its own space allocation
When we went to a new server a couple of weeks ago I asked the tech reinstalling the OS to put the /var partition on the second 250GB HDD. This would give plenty of space for the database, and also balance the load between the two HDDs fairly evenly.
After three hours and much agitation on my part I got through to him, and he said it would be quicker for him to just do the default install with everything on the one HDD.
The default install gives /var 8GB. That's more than enough for a desktop OS or a database-light server, but for our system it's almost at capacity. (Someone mentioned the map cache etc., and although those take up a lot of space too, they are on another partition.)
I was planning to set up the second disk this weekend, but unfortunately our space use has been growing faster than I had thought.
The next safeguard is a script we have running which warns when any partition is nearing capacity (as well as about several other potential issues). This too was effectively disabled, because it relies on the PHP function disk_free_space(), which is not present on the new installation and which I hadn't yet reimplemented. With no warning, /var filled up completely, and the server went down.
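For anyone curious what that kind of check looks like, here is a minimal sketch. It is not the actual webDiplomacy script (that one is PHP and built around disk_free_space()); it is written in Python with the standard shutil.disk_usage() call standing in, and the function name, the 90% threshold and the usage_fn parameter are all made up for illustration.

```python
import shutil


def partitions_near_capacity(paths, threshold=0.90, usage_fn=shutil.disk_usage):
    """Return (path, fraction_used) for every path at or above the threshold.

    Hypothetical helper: the name, the default threshold and the usage_fn
    hook are assumptions for this sketch, not part of the real script.
    """
    warnings = []
    for path in paths:
        usage = usage_fn(path)  # named tuple with total, used, free (in bytes)
        if usage.total == 0:
            continue  # skip pseudo-filesystems that report no capacity
        # Work from free space, the same quantity disk_free_space() reports.
        fraction_used = 1 - usage.free / usage.total
        if fraction_used >= threshold:
            warnings.append((path, fraction_used))
    return warnings
```

Calling partitions_near_capacity(["/", "/var"]) from a cron job and e-mailing whatever comes back would be one way to wire it up. The usage_fn hook is only there so the check can be exercised without a real full disk.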
Even then I would normally have got an e-mail within 30 minutes of the server running out of space and going down, and in fact I got a dozen. But peak webDiplomacy time is US midday-time, which is West Australian sleep-time. The data comes in fastest when I'm zonked.
Even then the script should have started making my computer beep to wake me up, but I haven't fully tested that functionality yet. (I shelved that when the server started playing up and I had to focus on moving the server)
Even then, when I woke up and restored the server, the game processing script should have recognized that nothing had been processed for the last 6 hours, and that nothing should be processed until games were given more time. (I'm still not sure why this failed; I'll be investigating that this weekend.)
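To illustrate the idea behind that last safeguard, here is a rough sketch of how a processing script could notice an outage and hand the lost time back to the games. This is an assumption about the general approach, not the real webDiplomacy code: the function name, the 30-minute max_gap and the plain list of deadlines are all hypothetical.

```python
from datetime import datetime, timedelta


def deadlines_after_outage(last_processed, now, deadlines,
                           max_gap=timedelta(minutes=30)):
    """Push deadlines back by the length of an outage, if there was one.

    Hypothetical helper: if the processor has been silent for longer than
    max_gap, every pending deadline is extended by the full gap so that no
    game is processed on time it never actually had.
    """
    gap = now - last_processed
    if gap <= max_gap:
        return list(deadlines)  # normal run, nothing to compensate for
    return [deadline + gap for deadline in deadlines]
```

With a 6 hour gap between last_processed and now, a deadline that fell in the middle of the outage ends up 6 hours later than it was, instead of being treated as already overdue.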
Hopefully that more than satisfies those looking for the details. :P
So this was a major screwup, but hopefully you can see how it mostly relates to a recent server move and not having everything yet completely hooked back up. Thanks again for your patience. This weekend I'll be taking care of these issues and also deploying some changes Alderian has been working on.