loads and monitors
Forgot to report this earlier: The load on vanadium went through the roof last Thursday (the 29th)... It slowed to a crawl for the better part of two hours, until I got in and called Rackspace for a reboot.
Obviously one problem here is that there's nobody watching the servers 24x7... Rackspace offers this service, but I don't exactly have a budget to enable it on all machines. (One idea I had was to set up a "premium" machine where people paid a little extra to chip in for the monitoring... I'm open to suggestions.)
Meanwhile, as a preventative measure, I've added a monitor to all machines that kills CGI or cron jobs older than 30 seconds. This should keep runaway scripts and deadlocked processes from piling up and bogging down a server. (It's been running since Thursday and seems to be doing its job.)
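
For the curious, here is a rough sketch of how a monitor like this could work. This is not the actual script running on the servers, just an illustration in Python under some assumptions: that CGI and cron jobs can be recognized as children of processes named `httpd` or `crond` (those names, and the helper names `parse_etime`, `find_stale`, and `kill_stale`, are made up for this example), and that `ps` reports elapsed time in the usual `[[dd-]hh:]mm:ss` form.

```python
import os
import signal
import subprocess

MAX_AGE_SECONDS = 30  # the cutoff mentioned in the post


def parse_etime(etime):
    """Convert a ps elapsed-time string ([[dd-]hh:]mm:ss) to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # pad missing hours (and minutes) with zero
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def find_stale(ps_output, parent_names=("httpd", "crond"), max_age=MAX_AGE_SECONDS):
    """Return sorted PIDs of jobs spawned by a web server or cron daemon
    that have been running longer than max_age seconds.

    Assumption: a "job" is any process whose parent is named in
    parent_names and which is not itself one of those daemons (so web
    server workers and cron's own forked children are left alone).
    """
    procs = {}
    for line in ps_output.splitlines():
        fields = line.split(None, 3)
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        pid, ppid, etime, comm = fields
        procs[int(pid)] = (int(ppid), parse_etime(etime), comm.strip())

    stale = []
    for pid, (ppid, age, comm) in procs.items():
        parent = procs.get(ppid)
        if parent is None:
            continue
        if parent[2] in parent_names and comm not in parent_names and age > max_age:
            stale.append(pid)
    return sorted(stale)


def kill_stale():
    """Take one snapshot of the process table and kill every stale job."""
    out = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "ppid=", "-o", "etime=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    for pid in find_stale(out):
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process exited between the snapshot and the kill
```

Something like this would be run every few seconds (from a loop or a supervisor) by calling `kill_stale()`. A gentler version might send `SIGTERM` first and only fall back to `SIGKILL` if the process is still around a moment later.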