Off-site status information for cornerhost.com.

8/04/2007

mock data tutorial for cornerops/duckbill

And... Another tutorial, showing how to set up mock data for use with cornerops. Includes screenshots of the admin control panel and the billing database I use when I manage your account.

8/03/2007

how to set up the cornerhost source code

There's now a walkthrough for setting up the cornerhost source code. If you're looking to participate, this is the place to start. (I tested it both on my OS X box and a cornerhost shell account, but please let me know if there's anything confusing/missing/etc.)

7/31/2007

cornerops trac and subversion access

You can now download the code for the cornerhost billing system and control panel!

And... Thanks to the work of Chris Braybrook, we have a nice shiny Trac installation for browsing the source, tracking bugs, and posting documentation:

https://secure.sabren.com/trac/cornerops/

(Granted, the documentation part is a bit lacking at the moment, so this is mostly for the adventurous and the early adopters.)

More to come: I've held off posting vengeance and the other admin tools until I get the code cleaned up.

Anyway, check it out!

By the way, the Trac install is read-only without a password, but it's open to everybody: if you want access, just drop me an email.

7/30/2007

Incident Report for 2007-07-30

What went wrong?
A 7-hour MySQL outage.

What was the fix?
I manually killed every MySQL process and restarted the service.
MySQL came back up immediately.

What initially caused the problem?
Unclear. MySQL has locked up like this a few times in the past.
The data center rebooted Vanadium 30 minutes before the first report of the problem.
I don't know whether that was related.

Why did it take so long to fix?
I was asleep. I was up all night working on the new cornerhost subversion repository.
I went to bed late this morning, and woke up around 4pm to discover the problem.

Shouldn't Vengeance have solved this automatically?
Yes. Vengeance is supposed to restart MySQL if it goes down.
It tried repeatedly to issue the command '/etc/init.d/mysqld restart', but the command failed each time.

Shouldn't someone or something have woken me up?
Yes. At the very least, I should have gotten an SMS note on my cell phone.

Could the data center have solved this?
Ideally. They monitor quite a few services for me, but MySQL doesn't appear to be
among the services they monitor. I just opened a ticket with them to see if they can
provide a solution.
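
For the curious, here is a minimal sketch of the simplest kind of external check a monitor could run against MySQL. It is not the data center's tooling or anything from cornerops. The function name, the host name in the usage note, and the timeout are assumptions made for illustration, and 3306 is just MySQL's default port. Note that a TCP connect only proves something is listening; a locked-up server may still accept connections, so a real check would probably also run a trivial query.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration only. A refused connection,
    an unreachable host, and a timeout all count as "not open".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Usage would look something like `port_is_open("db.example.com", 3306)`, run every minute or so, with an alert going out after a few consecutive failures.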

Could anyone else have solved this?
Yes, there are one or two people who have emergency access to kill services,
but they are not trained system administrators.

Could the initial problem have been prevented?
I don't know. I don't know how to duplicate the problem.

What steps will be taken to prevent this in the future?
1. Today: I have opened a ticket inquiring about MySQL monitoring from the data center.
2. This Week: I will install Nagios to add a second layer of monitoring that will SMS/IM me if a problem occurs.
3. This Week: I will fix vengeance to allow escalating approaches to repairing services.
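
To make step 3 concrete, here is a rough sketch of what "escalating approaches" could look like. This is not the actual vengeance code. The function and step names are hypothetical, and the kill-then-start step (including the use of `pkill` and the init script's `start` argument) is an assumption modeled on the manual fix described above. The idea is to try the gentle repair first and move to a harsher one only if the service is still down.

```python
import subprocess
import time


def repair(steps, is_healthy, settle=0.0):
    """Run (name, action) steps in order until is_healthy() returns True.

    Returns the name of the step that fixed the service, or None if
    every step was tried and the service is still down.
    """
    for name, action in steps:
        try:
            action()
        except Exception:
            pass  # a step that blows up just means we escalate
        time.sleep(settle)  # give the service a moment to come up
        if is_healthy():
            return name
    return None


def kill_then_start():
    # Assumed escalation: force-kill every mysqld process, then start fresh.
    subprocess.run(["pkill", "-9", "mysqld"])
    subprocess.run(["/etc/init.d/mysqld", "start"])


# Hypothetical escalation ladder for MySQL, gentlest step first.
MYSQL_STEPS = [
    ("restart", lambda: subprocess.run(["/etc/init.d/mysqld", "restart"])),
    ("kill-and-start", kill_then_start),
]
```

A caller would pass in its own health check, for example `repair(MYSQL_STEPS, check_mysql, settle=10)`, and page a human if the result is `None`.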
