Getting a server back online

Last week our host had a power outage, and apparently the backup power didn’t kick in. The end result was a reboot of all our servers, which went surprisingly well, except for one.

This box had probably not been restarted since long before I joined the company roughly a year ago, and it had to be accessed through the emergency console. The emergency console is an extra network card, or something like it, that you log in to the machine through if all else fails. If you need to use it, you know it’s bad.

As you can infer, networking didn’t work at all. A colleague of mine discovered that a line was missing in /etc/network/interfaces: the eth0 section had no gateway assigned. After fixing that and restarting networking, the server was at least reachable over the normal network, and I could SSH and FTP in and back up everything.
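In hindsight, a small check like the one below could have flagged the problem before the reboot did. This is only a sketch, assuming a Debian-style interfaces file; the function name stanzas_missing_gateway and the sample addresses are made up for illustration. Keep in mind that a static interface without a gateway is not always a mistake (only the interface carrying the default route needs one), so treat the result as a hint, not a verdict.

```python
def stanzas_missing_gateway(text):
    """Return names of 'inet static' interfaces that have no gateway line."""
    missing = []
    current = None        # name of the static stanza being read, if any
    has_gateway = False
    for raw in text.splitlines():
        line = raw.strip()
        # Comments must be on their own line in this file format.
        if not line or line.startswith("#"):
            continue
        words = line.split()
        starts_stanza = (
            words[0] in ("iface", "auto", "mapping", "source")
            or words[0].startswith("allow-")
        )
        if starts_stanza:
            # A new stanza begins, so close off the previous one.
            if current is not None and not has_gateway:
                missing.append(current)
            current, has_gateway = None, False
            if words[0] == "iface" and words[2:4] == ["inet", "static"]:
                current = words[1]
        elif words[0] == "gateway" and current is not None:
            has_gateway = True
    # Close off the last stanza in the file.
    if current is not None and not has_gateway:
        missing.append(current)
    return missing


if __name__ == "__main__":
    # Path assumed for a Debian-style system; adjust as needed.
    with open("/etc/network/interfaces") as handle:
        for name in stanzas_missing_gateway(handle.read()):
            print("no gateway configured for", name)
```

Run on the affected box before the fix, this would presumably have printed a line for eth0.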

The next problem was name resolution: the machine was still inaccessible through its domain name. We still don’t know exactly what was wrong. Simply restarting BIND didn’t work, but after we made minor changes to the db.domain.com cache file, changes that shouldn’t have mattered much in themselves, and restarted BIND again, the problem was solved.

Then apache2 refused to start, and it didn’t output any error message explaining why, a double whammy. After checking the Apache configuration, my colleague noticed that some editor (probably nano) had automatically backed up one of the virtual host files as *.save. Apache was not ignoring this file, which resulted in a duplicate record and a silent death. After we deleted it, everything finally worked.
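Stray editor backups like this are easy to hunt down automatically. Here is a rough sketch of how one might list them. The suffix list and the function name find_stray_backups are my own guesses at what is worth flagging, and the directory in the usage part is the usual Debian location for enabled virtual hosts, which may not match your setup.

```python
import os

# Suffixes that editors and patch tools commonly leave behind (an assumed,
# non-exhaustive list; ".save" is the one that caused trouble here).
BACKUP_SUFFIXES = (".save", "~", ".bak", ".orig", ".swp")


def find_stray_backups(directory):
    """Return sorted file names in directory that look like editor backups."""
    return sorted(
        name for name in os.listdir(directory)
        if name.endswith(BACKUP_SUFFIXES)
    )


if __name__ == "__main__":
    # Assumed Debian-style path; change it to wherever your vhosts live.
    vhost_dir = "/etc/apache2/sites-enabled"
    for name in find_stray_backups(vhost_dir):
        print("suspicious file:", os.path.join(vhost_dir, name))
```

The script only reports files and deletes nothing, so you can check each hit by hand before removing it.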

Rebooting can be a mess.
