Finished the last things that needed doing before the old servers could be shut down. Currently seeing some domain resolution issues, which are hopefully just down to DNS propagation.
Spent time today tracking down an issue on the VPS where Apache would become unresponsive and requests would time out. Everything else seemed to be working.
I assumed it was a quirk of being on the VPS, so I was looking for exotic causes like TCP connection limits. It turned out to be simply MaxChildren, set at a sane default of 10, which was not enough for the traffic we were getting at the time (possibly exacerbated by everyone clicking refresh when the site failed to load). Raising it from 10 to 100 allowed all the sites to load. Usage spiked to approximately 60 worker processes, then dropped back within a relatively short period of time.
Set up Nagios monitoring for the number of Apache2 processes running. If the count exceeds 70 processes I get a warning; if it exceeds 90, a critical.
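
A check along these lines can be sketched as a small Nagios plugin in Python. This is an illustration, not the check actually in use: the function names are made up, the process name `apache2` and the use of `pgrep` to count processes are assumptions, and only the 70/90 thresholds come from the setup described above. The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are the standard Nagios plugin convention.

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(count, warn=70, crit=90):
    """Map a process count to a Nagios (exit_code, status_line) pair.

    A count strictly above `crit` is critical, strictly above `warn`
    is a warning, and anything else is OK.
    """
    if count > crit:
        return CRITICAL, f"CRITICAL - {count} apache2 processes"
    if count > warn:
        return WARNING, f"WARNING - {count} apache2 processes"
    return OK, f"OK - {count} apache2 processes"


def count_processes(name="apache2"):
    """Count running processes whose name matches exactly (assumes pgrep is installed)."""
    result = subprocess.run(
        ["pgrep", "-c", "-x", name],
        capture_output=True,
        text=True,
    )
    # pgrep -c prints the match count, including 0 when nothing matches.
    return int(result.stdout.strip())


def main():
    """Run the check once, print the status line, and return the exit code."""
    try:
        count = count_processes()
    except (OSError, ValueError) as exc:
        print(f"UNKNOWN - could not count processes: {exc}")
        return UNKNOWN
    code, line = classify(count)
    print(line)
    return code
```

To use it as a plugin, the script would end with `sys.exit(main())` (after `import sys`), since Nagios reads the status from the exit code and the first line of output. Keeping the threshold logic in `classify` separate from the process counting makes it easy to check the boundaries without a running Apache.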