On May 25th the server went down at 03:00 GMT due to a file system error caused by kernel issues.
May 25th 2011
At approximately 03:00 GMT the server became overloaded, causing all services to go down. Unfortunately, the WHUK data centre monitoring service was not working correctly, so the issue was not noticed until a few hours later. Once it was noticed, the server was rebooted and an investigation into the cause began.
The investigation found that the Unix kernel was causing the problem, so a kernel update was scheduled for between 3am and 5am the next morning, May 26th. A kernel update normally requires only a quick reboot, with no more than a few minutes of downtime.
May 26th 2011
At approximately 5am GMT, engineers in the WHUK data centre performed the update to the latest, most secure kernel version. Unexpectedly, on reboot the server showed kernel errors and the file system would not load.
Multiple attempts were made to roll back to the old kernel and then update again, but the server continued to have difficulties and web services refused to come online. After several attempts over around 1.5 hours, the server was updated and all web services came back online on the latest kernel.
We have been monitoring the server since 07:45 GMT on May 26th. As of today, May 27th, the server is showing no signs of issues and server load is well within reasonable limits.
Friday, May 27, 2011