PSX Extreme recently suffered a rather severe server failure, which took our website offline from December 30th to January 6th. This was the longest unplanned outage in our 24-year history.
So, what happened?
To simplify a rather long story, the database server that powers PSX Extreme malfunctioned. We tried to repair it numerous times, but our repair attempts were unsuccessful. In fact, we ended up making things worse: the database that powers our website became irreversibly corrupted.
Our only real solution at that point was to completely wipe our server and reinstall everything from the ground up. On paper, this should have been an easy thing to do. Reinstall the operating system, reconfigure our control panel. Easy. Time-consuming, no doubt, but most definitely an easy task.
Except absolutely nothing went right.
Downloading The Backups
PSX Extreme has four major, completely different backup methods. Each method is meant for a different type of hardware or software failure. For instance, we back up our database, posts, pages, and primary directories to the cloud once every 24 hours. This backup method is great when we need to quickly roll back a day or two. The downside? It doesn't back up the entire website and all of its directories; rather, it only backs up what's required to keep the core of our website operational. In other words, the absolute basics.
We also create a full backup of our entire website, including every directory within our main web folder. This backup is an exact replica of our website as it appeared on the date the backup was created. Unfortunately, we only run this clone-based backup once every seven days, which, for a high-content website like PSX Extreme, is not the most ideal of solutions. Still, it's a fallback that's nearly guaranteed to work.
That was the backup method we opted to use.
Simply downloading the backups from the server and storing them on our local drive took roughly 36 hours. PSX Extreme is a big website, with over 400GB of total data.
Easy, but time-consuming.
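For a sense of scale, those two approximate figures imply an average transfer rate of only about 3 MB/s. A quick back-of-the-envelope check, assuming decimal units (1GB = 10^9 bytes) and a constant rate across the whole download; the function name is our own illustration:

```python
def average_throughput(total_bytes, hours):
    """Return (megabytes per second, megabits per second) for a transfer.

    Assumes decimal units and a constant rate; real downloads fluctuate.
    """
    seconds = hours * 3600
    bytes_per_second = total_bytes / seconds
    return bytes_per_second / 1e6, bytes_per_second * 8 / 1e6


# Roughly 400GB over roughly 36 hours, per the figures above
mb_s, mbit_s = average_throughput(400e9, 36)
print(f"{mb_s:.1f} MB/s, {mbit_s:.1f} Mbit/s")  # prints "3.1 MB/s, 24.7 Mbit/s"
```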
Restoring The Backups
Unfortunately, this is where things started to take a turn for the worse. Downloading the backup files wasn't overly complicated, just time-consuming. The same can't be said for the restoration process.
We had to upload the compressed backup files to the server and then run a restore command. Unfortunately, every single time we tried, the restore process failed. We attempted it several times, wasting roughly three days. Each time, the restore would reach about 95%, hang for several hours, and then ultimately fail. Since we were restoring a rather large file, having the restore process hang was normal and expected. Having it crash? Not so normal or expected.
Once we got the site restored, we tried to restore one of our cloud backups, to get as close to our previous live site as we could. Unfortunately, restoring the cloud backup ended up corrupting our database, forcing us to wipe the database and reinstall the original backup again. Every time we had to do a new restore, we had to sit and babysit the restoration process for a whopping four hours.
So now, an additional eight hours had been wasted just trying to restore a working backup. But finally, it was done. Things were no longer crashing. All was good in the world!
And Now We’re Right here
PSX Extreme is back online. Things aren't entirely stable quite yet, but at the very least, we're functional. We can once again publish content on our website, and all core functionality is good to go.
That said, things are still somewhat shaky. The site is slow, and there are several visual bugs and glitches that have yet to be fixed as of this writing. But at least we're back online, right?
I want to thank everyone for your patience. Restoring PSX Extreme was no easy task, even if it was supposed to be an easy task on paper.
Preventative Measures
To try to ensure that this never happens again, we have implemented a new caching method on our website, which should speed things up rather significantly. Beyond that, we're also going to create full cloned copies of the entire public directory every 24 hours, to roughly match the schedule of our cloud-based backups.
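A daily full-clone job like the one described above could be automated with a small script run once a day, for example from cron. The sketch below is hypothetical: the function name, the archive naming scheme and the seven-archive retention count are our own assumptions, not a description of PSX Extreme's actual tooling. It packs a public web directory into a timestamped, compressed tarball and deletes all but the newest few archives.

```python
import tarfile
from datetime import datetime
from pathlib import Path


def create_full_backup(public_dir: Path, backup_dir: Path, keep: int = 7) -> Path:
    """Archive public_dir into a timestamped .tar.gz, keeping only the newest `keep`.

    Hypothetical sketch: paths, naming and retention are illustrative assumptions.
    This copies files only; a database would need its own dump step.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    backup_dir.mkdir(parents=True, exist_ok=True)

    # Fixed-width timestamp (down to microseconds) so names sort chronologically
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    archive = backup_dir / f"full-{stamp}.tar.gz"

    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the site root, e.g. "public_html/index.html".
        # backup_dir should live outside public_dir, or old backups get archived too.
        tar.add(public_dir, arcname=public_dir.name)

    # Oldest archives sort first; delete everything except the newest `keep`
    archives = sorted(backup_dir.glob("full-*.tar.gz"))
    for old in archives[:-keep]:
        old.unlink()
    return archive
```

Keeping several days of archives, rather than only the latest one, means a backup that later turns out to be bad does not leave the site without any fallback.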
We're also going to rely a lot less on remote cloud backups, considering that they haven't, thus far, been of any real value. This was supposed to be our most secure and most reliable method of backup and recovery. Instead, it turned out to be the least reliable of the bunch.
We will also be looking into the possibility of hosting our website on a different hosting network. Right now, we run our own servers and more or less provide and do everything ourselves. That is fine when it works, but, as we just discovered, it's a real pain in the ass when things hit the proverbial fan.
All in all, we’re again on-line. Hopefully for good this time round.