A week of power-supply problems. Not Eskom's fault, this time, but more localised failures.
First the power supply for the network server had a fan stop turning. I could have taken the chance on the unit working without cooling, since the machine is relatively lightly loaded -- no graphics cards, only a single disk -- but, with a spare power-supply unit handy, it was a task of mere minutes to swap the faulty one out and get the server back into action. It is a fairly key piece of our little home network, being a web cache, local domain-name server and cache, Subversion repository and file-share space, so we miss it badly when it is down.
Then the power supply on my desktop machine decided to follow suit. Also a fan failure. I hate those crappy little fans! There's absolutely nothing wrong with the basic electronics of the power supply itself, but the ball bearings in the fan have died. Pricing for a new power supply runs from a little over R100 (if I were in Cape Town with easy access to wholesalers), through R200 from a web shop, all the way to R300 from the local PC shops! This is for the most basic 350W PSU -- none of that fancy gaming-machine stuff for me. (Though I will confess to being tempted by a unit costing around R800, simply because it is alleged to be completely quiet! I'm a self-confessed anti-noise maniac.)
My guess is I'm going to spend an hour messing about with the soldering iron, installing new fans (I have a couple just lying about) in the "faulty" power-supplies.
At the same time, several warnings arrived from my server supplier in London, telling a week-long tale of woe about the power supply into the datacentre. Apparently a failover switch failed to work correctly during a power outage last Sunday, leaving the battery-based UPS to carry the entire load for about 10 minutes before the batteries were totally drained. All servers in the DC went down hard. It took them until Thursday to isolate the problem and replace the parts (electrical and mechanical) that were at fault.
Throughout the whole affair, all server owners have been kept fully informed, via RSS feeds and emails, at every step of the way, since there is a risk (however slight) that servers might go down if there is another power-grid outage and the on-site staff -- now fully briefed on managing a manual switch from grid power to the backup generator -- are tied up at just the wrong moment.
This is exactly the sort of thing I expect from server providers and datacentre operators. Everybody understands that, despite the best-laid plans, sometimes shit happens. It is how they respond to a crisis, and how transparent and communicative they are while doing so, that truly matters.
This is in very sharp contrast to Verizon's datacentre in Durban, where my other client's servers are housed. About 10 days ago they had some electrical work going on in the DC, which in turn made some server-moves necessary. They did all this without warning their clients that there might be some risk to their operations. Needless to say, my client's servers went down without warning in the wee hours of Sunday morning. No heartbeat monitoring in place, so it was Monday before anybody knew that something was wrong. No peep from Verizon to their customers. Half-arsed, I call it.
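Even a trivially simple heartbeat check would have turned that into a Sunday-morning alert instead of a Monday-morning surprise. Just to show how little is involved, here is a rough sketch in Python of the kind of thing I mean -- the hostnames, ports and alert address are made-up placeholders, and it assumes a local SMTP relay is available for sending the alert mail:

```python
#!/usr/bin/env python3
"""Minimal heartbeat monitor: poll a few hosts and shout when one stops answering.

The hosts, ports and addresses below are hypothetical placeholders, not the
actual servers mentioned in the post.
"""
import smtplib
import socket
import time
from email.message import EmailMessage

# Hypothetical services to watch: (name, host, port)
CHECKS = [
    ("web server", "www.example.co.za", 80),
    ("database", "db.example.co.za", 5432),
]
ALERT_TO = "ops@example.co.za"     # hypothetical alert address
ALERT_FROM = "monitor@example.co.za"
SMTP_HOST = "localhost"            # assumes a local mail relay
POLL_SECONDS = 300                 # check every five minutes


def is_up(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def send_alert(name: str, host: str, port: int) -> None:
    """Email a short outage notice via the local SMTP relay."""
    msg = EmailMessage()
    msg["Subject"] = f"HEARTBEAT FAILED: {name} ({host}:{port})"
    msg["From"] = ALERT_FROM
    msg["To"] = ALERT_TO
    msg.set_content(f"No response from {host}:{port} at {time.ctime()}.")
    with smtplib.SMTP(SMTP_HOST) as smtp:
        smtp.send_message(msg)


def main() -> None:
    # Remember the last known state so we only alert on the up -> down transition.
    last_state = {name: True for name, _, _ in CHECKS}
    while True:
        for name, host, port in CHECKS:
            up = is_up(host, port)
            if not up and last_state[name]:
                send_alert(name, host, port)
            last_state[name] = up
        time.sleep(POLL_SECONDS)


if __name__ == "__main__":
    main()
```

Left running on any machine outside the datacentre in question, even something this crude would have flagged the outage within minutes rather than a day.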
There's a lesson in all this about Single Points of Failure. I've been warning for over eight months that having all the servers housed in a single DC, or even in a single city, is a risk. Maybe now the business will take some action, but, given their general lack of respect for, or attention to, the fact that, like it or not, they are a technology business, I have my doubts.