
07 December 2010

Amazon Route 53: Trustworthy?

As a "user" of Amazon's Web Services (I've kicked the tyres on S3, but not much more than that) I received an email from Amazon punting their new DNS service, dubbed "Amazon Route 53".

I wonder, though, in the wake of their termination of WikiLeaks, whether I would trust any part of my DNS infrastructure to Amazon. Suppose I did something to piss off the US government - hosted a DNS entry for WikiLeaks, perhaps, at, say, http://wikileaks.mikro2nd.net/ - and some US government official noticed (pretty unlikely, I know, but...) and whispered into Amazon's ear. Would I, too, lose the use of this critical infrastructure without review, recourse or refund?

So, no! I don't think I'll be using Amazon Route 53 much...

11 September 2008

Infrastructure Fix

Finally!  Took a day off from preparing for the (Advanced Architecture and Design) course I'm hosting next week, and spent the time getting my local infrastructure back on its feet after a breakdown some weeks ago.  Basically the LAN server died... motherboard or CPU. Suddenly we realised just how much we've come to depend on that little machine!  DNS caching, plus names for all the machines on our farm-LAN[1], HTTP proxy, local Subversion server, file-share, music streaming, not to mention supplying the base infrastructure for some Jini services that come and go as we play around with Jini and JavaSpaces.

So now that we've beefed up the CPU and disk a bit, it was "just" a case of getting all the bits of server software installed and cleanly configured. Nothing difficult, but it all takes time! Especially when you're a bit... fastidious?... about getting the config "right", as opposed to "just poking things until they work". I truly despise "voodoo config", where people don't really make the effort to understand the impact and effects of what they're doing, and fuck it up royally as a result.  "Oh, well, I'll install mod_disk_cache and mod_mem_cache... after all, two caches must be better than one, mustn't they?"  Ding!  You Do Not Win The Lounge Suite!

Well, it's been fun! And finally I can forget all about it again until the next hardware failure. The only thing I really would like to achieve is to get the server quiet[2], and get its power-consumption way, way down!  And maybe have a few more tiny little servers...

[1] The term "server farm" takes on a whole new significance ;-)


[2] Yeah, I'm a complete arsehole about noise. I really, really, really hate noise...

03 July 2008

Power to the Purple

A week of power-supply problems. Not Eskom's fault, this time, but more localised failures.

First the power-supply for the network server had a fan stop turning. I could have taken the chance on the unit working without cooling, since it is relatively lightly loaded -- no graphics cards, only a single disk -- but I had a spare power-supply unit handy, so it was a task of mere minutes to swap the faulty one out and get the server back into action. It is a fairly key piece of our little home network, being a web-cache, local domain-name server and cache, Subversion repository and file-share space, so we miss it badly when it is down.

Then the power supply on my desktop machine decided to follow suit. Also a fan failure. I hate those crappy little fans! There's absolutely nothing wrong with the basic electronics of the power supply itself, but the ball bearings in the fan have died. Pricing for a new power-supply would run from a little over R100 if I were in Cape Town with easy access to wholesalers, through R200 from a web-shop, all the way to R300 from the local PC shops! This is for the most basic 350W PSU -- none of that fancy gaming-machine stuff for me. (Though I will confess to being tempted by a unit costing around R800, simply because it is alleged to be completely quiet! I'm a self-confessed anti-noise maniac.)

My guess is I'm going to spend an hour messing about with the soldering iron, installing new fans (I have a couple just lying about) in the "faulty" power-supplies.

At the same time, several warnings arrived from my server-supplier in London, telling a week-long tale of woe about the power supply into the datacentre. Apparently a failover switch failed to work correctly during a power outage last Sunday, causing the battery-based UPS to take the entire load for about 10 minutes before the batteries were totally drained. All servers in the DC went down hard. It has taken them until Thursday to isolate the problem and replace the parts (electrical and mechanical) that were at fault.

During the whole affair, all server owners have been kept fully informed via RSS feeds and emails at every step of the way, since there is a risk (however slight) that servers might go down again if there is another power-grid outage and the on-site staff -- now fully briefed on managing a manual switch from grid power to the backup generator -- happen to be tied up at just the wrong moment.

This is exactly the sort of thing I expect from server providers and datacentre operators. Everybody understands that, despite the best-laid plans, sometimes shit happens. It is how they respond to a crisis, and how transparent and communicative they are while doing so, that truly matters.

This is in very sharp contrast to Verizon's datacentre in Durban, where my other client's servers are housed. About 10 days ago they had some electrical work going on in the DC, which in turn made some server-moves necessary. They did all this without warning their clients that there might be some risk to their operations. Needless to say, my client's servers went down without warning in the wee hours of Sunday morning. No heartbeat monitoring in place, so it was Monday before anybody knew that something was wrong. No peep from Verizon to their customers. Half-arsed, I call it.
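
Heartbeat monitoring needn't be anything elaborate, either. Here's a minimal sketch in Java of the sort of thing I mean; the URL, the polling interval and the alarm action are all made up for illustration:

    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class Heartbeat {
        public static void main(String[] args) {
            ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
            // Poke the server every five minutes; raise the alarm if it doesn't answer.
            timer.scheduleAtFixedRate(() -> {
                try {
                    HttpURLConnection conn = (HttpURLConnection)
                            new URL("http://www.example.com/ping")   // hypothetical status URL
                                    .openConnection();
                    conn.setConnectTimeout(10_000);
                    conn.setReadTimeout(10_000);
                    int status = conn.getResponseCode();
                    if (status != 200) {
                        raiseAlarm("Server answered HTTP " + status);
                    }
                } catch (Exception e) {
                    raiseAlarm("Server unreachable: " + e);
                }
            }, 0, 5, TimeUnit.MINUTES);
        }

        private static void raiseAlarm(String reason) {
            // In real life: send an email or SMS. Logging stands in for that here.
            System.err.println("ALARM: " + reason);
        }
    }

Run something like that from any box outside the datacentre and nobody has to wait until Monday to find out the servers fell over.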

There's a lesson in all this about Single Points of Failure. I've been warning for over 8 months that having all the servers housed in a single DC, or even in a single city, is a risk. Maybe now the business will take some action, but, given how little respect or attention they pay to the fact that, like it or not, they are a technology business, I have my doubts.

15 December 2007

Quartz Crystal

A very trying couple of days...  Faced with a job that cries out for a decent scheduler (polling feeds), I turned to OpenSymphony's Quartz.  I mean, the ads look so good: Robustness, recoverability, scalability, blah, blah.

The first warning sign I should have paid attention to was a couple of developers' names that I have long associated with Doomed Pieces of Shit.  But it all still looked so good.  Until I got closer to the code.  Quartz?  Quartz Crystals for accuracy?  More like Crystal Meth!  Documented methods that mysteriously fail to exist.  Examples that aren't.  I thought the JavaDoc got generated from the source, no?  I guess we have here the penetrating stench of Configuration Mismanagement.

Then you enter a twisty little maze of undocumented dependencies.  You will use Commons Logging.  You will use a bunch of J2EE stuff, even though your application is a simple standalone with no hint of J2EE awfulness in sight.

No.  After a day or so of hacking at this steaming turdpile, my brain feels like so much oatmeal porridge that I can't even work up enough bile for a decently vitriolic blog post.  For me, one of the surest signs of a dying open-source project is when its wikis and forums are filled with spam because nobody can be bothered to disallow Guest users from posting; when the version-control system shows six checkins in the past six weeks.

I'm outa here in favour of Doug Lea's concurrency stuff. What a pleasure by contrast.  I'll live without clustering for now...
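
For the curious, the heart of the replacement is about this big. A minimal sketch using java.util.concurrent (the JDK descendant of Doug Lea's util.concurrent library); the feed list, the interval and the poll() body are placeholders, not my actual code:

    import java.util.List;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class FeedPoller {
        private final ScheduledExecutorService scheduler =
                Executors.newScheduledThreadPool(4);

        // Put each feed on its own repeating schedule.
        public void start(List<String> feedUrls) {
            for (String url : feedUrls) {
                scheduler.scheduleWithFixedDelay(() -> poll(url), 0, 15, TimeUnit.MINUTES);
            }
        }

        private void poll(String url) {
            // Placeholder: fetch the feed, parse it, stash anything new.
            System.out.println("Polling " + url);
        }

        public void stop() {
            scheduler.shutdown();
        }
    }

Note that scheduleWithFixedDelay waits for one fetch to finish before timing the next, so a slow feed can't pile up overlapping polls against itself. No XML, no J2EE, no twisty maze.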