With the server cluster in the office creaking alarmingly, and our fibre connection to the outside world almost warm to the touch, we decided that we had to bite the bullet and move the entire mess into a true hosting centre that would provide 24/7 support, backup power, and redundant internet connections.
Thorough Planning makes a difference
The first step in the move was to upgrade the hardware in-house and test to ensure the new servers were up to the task. We priced the main players and soon realised that only by selling our kidneys could we ever afford the required hardware. Our other option was to purchase rack-mounted cases and build the machines ourselves using quality parts at affordable prices. Sure - it meant some late nights and a bit of driving around, but we're all certified geeks and can build a box blindfold in under 6 minutes. A quick call to our local supplier - the one with the unmarked entrance at the back of a block of units in a distant industrial area - and the gear was on its way.
That's when the trouble began.
We only received a single case and enough parts to make up 2 boxes if we did without some of the fancy stuff such as monitor support and Ethernet. Not good. We decided to move ahead and build what we had to at least ensure that the rest of the shipment, should it decide to turn up, would be suitable. Leading the charge was Peter, our resident hardware / network / sysadmin / all-round legend, who had the system mostly built in no time but then popped out for an extended lunch and decided, for some extremely bizarre reason, to ask Bianca to finish up. Bianca handles all advertising accounts and marketing, plus pretty much anything else thrown her way. She's amazing. She isn't, however, experienced in building servers. Hands up who can see what's wrong with this picture:
A hint: those are wire cutters she's holding.
So it was decided that those who have the most experience in building boxes should be the ones to, well, build the boxes. There were words. There was some pushing and shoving. There was a disaster.
This can't be good
"Yeah, Hi. Ummm...."
A New Beginning
We tried to get replacement hardware, but in the end it was decided that getting the servers we originally envisioned in any manageable timeline, within budget, was simply not possible. What then ensued was the biggest call-out we as a group had ever done in our four years of building CodeProject, and we came up with gold.
Dave in front of some of the boxed iPaqs.
They were piled up everywhere.
A friend of a friend of a... (you know how these things work) had a massive stock of superseded iPaqs. One thousand of the little tykes, to be exact. They'd been purchased by one of the larger Real Estate companies for use by their employees to provide customers with up-to-date information on homes while on the road. The problem was that none of the agents could use the devices, and the application they were meant to use was no longer supported (and didn't even work half the time), so they'd simply given up and shipped them back. They were paid for, no one wanted them, so we ended up with the lot.
And what can an iPaq do by itself? Not a lot. But a thousand 64MB, 206MHz series 3700 iPaqs could really start to make a bit of noise.
CodeProject is alive!
We first investigated whether it was possible for an iPaq to handle even part of the site's load and quickly realised that the units could handle everything that we required a desktop system to handle. We considered the reasonably radical move of installing Linux on the iPaqs and using MySQL as the backend, but instead chose to use vxWeb as the base for a webserver, with one of a number of plugin VBScript interpreters, to quickly put together a lean and mean CE-based ASP web server. On the backend it was decided that SQL Server CE was the obvious choice since it meant an easy port.
A single iPaq was not going to be of much use. We needed to cluster! Initially we figured this was probably not going to be possible, but a number of solutions were found, ranging from traditional dual-NIC clustering to a more ambitious IR-based cluster. We had a ton of USB adapter cables, so after a bit of slicing and splicing, lots of fiddling about and copious swearing, we were on our way.
- This is never gonna work
- Shuddup and get in the truck
- OK. Maybe you're right.
Initial development time was around 3 weeks, physical build time and testing was around 2 months, and the entire move from the office to the hosting facility took around 8 hours. I was able to call on the help of Dundas Software, who had tons of previous experience in CE-based development, and we were also able to port the .NET parts of the site to the .NET Compact Framework with only mild swearing.
CodeProject's new home is the Telus facility on Laird in Toronto. It's a monster. 8-foot-thick walls, ram-raid protection, and underground fuel tanks with enough fuel to power the entire facility, fully stocked and with all lights on, for 3 weeks. Biometric security, pressure-sensitive floors, multiple connections to major internet backbones and some seriously grumpy security guards.
CodeProject's new home
Yes Dave - they mean you.
Unfortunately we were banned from taking any pictures inside the hosting facility itself (other than the one of Dave and me in the loading dock looking dubiously at the boxes). The one pic of the rows of racks above was taken in the one place there were no security cameras, and at the expense of having Bianca sit through a 20-minute detailed explanation of the fire-retardant system by our helpful, but hard to shake, escort.
The units themselves are mounted in custom sliding trays that are attached to the racks using traditional sliding rails. We use dual firewalls capable of handling up to 100Mbit each, with automatic failover and restart. All switches are 100Mbit for the external network and 1Gbit for the internal traffic. Each unit has its own screen and can easily be lifted out of its position in the tray, making a supplementary Keyboard/Video/Mouse unit unnecessary.
The increase in server capacity and network bandwidth saw an almost immediate increase in throughput. We're now able to serve more pages simultaneously with faster load times than ever before.
Scalability is a non-issue as units can either be added, or old units removed and replaced in situ without affecting the rest of the cluster. We've squeezed the thousand units into a single standard rack and will be looking to lease a second rack in the near future to expand again. After the initial teething problems it's been a great experience.
Obviously something this complex isn't all sweetness and light. Some of you will remember the issues we had when we had 6 servers. Server number 2, you will remember, was always playing up and was, in the end, declared cursed and unsalvageable. Our current setup essentially multiplies these problems. Currently servers 45, 234, 294, 536 and 785 are all showing signs of early senility, and servers 239, 455 and 901 have been used in baseball batting practice. There is a limit to our patience.
Heat is another issue. We've had to install several large fans inside the rack to ensure that the batteries in the units, while charging, don't overheat. This has also allowed us to solve the other problem we had, namely dusty LCD screens. Coming in each day to clean a thousand dusty, fingerprint'd iPaq screens is simply no fun, but the constant airflow at least minimises the dust buildup.
Cables coming loose has been a surprising entrant in the 'Most Annoying Thing' competition of '04. A close runner-up has been the dropping-the-stylus-into-the-lower-rack-cavity problem. We ended up using the old banker's trick of having a single stylus inside the rack cabinet attached to a very long string.
Overall it's been a wonderful experience. We have more room in the office, you get a faster, more reliable website, and we get to say "Happy April Fools' Day".