Unfortunately we're at the point where a ton of behind-the-scenes work still needs to be done: processing bounced emails, improving database schemas, analysing logs, and cleaning up and improving content attribution.
Our choices today, and for the next few weeks:
1) Make the site go faster
2) Do Really Cool Stuff that will make you go "ooh! aah!"
3) Make things work better for us here trying to run things
Clearly we want to do all 3, but at the moment our priorities have to lean on the side of being sensible. There is some leeway in the order we do (1), though, so I'm going to put a little more work into (3) in order to roll out bits of (2) while ensuring the focus is firmly on (1).
If I were the project lead, then the obvious choice would be number 3.
Any changes you can make to improve the development team's ability to develop will have a large impact on the other two options.
Without seeing the actual feature request list, making a choice between doing 1 and 2 after 3 has been completed is a little harder. Personally, the site already has all the features I need and probably meets the requirements of a vast majority of the users. So making the site go faster would seem to be a more important task. The faster the site runs, the more people will be able to use the site, the more ad impressions will be generated, the more money you can make for implementing option 2.
So today's fun was in trying to debug some ASP.NET. It's fairly simple stuff - I have a project that creates a middle-tier library, and a site in a separate project that uses that library. Everything's under the one solution. I have a reference in my website to the library's assembly, have ensured that all the usual stuff like "Local copy" is set to true, and have debug on.
But no matter what I do, VS.NET refuses to copy the pdb file over to the /bin directory of the website. I can copy manually but even then the debugger isn't picking up the symbols and so I can't set breakpoints or step through code.
I've tried everything, read every newsgroup, every blog, and every KB article I could find. I did it all. The only way out of this is to delete the reference and then re-add it, and it works. Until I recompile the component, i.e. every 2 minutes.
The other fun bit is that running the debug build of the application works perfectly. But if I run via the debugger it gets caught in an endless loop. I step through the code and see that it's jumping from one instruction to a completely unrelated function for no reason. Something is seriously busted.
But on a brighter note, Paul's logfile analyser is essentially done bar some fit and finish, meaning we have one of the pieces in place to count back article downloads and include download figures.
Also, Nish's work on our improved syntax colouring component based on Troy Marchand's gem seems to be doing the trick.
No matter how good software companies make IDEs, or how safe and easy to program the underlying platform, we're always going to have to deal with device manufacturers and their applications that never, ever work properly.
I've got an Ericsson 610 and it's a little dodgy when it comes to syncing up with Outlook. It works, kinda, but it's never been the most painless process. At the moment I have 2 copies of all my contacts on my phone and I've hit the point where if I fiddle any more it's going to explode.
I've just upgraded (and I say that in the loosest possible manner) to a Nokia 6230. This thing has everything: MMC card, FM radio, MP3 player, video camera, Bluetooth. But it will not, for the life of me, connect to my laptop to sync via Bluetooth.
I've worked around the dodgy Belkin PCMCIA Bluetooth card and arm wrestled it into submission. As long as I stay two feet away from the laptop while it's running, and as long as I don't stare at it directly, it usually won't cause a blue screen.
I've installed the latest Nokia connectivity software. The one that looks like it's flash driven. The one where you click a button for the connectivity dialog and that dialog appears underneath the main window (where'd ya go? I'm gonna get ya! Iiiiiii'm gonna get ya!). But even though I've paired the phone and laptop, and can connect to the phone through XP and see the files on the phone, and everything seems to be fine, the Nokia software mournfully concedes "Cannot use this connection type. Check that all needed hardware, software and drivers are available".
Hardware: laptop in corner sulking. Phone next to me, being painfully cheery but a little useless.
Software: Downloaded and installed all the latest go fast bits for the laptop. The phone continues to be cheery but a little useless.
Drivers: See "software" above.
So yet again, a less than spectacular hardware interface experience. Man oh man...
So today I had 3 SQL servers die on me. Two servers set up in a testing environment in our office, and one backup server that is sitting idle in the hosting facility. And we don't know why they died...
The mystery started when Clinton wanted to formalise our new testing, staging and deployment process, which not surprisingly required a test rig. We have one set up but haven't used it for a month because development is at that stage where we're in between doing patches on the current system, revisiting groundwork on the new system, and cataloguing perf issues on the live system.
So they've been sitting there idle until I tried to resync the test SQL servers with a copy of the latest production DB so we could have real world data with which to test. SQL1 had reported issues a couple of days ago, but nothing too worrying. SQL2 was hale and hearty (AFAIK) so I moved to it first. It was dead. Stone cold don't-even-think-about-trying-to-boot-me dead. Weird. So back to SQL1. A blank, gray cold screen of rigor mortis was all that was to be seen. Dead, too, but in a blank next-world staring kind of way. The bodies have been removed and the authorities informed.
And then the hosting facility. Like some sordid B-grade mystery I logged into the network there to fire up SQL #3 in order to partition out some data access and spread the love. I mean load. It looked fine. It was a walking, talking SQL box but with a few little nervous twitches that I put down to too much sleep (on its part) and not enough caffeine (on my part). I installed, I patched, I created the tables, defined the stored procedures and added the logins. All well so far, but then no sooner do I walk out of the metaphorical room than we have another dead body littering the parlour floor. Not even a monogrammed glove or hieroglyph'd card to give a clue as to the perpetrator's identity.
Have you upgraded to three yet? You need to. Really. I'm still trying to find a way to fit a fourth on my desk. I think the phones can go - I have too many anyway and still can't tell which one is ringing.
Now wait until you get to try the combination of 3 or more monitors with Wi-Fi and Remote Desktop to administer those difficult to reach machines. *
* We all have them. It's the machine that you find you need to use at just the most inconvenient moments...usually when you've just settled down at your desk with a cuppa and don't feel like walking to the other side of the office where the ambience just isn't quite the same. Or the laptop that's still in your bag but you forgot to turn off, for that matter.
It's been a while since I posted anything about what's going on behind the scenes so I figured I'd post a quick update. We've been concentrating our efforts mainly on speed and uptime and have been attacking those two issues on multiple fronts.
The most common question I get asked is 'why aren't you running .NET?'. The answer is a simple one: we don't need to. Though I will qualify that with 'yet'. Many parts of CodeProject have been ported, rewritten, refactored and rewritten again in the various versions of .NET and I'm about to start work on a .NET 2.0 framework in order to test some ideas that have been swimming around in my head. However, our venerable ASP codebase works, is a known quantity, is easily maintainable and when something breaks it's very obvious what happened. We still have, and probably will have for some time, load issues, but rewriting code to be twice as fast is actually more expensive than buying a faster processor. And for us it always will be. Rewriting the code to be faster is, at best, a temporary solution, and sooner or later more hardware will have to be thrown in. The most important thing for us is that the system is scalable. This is what we have been addressing.
On the hardware side we increased our webserver capacity by 50%. This took a load off the other servers which, unfortunately, has quickly been filled by extra demand. I say unfortunately because after a marathon effort to move the entire server farm across town to our hosting facility, we ran out of cabinet room. So a few weeks ago we ordered our own cage and have two racks nearly full of equipment. Web servers are no longer a bottleneck since we can throw in new boxes whenever the piggy bank allows.
Bandwidth was a bottleneck that was solved when we moved to the hosting centre. Or so we thought. We've got hold of a Network Admin at Telus who spent a great deal of time working with us to find any kinks, and we found them in the oddest places. Switches which didn't operate as promised, cards in the hosting centre that had blown but not triggered alarms, a redundant network feed that turned out not to be redundant in the common sense meaning of the term, and gigabit NIC settings that should have auto-set but didn't. We know far more about network administration than any of us (except maybe Dave) ever wanted to know.
In adding more webservers we needed to ensure our load balancing solution was appropriate. We've been using the Network Load Balancing in Windows 2000, which up until now has been working fine. However, this service suffers from a few problems: it doesn't route requests based on server load; it doesn't halt requests to a server if IIS on that server has died; and for session state to be maintained (in ASP) you need to set IP affinity on. IP affinity means that once you hit a server, you are stuck to that server until you start a new session or the server recycles. And speaking of session state, when a server running an ASP site goes down or is cycled, all session information is lost. Furthermore, sharing session state between ASP and ASP.NET sections of a site can be a hassle.
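To make the difference concrete, here is a minimal Python sketch of the two routing styles. It is an illustration only, not how Windows NLB or any firewall actually implements routing: the function names, the MD5-based hash and the server names are all assumptions made up for the example.

```python
import hashlib


def route_with_affinity(client_ip: str, servers: list[str]) -> str:
    """Sticky routing (hypothetical): the same client IP always lands on the same server."""
    digest = hashlib.md5(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


def route_by_load(load: dict[str, float], healthy: set[str]) -> str:
    """Load-aware routing (hypothetical): pick the least-loaded server that still responds."""
    candidates = {name: value for name, value in load.items() if name in healthy}
    if not candidates:
        raise RuntimeError("no healthy servers available")
    return min(candidates, key=candidates.get)


if __name__ == "__main__":
    servers = ["web1", "web2", "web3"]  # made-up server names
    # With affinity, a client keeps hitting the same box however busy it is.
    print(route_with_affinity("203.0.113.7", servers))
    # Load-aware routing skips web3 (marked as dead) and prefers the idle web2.
    print(route_by_load({"web1": 0.9, "web2": 0.2, "web3": 0.1}, {"web1", "web2"}))
```

The affinity version never looks at load or health, which is exactly the complaint above; the load-aware version can only be used safely once a request no longer depends on state held in one particular server's memory.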
So we decided to kill a few birds with one stone and rewrite our session management to use SQL Server instead of IIS. This allows us to:
Turn off IP affinity. This means load balancing will be smoother.
Use our firewall's load balancing system that responds to server load. Again, even smoother balancing.
Be more aggressive with our automatic server cycling. We cycle servers as soon as memory usage or load is too high, but we tend to be judicious since cycling a server kills the sessions for that server. Taking session management off the server, combined with agile load balancing, means we can cycle our servers with extreme prejudice.
Move to ASP.NET gracefully, instead of in one big lump.
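As a rough sketch of the idea behind the list above, session data can be keyed by a session ID and kept in a shared database rather than in a web server's memory. The example below is an assumption-laden illustration, not our actual implementation: it is written in Python, uses SQLite as a stand-in for SQL Server, and the `SqlSessionStore` class, the `sessions` table and its columns are all hypothetical names.

```python
import json
import sqlite3
import uuid


class SqlSessionStore:
    """Hypothetical session store: state lives in a shared database, not in the web server."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions ("
            "session_id TEXT PRIMARY KEY, data TEXT NOT NULL)"
        )

    def create(self) -> str:
        """Start an empty session and return its ID (this would be sent to the browser as a cookie)."""
        session_id = uuid.uuid4().hex
        self.conn.execute(
            "INSERT INTO sessions (session_id, data) VALUES (?, ?)",
            (session_id, "{}"),
        )
        self.conn.commit()
        return session_id

    def load(self, session_id: str) -> dict:
        """Fetch the session data; an unknown ID yields an empty session."""
        row = self.conn.execute(
            "SELECT data FROM sessions WHERE session_id = ?", (session_id,)
        ).fetchone()
        return json.loads(row[0]) if row else {}

    def save(self, session_id: str, data: dict) -> None:
        """Write the session data back so any other server can pick it up."""
        self.conn.execute(
            "INSERT OR REPLACE INTO sessions (session_id, data) VALUES (?, ?)",
            (session_id, json.dumps(data)),
        )
        self.conn.commit()


if __name__ == "__main__":
    shared_db = sqlite3.connect(":memory:")  # stand-in for the SQL Server cluster
    web1 = SqlSessionStore(shared_db)  # two "web servers" sharing one database
    web2 = SqlSessionStore(shared_db)

    sid = web1.create()
    web1.save(sid, {"user": "someone", "page_views": 3})
    # web1 can now be cycled; web2 still sees the session.
    print(web2.load(sid))
```

Because nothing about the session lives on the web server itself, any box can answer the next request, which is what makes dropping IP affinity and cycling servers freely possible. The same table could in principle be read from both ASP and ASP.NET pages, though the serialisation format would need to be one both sides understand.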
Load on the SQL cluster is, at the moment, manageable, but there are times when load can max out. Work has been done on further optimising data access and ensuring connection pooling is working as efficiently as possible. Even so, our database backend is the one piece of our puzzle ripe for optimising and will be the focus of our next set of upgrades.
After all the dust settled, load tests showed around a 10-20% improvement in load capacity. The improvements to scalability on both the web and data access sides of things are far more valuable, though, as is the ability to mix and match ASP and ASP.NET pages and components.
I mentioned the 'yet' caveat when talking about us not needing .NET, but that statement applies strictly to the site as it stands now. All future work, improvements and features will be .NET only. A more fine grained article attribution system, more thorough client-based site monitoring systems, offline caching and processing, and features that simply don't make sense in ASP mean the new .NET codebase will be used more and more.