Welcome to the Lounge

For discussing anything related to a software developer's life, but not for programming questions. Got a programming question?

The Lounge is rated Safe For Work. If you're about to post something inappropriate for a shared office environment, then don't post it. No ads, no abuse, and no programming questions. Trolling (political, climate, religious or whatever) will result in your account being removed.

 
Re: Worst deploy story? (General)
Chris Maunder (cofounder), 2-Dec-14 3:20
Re: Worst deploy story? (General)
agolddog, 3-Dec-14 5:26
Re: Worst deploy story? (General)
Marc Clifton (mva), 3-Dec-14 5:41
Re: Worst deploy story? (Answer)
Tim Carmichael, 2-Dec-14 3:19
Re: Worst deploy story? (Answer)
Ron Anders, 2-Dec-14 3:24
Re: Worst deploy story? (General)
den2k88 (professional), 2-Dec-14 3:35
Re: Worst deploy story? (General)
PIEBALDconsult (mve), 2-Dec-14 4:48
Re: Worst deploy story? (Answer)
Caeraerie, 3-Dec-14 3:45
I've two that vie for worst ever:

First one was a server system upgrade. We were dropping in a new SAN, changing from the old switch to an iSCSI fabric and doing a major OS upgrade. All in the same night.

Why are you groaning?

My part was the (relatively) easy one; I had to make sure the databases were backed up and ready for shutdown, wait for the rest of the team to do the physical changes in the colo facility, and then initiate the upgrade/rollout to 800 thin client machines throughout the building. The first part went just fine. The server techs pulled the plug on the switch...

And that is when the magic happened.

They hadn't actually bothered to shut down any of the servers before yanking the switch. We were in a 100% virtualized environment using vSphere, and suddenly every server was basically disconnected from the control systems simultaneously. The CPUs on the servers redlined as the poor little VMs tried to figure out what had happened and fix it. Meanwhile, they were trying to dump stack traces and error logs to the database, to disk, to anywhere they could find a place... and nothing was responding. Of course, I had no idea this was happening, and the server techs were too busy trying to map a switch panel and replicate it in the fabric (I am not joking, they hadn't bothered to map this ahead of time) to notice.

Five hours later, the new fabric was in place and the team reconnected the servers. My monitoring systems were the first things back online, and they were suddenly flooded with the best part of 400 GB of error data and stack dumps. It was like getting hit with the biggest DDoS I've ever seen. Everything was screaming for attention, the fabric was actually misconfigured (although we didn't know it at the time) and was sending packets in a round-robin loop to non-existent addresses, and the databases, hit with error-log write queries, slowed to a crawl.

It took 3 hours to sort it out. Reboot everything. Flush error logs to disk. Kill db threads. You name it. And all of this masked the fabric errors. So when our lead server tech left the colo, he headed straight for vacation. Two hours later, he was somewhere in the air with his cell phone off... and we found the misconfiguration issue when people started arriving for work.

The other one started as a relatively benign little program that ran questionnaires for our call center. Basically, it was an "ask these questions and route through the top 50 most common issues" type of program. Neat little thing. Anyway, we were asked to add a half-dozen additional scenarios to it, which we did. Tested. Retested. Deployed to the test bank. All good.

Our call center was 24-hour, but was split into three "banks," and usually only one would be in operation at a time. So, we planned the upgrade to banks 1 and 2 for 1830, and bank 3 at 0600 the next morning. Because of the nature of the business, all of these upgrades were getting pushed to the machines via a central deployment script. We set it up. Sent it to the admins. They confirmed the script was ready. Everything good so far.
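
Purely as an illustration of the intended staging (the post doesn't show the real script, and the bank names, times, and helper functions here are hypothetical), the split push could look something like this in Python:

# Hypothetical sketch of the staged rollout described above; not the actual
# deployment script from the story.
from datetime import datetime

ROLLOUT_SCHEDULE = [
    # Banks 1 and 2 at 18:30, bank 3 at 06:00 the next morning.
    ("bank1", datetime(2014, 12, 1, 18, 30)),
    ("bank2", datetime(2014, 12, 1, 18, 30)),
    ("bank3", datetime(2014, 12, 2, 6, 0)),
]

def deploy_package(bank, package_path):
    """Placeholder for whatever the central deployment system actually does."""
    print(f"Pushing {package_path} to {bank}")

def run_due_deployments(package_path, now):
    # Push only to banks whose window has arrived; the admins' mistake was
    # effectively collapsing all three entries into a single 18:00 slot.
    for bank, when in ROLLOUT_SCHEDULE:
        if now >= when:
            deploy_package(bank, package_path)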

I asked my team to stay put for the 1830 rollout, because I'd had deployments go pear-shaped in the past, but save for one brand-new (less than a year's experience) developer, all of them had "other plans." I had no authority to force them to stay, so I watched them pack up and leave at 1700, leaving me and Newbie to watch the rollout. At 1805 I got a call from the call center supervisor. The system was down.

I logged on remotely to look at the problem. And then called the admin team. The dumb cluck who had set up the push schedule had put in 1800 and all three banks... And the version of the program that was sent for installation was the wrong one. The developer who had packaged it used the one on her local machine, NOT the one in source control. It was a debug version, and at least a week out of date. So, every system in the call center was locked out.
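
In hindsight, a simple pre-deployment gate would have caught both problems. As a sketch only (the manifest format, field names, and expected values below are invented for illustration, not anything from the original setup), a check like this could refuse a package that is a debug build or that wasn't built from the revision source control says should ship:

# Hypothetical pre-deploy gate; the manifest layout is assumed, not taken
# from the post.
import json
import sys

def check_package(manifest_path, expected_revision):
    """Return a list of reasons the package should not be deployed."""
    with open(manifest_path) as f:
        manifest = json.load(f)

    problems = []
    if manifest.get("configuration", "").lower() != "release":
        problems.append("build configuration is %r, not Release"
                        % manifest.get("configuration"))
    if manifest.get("source_revision") != expected_revision:
        problems.append("package built from %r, expected %r from source control"
                        % (manifest.get("source_revision"), expected_revision))
    return problems

if __name__ == "__main__":
    issues = check_package(sys.argv[1], sys.argv[2])
    if issues:
        print("Refusing to deploy:")
        for issue in issues:
            print(" - " + issue)
        sys.exit(1)
    print("Package matches the release revision; OK to push.")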

Newbie scrambled to repair the script with the source control version; I had to call the VP of operations to get permission to do an emergency deployment (yes, really), and we managed to roll out the new version with only an hour of downtime. The team meeting that next morning was not a happy one...

This one makes the list because when it happened, we had the big boss (company president) touring the call center. He was actually watching over the shoulder of one of the call center personnel when the system went down. Made for a whole new level of oops.
There are lies, there are damned lies and then there are statistics. But if you want to beat them all, you'll need to start running some aggregate queries...

Re: Worst deploy story? (Answer)
clearbrian1, 3-Dec-14 4:53
Re: Worst deploy story? (Answer)
rnbergren, 3-Dec-14 5:54
Re: Worst deploy story? (Answer)
wizardzz, 3-Dec-14 6:30
Re: Worst deploy story? (Answer)
patbob, 3-Dec-14 8:15
Re: Worst deploy story? (Answer)
StatementTerminator, 3-Dec-14 12:10
Re: Worst deploy story? (Answer)
BrainiacV, 4-Dec-14 4:05
Re: Worst deploy story? (Answer)
Ron Anders, 4-Dec-14 11:11
CPU liquid cooling experiences (General)
Richard Andrew x64 (professional), 1-Dec-14 16:20
Re: CPU liquid cooling experiences (General)
Chris Maunder (cofounder), 1-Dec-14 16:33
Re: CPU liquid cooling experiences (General)
_Maxxx_ (professional), 1-Dec-14 19:12
Re: CPU liquid cooling experiences (General)
PIEBALDconsult (mve), 1-Dec-14 18:03
Re: CPU liquid cooling experiences (General)
Nareesh1, 1-Dec-14 18:29
Re: CPU liquid cooling experiences (General)
Peter_in_2780 (professional), 1-Dec-14 19:08
Re: CPU liquid cooling experiences (General)
_Maxxx_ (professional), 1-Dec-14 19:14
Re: CPU liquid cooling experiences (General)
Nareesh1, 1-Dec-14 19:19
Re: CPU liquid cooling experiences (General)
Michael Martin (professional), 1-Dec-14 20:37
Re: CPU liquid cooling experiences (General)
Agent__007 (professional), 1-Dec-14 20:43
