|
"Oh, you guys were using metric units?"
|
|
|
|
|
agolddog wrote: "Oh, you guys were using metric units?"
Yup. I remember that one.
Marc
|
|
|
|
|
In the late '80s, I was working for a company that was running two production systems on PDP-11/73s in a manufacturing environment. We had maxed out the memory and had code that would work in test, but not in production because of the memory issues.
The first system was product routing in the plant; the second was a storage system for receipt/delivery from/to the first system.
A project was created to rewrite the first system to run on a MicroVAX; similar technology, language, etc. The rewrite took a year with a team of people.
When implemented (and there was no go back, only go forward), it was discovered the code had not been compiled with array bounds checking. It wasn't done in the old version because that would use too much memory, and it was overlooked in the new system. Production ground to a halt for January and most of February. Changes had to be described on a paper form and signed off by on-site support before being implemented.
The next year, the second system was rewritten with lessons learned. On implementation, around the clock support. Management asked when it was going to be installed; we said it was installed - two weeks ago. Much better.
I am a firm proponent of a post mortem on all projects; see what worked, what didn't and learn from it. No finger pointing, just learning.
Tim
|
|
|
|
|
Great Thread.
I love ghost stories.
|
|
|
|
|
But I dread failed Ghost stories
|
|
|
|
|
Oh, and not a deployment story, but...
There was a system I wrote that involved communication with a third-party product via a socket connection. I had a table in the database to contain all the messages to and from the socket. It quickly became rather large so I decided to trim it down a bit, something along the lines of DELETE FROM messages WHERE timestamp<amonthago in SSMS, and I sat there wondering when it would finish. Then the phone rang, it's the President of the Company, "the system is unresponsive, the call center is at a stand-still, are you doing anything that might be causing trouble?" Oops. Try terminating the DELETE, no go. Shut down my PC, reboot, get back into database, see that everything is working again.
I then wrote a feature that would delete a thousand messages then sleep and repeat until there were no more messages to delete. It took several days for the process to get the old messages cleaned out.
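For anyone who hasn't had to do this yet, the delete-a-batch-then-sleep idea can be sketched roughly as below. This is only an illustration, not the original feature: it uses Python with an in-process SQLite database so it runs anywhere, and the function name purge_old_messages, the batch_size and pause_seconds parameters, and the rowid-based subquery are all my assumptions. The messages table and timestamp column come from the story above; on SQL Server the batch would more likely be written with DELETE TOP (n).

```python
import sqlite3
import time


def purge_old_messages(conn, cutoff, batch_size=1000, pause_seconds=1.0):
    """Delete rows older than `cutoff` in small batches.

    Hypothetical helper: returns the total number of rows deleted.
    """
    total = 0
    while True:
        # Delete at most `batch_size` matching rows in one short transaction.
        cur = conn.execute(
            "DELETE FROM messages WHERE rowid IN ("
            " SELECT rowid FROM messages WHERE timestamp < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # release locks before the next batch
        if cur.rowcount == 0:
            return total  # nothing left to delete
        total += cur.rowcount
        time.sleep(pause_seconds)  # let other sessions get some work done
```

The point of the small batches is that each transaction holds its locks only briefly, so the call center keeps working while the cleanup grinds along in the background.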
|
|
|
|
|
I've two that vie for worst ever:
First one was a server system upgrade. We were dropping in a new SAN, changing from the old switch to an iSCSI fabric and doing a major OS upgrade. All in the same night.
Why are you groaning?
My part was the (relatively) easy one; I had to make sure the databases were backed up and ready for shut down, wait for the rest of the team to do the physical changes in the colo facility, and then initiate the upgrade/rollout to 800 thin client machines throughout the building. The first part went just fine. The server techs pulled the plug on the switch...
And that is when the magic happened.
They hadn't actually bothered to shut down any of the servers before yanking the switch. We were in a 100% virtualized environment using vSphere and suddenly every server was basically disconnected simultaneously from the control systems. Our CPU on the servers red lined as the poor little VM's tried to figure out what had happened and fix it. Meanwhile, they're trying to dump stack traces and error logs to the database, to disk, to anywhere they can find a place... and nothing is responding. Of course, I have no idea this is happening, and the server techs are too busy trying to map a switch panel and replicate it in the fabric (I am not joking, they hadn't bothered to map this ahead of time) to notice.
Five hours later, the new fabric is in place and the team reconnects the servers. My monitoring systems are the first things back online and they suddenly flood with the best part of 400 GB of error data and stack dumps. It was like getting hit with the biggest DDoS I've ever seen. Everything was screaming for attention, the fabric was actually misconfigured (although we didn't know at the time) and was sending packets in a round-robin loop to non-existent addresses, and the databases, hit with error log write queries, slowed to a crawl.
It took 3 hours to sort it out. Reboot everything. Flush error logs to disk. Kill db threads. You name it. And all of this masked the fabric errors. So when our lead server tech left the colo, he headed straight for vacation. Two hours later, he was somewhere in the air with his cell phone off... and we found the misconfiguration issue when people started arriving for work.
The other one started as a relatively benign little program that ran questionnaires for our call center. Basically, it was an "ask these questions and route through the top 50 most common issues" type of program. Neat little thing. Anyway, we were asked to add a half-dozen additional scenarios to it, which we did. Tested. Retested. Deployed to the test bank. All good.
Our call center was 24-hour, but was split into three "banks," and usually only one would be in operation at a time. So, we planned the upgrade to banks 1 and 2 for 1830, and bank 3 at 0600 the next morning. Because of the nature of the business, all of these upgrades were getting pushed to the machines via central deployment script. We set it up. Sent it to the admins. They confirmed the script was ready. Everything good so far.
I asked my team to stay put for the 1830 rollout, because I've had deployments go pear-shaped in the past, but save for one brand-new (less than one year experience) developer, all of them had "other plans." I had no authority to force them to stay, so I watched them pack up and leave at 1700, leaving me and Newbie to watch the rollout. At 1805 I got a call from the call center supervisor. The system was down.
I logged on remotely to look at the problem. And then called the admin team. The dumb cluck that had set up the push schedule had put in 1800 and all three banks... And the version of the program that was sent for installation was the wrong one. The developer that had packaged it used the one on her local machine, NOT the one in source control. It was a debug version, and at least a week out of date. So, every system in the call center was locked out.
Newbie scrambled to repair the script with the source control version; I had to call the VP of operations to get permission to do an emergency deployment (yes, really), and we managed to roll out the new version with only an hour of downtime. The team meeting that next morning was not a happy one...
This one makes the list because when it happened, we had the big boss (company president) touring the call center. He was actually watching over the shoulder of one of the call center personnel when the system went down. Made for a whole new level of oops.
There are lies, there are damned lies and then there are statistics. But if you want to beat them all, you'll need to start running some aggregate queries...
|
|
|
|
|
My friend was on work experience from college with one of Ireland's largest banks about a decade ago.
She was on night shift and was tired and deploying something and mistyped and accidentally shut down every ATM in the country.
Had to wait ages for them all to come back online again.
|
|
|
|
|
Wasn't me. Thank goodness.
But at an organization I used to work at, a co-worker upgraded a SAN storage system. On a Tuesday. Afternoon.
Everything crashed.
Had he verified the backups? Nope. No backup. Everyone lost 3 months' worth of work.
Two weeks later the SAN was finally up.
Not sure whatever became of co-worker. Never saw him again.
To err is human; to really mess up you need a computer.
|
|
|
|
|
I do a lot of our deployments. I have a rule. Never trust anything built after 5pm, or after 3pm on Fridays.
|
|
|
|
|
Got a phone call in the middle of the night once -- system crashed, and since my tools had been used to convert it to the new OS, my name was on the on-call list. Unbeknownst to me, I was in the middle of an appendicitis attack in my sleep. I can't really describe how awful I was feeling while trying to help them with a problem that I had no clue about (turned out to be buggy filesystem code).
We can program with only 1's, but if all you've got are zeros, you've got nothing.
|
|
|
|
|
A few years back a company I worked for decided to develop an in-house system to handle debit card transactions directly with MasterCard instead of going through a middle-man.
Luckily, I wasn't on that team, but I worked in the same room and worked with those programmers on other projects, so I had a really good view of the whole thing, from development to deployment.
It turned out that certain key information about how data needed to be passed back-and-forth was never relayed to the programmers, who ended up coding to some slightly fishy specs.
It went live, and all debit card transactions started going through the new system. At that point there was no going back. And it blew the $%#@ up! Suddenly people all over the country started having their cards declined, and these cards were being used for prescription medications! That's a level of pissed-off you don't even want to know about.
Our clients started getting massive amounts of angry calls from their customers, and they tore into us. Those poor clients were caught in the middle between their irate customers and us, the cause of the problem, with nothing they could do except take angry phone calls all day and wait for us to fix it.
Once the programmers figured out what was wrong they were able to fix the system, but it took months to clean up the data mess and get everything sorted out. A lot of clients dropped us over it. It was nasty, by far the worst SNAFU I've ever seen. And I still thank my lucky stars that I wasn't on that team!
|
|
|
|
|
Gather 'round kiddies while I spin the tale of the deployment from hell that I had nothing to do with, except as an amused, detached viewer.
The company I worked for published tax law books. They also published tax preparation software. Initially they had a mainframe version that was their big cash cow, but the advent of PC's had cut terribly into that business and so they started creating versions for the PC. They did versions for the US, Canada, and Australia.
The time, early 90's. The Australian tax season is in full swing. At a company that usually sacrifices a goat on the altar of schedules, daily, the deployment of the new version is late, but they get it out the door.
But there is one teensy-weensy problem. All the code was developed on 386s, when most of Australia's accountants had 286-based machines.
In their rush to get the product out the door, they never bothered to test it on a 286-based machine. If they had, they would have found performance was glacial.
The collective scream from Australian accountants was heard all the way to the corporate headquarters in the US.
The new VP of Technology was seen walking through the US offices asking which programmers were on staff that had 1) a valid passport, and 2) their vaccinations up to date. Anyone answering in the affirmative was on a plane the next day.
The ultimate solution ended up that the company gave the accountants that screamed the loudest, new 386 machines to do their work on. The agreement was that they could keep their shiny new machines, provided they promised to continue buying the future releases of the tax software.
Needless to say, there weren't any profits from that division that year and more than a few heads rolled.
Psychosis at 10
Film at 11
Those who do not remember the past, are doomed to repeat it.
Those who do not remember the past, cannot build upon it.
|
|
|
|
|
Ok you code monkeys.
Monday's CC batch didn't show up in our account ($100.00)
But all batches for Tuesday on have shown up without incident.
A call to our bank (1st Bank) revealed that Monday a new CC processor was switched to and there was a problem so all deposits for Monday won't land in accounts until Friday (Tomorrow). Needless to say, $100.00 isn't a lot to go without but the lady at the bank was thankful to us that we were so nice about it. We said, "well it's only 100 bucks, I'll bet there were many more with much more to lose" - The lady at the bank said she has been fielding irate calls all week from customers mad as #$%^.
Ok what happened? Who did it?
Let's hear all about it.
|
|
|
|
|
I'm trying to decide whether to go with liquid cooling for my next rig.
Does anyone here have any experiences good or bad with liquid cooling that they could share?
The difficult we do right away...
...the impossible takes slightly longer.
|
|
|
|
|
Richard Andrew x64 wrote: Does anyone here have any experiences good or bad with liquid cooling that they could share?
I'm assuming you're talking about boxes, not "It was a really hot day and I had this 6-pack chilling, and then a mate come over with a whole slab of beer, and then we decided to head out...[mumble mumble] ...and we never did find out what happened to the Penguin" kind of story.
cheers
Chris Maunder
|
|
|
|
|
MM would be the best source for tales of that ilk!
PooperPig - Coming Soon
|
|
|
|
|
It may be apocryphal, but I once heard of a liquid-cooled system (room-sized mainframe you understand) that shut down every Tuesday afternoon. When the gardener turned on the hose to water the plants.
|
|
|
|
|
Positioned cleverly, the TOWER, air vent in front of chair there to relieve went inside instead of toilet a stream, of liquid coolant at 36 degrees, not even leave, game, for a minute
|
|
|
|
|
Don't know about anyone else, but I find this unfunny, bordering on offensive. Certainly not a constructive contribution to the discussion.
Peter
Software rusts. Simon Stephenson, ca 1994. So does this signature. me, 2012
|
|
|
|
|
Nah - he's just taking the piss
PooperPig - Coming Soon
|
|
|
|
|
Pooper Pig is unfunny, may be offensive to muslims. Poop may be offensive if some is eating while to browse PLEASE REMOVE
|
|
|
|
|
Nareesh1 wrote: Pooper Pig is unfunny, may be offensive to muslims. Poop may be offensive if some is eating while to browse PLEASE REMOVE
And I think you're really a Joo.
You are circumcised, have a stupid beard and don't like ham, pork or bacon.
Maybe you are a member of the Judean People's Front.
Michael Martin
Australia
"I controlled my laughter and simple said "No,I am very busy,so I can't write any code for you". The moment they heard this all the smiling face turned into a sad looking face and one of them farted. So I had to leave the place as soon as possible."
- Mr.Prakash One Fine Saturday. 24/04/2004
|
|
|
|
|
No, I think he is a member of People's Front of Judea!
Your time will come, if you let it be right.
|
|
|
|
|
I thought he'd be part of the Judean Popular People's Front
|
|
|
|
|