I regularly read that five percent of programmers are one hundred percent more productive than the remaining ninety-five percent. I am reminded of a line in the movie Little Buddha about stringing a sitar: “If the string is too tight, it will break. If it is too loose, it will not make any sound.” I am an adequate programmer; a hack who gets the job done. I would like to share with you the lessons I’ve learned to cope with the stress of troubleshooting, and hopefully keep your sitar playing a fine melody.
Over the years, I have experienced a personal journey from troubleshooting anxiety towards troubleshooting anticipation. System failures and deployment of new applications have been the two most stressful parts of my job. For a while, I would dread the phone, particularly when releasing a new product into production. Inspiration can come from the most unexpected sources. One night while watching a PBS special on the U.S. Coast Guard, I was inspired by how they handled dangerous situations. They worked together rescuing people, making difficult life-altering decisions and putting themselves in harm's way. I realized that is what I wanted to do with my life. From that point on, when the phone rang, I saw it as a training opportunity for facing difficult situations. I was given the opportunity to improve my skills, work under pressure and make tough choices, all preparing me for work in the Coast Guard. It turns out I am too old to work full time for the Guard, but the preparation led me to the place where I can answer the phone with anticipation instead of anxiety.
I realize that I do not always know the answers, but I continue learning skills that help me work through deployment and production problems: skills that help me face the pressure caused by a system failure, a failed software install or application updates that do not work. Over the last two years, I have compiled a list to work through when I am in a crisis and my initial ideas have failed:
- Check your assumptions.
- Check the obvious.
- It may not be my fault.
- I cannot do this alone.
- Be willing to make mistakes.
- Will this fix the problem?
- Let the machines do the work.
- One definition of insanity.
Check Your Assumptions
The first rule of troubleshooting is to check your assumptions. Ask yourself, “What environment is the application expecting? What application dependencies have I assumed are in this new environment?” Many times I have spent hours troubleshooting an installation, only to discover that the problem was caused by a difference in the software environment.
Here are a few of the reasons I have experienced software failure: VBScript version differences, MDAC version differences, libraries not found, locked-down Oracle files and registry entries, and network communication problems. None of these were the application’s fault, so spending hours looking at working code would never have uncovered the real problem.
I recommend making a list of what you expect: hardware, software and versions. This seems tedious, especially when the feeling of “I just want it to work” is all over you. But it pays off.
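Such a list can also be written down in a form a machine can check. The following is only a minimal sketch in Python, not the method I used at the time: the function names, the `tool:` key prefix and the expected values (Windows, Python 3.11, the `sqlplus` command) are illustrative assumptions, and a real list would name whatever your own application depends on.

```python
import platform
import shutil
import sys


def gather_actual(tools):
    """Collect the facts about this machine that the checklist cares about."""
    actual = {
        "os": platform.system(),
        "python": "%d.%d" % sys.version_info[:2],
    }
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        actual["tool:" + tool] = "present" if shutil.which(tool) else "missing"
    return actual


def compare(expected, actual):
    """Return one message per expectation the environment does not meet."""
    problems = []
    for name, want in expected.items():
        have = actual.get(name, "unknown")
        if have != want:
            problems.append("%s: expected %s, found %s" % (name, want, have))
    return problems


if __name__ == "__main__":
    # Hypothetical expectations, written down before the install begins.
    expected = {"os": "Windows", "python": "3.11", "tool:sqlplus": "present"}
    actual = gather_actual(["sqlplus"])
    for line in compare(expected, actual):
        print(line)
```

An empty result means the environment matches the list; anything printed is a place to look before reading a single line of application code.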
A recent example of applying this troubleshooting method in a non-software environment illustrates the approach. A few months back, I had to repair a clothes dryer. Based on a comment from a service man, I was pretty certain the problem was the motor. So I went to the parts store and purchased a replacement motor.
I took the entire dryer apart to install the new motor. I had the barrel in the family room, the door and front panel leaning against the wall, and the skinny little belt that turns the drum in the trash (I had purchased a new one).
I put the new motor in, put the dryer back together and plugged it into the electrical outlet. As my stomach tightens, I press the ‘On’ button and the dryer does not work! After all the work…crud…what could have gone wrong? I remember patience: don’t panic, check your assumptions. Check the environment. Is the dryer plugged in? Is power actually coming to the outlet? I bring a table lamp into the room with the dryer, plug it in, and the light does not turn on. I check the fuse panel and find a blown fuse. When the motor died, it took the fuse with it! I replace the fuse, feel my stomach tighten as I press the ‘On’ button…voilà, the dryer starts.
Now I could have taken the dryer apart, thinking I didn’t install the motor correctly. I could have checked all the connections for the wiring on the motor. I could have assumed I am just not good at this. But by checking my assumptions, I bypassed all that extra, wasted work and found the source of the problem.
The moral is that we do not work in a vacuum and neither do our applications. Remember to check the environment and your assumptions.
Check the Obvious
Sometimes the answer is staring me in the face, but I want affirmation before I attempt the solution. Well, when it’s staring me in the face, I don’t need affirmation; I need to do it. I’m not sure why this is an issue for me, but I want someone from the outside to affirm what I’m thinking before I take action. Writing down what I’m thinking literally puts the answer in front of my face and reinforces my resolve to act on the solution myself.
It May Not Be My Fault
I have learned over the years that it may not be my fault. When I remember this, it takes a great weight off my shoulders. Countless times production systems fail because a network engineer has changed routing configurations, because there is an IP conflict or because someone has changed a database. None of these things are within my control. By communicating with the responsible person, I have done my job. It is now their job to take the appropriate action.
My responsibility in these situations is to make sure I am not merely assuming someone else changed something. I take the time to verify my assumptions against the list of expectations from item number one, “Check your assumptions”. I test things like: can the web server talk to the database server, and can I ping the application server? I can assist the person responsible for the real problem by making sure it is their problem, and by providing them with all the diagnostic data I have available.
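A check like “can the web server talk to the database server” takes only a few lines to script. This is a minimal sketch in Python, assuming a plain TCP connection attempt is a good enough test; the function names are mine, and any host names or ports you pass in are your own, not something from the systems described here. Note that it tests a TCP port rather than sending an ICMP ping.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def connectivity_report(targets):
    """Map each named (host, port) target to 'reachable' or 'unreachable'."""
    return {
        name: "reachable" if can_connect(host, port) else "unreachable"
        for name, (host, port) in targets.items()
    }
```

Run from the web server with the database host and port as a target, the report is exactly the kind of diagnostic data worth handing to the person who owns the network.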
I Cannot Do This Alone
No matter how righteous I act, I realize that I do not have all the answers. Years ago after a long walk, when struggling with a Visual Basic problem, I admitted, “I can’t do this … alone.” From that point forward, I found it easier to ask for help. As long as I have made an honest effort to resolve the problem first and the answer still eludes me, I ask for help.
There are several ways to do this: I ask a coworker with more experience for help, brainstorm with a colleague, search a newsgroup, post to a newsgroup and, finally, call technical support, even if it costs money.
I am still impressed by how often, in the course of discussing a frustrating problem with another person, I discover the answer myself. Hearing something aloud or from another person makes a connection in my brain that I cannot make when sitting quietly, thinking.
groups.google.com is a search engine for every newsgroup under the sun. Many times during the day, I am at this site looking for an answer. Eighty percent of the time the same problem has already been discussed; it is just up to me to unearth the recorded discussion. If the answer is not found, I post a question to the relevant newsgroup. Forty percent of the time someone responds with something useful.
Another recent troubleshooting example involved support of a web application. It would not work with Internet Explorer 6 Service Pack 1 (SP1) and I had no idea why. I talked the problem over with our network engineer, without success. I found newsgroup postings affirming there was an issue with SP1, but they gave no solution for fixing the problem. The SP1 release documentation also proved useless in this area. Finally, with encouragement from a coworker, I opened a support ticket with Microsoft. It turns out that SP1 enforced the W3C HTTP header specification to a greater degree. This included expecting the header to indicate content-length. By changing my code to specify content-length, the problem was resolved.
In this particular case, I would never have found the solution without Microsoft’s help. Because they never specified the change in header enforcement in the SP1 release documentation, Microsoft agreed not to charge for the incident.
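To show what specifying content-length means in practice, here is a minimal sketch in Python. It is not the code from the application in this story; `build_response` is a hypothetical helper that assembles a raw HTTP response by hand. The point it illustrates is that the Content-Length value is the number of bytes in the encoded body, not the number of characters.

```python
def build_response(body_text, content_type="text/html; charset=utf-8"):
    """Build a raw HTTP/1.1 response whose Content-Length matches the body."""
    # Encode first: the header must count bytes, not characters.
    body = body_text.encode("utf-8")
    header_lines = [
        "HTTP/1.1 200 OK",
        "Content-Type: " + content_type,
        "Content-Length: " + str(len(body)),
        "",  # blank line that ends the header section
        "",
    ]
    return "\r\n".join(header_lines).encode("ascii") + body
```

Most web frameworks set this header for you, so a sketch like this matters mainly when the response is written by hand or streamed.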
Be Willing to Make Mistakes
“Even a mistake may turn out to be the one thing necessary to a worthwhile achievement.” Henry Ford
Fear of making a mistake can be paralyzing. I am not sure if it is perfectionism or fear of failure, but this fear can prevent me from taking a simple step. I have a question to help me get past this paralysis: “How many mistakes am I allowed today?” This question gets me over the hump. I make mistakes. I do not purposely try to make them, yet I know they will happen, and when they do, the earth keeps spinning and my children keep breathing. I am OK after they happen.
The next step in the journey is remembering to ask “How many mistakes am I allowed today?” before I start something new.
After making a mistake, fix it. In 1998, I upgraded my development computer from Windows 95 to Windows NT4. I was not sure of what I was doing and made a mistake during the install. This caused issues with security on the computer. After the systems engineer could not fix the problem, I continued limping along with this broken system. One day while having a conversation with two systems engineers, one pointed out I screwed up the install. I replied, “I am allowed to make mistakes.” He replied, “You need to fix your mistakes.” He was right; I wiped the drive, reinstalled NT properly and learned another lesson.
Will This Fix the Problem?
After some frustration with a problem, I find myself going down paths that do not contribute to the solution. I suspect I do this to demonstrate that I can make something work. This demonstration does nothing to solve the problem and can be a lengthy, unproductive diversion. When I become aware that I am going down the diversion road, the question “Will this help fix the problem?” brings me back to the right path.
At this point I need to acknowledge that taking a break from a problem can prove fruitful. When I return to the problem, my mind has had time to process. When taking a break I choose to do something else that is productive or beneficial. Sometimes I make the choice to do something I am pretty certain will succeed. This helps with my need to prove that I can make something work, and it contributes to another goal at work.
Let the Machines Do the Work
Let the machines and tools do the work. When debugging, I can find myself stepping through difficult and lengthy code. Many Integrated Development Environments (IDEs) offer great debugging features, but I have to remember to take the time to use them. It is easy to get mentally lazy: set some breakpoints, just keep hitting F10, and watch the values of the variables change. Often I am not currently interested in half the breakpoints the debugger stops at. Many IDEs make it possible to disable breakpoints without going to each line of code. Visual C++, for example, provides a breakpoints dialog. It is simple to uncheck the breakpoints I do not need, knowing I can use the same dialog to enable them again later. This is one way that a little mental alertness allows me to tell the machine to do the work.
Captain Kirk, in a Star Trek book, referred to computers as brain amplifiers. He went on to remark that they are not brain replacements. Being alert, I can direct the machine to operate more effectively. This small mental investment saves energy down the road.
One Definition of Insanity
Finally, one definition of insanity is "Doing the same thing again and again, expecting different results". For some reason, software development exposes this type of insane behavior. Click on that button again, type these commands again, run it again and maybe it will work this time.
STOP! Save the energy. If it did not work the second time, it is not going to work the tenth time. Count on it.
STOP. Take a break when in an obsessive-compulsive loop. Put down the mouse and let the brain engage. This break allows patience to have a chance to work on the problem.
STOP trying too hard.
We have looked at the eight items on my troubleshooting checklist. These are items I use when facing a production system failure or difficulty deploying production software, and they allow me to handle a point of ignorance in a crisis with a calm, common-sense approach. Hopefully you found something useful in my experiences, something that clicked internally and will aid you in a tough situation. The next time you find yourself saying, "It works on my machine," may be the next time you recollect what you read here. Hopefully it assists you as much as it helps me.