|Title||Software Project Secrets: Why Software Projects Fail|
- Why Software is Different
- Project Management Assumptions
- Case Study: The Billing System Project
- The New Agile methodologies
- Budgeting Agile Projects
- Case Study: The Billing System Revisited
Chapter 1: Introduction
Your boss has asked you to oversee the development of a new billing system, and you've brought together a capable project manager and a group of handpicked developers. They've chosen state-of-the-art technologies and tools to build the system. The business analyst has talked at length with the accounting manager, and has written up a detailed set of requirements. The project has everything it needs to be a success—doesn't it?
Apparently not. Six months later the project is already late and over budget. The developers have been working overtime for weeks, and one has already quit, but despite this the software never seems to get any closer to completion. Part of the problem is that the accounting team keeps claiming that the software doesn't do what they need, and they have pushed through a steady stream of "essential" change requests, not to mention a flood of bug reports. Your boss will be furious when she hears about this.
So what went wrong?
Whatever it is, it must be something that most companies get wrong. According to Standish Group research, only 28 percent of software projects in 2000 succeeded outright (see Figure 1-1). Some 23 percent were canceled, and the remainder were substantially late (by 63 percent on average), over budget (by 45 percent), lacking features (by 33 percent), or, very often, all of those issues combined.
Figure 1-1. The success and failure of software projects in 2000.
At New Zealand's Ministry of Justice, the new $42 million Case Management System was $8 million over budget and over a year late when it was rolled out in 2003. Of the 27 benefits expected from the system, only 16 have been realized. Instead of boosting productivity, the system has actually increased the time needed to manage court cases by doubling the amount of data entry. A post-implementation review identified over 1,400 outstanding issues. But "the only challenges faced by the developers were those common to large and complex systems" [Bell 2004].
In contrast, things look very different in the engineering and construction industry. According to the Engineering News-Record, 94 percent of the project customers they queried were satisfied with the results of their projects, which suggests that construction projects have much lower failure rates than software projects. That's why the collapse of the tube-shaped roof in the newly constructed terminal 2E at Charles de Gaulle airport (Paris) in May 2004 made front-page news around the world: it was so unusual. Failed software projects are far too common to merit such attention.
We can learn why by looking at commercial and noncommercial software development. Commercial software is produced by companies for profit. Some software is custom written for individual clients, such as your billing system, but there are also generic "off-the-shelf" products like Microsoft Word. Virtually all of these are created within a project, or within a series of projects. Noncommercial software is very often open source, which means that anyone can read its source code. Users can find out how it works, and make changes to fix bugs and add the features they want. With open source software, developers from around the world work together on software that has no fixed feature list, budget, or deadline. Open source developers coordinate their efforts in ways that are quite different from traditional project management.
Open source software is a huge success. "The Internet runs on open source software (BIND, Sendmail, Apache, Perl)," says Tim O'Reilly, CEO of O'Reilly & Associates, one of the largest publishers of computer books. Open source software generally has far fewer reliability issues or bugs than commercial software. But is it a success by the same criteria we use to measure commercial projects? After all, with unlimited time, wouldn't every project succeed?
It's true that unlimited time can compensate for poor productivity. However, the productivity of open source developers is legendary. In 1991, Linus Torvalds wrote a complete, stable operating system kernel (Linux) in less than a year, substantially on his own at that stage. And less than a year after eight core contributors came together to form the Apache Group, they had made Apache 1.0 so compelling a piece of software that it became the most widely used web server on the Internet.
These successes suggest that software development can work very well outside traditional project management. This is perplexing, considering that project management techniques work well in most other areas. We have seen that this is true for construction and engineering. There must be something quite different about software development that makes project management fail.
The next chapter will begin the analysis by identifying the characteristics of software, and of the software development process, that make them unique. These characteristics will then be compared against project management's best practices to discover where the process of project management breaks down for software development. The first part of the book closes with a simulated case study that shows how these problems can cause an otherwise promising project to fail.
These chapters describe the problems in software development in some detail. This may seem discouraging, but don't abandon hope just yet. Identifying the source of a problem is the first step towards finding a solution. The second part of the book focuses on strategies that can help to bring software projects to a successful conclusion. It begins by surveying three popular and promising new software development methodologies. It then considers ways to reconcile these methodologies with project management. Finally, the case study from Part One is reworked to show how the same project could have succeeded by using the new techniques.
Chapter 4: Case Study: The Billing System Project
In the previous chapter, we performed an in-depth analysis of project management to discover where it breaks down for software development. This chapter covers the same issues, but from a different perspective. It introduces a fictional case study (the same scenario as the one that begins the Introduction) to illustrate how the ten hidden assumptions come into play, and how they lead to the problems that cause projects to fail. At the end of the chapter, we will consider what impact each of the assumptions had.
The case study is not necessarily a typical project, since many aspects have been simplified for reasons of clarity and space, but it is by no means unusual. Each of the issues in the case study has occurred in at least one real-life project that the author has been involved with.
This is the last chapter in Part One. In Part Two, we'll try to find solutions for these issues, building on the ideas that we have already discussed. Part Two will finish with another case study that reworks this scenario to show how it could have succeeded had the project been managed differently.
Acme Inc.—a medium-sized toy manufacturer—has seen its stock price slide significantly over ongoing losses from its expansion program. Each department has been asked to cut its costs by 10 percent to help profitability and reassure investors. Karen, the accounting manager, has come up with the idea of integrating the various financial applications that are used by her team, so that the data would be entered only once into a new master application, which would then automatically copy it into the other applications. Her department could shed three full-time data-entry roles by eliminating multiple entry of the same data.
Karen went to see her boss Salim, who, as the chief financial officer, had the authority to approve or reject the project. He liked the idea, but urged caution: "Remember, we're trying to save money, so we've got to keep the cost of the project down. Company policy says that any new investment has to pay for itself within three years. Given the position that the company's in right now, I'd like to see payback well before then. See what you can do."
Salim contacted Acme's preferred employment agency, People Co., to hire an experienced business analyst as a contractor for two weeks to scope and estimate the project. Brian came on board a week later, and immediately set up a series of meetings with Karen to go over the requirements. By the end of the two weeks, he had completed a thick functional specification document. He had also come up with an initial estimate of $300,000 for the whole project, including $7,500 for his work so far.
This figure relieved both Karen and Salim, as the expected savings were around $150,000 per year, so the investment would be fully recouped within two years. The project was given the go-ahead to begin planning.
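The payback reasoning above can be checked with a minimal Python sketch. It assumes a simple, undiscounted payback calculation; the function and variable names are hypothetical, and only the dollar figures and the three-year policy come from the case study.

```python
def payback_years(investment: float, annual_savings: float) -> float:
    """Simple (undiscounted) payback period in years."""
    if annual_savings <= 0:
        raise ValueError("annual savings must be positive")
    return investment / annual_savings


estimate = 300_000         # Brian's initial estimate for the whole project
annual_savings = 150_000   # expected savings per year
policy_limit_years = 3     # company policy quoted by Salim

years = payback_years(estimate, annual_savings)
meets_policy = years <= policy_limit_years

print(f"Payback period: {years:.1f} years")
print(f"Within the three-year policy: {meets_policy}")
```

Note how little margin this leaves: on these figures, any overrun on the estimate lengthens the payback period in direct proportion.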
While Acme has outsourced all of its IT needs, it does still have a number of capable and experienced operational project managers on its staff. After chatting to the operations manager, Salim found out that one of his subordinates—Phil—happened to have some free time over the coming months, and that he could certainly take responsibility for the new project.
Phil spent a week going over the estimates and scope that Brian had written up, and in his project plan he organized the duration, resource, and cost estimates (Table 4-1).
Table 4-1. Duration, Resource, and Cost Estimates for the Project's Activities
|Activity||Resources||Assigned to||Duration||Cost|
| ||Key business owner||Karen||N/A||N/A|
|Requirements||1 business analyst||Brian||½ month||$7,500|
|Design||1 software architect||Angela||1 month||$20,800|
|Construction||4 developers||Reiko, Tim, Hua, Mike||2½ months||$138,600|
|System Testing||1 tester||Ian||½ month||$6,100|
|User Testing||1 end user||Emily||½ month||$4,300|
|Rework||4 developers||(as above)||½ month||$27,700|
|Project Management||½ project manager||Phil||4 months||$34,700|
He divided the remainder of the project into Design, Construction, and Testing/Debug phases (Figure 4-1).
Figure 4-1. The overall project plan.
Phil also considered which risks were most likely to affect the project (Table 4-2). He followed the common practice of multiplying together the probability and impact of each risk to obtain a figure for how much contingency was needed. The impact of sickness could be ignored since Acme didn't pay for the contractors' sick leave, and because he thought that one end user could easily be substituted for another. After adding 10 percent contingency for unknown risks, Phil ended up with what he thought was a generous contingency reserve of 25 percent for the project as a whole.
Table 4-2. The Risk Register
|Billing System Project Risks||Probability||Impact||Contingency|
|Changes to requirements.||20%||25%||5%|
|Problems integrating with existing systems.||25%||20%||5%|
|Developers not as competent as expected.||10%||20%||2%|
|The system will be more buggy than expected.||30%||10%||3%|
|Sickness will delay the project.||10%||0%||0%|
|Unknown risks will arise.|| || ||10%|
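The arithmetic behind Table 4-2 can be reproduced with a short Python sketch. The dictionary layout and the function name are illustrative assumptions; the percentages come from the table, and the 10 percent allowance for unknown risks comes from the text.

```python
# (probability %, impact %) for each identified risk, as listed in Table 4-2
risks = {
    "Changes to requirements": (20, 25),
    "Problems integrating with existing systems": (25, 20),
    "Developers not as competent as expected": (10, 20),
    "System more buggy than expected": (30, 10),
    "Sickness delays the project": (10, 0),
}

UNKNOWN_RISK_ALLOWANCE = 10  # flat percentage added for unknown risks


def contingency_percent(probability: float, impact: float) -> float:
    """Contingency for one risk: probability times impact, in percent."""
    return probability * impact / 100


per_risk = {name: contingency_percent(p, i) for name, (p, i) in risks.items()}
identified_total = sum(per_risk.values())
total_contingency = identified_total + UNKNOWN_RISK_ALLOWANCE

for name, value in per_risk.items():
    print(f"{name}: {value:.0f}%")
print(f"Identified risks: {identified_total:.0f}%")
print(f"Total contingency reserve: {total_contingency:.0f}%")
```

The identified risks contribute 15 percent in total, so the flat allowance for unknown risks accounts for two fifths of the 25 percent reserve.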
The final estimates for the project's cost and duration were $299,600 and five months respectively, and, as the formal sponsor for the project, Salim was happy to sign off on it.
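As a rough cross-check, the final figures appear consistent with the activity costs in Table 4-1 plus the 25 percent contingency reserve. The sketch below makes two assumptions that the text does not state: that the total was rounded to the nearest hundred dollars, and that the contingency was applied to the four-month duration shown for project management. The variable names are hypothetical.

```python
# Activity costs in dollars, taken from Table 4-1
activity_costs = {
    "Requirements": 7_500,
    "Design": 20_800,
    "Construction": 138_600,
    "System Testing": 6_100,
    "User Testing": 4_300,
    "Rework": 27_700,
    "Project Management": 34_700,
}

CONTINGENCY_PERCENT = 25   # reserve from the risk register
PLANNED_MONTHS = 4         # assumption: duration of the project management row

base_cost = sum(activity_costs.values())
# Integer arithmetic keeps the dollar figures exact.
cost_with_contingency = base_cost * (100 + CONTINGENCY_PERCENT) // 100
rounded_cost = round(cost_with_contingency, -2)  # assumption: nearest $100
months_with_contingency = PLANNED_MONTHS * (100 + CONTINGENCY_PERCENT) / 100

print(f"Base cost: ${base_cost:,}")
print(f"With contingency: ${cost_with_contingency:,}")
print(f"Rounded: ${rounded_cost:,}")
print(f"Duration with contingency: {months_with_contingency:g} months")
```

On these assumptions the base cost is $239,700, and the contingency brings the total to just under Brian's initial $300,000 estimate.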
People Co. quickly found an experienced software architect, and Angela became the first contractor on the team. Her brief was to write a technical specification document that included both a high-level architecture and detailed design work. She soon decided that Microsoft .NET web services would be the best technology to connect the various accounting applications, and began drawing UML diagrams to show what the solution would look like.
She thought it a bit strange that she wasn't allowed to create a prototype or write any test code, but Phil had been very clear about this at their first meeting. "I'm sorry, but you're just too expensive to write the software. This project is under strict financial constraints. There's a lot of code to write, and we want it done at $80 an hour—not $120."
Angela knew that this arrangement wasn't a good idea, but this contract was only for a month, and it wasn't worthwhile making a fuss about it. Besides, it wouldn't be her problem when things went awry.
As soon as Angela had decided on the basic technology, Phil went back to People Co. to look for developers with the corresponding skills. He didn't want novices who couldn't be trusted to deliver the results, but he didn't want anyone too expensive either. People Co. was able to find four intermediate-level developers who claimed familiarity with .NET and web services: Reiko, Tim, Hua, and Mike.
On Monday morning, the four developers turned up to find that they had been given spare offices in various parts of the Acme building, one of which had just been vacated by Angela. They quickly got busy installing their development software and reading the two thick specification documents that Phil had given them.
"We'll have team meetings every two weeks," he said to them, "but in the meantime, if you encounter any issues or problems, then don't hesitate to come and talk to me about them. My door is always open."
Over coffee that afternoon, the developers decided to divide the work into four big chunks: the user interface and business logic, the database, the web service interfaces, and the infrastructure. Hua had worked on a couple of big database projects, so she volunteered to look after the database access functionality. Reiko took over the user interface and business logic, Tim got the web service interfaces, and Mike was left with the infrastructure. They decided to work individually for two months, and then spend the last two weeks bringing all the pieces together.
Before long, Reiko discovered that the functional specification didn't actually describe how the new application's screens should be laid out. She asked Phil about this.
"Brian said that these requirements would be all that you'd need to build the user interface," was his response. "Why don't you just put together something reasonable, and then update the functional specification to document what you've done?"
Reiko had hoped that she could get away with just writing the code, but she accepted in good grace. When she realized that the error messages hadn't been specified either, she didn't bother to ask Phil, but just made them up herself, and then added them to the functional specification too.
On the other side of the building, Tim was having real difficulties. He had worked on a web services project before, so he was comfortable accepting responsibility for the web services interfaces. However, some of the accounting applications had peculiar requirements for the format of their data, and the interfaces that the .NET tools were creating just didn't work.
He discovered that, rather than relying on his tools to create the interfaces, he would have to create them by hand from technologies he knew nothing about. However, he knew that books about these technologies were easy to find, so he kept quiet and hoped that he could learn enough to get it all done in time.
As Mike started work on the program's internal infrastructure, he realized that the design, although elegant, needed some refinement to allow it to do everything that was needed. In fact, it really could do with a substantial makeover. However, when he mentioned the problem to Phil, his response wasn't very encouraging:
"Angela came highly recommended as a software architect, and I don't want you to change her design any more than is absolutely necessary. I also want you to document precisely what changes you do make. The technical specification is the documentation for this software, and I want it to be complete and accurate at all times."
So instead of the redesign that Mike thought was necessary, he was reduced to putting in a series of quick and ugly fixes for all of the functionality that was missing in the infrastructure design.
At the project meetings, each developer reported steady progress, with another 25 percent completed every two weeks. "Well, it looks like we're staying on track," said Phil.
After two months, the team got back together to integrate their code. Going around the table, they found that each person's work was pretty much complete. Reiko spoke for everyone when she said, "My code is still a bit rough around the edges. It should work fine, but the error handling, for example, could do with a bit more work. I'm sorry, but updating all the documentation has made everything take twice as long."
However, when they tried to compile together the four separate chunks of the system, it failed with a very long list of error messages. They worked throughout the day to sort them out, but it seemed that for every one they fixed, two more would appear. By the end of the day, the list was starting to shrink again, but Hua was still quite panicky:
"You guys go home. I'll stay here. We can't all work on this at the same time anyway. We haven't got long to get it working properly, and the least I can do is to get it to compile."
However, the work went very slowly, and at 2 A.M. Hua gave up with half a dozen serious errors remaining. What was worse, though, was that she had noticed a fundamental incompatibility between Reiko's business logic and Tim's web services. She called the team together for a meeting as soon as they came in the next morning. The mood was tense.
"Guys, we've got a big problem. Reiko has written her code to use transactions, but Tim's web services don't support transactions," she said. "It has to use transactions," said Reiko. "It's in the functional spec. The updates to the database and the updates via the web services have to either all work perfectly, or all be aborted together. Without transactions, how else can you abort one update when another one fails?"
"But web services don't support transactions yet," said Tim. "That technology has been delayed again, and it won't be out until the beginning of next year."
"Is there a workaround?" asked Mike.
"I think so, yes," said Hua. "For every web service, we can add another web service to undo that update. Then if we need to abort the transaction, we just call that undo web service."
"But that means doubling the size of the web service module," said Tim.
"There's no way I can finish all of that in the next two weeks."
"Reiko and I will help you," said Mike. "Hua can finish up the integration; she's been doing really well on that so far. But we really need to get everything done before we have to hand it over to the testers. Is there any way that we can get things done faster?"
"Lose the documentation," said Tim immediately. "That's the biggest overhead. Every time I make the slightest change to the code, I have to spend a lot of time updating all the diagrams. I know it's nice to have good documentation, but surely it's a lower priority than getting the system working properly?" "OK. And no gold plating either. I don't care if it's a bit rough and ready, but it all has to be in place by a week from Monday," said Mike.
The last two weeks were a nightmare. Hua was able to get the program to compile at last, but it immediately crashed when she tried to run it. It was hard tracking down the bugs because each of the developers had written their code in a very different style. The code had few comments, and it was difficult to understand how it worked.
"Oh well," she thought to herself, "we've still got the Debug phase ahead of us. It'll be much easier when we can all work on our own code again." The web service undo code turned out to be trickier than they had expected, and it still wasn't quite complete by the last Friday, so the developers decided to work through the weekend. At the handover meeting on Monday morning, they were at least able to claim that the software was now "feature complete," even if it still had a few bugs. It was now in the hands of the tester, Ian, and Emily, the end user from the accounting team.
The first bug report came in just ten minutes after the handover meeting, and after that they flooded in. Over the rest of that week, the bug list grew to include over 160 serious or critical bugs. On Friday, Phil called a meeting to discuss the situation.
"We were planning to deploy the system in a week's time," he said. "I really need to know whether we can still make that date."
Hua shook her head. "We've fixed nearly 60 bugs this week, which is a tremendous rate, but there's no way that we can get the rest done by the end of next week. We'll need another week, maybe two to be on the safe side."
"Well, buggy code was one of the risks I identified at the beginning of the project, and that's what the contingency is for. I'm comfortable pushing back the release date by two weeks," replied Phil.
"I'm not," said Ian. "There are a lot of features that I can't test yet because the program fails before I can even get to that point. I'm sure that there are still a lot more bugs to be found."
"That's because half of the bug reports are actually change requests," replied Reiko. "Look at this: 'Can't paste a block of data from a spreadsheet into the table.' That's not in the functional specification."
"It's what we do at the moment," said Emily. "I thought this software was supposed to save us time, not make our work slower. If we have to copy over one number at a time, then we'll spend twice as long on each invoice."
"OK, guys," said Phil, "let's slip the date by three weeks, but I really want you to make sure it's done by then. I'd like to keep that last week of contingency time for any issues arising from the deployment. Reiko, I see your point, but the accounting guys have to be OK with the software too. I want you to work with Emily to find the minimum set of changes that will keep them happy. Tim, you work with the developers so they can fix the bugs that are holding back your testing. And I want another project meeting in a week's time."
At the next project meeting the developers were ranged down one side of the table, and the testers down the other. Each side glared at the other. Reiko was the first to speak. "Emily hasn't backed down on any of her change requests. In fact, she's added more. At this rate the software will never be finished."
"I talked it over with Karen, and with the rest of my team, and they all agree with me," replied Emily. "We need software that we can use. Karen's not going to switch us over to the new system until we're 100 percent happy with it."
"Let's talk about that in a moment. How's your end doing, Ian?" asked Phil.
"Sorry, Phil, it's not going well. The guys have managed to open up most of the application, but it seems like every time they fix one bug they break two more things. The number of serious and critical bugs is now 230."
"I'm still confident that we can make the deadline," said Hua. "I think we've fixed most of the really hard bugs, so the rest should go faster." Phil thought for a moment. "I'm willing to give you the whole month's contingency to fix those bugs, and to make the changes that the accounting guys require. But that's the absolute deadline. This project is supposed to save money. There's no way that I'm going to let it go over budget."
The developers worked as fast as they could for the next four weeks, but it became increasingly apparent that they just couldn't make the deadline. The bug list had stopped growing, but it wasn't shrinking fast enough, and every new feature that Reiko and Tim finished added its share of bugs to the total. At the five-month mark, Phil finally allowed the project to slip beyond its planned contingency. He rationalized that it was better to accept a small loss than to throw away everything they'd worked so hard for.
Mike left the team at about the same time. He'd become increasingly unhappy as the project got into more and more trouble, so he arranged another contract to start as soon as the one with Acme was due to expire. He was replaced at short notice by Deepak, a recent graduate who had studied .NET at college.
However, Deepak found it very hard to work on the software. The code had had so many hasty changes that it was disorganized and messy, and the documentation was so out-of-date that it was almost useless. He alternated between spending hours peering over the shoulders of the other developers and spending hours in his own office making very little progress and getting increasingly frustrated.
The release date kept slipping, and eventually both Salim and the CEO, Cathy, got involved. Salim asked Angela to return to Acme to provide solid estimates for all the "essential" changes that Emily had asked for. Angela suggested that they allow another month for these changes.
At the six-month mark, the project was already 30 percent over budget, and it needed at least another 60 percent to complete, including a month's work to clear up the remaining bugs (Figure 4-2). Cathy had no hesitation in canceling the project.
Figure 4-2. The financial position when the project was canceled.
"This project is in very bad shape, and there's no guarantee that it will ever be completed successfully. I'm just not willing to put any more money into it." Salim lost his bonus over this project, and Phil missed out on the promotion that he had been expecting. Both were chastened by the experience, but neither of them really understood where the project had gone wrong.
So where did the project go wrong? If we compare the case study to the list of project management assumptions we identified in Chapter 3, we can see that most of the problems in the project occurred because the project plan relied on these assumptions, and because they turned out to be incorrect:
- Scope can be completely defined.
- Scope definition can be done before the project starts.
There was no opportunity to reevaluate or adjust the scope of the project. Emily was right to point out the flaws in the requirements, but she was only able to do so after all of the functionality had been created. At this point, the changes to the requirements meant that some of the existing code had to be thrown away and rewritten, which was wasteful and increased the cost of the project.
- Software development consists of distinctly different activities.
- Software development activities can be sequenced.
- Team members can be individually allocated to activities.
The lack of overlap between the requirements gathering, design, construction, and testing activities meant that communication between the individuals performing these tasks was extremely limited. The specification documents were all they had to go on, and there was no way to ask questions or give feedback. The developers couldn't discuss the design with the software architect, and the testers couldn't discuss the requirements with the business analyst.
If the developers had begun testing as soon as they started writing code, then the quality issues would have been apparent much earlier, and could have been systematically addressed. And if the developers had also been responsible for the design of the software, then it could have been refined as required once they were able to see how the software was shaping up.
- The size of the project team does not affect the development process.
The team was very small, and could have worked more efficiently by adopting a less formal development process. Face-to-face conversation is a much less laborious way to communicate information than via documentation.
- There is always a way to produce meaningful estimates.
- Acceptably accurate estimates can be obtained.
- One developer is equivalent to another.
The project was estimated before the team members were identified, so no allowance was made for individual variations in skill, such as Tim's limited knowledge of the technologies underlying web services. The team had never worked together before, so there was no way to check the estimates against the outcomes of earlier projects. In hindsight it's obvious that the contingency reserve of 25 percent was grossly inadequate.
- Metrics are sufficient to assess the quality of software.
The team used two metrics to assess their progress. In the Construction phase they estimated the proportion of the functionality that they had completed. In the Testing phase they counted the number of bugs that had been reported. The sheer number of bugs made it clear that the software was of low quality, but this metric misled the team into believing that fixing these bugs would be enough to fix the software. But the more frantically they worked on the software, the more messy and fragile it became. Their efforts just reduced the quality even further.