Open Source Is Better Than the Closed Stuff (Until You Hit 1 Million Lines)

Image: Dmitry Baranovskiy/Flickr

In the dark old days of the late 1990s and early 2000s, debates would rage about whether open source software was as good as proprietary software. And it was all a matter of opinion.

Then, in 2006, the Department of Homeland Security partnered with a software code analysis company called Coverity to examine open source code for security vulnerabilities and software defects. Each year since, Coverity has published a report on the quality of open source code, and each year, the company has found that it isn't that different from proprietary software. That seemed to settle the issue.

But the latest report, published on Wednesday, found something new: the code quality of open source projects tends to suffer once they surpass 1 million lines of code, whereas proprietary code bases continue to improve past that mark.

The Coverity Scan tool performs automated static analysis of code bases, looking for defects such as resource leaks, illegal memory access, and control flow issues. It's free for open source projects and available to proprietary software vendors for a fee. Coverity drew on its user base for the report, analyzing 118 active open source projects and 250 proprietary projects.

The study found that open source projects have an average of .69 defects per 1,000 lines of code, while proprietary projects have about .68 defects per 1,000 lines. But when projects were compared based on the total number of lines, some intriguing differences emerged.

Open source projects with 500,000 to 1 million lines of code had, on average, .44 defects per 1,000 lines of code. Proprietary projects in the same range had .98. But open source projects with over 1 million lines of code had .75 defects per 1,000 lines, while proprietary projects in the same range had only .66.

Image: Coverity
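Defect density is simply defects divided by thousands of lines of code, and the comparison above groups projects by size before averaging. The sketch below illustrates that bookkeeping. It is a minimal illustration, not Coverity's method: the project names and counts are invented, and it assumes each bucket's figure is a plain mean of per-project densities, which the report may or may not do.

```python
# Hypothetical sketch: defects per 1,000 lines (KLOC), grouped by project size.
# Project names, line counts and defect counts are invented for illustration;
# averaging per-project densities within a bucket is an assumption.
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    lines: int
    defects: int


def defect_density(defects: int, lines: int) -> float:
    """Return defects per 1,000 lines of code."""
    if lines <= 0:
        raise ValueError("lines must be positive")
    return defects / (lines / 1000)


def size_bucket(lines: int) -> str:
    """Assign a project to one of the size ranges discussed in the article."""
    if lines > 1_000_000:
        return "over 1M"
    if lines >= 500_000:
        return "500K-1M"
    return "under 500K"


def density_by_bucket(projects: list[Project]) -> dict[str, float]:
    """Plain mean of per-project densities within each size bucket."""
    groups: dict[str, list[float]] = {}
    for p in projects:
        groups.setdefault(size_bucket(p.lines), []).append(
            defect_density(p.defects, p.lines)
        )
    return {bucket: sum(ds) / len(ds) for bucket, ds in groups.items()}
```

For example, two hypothetical 600,000- and 800,000-line projects with 264 and 352 defects both come out at .44 per 1,000 lines, while a 2-million-line project with 1,500 defects comes out at .75, mirroring the open source figures above.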

The report speculates that the reason for the discrepancy is that when open source projects are young, they're developed by a small group of dedicated volunteers. As a project grows and new developers start contributing code, it becomes harder to manage. On proprietary projects, by contrast, the process starts out haphazard and becomes more rigorous as the project grows.

"But this doesn’t mean that the quality of the codebase suffers," the report cautions. "These are typically projects that are heavily adopted in the industry, have the backing and support of a commercial company and still have above average software quality."

Still, it's an important issue as open source projects continue to grow. Only 13 projects were over the 1 million mark, but the average size of the open source projects Coverity analyzed was 580,000 lines in 2012, up from 425,179 in 2008. In fact, the report suggests it's this growth that made the average defect density increase from .45 in 2011 to .69 in 2012.
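One way growth alone could push an overall average up: if the density is computed over all lines pooled together, it becomes a line-weighted average, so shifting more code into the over-1-million range (.75) and away from the 500,000 to 1 million range (.44) raises the total even when neither range changes. The sketch below assumes that pooled calculation, which the report does not spell out, and the line totals are invented.

```python
# Hypothetical sketch: a pooled (line-weighted) defect density.
# Per-bucket densities (0.44 and 0.75) are the report's open source figures;
# the line totals are invented, and pooling is an assumed method.

def pooled_density(buckets: dict[str, tuple[float, int]]) -> float:
    """Combine (density per KLOC, total lines) pairs into one density.

    Each bucket contributes density * KLOC defects; the result is total
    defects divided by total KLOC, i.e. a line-weighted average.
    """
    total_defects = sum(density * lines / 1000 for density, lines in buckets.values())
    total_kloc = sum(lines / 1000 for _, lines in buckets.values())
    return total_defects / total_kloc


# Same per-bucket densities, different mix of code by project size.
mostly_mid_sized = {"500K-1M": (0.44, 9_000_000), "over 1M": (0.75, 1_000_000)}
more_large = {"500K-1M": (0.44, 5_000_000), "over 1M": (0.75, 5_000_000)}
```

With these invented totals, the first mix pools to about .47 defects per 1,000 lines and the second to about .60, purely because more of the code sits in the larger projects.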