This week I was asked whether there are any statistics out there showing

- the number of defects (on average) produced during a software development project or

- the number of defects produced versus the number of lines of code per programming language or technology stack (C/C++, Java, .NET or PHP, for example)?

I answered that I haven't seen anything publicly available. Any company publishing such statistics could damage its reputation. Internally, however, every company should collect these statistics for risk management purposes.

Still, we can apply maths to do some estimation, can't we? For example, let's assume the following:

**1.** A project consists of one or a few iterations.

**2.** Ideally, code from each iteration is deployed with 0 defects, so we only count the defects that were found and fixed during the iteration(s). We also assume that whatever was deployed in previous iterations is free of defects in the current one.

**3.** Most of the trivial defects are spotted during compilation or the build process (long before the test team gets engaged). As a result, we count only the defects spotted during unit test execution. For now, we ignore the defects spotted by the test team, as this would complicate the model.

**4.** The unit tests are free of defects.

Going further:

**5.** The *X*-th iteration delivers *N* new units.

**6.** Each unit must have at least 2 Unit Tests, for Expected Pass and Expected Failure cases.

**7.** As a result the *X*-th iteration has *2⋅N* Unit Tests.

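**8.** The probability that a single Unit Test fails is *1⁄2* (Unexpected Pass or Unexpected Failure).

**9.** The iteration can have from *0* to *2⋅N* defects as a result. Assuming the tests fail independently of one another, the probability that the number of defects is *m* (*0≤m≤2⋅N*) is *P(m) = C^{m}_{2⋅N}⋅(1⁄2)^{m}⋅(1⁄2)^{2⋅N-m} = C^{m}_{2⋅N}⋅(1⁄2)^{2⋅N}*.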

**10.** The mean value or the average number of the defects is *2⋅N⋅1⁄2 = N*.

So, *N* units with *N* defects or roughly *1* defect per unit.
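To make steps 5 to 10 concrete, here is a minimal Python sketch that tabulates the distribution and sums it up to get the mean. The function names `defect_pmf` and `expected_defects` are made up for this illustration, and the sketch assumes that unit tests fail independently of one another, which the model above implies but does not state.

```python
from fractions import Fraction
from math import comb


def defect_pmf(units: int, p: Fraction = Fraction(1, 2)) -> list[Fraction]:
    """Return P(m) for m = 0 .. 2*units failed unit tests (binomial model)."""
    n = 2 * units  # two unit tests per unit, as in step 7
    # Assumption: each test fails independently with the same probability p.
    return [comb(n, m) * p**m * (1 - p) ** (n - m) for m in range(n + 1)]


def expected_defects(units: int, p: Fraction = Fraction(1, 2)) -> Fraction:
    """Mean number of defects, computed directly as the sum of m * P(m)."""
    return sum(m * prob for m, prob in enumerate(defect_pmf(units, p)))


if __name__ == "__main__":
    pmf = defect_pmf(units=5)
    print(sum(pmf))                   # the probabilities add up to 1
    print(expected_defects(units=5))  # 5, i.e. one defect per unit
```

Exact fractions are used instead of floats so that the sum comes out as exactly *N* rather than something like 4.999999.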

A few words about the maths used. It is the binomial distribution where *p=1⁄2* and the mean value is *E(X) =∑m⋅P(m) = n⋅p*, where *n=2⋅N* (here *X* stands for the number of failed Unit Tests, not for the iteration number).

This formula also tells us that if we reduce the probability of a Unit Test failing (*p<1⁄2*), then we will also reduce the expected number of defects. Sounds logical, doesn't it?
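The effect of a smaller *p* can also be seen in a quick simulation. This is only a rough sketch: the name `simulate_mean_defects`, the number of runs and the seed are arbitrary choices for the illustration, and the tests are again assumed to fail independently.

```python
import random


def simulate_mean_defects(units: int, p: float, runs: int = 20_000, seed: int = 1) -> float:
    """Average number of failed unit tests over many simulated iterations."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = 2 * units              # two unit tests per unit
    total = 0
    for _ in range(runs):
        # Each test fails independently with probability p (an assumption).
        total += sum(1 for _test in range(n) if rng.random() < p)
    return total / runs


if __name__ == "__main__":
    for p in (0.5, 0.25, 0.1):
        observed = simulate_mean_defects(units=10, p=p)
        print(f"p={p}: simulated mean {observed:.2f}, n*p = {20 * p}")
```

The simulated averages should land close to *n⋅p* for each value of *p*, so halving the failure probability roughly halves the number of defects.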

I will also provide a quick proof for the mean value, because it is indeed a very elegant piece of mathematics:

*E(X) = ∑m⋅P(m) = P(1) + 2⋅P(2) + ... + n⋅P(n) =*

*= C^{1}_{n}⋅p⋅(1-p)^{n-1} + 2⋅C^{2}_{n}⋅p^{2}⋅(1-p)^{n-2} + ... + n⋅C^{n}_{n}⋅p^{n} =*

*= p⋅[n⋅(1-p)^{n-1} + 2⋅C^{2}_{n}⋅p⋅(1-p)^{n-2} + ... + n⋅p^{n-1}] =*

*= p⋅n⋅[(1-p)^{n-1} + C^{1}_{n-1}⋅p⋅(1-p)^{n-2} + ... + p^{n-1}] =*

*= p⋅n⋅(1-p+p)^{n-1} = n⋅p*
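The step that pulls the factor *n* out of the brackets relies on the identity *m⋅C^{m}_{n} = n⋅C^{m-1}_{n-1}*, and the last step is the binomial theorem. For the sceptical reader, here is a small Python sanity check of both the identity and the final result. The helper names are invented for this sketch, and checking a handful of values of *n* is of course not a proof.

```python
from fractions import Fraction
from math import comb


def identity_holds(n: int) -> bool:
    """Check m * C(n, m) == n * C(n-1, m-1) for every m = 1 .. n."""
    return all(m * comb(n, m) == n * comb(n - 1, m - 1) for m in range(1, n + 1))


def mean_by_summation(n: int, p: Fraction) -> Fraction:
    """Left-hand side of the proof: the sum of m * P(m) for m = 0 .. n."""
    return sum(m * comb(n, m) * p**m * (1 - p) ** (n - m) for m in range(n + 1))


if __name__ == "__main__":
    p = Fraction(1, 3)  # any probability works, not only 1/2
    for n in (1, 2, 10, 25):
        assert identity_holds(n)
        assert mean_by_summation(n, p) == n * p
    print("sum of m*P(m) equals n*p for all tested n")
```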