Capers Jones is the founder and former chairman of Software Productivity Research, a software development consultancy.
Software quality suffers as application size increases. We've known that since the 1970s but still haven't solved the problem. The National Institute of Standards and Technology found in a 2002 study that more than 60% of manufacturing companies reported major defects in software they bought, with just under 80% reporting minor defects.
Part of the problem involves how and what you measure. A common approach is the lines-of-code metric (referred to as KLOC, for thousands of lines of code), but it ignores important stages of the software development life cycle such as requirements and design. IBM dealt with this shortcoming back in the '70s by developing two metrics to gauge software quality: defect potentials and defect removal efficiency. Both are still highly relevant today and have had the greatest impact on software quality, costs, and schedules of any measures.
Defect potentials are the probable numbers of defects that will be found in various stages of development, including requirements, design, coding, documentation, and "bad fixes" (new bugs introduced when repairing older ones). Defect removal efficiency is the percentage of the defect potentials that will be removed before an application is delivered to users.
Defect potentials can be measured with function points, units of measurement that express the amount of business functionality an information system provides to users. Function points don't measure the number of lines of code because most serious defects aren't found in the code but instead occur in the application's requirements and design (see the sidebar "A Better Gauge of Quality").
Defect potentials typically range from just under two per function point to about 10. Defect potential correlates with application size: As size increases, defect potential rises. It also varies with the type of software, CMMI level, development methodology, and other factors.
Function Points: A Better Gauge Of Quality
Function points express the amount of business functionality an information system provides to users. The advantage of this approach over the defects per thousands-of-lines-of-code metric, or KLOC, is that it can measure defects in requirements and design as well as in code. More important, KLOC penalizes modern high-level programming languages when measuring quality and productivity because they can use fewer lines of code to achieve the same results as older languages.
To illustrate, assume an application written in Java requires 1,000 Java code statements and has 10 coding bugs, equal to 10 bugs per KLOC. If the same app written in C requires 3,000 C statements and contains 30 coding bugs, it also has 10 bugs per KLOC. The Java version has better quality since it delivers the app with only a third as many bugs as the C version. But when measured using lines of code, the Java and C versions appear to be identical in terms of quality at 10 per KLOC.
When these apps are measured with function points, we assume that they're both 10 function points in size. The Java version contains one coding bug per function point, while the C version has three. In other words, function points match standard economic definitions for measuring productivity, and they measure quality without omitting requirements and design bugs, and without penalizing modern high-level languages.
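The arithmetic behind this comparison can be sketched in a few lines. The figures below are the article's hypothetical Java and C versions, not measured data:

```python
# Hypothetical example from the text: the same 10-function-point app
# written in Java (1,000 statements, 10 bugs) and C (3,000 statements, 30 bugs).

def bugs_per_kloc(bugs: int, loc: int) -> float:
    """Defects per thousand lines of code."""
    return bugs / (loc / 1000)

def bugs_per_function_point(bugs: int, function_points: int) -> float:
    """Defects normalized by delivered business functionality."""
    return bugs / function_points

java = {"loc": 1000, "bugs": 10, "fp": 10}
c    = {"loc": 3000, "bugs": 30, "fp": 10}

# KLOC makes the two versions look identical in quality...
print(bugs_per_kloc(java["bugs"], java["loc"]))  # 10.0
print(bugs_per_kloc(c["bugs"], c["loc"]))        # 10.0

# ...while function points show the Java version has a third as many defects.
print(bugs_per_function_point(java["bugs"], java["fp"]))  # 1.0
print(bugs_per_function_point(c["bugs"], c["fp"]))        # 3.0
```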
Comparing the quality from different software methodologies is complicated. However, if the applications being compared are of similar size and if they use the same programming languages, then it's possible to compare quality, productivity, schedules, and other areas. Table 1 compares the defect potentials and removal efficiencies of several software development methodologies.
The examples shown here are based on a combination of the C and C++ programming languages. Although the actual sizes varied, the sizes were converted mathematically to exactly 1,000 function points or 75,000 logical code statements.
The "defect potentials" are the total numbers of defects found in five sources:
- Source code
- User documents
- Bad fixes or bugs in defect repairs
The "high severity" defects are those ranked as #1 and #2 using the classic IBM Defect Severity Scale:
- Severity 1 indicates total stoppage of the application; software does not operate at all.
- Severity 2 indicates that major features are disabled or incorrect and cannot be used.
- Severity 3 indicates minor features are disabled or incorrect.
- Severity 4 indicates a cosmetic error that does not affect operation in significant ways.
The "defect removal efficiency" was the total number of defects found and eliminated before the software applications reached clients. After 90 days of usage, client-reported defects were added to internal defects in order to calculate the percentage of defects eliminated prior to release. (There will of course be more defects found after 90 days, but a standard time interval is necessary to have consistent calculations.)
Compared to the poor results associated with CMMI level 1, all of the other methods listed in Table 1 demonstrate significant improvements. With larger samples there would be somewhat different results, but the basic concept of measuring defect potentials and defect removal efficiency levels would remain the same.
Based on studies of about 13,000 applications, the average defect removal efficiency in the U.S. is only about 85%. Therefore, all of the methods that top 85% can be viewed as better than average.
The upper limit of measured defect removal efficiency is only about 99%. Projects that top 95% tend to use sophisticated combinations of inspections, static analysis, and multiple test stages. Projects that achieve only 85% or less in defect removal efficiency typically use only testing, and are also low in test coverage.
If you have a scientific calculator handy, take the size of the application in function points and raise it to the 1.25 power. The result is the approximate number of defects that will occur. Try this for 10, 100, 1,000, and 10,000 function points, and you'll see that as applications get bigger, defect potentials rise dramatically. (While you have your calculator out, raise the size to the 1.2 power to estimate how many test cases you'll need. Raise it to the 0.4 power to get the development schedule in months. Divide size by 150 to get the approximate number of people needed on the development team.)