Lior is chief technology officer at OSTnet and can be contacted at http://www.ostnet.com/. Jan is a software engineer at Inovant and can be contacted at http://www.visa.com/.
Software reuse has long been on the radar of many companies because of its potential to deliver quantum leaps in production efficiencies. In fact, basic, or ad hoc software reuse already exists within most organizations. This reuse of documents, coding styles, components, models, patterns, knowledge items, and source code is rarely discussed because it usually starts and ends as an informal grass roots effort, with management having little understanding of how it started, why it persists, and how they might proactively extract larger benefits from it.
With an understanding that some form of reuse very likely already exists within most, if not all, software development organizations, the questions emerge, "how can we measure the level of reuse that already exists?", "what can be done to increase reuse benefits?", and "how can we track our progress along the way?".
Finding a Unified Unit of Measure
The first step to being able to measure how instances of software reuse are impacting operations is to define a base unit of measure that you can use across all instances.
The primary issue in finding such a unified unit of measure lies in the fact that reuse is not limited to the source or machine code. In fact, all of the assets associated with software development are possible targets for reuse (components, source code, documents, models, web services, and the like). As a result, artifact-specific measures such as lines of code, pages of documentation, or classes in a diagram are simply not generic enough to be useful, and for the most part do not readily translate into real corporate costs. We suggest using hours as the base unit of measure. Work hours translate directly into costs for a software organization, are easily measurable, and can be universally applied across all artifacts.
Some have used "average developer hours" as the base unit of measure with average developer hours defined as the number of productive hours that an average developer typically spends directly on software development (15 hours per week, for example). Because the organization pays for software developers, regardless of whether they work on a task directly or indirectly related to the software project, we propose sticking to easily measurable worked hours as the base unit of measure since it is less subjective.
Another reason cited for using "average developer hours" is that there are cases of certain developers within the organization who are much more (or less) productive than the "average" developer. Presumably, however, developers who are extra productive will be recognized as such, and this trait will be reflected in their salary. Salary, which is a market-determined metric, is therefore likely the best and only unbiased measure that we can use to compensate for productivity differences between developers. As long as we use salaried rates for each resource as opposed to some average for the group, we should be able to implicitly keep track of these differences in productivity as we measure dollar cost savings.
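As a sketch of this bookkeeping, hours saved can be converted into dollars using each developer's own salaried rate rather than a group average (the function name and data below are hypothetical):

```python
def dollar_savings(hours_saved_by_dev, hourly_rate_by_dev):
    """Convert hours saved per developer into dollars using each developer's
    own salaried rate, implicitly weighting for productivity differences."""
    return sum(hours_saved_by_dev[name] * hourly_rate_by_dev[name]
               for name in hours_saved_by_dev)

# Two hypothetical developers with different rates:
total = dollar_savings({"ana": 10, "ben": 4}, {"ana": 50.0, "ben": 75.0})
```

Because each developer's hours are priced at that developer's own rate, a more productive (and better paid) developer's saved hour is automatically worth more in the total.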
Measuring Ad Hoc Software Reuse
Because there are no set-up or other costs associated with ad hoc reuse, the only costs to the enterprise relate to the time spent searching for and analyzing whether a particular reuse candidate can in fact help accelerate the development of a current task. If the search yields a positive result, there are also subsequent costs associated with modifying/integrating the reusable item into the current project. The risks associated with ad hoc reuse initiatives relate to the time spent to determine whether reuse candidates exist because this time is nonrecoverable and is added to the total development time in the event that no reuse item is located.
Over multiple search and reuse iterations, the combined time spent searching, understanding, and integrating the found content into the current project must be less than the time to develop all of the integrated content from scratch for the reuse efforts to be judged as successful.
In mathematical terms, this is written as follows, where the expression on the left signifies the total time to develop the content over all reuse or attempted reuse iterations, and the expression on the right indicates the actual or expected time required to build all of the combined content from scratch:
(TLR+U)*N + i*SR*MOD + i*(1-SR)*BUILD < i*BUILD
In this case:

- TLR = Time to locate each potentially reusable item.
- U = Time to understand the suitability of each potentially reusable item for the current task.
- N = Number of items examined, including any items that ultimately get reused.
- i = Number of attempted instances of reuse.
- SR = Search hit rate; the percentage of i that yielded a positive search result (for instance, the user discovered a suitable reuse candidate that gets incorporated into the project).
- MOD = Time to integrate/modify the reused item for current purposes.
- BUILD = Time to build an element from scratch. This is the actual or estimated time spent building the software. To calculate expected time to project completion, developers can use any estimation methods currently in place internally (project plans, function point analyses, black magic voodoo).
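The inequality can be sketched as a small check (a hypothetical helper; all inputs are in worked hours, and the parameter names mirror the terms just defined):

```python
def ad_hoc_reuse_pays_off(tlr, u, n, i, sr, mod, build):
    """True when the total cost of ad hoc reuse over i attempted instances
    is less than building all of the combined content from scratch."""
    reuse_cost = (tlr + u) * n + i * sr * mod + i * (1 - sr) * build
    return reuse_cost < i * build

# Illustrative values: 10 attempts, 6 minutes of search/analysis per examined
# item, a 20 percent hit rate, 6 hours to modify versus 8 hours to build.
ok = ad_hoc_reuse_pays_off(tlr=0.05, u=0.05, n=10, i=10, sr=0.2, mod=6, build=8)
```

With these illustrative numbers the reuse cost is 77 hours against 80 hours of from-scratch work, so the effort just pays off; raise MOD to the full 8 hours and it no longer does.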
Similarly, by taking the percentage difference between the no reuse and ad hoc reuse scenarios (for instance, (no reuse-ad hoc reuse)/no reuse*100), you can arrive at the percentage of savings generated by ad hoc reuse in the enterprise. After simplifying, this equation looks as follows:
% Savings = [SR - (TLR+U)*(N/i)/BUILD - SR*MOD/BUILD]*100
In this instance, (TLR+U)*(N/i) is the average time spent searching for a reusable item before an item is found or the user decides to build from scratch. This number is typically less than five minutes. If the average item you are looking to build takes more than eight hours (a reasonable assumption), then this term is negligible compared to BUILD, and the ratio of the two terms is essentially 0.
MOD/BUILD is the relative cost of integrating an element versus building it from scratch. This value has been determined over numerous empirical studies to be in the range of 0.08 for black box component reuse to 0.85 for a small snippet of code.
We'll use an average search hit rate SR of 20 percent (for example, a user finds a useful item one out of every five times that he actually tries to locate something) and 0.75 for an average MOD/BUILD value. The MOD/BUILD value is on the high end of its normal range since the granularity of the items being reused in an ad hoc initiative is typically small, as are the incremental benefits achieved. This is a fair assumption because the reuse initiative is not being managed and the developers' sources for the content being reused are not optimized (that is, the content is taken from the Internet, friends, and other unmanaged sources).
Plugging the aforementioned assumptions into the equation, we find that ad hoc reuse generates savings equal to 5 percent of development costs. Although it appears small on a percentage basis, this number can actually be quite large in dollar terms given the high total cost of the development.
For example, if a company's total IT salaries are $5 million, the 5 percent increase in productivity would equate to $250,000 in annual savings.
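That arithmetic can be sketched as follows, with the negligible search term dropped (the function name is hypothetical; the values match the assumptions above):

```python
def reuse_savings_pct(sr, mod_over_build, search_over_build=0.0):
    """% Savings = [SR - (TLR+U)*(N/i)/BUILD - SR*MOD/BUILD] * 100.

    search_over_build stands in for (TLR+U)*(N/i)/BUILD, which the article
    treats as approximately 0.
    """
    return (sr - search_over_build - sr * mod_over_build) * 100

savings = reuse_savings_pct(sr=0.20, mod_over_build=0.75)  # ~5 percent
annual = 5_000_000 * savings / 100                         # ~$250,000 on $5M of salaries
```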
Evolutionary Software Reuse
Regardless of the process or processes used to develop software within an organization, there are easy-to-implement improvements that can be initiated to enhance the returns currently being realized with ad hoc reuse. Although the tasks and the ways of measuring results will not change from one process to the next, the artifacts to be reused and the point at which the reuse-related tasks intervene in the process will vary. By way of example, companies following an RUP process will typically reuse such things as use cases, subsystems, components, source code, and documents, and these will be accessed at various points during the elaboration, construction, and transition phases.
Without significantly altering their core development process, companies can begin to benefit to a greater degree by actively managing their existing software assets. In an "evolutionary reuse" practice, users are encouraged to identify all potentially reusable items and make them available to others in the organization, without investing any time up-front to make them "reusable." During each instance of reuse, the individual reusing the asset is encouraged to refactor the reusable artifacts and republish the upgraded asset, thereby evolving it towards black box reuse.
By following this reuse methodology, no initial investment is required to generalize the asset in anticipation of reuse that may not ever occur. Each asset is only extended to the extent needed to accommodate the current requirements, thus there are no sunk costs on assets that were created for reuse but never reused.
To implement a more structured evolutionary reuse effort, companies need to:
- Provide better access to their own internal software content.
- Promote the development of well-factored software (a process that is already quite familiar to most software developers).
- Measure results and gradually refine the reused content to ensure growing incremental benefits with each new instance of reuse.
Looking at how we model and measure evolutionary software reuse, we first need to identify all incremental costs and benefits that are not present in an ad hoc initiative. These are:
- Users who locate a reusable asset will typically need to refactor the asset for current purposes. Most of this effort is captured in MOD, but there may occasionally be additional effort involved with restructuring the asset to ensure that it remains well-factored. Since this effort is only necessary when something is to be reused, the total incremental cost is i*SR*FACT, where FACT is the average incremental time to refactor assets for entry into the asset repository.
- In addition to ensuring that the reusable artifacts are well factored, there are additional costs associated with creating assets from your reusable artifacts (for instance, attaching metadata to make the artifacts easier to find, understand, and reuse) and managing a repository of assets, although selecting the right repository tool for your organization can minimize these costs. These costs are accounted for as REP for each new asset.
Inserting these terms into the ad hoc reuse equation and taking the percentage of savings, we get (after simplifying):
% Savings = [SR - (TLR+U)*(N/i)/BUILD - SR*(MOD+FACT)/BUILD - SR*REP/BUILD]*100
As before, (TLR+U)*(N/i)/BUILD is approximately 0 and can be ignored. Interestingly, the term (MOD+FACT)/BUILD in the evolutionary reuse scenario still varies between 0.08 and 0.85 but, on average, is actually smaller than MOD/BUILD in an ad hoc scenario. By way of example, in an ad hoc reuse scenario, if two developers reuse the same artifact on separate occasions, their efforts will likely be duplicated because the improvements made by the first developer will likely not be available to the second (unless they know of each other's work). If one spends 20 hours modifying the artifact for reuse, the other will likely spend a similar amount of time, resulting in a combined MOD of 40 hours.
In an evolutionary reuse scenario, the first developer will likely spend a few more hours modifying and refactoring the artifact to make sure that its interfaces are clean and easily consumable. Because the first developer publishes this asset after he is done, the second developer will reuse the improved asset, thus requiring only a fraction of the time to understand, modify, and refactor it (eight hours, for instance). So if the first developer spent 22 hours modifying and refactoring the artifact, the total of MOD+FACT over the two reuse instances under the evolutionary reuse scenario will be only 30 hours. Over hundreds of reuse instances, it is easy to see how the average of MOD+FACT will continue to trend lower as the repository of software assets grows and matures. At the limit, when an asset in the repository is black boxed, (MOD+FACT)/BUILD will equal 0.08 because it will no longer be necessary to refactor the asset (FACT=0).
The term REP/BUILD in the equation relates primarily to the time required to publish assets as they are located. This time will vary depending on the workflow process used to publish assets and on the amount of metadata that the organization determines is necessary to accurately describe the asset. In general, this time is very small and its costs are more than offset by the reduction in the time others spend trying to understand what an artifact does when it is located.
By following an evolutionary reuse practice, the company very quickly has at its disposal a rich asset repository filled with reusable company content that:
- Is exclusively focused on its particular domain of operation.
- Has been tested and approved for use within the company.
As a result, developers looking to reuse will quickly be able to determine whether useful reusable artifacts exist and will also be able to locate more content, with greater precision, thus increasing the search hit rate. While we will use an increased search hit rate of 40 percent in the aforementioned equation, it should be noted that the search hit rate will continue to increase as the repository grows and more content becomes available for reuse.
We will use 0.5 for an average (MOD+ FACT)/BUILD value, which is high since the most popular assets will be reused multiple times, resulting in many cases of black box reuse (0.08) and driving down the average. Plugging in the stated numbers, we find that the evolutionary reuse scenario generates very respectable savings of 20 percent. This will amount to a dollar savings of $1 million using salaries of $5 million, as above. Interestingly, this value can be extracted without a material initial investment in time and effort to get started.
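Under the same framework, the evolutionary numbers work out as follows (a hypothetical sketch; the search and publishing terms are treated as negligible, as above):

```python
def evolutionary_savings_pct(sr, mod_fact_over_build, rep_over_build=0.0):
    """% Savings = [SR - SR*(MOD+FACT)/BUILD - SR*REP/BUILD] * 100,
    with the search term (TLR+U)*(N/i)/BUILD taken as ~0."""
    return sr * (1 - mod_fact_over_build - rep_over_build) * 100

savings = evolutionary_savings_pct(sr=0.40, mod_fact_over_build=0.5)  # ~20 percent
dollars = 5_000_000 * savings / 100                                   # ~$1,000,000
```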
Systematic Software Reuse
When people refer to software reuse without qualifying further, they are typically speaking about traditional "systematic software reuse." Systematic software reuse is a highly structured practice that involves architects and developers identifying potentially reusable components in a project or family of projects in advance of their development.
Systematic software reuse efforts include "standards police," review committees, and/or special "tools teams" responsible for specifically developing reusable assets. Because it is believed that future modifications can be foreseen, developers practicing systematic software reuse build in abstractions to cover any number of possible mutations and implement "hooks" for future iterations.
The end goal of all of this up-front effort is to reduce the time required to integrate the reusable component into a new project by enabling black-box software reuse to the largest extent possible (for instance, MOD/BUILD=0.08). However, over-abstracting components ahead of time can make code harder for others to read and understand, an inadvertent problem associated with this practice.
While the leverage associated with systematic software reuse is very large because each additional instance of reuse provides enormous benefits, the added up-front costs dramatically increase the risks associated with its implementation.
To properly measure the impact that systematic software reuse can have on a development environment, we begin with the ad hoc reuse approach and add all additional tasks and their resulting benefits into the equation. Of particular note:
- Because reusable components are "built" to be reusable, there are costs associated with building these components over and above what it would otherwise cost to build them for a given set of software requirements. Industry accepted figures are that it typically costs anywhere between 50 percent and 150 percent extra to build a component for reuse, versus building it for a single use. We'll use RCOM to identify this extra effort in our equations (to be shown shortly). In the case of evolutionary reuse, this extra effort to make an asset reusable is only done at the time of consumption by the person who is looking to reuse the component, and this effort is captured in the term (MOD+FACT).
- The cost of reusing a component built for reuse will be much lower than in other types of reuse with MOD/BUILD ranging between 0.08 and 0.2.
- Because systematic reuse components are built for reuse, there will typically be only a small number of them available for reuse. Also, the availability of these components should be fairly easy to communicate within the organization, meaning that the search hit rate will be much higher in a systematic reuse effort, although the actual number of reuse instances i will be dramatically lower, especially in the early years.
For a systematic software reuse effort to be profitable, therefore, the following equation representing a systematic software reuse initiative must hold true:
(TLR+U)*N + i*SR*MOD + i*(1-SR)*BUILD + j*REP + j*(1+RCOM)*BUILD < i*BUILD
where j = number of reusable software components that have been built, and RCOM = extra time required to build a reusable software component versus building one with equivalent functionality but that is not designed to be reusable.
Taking the percentage difference between the no reuse and systematic software reuse scenarios, we can arrive at the percentage of savings generated by systematic software reuse in the enterprise. After simplifying, this equation looks like:
% Savings = [SR - (TLR+U)*(N/i)/BUILD - SR*MOD/BUILD - (j*REP)/(i*BUILD) - (1+RCOM)*(j/i)]*100
For demonstration purposes, and to simplify this equation, assume that the search hit rate SR approaches 1 and that RCOM is 50 percent, the low end of its industry accepted value. As well, we'll use a favorable MOD/BUILD value of 0.08 and will assume that (TLR+U)*(N/i)/BUILD approximates 0, as was the case in each of the aforementioned scenarios. Finally, we'll assume that the expression (j*REP)/(i*BUILD) is also equal to zero, which holds unless j (the number of reusable components that have been built and inserted into the catalog) is orders of magnitude greater than i (the number of reused elements), something that should only happen in the most disastrous scenarios.
Plugging in the favorable values just listed for each expression and reducing the equation, we get:
% Savings = [0.92 - 1.5*j/i]*100
What we can interpret from this equation is that the extra 50 percent spent to build each reusable component adds up very quickly and needs to be amortized over multiple reuse iterations for systematic software reuse to generate positive savings. In fact, if each item built is not reused on an average of at least 1.63 projects (that is, if i/j < 1.63), then the reuse effort will fail to generate a positive return.
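The break-even arithmetic can be sketched as follows (a hypothetical helper using the favorable assumptions above; the negligible search and repository terms are dropped):

```python
def systematic_savings_pct(sr, mod_over_build, rcom, j_over_i, rep_term=0.0):
    """% Savings = [SR - SR*MOD/BUILD - (j*REP)/(i*BUILD) - (1+RCOM)*j/i] * 100,
    with the search term (TLR+U)*(N/i)/BUILD taken as ~0."""
    return (sr * (1 - mod_over_build) - rep_term - (1 + rcom) * j_over_i) * 100

# Favorable assumptions from the article: SR=1, MOD/BUILD=0.08, RCOM=0.5,
# giving [0.92 - 1.5*j/i]*100.  Savings hit zero where i/j = 1.5/0.92.
breakeven = 1.5 / 0.92  # ~1.63 reuses needed per component built
```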
Overall, systematic software reuse has the potential to generate very large savings (theoretically as high as 92 percent if one magical component were built that could be reused everywhere, which of course, is not really possible). On the negative side, systematic software reuse is highly sensitive to the ratio of j/i, meaning that participants in the initiative need to be highly skilled at predicting which reusable components need to get built to amortize them over the largest number of reuse instances. Failing to accurately pick the right components to build, or mismanaging the systematic software reuse initiative, can very quickly generate costly negative results.
Using the aforementioned methods for calculating the costs and benefits of each of the three reuse implementation methods covered and deriving an ROI from each, we arrive at the ROI graph in Figure 1.
Again, systematic software reuse has the potential to be highly negative if the assets that are built are not quickly reused on multiple projects. Systematic reuse does, however, have the highest slope in the early days, meaning that it can provide a very quick ROI if properly implemented.
Evolutionary reuse starts off with low incremental benefits to the organization but quickly begins to generate increasing value as content is refactored and made available to end users. It provides a nice compromise for companies looking to enhance the benefits they are currently getting from their ad hoc reuse efforts but who are unwilling or unable to invest the time required to set up and manage a structured systematic reuse effort.
Finally, ad hoc reuse currently generates modest benefits to an organization and it will continue to do so, although these benefits grow slowly and are far from being optimized.
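The general shape of these three curves can be approximated with a toy model (every value below is an illustrative assumption, not measured data): each successful reuse saves BUILD - MOD hours, and systematic reuse first pays the up-front cost j*(1+RCOM)*BUILD of building its components.

```python
def cumulative_savings_hours(i, sr, mod_over_build, build=8.0, upfront=0.0):
    """Hours saved after i attempted reuse instances, ignoring the small
    search and publishing terms; 'upfront' models j*(1+RCOM)*BUILD for a
    systematic effort."""
    return i * sr * build * (1 - mod_over_build) - upfront

# After 50 attempts against 8-hour build tasks (illustrative values only):
ad_hoc       = cumulative_savings_hours(50, sr=0.2, mod_over_build=0.75)               # ~20 hours
evolutionary = cumulative_savings_hours(50, sr=0.4, mod_over_build=0.5)                # ~80 hours
systematic   = cumulative_savings_hours(50, sr=1.0, mod_over_build=0.08, upfront=120)  # ~248 hours
```

With only 10 attempts, the same systematic parameters yield a negative number, illustrating how sensitive that practice is to early reuse volume.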
Measuring productivity and changes in productivity are important when implementing any new software tool or initiative. To that end, the overall techniques just used to determine the costs and benefits related to different reuse practices can also be applied to measure savings associated with other initiatives. It is only in comparing these different returns using standard methods and units of measure that you will be able to make informed decisions and set quantifiable milestones for your company.
As a starting point, additional work needs to be done by most companies to gain a better understanding of where development efforts are currently being focused, which tasks are the most costly, which are being duplicated, and which can be altered to generate the highest incremental returns.
The returns just quantified relate directly to the savings that organizations can hope to gain through developer productivity enhancements. These savings are the minimum benefits realizable since they exclude all other costs (such as overhead) and tertiary benefits such as increased IT agility, reduced defects and maintenance costs, and the ability to deliver new products and services at an accelerated rate to establish or maintain key strategic competitive advantages. As we have seen, depending on the path chosen, establishing this advantage through reuse does not necessarily require a huge up-front investment in time and human resources.