Channels ▼
RSS

C/C++

Object-Oriented Software Configuration Management

Source Code Accompanies This Article. Download It Now.


OCT91: OBJECT-ORIENTED SOFTWARE CONFIGURATION MANAGEMENT

Richard is president of Software Maintenance and Development Systems Inc., makers of Aide-de-Camp software. He can be reached at P.O. Box 555, concord, MA 01742


One of the most frustrating and least understood aspects of software development is keeping track of software changes over time. Solving this problem is the goal of Software Configuration Management (SCM). While in recent years much of the design and coding of software has been automated, managing the output of that process is still largely a manual task. This situation has begun to change.

All developers manage their software at some level, if only to sift through the lists of source files on their directories. This approach to SCM is much like flipping through the cards in a Rolodex, except that it's done on a computer. The programmer relies on his or her own memory, plus perhaps a naming convention, to know which files belong to which software versions. While workable on small projects, manual methods can often leave the developer with too many unanswered questions.

For example: Which files, out of possibly hundreds, implement a new feature in the software? What other files are affected by a given change and must also be revised? Which lines of code were deleted, added, or rewritten--and why?

SCM becomes really difficult if several people are working on different parts of a program at once. What happens, for example, if you and I are both working on products which include File A, but I change my copy of the file? If you like the features I have added to File A, how will you know if you can bring my version of the file into your product?

Physical and Logical Differences

Managing change is difficult because a single change may have many consequences--at both a physical and a logical level. The term "logical change" refers to a change in what the program does, while "physical change" means a change in what the program is. Logical changes include a new feature or function. Physical changes include edited source lines, rewritten documentation, and rebuilt executables.

Developers must know how a logical change affects other logical changes. (If I change what the program does here, how will it affect what the program does over there?) They must also know which physical changes result from a single logical change (such as altered source lines, revised documentation, rebuilt executables, and so on). A complete SCM system, in short, must help the developer understand the full impact of a logical change.

Formal SCM models have improved on the Rolodex approach by providing methods to track software changes automatically. Two broad classes of formal models are in use today: difference models and an object-oriented model. Difference models represent the traditional approach to SCM. They rely on the physical differences between successive versions of source files to construct and store software. The object-oriented model, on the other hand, relies more heavily on the logical changes that cause these physical differences.

There are two classes of difference models: the Update Model and the Integrated Differences Model. In the former, the deltas are discrete files that exist apart from the base version, while in the latter, the deltas are embedded within a common file (called a master file") with the base version. Control records are interleaved with deltas in the Integrated Differences Model to indicate to which version a delta belongs. Examples of the Update Model are IBM Update and CDC Update. Examples of the Integrated Differences Model are the Source Code Control System (SCCS) packaged with Unix System V, and Digital Equipment Corporation's CMS program. Currently, the only example of an object-oriented SCM system is my company's Aide-de-Camp.

Limitations of Difference Models

One major limitation of the difference models is that to specify a software release, the developer must identify and name a collection of files and a version number for each file. In the difference models, a file version is conceptually constructed with a single pass algorithm from a base version of the file and a sequence of differences between the base version and successive versions. These differences are called "deltas."

A complete software release is built from the base versions of all the files, and the application of all the deltas implied by the specified file versions. Logically, a software release can be described by the formula

  Release[N] = {V[b], Deltas[n]}

In this formula, Deltas[n] means a set of some number of deltas, with each delta containing lines which are added to or subtracted from files in the base version (known as V[b]).

The next release in the sequence would then be

  Release[N+I] = {V[b], Deltas[n], Deltas[n+1]}

Or in other words, a set of sets of deltas and the base version. To distinguish between the two layers of sets, we will use the term configuration to refer to the base version plus all the sets of deltas that comprise a release.

It would be a genuine benefit if the SCM system could use knowledge of existing releases to define new releases, as in

  Release[N+I] = {Release[N], Deltas[n+1]}

Difference models do not support this level of SCM automation, because deltas have no names, and releases are only collections of versions of files.

The lack of an independent identity for deltas presents three handicaps to the developer. First, the developer cannot use the model to define other releases that may not have been anticipated when the named releases were created. Say, for example, that we wanted features that could be implemented by a new configuration of deltas, and suppose that the current configurations are defined as

  Deltas[n] = {D1, D2, D3} and   Deltas[n+1] = {D4, D5, D6}

We might want to say that the desired, yet unnamed, release would consist of a new configuration of deltas, such as

  Release[N+1] = {D2, D3, D6}

Unfortunately, the difference models offer no intelligence with which to directly select a new set of deltas from existing configurations.

A second handicap is that future releases must always carry with them the deltas from current and past releases, even if those deltas are no longer active in the program. In other words, the source file for Release[N+3] must include the deltas that defined Release[N+2], while Release[N+2] must include the deltas that defined Release[N+1]. Each release depends on the presence of all prior releases in order to register as a valid release with the SCM model. If a previous link in the chain is broken (a delta left out), later releases may no longer be valid. If, for example, somewhere down the road we wanted to remove the features implemented by Deltas[n], we would do so by writing a Deltas[n+2] that removes the logical effects of D1, D2, and D3 in the program, but not the actual code. Therefore, in the difference models,

  Release[N+2] = {V[b], Deltas[n], Deltas[n+1],                                    Deltas[n+2]}

Contrast this with the simpler and more logical

  Release[N+2] = {V[b], Deltas[n+1]}

This simpler expression is not allowed in the difference models because if we physically remove Deltas[N], we can no longer go back and build Release[N] or subsequent releases which incorporate it. Optimally, we would like to keep Del tas[N] available but be able to include or exclude it from the software as needed.

To some extent, this need in the difference models to carry prior changes into the future, whether they are needed or not, defeats the purpose of SCM. The developer ought to be able to decide whether or not to retain old code in a program. Moreover, the developer should not have to expend resources storing, compiling, managing, and dealing with code that is no longer relevant to the current release.

Linearity Constraints

The third and final handicap of not being able to deal with deltas independently from named releases is that software change can only be managed along a linear path. This linearity constraint flies in the face of the real-world need to pursue development along multiple paths. Most software development efforts consist of teams which work on different parts or versions of software at the same time. One team, for example, may work on a Unix version of a product, while others work on DOS and OS/2 versions.

When working under linearity constraints, however, an SCM system can support only one path at a time. An example would be when software is developed along a time line in which each release is defined by adding its own deltas to previous ones.

But in situations where multiple releases of the same product undergo parallel development, not all releases share all prior deltas. In such a situation, the deltas that comprise one release may not include deltas along a separate branch or path of development. As far as the SCM system is concerned, the deltas along one branch have no meaning to the deltas along a different branch. This means that the SCM cannot build a release along one path that includes deltas from a separate path.

This may not cause a problem as long as the development paths remain separate. Development is simply managed independently as two distinct chains which happen to share a base version and some early deltas. A problem will occur when the developer needs to migrate a change from a release on one development path to another path. At that point, what the developer may have is perhaps hundreds of isolated deltas with no way to logically tie them together. The following example illustrates this.

Change Along Two Paths

Here is an example of migrating an intermediate change from one development path to another development path. The baseline release, shown in Example 1, is a toy program which computes the sum of its arguments. Examples 2, 3, and 4 show changes made to the baseline version. I've marked the lines that were changed with arrows.

Example 1: The base version of a simple example program

  void main(int argc,char *argv[])
  {
         int sum,i;

         for (i=0;i<argc;i++)
         {
             sum += atoi(argv[i]);
         }
         printf("Sum = %d\n",sum);
  }

Example 2: Change A1 fixes a bug in initialization of a variable.

  void  main(int argc,char *argv[])
  {
           -> int  sum = 0;
           -> int  i;

              for (i=0;i<argc;i++)
              {
                  sum += atoi(argv[i]);
              }
              printf("Sum = %d\n",sum);
  }

Example 3: Change A2 is some compiler-dependent speed optimization.

  void main(int argc,char *argv[])
  {
          -> register int  sum = 0;
          -> register int  i;

          -> for (i=argc; --i >=0; )
             {
                 sum += atoi(argv[i]);
             }
             printf("Sum = %d\n",sum);
  }

Example 4: Change B1 modifies the function of the program to sum squares.

  void main(int argc,char *argv[])
  {
             int  sum,i;
           ->int  k;

             for (i=0;i<argc;i++)
             {
               ->k    = atoi(argv[i]);
               ->sum += k*k;
             }
             printf("Sum = %d\n",sum);
  }

There are two modification paths for this program: path A and path B. The modifications along path A consist of bug fixes and performance enhancements. Change A1 (shown in Example 2) fixes a problem with an uninitialized variable. Change A2 (shown in Example 3) tries some compiler-dependent code optimization for speed. The modification in path B (change B1, shown in Example 4) alters the program to compute the sum of the squares of its arguments.

Now let's suppose that we want to migrate the first change made in the path A (that is, the code in Example 2) over to path B. Example 5 shows the file after migration. This result cannot be achieved with difference-oriented models. An object-oriented SCM, however, can represent these changes in a way that migration is possible.

Example 5: Merging path B with the first part of path A

  main (int argc, char *argv[])
  {
            -> int  sum = 0;
            -> int  i;
               int  k;

               for (i=0; i<argc; i++)
               {
                   k    = atoi (argv[i]);
                   sum += k*k;
               }
               printf ("Sum = %d\n", sum);

  }

In traditional SCMs, the disk file is the primary data type that gets managed. The basic properties of the disk file are its name, size, location, and content. By contrast, an object-oriented SCM uses a range of data types with properties such as program functionality, hardware dependencies, and foreign languages supported. The advantage here is that object properties have meaning outside the physical definitions of the software.

File-oriented methods can only say where the physical chunks of a program reside. An object-oriented system can also say where the functions of a program reside and which of those functions satisfy specified criteria -- such as which changes go with which hardware platform or program feature.

Going back to our example, an object-oriented SCM can migrate a change from one development path to another because each change is made in a separate, distinct step. Each change is known to the system as an individual object. Furthermore, changes can be migrated without worrying about changing line numbers because each line in the file is a distinct object with a permanent identity. This allows us to formulate the change in terms of logical rather than physical constructs.

Change Sets

In an object-oriented SCM, all source lines are uniquely named by the system as they are entered. Lines in the database of program code are either original or associated with a change set. A change set can either add lines, delete lines, or both. Although a line is listed as being deleted by a change set, the line is never physically removed from the database; it is simply excluded from any software versions which include that change set. To build a version, the developer selects the change sets to be included. Change sets can be selected according to any criteria established by the developer, regardless of whether or not the changes were named in a previously defined configuration.

A change set has both properties and attributes. Properties are the contents of the change and can include its name, a text abstract, the author, associated source code, object code, and binary images. An advantage of organizing information as objects, rather than as source files, is that no inherent restriction exists regarding the type of information represented. Differences in machine-readable files can be handled as readily as differences in text files.

Attributes are tags that label the change set. They can be both system-and user-supplied and are used to group changes according to meaningful criteria, such as all the changes that implement a particular accounting rule. At tributes allow developers to deal with a change in terms of meaning, rather than as a chain of deltas. User-supplied attributes can include operating system names, hardware platform names, and customer names. System-supplied attributes include release names and dependencies. Dependency attributes, for example, identify which software modules call, and are called by, other soft ware modules.

The coexistence of properties and attributes in a single object is a powerful combination. Changes now have an identity independent of any releases in which they participate. When defining a release, developers are no longer locked into selecting predetermined change configurations. Later releases no longer include all changes from early releases.

Changes to a program's code can be represented logically in terms of software features. They can be selected or deselected at will and grouped according to whatever criteria are meaningful to the developer.

Developers are now free to migrate changes between parallel development paths rather than proceeding down a linear chain of differences, in which a break in the chain results in files being set adrift by the SCM system. If so desired, a developer can still maintain a linear path simply by selecting all the change sets that belong to the previous version. But in addition, a software release can simply be a base version plus any selected changes. A specific feature can be migrated from one development path to another by selecting only those change sets tagged as implementing the particular feature. The developer can even define an entirely new release as a new selection of change sets.

Finally, the whole notion of SCM can be expanded to include whatever properties might be included within an object -- not just source lines, but machine readable code, documentation, and even front-end CASE design data.



_OBJECT-ORIENTED SOFTWARE CONFIGURATION MANAGEMENT_
by Richard Harter


Example 1: The base version of a simple example program.

void  main(int argc,char *argv[])
{
        int sum,i;

        for (i=0;i<argc;i++)
        {
            sum += atoi(argv[i]);
        }
        printf("Sum = %d\n",sum);
}



Example 2:  Change A1 fixes a bug in initialization of a variable.

void  main(int argc,char *argv[])
{
---->       int   sum = 0;
---->       int   i;


            for (i=0;i<argc;i++)
            {
                sum += atoi(argv[i]);
            }
            printf("Sum = %d\n",sum);
}


Example 3. Change A2 is some compiler-dependent speed optimization.

void  main(int argc,char *argv[])
{
---->       register int   sum = 0;
---->       register int   i;

---->       for (i=argc; --i >=0; )
            {
                sum += atoi(argv[i]);
            }
            printf("Sum = %d\n",sum);
}



Example 4.  Change B1 modifies the function of the program to sum squares.

void  main(int argc,char *argv[])
{
            int   sum,i;
---->       int   k;

            for (i=0;i<argc;i++)
            {
---->           k    = atoi(argv[i]);
---->           sum += k*k;
            }
            printf("Sum = %d\n",sum);
}


Example 5.  Merge of path B with the first part of path A.

main(int argc, char *argv[])
{
---->       int   sum = 0;
---->       int   i;
            int   k;

            for (i=0; i<argc; i++)
            {
                k    = atoi(argv[i]);
                sum += k*k;
            }
            printf("Sum = %d\n",sum);

}


Copyright © 1991, Dr. Dobb's Journal


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video