Structure Charts

By Joe Celko, June 01, 1990

June 1990/Structure Charts

In the old days, when programmers lived in trees and ate their young, we were supposed to design programs with flowcharts. Today, we no longer use flowcharts; in fact the ANSI standard for flowchart symbols expired several years ago and was not renewed.

Why did flowcharts fall out of favor as a design tool and what replaced them? The first obvious problem with a flowchart is that it became too physical, too soon in the design process. The symbols were largely for physical devices, not logical functions, so the design went to implementation immediately.

Secondly, the relationships between flowchart elements were always ambiguously defined. Do connecting arrows between symbols send data or control or both? You could guess in many cases by reading the block captions, but more often than not, you really had no idea.

Thirdly, flowcharts gave only a minimal hint about packaging the modules. By a module, I mean a unit of code which can be referenced by a name. By packaging, I mean how a module is implemented in the system. For example, a sorting module could be packaged as a program, a procedure, an included text file, a task, or a COBOL paragraph.

A fourth problem was that no two programmers ever used flowcharts the same way. One programmer would give such detail that his final code was in the blocks on a statement by statement basis. At the other extreme there were programmers who would use such high level concepts in their blocks that the flowchart was useless (imagine a single block with the phrase "do payroll" in it). This meant that flowcharts neither provided support for top down design nor for any sort of hierarchical leveling of the problem.

Yet another problem was that programmers rarely used flowcharts as they were intended. Flowcharts were a necessary evil, drawn after the program was written and not before. Even Daniel McCracken admitted that he did the flowcharts in his Fortran, Algol and Cobol books after the fact. The industry finally collectively admitted this and began to sell flowcharting programs which would produce an overly detailed flowchart from the finished source code.

Finally, flowcharts were virtually impossible to update on paper. This not true any longer, since we have cheap graphics packages on microcomputers, but it was a real hassle which people avoided by never updating them during the maintenance phase. That was not as big a problem as you might think; research showed that maintenance programmers got no help from them anyway.

In the 1970's Larry Constantine introduced structure charts as replacement for flowcharts. The structure chart method sets up an abstract model, which gives the programmer manageable levels of complexity, and the ability to decide on packaging as part of the design. Today, most CASE packages support a version of structure charts.

The basic structure chart symbols (see Figure 1) are evolved from the old flowcharts, so that they have the advantage of least surprise to the user. The most basic symbol is a simple rectangle for a module. (Modifications to the module symbol will be discussed later.) Every structure chart starts with a single, topmost module which is the entry point to the process being described. Subordinate modules fan out from this module, creating a diagram which resembles an organizational chart. At the very bottom of the diagram, the arrows will point to a relatively small number of common library modules which are used by many of the higher level modules. The overall shape will tend to be a sort of "onion dome" which may be tall or flat depending on the nature of the problem.

Notice the word "invocation", rather than "procedure call"; invoking a module simply means causing it to execute. You can invoke a mocule by calling it, running it, or dropping into it in normal program flow.

Invocation between modules is shown by downward pointing arrows and only one module is assumed to be active at a time. The number of arrows leaving a module is its "fan out" and the number of arrows entering is its "fan in". No order of execution is implied by the order of the modules at each level. No internal logic is shown for the modules either. All those considerations are implementation and not design.

As in flowcharts the invocation lines in a structure chart are pipes for the flow of both data and control, but unlike a flowchart, the things flowing are explicitly shown beside the invocation line. Data is represented as an arrow with an open circle on one end. Control is represented as an arrow with a solid circle on one end. As a mnemonic, I think of the data arrow as a spoon with a load of data in it. Some older texts will use simple arrows without circles for both purposes.

Another symbol which you might see is hybrid data, which has a circled dot on the end of the arrow. Hybrid data acts as both data and control. In early versions of Fortran, there was no way of recognizing the end of a card file. So, we would put a final card in the deck with all nines in one field. The READ statement was then followed by an IF statement which would jump to an exiting module when the nines were read. That field was hybrid data. Using hybrid data is now considered such bad programming practice that you might never have seen it, especially if you are younger than 25.

Module Symbols

A library module is shown by a rectangle with parallel lines on the sides, another carryover from the old flowcharts. A library module might be a subroutine library program, an include file, or a program copied from a magazine — you don't have to write it or understand it's insides.

A small diamond on the lower edge of the invoking module (also borrowed from flowcharting) indicates that the invoking module has the option of making a selection from several subordinate modules.

Iteration, or loops, are shown with a horizontal loop on the bottom edge. Some programmers attach a numeral to the loop to show how many times the module executes. Recursion is shown with an invocation arrow which loops back into the top of the module from which it came. The mistake most often made in a recursive module is not showing the parameters on the recursive invocation.

Co-routines are shown by a set of invocation arrows, each going toward the other routine and making a loop back to the start. This notation is not often used, because co-routines are rare outside of operating systems or other machine level programs.

Global Data

The symbol for a global data structure is a circle drawn on the edge of the module with data and control arrows going in and out of it as needed. These data areas can be labeled, if it is appropriate. For example, Fortran COMMON areas and a call to the system time clock would be shown this way.

In the early days of structured programming, we were told to avoid global data structures because they were dangerous. The data could be changed by any module without the knowledge and consent of any other module. Larry Constantine now reports that research shows that parameters should be used for one level of ordination, that the programmer has a choice at two levels, and that if data must be passed over three or more levels then global structures are probably easier to maintain.

A module which is all data, such as a table, can be shown as a rectangle with outward bulging sides. I like to use the old flowchart device symbols (paper output, disk files, etc.) if I know how the data is stored. Yes, I know this is getting into implementation in the design phase, but old habits die hard and those old symbols are easy to read.

Once you become familiar with the graphic symbols, drawing the diagrams is really pretty easy; you can get stock templates and graph paper from any office supply house. Most CASE packages also support most of the symbols discussed here.

Structure Chart Vocabulary

The scope of control of a module are all of the modules which it invoked beneath it. The scope of effect of a module are all of the modules to which it passes control decisions. Scope of effect is not always a subset of scope of control, but we would prefer that it were.

Flow that goes uphill is called efferent and flow that goes downhill is called afferent — confusing at best. Look up the difference between "affect" and "effect" in a grammar book. Transform flow sends a flow from a boss to a subordinate and back again; we can assume that the subordinate module changed it in some way. Coordinating flow takes a flow from one subordinate and passes through the boss to another subordinate.

Design Metrics

Coupling is a measure of how modules are stuck together. The good news is that everyone agrees that modules should be loosely coupled. The bad news is that no one agrees upon the scale to use for coupling.

Gane and Sarson use only three kinds of coupling; data, control and a catch-all external/content/pathological coupling. Data coupling directly passes data between the modules, and control coupling directly passes control. The final catch-all class uses external data structures or strange invocations. Data coupling was best, afferent control coupling next best, efferent control coupling a poor third and the reminding ones so bad as to never be used.

Glenford Myers uses a different scale in his work and has attempted to assign numeric values to different types of coupling. These numbers let him estimate the probability that a change in one module would require a change in other modules of the system. In order of preference, the Myers scale is:

Identity. Two modules are really the same module. This is handy for some calculations, and you cannot get a tighter coupling.

Content. One module contains the code for the other.

Common. The modules share a common data area, which other modules can also access. Example: FORTRAN COMMON.

External. The modules share an external data item, which only they can access. Example: C externals.

Control. The first module directly controls execution of the second. Look for a control flag being passed.

Stamp. First module passes a data structure to the second module. Example: a user typed parameter in Pascal.

Data. One module passes a data item to the second. A simple scalar parameter in almost any programming language.

Yourdon and Constantine suggest this taxonomy:

Common environment coupling — all modules share a global data structure.

Content coupling, which comes in two flavors:

Lexical inclusion — the code of one module is completely inside the other.
Partial content overlap — both modules share an area of common code.

Data coupling — also called input-output coupling, equivalent to Gane and Sarson's data coupling.

Control coupling — the same as Gane and Sarson's control coupling.

Hybrid coupling — one module modifies the statements of the other.

A second important design metric attempts to measure how well a module is "put together". Glenford Myers called this quality module strength and Larry Constantine calls it cohesion.

In the early days of structured programming we tried to keep module size small. The magic number was either 50 or 100 lines of code, corresponding to a single or double page printout. The idea was that small modules were easier to maintain and could be reused more easily.

However, when researchers finally got around to looking at what modules were reused or modified from one system to another, they found that module size has little to do with maintainability and almost nothing to do with reusability. The cohesion or strength of the module is far more important.

The Yourdon and Constantine scale of cohesion, from worst to best:

Coincidental — Module does several unrelated jobs.

Logical — Module does several logically related jobs.

Temporal — Module does jobs executed at the same time.

Procedural — Module does jobs are part of the same procedure.

Communicational — Module does jobs which share data.

Functional — Module does one and only one clearly defined job, like a mathematical function.

Glenford Myers' scale of strength is almost identical, except that he assigned numeric values to the scale for use in the estimating algorithm mentioned earlier. His scale calls temporal strength "Classical", and places Informational strength between Communicational and Functional strength. A module with Informational strength has many entry points to functions which operate on a common data structure within the module — an Ada package, for example.

Design Overview

The overall design of a system can be broadly classified as either transform or transaction centered. In a transform centered design, there is a subsystem which concerns itself with editing input, a subsystem which concerns itself with formatting output, and a subsystem which transforms the input into the output. Look for a transform flow in the structure chart.

In a transaction centered design, there is a subsystem which accepts a range of inputs, decides what to do with each type and dispatches it to the proper sub-subsystem. Look for a diamond at a high level in the structure chart.

As a sample problem, consider the puzzle contest in the 1989 February-March issue of GAMES magazine. The puzzle came in two parts. First, you were to construct a four by four crossword puzzle grid with six black cells. The grid must be symmetric about the center point and must have at least one black cell in every row and column.

Then you were to use the digits zero to nine and fill in the grid. The numbers formed in the rows and columns were totaled right to left, left to right, top to bottom, and bottom to top. Finally, the sum of the two horizontal numbers was divided by the sum of the vertical numbers. The score closest to ten wins the contest. The sample grid in Figure 2 shows the calculations for one possible arrangement.

The reasoning that produced my first attempt (see Figure 3) was that we have a transform centered design, which will build empty grids, fill them with digits, score them and save the results. So we need a module to build empty grids, one to test it, one to fill it, one to score it and one to save it. Since this is my first sketch and will probably be significantly revised, I have not bothered to show any flows or draw any lower levels. This is just to get something on paper. Note that filling the grid once it is built will take a lot of work — there are 10! possible ways to fill each possible grid.

The immediate problem with this first design is that the main module is too busy; it needs to delegate authority to subordinates. My second attempt (see Figure 4) has only three subordinates, (one to get an usable empty grid, one to fill it, and one to save it). In this revision, I also begin to worry about some details, like tests for a valid grid.

The second attempt shows some important design principles. Notice that iterations have been moved to lower levels, rather than nesting them in the main module. The "Get Grid" module uses a coordinate flow to pass a candidate grid to a testing module before it passes it to a super-ordinate module. You will often find it handy to build ladders of such modules to convert raw input into usable data in a step by step fashion.

The final attempt (see Figure 4) is based on some observations about the problem itself. The row and column test are really the same except for a rotation; they can probably use a common subroutine for counting spaces. If one of these tests fails then we do not have to bother with the other tests, so I show a diamond on the structure chart. The symmetry test can be removed completely if we change the grid generating module so that it always blacks out the cells opposite the center.

Inserting the digits will involve generating permutations of the ten digits, so we ought to get a library routine for that. Once a grid is filed, we can rotate it about the major and minor diagonals to give us new grids which do not have to be re-calculated from scratch.

We do not want to save all the grids, just the best one found so far, so we need to make another test.

With these refinements, we get the chart in Figure 4. At this point, the code can be written as soon as we design the interfaces between the modules. The interface design will answer questions such as, "Should we use just one work space or pass grids as a user defined data structure between modules?" The answer will depend on the language used and just how fast we want to make the program run. With these issues resolved, the programmers can be given easy-to-understand descriptions of the modules.

A Best Solution

GAMES magazine ran the best solution to the puzzle in their 1989 June/July issue.

   *837
   25**
   **40
   619*
left to right total = 1,521 down total = 153
right to left total = 1,710 up total = 171
Horizontal total = 3,231 Vertical total =324
Score = 9 and 35/36 or 9.972222...

How well did your program do?

Dr. Alan Davis of Vienna, VA, found some methods to stop the scoring when the answer will clearly be outside of consideration, further improving performance. He also found that one particular grid produced the best results and was able to eliminate others. His program took a little over seven hours to run on a PC.

References:

Constantine, Larry. "Beyond Methodologies," presented at "Software Development '89" Miller-Freeman Seminars, San Francisco, Feb 1989.

DeMarco, Tom. Structured Analysis And Systems Specification, Yourdon Press (ISBN 0-917072-07-3) 1978.

Jane, Chris and Sarson, Trish. Structured Systems Analysis, Prentice-Hall (ISBN 0-13-8544547-2), 1979.

Higgins, Gail. "Structure Charts", Computer Languages, Vol. 4, No. 1, Jan 1987.

Myers, G. J. Reliable Software Through Composite Design, Van Nostrand-Reinhold, 1971

Yourdon, Ed and Constantine, Larry. Structured Design, Prentice-Hall (ISBN 0-13-854471-9), 1979.

Gordon, Peter, "Imperfect Ten," Games Magazine, February- March, 1989, page 10.

Figure 5

1 2 3 4 5 6 Next

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.