Letters



October 01, 1989
URL: http://www.drdobbs.com/architecture-and-design/letters/184408228


When Is a Standard Really a Standard?

Dear DDJ,

We write C using the medium model, which provides lots of room for code and 64K bytes for data. A lot of the data space is used up by ASCII strings, messages, keyword lists, and so on.

Most of these strings are used only in one procedure, so it would be nice if the space they take up were not in the precious data space, where they are as bad as globals.

Happily, the "draft" ANSI standard now specifies (and has for more than two years) that arrays of storage class 'auto' can be initialized. So, in standard C code, you can say

     void print_error_message(int error_number){
         byte message1[] = "This is a long error message";
         byte message2[] = "Here is another message";
             /*...*/
         byte message999[] = "And so on";
         ...

Any reasonable implementation of a compiler would initialize the messages by doing a move out of the code space into variables with space allocated on the stack, and permanent data space would not be squandered on all the messages.
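To make the mechanism concrete, here is a rough source-level sketch of that transformation, using plain char and illustrative names that are not in the letter: the compiler in effect keeps one hidden, read-only copy of the initializer, which it is free to place outside the precious default data segment, and copies it into the automatic array on each call.

    #include <string.h>

    /* Illustrative only: what an initialized auto array amounts to. */
    void print_error_message_expanded(int error_number)
    {
        /* hidden copy of the initializer; a reasonable implementation
           could place this copy outside the default data segment      */
        static const char message1_image[] = "This is a long error message";
        char message1[sizeof message1_image];

        memcpy(message1, message1_image, sizeof message1_image);
        (void) error_number;        /* parameter unused in this sketch */
        /* ... use message1 as before ... */
    }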

Unfortunately, of the 12 C compilers I have tried out, only one, the Corporate Computer Systems compiler for the Hewlett Packard HP-3000, supports this feature.

I have talked to Microsoft, and someone there finally admitted that yes, it is in the current draft ANSI standard, but that because the standard has not been finalized they are not necessarily implementing every aspect of the language.

Now I see that Microsoft is claiming that their compiler is compatible with the draft standard.

Joe Weisman

Computers for Marketing Corp.

San Francisco, Calif.

Infant Damnation Isn't Our Cup of Tea

Dear DDJ,

The more I see your publication, the more optimistic I become that I may eventually learn something about programming. While they have me scratching my head in bewilderment at times, the articles you publish avoid both the superficiality of some publications and the rancidly pedantic turgidity of others that too often approach the pudding-like quality of a 19th century Welsh theologian expounding on predestination or infant damnation. When I don't understand what you publish, I feel that I simply haven't yet established a sufficient groundwork in my own mind.

In the special C issue, I found the discussion of Smalltalk interesting; I learned that the graphics format of PC Paintbrush is in the mainstream, which is important to me because I'm about to buy a digitizer that comes with it; and "Building Your Own C Interpreter" will be closely studied. I'm looking forward to the upcoming treatment of using assembly language to create one's own minilanguage. My areas of interest do not require a large number of functions or procedures, but the ones they do require are not provided adequately in the languages I have examined.

Billy R. Pogue

Lake Havasu City, Ariz.

GUI Programming Guidelines

Dear DDJ,

Yesterday I got my copy of the July issue, and the first article I read was the one on the Presentation Manager. Since I do some programming under Digital Research's GEM, I am very interested in how things are done under various graphical user interfaces. But what I have trouble with is the misuse of the right mouse button. In MFIT, it is used to terminate input and to start an action. In some drawing programs, it is used as a replacement for the shift key. And other systems, like the Smalltalk environment, use it as the button that pops up a menu.

This is not easy to understand for the average user of these applications. Just think what it would be like if every automobile firm placed the brake on a different pedal, or even on a switch behind the steering wheel! To avoid such misuses I would strongly recommend that every developer of applications that run under graphical user interfaces read the Human Interface Guidelines published by Apple. This book gives a philosophy that puts the user in the middle of the development process of the external representation of an application. The only one who can win from such a view is the end user.

The second comment I have to make is on the programming of graphical interfaces in general. I do not understand why a relatively small program like MFIT requires about ten files to compile. The actual code for processing the points and drawing the workspace takes only about half a page; the rest just implements the STANDARD behaviour of dialog boxes, menus, and other objects on the screen. This overhead makes the development of user-friendly applications so complicated, and it is the cause of the slow learning curve and the long development time required for even small programs like MFIT.

It is this overhead that is responsible for the limited acceptance of graphical user interfaces among software developers. I know this because it took me over two months (about 100 hours; you see, I am a hobbyist) to create an invoice printing program with an easy-to-use graphical interface under GEM. The actual code is only about 2K bytes, but the overall size of the source file is about 70K bytes. After this experience, two friends of mine and I decided to develop our own language and development environment. But soon we noticed that this is too big a project for three hobby programmers. Now we have a lot of ideas for a common interface in which applications could be used on Unix, DOS, or the Mac without changing the compiled program. In such an environment the user could run the same application under several windowing systems, and the system would take care of the differences in user interface and operating system. So if there are other people interested in our thoughts, just drop a line to DDJ, who will surely pass your letter on to us.

Andreas Berger

Neu-Isenburg, W Germany

It's All in the Numbers

Dear DDJ,

Your magazine continues to provide enjoyable articles and accompanying code. A comment on Al Stevens's August 1989 column, in which he restates Robert Benchley's line that "the world is divided into two kinds of people -- those who divide the world into two kinds of people and those who do not." As a compuphobia counselor for ten years, I have learned that the world is divided into three kinds of people -- those who count and those who can't.

mickRacky

Oakland, Calif.

DDJ: 12 out of a dozen times, you're right.

AWK-Like Extensions Revisited

Dear DDJ,

I much appreciated Jim Mischel's article "Writing AWK-Like Extensions to C" in the June 1989 issue of Dr. Dobb's Journal. I have used the pattern matching routines to construct an alternative grep that shows matching lines within a window of lines before and after the matching one. I had a shell version of this tool from the March 1989 issue of UNIXWorld, but my C version obviously goes much quicker.

However, I have found one problem: the functions will not find a match for the empty-line pattern ^$. This occurs because the gets( ) library function returns just an empty string, containing only the \0 character. In this circumstance the re_match( ) function makes no attempt to find a match, because the body of the while loop is never entered, and it thus returns NULL. I have cured this by changing the loop to:

    _s_end = NULL;
    do
    {
        if (match_term(c - s, c, pat) != FALSE)
        {
            RSTART = c - s;
            RLENGTH = _s_end - c;
            return(c);
        }
        if (*c)
        {
            c++;
        }
    } while (*c != ENDSTR);

which always does at least one pass through the line to be matched.
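Seen in isolation, the difference is simply that a while loop never enters its body for an empty string, whereas the do-while form always makes at least one pass. This small standalone fragment (illustrative only, not part of the published routines) shows the two behaviours side by side:

    #include <stdio.h>

    /* How many times does each loop visit its body for a given string? */
    static int visits_while(const char *c)
    {
        int n = 0;
        while (*c != '\0') { n++; c++; }
        return n;                     /* 0 for the empty string */
    }

    static int visits_do_while(const char *c)
    {
        int n = 0;
        do {
            n++;                      /* body runs even for "" */
            if (*c)
                c++;
        } while (*c != '\0');
        return n;                     /* 1 for the empty string */
    }

    int main(void)
    {
        printf("while: %d  do-while: %d\n",
               visits_while(""), visits_do_while(""));
        return 0;
    }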

John M. Howells

West Lancs, England

Jim responds: Thank you for your letter. You did indeed find a problem with the code and I appreciate your response. I did some testing and found that you can eliminate the awkward "if (*c)" in your code by modifying the while statement: while (*c++ != ENDSTR);

Forth, an Object of Affection

Dear DDJ,

I enjoyed Jeff Duntemann's "Structured Programming" column in the July issue of Dr. Dobb's.

A lot of what he said about object-oriented programming sounded like good factoring in Forth. With good factoring, each word is reusable and you can hide the details of the lower-level words.

Perhaps in a future article you could address OOP in the Forth environment, such as the Forth object-oriented programming extension to HS/Forth.

Ramer W. Streed

Kato Engineering

Mankato, Minn.

Pascal Hints

Dear DDJ,

I have greatly enjoyed reading Jeff Duntemann's articles in DDJ; as my primary language is Pascal, his column is inherently the easiest for me to understand. I would like to make a few points about his June 1989 column:

1. When assembly routines are short -- as in the case of BIOS hooks -- it is better to code them in assembly, as Turbo adds much housekeeping code which swamps the functional code. These calls are by definition machine dependent, so there is no reason not to optimize them.

2. Jeff's calendar program goes to too much effort to do its work! As you are restricting yourself to dates after 1980, we don't need to know about Julian and Gregorian calendars; all we need to know is on which day of the week a given date falls. The function that serves this purpose is called Zeller's congruence, which I learned about from Computer Language, March 1988, pp. 9-10 (see the sketch below).

I hope that you find this information useful.
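Here is a minimal C sketch of Zeller's congruence for the Gregorian calendar; the function name and the 0 = Saturday numbering are choices made for this illustration rather than anything taken from the Computer Language article:

    #include <stdio.h>

    /* Zeller's congruence for the Gregorian calendar.
       Returns 0 = Saturday, 1 = Sunday, ..., 6 = Friday. */
    int day_of_week(int year, int month, int day)
    {
        int k, j;

        if (month < 3) {     /* January and February count as months 13   */
            month += 12;     /* and 14 of the preceding year               */
            year--;
        }
        k = year % 100;      /* year within the century */
        j = year / 100;      /* the century itself      */
        return (day + (13 * (month + 1)) / 5 + k + k / 4 + j / 4 + 5 * j) % 7;
    }

    int main(void)
    {
        /* January 1, 1980 falls on day 3, i.e. a Tuesday */
        printf("%d\n", day_of_week(1980, 1, 1));
        return 0;
    }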

Norman Newman

Israel

P.S. What is KI6RA?

DDJ: KI6RA is Jeff's ham radio call sign.

More on RLE

Dear DDJ,

I refer to your two articles, "Run-Length Encoding," by Robert Zigon, February 1989, and "RLE Revisited," by Phil Daley, May 1989. While Mr. Daley's compression technique certainly solves the problems associated with inefficient coding of streams of data with frequent character changes, it causes another problem potentially much worse in terms of data integrity.

Consider the situation where the algorithms are used for the compression of data where there is potential to lose a byte (or where a byte may be corrupted due to transmission or storage media).

With the former technique, the loss of a byte can in the worst case cost the whole of the rest of the file, if the corrupted byte is one which marks the stream length. Since, by definition, compression algorithms are used to save storage or transmission time on large files, the losses may be considerable.

I would be interested to read a description of an algorithm which achieves the 'best of both worlds' by combining Daley's better compression technique with properly framed sequences that minimize the effects of inaccurate reproduction.
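As a rough illustration of what such framing might look like, here is a sketch in C; the 512-byte block size, the count/value pair encoding, and the two-byte length header are all assumptions for the example, not anything taken from the published articles. Each block is compressed independently and prefixed with its compressed length, so a corrupted byte can spoil at most one block instead of the rest of the file:

    #include <stdio.h>

    #define BLOCK_SIZE 512

    /* RLE-encode one block as (count, value) pairs; returns output length. */
    static size_t rle_block(const unsigned char *in, size_t len,
                            unsigned char *out)
    {
        size_t i = 0, o = 0;
        while (i < len) {
            unsigned char run = 1;
            while (i + run < len && run < 255 && in[i + run] == in[i])
                run++;
            out[o++] = run;          /* count byte */
            out[o++] = in[i];        /* value byte */
            i += run;
        }
        return o;
    }

    int main(void)
    {
        unsigned char in[BLOCK_SIZE], out[2 * BLOCK_SIZE];
        size_t n;

        while ((n = fread(in, 1, sizeof in, stdin)) > 0) {
            size_t clen = rle_block(in, n, out);
            /* two-byte length header lets a reader resynchronize at the
               next block even if this block is damaged                   */
            putchar((int)(clen & 0xff));
            putchar((int)(clen >> 8));
            fwrite(out, 1, clen, stdout);
        }
        return 0;
    }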

Luke E. Murphy

French's Forest, Australia

Notes from Down Punder

Dear DDJ,

I am still shocked about your July 1989 editorial concerning escort agency services being logged as "software" by credit card users.

I feel that I must chastise you, as editor, for not warning your readers concerning the riscs associated with using recursive, or ill-behaved, procedures in such an obvious multiuser environment.

I see the lack of such warnings as likely to spread bugs and viruses that may severely limit some users' forthcoming endeavours. Such bugs and viruses may disable their stack probes, or make them Unix.

P. Butterworth

New South Wales, Australia

Brute Force vs. Boyer-Moore

Dear DDJ,

Thanks for Costas Menico's article on faster string searches. The article was well written, and the algorithm he presented is interesting and useful. However, he seems to present this approach as being superior to the brute force method in the general case, which it clearly isn't. The weakness in the algorithm is the need to construct a 256-byte skip array at every call. This is equivalent in CPU cycles to doing a brute force scan on a string of about 80 bytes, and it occurs before the actual string search even begins.

Also, it should be pointed out that the timing benchmark program comparing his POSBM to the brute force method tended to skew the results. First of all, Turbo Pascal's built-in POS function is an extremely poor example of the brute force method. I ran Mr. Menico's benchmark program to compare execution times among POS, POSBM, and my own brute-force FIRSTPOS function, which is included in my STRINGS.TPU package (available on the Borland Forum on CompuServe). Not to toot my own horn, but on my IBM XT, FIRSTPOS was about 10 times faster than POS and twice as fast as POSBM! So by this benchmark, POSBM seems to be inferior to a well-implemented brute force approach.

Another skewing factor in the benchmark is the use of a 255-byte string, from which the last five characters are selected as the search string. This maximizes the distance over which the algorithm can skip. I ran the benchmark using an 80-character string (probably closer to a real-world application) with POS, POSBM, and FIRSTPOS, and the ratio of execution times was about 4.5:2.5:1, respectively.

So the question is, under what conditions is POSBM superior to the brute force method? A long pattern string would work in its favor by maximizing the average distance per skip. A long string to search in would also tend to magnify the effects of skipping, but here we're limited by the 255-byte length of Turbo Pascal strings. Eliminating the overhead of constructing the skip array at every call would go a long way in speeding things up, so it seems an ideal situation for using the Boyer-Moore method would be in searching a series of strings for the same pattern, so that the skip array need be constructed only once. Unfortunately, the way POSBM is written, there is no way to save the skip array between calls, making it impossible to capitalize on this situation.
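For readers who want to experiment, here is a rough C sketch of the Horspool simplification of Boyer-Moore (it is not Mr. Menico's POSBM, and it is not Turbo Pascal); the skip table is built once and passed in, so a series of searches for the same pattern pays the set-up cost only once:

    #include <stddef.h>

    #define ALPHABET 256

    /* Build the bad-character skip table for a pattern; do this once. */
    void build_skip(const unsigned char *pat, size_t patlen,
                    size_t skip[ALPHABET])
    {
        size_t i;
        for (i = 0; i < ALPHABET; i++)
            skip[i] = patlen;               /* default: skip a whole pattern */
        for (i = 0; i + 1 < patlen; i++)
            skip[pat[i]] = patlen - 1 - i;  /* distance from the last position */
    }

    /* Horspool search using a prebuilt skip table; returns the offset of
       the first match, or -1 if there is none. */
    long bm_search(const unsigned char *text, size_t textlen,
                   const unsigned char *pat, size_t patlen,
                   const size_t skip[ALPHABET])
    {
        size_t pos = 0;

        if (patlen == 0 || patlen > textlen)
            return -1;
        while (pos <= textlen - patlen) {
            size_t i = patlen;
            while (i > 0 && text[pos + i - 1] == pat[i - 1])
                i--;                        /* compare pattern right to left */
            if (i == 0)
                return (long)pos;           /* match found at pos */
            pos += skip[text[pos + patlen - 1]];
        }
        return -1;
    }

The point of passing the table in, of course, is that a caller scanning many strings for the same pattern calls build_skip once and bm_search many times.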

Rich Winkel

Harrisburg, Missouri

C Multidimensional Arrays

Dear DDJ:

Your annual C Issue was quite enjoyable. I should like to comment on two articles in particular.

The first, "C Multidimensional Arrays at Run Time" by Paul Anderson, is both instructive (with respect to the use of pointers) and useful in application. It should be noted that the techniques developed for two- and three-dimensional arrays are easily extended to use the Far Heap as well as the (near) Heap. One has merely to use farcalloc( ) and farmalloc( ) with a cast to long for their numerical arguments and of course farfree( ) for releasing the memory blocks.

In Listing Three, Mr. Anderson has assumed that the determinants to be evaluated have positive definite (non-zero) elements in the application of the algorithm (which is misnamed). As written, a non-vanishing determinant with a zero in a diagonal element would produce a divide-by-zero error.

The method he has used to evaluate the determinant is known in the literature as "Upper Triangularization." The determinant (matrix) is converted into an upper triangular form (one in which all the elements below the diagonal are zero) and evaluated using the theorem that the determinant of an upper triangular matrix is the product of its diagonal elements. Note that any zero elements on the diagonal of the initial determinant are transformed in the upper triangularization process to non-zero elements if the determinant is non-vanishing.

This technique was originally developed (and primarily used) in the solution of systems of linear algebraic equations and the related problem of matrix inversion. In developing that solution the determinant of the coefficients of the equation is required. In many applications (particularly statistical) the equations are inherently positive definite and the solutions presented in the literature do not contain any tests for singularity. I presume that Mr. Anderson had referred to those in obtaining his algorithm.

I've included the correct code (see Listing One), translated from the pseudo code on page 143 of Numerical Methods for Computer Science, Engineering, and Mathematics, by John H. Mathews, Prentice Hall 1987. The algorithm given there is embedded in a slightly larger problem and I have extracted out only that part relevant to evaluating the determinant.

Jeff Duntemann has, in my opinion, written the most intelligent article by far on Smalltalk. He has really placed it in the proper perspective. It really makes you wonder why it hasn't been written before. This article alone is well worth the price of the issue many times over.

Reading it and noting my own reaction reminded me of an event I witnessed many years ago at a Physics Department Colloquium. Professor E.P. Wigner was giving a talk on some new work he had recently done, and when he had concluded there was the usual question and answer session. A well-known physicist (not quite as well known as Wigner) in the audience arose and said, "Well Professor Wigner, that was all very nice but it seems rather elementary." And Professor Wigner (after a momentary reflection) replied: "Yes -- when something has been explained to you and you do understand it, it is indeed rather elementary."

Given that framework, any other comments on my part would be superfluous.

Morton F. Kaplon

Easton, Maryland

_LETTERS TO THE EDITOR_

[LISTING ONE]



/***********************************************************************/
/* Determinant Calculator Using Near Heap and Upper Triangular Matrix  */

#include <stdio.h>
#include <stdlib.h>               /* malloc(), free(), exit() */
#include <math.h>                 /* fabs() */

double det(arg,n)                 /* arg = array name, n = order of matrix */
char *arg;
int  n;
{
  register int i,t,k,p;
  double   **a, ret, x;     /* Array, Return value of Det, Temp variable */
  char     **sdim2();       /* defined in Listing Three */
  int      *row;            /* row pointers for pivoting */
  int      sign = 1;        /* sign of the row permutation */

  /* dynamically create 2 dim "array" a from arg */
  a = (double **)sdim2(arg,n,n,sizeof(double));

  row = (int *)malloc(n*sizeof(int)); /* row pointers for pivoting allocation */
  if (row == (int *) NULL) {
    fprintf(stderr,"No heap space for Row Pointers for Pivoting\n");
    exit(1);
  }

  /* create the upper triangular matrix, testing for 0 values on the diagonal */

  /* first initialize the pointer vector */
  for (i = 0; i < n; i++)
    row[i] = i;

  for (p = 0; p < n - 1; p++) {
    /* find the pivot element for column p */
    for (k = p + 1; k < n; k++) {
      if (fabs(a[row[k]][p]) > fabs(a[row[p]][p])) {  /* switch indices */
        t = row[p];
        row[p] = row[k];
        row[k] = t;
        sign = -sign;       /* each row exchange flips the sign of the det */
      }
    }

    /* In the usual application this would be an error message that the
     * matrix is singular and the program would exit here.              */
    if (a[row[p]][p] == 0)  /* Determinant is 0 */
      break;                /* No need to continue */

    for (k = p + 1; k < n; k++) {
      x = a[row[k]][p] / a[row[p]][p];
      for (i = p + 1; i < n; i++)       /* eliminate entries below the pivot */
        a[row[k]][i] = a[row[k]][i] - x * a[row[p]][i];
    }
  }

/*   if ( a[row[n-1]][n-1] == 0 ) the determinant is 0.  In a normal
 *   application this would be a "matrix is singular" message and an exit. */

 /* the value of the determinant is the product of the diagonals of the
  * upper triangular matrix, times the sign of the row permutation       */
  for (ret = 1, i = 0; i < n; i++)
    ret *= a[row[i]][i];    /* if any of the diagonals is 0, Det = 0 */
  ret *= sign;

  free(row);
  free(a);

  return ret;
}

Copyright © 1989, Dr. Dobb's Journal
