Doorknob Arguments


May99: C Programming

Al is a DDJ contributing editor. He can be contacted at [email protected].


In January of this year Judy and I visited our daughter and her family in Virginia. Wendy lives there with her husband, Lester, and two sons, Landon and Woody. The occasion was Landon's seventh birthday. To celebrate the milestone, Landon was permitted to host a Friday night slumber party with several of his friends, mostly small boys his own age, and, of course, Woody, who is four. The boys romped in the downstairs recreation room while we adults visited upstairs. Wendy had allowed that just this once the boys could carry on all night until they dropped one by one from sheer exhaustion. A wise decision since no other outcome would have been remotely feasible. After Wendy and Lester and Judy went to bed, I stayed up for a while in the living room, reading a bit, but mostly listening to the boisterous sounds coming from downstairs. Every now and then, amidst the endless chatter and giggles, came a chorus of shouts in unison followed by gales of laughter. The shouts were always the same word, "Doorknob!" I wondered what it meant.

After a while, the celebration died down. I slunk down the stairs, peeked in on them, and found the boys all asleep on the floor scattered around the room among piles of pillows and blankets and toys. Still not knowing the significance of "doorknob," I called it a night and turned in.

The next morning found parents dropping in for coffee and to pick up their young charges. I quietly asked each of the parents about "doorknob," but nobody had a clue. When all the small guests had departed, I waited for Landon to come up for breakfast. He was late, sleeping in after his night of celebration and merriment. At last we were alone at the kitchen's breakfast bar eating our cereal, and I asked him what it means when someone says, "doorknob."

"It means you didn't do something bad," he answered cautiously.

"Something bad like what?" I asked. Landon and I have no secrets from one another and can speak openly at all times.

"It means that you didn't..." Landon leaned closer, looked around to make sure that no one else was listening, and in a conspiratorial whisper said the word, the noun and verb, that describes a particular human action that is generally not acceptable behavior in polite society but that small boys, and big ones too, find to be hilarious for some reason. Having said the forbidden word, he stifled a giggle and continued, "Whenever a guy leaves one, everybody else says 'doorknob' so that people will know who didn't leave it."

I struggled to maintain a modicum of composure. It wasn't easy. "How did you learn that?" I managed to ask.

"Cody told us," he said, referring to his older cousin. "He's 10 and knows about stuff like that. When somebody leaves one, the one who doesn't say 'doorknob' is the one who left it, and that's how you know who to blame."

I excused myself and hastily left the room.

A couple of weeks have passed, and I have had time to reflect on the significance of "doorknob." How simple and intuitive. Leave it to a bunch of little boys to coin a language element that solves a problem as old as mankind itself. When something bad is done, whoever did not do that something bad says "doorknob," and everyone knows. Now it's up to us adults to extend the solution to other appropriate idioms. I'll have to think about this for a while.

What's in an argv?

The April 1999 issue of DDJ included an article by Brian Kernighan and Rob Pike about parsing regular expressions. To demonstrate the technique, the authors included an example grep program. While editing the article and its code for technical content, I found code similar to that shown in Example 1. Accompanying the text were examples of command lines that invoke the program like this:

grep sometext *.txt

My initial reaction was that the code does not work. I was only partially right. How right I was depends on how many readers develop under MS-DOS and how many develop under UNIX. That ratio directly correlates to the degree to which I was right. Okay, so maybe I was mostly wrong, but you don't expect me to admit it without a struggle, do you?

The problem, as I saw it, was that the line of code that calls the Standard C fopen function passes a pointer to a string with the value "*.txt", which is not a file specification that fopen recognizes. MS-DOS programmers will immediately understand this problem. UNIX programmers will not see any problem at all.
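As a rough sketch of the situation (this is not the code from Example 1, and open_each is a name invented here for illustration), consider a helper that hands each command-line argument straight to fopen. If a pattern arrives unexpanded, fopen looks for a file literally named *.txt and will almost certainly fail:

```c
#include <stdio.h>

/* Hypothetical helper, not the article's Example 1: try to open each
   name exactly as it was received and count how many opened. */
int open_each(int count, char *names[], FILE *log)
{
    int i;
    int opened = 0;

    for (i = 0; i < count; i++) {
        FILE *fp = fopen(names[i], "r");
        if (fp == NULL) {
            /* An unexpanded "*.txt" typically ends up here. */
            fprintf(log, "can't open %s\n", names[i]);
            continue;
        }
        opened++;
        fclose(fp);
    }
    return opened;
}
```

A main function would call it as open_each(argc - 2, argv + 2, stderr) for a grep-style command line, skipping the program name and the search text.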

I called Brian Kernighan and asked him about it. After all, the "K" in K&R ought to know how to write C code. I was sure I'd found an oversight and that he would resoundingly thank me. Brian did recognize the problem right off, which was not that the code was wrong but that he was talking to an MS-DOS programmer. He patiently explained that the UNIX command processor shell expands ambiguous file specifications into a list of filenames that the shell passes to the program in the argv array. The MS-DOS COMMAND.COM command processor makes no such expansion and passes to the program whatever the user enters on the command line.
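What the shell does before the program ever starts can be approximated from inside a C program with the POSIX glob function. The sketch below is an illustration only: expand_like_shell is a hypothetical name, and real shells do a good deal more (quoting, variables, and so on). The GLOB_NOCHECK flag imitates the Bourne shell habit of passing a pattern through unchanged when nothing matches it.

```c
#include <glob.h>
#include <stdio.h>

/* Expand one pattern roughly the way a Bourne-style shell would,
   printing each resulting name to out.
   Returns the number of names, or -1 if glob reports an error. */
int expand_like_shell(const char *pattern, FILE *out)
{
    glob_t g;
    size_t i;
    int count;

    /* GLOB_NOCHECK: if nothing matches, hand back the pattern itself. */
    if (glob(pattern, GLOB_NOCHECK, NULL, &g) != 0)
        return -1;

    for (i = 0; i < g.gl_pathc; i++)
        fprintf(out, "%s\n", g.gl_pathv[i]);

    count = (int)g.gl_pathc;
    globfree(&g);
    return count;
}
```

Calling expand_like_shell("*.txt", stdout) in a directory with three text files would print three names; the program that the shell eventually launches sees only those names in argv, never the asterisk.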

Just to be sure, I compiled a program like the one in Example 1 at the UNIX site where I develop CGI programs. I put a printf into the program to display each of the argv arguments on the console. Sure enough, when I ran the program with "*.c" as the command-line argument, the program displayed all the C source-code filenames in the current directory. Not that I didn't believe what Brian Kernighan said, you understand. Just had to see it for myself.
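A test along those lines might look like the following sketch. It is a reconstruction, not the program described above; the function name and the wildcard count it returns are additions for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Print every argument, one per line, and return how many of them
   still contain a wildcard character ('*' or '?'). */
int print_args(int argc, char *argv[], FILE *out)
{
    int i;
    int unexpanded = 0;

    for (i = 0; i < argc; i++) {
        fprintf(out, "argv[%d] = %s\n", i, argv[i]);
        if (strpbrk(argv[i], "*?") != NULL)
            unexpanded++;
    }
    return unexpanded;
}
```

A main function would simply call print_args(argc, argv, stdout). Launched from a UNIX shell with *.c in a directory of C files, the listing shows one filename per argument. Launched from COMMAND.COM, it shows the single argument *.c and the return value is 1.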

This dialog between two internationally famous C gurus (me, your humble yet revered "C Programming" columnist, and Brian, a genuine C authority who actually deserves recognition) raised two questions. First, if I'm such a hot shot guru, why didn't I know what UNIX shells do with ambiguous file specifications? Second, why doesn't COMMAND.COM, now the world's most widely used command processor, expand them like the Bourne shell and others do?

Let's start with my excuses. My C programming began years ago with Leor Zolman's BDS C compiler for CP/M and continued with the Aztec C compiler on that platform, which implemented classic K&R C. Later I used most of the C compilers, K&R and Standard C, that were implemented for the PC MS-DOS platform, and, until GUI processing became the preferred way to write software for the PC, command-line processing was a major part of that experience. If the user was to enter ambiguous file specifications with wild cards on the command line, you had to include a function to parse them into lists of unambiguous file specifications. That requirement was a given, and programmers wrote and published general-purpose command-line option parsers and expanders. I wrote one, too, and reused it many times.
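On MS-DOS the matching itself was usually left to the operating system through nonstandard directory-search calls, which cannot be shown portably. To illustrate what that matching amounts to, here is a portable sketch of a wildcard matcher. It is not the expander mentioned above; wild_match is a hypothetical name, and the rules are deliberately simplified (case-sensitive, no special treatment of the dot or of 8.3 names).

```c
/* Simplified wildcard match: '*' matches any run of characters
   (including none) and '?' matches exactly one character.
   Returns 1 if name matches pat, 0 otherwise. */
int wild_match(const char *pat, const char *name)
{
    if (*pat == '\0')
        return *name == '\0';

    if (*pat == '*') {
        /* Let '*' absorb 0, 1, 2, ... characters of name in turn. */
        do {
            if (wild_match(pat + 1, name))
                return 1;
        } while (*name++ != '\0');
        return 0;
    }

    if (*name != '\0' && (*pat == '?' || *pat == *name))
        return wild_match(pat + 1, name + 1);

    return 0;
}
```

An expander built on this would walk the directory, call wild_match on each entry, and append the matches to the argument list in place of the original pattern.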

I thought I understood the command line inside and out. But during all this time, I wrote an occasional UNIX program, too. How come I never knew about file-specification expansion by the shell? The answer is I don't really know why except that I taught myself everything I know about programming on both platforms, and somewhere along the way the teacher let the student down. To offer a lame excuse, I will explain that none of my UNIX programs (that I remember) used the command line for file specifications. Those programs were mostly to support database engines and, more recently, CGI applications. If I had needed filename expansion in one of those programs, I probably would have included the expander function I mentioned earlier. It wouldn't have harmed the program, but it wouldn't have done any good either. The expander function would simply never have seen a command-line option with a wild card character to expand.

This new, yet old, piece of knowledge answers one of my questions about the <stdio.h> part of the C Standard. Why aren't there standard functions such as the typical findfirst and findnext that most PC compilers define in a <dir.h> header file? Now I know.

Now, let's try to answer why COMMAND.COM does not similarly disambiguate file specifications on the command line. Brian observed that this problem was solved and the solution defined by the UNIX developers about 30 years ago. It does not seem reasonable that the framers of MS-DOS would not have taken their example from those who pioneered the technology and paved the way for the operating systems to come.

First some background. DOS, a 16-bit operating system designed to run on 8086/8088 microcomputers, was originally a close clone of the CP/M operating system that ran on 8080 and Z80 microcomputers. Its development was made necessary because CP/M-86 was late being released and its author needed something right away for an 8086 platform his company was building. Later, Microsoft acquired DOS, renamed it MS-DOS, and persuaded IBM to use it on its newly introduced PC in 1981. After a few upgrades, MS-DOS looked much like it does today. All of which is ancient folklore for the historians to muddle over.

The MS-DOS operating environment for running programs was constrained by the memory limitations of the PC platform and the fact that MS-DOS is a single-tasking operating system with no task swapping. (The inherent paucity of the operating system when paired with the requirements of contemporary applications later gave rise to such kludges as DOS extenders, terminate-and-stay-resident programs, extended memory, expanded memory, high memory, and so on.) As a result of such operating system limitations, the COMMAND.COM command processor is divided into resident and transient parts. The resident part contains only the code that the operating system needs to break into the running program, to enable the running program to terminate, and to reload the transient part when the running program terminates. The text that the user types on the command line is initially stored in the transient memory and then copied to the running program's Program Segment Prefix (PSP), which, among other things, contains a 128-byte so-called "command tail" data space to contain the command-line data. (This approach allows a running program to use one of the nonstandard exec -- or spawn -- functions or the standard system function to launch subprocesses with different command lines. But that's hindsight; it probably wasn't a design objective.)
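To make the command tail concrete, here is a small simulation in portable C. It assumes the commonly documented layout, which is not spelled out above: the first byte of the 128-byte area holds a character count, the command-line text follows it, and a carriage return ends the text. The names TAIL_SIZE and read_command_tail are invented for this sketch, and the buffer is an ordinary array rather than a real PSP.

```c
#include <stddef.h>

#define TAIL_SIZE 128   /* size of the command-tail area (assumed layout) */

/* Copy the text of a simulated command tail into dst as a C string.
   tail[0] is taken to be the character count; the text starts at tail[1].
   Returns the number of characters copied. */
size_t read_command_tail(const unsigned char tail[TAIL_SIZE],
                         char *dst, size_t dstsize)
{
    size_t len = tail[0];
    size_t i;

    if (dstsize == 0)
        return 0;
    if (len > TAIL_SIZE - 1)      /* never read past the 128-byte area */
        len = TAIL_SIZE - 1;
    if (len > dstsize - 1)        /* leave room for the terminator */
        len = dstsize - 1;

    for (i = 0; i < len; i++)
        dst[i] = (char)tail[1 + i];
    dst[len] = '\0';
    return len;
}
```

Whatever the exact details on a given DOS version, the point stands: everything the user typed after the program name has to fit in that one small, fixed-size area.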

Obviously the command tail's 128-byte limit is not big enough to support unlimited filename expansion. If the developers of MS-DOS were trying to emulate the UNIX operating environment, they might have chosen a different way to implement the command-line expansion to enable variable length filename lists. They might have allocated memory for the arguments from the system heap, for example. But they did not. Why not?

We sometimes forget that MS-DOS was not developed to be an operating system to support a UNIX style of programming in the C language. When I saw my very first IBM PC in a computer store in 1981, there was no compiled language available -- C or otherwise. You could code in interpreted Basic or in ASM. That was it. (There was talk of an alternative Pascal operating environment, but that concept never saw significant light of day.) The developers of MS-DOS were trying to get something working in the limited space provided (the first PCs had 64 K of RAM) with the odd memory architecture that IBM had contrived for their PC. It would be up to the C compiler builders to take the platform and make something meaningful out of it.

This is where, I think, we had a division of responsibility with a big gap at the division. The first C compilers on the PC were ports of UNIX compilers. Those folks were accustomed to having the shell take care of command lines. COMMAND.COM and the 128-byte command tail in the PSP did not -- and could not -- support what the UNIX shells could do. The C compiler builders had to take over responsibility for the command line, which was handed to them as a single string of characters, just as the user types it in. A C program expects argv to point to an array of pointers to null-terminated strings, with each string being one entry on the command line. COMMAND.COM doesn't do that, so the compiler builders had to implement it in the run-time startup code. And they chose not to expand file specifications. And what we do not have now is what they chose not to provide.
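The job that fell to the startup code can be sketched in a few lines. This is a minimal illustration, not any compiler vendor's actual startup routine: split_args is a hypothetical name, it splits on whitespace only, and it ignores quoting and the program name that real startup code places in argv[0].

```c
#include <ctype.h>
#include <stddef.h>

/* Split a command-line string in place into whitespace-separated
   arguments. Stores at most max_args - 1 pointers in argv, followed by
   a NULL pointer, and returns the argument count. Requires max_args >= 1.
   Note that wildcards are passed through untouched. */
int split_args(char *line, char *argv[], int max_args)
{
    int argc = 0;

    while (argc < max_args - 1) {
        /* Skip the blanks between arguments. */
        while (isspace((unsigned char)*line))
            line++;
        if (*line == '\0')
            break;

        argv[argc++] = line;

        /* Find the end of this argument and terminate it. */
        while (*line != '\0' && !isspace((unsigned char)*line))
            line++;
        if (*line != '\0')
            *line++ = '\0';
    }
    argv[argc] = NULL;
    return argc;
}
```

Given the text "sometext *.txt", the routine yields two arguments, and the second is still the literal *.txt. Expanding it would have required one more step at exactly this point, a step the startup code did not take.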

But, you ask, what about now? Aren't the MS-DOS component of Windows 98 and its NT and OS/2 counterparts all big-time, grown-up 32-bit operating systems? Don't they have the resources available to support real shells à la UNIX? Sure they are and sure they do, and such shells are indeed available; they're just not a part of the distributed operating systems. And why not? I can only guess that the purveyors of those operating systems judge that today's developers target (or should target) GUI applications exclusively, that command-line options are old hat, that everything today is done with dialog boxes anyway, so why bother? They have decided that the 30-year-old legacy of UNIX is not contemporary enough for the developers of the New Millennium and need not be paid the respect we old-timers think it deserves.

And so we unearth all the culprits. The MS-DOS developers are not completely at fault, although they share some of the blame. Sure, they chose not to write a shell that handled filename expansion, but the compiler builders made a similar decision when they were implementing the argc, argv logic in their startup code. And all those developers and pathfinders agreed that those decisions were adequate for the rest of us. What can I say? Oh, yeah...

"Doorknob!"

DDJ


Copyright © 1999, Dr. Dobb's Journal
