Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Web Development

idldoc: Automatic Documentation for CORBA IDL


Dr. Dobb's Journal June 1997: idldoc: Automatic Documentation for CORBA IDL

Ernest and Robert are senior members of the technical staff at Sandia National Laboratories in Livermore, California. They can be contacted at [email protected] and [email protected], respectively.


The Common Object Request Broker Architecture (CORBA) is a collection of standards for the construction of distributed, object-based, plug-and-play software systems adopted and promulgated by the Object Management Group (OMG). A CORBA-based system consists of a collection of cooperating software components (distributed objects) that provide various computational services to each other. The public interfaces to these distributed objects are described using the OMG Interface Description Language (IDL), which is similar to the declarations part of C++. Since OMG IDL describes interfaces only, there are no implementation constructs in the language at all.

Documentation of evolving interfaces is one of thorniest problems related to distributed-system development. Since automatic generation of browsable hypertext documentation from the interface-specification source code itself can be a documentation aid, we wrote a program, idldoc, that reads in IDL source and generates a set of HTML pages documenting the interfaces described in the IDL. idldoc generates nested indexes of all interfaces, exceptions, types, constants, modules, and the like, as well as individual pages describing each construct and all its contained subconstructs. idldoc contains a full parser for OMG IDL. Because idldoc understands the entire IDL language, its output can reflect a fairly deep understanding of the entities being described by including summaries and cross-referenced links to definitions located in other files. idldoc's output is therefore more useful than reading the IDL source itself, even with the author's comments in the source code.

You can supplement idldoc's output by including specially formatted "doc comments" in your IDL source. These doc comments are syntactically valid IDL, so they do not interfere with normal IDL compilers. Doc comments look like regular IDL comments, enclosed by /*...*/ characters, but the first * is always doubled, as in Example 1. This syntax is borrowed from doc comments as used in the Java language by the tool Javadoc (see "Automatically Generating Java Documentation," by Gary Aitken, DDJ, July 1996). Doc comments immediately precede the entity being documented. Although you can provide a doc comment for any syntactic entity in IDL, idldoc formats and displays such comments only for definitions of modules and interfaces, operations and attributes, types (typedef, enum, union, struct), exceptions, and constants.

You can embed standard HTML tags within a doc comment, although it's not a good idea to use heading tags like <h1> or <hr>. idldoc creates structured documents and has its own ideas about where headings belong, and conventional HTML structural tags can interfere with the formatting of the generated document.

The idldoc program also parses special "doc tags" that can be embedded within a doc comment. The tags begin with an @ character and must start at the beginning of a line. Doc tags let you supplement idldoc's information about the relationships between the constructs in your IDL and how they should be used. idldoc doesn't examine the context around a tag, so any tag can be used in any place. This permits you to do silly things if you like. You can, for example, provide a @returns tag for an interface. This probably can't mean anything useful, but you can do it if you want.

The recognized set of doc tags includes @see, which adds a hyperlinked "See Also" entry to an entry; @version and @author, which let you specify meta-information about the IDL; @parm, which is used to describe the purpose of individual parameters to IDL methods; @returns, which should include a description of a method's return value; and @raises, which describes the circumstances under which exceptions will be raised. Each doc tag lets you specify information for inclusion in the HTML, and many of them automatically cause the generation of hyperlinks to other relevant information. Figure 1, for instance, shows part of the output generated from Listing One

Architecture

idldoc uses Sun's IDL Compiler Front End (CFE), a publicly available reference implementation of an IDL parser and compiler driver written entirely in C++. The CFE can be customized to implement particular applications by providing a back end that can interpret the parse tree produced by the CFE. In our case, the back end emits an HTML description of the interfaces described in the parse tree. The CFE parse tree itself is constructed from a set of C++ classes representing each of the constructs in the IDL language. The Sun CFE also formed the basis for JIDL, the Java IDL compiler Ernest Friedman-Hill wrote in 1995 (see http://herzberg.ca.sandia.gov/jidl/).

We tried to implement idldoc using only the customization hooks provided by the CFE (that is, without hacking the CFE source). We succeeded for the most part, although we did need to make minor changes in the lexer to save idldoc comments when they appeared in the token stream. The original lexer (quite sensibly) discarded comments. In the end, we also need to rewrite part of the driver to make idldoc work on Win32 as well as on UNIX.

Sun's CFE uses the Factory design pattern to obtain new nodes while building the parse tree. The back-end writer supplies a factory that manufactures node objects of all the types used in the parse tree. By writing the factory so that it returns instances of user-defined subclasses of the standard node types, the back-end writer can obtain a parse tree built out of customized nodes. This is the approach we took in writing idldoc.

Each of idldoc's specialized parse-tree node classes publicly inherits from one of the CFE node classes and our own base class be_node (Listing Two). The most fundamental task of be_node is to capture and store any doc comment associated with a node. This is accomplished in be_node's constructor. The second important role of be_node is to provide the declaration of the virtual function emit(). Most subtypes of be_node define the emit() method to produce HTML text describing themselves. For constructs that define scopes (interface, module, struct, and the like), emit() also calls emit() on any contained constructs. Lastly, be_node serves as a place to put code that parses and stores doc comment tags and for utility routines for emitting HTML and managing scoped names. This code is used by most redefinitions of emit(). As a result of this architecture, the entire set of idldoc documentation can be generated from a parse tree by calling emit() on the root node of the tree.

The single be_node constructor takes one Boolean argument, which indicates whether or not doc comments are valid for the derived type of this node. If this argument is True, the be_node being constructed will check for, copy, and clear any text in a buffer maintained by the lexer. This buffer always contains the last doc comment seen; the lexer merely copies any doc comments it scans into this buffer. Doc comments are never returned to the parser from the lexer.

This strategy relies on the LALR(1) nature of the yacc parser used in the CFE. In an LALR(1) parser, the source text is scanned in normal lexical order (left-to-right, the LR of LALR) with a look-ahead (LA) of at most one token. No backtracking is done, and at any point in time, no more than one token beyond the current token has been accepted by the parse from the lexer. This guaranteed linear parsing allows the CFE to be designed so that a parse tree node is constructed as soon as a particular IDL construct is recognized. At that point in time, the be_node constructor is guaranteed that the last doc comment seen (the one in the buffer) should be associated with the node under construction, because under no circumstances can the lexer have advanced past one doc comment, past the beginning of a new construct, and past a second doc comment without be_node::be_node() being invoked.

The OMG IDL is ideally designed for LALR(1) parsing, because virtually every construct is introduced by an unambiguous keyword. Those that are not (methods in interfaces are the most important example) can be recognized merely because they begin with a type name instead of a keyword. This makes IDL parsers much simpler than those for C and C++; in fact, the first even approximately correct LALR(1) grammar for C wasn't produced until 1983, as part of the ANSI standardization process (see Bjarne Stroustrup's The Design and Evolution of C++, Addison-Wesley, 1994).

There is one potential flaw in the aforementioned scheme: IDL contains the forward keyword, used to declare the existence of an interface prior to the appearance of its definition. The Sun CFE would create a parse tree node to represent such interfaces at the point where the forward declaration appeared. Because any doc comments attached to that node will appear elsewhere in the IDL source, and because the be_node constructor has the job of acquiring the doc comment for a node, doc comments were dropped. Removing this behavior was the one modification that we had to make to the original CFE parser to produce idldoc.

Embedded Images

The HTML pages generated by idldoc include colorful GIF images to serve as headings. Because we wanted idldoc to be easy to use and install, it had to be self-contained and not rely on a special library directory to hold the GIFs. Therefore, we devised a scheme to embed the GIF images in the idldoc executable itself, so that running idldoc could create the images in the user's directory. In addition, we wanted the mechanism to be portable from UNIX to Win32, and for it to be easy to replace the GIFs with custom ones while compiling idldoc. The mechanism we used is general enough to be useful in other applications. The relevant source is in the docgifs subdirectory of the idldoc source distribution.

Three C utilities -- tob64, txt2c, and writehinc -- are compiled and used during the idldoc build process to accomplish the GIF embedding trick. The source for these utilities is included in the idldoc distribution, so their use presents no portability problems; you don't need to have the tools already to build idldoc. A list of all the GIFs to be embedded appears only in the docgifs makefile.

In the first step, the makefile runs the base64 encoding utility tob64 on each GIF file (base64 is a 7-bit ASCII encoding scheme for 8-bit data). The result of this step is a set of *.b64 text files corresponding to each of the binary *.gif files.

Next, make runs txt2c on each of the *.b64 files. Given the name of a text file and a valid C identifier name, txt2c generates a file containing a single C definition char * <identifier> = "<contents of the text file>";.

txt2c is careful about special characters within the text string -- newlines, quotes, and backslashes are all specifically escaped. And, as we discovered, one more character must also be treated specially -- the question mark, when it appears in pairs, can introduce the ANSI C character escapes called "trigraphs." (If you are unfamiliar with trigraphs, they are a mechanism by which European character sets, which frequently lack the '{}' and '[]' characters, can be used to write C and C++ code. These trigraph escapes all begin with "??". For example, "??(" for '[' and "??<" for '{'.) Trigraphs turned out to accidentally appear with surprising frequency in base64-encoded GIFs!

The last utility routine, writehinc, produces a number of other C declarations in the form of a header file. This header initializes an array of character pointers to a list of all the "text-ified" GIF strings, and another array to the list of original GIF filenames. These arrays are used by base64 decoding code compiled into idldoc itself. This allows idldoc to turn the static string data embedded into its text segment into binary GIF files.

Portability

We wanted to be able to use idldoc source on several UNIX platforms as well as on Win32. The major stumbling block to this portability was the Sun CFE itself, which had only previously been compiled on UNIX. We therefore needed to port the CFE to Win32, along with our own code. The finished Win32 port doesn't require any special tools to compile other than Microsoft Visual C++. We have supplied makefiles for Microsoft's nmake. Many of our experiences porting idldoc to Win32 made idldoc better on all platforms:

  • We replaced a fork/exec of UNIX utilities used in index-building (sort | uniq) with equivalent C++ code.
  • We rewrote several build-time utilities, originally shell and Tcl scripts which invoked UNIX utilities, as C++ code.
  • We added canonical translations of the lex and yacc output to the source tree. Both the grammar and lex specification use C++ actions, and therefore do not work correctly with all versions of lex and yacc. The lex input in particular is not portable to non-SYSV machines; it produces incorrect output with flex. Including these files helped make idldoc portable to Linux and other non-SYSV UNIX platforms.

Only one nontrivial section of platform-specific code remains in the final verison of idldoc. The semantics of process management differs substantially across the two operating systems, so the code that executes the preprocessor and captures the error status must be substantially different. On UNIX, it suffices to call fork() to create a child process, open an output file in the child process and attach it to standard output, and then call exec() on the preprocessor. exec() overlays the child process with a running image of the preprocessor, which automatically inherits the output file and returns its exit status to the parent process via the wait() system call; see Listing Three. Notice that the new process starts out as a copy of the parent process in every respect except for its process ID number. In particular, it is trivial to execute some C++ code in the new process before the preprocessor overlays it via the exec() call. Also notice that setting the child process's standard output to point to a file does not affect the parent, and when the child exits, this file is closed automatically. Listings Three and Four have been simplified by the removal of much of the error-handling code.

On Win32, the roughly equivalent code (see Listing Four) looks quite different. None of the idldoc application code runs in the forked process, so the output file must be opened by the parent (although Win32 does not have a call like fork() that copies an entire process, it does allow individual features of a parent process, like file handles, environment variables, and security features to be inherited by a child process). As a result, we don't get the "automatic cleanup" typical of the UNIX style; we have to keep a copy of any file handles we change so that we can restore them after the CreateProcess call. Because the environment inherited by the child process must be specified to the CreateProcess call, the Win32 code is much longer than the equivalent UNIX code. Nevertheless, it is indeed possible to attain the equivalent functionality.

Conclusion

We are using idldoc to document the IDL interfaces for some ongoing projects at Sandia National Labs. At last count, there were more than 500 registered idldoc users around the world. We have seen a number of interesting idldoc-generated sites on the Web, but perhaps the most interesting is at http://www.eurecom.fr/~blum/proto/orbdoc/index.html, where Christian Blum of the Institut EURECOM has made available idldoc-generated documents generated from the complete CORBA2.0 specification and COSS (Common Object Services Specification). Idldoc is available for free download from http://herzberg.ca.sandia.gov/idldoc/.

DDJ

Listing One

/** * If mass is an input, then you may set it in grams with
 * this routine.
 * Legal values for the input string species name are:
 * Al, Fe, Ni, Co, Sc
 * <p>
 * Some of the following "@see" tags are illegal.  This is
 * intentional to make sure we don't do something embarassing
 * (like crash).  
 * @see GetGrams
 * @see grindle::GetGrams
 * @see grindle
 * @see GNDLX::enumMod
 * @see ::GNDLX::enumMod
 * @see GNDLX
 * @see GNDL
 * @raises genericException - if something goes wrong.
 * @see GNDL::grindle1
 * @see ::grindle::GetGrams
 * @see ::grindle::GetGrams (for more discussion)
 * @see ::grindle::GetGrams::
 * @raises specificException - if something else goes wrong.
 * @see 
 * @see SetMoles
 * @parm sz - The string selecting the desired species.
 * @parm lr - The new value for the number of moles
 */


</p>
void SetGrams(in string sz, in double lr)
  raises (genericException);

Back to Article

Listing Two

// These variables are used by the lexical analyzer to publish// the most recently parsed doc comment
extern char idldoc_text[];
extern int idldoc_valid;


</p>
class be_node {
protected:
    char*   idldoc;
    char*   idldoc_tags;


</p>
   int fTagsParsed;


</p>
public:
    be_node(int snarf_idldoc=0);
    virtual void emit(EMT, ostream&);
    String idlDocString(UTL_Scope*, EMT);
    };


</p>
be_node::be_node(int snarf_idldoc) : idldoc(0),
  fTagsParsed(0),
  idldoc_tags(0)   
{
  if (snarf_idldoc)
    {
 if (idldoc_valid)
   {
// don't let anyone else have this comment
idldoc_valid = 0;
idldoc = strdup(idldoc_text);


</p>
/**
 * In case idldoc comments look like this,
 * remove whitespace-star-whitespace
 * sequences an the font of each line.
 */
stripStars(idldoc);


</p>
// Split the "tag" parts off from the idldoc string.
idldoc_tags = splitTags(idldoc);
   }
    }
}

Back to Article

Listing Three

int wait_status;switch (child_pid = fork()) {
case 0: /* Child - call cpp */
  /* Open the output file */
  int fd = open(tmp_file, O_WRONLY | O_CREAT | O_TRUNC, 0777);
  /* Attach this file to our standard output */
  dup2(fd, 1);
  /* Execute CPP with a predetermined set of arguments */
  /* A successful call to exec never returns */
  execvp(arglist[0], arglist);
  cout << "Failed to exec CPP!" << endl;
  exit(99);


</p>
default:    /* Parent - wait */
  /* Wait until this child process terminates */
  while (child_pid != wait(&wait_status));
  if (wait_status != 0) {
    cerr << "Preprocessor returned non-zero status." << endl;
    exit(99);
  }


</p>
} 
/* Preprocessing went OK; parent can now process output */

Back to Article

Listing Four

SECURITY_ATTRIBUTES saAttr; HANDLE hTmpFile;


</p>
/* Set the bInheritHandle flag so file handle is inherited. */ 
saAttr.nLength = sizeof(SECURITY_ATTRIBUTES); 
saAttr.bInheritHandle = TRUE; 
saAttr.lpSecurityDescriptor = NULL; 


</p>
/* 
 * The steps for redirecting child's STDOUT: 
 *1.  Save current STDOUT, to be restored later. 
 *2.  Create file to be STDOUT for child. 
 *3.  Set STDOUT of parent to be the file, so 
 *    it is inherited by child. 
 */ 


</p>
/* Save the handle to the current STDOUT. */ 
HANDLE hSaveStdout = GetStdHandle(STD_OUTPUT_HANDLE); 


</p>
/* Create temp file for the child's STDOUT. */ 
hTmpFile = CreateFile(tmp_file, GENERIC_WRITE, 0, &saAttr,
  CREATE_NEW, FILE_ATTRIBUTE_NORMAL, 0);


</p>
/* Set a write handle to the file to be STDOUT. */ 
SetStdHandle(STD_OUTPUT_HANDLE, hTmpFile)


</p>
PROCESS_INFORMATION piProcInfo; 
STARTUPINFO siStartInfo; 


</p>
/* Set up members of STARTUPINFO and PROCESS_INFORMATION structures. */ 
memchr((void *) &piProcInfo, '\0', sizeof(PROCESS_INFORMATION));
memchr((void *) &siStartInfo, '\0', sizeof(STARTUPINFO));


</p>
siStartInfo.cb = sizeof(STARTUPINFO); 
siStartInfo.lpReserved = NULL; 
siStartInfo.lpReserved2 = NULL; 
siStartInfo.cbReserved2 = 0; 
siStartInfo.lpDesktop = NULL; 
siStartInfo.dwFlags = 0; 


</p>
/* Create the child process. */ 
if (!CreateProcess(NULL, 
 szCmdLine,/* command line   */ 
 NULL,/* process security attributes   */ 
 NULL,/* primary thread security attributes */ 
 TRUE,/* handles are inherited    */ 
 0,   /* creation flags */ 
 NULL,/* use parent's environment */ 
 NULL,/* use parent's current directory*/ 
 &siStartInfo,  /* STARTUPINFO pointer */ 


</p>
&piProcInfo)   /* receives PROCESS_INFORMATION  */ 
 )
  {
   cerr << "Couldn't run preprocessor." << endl;
  exit(99);
  }


</p>
DWORD stat = WaitForSingleObject(piProcInfo.hProcess, INFINITE);
GetExitCodeProcess(piProcInfo.hProcess, &stat);
if (stat != 0)
  {
  cerr << idl_global->prog_name() <<
 GTDEVEL(": Preprocessor returned non-zero status ") <<
 stat << "\n";
  exit(stat);
  }


</p>
CloseHandle(hTmpFile); 
/* Restore our old stdout */
SetStdHandle(STD_OUTPUT_HANDLE, hSaveStdout)

Back to Article


Copyright © 1997, Dr. Dobb's Journal


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.