Dr. Dobb's | Examining Doxygen

Examining Doxygen

Doxygen is a tool for generating formatted documentation from comment blocks in the source code.

October 01, 2004
URL:http://www.drdobbs.com/cpp/examining-doxygen/184405849

Automatic documentation generation

Al is DDJ's Java newsletter editor. He can be contacted at alw@al-williams.com.

As a consultant, I've worked with many different programming languages over the years, and I don't think I've ever found one I liked as well as C (and, by extension, C++). Still, while I prefer C++, I see features in other languages that I wish it had.

Another language I frequently use is Java, although most of the things I like about Java have nothing to do with the language itself. I do like Java's library and several of its ancillary tools, and one of my favorites is the Javadoc tool. This program extracts comments from source code and automatically generates HTML documentation. Since I write lots of libraries for other developers, this is a great convenience. With Javadoc, you can generate collateral documentation from comments in your code with almost no extra effort.

I've often wondered if I could coax or coerce Javadoc into working with C++ code. However, a better answer is Doxygen (http://www.doxygen.org/), an open-source Javadoc clone that understands C and C++ code. In fact, it also works with Java. Besides its multilingual capability, it also has a variety of output options including HTML, LaTeX, XML, RTF, PostScript, PDF, and even the UNIX man page format. With all those output formats, even Java programmers might want to try Doxygen.

But since I program in a variety of languages, can Doxygen help when I use something much different from C? Surprisingly, the answer is "yes"—but with some work. Some languages are close enough to C that most Doxygen features will work with them. However, for languages that are completely different, you can still write a filter that massages your target language into something close enough to C so that Doxygen understands it.

In this article, I first show how to use Doxygen in a C++ program. I then present a filter to process a Basic-like language (PBasic) used by Basic Stamp microcontrollers and make it available for Doxygen (for more on Basic Stamp, see http://www.parallax.com/). Basic is certainly very different from C. Not all of Doxygen's features make sense for a Basic program, but the results are still useful.

The Results First

So, what's the value of Doxygen? Figures 1 and 2 show sample HTML pages generated by Doxygen for a C++ program and a PBasic library. If you've ever seen Javadoc-generated pages, these won't surprise you.

Some of Doxygen's output is extracted from the semantics of the source programs. Other parts come from special comments you embed in your code. The entire process requires a configuration file that specifies exactly what you want to do. Just as your development directory usually has a makefile that controls the build process, you also have a Doxyfile that contains the options used by Doxygen. This is simply a free-form text file that holds option variables and their values. Doxygen automatically generates an example Doxyfile for you on request.

The source comments can use any of three formats:

Special comments can mimic a Javadoc comment by using an extra asterisk in a multiline comment (/** A special comment */).
You can also use an exclamation point instead of the second asterisk (/*! Another special comment */).
Two or more single line comments (//) that have at least one extra slash or exclamation point following the comment marker also form a special comment, for example:

////////////////////
/// A special comment
///////////////////
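To make the three comment styles concrete, here is a minimal C++ sketch that documents one function with each of them. The function names and their behavior are hypothetical and exist only for illustration; the comment markers are the part that matters.

```cpp
#include <string>

/** Returns the label for an issue, using a Javadoc-style comment block. */
std::string issueLabel(const std::string& month, int year) {
    return month + " " + std::to_string(year);
}

/*! Counts the issues published over a number of years (exclamation-point form). */
int issuesPublished(int years, int issuesPerYear) {
    return years * issuesPerYear;
}

///
/// Reports whether a subscription has run out (several single-line comments).
///
bool isExpired(int issuesRemaining) {
    return issuesRemaining <= 0;
}
```

All three comments should end up attached to the function that follows them, so which style you pick is largely a matter of taste.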

Grouping

You can group related things by defining a group with the \defgroup command. Each group name must be unique. If you prefer to merge like-named groups, you can use \addtogroup instead of \defgroup.

Once you have a group, you can use \ingroup to place items in that group. Of course, in your typical C++ source file, that means every item would have an identical \ingroup statement. To simplify this common case, Doxygen recognizes that any items defined between the @{ and @} markers all belong to the last group you defined. So, for example:

/** \defgroup DDJ Dr. Dobb's Group
@{
*/

/** Subscribe. Processes a subscription request.
\param req A request block.
\return True if successful. False if an error
occurs (check g_error for cause).
*/
Boolean Subscribe(ReqBlock req);

/** Deliver electronic copy.
Sends current copy via e-mail.
\param issue An issue object.
\return True if successful. False if an error occurs
(check g_error for cause).
*/
Boolean Deliver(IssueObj issue);

/** @} */

In this example, both Subscribe and Deliver are parts of the DDJ group. Of course, there might be other elements that are part of this group as well. You can also use the same special braces to group related items together (but only to a single level—you can't nest these groups). However, since Doxygen normally groups items together by their semantics, I rarely find this necessary.

Other Features

Doxygen can perform a variety of other tasks. You can write formulae that Doxygen will typeset for you, for example. It can even generate graphs of your program's dependencies, call tree, and class hierarchy. Another facility lets you include example code and test cases in the documentation output.
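As an illustration of the formula support, the sketch below embeds an inline LaTeX formula between \f$ markers in a comment block. The function planarDistance is a hypothetical example, not part of the article's sample program, and rendering the formula in HTML output generally depends on having LaTeX tooling available to Doxygen.

```cpp
#include <cmath>

/**
 * Computes the distance between two points in the plane.
 *
 * The result is \f$ d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \f$.
 *
 * \param x1 X coordinate of the first point.
 * \param y1 Y coordinate of the first point.
 * \param x2 X coordinate of the second point.
 * \param y2 Y coordinate of the second point.
 * \return The Euclidean distance between the two points.
 */
double planarDistance(double x1, double y1, double x2, double y2) {
    return std::hypot(x2 - x1, y2 - y1);
}
```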

There are many ways to format things in Doxygen. In particular, there are several simplified formats for creating lists. However, since I am comfortable with HTML, I just use HTML list markup instead of Doxygen's custom tags.
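For comparison, here is a small sketch showing the same list written both ways inside one comment block: first with HTML list markup, then with Doxygen's dash shorthand. The function deliveryKind and its return values are hypothetical.

```cpp
#include <string>

/**
 * Classifies a delivery request.
 *
 * Possible results, written as an HTML list:
 * <ul>
 *   <li>"print" for a paper copy</li>
 *   <li>"email" for an electronic copy</li>
 * </ul>
 *
 * The same list using Doxygen's dash shorthand:
 * - "print" for a paper copy
 * - "email" for an electronic copy
 *
 * \param electronic True when the reader wants an electronic copy.
 * \return The delivery kind as a short string.
 */
std::string deliveryKind(bool electronic) {
    return electronic ? "email" : "print";
}
```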

It takes a while to get used to all the features Doxygen presents. If you have any doubt that it can generate sophisticated documentation, though, look at the Doxygen manual on its web site. The manual was created using Doxygen (of course)! You probably should start with a simpler project while learning Doxygen, but it is nice to know that the capability to produce something that complex is there once you master the tool.

A C++ Example

To illustrate Doxygen's features, I wrote a simple program in C++ (the program is available electronically; see "Resource Center," page 5) with three classes: Vehicle, Car, and Truck. Figure 3 shows the class hierarchy automatically generated by Doxygen.
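The program itself is not reproduced here, so the following is only a rough sketch of how a three-class hierarchy like this might be commented for Doxygen. The class names come from the text above; every member, constructor argument, and wheel count is an assumption made for illustration.

```cpp
#include <string>

/** A generic vehicle. Base class of the hierarchy. */
class Vehicle {
public:
    /** Creates a vehicle with the given number of wheels. */
    explicit Vehicle(int wheels) : wheels_(wheels) {}
    virtual ~Vehicle() = default;

    /** Returns a short description of the vehicle type. */
    virtual std::string describe() const { return "vehicle"; }

    /** Returns the number of wheels. */
    int wheels() const { return wheels_; }

private:
    int wheels_; ///< Wheel count (hypothetical member).
};

/** A passenger car. */
class Car : public Vehicle {
public:
    Car() : Vehicle(4) {}
    std::string describe() const override { return "car"; }
};

/** A truck with a cargo capacity. */
class Truck : public Vehicle {
public:
    /** Creates a truck that can carry the given load in tons. */
    explicit Truck(double capacityTons)
        : Vehicle(6), capacityTons_(capacityTons) {}
    std::string describe() const override { return "truck"; }

    /** Returns the cargo capacity in tons. */
    double capacityTons() const { return capacityTons_; }

private:
    double capacityTons_; ///< Cargo capacity in tons (hypothetical member).
};
```

Because Car and Truck both derive publicly from Vehicle, Doxygen has what it needs to draw the inheritance diagram without any extra tags.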

Before using Doxygen on a project, you can request that it make a configuration file in the directory by running the program with the -g option. This produces a file with various default values that you can edit. In this case, I changed PROJECT_NAME and JAVADOC_AUTOBRIEF. I also turned off the option to produce LaTeX output.

After you have completed the configuration file, you can run Doxygen with no options to produce the output. The HTML output is in the html directory (although you can change the output directory in the configuration file). Simply open a browser and check it out. Class names are automatically cross-referenced, so you can click on a name anywhere in the document and jump to the related documentation.

Extending Doxygen's Reach

If you use C, C++, or even Java, Doxygen is great. But what about other languages? Doxygen can't work with every possible language, but it does have an escape hatch that you can use to make many of its features work with other languages.

In the configuration file, there is an INPUT_FILTER option. You can set this option to a filter that preprocesses each file before Doxygen reads it (you also have to set FILTER_SOURCE_FILES to YES). In my case, I wanted to write a filter to translate PBasic code (a dialect of Basic) into something that was close enough to C that Doxygen could create documents from it.

Of course, some Doxygen features won't make sense with a PBasic source file. There are no classes, for example, so there is no class hierarchy. In addition, Doxygen's ability to show you source files will be useless since the source files won't look like the ones the end user sees. The key is to customize the configuration file so that Doxygen doesn't generate unwanted output.

Fortunately, Doxygen is forgiving and you don't have to do much translation to make a file "C-like"—at least enough to get meaningful documentation. I wrote my filter using awk (see Listing One). It handles a few transformations:

Basic comments become C comments.
Constant declarations (CON) become C-style initializers.
Variable declarations (VAR) become C-style declarations.
Each line receives a semicolon at the end.
Labels become functions that end with the first GOTO or RETURN encountered.

The processing of the labels is probably the oddest part of the transformation. Although the results are not always perfect, they are usually good enough to get a meaningful document from the source code and that's the goal.
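To show the idea behind the declaration rules outside of awk, here is a simplified C++ sketch of the CON and VAR transformations. It is not a port of Listing One: translateDeclaration and toUpperCopy are hypothetical helpers, only the first three whitespace-separated tokens are considered, and trailing comments are ignored.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper: returns an upper-case copy so CON, con, and Con all match.
std::string toUpperCopy(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char ch) {
        return static_cast<char>(std::toupper(ch));
    });
    return s;
}

// Rewrites one PBasic declaration into a C-like line.
//   "limit CON 10"   becomes "limit = 10;"
//   "count VAR byte" becomes "byte count;"
// Any line that is not a three-token CON or VAR declaration is returned unchanged.
std::string translateDeclaration(const std::string& line) {
    std::istringstream in(line);
    std::string name, keyword, rest;
    if (!(in >> name >> keyword >> rest)) {
        return line;  // fewer than three tokens: not a declaration
    }
    const std::string kw = toUpperCopy(keyword);
    if (kw == "CON") {
        return name + " = " + rest + ";";
    }
    if (kw == "VAR") {
        return rest + " " + name + ";";
    }
    return line;
}
```

The point is only that the rewritten line needs to look enough like a C declaration for Doxygen to pick up a name and a type or value. It does not have to compile.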

I've included a library (available electronically) that I wrote to interface the Basic Stamp to a floating-point math coprocessor, along with the HTML output that Doxygen generated for it. Although you would not want to view the filtered source code, the output from Doxygen is quite useful and that's what counts (you can see some of the output in Figure 2).

Wrap Up

Too often, documentation is left for the last part of a project, which means it is usually the first thing to get cut when deadlines loom or newer projects start clamoring for attention. With tools like Doxygen, you can document "just in time" (to borrow a management phrase). Not only does this make document creation less tedious, it also helps other developers use your code during the project. Because adding Doxygen comments costs so little, there isn't much to lose and quite a bit to gain.

If you work by yourself, you might be tempted to simply not document. When I feel that temptation, I think about three things:

In six months, I won't remember what I did or why I did it.
What if someone else has to wade through my code because I'm out of town or ill?
Well-documented code is an asset that I could sell if I had to raise cash. Undocumented code is certainly not a salable asset and could even be a liability.

At first glance, you may think Doxygen isn't as useful for user-level documentation as it is for developer documentation. To some extent, this is true. However, as the Doxygen manual demonstrates, you can produce actual user documentation with a little effort. Of course, its true strength is in developer documentation. Doxygen should be an essential part of every programmer's toolkit.

DDJ

Listing One

BEGIN {
  cflag=0;  # comment in progress
  ocflag=0; # old comment in progress
  inblock=0;  # block flag
  afterbrace=0; #brace at end?
}

# user function to make a string have no regexp metachars
function escape(s) {
  gsub(/([\[\]\\\^\$\.\|\(\)\*\+\?\{\}])/,"\\\\&",s);
  return s;
  }


#Transform constants
toupper($2)=="CON" {
  a=$1
  b=$2
  c=$3
  apat=escape(a)
  cpat=escape(c)
  sub(b,"")
  sub(apat "[ \t]+" cpat,a " = " c);
}

# Transform x var y => y x; (should preserve comments)
toupper($2)=="VAR" {
  a=$1
  b=$2
  c=$3
  apat=escape(a)
  cpat=escape(c)

  sub(b,"")
  sub(apat "[ \t]+" cpat,c " " a);
}


# Assume this is not a comment
  {
    cflag=0;
  }


# Whole line comment, start or extend
/^[ \t]*'/ {
  cflag=1;
# start
  if (ocflag==0)  sub("'","/*"); else sub("'","");
  ocflag=1;
  print;
  next;
  }

# process partial line comments and semicolons
# (lines that begin with # are passed through untouched)
  {
  if (!/^[ \t]*#/) {
    if (/'/) {
      sub("'","; /*");
      $0=$0 " */"
    } else {
      if (!/:[ \t]*$/) $0=$0 ";"
      if ($0==";") $0=""
    }
  }
  }
   
# process all lines  
  {
  if (ocflag==1) {  # close pending comment
     print "*/";
     ocflag=0;
     }
  }

# handle label
match($1,/.*:/) {
  if (inblock==1) print "}";
  sub(":","",$1);
  print $1 "() {";
  inblock=1;
  sub($1,"");
}

# handle end of quasifunction
toupper($1)=="GOTO" || toupper($1)=="RETURN" {
  inblock=0;
  afterbrace=1;
}

# print the line, then close the quasifunction if one just ended
  {
  print;
  if (afterbrace==1) {
    print "}";
    afterbrace=0;
  }
  }