Examining Doxygen

Doxygen is a tool for generating formatted documentation from comment blocks in the source code.


October 01, 2004
URL:http://www.drdobbs.com/cpp/examining-doxygen/184405849

October, 2004: Examining Doxygen

Automatic documentation generation

Al is DDJ's Java newsletter editor. He can be contacted at alw@al-williams.com.




As a consultant, I've worked with many different programming languages over the years, but I don't think I've ever found one I liked as well as C (and, by extension, C++). Even so, I still see features in other languages that I wish were in C++.

Another language I frequently use is Java. Nevertheless, most of the things I like about Java have nothing to do with the language itself. I do like Java's library and several of its ancillary tools, and one of my favorites is the Javadoc tool. This program snatches comments from source code and automatically generates HTML documentation. Since I write lots of libraries for other developers, this is a great convenience. With Javadoc, you can generate collateral documentation from comments in your code with almost no extra effort.

I've often wondered if I could coax or coerce Javadoc into working with C++ code. However, a better answer is Doxygen (http://www.doxygen.org/), an open-source Javadoc clone that understands C and C++ code. In fact, it also works with Java. Besides its multilingual capability, it also has a variety of output options including HTML, LaTeX, XML, RTF, PostScript, PDF, and even the UNIX man page format. With all those output formats, even Java programmers might want to try Doxygen.

But since I program in a variety of languages, can Doxygen help when I use something much different from C? Surprisingly, the answer is "yes"—but with some work. Some languages are close enough to C that most Doxygen features will work with them. However, for languages that are completely different, you can still write a filter that massages your target language into something close enough to C so that Doxygen understands it.

In this article, I first show how to use Doxygen in a C++ program. I then present a filter to process a Basic-like language (PBasic) used by Basic Stamp microcontrollers and make it available for Doxygen (for more on Basic Stamp, see http://www.parallax.com/). Basic is certainly very different from C. Not all of Doxygen's features make sense for a Basic program, but the results are still useful.

The Results First

So, what's the value of Doxygen? Figures 1 and 2 show sample HTML pages generated by Doxygen for a C++ program and a PBasic library. If you've ever seen Javadoc-generated pages, these won't surprise you.

Some of Doxygen's output is extracted from the semantics of the source programs. Other parts come from special comments you embed in your code. The entire process requires a special configuration file that specifies exactly what you want to do. Just as your development directory usually has a makefile that controls the build process, you also have a Doxyfile that contains the options used by Doxygen. This is simply a free-form text file that lists option variables and their values. Doxygen automatically generates an example Doxyfile for you on request.

The source comments can use any of three formats:

////////////////////
/// A special comment
///////////////////
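
Only the banner form appears in the listing above. As a hedged sketch, the following shows comment styles that the Doxygen manual documents as special comment blocks. The function names are hypothetical placeholders that exist only to give each comment something to attach to.

```cpp
// Sketch: comment styles Doxygen treats as documentation blocks.
// All function names here are hypothetical placeholders.

/** Javadoc-style block: opens with a slash and two asterisks. */
int javadocStyle() { return 1; }

/*! Qt-style block: opens with a slash, an asterisk, and an exclamation mark. */
int qtStyle() { return 2; }

/// C++ line style: each documentation line starts with three slashes.
int tripleSlashStyle() { return 3; }

////////////////////
/// Banner style: the same triple-slash comment framed by rows of slashes.
////////////////////
int bannerStyle() { return 4; }

// An ordinary comment like this one is not treated as documentation.
int plainComment() { return 5; }
```

Which style to use is largely a matter of taste; mixing them in one project works, but a single style is easier to read.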

Tags

Special comments can contain Doxygen tags that let you specify particulars about your program. By default, comments refer to the next lexical element. So in a C header file, you might have:

/** This comment applies to function a */
int a(void);
/** This comment applies to function b */
int b(void);

However, you can also force a comment to apply to the previous element. You do this by starting the comment with a less than (<) sign. This is useful when you have a short comment after a variable declaration:

int i; /**< General-purpose variable */

The actual tags start with a backslash or the at sign (@); you can use the two interchangeably.

So, here's a function definition in a C header file:

/** \brief Generate a hash code from the given key.
 *
 * This function will take a string key and produce
 * a 32-bit hash code using the Q50 algorithm.
 * \param key The source key (user ID)
 * \return Hash code.
 */
long GetHash(CString key)
{
. . .
}

This comment uses three tags: brief, param, and return.

The brief tag provides an abbreviated description used in lists. The rest of the comment is used in detailed descriptions of the function. However, I rarely use the brief tag. Instead, I usually set the JAVADOC_AUTOBRIEF configuration variable to YES. This causes Doxygen to interpret the first line of the comment as the brief comment. If a period occurs before the end of the line, then Doxygen stops the brief comment at the period.
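
As an illustration of the automatic brief description, here is a minimal sketch of a documented function with no brief tag. It assumes JAVADOC_AUTOBRIEF is set to YES in the Doxyfile. The name hashKey is hypothetical, and because the article does not describe the Q50 algorithm, the body substitutes the well-known 32-bit FNV-1a hash purely as a stand-in.

```cpp
#include <cstdint>
#include <string>

/**
 * Generate a hash code from the given key. With JAVADOC_AUTOBRIEF set
 * to YES, the text up to the first period above serves as the brief
 * description, and the text after it goes into the detailed description.
 *
 * NOTE: FNV-1a is a stand-in chosen for this sketch, not the article's
 * Q50 algorithm.
 *
 * \param key The source key (user ID)
 * \return 32-bit hash code.
 */
std::uint32_t hashKey(const std::string& key)
{
    std::uint32_t hash = 2166136261u;   // FNV-1a 32-bit offset basis
    for (unsigned char c : key) {
        hash ^= c;                      // mix in the next byte
        hash *= 16777619u;              // multiply by the 32-bit FNV prime
    }
    return hash;
}
```

Without JAVADOC_AUTOBRIEF, the same comment would need an explicit brief tag to get a short description in the member lists.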

There are many different tags available to use within comments (see Table 1 for a partial list). Some of these tags are not frequently used, but are there for compatibility with other documentation systems. You can also use many HTML markup tags to spruce up your documentation. For example, you might use <B> to make a bold element or an <A> to create a hyperlink.

Grouping

You can group related things by defining a group with a unique name. This is simple to do with a defgroup tag. Each group name must be unique. If you prefer to merge like-named groups, you can use addtogroup instead of defgroup.

Once you have a group, you can use ingroup to place items in that group. Of course, in your typical C++ source file, that means every item would have an identical ingroup statement. To simplify this common case, Doxygen recognizes that any items defined between @{ and @} tags all belong to the last group you defined. So, for example:

/** \defgroup DDJ Dr. Dobb's Group
@{
*/

/** Subscribe. Processes a subscription request.
\param req A request block.
\return True if successful. False if an error
occurs (check g_error for cause).
*/
Boolean Subscribe(ReqBlock req);

/** Deliver electronic copy.
Sends current copy via e-mail.
\param issue An issue object
\return True if successful. False if an error occurs
(check g_error for cause).
*/
Boolean Deliver(IssueObj issue);

/** @} */

In this example, both Subscribe and Deliver are parts of the DDJ group. Of course, there might be other elements that are part of this group as well. You can also use the same special braces to group related items together (but only to a single level—you can't nest these groups). However, since Doxygen normally groups items together by their semantics, I rarely find this necessary.
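
For comparison, here is a small sketch of the other two grouping tags, addtogroup and ingroup. The group name DDJ comes from the example above; the functions Renew and Label and their bodies are hypothetical.

```cpp
#include <string>

// addtogroup reopens the DDJ group, so several files can contribute
// members to it without defining the group a second time.
/** \addtogroup DDJ
 *  @{
 */

/** Renew an existing subscription.
 *  \param months Number of months to add.
 *  \return True if the request is valid.
 */
bool Renew(int months) { return months > 0; }

/** @} */

// ingroup attaches a single item to the group without the braces.
/** Format a mailing label.
 *  \ingroup DDJ
 *  \param name Subscriber name.
 *  \return The label text.
 */
std::string Label(const std::string& name) { return "To: " + name; }
```

The braces suit a header where nearly everything belongs to one group; ingroup is handier for the odd item declared somewhere else.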

Other Features

Doxygen can perform a variety of other tasks. You can write formulae that Doxygen will typeset for you, for example. It can even generate graphs of your program's dependencies, call tree, and class hierarchy. Another facility lets you include example code and test cases in the documentation output.
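
As a sketch of the formula support, the comment below uses the formula markers described in the Doxygen manual: a pair of \f$ markers for an inline formula, and \f[ with \f] for a displayed one. The function hypotenuse is a hypothetical example, and rendering the formulas in HTML output typically assumes a working LaTeX installation.

```cpp
#include <cmath>

/**
 * Length of the hypotenuse of a right triangle.
 *
 * Doxygen hands the text between the formula markers to LaTeX:
 * inline as \f$ c = \sqrt{a^2 + b^2} \f$, or displayed on its own line:
 * \f[
 *     c = \sqrt{a^2 + b^2}
 * \f]
 *
 * \param a Length of one leg.
 * \param b Length of the other leg.
 * \return Length of the hypotenuse.
 */
double hypotenuse(double a, double b)
{
    return std::sqrt(a * a + b * b);
}
```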

There are many ways to format things in Doxygen. In particular, there are several simplified formats for creating lists. However, since I am comfortable with HTML, I just use the HTML list markup instead of Doxygen's custom tags.
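
To make the two list styles concrete, here is a sketch of one comment that writes the same list twice: once with HTML markup and once with the dash notation from the Doxygen manual. The function shippingOption is a hypothetical example.

```cpp
#include <string>

/**
 * Describe a shipping option.
 *
 * The list written with HTML markup:
 * <ul>
 *   <li>Ground</li>
 *   <li>Air</li>
 * </ul>
 *
 * The same list in Doxygen's dash notation, where each item starts
 * with a minus sign at the same indentation:
 * - Ground
 * - Air
 *
 * \param code 'G' for ground, 'A' for air.
 * \return The option name, or "Unknown" for any other code.
 */
std::string shippingOption(char code)
{
    if (code == 'G') return "Ground";
    if (code == 'A') return "Air";
    return "Unknown";
}
```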

It takes a while to get used to all the features Doxygen presents. If you have any doubt that it can generate sophisticated documentation, though, look at the Doxygen manual on its web site. The manual was created using Doxygen (of course)! You probably should start with a simpler project while learning Doxygen, but it is nice to know that the capability to produce something that complex is there once you master the tool.

A C++ Example

To illustrate Doxygen's features, I wrote a simple program in C++ (the program is available electronically; see "Resource Center," page 5) with three classes: Vehicle, Car, and Truck. Figure 3 shows the class hierarchy automatically generated by Doxygen.
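
The original program is only available electronically, so the following is a hypothetical sketch of what three such classes might look like with Doxygen comments. Only the class names Vehicle, Car, and Truck come from the article; every member is an assumption made for illustration.

```cpp
#include <string>

/** A generic vehicle. Base class for the hierarchy. */
class Vehicle {
public:
    /** Create a vehicle.
     *  \param wheels Number of wheels.
     */
    explicit Vehicle(int wheels) : wheels_(wheels) {}
    virtual ~Vehicle() = default;

    /** Report the wheel count.
     *  \return Number of wheels.
     */
    int wheels() const { return wheels_; }

    /** Name the kind of vehicle.
     *  \return A short description.
     */
    virtual std::string kind() const { return "vehicle"; }

private:
    int wheels_; /**< Number of wheels */
};

/** A passenger car. Always has four wheels in this sketch. */
class Car : public Vehicle {
public:
    Car() : Vehicle(4) {}
    std::string kind() const override { return "car"; }
};

/** A truck. Carries a payload and may have extra axles. */
class Truck : public Vehicle {
public:
    /** Create a truck.
     *  \param wheels Number of wheels.
     *  \param payloadKg Maximum payload in kilograms.
     */
    Truck(int wheels, double payloadKg)
        : Vehicle(wheels), payloadKg_(payloadKg) {}
    std::string kind() const override { return "truck"; }

    /** Report the payload limit.
     *  \return Maximum payload in kilograms.
     */
    double payloadKg() const { return payloadKg_; }

private:
    double payloadKg_; /**< Maximum payload in kilograms */
};
```

Doxygen reads the inheritance from the class declarations themselves, so no tag is needed to tell it that Car and Truck derive from Vehicle.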

Before using Doxygen on a project, you can request that it make a configuration file in the directory by running the program with the -g option. This produces a file with various default values that you can edit. In this case, I changed PROJECT_NAME and JAVADOC_AUTOBRIEF. I also turned off the option to produce LaTeX output.

After you have completed the configuration file, you can run Doxygen with no options to produce the output. By default, the HTML output goes in the html directory (although you can change the output directory in the configuration file). Simply open a browser and check it out. Class names are automatically cross-referenced, so you can click on a name anywhere in the document and jump to the related documentation.

Extending Doxygen's Reach

If you use C, C++, or even Java, Doxygen is great. But what about other languages? Doxygen can't work with every possible language, but it does have an escape hatch that you can use to make many of its features work with other languages.

In the configuration file, there is an INPUT_FILTER option. You can set this option to a filter that preprocesses each file before Doxygen reads it (you also have to set FILTER_SOURCE_FILES to YES). In my case, I wanted to write a filter to translate PBasic code (a dialect of Basic) into something that was close enough to C that Doxygen could create documents from it.

Of course, some Doxygen features won't make sense with a PBasic source file. There are no classes, for example, so there is no class hierarchy. In addition, Doxygen's ability to show you source files will be useless since the source files won't look like the ones the end user sees. The key is to customize the configuration file so that Doxygen doesn't generate unwanted output.

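Fortunately, Doxygen is forgiving and you don't have to do much translation to make a file "C-like", at least enough to get meaningful documentation. I wrote my filter using awk (see Listing One). It handles a few transformations: constant declarations (name CON value) become assignments, variable declarations (name VAR type) are reordered to put the type first, apostrophe comments become C comments, statements gain a trailing semicolon, and labels open a function-like block that the next GOTO or RETURN (or the next label) closes.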

The processing of the labels is probably the oddest part of the transformation. Although the results are not always perfect, they are usually good enough to get a meaningful document from the source code and that's the goal.
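
To show the idea outside of awk, here is a minimal C++ sketch of just two of the rewrites in Listing One: the VAR and CON declarations. The function filterDeclaration is a hypothetical helper, not part of the article's filter, and it deliberately ignores comments, labels, and everything else the awk script handles. According to the Doxygen manual, a real filter is run with the input file name as its argument and writes the transformed text to standard output, so a complete program would wrap this in a loop over the file's lines.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Sketch of two of the filter's rewrites, in C++ instead of awk.
// Hypothetical helper; it handles only VAR and CON declarations.
std::string filterDeclaration(const std::string& line)
{
    // Split the line into whitespace-separated tokens.
    std::istringstream in(line);
    std::vector<std::string> tokens;
    for (std::string t; in >> t; ) tokens.push_back(t);
    if (tokens.size() < 3) return line;   // too short to be a declaration

    // Upper-case the keyword so VAR, var, and Var all match.
    std::string keyword = tokens[1];
    std::transform(keyword.begin(), keyword.end(), keyword.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::toupper(c));
                   });

    // Rejoin everything after the keyword (the type or the value).
    std::string rest = tokens[2];
    for (std::size_t i = 3; i < tokens.size(); ++i) rest += " " + tokens[i];

    if (keyword == "VAR") return rest + " " + tokens[0] + ";";    // x VAR byte -> byte x;
    if (keyword == "CON") return tokens[0] + " = " + rest + ";";  // n CON 5    -> n = 5;
    return line;                           // leave every other line alone
}
```

Tokenizing first avoids the regular-expression escaping that the awk version needs, at the cost of collapsing runs of whitespace in the rewritten lines.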

I've included (available electronically) a library that I wrote to interface the Basic Stamp to a floating-point math coprocessor, along with the HTML output Doxygen generated for it. Although you would not want to view the filtered source code, the output from Doxygen is quite useful and that's what counts (you can see some of the output in Figure 2).

Wrap Up

Too often documentation is left for the last part of a project and that means it is usually the first thing to get cut when deadlines loom or newer projects start clamoring for attention. With tools like Doxygen, you can document "just in time" (to borrow a management phrase). Not only does this make document creation less tedious, it also helps other developers use your code during the project. Because there is little impact to adding Doxygen comments, there isn't much to lose and quite a bit to gain.

If you work by yourself, you might be tempted to simply not document. When I feel that temptation, I think about three things:

At first glance, you may think Doxygen isn't as useful for user-level documentation as it is for developer documentation. To some extent, this is true. However, as the Doxygen manual demonstrates, you can produce actual user documentation with a little effort. Of course, its true strength is in developer documentation. Doxygen should be an essential part of every programmer's toolkit.

DDJ



Listing One

BEGIN {
  cflag=0;  # comment in progress
  ocflag=0; # old comment in progress
  inblock=0;  # block flag
  afterbrace=0; #brace at end?
}

# user function to make a string have no regexp metachars
function escape(s) {
  gsub(/([\[\]\\\^\$\.\|\(\)\*\+\?\{\}])/,"\\\\&",s);
  return s;
  }


#Transform constants
$2=="con" || $2=="CON" || $2=="Con" || $2=="cOn" || $2=="coN" {
  a=$1
  b=$2
  c=$3
  apat=escape(a)
  cpat=escape(c)
  sub(b,"")
  sub(apat "[ \t]+" cpat,a " = " c);
}

# Transform x var y => y x; (should preserve comments)
$2=="var" || $2=="VAR" || $2=="Var" || $2=="vAr" || $2=="vaR" {
  a=$1
  b=$2
  c=$3
  apat=escape(a)
  cpat=escape(c)

  sub(b,"")
  sub(apat "[ \t]+" cpat,c " " a);
}


# Assume this is not a comment
{
  cflag=0;
}


# Whole line comment, start or extend
/^[ \t]*'/ {
  cflag=1;
  # start
  if (ocflag==0) sub("'","/*"); else sub("'","");
  ocflag=1;
  print;
  next;
}

# process partial line comments and semicolons
{
  if (!/[ \t]*#/) {
    if (/'/) {
      sub("'","; /*");
      $0=$0 " */"
    } else {
      if (!/:[ \t]*$/) $0=$0 ";"
      if ($0==";") $0=""
    }
  }
}

# process all lines
{
  if (ocflag==1) {  # close pending comment
    print "*/";
    ocflag=0;
  }
}

# handle label
match($1,/.*:/) {
  if (inblock==1) print "}";
  sub(":","",$1);
  print $1 "() {";
  inblock=1;
  sub($1,"");
}

# handle end of quasifunction
toupper($1)=="GOTO" || toupper($1)=="RETURN" {
  inblock=0;
  afterbrace=1;
}

{
  print;
  if (afterbrace==1) {
    print "}";
    afterbrace=0;
  }
}

Figure 1: C++ program documented by Doxygen.

Figure 2: PBasic program documented by Doxygen.

Figure 3: Class hierarchy.

Doxygen with Windows

Since I move often between Linux, various UNIX flavors, and Windows, I use Cygwin (http://www.cygwin.com/) to provide a sane environment under Windows. Because Doxygen is a Linux-based program, it should be easy to run under Cygwin, right?

In fact, it does work well with Cygwin—but with one caveat. When I first tried running Doxygen, all of the generated graphics in the HTML output were showing up as broken links. Eventually, I realized that the target drive was mounted in Cygwin's text mode, which means that Cygwin was manipulating the line endings in the files. Apparently, Doxygen wasn't opening the images in binary mode, so Cygwin was cheerfully converting them into Windows text format! This, of course, corrupted the files and made them worthless.

The answer is to use a Cygwin file system that is mounted in binary mode. This is the default for the Cygwin mount command (a -t or --text overrides the default). You can explicitly use the -b or --binary flag to the mount command to force binary mode. If you run mount with no arguments, you'll see "(binmode)" after each volume mounted in binary mode.
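
The underlying distinction between text and binary mode is easy to show in a few lines. This is only a sketch of the general idea, not Doxygen's actual code; the helper names writeBinary and readBinary are hypothetical. On platforms or mounts that translate line endings, dropping the std::ios::binary flag is what lets bytes such as 0x0A be rewritten.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Write bytes with std::ios::binary so no line-ending translation occurs.
bool writeBinary(const std::string& path,
                 const std::vector<unsigned char>& bytes)
{
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(bytes.data()),
              static_cast<std::streamsize>(bytes.size()));
    out.flush();                       // surface any write error before returning
    return static_cast<bool>(out);
}

// Read the whole file back, again in binary mode.
std::vector<unsigned char> readBinary(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}
```

An image header is a good test case because it usually contains both carriage-return and line-feed bytes, which is exactly what a text-mode translation would alter.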

—A.W.

Tag         Description
a           Displays the next word in a special font.
addindex    Adds entry to index (LaTeX).
addtogroup  Creates a group that will be merged with other groups sharing the same name.
author      Identifies the author.
arg         Introduces an item in a simple list (for example, a list of arguments).
b           Bolds next word.
brief       Provides a brief description of an element.
bug         Identifies a known bug.
c           Uses typewriter font on next word.
callgraph   Requests a call graph.
class       Identifies documentation that applies to the class as a whole.
code        Starts a block of code (used to insert code into comments); ends with endcode.
copydoc     Copies documentation from one member to another location.
date        Inserts a date.
def         Identifies documentation that applies to a #define.
defgroup    Defines a group (which must be unique).
deprecated  Identifies members that are deprecated.
e           Italicizes next word.
file        Identifies documentation that applies to an entire file.
htmlonly    Sends HTML text directly to HTML output (ends with endhtmlonly).
if          Conditionally includes documentation.
image       Inserts an image.
ingroup     Adds element to a particular group.
interface   Identifies documentation that applies to an interface.
latexonly   Sends LaTeX commands directly to LaTeX output (ends with endlatexonly).
link        Creates a link to another element.
mainpage    Defines the documentation seen on the main page.
manonly     Sends text directly to MAN output (ends with endmanonly).
n           Forces new line.
page        Generates a page of documentation not directly tied to an element.
param       Introduces a parameter.
ref         Creates a cross reference.
return      Documents a function return.
retval      Documents a specific return value of a function.
sa          Creates "see also" text.
see         Creates "see also" text.
since       Marks the date or version where an element first appeared.
todo        Identifies items not yet complete.
var         Documents a variable.
verbatim    Sends a block of text directly to both HTML and LaTeX output (ends with endverbatim).
version     Identifies a version of the code.
xmlonly     Sends a block of text directly to XML output (ends with endxmlonly).

Table 1: Common Doxygen tags.
