Examining Doxygen


October, 2004: Examining Doxygen

Automatic documentation generation

Al is DDJ's Java newsletter editor. He can be contacted at alw@al-williams.com.




As a consultant, I've worked with many different programming languages over the years, but I don't think I've ever found one I liked as well as C (and, by extension, C++). Still, while I prefer C++, I see features in other languages that I wish it had.

Another language I frequently use is Java. Nevertheless, most of the things I like about Java have nothing to do with the language itself. I do like Java's library and several of its ancillary tools, and one of my favorites is the Javadoc tool. This program snatches comments from source code and automatically generates HTML documentation. Since I write lots of libraries for other developers, this is a great convenience. With Javadoc, you can generate collateral documentation from comments in your code with almost no extra effort.

I've often wondered if I could coax or coerce Javadoc into working with C++ code. However, a better answer is Doxygen (http://www.doxygen.org/), an open-source Javadoc clone that understands C and C++ code. In fact, it also works with Java. Besides its multilingual capability, it also has a variety of output options including HTML, LaTeX, XML, RTF, PostScript, PDF, and even the UNIX man page format. With all those output formats, even Java programmers might want to try Doxygen.

But since I program in a variety of languages, can Doxygen help when I use something much different from C? Surprisingly, the answer is "yes"—but with some work. Some languages are close enough to C that most Doxygen features will work with them. However, for languages that are completely different, you can still write a filter that massages your target language into something close enough to C so that Doxygen understands it.

In this article, I first show how to use Doxygen in a C++ program. I then present a filter to process a Basic-like language (PBasic) used by Basic Stamp microcontrollers and make it available for Doxygen (for more on Basic Stamp, see http://www.parallax.com/). Basic is certainly very different from C. Not all of Doxygen's features make sense for a Basic program, but the results are still useful.

The Results First

So, what's the value of Doxygen? Figures 1 and 2 show sample HTML pages generated by Doxygen for a C++ program and a PBasic library. If you've ever seen Javadoc-generated pages, these won't surprise you.

Some of Doxygen's output is extracted from the semantics of the source programs. Other parts come from special comments you embed in your code. The entire process requires a configuration file that specifies exactly what you want to do. Just as your development directory usually has a makefile that controls the build process, it also has a Doxyfile that contains the options used by Doxygen. This is simply a free-form text file that lists option names and their values. Doxygen automatically generates an example Doxyfile for you on request.

The source comments can use any of three formats:

  • Special comments can mimic a Javadoc comment by using an extra asterisk in a multiline comment (/** A special comment */).
  • You can also use an exclamation point instead of the second asterisk (/*! Another special comment */).
  • Two or more single line comments (//) that have at least one extra slash or exclamation point following the comment marker also form a special comment, for example:

////////////////////
/// A special comment
///////////////////

Tags

Special comments can contain Doxygen tags that let you specify particulars about your program. By default, comments refer to the next lexical element. So in a C header file, you might have:

/** This comment applies to function a */
int a(void);
/** This comment applies to function b */
int b(void);

However, you can also force a comment to apply to the previous element. You do this by starting the comment with a less than (<) sign. This is useful when you have a short comment after a variable declaration:

int i; /**< General-purpose variable */
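Pulling the comment forms together, a small sketch might look like the following. All of the names are invented for illustration, and the ///< form is Doxygen's single-line counterpart to /**<.

```cpp
/** Javadoc-style block: documents the function that follows. */
inline int doubled(int n) { return 2 * n; }

/*! Same idea, with an exclamation point instead of the second asterisk. */
inline int tripled(int n) { return 3 * n; }

/// Two consecutive triple-slash lines
/// form one special comment for the struct below.
struct Point {
    int x; /**< Horizontal position; the '<' attaches this comment to x. */
    int y; ///< Vertical position, using the single-line trailing form.
};
```

Which form you pick is a matter of taste; Doxygen treats them the same way.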

The actual tags start with a backslash (\) or the at sign (@); you can use these interchangeably.

So, here's a function definition in a C header file:

/** \brief Generate a hash code from the given key.
 * This function will take a string key and produce
 * a 32-bit hash code using the Q50 algorithm.
 * \param key The source key (user ID)
 * \return Hash code.
 */
long GetHash(CString key)
{
. . .
}

This comment uses three tags:

  • \brief, which provides a brief description of the function.
  • \param, which identifies a single parameter to the function (you may have more than one \param tag).
  • \return, which describes the function's return value.

The \brief tag provides an abbreviated description used in lists. The rest of the comment is used in detailed descriptions of the function. However, I rarely use the \brief tag. Instead, I usually set the JAVADOC_AUTOBRIEF configuration variable to YES. This causes Doxygen to interpret the first sentence of the comment as the brief description; the sentence ends at the first period that is followed by a space or a line break.
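As a small illustration of the automatic brief description, consider the sketch below. The function is hypothetical, and the comments assume JAVADOC_AUTOBRIEF is set to YES in the Doxyfile.

```cpp
#include <cstdint>
#include <string>

/** Sum the bytes of a string. With JAVADOC_AUTOBRIEF set to YES, the
 *  sentence before this one becomes the brief description, and the
 *  remaining text shows up only in the detailed description.
 *  \param text Bytes to add up.
 *  \return The sum, wrapping around at 32 bits.
 */
inline std::uint32_t byteSum(const std::string& text) {
    std::uint32_t sum = 0;
    for (unsigned char c : text) sum += c;  // unsigned char avoids negative values
    return sum;
}
```

No \brief tag appears anywhere in the comment, which is the point of the option.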

There are many different tags available to use within comments (see Table 1 for a partial list). Some of these tags are not frequently used, but are there for compatibility with other documentation systems. You can also use many HTML markup tags to spruce up your documentation. For example, you might use <B> to make a bold element or an <A> to create a hyperlink.

Grouping

You can group related things by defining a group with a unique name. This is simple to do with a \defgroup tag. Each group name must be unique. If you prefer to merge like-named groups, you can use \addtogroup instead of \defgroup.

Once you have a group, you can use \ingroup to place items in that group. Of course, in your typical C++ source file, that means every item would have an identical \ingroup statement. To simplify this common case, Doxygen recognizes that any items defined between @{ and @} tags all belong to the last group you defined. So, for example:

/** \defgroup DDJ Dr. Dobb's Group
@{
*/

/** Subscribe. Processes a subscription request.
\param req A request block.
\return True if successful. False if an error
occurs (check g_error for cause).
*/
Boolean Subscribe(ReqBlock req);

/** Deliver electronic copy.
Sends current copy via e-mail.
\param issue An issue object
\return True if successful. False if an error occurs
(check g_error for cause).
*/
Boolean Deliver(IssueObj issue);

/** @} */

In this example, both Subscribe and Deliver are parts of the DDJ group. Of course, there might be other elements that are part of this group as well. You can also use the same special braces to group related items together (but only to a single level—you can't nest these groups). However, since Doxygen normally groups items together by their semantics, I rarely find this necessary.
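The other two grouping tags work along the same lines. The sketch below uses a made-up "billing" group and made-up functions: one function joins the group with \ingroup, and a second joins through an \addtogroup block.

```cpp
#include <cassert>

/** \defgroup billing Billing
 *  Functions that deal with prices (a hypothetical group).
 */

/** \ingroup billing
 *  Add tax to a price.
 *  \param cents Price before tax, in cents.
 *  \return Price with a flat 10 percent tax added, in cents.
 */
inline long withTax(long cents) { return cents + cents / 10; }

/** \addtogroup billing
 *  @{
 */

/** Apply a discount to a price.
 *  \param cents   Price before the discount, in cents.
 *  \param percent Discount from 0 to 100.
 *  \return Discounted price, in cents.
 */
inline long discounted(long cents, int percent) {
    assert(percent >= 0 && percent <= 100);  // precondition from the comment
    return cents - cents * percent / 100;
}

/** @} */
```

Because \addtogroup merges into an existing group, the second block could just as well live in a different source file.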

Other Features

Doxygen can perform a variety of other tasks. You can write formulae that Doxygen will typeset for you, for example. It can even generate graphs of your program's dependencies, call tree, and class hierarchy. Another facility lets you include example code and test cases in the documentation output.
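For instance, formulas are written in LaTeX notation between \f$ markers (inline) or between \f[ and \f] (displayed on a separate line). The function below is only an illustration; rendering the formulas in HTML output generally depends on a LaTeX installation being available to Doxygen.

```cpp
#include <cmath>

/** Compute the hypotenuse of a right triangle.
 *
 *  Inline form: \f$ c = \sqrt{a^2 + b^2} \f$.
 *
 *  Displayed form:
 *  \f[
 *      c^2 = a^2 + b^2
 *  \f]
 *
 *  \param a Length of the first leg.
 *  \param b Length of the second leg.
 *  \return The length \f$ c \f$ of the hypotenuse.
 */
inline double hypotenuse(double a, double b) {
    return std::sqrt(a * a + b * b);
}
```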

There are many ways to format things in Doxygen. In particular, there are several simplified formats for creating lists. However, since I am comfortable with HTML, I just use HTML list markup instead of Doxygen's custom syntax.
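To compare the two approaches, the hypothetical comment below writes the same list twice: once with Doxygen's dash-style list syntax and once with HTML markup. Both should come out as bulleted lists.

```cpp
#include <string>

/** Modes in which a log file can be opened (illustrative only). */
enum class LogMode { ReadOnly, Append };

/** Translate a log mode into a C stdio mode string.
 *
 *  Supported modes, using Doxygen's own list syntax:
 *  - read-only
 *  - append
 *
 *  The same list in HTML markup:
 *  <ul>
 *    <li>read-only</li>
 *    <li>append</li>
 *  </ul>
 *
 *  \param mode The mode to translate.
 *  \return "r" for read-only, "a" for append.
 */
inline std::string modeString(LogMode mode) {
    return mode == LogMode::ReadOnly ? "r" : "a";
}
```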

It takes a while to get used to all the features Doxygen presents. If you have any doubt that it can generate sophisticated documentation, though, look at the Doxygen manual on its web site. The manual was created using Doxygen (of course)! You probably should start with a simpler project while learning Doxygen, but it is nice to know that the capability to produce something that complex is there once you master the tool.

A C++ Example

To illustrate Doxygen's features, I wrote a simple program in C++ (the program is available electronically; see "Resource Center," page 5) with three classes: Vehicle, Car, and Truck. Figure 3 shows the class hierarchy automatically generated by Doxygen.
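The program itself is not reproduced here, but a minimal sketch of such a hierarchy might look like this. Only the three class names come from the article; every member is an assumption made for illustration.

```cpp
#include <string>

/** Anything that drives on a road. Serves as the abstract base class. */
class Vehicle {
public:
    virtual ~Vehicle() = default;

    /** Report the number of wheels.
     *  \return Wheel count for this kind of vehicle.
     */
    virtual int wheels() const = 0;

    /** Give a short display name.
     *  \return Name of the concrete class.
     */
    virtual std::string name() const = 0;
};

/** A passenger car. */
class Car : public Vehicle {
public:
    int wheels() const override { return 4; }
    std::string name() const override { return "Car"; }
};

/** A cargo truck. Adds a payload rating to the base interface. */
class Truck : public Vehicle {
public:
    /** Build a truck.
     *  \param payloadTons Rated payload in tons.
     */
    explicit Truck(double payloadTons) : payloadTons_(payloadTons) {}

    int wheels() const override { return 6; }
    std::string name() const override { return "Truck"; }

    /** Report the rated payload.
     *  \return Payload in tons.
     */
    double payload() const { return payloadTons_; }

private:
    double payloadTons_; ///< Rated payload in tons.
};
```

Doxygen works out the inheritance relationships from the class declarations themselves, so the comments never have to mention them.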

Before using Doxygen on a project, you can request that it make a configuration file in the directory by running the program with the -g option. This produces a file with various default values that you can edit. In this case, I changed PROJECT_NAME and JAVADOC_AUTOBRIEF. I also turned off the option to produce LaTeX output.
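As a rough sketch, the edited lines in the generated Doxyfile might look like the excerpt below. GENERATE_LATEX is Doxygen's switch for LaTeX output, and the project name is only a placeholder.

```
# Doxyfile excerpt: settings changed from the generated defaults
PROJECT_NAME      = "Vehicle Demo"
JAVADOC_AUTOBRIEF = YES
GENERATE_LATEX    = NO
```

Every other option keeps the default value that the -g run wrote out.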

After you have completed the configuration file, you can run Doxygen with no options to produce the output. The HTML output is in the html directory (although you can change the output directory in the configuration file). Simply open a browser and check it out. Class names are automatically cross-referenced. This allows you to click on a name anywhere in the document and jump to the related documentation.

Extending Doxygen's Reach

If you use C, C++, or even Java, Doxygen is great. But what about other languages? Doxygen can't work with every possible language, but it does have an escape hatch that you can use to make many of its features work with other languages.

In the configuration file, there is an INPUT_FILTER option. You can set this option to a filter program that preprocesses each file before Doxygen reads it. (I also set FILTER_SOURCE_FILES to YES, which applies the same filter to any source listings Doxygen generates.) In my case, I wanted to write a filter to translate PBasic code (a dialect of Basic) into something that was close enough to C that Doxygen could create documents from it.
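A minimal sketch of the relevant Doxyfile lines follows. The script name pbasic.awk is a placeholder for wherever the filter is saved. As the Doxygen manual describes it, the filter command is run with the input file name appended, and Doxygen parses whatever the command writes to standard output.

```
# Doxyfile excerpt: run every input file through the awk filter
INPUT_FILTER        = "awk -f pbasic.awk"
FILTER_SOURCE_FILES = YES
```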

Of course, some Doxygen features won't make sense with a PBasic source file. There are no classes, for example, so there is no class hierarchy. In addition, Doxygen's ability to show you source files will be useless since the source files won't look like the ones the end user sees. The key is to customize the configuration file so that Doxygen doesn't generate unwanted output.

Fortunately, Doxygen is forgiving and you don't have to do much translation to make a file "C-like"—at least enough to get meaningful documentation. I wrote my filter using awk (see Listing One). It handles a few transformations:

  • Basic comments become C comments.
  • Constant declarations (CON) become C-style initializers.
  • Variable declarations (VAR) become C-style declarations.
  • Each line receives a semicolon at the end.
  • Labels become functions that end with the first GOTO or RETURN encountered.
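To make the first few transformations concrete, here is a simplified C++ sketch of what happens to a single line. It is not a port of the awk filter in Listing One: it ignores labels and whole-line comment merging, keeps only the first token after CON or VAR, and the function names are invented for illustration.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Uppercase copy of a string, used for case-insensitive keyword checks.
inline std::string upper(std::string s) {
    for (char& c : s)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return s;
}

// Translate one PBasic line into a C-like line.
//   name CON value   ->  name = value;
//   name VAR type    ->  type name;
//   anything else    ->  the same text plus a semicolon
// A trailing ' comment becomes a C comment after the statement.
inline std::string toCLike(const std::string& line) {
    std::string code = line;
    std::string comment;
    const auto tick = line.find('\'');  // a Basic comment starts at the apostrophe
    if (tick != std::string::npos) {
        code = line.substr(0, tick);
        comment = " /*" + line.substr(tick + 1) + " */";
    }

    std::istringstream in(code);
    std::string name, keyword, rest;
    in >> name >> keyword >> rest;

    std::string out;
    if (upper(keyword) == "CON") {
        out = name + " = " + rest + ";";
    } else if (upper(keyword) == "VAR") {
        out = rest + " " + name + ";";
    } else {
        // Drop trailing blanks, then terminate any non-empty statement.
        const auto last = code.find_last_not_of(" \t");
        if (last != std::string::npos) out = code.substr(0, last + 1) + ";";
    }
    return out + comment;
}
```

For example, the line `count VAR Byte ' loop counter` comes out as `Byte count; /* loop counter */`, which is close enough to a C declaration for Doxygen to document.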

The processing of the labels is probably the oddest part of the transformation. Although the results are not always perfect, they are usually good enough to get a meaningful document from the source code and that's the goal.

I've included a library (available electronically) that I wrote to interface the Basic Stamp to a floating-point math coprocessor and the HTML output from Doxygen. Although you would not want to view the filtered source code, the output from Doxygen is quite useful and that's what counts (you can see some of the output in Figure 2).

Wrap Up

Too often, documentation is left for the last part of a project, which means it is usually the first thing to get cut when deadlines loom or newer projects start clamoring for attention. With tools like Doxygen, you can document "just in time" (to borrow a management phrase). Not only does this make document creation less tedious, it also helps other developers use your code during the project. Because adding Doxygen comments costs so little, there isn't much to lose and quite a bit to gain.

If you work by yourself, you might be tempted to simply not document. When I feel that temptation, I think about three things:

  • In six months, I won't remember what I did or why I did it.
  • What if someone else has to wade through my code because I'm out of town or ill?
  • Well-documented code is an asset that I could sell if I had to raise cash. Undocumented code is certainly not a salable asset and could even be a liability.

At first glance, you may think Doxygen isn't as useful for user-level documentation as it is for developer documentation. To some extent, this is true. However, as the Doxygen manual demonstrates, you can produce actual user documentation with a little effort. Of course, its true strength is in developer documentation. Doxygen should be an essential part of every programmer's toolkit.

DDJ



Listing One

BEGIN {
  cflag=0;  # comment in progress
  ocflag=0; # old comment in progress
  inblock=0;  # block flag
  afterbrace=0; #brace at end?
}

# user function to make a string have no regexp metachars
function escape(s) {
  gsub(/([\[\]\\\^\$\.\|\(\)\*\+\?\{\}])/,"\\\\&",s);
  return s;
  }


# Transform constants: x con y => x = y (keyword match is case-insensitive)
toupper($2)=="CON" {
  a=$1
  b=$2
  c=$3
  apat=escape(a)
  cpat=escape(c)
  # Rewrite "a con c" in one step so a name that contains "con" stays intact
  sub(apat "[ \t]+" b "[ \t]+" cpat,a " = " c);
}

# Transform x var y => y x; (should preserve comments)
toupper($2)=="VAR" {
  a=$1
  b=$2
  c=$3
  apat=escape(a)
  cpat=escape(c)
  # Rewrite "a var c" in one step so a name that contains "var" stays intact
  sub(apat "[ \t]+" b "[ \t]+" cpat,c " " a);
}


# Assume this is not a comment
  {
    cflag=0;
  }


# Whole line comment, start or extend
/^[ \t]*'/ {
  cflag=1;
# start
  if (ocflag==0)  sub("'","/*"); else sub("'","");
  ocflag=1;
  print;
  next;
  }

# process partial line comments and semicolons
  {
  if (!/[ \t]*#/) {
  if (/'/) {
    sub("'","; /*");
    $0=$0 " */"
  } else {
    if (!/:[ \t]*$/) $0=$0 ";"
    if ($0==";") $0=""
    }
   }
  }
   
# process all lines  
  {
  if (ocflag==1) {  # close pending comment
     print "*/";
     ocflag=0;
     }
  }

# handle label
match($1,/.*:/) {
  if (inblock==1) print "}";
  sub(":","",$1);
  print $1 "() {";
  inblock=1;
  sub($1,"");
}

# handle end of quasifunction
toupper($1)=="GOTO" || toupper($1)=="RETURN" {
  inblock=0;
  afterbrace=1;
}

 {
  print;
if (afterbrace==1) { 
  print "}";
  afterbrace=0;
   }
  }