Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

.NET

Unraveling Optimization in Microsoft C 6.0


OCT90: UNRAVELING OPTIMIZATION IN MICROSOFT C 6.0

Selecting the right switch for the right job

Bruce has worked in the computer industry for more than ten years, holding a variety of technical and marketing positions at corporations including General Dynamics, Tektronix, and Xerox. He is currently an independent consultant and can be contacted at P.O. Box 5703, Bellevue, WA 98006.


Optimizing compilers can be both a blessing and a curse. Certainly, we are all happy to have our code automatically transformed into smaller, faster packages: We spend less time sifting through long source listings to uncover inefficiencies, we can write code in more readable form, and we don't need to resort to assembly language for optimum speed as often.

However, automated code transformation has its drawbacks, including the tendency to decrease diligence on the part of some programmers. There is a temptation to get sloppy because "the compiler will take care of it." The compiler does, indeed, handle many inefficiencies, but not all of them. Understanding exactly what a compiler will do to your code (and what it will not do) is an important step toward producing software that displays both speed and readability. In short, optimizing compilers are a supplement to hand optimization, not a substitute for it.

This article focuses on both the practical and theoretical aspects of code optimization, using Microsoft's recently introduced C Version 6.0 compiler (C6) as the tool.

You usually have several goals in mind when adopting an optimizing compiler. Among them may be to:

  • Decrease the number of processor cycles required by the program.
  • Decrease the number of address calculations performed.
  • Minimize the number of memory fetches.
  • Minimize the size of the object code.
In all optimizing compilers, most of these goals are achieved primarily through "local" optimizations. Local optimizations are improvements to small, mostly self-contained blocks of code such as loops, switches, and if-else constructs. Despite all the hype from compiler vendors on "global optimization," most code transformations are still performed at the local level. Typical local optimizations include elimination of common subexpressions and dead variables, movement of loop-invariant computations, and folding of constants into a single value.

In general, local optimizations are fairly easy to perform, mostly because they are easy to detect and implement. Compilers have little trouble fixing such local inefficiencies because they are reasonably independent of the program's flow of control on a macroscopic level. The behavior of variables and expressions within small blocks is, for the most part, predictable; the number of possible values taken by these items, as well as how they fit within the flow of control, can be understood.

However, programs contain more than just a series of isolated blocks, and until recently, commercial compilers made almost no effort to optimize across entire routines. In an attempt to produce still more efficient code, compilers are now appearing on the market with the ability to perform "global" optimizations. This type of optimization looks at code on a broader scale, seeking inefficiencies that affect multiple blocks. This task is much more difficult to achieve because of the more complex data flow analysis that must be performed by the compiler.

Strategies for Using C6 Optimizations

Although somewhat of a simplification, the "90/10" program execution rule is generally true. That is, 90 percent of a program's execution time is spent in about 10 percent of its code. Optimizing a routine to achieve a 50 percent speed increase sounds impressive, but if your program spends an average of 110 microseconds in that routine for every minute of execution time, the results are hardly noticeable or worth the time to achieve. Simply turning on a variety of optimizing switches (such as with /Ox) at compile time is likely to yield only modest improvements in speed. A small amount of extra work can give you much better results. These results can be achieved, following the 90/10 rule, by implementing an optimization strategy that focuses only on critical sections of code that represent the greatest amount of "execution drag." Uncovering these critical portions of code can be done with a good profiling tool. Unfortunately, C6 does not provide such a tool. For this article, I used Inside! profiler (Paradigm Systems, Endwell, N.Y.) to produce a number of useful execution statistics.

Paradigm's profiler is available in a number of versions that are compatible with debug information and map files generated by C6/QuickC, Turbo C, Zortech C++, and other non-C compilers. This tool has several strengths, including the ability to provide timing statistics at the function, block, and source-line levels. Some profilers can identify which functions are consuming the most cycles, but will provide no insight into which elements within a function represent candidates for optimization. Block- and line-level analysis enables the programmer to identify exactly where cycles are being consumed within a function, eliminating guesswork and saving time.

Inside! also has the ability to isolate time spent during DOS calls, enabling you to measure the impact of servicing various DOS interrupts. It is often the case that DOS I/O represents the largest percentage of execution time within a function. For example, if a loop processes 100 strings and writes each to a file using fputs( ) on every loop iteration, there is no question that this consumes an enormous amount of cycles. A much better strategy would be to buffer all 100 strings in memory during the loop and write them as a block using a single fputs( ) function after the loop completes. Optimizations such as this can yield execution times that are orders of magnitude faster. Profilers make optimizations such as this more obvious.

For maximum performance, the following strategy is recommended, assuming your code is already in a reasonably stable state:

    1. Disable all optimizations on your compiler (/Od) and produce an executable.

    2. Use a profiler to determine which functions consume the majority of cycles. Most profilers provide a "percentage of total runtime" value for each function. Record all statistics in a file.

    3. Recode the major algorithms in these functions (if possible) for more efficiency, and make local and global optimizations (such as elimination of redundant subexpressions and unfolded constants) by hand.

    4. Declare all major functions with the _fastcall keyword, and use based pointers for access to far code and data. Elimination of aliasing, if possible, will enable more aggressive compiler optimization when you get to that stage.

    5. Produce another executable, again with compiler optimization disabled, and run your profiler. Compare the new statistics to your previous values to measure progress.

    6. Compiler optimizations can now be used. Place pragma-level switches in those files containing the cycle-hungry functions, and/or use option file-level switches where appropriate.

    7. Run the program several more times under control of the profiler using different combinations of pragmas and/ or switches to determine which yield the best results. Save all profiler statistics, and compare these with the original statistics to ensure that the compiler is helping, not hurting. A "switch table" (see Table 1) will help you record your times.

Table 1: The "switch" table

  Switch    d        l        c        g        e        eg
  ------------------------------------------------------------

  Listing
  One       385.3    234.2    279.3    215.7    239.9    189.3
  Two       312.7    228.0    241.1    208.0    207.4    179.5
  Three     231.2    167.8    172.6    164.9    149.0    145.4

  Switch    lg        le      leg      x        a        al

  Listing
  One       258.3    196.2    177.0    180.2    279.3    234.2
  Two       259.9    185.6    175.3    181.9    241.1    227.9
  Three     203.5    139.7    138.0    140.4    172.5    167.8

  Switch    ag       age      aleg     ax       axz      zl

  Listing
  One       215.8    189.3    177.0    180.1    180.1    234.2
  Two       208.1    179.5    175.3    181.9    181.9    228.1
  Three     164.8    145.4    138.0    140.4    140.4    167.9

  Switch    zlg      axz G2

  Listing
  One       258.3    177.2
  Two       259.9    180.8
  Three     203.6    144.0

    8. Optimize all of the minor routines of the program using only file-level switches or pragmas. Hand-optimization of these minor routines will probably not be worth the effort.

Although this approach may seem overly thorough, it usually yields the best results. Step 3 might seem to defeat the purpose of an optimizing compiler, but there is no amount of compiler optimization that can compensate for a poorly designed algorithm. Typi cally, the best that can be expected from an optimizing compiler is a 10-50 percent improvement in speed for well-written code. Recoding an algorithm may boost performance many times this amount.

The Role of Aliasing in Optimization

Most compilers have trouble dealing with aliased variables. Using aliases will inhibit most of the optimizations that could potentially be performed -- and there are many. The /Oa switch is one of the most effective optimizations because it enables many local code improvements that otherwise could not be done.

Consider the code in Example 1. In this example, *p is an alias of the variable x. In MS C5.1, if the /Oa option is used during compilation, the compiler will assume that no variables have been aliased (which is not true). The expression x+y+z will be seen by the compiler as being constant (or "loop-in-variant") for the duration of the loop. There is thus (apparently) no need to compute x+y+z 100 times, and it will be hoisted out of the loop and replaced with its assumed constant value. However, the assignment of k to *p (which is really x), means that x+y+z is not constant, and the program's results will be incorrect when executed. This problem would not have happened if aliasing had been declared (/Oa is not specified), because subexpressions are not factored (hoisted) from loops in this case.

Example 1: *p is an alias of the variable x.

  void main (void)
  {
           int x, y, *p;
           p = &x;
           x = 1; y = 2; z = 6
           for(k = 1; k <= 100; k++)  {
                                       k += x + y + z;
                                       *p = k;
              }
  )

More Details.

C6 does a much better job of handling aliased variables. In the code sample, even if the programmer specifies /Oa at compile time, the C6 preprocessor will notice that the address of x has been taken and will assume that its value might not be constant within the loop. Because y and z have not had their address taken, the compiler assumes that these variables are constant, and will remove them while leaving x correctly within the loop. C6 therefore makes the /Oa option safer than its former implementation in C5.1.

In some cases, aliasing can occur without the explicit declaration of the address operator (such as when two pointers are equated), and the preprocessor will not detect it. However, if an option is selected that invokes the global optimizer (such as /Oe, /Og, or /Ol), the optimizer itself will perform a data flow analysis that can detect aliases which occur without the address operator. Thus, switch combinations such as /Oae and /Oal represent an additional layer of safety above /Oa.

It should be mentioned that the "a" and "z" options have been eliminated from /Ox in C6. Microsoft explains that this step was taken to increase the safety of /Ox. For some code, /Ox might actually produce slower execution times in C6 than in C5.1. However, /Ox will also invoke the new global optimizer, meaning that execution times may still be faster in C6, even with the elimination of "a" and "z." Simply specifying /Oxaz will include these two switches during compilation.

Another step Microsoft has taken to deal with the aliasing problem is the new /Ow switch, which tells the compiler to assume no aliasing except when calling functions (which typically pass pointers). This switch flushes all variables held in temporary locations (such as registers) back to memory before each function call, producing only one instance of a variable that can be modified by a function. This prevents the assignment of two different values to the same variable -- one in memory and one in a register.

The Effectiveness of Compiler Switch Combinations

Considering the wide variety of switches and options in today's compilers, how do you know which switches are appropriate for a particular program? Some may yield good results, while others are disappointing. Because of the infinite variety of programs, there is no chart that indicates which combination of switches should be optimal for your software. However, there are a number of rough guidelines that can be followed. Some sample code will help illustrate the process.

Consider Listings One through Three (page 104). These are all different forms of a program that approximates the definite integral of three functions and prints the results. The three functions are:

y(x) = 2x<SUP>2</SUP> + 46x + 10
y(x) = 3x + 54
y(x) = x<SUP>3</SUP> - 72x<SUP>2</SUP> + 14

The definite integral may be thought of as the area between the curve of the function (as plotted in the xy plane) and the x axis. This particular program measures the area for the interval between x = 0 and x = 100. The area is approximately equal to the sum of 100 rectangles, each of length 1 and height y(x), where x is the midpoint of the base of each rectangle.

Listing One was designed to test several optimization features. The bulk of the test is provided by the compute_areas function, which contains the following challenges:

  • There are more register variable candidates than there are registers.
  • There are several subexpressions (such as a*b) that are both locally and globally redundant.
  • Several constant expressions (such as a*b+c+d) are not folded into a single value.
  • The program is loop oriented, and some of the inefficiencies center around these loops.
Notice that none of the variables in this listing have been explicitly declared as "register." The algorithm itself is also inefficient. Three of the loops can really be combined into one, and there is no need to sort the array elements in order to compute an integral approximation. In addition, two of the loops are repeated 100 times in order to provide more significant digits in the output from the profiler. They are not meant to test the compiler.

Listing Two is an attempt to anticipate compiler optimizations and perform them by hand. None of the algorithmic inefficiencies are fixed. Listing Three combines the improvements made in Listing Two, and makes one algorithmic enhancement --two of the loops are combined into one.

Table 1 lists the execution times for different combinations of switches on each code listing. The test was performed on a Compaq 386/20 with no floating-point hardware (which would not have been used anyway on this integer-based code). Test results are in milliseconds, and the accuracy of the profiler is within 10 microseconds (+/- .01 millisecond). The last digit was rounded to the nearest hundredth of a millisecond.

In most cases, the test results were predictable, but there are some surprises. The predictable results are as follows:

  • On the average, the optimizations performed by hand in Listing Two did not help much when it came to execution time. At least this proved that the compiler did its job well in Listing One, and that certain hand optimizations are a waste of time.
  • Loop optimization (/Ol) almost never improved performance when used in tandem with g or c. Because the major problem with this program's loops is redundant subexpressions, loop optimizations are effectively equivalent to reduction of common subexpressions.
  • Specifying /G2 (use 80286 instruction set) did not improve performance. I suspect that this was because the generated code was fairly simple and handled equally well by the 8086 instruction set.
  • The /Og switch (global subexpression reduction) was much more effective than /Oc (local subexpression reduction). For this code, c optimizations were a subset of g optimizations.
The surprises were as follows:

  • It is not surprising that the so-called "maximum optimizations" such as /Ox and /Oaxz /G2 did not produce the fastest times. The surprise is that they were beaten slightly by /Oleg and /Oaleg. (I am unable to explain this because /Ox supposedly includes the /Oleg optimizations. It may be useful to try /Oleg on loop-intensive routines to see if it outperforms other optimizations.) Although only two to four percent better than the maximum, this extra performance may be critical in some applications such as animation, where every millisecond is important.
  • Although not shown in the matrix, I declared all integer variables as "register" in all three listings and ran the tests. Performance was equal to or worse than the times shown in the matrix for all switches. I am unable to explain this.
  • The /Oe option (global register allocation) was extremely effective. Specifying only /Oe produced very fast times. Generally, any combination of switches that included option e outperformed any other combination of switches without option e.
These code listings are isolated examples. Although this was not a benchmark, C6 performed admirably, providing up to a 54 percent increase in performance over unoptimized code. Different code will undoubtedly give different results, and experimentation is important.

Conclusion

C6 is much more than just a code generator. It is a powerful tool that requires a certain amount of skill to use effectively. The popular technique among C5.1 programmers of consistently using the /Ox option (which enables many optimizations at once) has less merit in C6. The point is that optimization is the responsibility of both the compiler and the programmer. Programmers who understand that compilers are really a supplement to hand optimization, rather than a substitute for it, will benefit the most. Experimentation with these new optimizing features may be time consuming, but in the end, it is very much worth the effort.

Types of Optimizations Provided by C6

Among the more important optimization features in Microsoft C 6.0 are pragma-level optimization, global optimization, new peep-hole optimizations, based pointers, and in-register parameter passing (_fastcall). With the exception of based pointers (which are described in DDJ, August 1990, page 85), I'll briefly discuss each of them.

Pragma-Level Optimization

In C 5.1, some optimizations had to be disabled because the behavior of the optimized code could be unpredictable. For example, imagine a program file that consists of three functions, the first of which uses aliased variables. In this case, you must disable alias-sensitive optimizations across all three functions, and the ability to optimize the two functions which do not use aliasing is lost.

In C6, you can fully optimize the two non-aliased functions while still preventing alias-sensitive optimizations within the first function. This is accomplished by placing new "pragma" statements within the program module.

Pragmas are not optimizations per se, but are control directives parsed by the compiler. With pragmas, virtually all optimizing switches can be toggled on or off at the function level. You can also use these constructs to toggle combinations of switches such as #pragma optimize("lec", off), which will turn off loop optimization, global register allocation, and reduction of common subexpressions.

Of the many new optimization-related enhancements offered by C6, pragmas could have the greatest overall impact on program performance. They allow a finer level of control over the compiler than was possible in C5.1, and take greater advantage of the programmer's own knowledge of the code.

Global Optimization

With C6, programmers can use the /Oe switch, which tells the compiler to manage register variables on a function-wide basis. When this switch is specified on the command line (or in a pragma), the frequently used register keyword is ignored, and the compiler takes over. Although it is a good bet that this switch will result in speed improvements, it is a mistake to assume that the compiler can always do a better job of register-variable allocation than you can. For functions that already declare register variables, use a good profiling tool to measure the program's performance both with and without the /Oe switch in order to verify that performance really does improve. The same advice applies to any optimizing operations -- verify to be sure. Another C6 global optimization factors common subexpressions across entire functions. The corresponding switch is /Og, which differs from its predecessor (/Oc) by searching for such subexpressions across entire functions. Consider this line of code: x = y[k] + (c * d + 100 * e); If the subexpression (c * d + 100 *e) is found to exist within two loops, three if-else constructs, and two switch-case blocks, it will not be factored 2 + 3 + 2 = 7 times, but just once. This, of course, is if the subexpression is completely invariant. If it is variant, it will be factored by the number of variances encountered.

Peephole Optimizations

In the world of compilers, a peephole optimization is simply a specialized improvement on a small (but potentially important) piece of code, typically just a few instructions. This is where assembly language expertise pays large dividends. Peephole optimization is performed by looking for special code patterns that may be replaced with more efficient equivalents. A simple example is "constant folding" which can be illustrated with this line of code: x =

12*(y + 25) - z*(y + r + 9 + 20) + 10;

The three computations which can be performed by the compiler before the program is actually run are: 12 * 25 = 300, 9 + 20 = 29, and 300 + 10 = 310. A good compiler will replace this code with its more efficient equivalent: x = 12*y + 310 - z*(y +r + 29); This optimization saves one multiply and two adds. In a loop of 100 iterations, this amounts to 100 multiplies and 200 adds, not to mention the associated register operations. C6 includes bit-shifting and switch operations. Code blocks having these two types of statements should run faster in C6 than in C5.1.

In-Register Parameter Passing

To minimize the large amount of memory I/O incurred by a function call, C6 includes the _fastcall convention which generates faster code by passing selected arguments through registers, rather than pushing and popping from the stack. _fastcall is a "strongly typed" calling convention, meaning that the register selected for a particular variable is related to the variable's data type (which defines its size).

_fastcall produces different results depending on the function. Functions passing only a few characters and/or integers will probably see the greatest speed increases because all parameters can be passed via registers. You control which functions should use the _fastcall convention by placing the keyword in the function declaration. For example: int_fastcall func(int, long, long, int, long); This feature can also be implemented on a file-wide basis with the /Gr option, which converts each function in your file to the _fastcall convention. It should be noted that _fastcall cannot be used with functions having variable-length argument lists, nor can it be used with functions containing _asm blocks. -- B.D.S.


_UNRAVELING OPTIMIZATION IN MICROSOFT C 6.0_ by Bruce D. Schatzman

[LISTING ONE]

<a name="0204_0011">

#include <stdio.h>
long area[3];
void compute_areas(void);

main()
{
   int i;
   extern long area[3];
   compute_areas();
   for(i = 0; i <= 2; i++) printf("The approximate area of function %d is %ld\n", i, area[i]);
}

void compute_areas(void)
{
   long temp, y0[100], y1[100], y2[100];
   extern long area[3];
   int i, k, a, b, c, d, n, m, j;

   a = 2; b = 5; c = 1; d = 3;
   area[0] = area[1] = area[2] = temp = 0;

       /*  compute 100 initial y values for all functions and approximate area under curves. Do this 100 times. */
   for (i = 0; i <= 99; i++) {
      for (j = 0; j <= 99; j++) {
         n = j-1;
         m = j+1;
         y0[j] = a*j*n + 48*j - (c+d);
         y1[j] = d*(j+10) + a*b;
         y2[j] = c*j*m + 73*j*j + n;
      }
   }
/* add a*b+c+d to all y values. Do this 100 times. */
   for (i = 0; i <= 99; i++) {
      for (j = 0; j <= 99; j++) {
         y0[j] += a*b+c+d;
         y1[j] += a*b+c+d;
         y2[j] += a*b+c+d;
      }
   }

/* bubblesort each array */
   for (i = 0; i <= 99; i++) {
      for (k = 99; k >= 1; k--) {
         if (y0[k-1] > y0[k]) {
            temp = y0[k];
            y0[k] = y0[k-1];
            y0[k-1] = temp;
         }
         if (y1[k-1] > y1[k]) {
            temp = y1[k];
            y1[k] = y1[k-1];            
            y1[k-1] = temp;
         }
         if (y2[k-1] > y2[k]) {
            temp = y2[k];
            y2[k] = y2[k-1];
            y2[k-1] = temp;
         }
      }
   }

   /*  now compute areas */
   for (j = 0; j <= 99; j++) {
      area[0] += y0[j];
      area[1] += y1[j];
      area[2] += y2[j];
   }

   return;
   }




<a name="0204_0012"><a name="0204_0012">
<a name="0204_0013">
[LISTING TWO]
<a name="0204_0013">

#include <stdio.h>
long area[3];
void compute_areas(void);

main()
{
   int i;
   extern long area[3];
   compute_areas();
   for(i = 0; i <= 2; i++) printf("The approximate area of function %d is %ld\n", i, area[i]);

}
void compute_areas(void)
{
   long temp, y0[100], y1[100], y2[100];
   extern long area[3];
   int i, k, a, b, c, d, n, m, j;

   a = 2; b = 5; c = 1; d = 3;
   area[0] = area[1] = area[2] = temp = 0;

 /*  compute 100 initial y values for all functions and approximate area under curves. Do this 100 times. */
   for (i = 0; i <= 99; i++) {
      for (j = 0; j <= 99; j++) {
         n = j-1;
         m = j+1;
         y0[j] = a*j*n + 48*j - 4;
         y1[j] = d*j +40;
         y2[j] = c*j*j*m - 73*j*j;
      }
   }

/* add 14 to all y values. Do this 100 times. */   
   for (i = 0; i <= 99; i++) {
      for (j = 0; j <= 99; j++) {
         y0[j] += 14;
         y1[j] += 14;
         y2[j] += 14;
      }
   }

/* bubblesort each array */
   for (i = 0; i <= 99; i++) {
      for (k = 99; k >= 1; k--) {
         if (y0[k-1] > y0[k]) {
            temp = y0[k];
            y0[k] = y0[k-1];
            y0[k-1] = temp;
         }
         if (y1[k-1] > y1[k]) {
            temp = y1[k];
            y1[k] = y1[k-1];
            y1[k-1] = temp;
         }
         if (y2[k-1] > y2[k]) {
            temp = y2[k];
            y2[k] = y2[k-1];
            y2[k-1] = temp;
         }
      }
   }

   /*  now compute areas */
   for (j = 0; j <= 99; j++) {
      area[0] += y0[j];
      area[1] += y1[j];
      area[2] += y2[j];
   }

   return;
}



<a name="0204_0014"><a name="0204_0014">
<a name="0204_0015">
[LISTING THREE]
<a name="0204_0015">

#include <stdio.h>
long area[3];
void compute_areas(void);

main()
{
   int i;
   extern long area[3];
   compute_areas();
   for(i = 0; i <= 2; i++) printf("The approximate area of function %d is %ld\n", i, area[i]);
}

void compute_areas(void)
   {
   long temp, y0[100], y1[100], y2[100];
   extern long area[3];
   int i, k, j;

   area[0] = area[1] = area[2] = temp = 0;

 /*  compute 100 initial y values for all functions and approximate area under curves. Do this 100 times. */
   for (i = 0; i <= 99; i++) {
      for (j = 0; j <= 99; j++) {
         y0[j] = 2*j*j - 50*j +10;
         y1[j] = 3*j + 54;
         y2[j] = j*j*j - 72*j + 14;
      }
   }

/* bubblesort each array */
   for (i = 0; i <= 99; i++) {
      for (k = 99; k >= 1; k--) {
         if (y0[k-1] > y0[k]) {
            temp = y0[k];
            y0[k] = y0[k-1];
            y0[k-1] = temp;
         }
         if (y1[k-1] > y1[k]) {
            temp = y1[k];
            y1[k] = y1[k-1];
            y1[k-1] = temp;
         }
         if (y2[k-1] > y2[k]) {
            temp = y2[k];
            y2[k] = y2[k-1];
            y2[k-1] = temp;
         }
      }
   }

   /*  now compute areas */
   for (j = 0; j <= 99; j++) {
      area[0] += y0[j];
      area[1] += y1[j];
      area[2] += y2[j];
   }

   return;
}


[Example 1.  ]

void main(void)
{
   int  x, y, *p;
   p = &x;
   x = 1; y = 2; z = 6
   for(k = 1; k <= 100; k++) {
         k += x + y + z;
         *p = k;
   }
)












Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.