Dr. Dobb's | Footprints in the Butter: Part II

Footprints in the Butter: Part II

In his previous installment, Matthew focused on code bloat in source code. This month, he looks at object code size.

September 01, 2005
URL:http://www.drdobbs.com/footprints-in-the-butter-part-ii/184402011

September, 2005: Footprints in the Butter: Part II

Matthew Wilson is a software development consultant for Synesis Software, creator of the STLSoft libraries, and author of Imperfect C++ (Addison-Wesley, 2004). He can be contacted at http://stlsoft.org/.

Recall that I said in the previous installment of "Positive Integration" that I'd be focusing on the topic of code bloat in this two-part column. In the last installment, I examined source-code size. This month, I look at object-code size.

Binary Size

The first thing to do to trim the binary size was to establish a firm basis for comparison. The 1.5.3 and 1.6.2 libraries were built using the same settings. Tables 2 and 3 show the results. (Table 1, which lists the recls src directory contents appeared in the previous installment.)

The settings used for the five compilers were for minimal size and global/whole-program optimization: -O1 -Oz (Borland); -o+space (Digital Mars); -O1 -Og -GL (Intel); -O1 -Og (Visual C++ 6.0); and -O1 -Og -GL (Visual C++ 7.1).

Given that the 1.6.2 test files contain additional code for UNC paths and the new stat() functionality, the results demonstrate that there's no effective difference for the generated binaries. The library size has dropped, but the size of a statically linked library is not, in and of itself, important, rather the size of the symbols that are selected from it into a binary, hence the comparison of the C and C++ test files.

The values in Table 3 represent a base for all remaining figures, which is expressed relative to these. To make these meaningful, you need to find out how much of the sizes of the executables are a result of the client code and the runtime libraries, and how much to the recls library components linked into the executable. This was done by creating a single file containing empty implementations of all the recls API functions, and linking the compiled client C and C++ program object files to that stub. Table 4 shows the sizes of the executables when linked to the stub API file.

String Handling

Judging from Scenario A (Table 4), it looks like that code rationalization hasn't yet paid off. The first thing is to consider the effects of linking to standard libraries. Specifically, the recls libraries by default compile using the STLSoft basic_simple_string, to minimize coupling to the Standard Library so that they can be used in small footprint exes and dynamic libraries. (For example, recls is incorporated as a small part of a standard DLL of my company, built with Visual C++ 5 and not linking to the Standard Library, and coming in at less than 70 KB.)

But because the C++ test program links to the Standard Library, the instantiations of the member functions of the std::basic_string template are already paid for, so the next test Scenario B (Table 5) is to determine the effect of using std::basic_string throughout. Table 5 shows the results. (For this and the remaining scenarios, the percentage figures in the fourth and sixth columns represent the relative sizes of each test client's code after subtracting the "stub-size" in Table 4. In other words, it's a good indicator of the actual size of code in the executable that is part of recls itself, rather than the test program as a whole.)

Except Borland, for which there is a significant saving in the use of std::string, the other compilers show an increase, rather than a reduction, in binary code size. (In several cases, tables averages with and without Borland are presented because it tends to have widely different results to the other compilers, and can skew the results.)

Pattern Validation

In playing around with applying #if 0 here and there, I discovered some surprising amounts of code bloat. Amazingly, in Listing 1, when #ifdef'd out, it reduces the object code size of the containing file by 24 KB (with VC++ 6). Listing 1 is used to ensure that a multipart pattern does not contain a dots directory. This got me to experimenting with the pattern validation. The first thing I tried was using STLSoft's basic_string_view, which is a string-like class template that presents a view onto a C-style string, rather than allocating and writing its own copy. It's perfectly suited to the code previously mentioned because the results of the tokenization are not preserved, just manipulated within the find() algorithm. Using basic_string_view instead costs a slight increase (Scenario C, Table 6).

So there's 20 KB or so of code to go at. Changing the implementation to one based purely on strstr() and strncmp() results in Table 7 (Scenario D).

That's a significant gain in the size of the libraries—about 7 percent—but a modest gain in the executables—about 3 percent. But we're pretty much down to the maximum gain for that particular area of the code because removing pattern validation entirely (Scenario E, not shown) results in only an approximate additional 1 percent saving across the board.

Customization Versus Generality

Looking at the individual object sizes gives a clearer picture of where the bloat is, as can be seen in Figure 1. The left-most data point for each file represents the average object size for the base 1.6 build (Table 3). Clearly, ReclsFtpSearchDirectoryNode_win32 has the largest footprint.

In the September 2004 installment of this column, I described the use of the searchspec_sequence template:

[M]ultipattern filtering is handled via the...STLSoft searchspec_sequence class. It is parameterized with the underlying sequence type whose model it emulates...and uses the string_tokenizer template to split the pattern. The iterators that it presents...iterate over a number of ranges of the underlying search sequences, corresponding to the number of patterns. In other words, it linearizes N ranges into a single one...I wanted to be able to insert a sequence class into the recls implementation with minimal disruption to the existing, and working code. Because searchspec_sequence presents a functionally identical interface to its underlying sequence, this was almost a no brainer.

The searchspec_sequence is an adapter that presents a view spanning multiple traversals of an underlying filesystem sequence. Unfortunately, this generalization suffers from code bloat to a significant degree. Scenario F (Table 9) uses an updated implementation for inetstl::basic_findfile_sequence that tokenizes the search patterns internally, based on a delimiter passed to the constructor along with the other arguments. By skipping the adapter searchspec_sequence you save on object code size and on runtime cost, at the expense of having to write an enhancement to the basic_findfile_sequence class template. Table 8 shows the significant size savings to be had, roughly about 17.5 percent in object code size.

This translates to about 4 percent saving on the library size and around 7 percent for the C++ test program (see Table 9); the C test program does not exercise the recls FTP functionality.

Best Cases

Combining all these advances, you arrive at Scenario G (Table 10), which represents a saving of about 10 percent in library size and 3-9 percent in executable sizes.

Scenario H (Table 11) summarizes the relative gains for Scenarios B-F, in terms of the library, C, and C++ test programs, with and without Borland. I've managed to make savings of a total of about 10 percent on library and test programs.

Miscellaneous Optimizations

There are other optimizations possible, such as the 195 bytes saved by collapsing the switch in lookup_error_string_() (in recls_api.cpp) into a struct table, but such things are grasping at straws, and are probably below the limit at which point it's worthwhile. When you've reached such measures, it's usually clear that a bigger answer is needed.

Conclusion

In broad terms, you can see that:

Using stlsoft::basic_string_view costs about 1 percent in code size. Because it only comes into play during pattern validation at the start of a search, any memory/speed efficiencies are irrelevant, so it is not used. If it were code that was invoked repeatedly throughout the search process, then the increased speed efficiency would be worth the slight increase in code size, but as it is, it's best left out.
Using std::basic_string instead of stlsoft::basic_simple_string costs 4-8 percent. The library will continue to use stlsoft::basic_simple_string by default and provide customizability for compilers, such as Borland, for which the opposite relation holds.
Manual pattern validation saves 1.5 percent. Because it can be implemented with relative ease in terms of Standard C functions, it will be adopted.
Using the enhanced InetSTL findfile_sequence saves up to 7 percent, so it will be used henceforth. Coupled with the fact that the new implementation also eliminates the memory allocation and the complexity of the searchspec_sequence adapter, I intend to fit this new functionality into the UNIXSTL and WinSTL components, for (local) filesystem searching. You can reasonably guestimate that this brings an additional reduction in code size of, perhaps, 3-6 percent.

In some senses this activity has succeeded, and in others, it has failed. Where it's failed is in bringing significant reductions in source code size. In part, this has been because the refactoring has naturally caused some parts of the software to be segregated into separate files. Each source file has nontrivial overhead, and rightly so—trimming this down to the bare minimum would mean that the source would become impenetrable. We must accept that a medium-sized library, such as recls consists of a large amount of source code, even when it compiles down to a small amount of object code.

It is the object-code area in which I think there's been a measured degree of success. Sure, the refactoring did not result in any significant savings, but the subsequent directed attempts at reduction have identified some areas, specifically, the string tokenization and the searchspec_sequence adaptation, that have resulted in significant amounts of bloat. This is a shame because the STLSoft string_tokeniser component is easy to use and highly speed efficient, and the searchspec_sequence class, well, was a lot of work! But it goes to show that, for many compilers, optimization of complex template code does always compare with manually coding equivalent algorithms. It also shows that generalized component adapters may introduce unexpected amounts of overhead compared with making enhancements to the class(es) being adapted. Whether you choose to eschew simple, tested, reliable, and already written library components depends on factors outside the scope of this article, but it's worthwhile thinking about before you do.

Writing open-source libraries is, more than anything else, an exercise in battling hindsight. Before I did the research that constituted this installment, I was pretty sure, in light of empirical responses from recls users, that recls 2 would have to be written in C to avoid code bloat. Interestingly, now I am less sure because I've seen that, for some compilers, the entire footprint of recls within a compiled executable is no more than around 20 KB—a heartening statistic, to be sure. One thing I am certain of is that the implementation will have to take into consideration factors of code size on a component-by-component basis, to keep the sizes down.

Next Time

I'd intended to cover the recls/Python mapping—also featured in recls 1.6.1—but space and time have not been my friend. Although the Open-RJ/Python mapping was covered in the March 2005 installment, there are more lessons to be learned from the recls/Python.

September, 2005: Footprints in the Butter: Part II

Figure 1: Average object code sizes for recls 1.6.2.

September, 2005: Footprints in the Butter: Part II

Listing 1

// "." can only be specified as a pattern in nonrecursive searches
if(RECLS_SUCCEEDED(rc))
{
  using stlsoft::basic_simple_string;
  using stlsoft::string_tokeniser;
  typedef basic_simple_string<recls_char_t>     string_t;
  typedef string_tokeniser<string_t, string_t>  tokeniser_t;

  tokeniser_t   toks(pattern, Recls_GetPathSeparator());
  if(toks.end() != std::find(toks.begin(), toks.end(), "..") ||
      ( 0 != (flags & RECLS_F_RECURSIVE) &&
        toks.end() != std::find(toks.begin(), toks.end(), ".")))
  {
    rc = RECLS_RC_DOT_RECURSIVE_SEARCH;
  }
}

September, 2005: Footprints in the Butter: Part II

Table 10: Scenario G. Binary sizes of recls 1.6.2 (stlsoft::simple_string throughout; manual pattern validation; enhanced InetSTL findfile_sequence).

Compiler	lib (MT)	C Test Program (MT)		C++ Test Program (MT)
Borland	395,776	126,976	78.0%	245,248	73.1%
Digital Mars	385,536	99,356	96.4%	119,836	91.7%
Intel	404,358	71,680	96.2%	86,016	91.3%
Visual C++ (6)	220,844	65,536	97.9%	74,752	89.8%
Visual C++ (7.1)	1,672,984	60,416	97.6%	70,656	91.7%
Average	615,900	84,793	89.8%	119,302	81.8%
Average (no Borland)	670,931	74,247	96.8%	87,815	91.2%

September, 2005: Footprints in the Butter: Part II

Table 11: Scenario H. Relative gains for Scenarios B-F.

	Lib	Lib (w/o Borland)	C Test	C Test (w/o Borland)	C++ Test	C++ Test (w/o Borland)
B	127.5%	133.1%	97.3%	104.0%	74.7%	108.8%
C	103.6%	104.8%	95.7%	100.7%	97.0%	101.1%
D	93.3%	93.7%	89.8%	96.8%	92.4%	97.2%
E	93.2%	93.5%	88.0%	94.6%	91.7%	96.1%
F	96.2%	96.8%	—	—	88.7%	93.3%
G	89.6%	90.4%	89.8%	96.8%	81.8%	91.2%

September, 2005: Footprints in the Butter: Part II

Table 2: Binary sizes of recls 1.5.3.

Compiler	lib (MT)	C Test Program (MT)	C++ Test Program (MT)
Borland	487,936	145,408	297,472
Digital Mars	432,128	94,236	125,468
Intel	505,930	72,704	91,136
Visual C++ (6)	261,584	64,512	76,800
Visual C++ (7.1)	1,783,032	59,904	73,216
Average	694,122	87,353	132,818
Average (no Borland)	745,669	72,839	91,655

September, 2005: Footprints in the Butter: Part II

Table 3: Binary sizes of recls 1.6.2 (common settings).

Compiler	lib (MT)	C Test Program (MT)	C++ Test Program (MT)
Borland	470,528	145,408	287,744
Digital Mars	401,408	101,404	123,932
Intel	488,640	73,216	90,112
Visual C++ (6)	252,100	66,048	77,312
Visual C++ (7.1)	1,825,996	60,928	72,704
Average	687,734	89,401	130,361
Average (no Borland)	742,036	75,399	91,015

September, 2005: Footprints in the Butter: Part II

Table 4: Scenario A. Sizes of test executables linked to the stub form of library.

	C Test Program (MT)	C++ Test Program (MT)
Borland	61,440	130,048
Digital Mars	44,572	74,780
Intel	33,280	43,008
Visual C++ 6	41,984	52,224
Visual C++ 7.1	39,936	48,128

September, 2005: Footprints in the Butter: Part II

Table 5: Scenario B. Binary sizes of recls 1.6.2 (std::basic_string throughout).

Compiler	lib (MT)	C Test Program (MT)		C++ Test Program (MT)
Borland	433,664	133,632	86.0%	198,144	43.2%
Digital Mars	510,464	106,012	108.1%	132,636	117.7%
Intel	515,144	73,728	101.3%	91,648	103.3%
Visual C++ (6)	264,894	66,560	102.1%	78,848	106.1%
Visual C++ (7.1)	2,661,210	60,928	100.0%	73,728	104.2%
Average	877,075	88,172	97.3%	115,001	74.7%
Average (no Borland)	987,928	76,807	104.0%	94,215	108.8%

September, 2005: Footprints in the Butter: Part II

Table 7: Scenario D. Binary sizes of recls 1.6.2 (stlsoft::basic_simple_string throughout + manual pattern validation).

Compiler	lib (MT)	C Test Program (MT)		C++ Test Program (MT)
Borland	429,568	126,976	78.0%	268,800	88.0%
Digital Mars	392,192	99,356	96.4%	122,396	96.9%
Intel	436,376	71,680	96.2%	89,088	97.8%
Visual C++ (6)	239,154	65,536	97.9%	76,800	98.0%
Visual C++ (7.1)	1,712,018	60,416	97.6%	71,680	95.8%
Average	641,862	84,793	89.8%	125,753	92.4%
Average (no Borland)	694,935	74,247	96.8%	89,991	97.2%