Footprints in the Butter: Part II

In his previous installment, Matthew focused on code bloat in source code. This month, he looks at object code size.


September 01, 2005
URL:http://www.drdobbs.com/footprints-in-the-butter-part-ii/184402011

September, 2005: Footprints in the Butter: Part II

Matthew Wilson is a software development consultant for Synesis Software, creator of the STLSoft libraries, and author of Imperfect C++ (Addison-Wesley, 2004). He can be contacted at http://stlsoft.org/.


Recall that I said in the previous installment of "Positive Integration" that I'd be focusing on the topic of code bloat in this two-part column. In the last installment, I examined source-code size. This month, I look at object-code size.

Binary Size

The first thing to do to trim the binary size was to establish a firm basis for comparison. The 1.5.3 and 1.6.2 libraries were built using the same settings. Tables 2 and 3 show the results. (Table 1, which lists the recls src directory contents appeared in the previous installment.)

The settings used for the five compilers were for minimal size and global/whole-program optimization: -O1 -Oz (Borland); -o+space (Digital Mars); -O1 -Og -GL (Intel); -O1 -Og (Visual C++ 6.0); and -O1 -Og -GL (Visual C++ 7.1).

Given that the 1.6.2 test files contain additional code for UNC paths and the new stat() functionality, the results demonstrate that there's no effective difference for the generated binaries. The library size has dropped, but the size of a statically linked library is not, in and of itself, important, rather the size of the symbols that are selected from it into a binary, hence the comparison of the C and C++ test files.

The values in Table 3 represent a base for all remaining figures, which is expressed relative to these. To make these meaningful, you need to find out how much of the sizes of the executables are a result of the client code and the runtime libraries, and how much to the recls library components linked into the executable. This was done by creating a single file containing empty implementations of all the recls API functions, and linking the compiled client C and C++ program object files to that stub. Table 4 shows the sizes of the executables when linked to the stub API file.

String Handling

Judging from Scenario A (Table 4), it looks like that code rationalization hasn't yet paid off. The first thing is to consider the effects of linking to standard libraries. Specifically, the recls libraries by default compile using the STLSoft basic_simple_string, to minimize coupling to the Standard Library so that they can be used in small footprint exes and dynamic libraries. (For example, recls is incorporated as a small part of a standard DLL of my company, built with Visual C++ 5 and not linking to the Standard Library, and coming in at less than 70 KB.)

But because the C++ test program links to the Standard Library, the instantiations of the member functions of the std::basic_string template are already paid for, so the next test Scenario B (Table 5) is to determine the effect of using std::basic_string throughout. Table 5 shows the results. (For this and the remaining scenarios, the percentage figures in the fourth and sixth columns represent the relative sizes of each test client's code after subtracting the "stub-size" in Table 4. In other words, it's a good indicator of the actual size of code in the executable that is part of recls itself, rather than the test program as a whole.)

Except Borland, for which there is a significant saving in the use of std::string, the other compilers show an increase, rather than a reduction, in binary code size. (In several cases, tables averages with and without Borland are presented because it tends to have widely different results to the other compilers, and can skew the results.)

Pattern Validation

In playing around with applying #if 0 here and there, I discovered some surprising amounts of code bloat. Amazingly, in Listing 1, when #ifdef'd out, it reduces the object code size of the containing file by 24 KB (with VC++ 6). Listing 1 is used to ensure that a multipart pattern does not contain a dots directory. This got me to experimenting with the pattern validation. The first thing I tried was using STLSoft's basic_string_view, which is a string-like class template that presents a view onto a C-style string, rather than allocating and writing its own copy. It's perfectly suited to the code previously mentioned because the results of the tokenization are not preserved, just manipulated within the find() algorithm. Using basic_string_view instead costs a slight increase (Scenario C, Table 6).

So there's 20 KB or so of code to go at. Changing the implementation to one based purely on strstr() and strncmp() results in Table 7 (Scenario D).

That's a significant gain in the size of the libraries—about 7 percent—but a modest gain in the executables—about 3 percent. But we're pretty much down to the maximum gain for that particular area of the code because removing pattern validation entirely (Scenario E, not shown) results in only an approximate additional 1 percent saving across the board.

Customization Versus Generality

Looking at the individual object sizes gives a clearer picture of where the bloat is, as can be seen in Figure 1. The left-most data point for each file represents the average object size for the base 1.6 build (Table 3). Clearly, ReclsFtpSearchDirectoryNode_win32 has the largest footprint.

In the September 2004 installment of this column, I described the use of the searchspec_sequence template:

[M]ultipattern filtering is handled via the...STLSoft searchspec_sequence class. It is parameterized with the underlying sequence type whose model it emulates...and uses the string_tokenizer template to split the pattern. The iterators that it presents...iterate over a number of ranges of the underlying search sequences, corresponding to the number of patterns. In other words, it linearizes N ranges into a single one...I wanted to be able to insert a sequence class into the recls implementation with minimal disruption to the existing, and working code. Because searchspec_sequence presents a functionally identical interface to its underlying sequence, this was almost a no brainer.

The searchspec_sequence is an adapter that presents a view spanning multiple traversals of an underlying filesystem sequence. Unfortunately, this generalization suffers from code bloat to a significant degree. Scenario F (Table 9) uses an updated implementation for inetstl::basic_findfile_sequence that tokenizes the search patterns internally, based on a delimiter passed to the constructor along with the other arguments. By skipping the adapter searchspec_sequence you save on object code size and on runtime cost, at the expense of having to write an enhancement to the basic_findfile_sequence class template. Table 8 shows the significant size savings to be had, roughly about 17.5 percent in object code size.

This translates to about 4 percent saving on the library size and around 7 percent for the C++ test program (see Table 9); the C test program does not exercise the recls FTP functionality.

Best Cases

Combining all these advances, you arrive at Scenario G (Table 10), which represents a saving of about 10 percent in library size and 3-9 percent in executable sizes.

Scenario H (Table 11) summarizes the relative gains for Scenarios B-F, in terms of the library, C, and C++ test programs, with and without Borland. I've managed to make savings of a total of about 10 percent on library and test programs.

Miscellaneous Optimizations

There are other optimizations possible, such as the 195 bytes saved by collapsing the switch in lookup_error_string_() (in recls_api.cpp) into a struct table, but such things are grasping at straws, and are probably below the limit at which point it's worthwhile. When you've reached such measures, it's usually clear that a bigger answer is needed.

Conclusion

In broad terms, you can see that:

In some senses this activity has succeeded, and in others, it has failed. Where it's failed is in bringing significant reductions in source code size. In part, this has been because the refactoring has naturally caused some parts of the software to be segregated into separate files. Each source file has nontrivial overhead, and rightly so—trimming this down to the bare minimum would mean that the source would become impenetrable. We must accept that a medium-sized library, such as recls consists of a large amount of source code, even when it compiles down to a small amount of object code.

It is the object-code area in which I think there's been a measured degree of success. Sure, the refactoring did not result in any significant savings, but the subsequent directed attempts at reduction have identified some areas, specifically, the string tokenization and the searchspec_sequence adaptation, that have resulted in significant amounts of bloat. This is a shame because the STLSoft string_tokeniser component is easy to use and highly speed efficient, and the searchspec_sequence class, well, was a lot of work! But it goes to show that, for many compilers, optimization of complex template code does always compare with manually coding equivalent algorithms. It also shows that generalized component adapters may introduce unexpected amounts of overhead compared with making enhancements to the class(es) being adapted. Whether you choose to eschew simple, tested, reliable, and already written library components depends on factors outside the scope of this article, but it's worthwhile thinking about before you do.

Writing open-source libraries is, more than anything else, an exercise in battling hindsight. Before I did the research that constituted this installment, I was pretty sure, in light of empirical responses from recls users, that recls 2 would have to be written in C to avoid code bloat. Interestingly, now I am less sure because I've seen that, for some compilers, the entire footprint of recls within a compiled executable is no more than around 20 KB—a heartening statistic, to be sure. One thing I am certain of is that the implementation will have to take into consideration factors of code size on a component-by-component basis, to keep the sizes down.

Next Time

I'd intended to cover the recls/Python mapping—also featured in recls 1.6.1—but space and time have not been my friend. Although the Open-RJ/Python mapping was covered in the March 2005 installment, there are more lessons to be learned from the recls/Python.

September, 2005: Footprints in the Butter: Part II

Figure 1: Average object code sizes for recls 1.6.2.

September, 2005: Footprints in the Butter: Part II

Listing 1

// "." can only be specified as a pattern in nonrecursive searches
if(RECLS_SUCCEEDED(rc))
{
  using stlsoft::basic_simple_string;
  using stlsoft::string_tokeniser;
  typedef basic_simple_string<recls_char_t>     string_t;
  typedef string_tokeniser<string_t, string_t>  tokeniser_t;

  tokeniser_t   toks(pattern, Recls_GetPathSeparator());
  if(toks.end() != std::find(toks.begin(), toks.end(), "..") ||
      ( 0 != (flags & RECLS_F_RECURSIVE) &&
        toks.end() != std::find(toks.begin(), toks.end(), ".")))
  {
    rc = RECLS_RC_DOT_RECURSIVE_SEARCH;
  }
}

September, 2005: Footprints in the Butter: Part II

Table 10: Scenario G. Binary sizes of recls 1.6.2 (stlsoft::simple_string throughout; manual pattern validation; enhanced InetSTL findfile_sequence).

Compiler lib (MT) C Test Program (MT) C++ Test Program (MT)
Borland 395,776 126,976 78.0% 245,248 73.1%
Digital Mars 385,536 99,356 96.4% 119,836 91.7%
Intel 404,358 71,680 96.2% 86,016 91.3%
Visual C++ (6) 220,844 65,536 97.9% 74,752 89.8%
Visual C++ (7.1) 1,672,984 60,416 97.6% 70,656 91.7%
Average 615,900 84,793 89.8% 119,302 81.8%
Average (no Borland) 670,931 74,247 96.8% 87,815 91.2%

September, 2005: Footprints in the Butter: Part II

Table 11: Scenario H. Relative gains for Scenarios B-F.

Lib Lib (w/o Borland) C Test C Test (w/o Borland) C++ Test C++ Test (w/o Borland)
B 127.5% 133.1% 97.3% 104.0% 74.7% 108.8%
C 103.6% 104.8% 95.7% 100.7% 97.0% 101.1%
D 93.3% 93.7% 89.8% 96.8% 92.4% 97.2%
E 93.2% 93.5% 88.0% 94.6% 91.7% 96.1%
F 96.2% 96.8% 88.7% 93.3%
G 89.6% 90.4% 89.8% 96.8% 81.8% 91.2%

September, 2005: Footprints in the Butter: Part II

Table 2: Binary sizes of recls 1.5.3.

Compiler lib (MT) C Test Program (MT) C++ Test Program (MT)
Borland 487,936 145,408 297,472
Digital Mars 432,128 94,236 125,468
Intel 505,930 72,704 91,136
Visual C++ (6) 261,584 64,512 76,800
Visual C++ (7.1) 1,783,032 59,904 73,216
Average 694,122 87,353 132,818
Average (no Borland) 745,669 72,839 91,655

September, 2005: Footprints in the Butter: Part II

Table 3: Binary sizes of recls 1.6.2 (common settings).

Compiler lib (MT) C Test Program (MT) C++ Test Program (MT)
Borland 470,528 145,408 287,744
Digital Mars 401,408 101,404 123,932
Intel 488,640 73,216 90,112
Visual C++ (6) 252,100 66,048 77,312
Visual C++ (7.1) 1,825,996 60,928 72,704
Average 687,734 89,401 130,361
Average (no Borland) 742,036 75,399 91,015

September, 2005: Footprints in the Butter: Part II

Table 4: Scenario A. Sizes of test executables linked to the stub form of library.

C Test Program (MT) C++ Test Program (MT)
Borland 61,440 130,048
Digital Mars 44,572 74,780
Intel 33,280 43,008
Visual C++ 6 41,984 52,224
Visual C++ 7.1 39,936 48,128

September, 2005: Footprints in the Butter: Part II

Table 5: Scenario B. Binary sizes of recls 1.6.2 (std::basic_string throughout).

Compiler lib (MT) C Test Program (MT) C++ Test Program (MT)
Borland 433,664 133,632 86.0% 198,144 43.2%
Digital Mars 510,464 106,012 108.1% 132,636 117.7%
Intel 515,144 73,728 101.3% 91,648 103.3%
Visual C++ (6) 264,894 66,560 102.1% 78,848 106.1%
Visual C++ (7.1) 2,661,210 60,928 100.0% 73,728 104.2%
Average 877,075 88,172 97.3% 115,001 74.7%
Average (no Borland) 987,928 76,807 104.0% 94,215 108.8%

September, 2005: Footprints in the Butter: Part II

Table 7: Scenario D. Binary sizes of recls 1.6.2 (stlsoft::basic_simple_string throughout + manual pattern validation).

Compiler lib (MT) C Test Program (MT) C++ Test Program (MT)
Borland 429,568 126,976 78.0% 268,800 88.0%
Digital Mars 392,192 99,356 96.4% 122,396 96.9%
Intel 436,376 71,680 96.2% 89,088 97.8%
Visual C++ (6) 239,154 65,536 97.9% 76,800 98.0%
Visual C++ (7.1) 1,712,018 60,416 97.6% 71,680 95.8%
Average 641,862 84,793 89.8% 125,753 92.4%
Average (no Borland) 694,935 74,247 96.8% 89,991 97.2%

Terms of Service | Privacy Statement | Copyright © 2024 UBM Tech, All rights reserved.