Turning Virtual Tables
I've been talking about C++ and how it is actually useful for writing toolkits, even for embedded systems. Naturally, lots of folks disagree. If you've read my blog (or just about any of my writing) for any length of time, you know that I really like C. My point is that C++ is essentially C if you avoid certain features, and then you can use the extra features carefully to build very easy-to-use and very reusable libraries.
I think a lot of the bias against C++ comes from the fact that it can be harder to reason about what the code is doing. That matters most when your code is safety-critical and has to survive formal analysis. A C program is usually pretty transparent about what each line of code generates (especially if you disallow the preprocessor). But the truth is, a C++ compiler doesn't do black magic. You can reason about what it will (and won't) do; you simply have to be aware of the things it adds to your code, how templates expand, and so on. Just as some C shops forbid using the preprocessor for anything beyond the simplest macro expansion and file inclusion, you can also forbid C++ features you find scary (like templates) if that helps you.
A little over a year ago, I talked about how to get GCC to output the assembly language equivalent of your program so you could verify what it does. The same trick works with g++ (the GNU C++ compiler).
I ran the lcdui.cpp file through the ARM version of g++ with this command line:
arm-elf-g++ -c -g -Wa,-a,-ahl=output.s lcdui.cpp
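The -Wa option passes the comma-separated options that follow it to the assembler; here they turn on a listing (-a) that interleaves the high-level source with the generated assembly (-ahl) and write it to output.s. Here's part of the result (with C++-style comments added):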
81:lcdui.cpp ****   output(work);        // call virtual function output
 413                .loc 3 82 0
 414 03fc F4301BE5  ldr r3, [fp, #-244]  // get this pointer
 415 0400 003093E5  ldr r3, [r3, #0]     // get virtual table
 416 0404 143083E2  add r3, r3, #20      // index to output's slot (offset 20)
 417 0408 003093E5  ldr r3, [r3, #0]     // load call address for this->output()
 418 040c 3C204BE2  sub r2, fp, #60      // get address of work
 419 0410 0210A0E3  mov r1, #2
 420 0414 28110BE5  str r1, [fp, #-296]
 421 0418 F4001BE5  ldr r0, [fp, #-244]  // get this pointer (hidden argument)
 422 041c 0210A0E1  mov r1, r2           // get work into r1 (first explicit argument)
 423 0420 0FE0A0E1  mov lr, pc           // save return address
 424 0424 13FF2FE1  bx r3                // call output
That's a bit much to digest, so let's look at a simple function:
void x(int y);
If the function is not virtual, the call compiles to:
13:test.cpp ****   bar->x(10);
  67                .loc 1 14 0
  68 0020 08001BE5  ldr r0, [fp, #-8]    // get this pointer
  69 0024 0A10A0E3  mov r1, #10          // load argument
  70 0028 FEFFFFEB  bl _ZN3foo1xEi       // call
The mangled name _ZN3foo1xEi is just foo::x(int). The this pointer is always passed first, in r0 (recall that the ARM passes the first few arguments in registers, r0 through r3, until it runs out and has to switch to the stack).
As a virtual function, the call is a bit more complex:
13:test.cpp ****   bar->x(10);
 111                .loc 1 14 0
 112 0038 0C301BE5  ldr r3, [fp, #-12]   // get this pointer
 113 003c 003093E5  ldr r3, [r3, #0]     // get virtual table
 114 0040 003093E5  ldr r3, [r3, #0]     // get address of x
 115 0044 0C001BE5  ldr r0, [fp, #-12]   // load this pointer
 116 0048 0A10A0E3  mov r1, #10          // load argument
 117 004c 0FE0A0E1  mov lr, pc           // save return address
 118 0050 13FF2FE1  bx r3                // call
Each object contains a virtual table pointer (one extra pointer). The class as a whole has a single table of function pointers (the virtual table) that lists the addresses of its virtual functions. If your class had, for example, 4 virtual functions, its virtual table would hold 4 pointers and each object would contain one extra pointer. In other words, you pay for those 4 table entries once, regardless of how many (or few) instances of the class you create. Of course, with 100 virtual functions the table would hold 100 pointers, and each object would still carry only one extra pointer.
So there is a small amount of memory overhead that depends on how many virtual functions you use, and an even smaller tax on each instance (one pointer, no matter how many virtual functions the class has).
Granted, the virtual case also uses more instructions (7 vs. 3). If you were calling a function in a critical interrupt routine or at a very high frequency, maybe those four extra instructions, which include the loads that fetch the virtual table and the function address, would be a problem. Many times, though, it doesn't really matter. The benefit of faster development often outweighs the tiny differences in execution speed and size.
Just as a side note, the virtual table overhead applies only when you call through a pointer (or a reference). Granted, the whole point (no pun intended) of using a virtual function is to call it through a base class pointer. However, if a time-critical piece of code calls the function on an object directly rather than through a pointer, the compiler can resolve the call at compile time, and the code is identical to the non-virtual case.
I don't want to come off as being anti-C. I do like C. But I also like using the right tool for the job, and sometimes that right tool is C++. Of course, sometimes it is something completely different, like assembly, Java, or even Ruby on Rails (I don't want to write a database-driven website in C, do you?).