Extending C for Object-Oriented Programming


July 1993/Extending C for Object-Oriented Programming

Although trained in cognitive psychology, Dr. Colvin has been happily hacking computers for twenty years. He has programmed professionally in C for the last ten years and in C++ for the last five, and is a member of X3J16, the ANSI C++ Standards committee. Greg is Scientist for Systems Development at Information Handling Services in Englewood, Colorado, 80150 (303-397-2848) and is on the Internet at gregc@ihs.com.

Three years ago my team and I set out to create two new programs for access to industry standards, vendor information, and catalog page images on CD-ROM. These programs were to be just the first of many, and though most of our customers were using only MS-DOS workstations we would soon need X Window, Microsoft Windows, and Macintosh versions. We were less than six months from market. We met our goals with an object-oriented approach to C programming that may well serve others making the transition from C to C++.

In this article I present a redesigned and simplified version of the object macros of our tool kit kernel. Using this kernel it is possible to write object-oriented programs that can be compiled without change by either a Standard C compiler or a C++ compiler. If you need to implement an object-oriented design, an object-oriented language like C++, Smalltalk, or Eiffel will make your job much easier. But if you have compelling reasons to use Standard C this kernel may be of help. To motivate the presentation I will present a design for a simple file access facility and show how the design can be implemented in four different ways: traditional C, C++, object-oriented C, and C extended with object macros.

Standard C

My team was convinced that only an object-oriented design could make the rapid creation of so many applications, across so many platforms, reasonably painless. But the object-oriented languages available at that time were all deficient in either performance or portability — C was the language we all knew and loved. After much debate we chose to design our systems as an object class hierarchy and do the first implementations in Standard C, with the intention of porting to C++ as soon as quality compilers were supported on all of our target platforms.

Standard C does not support object-oriented programming. According to Bjarne Stroustrup, to support a style of programming means to make it "convenient (reasonably easy, safe, and efficient) to use that style." It is certainly possible to implement a class hierarchy in C, as the Xt Intrinsics and the Motif and Open Look widget sets demonstrate. But even a cursory reading of "Inside A Widget" will reveal just how inconvenient C can be.

Fortunately, the C preprocessor does allow the syntax of C to be extended, and Standard C blesses and codifies the necessary stringizing (operator #) and token pasting (operator ##), which required dirty tricks in older C dialects. We were thus able to extend Standard C to provide a more convenient object-oriented syntax in which to implement our design. Although not perfect, it allowed us to get our products to market on schedule, and has eased our transition to C++.
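
As a brief illustration of the two preprocessor operators named above, the following sketch uses # to turn a macro argument into a string literal and ## to paste two tokens into a new identifier. The macro names (STR, PASTE, DEFINE_NAME_FN) are hypothetical and are not taken from the tool kit.

```c
#include <assert.h>
#include <string.h>

/* # turns the macro argument into a string literal. */
#define STR(x) #x

/* ## pastes two tokens together to form a single identifier. */
#define PASTE(a, b) a##b

/* Generate a function named <cls>Name that returns the class name as text. */
#define DEFINE_NAME_FN(cls) \
    const char *PASTE(cls, Name)(void) { return STR(cls); }

DEFINE_NAME_FN(StdFile)   /* defines: const char *StdFileName(void)  */
DEFINE_NAME_FN(UnixFile)  /* defines: const char *UnixFileName(void) */
```

A class-declaration macro can build structure and function names from a class name in the same way.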

File Access: A Simple Example in C and C++

Let's say we need to implement a File facility that allows files to be opened either by the Standard C fopen call or by Unix system calls, but accessed by the same Read, Write, and Seek methods, without needing to know how the file was opened. I have sketched out the design of such a facility in Figure 1. The File class declares Read, Write, and Seek methods with no defining functions. The derived StdFile and UnixFile classes define these methods with the appropriate functions, and extend the File interface with appropriately defined Construct and Destruct methods.

In a traditional C implementation, the File object would be defined by a structure with a tagged union, which could contain either a standard file handle or a UNIX file descriptor:

typedef struct {
    union {
        FILE *hdl;
        int dsc;
    } u;
    enum { STD, UNIX } tag;
} File;
The File methods would be defined by access functions that take a pointer to this structure and switch on the tag to call the appropriate functions. For instance:

int FileRead(File*p,void*buf,int n)
{ switch (p->tag) {
  case STD:
    if (fread(buf,n,1,p->u.hdl)<1)
      return -1;
    return n;
  case UNIX:
    return read(p->u.dsc,buf,n);
  }
return -1;
}
The disadvantage of this approach becomes evident once you attempt to extend the file object by adding more access functions (e.g., character I/O) or other ways of opening files (e.g., calls for other operating systems). Although our design calls for three distinct classes, each with its own methods, this implementation defines only one data structure for all three classes, and one function for each method. Thus, every new function you write must have a switch on the type tag and code for all classes. Even worse, adding a new type means adding new members to the File structure, adding new cases to every switch in every function, and rebuilding and retesting the entire facility and every other piece of code that uses this facility. Forget a case in one of your switches and your compiler may not complain, but some user of your facility eventually will. If you want to further extend your class hierarchy, say to derive a class of UnixFile with special buffering, then your structures and switches get even more complex.

These difficulties can be managed in a small program, but can easily become overwhelming in larger programs. Thus the traditional C approach is most appropriate for small programs, or for programs whose design is well-specified before implementation and not likely to change in the future.

The C++ Approach

C++ can provide a solution that maps much more closely to our design. The abstract File interface is declared in a single structure that serves as an abstract base class. No data members are declared, just the common access methods, which are defined as null by default:

struct File {
  virtual int Read(void*buf,int n)=0;
  virtual int Write(void*buf,int n)=0;
  virtual long Seek(long off,int pos)=0;
};
This declaration would typically be in a separate header, so that users of the access functions need include only the common interface.

The StdFile class is derived from File, with a class declaration in a header file:

class StdFile : public File {
  FILE *hdl;
public:
  virtual int Read(void*buf,int n);
  virtual int Write(void*buf,int n);
  virtual long Seek(long off,int pos);
  StdFile(const char*nam,const char*acc);
  ~StdFile();
};
The method definitions would go in a separate source file:

int StdFile::Read(void*buf,int n)
{ if (fread(buf,n,1,hdl)<1)
    return -1;
  return n;
}
//...other methods...
The StdFile and ~StdFile functions are special. The StdFile function is the constructor, which acquires any resources needed by the object, in this case a file handle. The ~StdFile function is the destructor, which releases those resources. All C++ classes must have a constructor and destructor. The compiler will provide default implementations if the programmer doesn't.

The UnixFile class is also derived from File, with a class declaration in a header file:

class UnixFile : public File {
  int dsc;
public:
  virtual int Read(void*buf,int n);
  virtual int Write(void*buf,int n);
  virtual long Seek(long off, int pos);
  UnixFile(const char*nam, int omode,int cmode=0);
 ~UnixFile();
};
Again, the method definitions go in a separate source file:

int UnixFile::Read(void *buf,int n)
{ return read(dsc,buf,n);
}
//...other methods...
Class declarations and definitions are typically placed in separate files so that they can be modified and extended independently, with minimal recompilation.

In C++, virtual functions provide for polymorphism, without the need for switching on tag fields. Thus the function

int GetNextBlock(File*fp)
{ return fp->Read(BlkBuf,BlkSz);
}
might call either StdFile::Read or UnixFile::Read depending on the type, at the time of the call, of the object that the fp parameter actually points to.

Implementing Classes in C

Stroustrup's original C++ compiler, CFront, is a portable front-end that emits efficient and notoriously cryptic C code. The Xt Intrinsics implement classes with similar, if slightly less cryptic, techniques. You too can use these techniques to implement objects directly in C. A picture of one possible implementation is shown in Figure 2. This implementation is rooted in a method table (called a vtable in CFront), which is a structure of pointers to functions, one pointer for each method:

typedef struct {
  int (*Read)(void*obj,void*buf,int n);
  int (*Write)(void*obj,void*buf,int n);
  long (*Seek)(void*obj,long off,int pos);
} FileVTable;
Each base class includes a pointer to a table of this type as a member of its data structure:

typedef struct {
   FileVTable *methods;
} File;
Using this approach, you can derive StdFile from File by nesting structure declarations:

typedef struct {
  FileVTable base;
  void (*Construct)(void*obj,const char*nam,const char*acc);
  void (*Destruct)(void*obj);
} StdFileVTable;
typedef struct {
  File base;
  FILE *hdl;
} StdFile;
You can derive UnixFile from File in the same way:

typedef struct {
  FileVTable base;
  void (*Construct)(void*obj,const char*nam,int omode,int cmode);
  void (*Destruct)(void*obj);
} UnixFileVTable;
typedef struct {
  File base;
  int dsc;
} UnixFile;
Because a pointer to a structure (in Standard C) has the same value as a pointer to its first member, any pointer to a StdFile or to a UnixFile also points to a File. (The first member of each is a File.) Thus a pointer to a StdFile or a UnixFile can be safely cast to a pointer to File. Of course, the methods pointer of each StdFile object must be initialized to an appropriate table of functions for the StdFile type:

int StdFileRead (void*obj,void*buf,int n)
{ StdFile *this=obj;
if (fread(buf,n,1,this->hdl)<1)
  return -1;
return n;
}
/*...other functions...*/
StdFileVTable StdFileVT =
{ { StdFileRead,
    StdFileWrite,
    StdFileSeek },
  StdFileConstruct,
  StdFileDestruct,
};
and each UnixFile object must be initialized to an appropriate table of functions for the UnixFile type:

int UnixFileRead(void*obj,void*buf,int n)
{ UnixFile* this= obj;
  return read(this->dsc,buf,n);
}
/*...other functions...*/
UnixFileVTable UnixFileVT =
{ { UnixFileRead,
    UnixFileWrite,
    UnixFileSeek },
  UnixFileConstruct,
  UnixFileDestruct,
};
Given these definitions, a C++ expression for calling a virtual function, such as fp->Read(buf, n), can be implemented as the C expression ((File*)fp)->methods->Read(fp,buf,n). This expression will call a different function, depending on whether fp is a StdFile* or a UnixFile*, since these structures are initialized to point to different method tables.
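
To make the dispatch expression concrete, here is a minimal, self-contained sketch of the same layout with only a Read method. So that it runs with Standard C alone, the second class is a hypothetical MemFile backed by a memory buffer, standing in for UnixFile. The parameter is named self rather than this, and GetBlock is an illustrative name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Method table shared by all File classes (Read only, for brevity). */
typedef struct {
    int (*Read)(void *obj, void *buf, int n);
} FileVTable;

typedef struct { const FileVTable *methods; } File;

/* StdFile: a File backed by a Standard C stream. */
typedef struct { File base; FILE *hdl; } StdFile;

static int StdFileRead(void *obj, void *buf, int n)
{
    StdFile *self = obj;
    return fread(buf, (size_t)n, 1, self->hdl) < 1 ? -1 : n;
}
static const FileVTable StdFileVT = { StdFileRead };

/* MemFile: hypothetical stand-in for UnixFile, backed by a memory buffer. */
typedef struct { File base; const char *data; int len, pos; } MemFile;

static int MemFileRead(void *obj, void *buf, int n)
{
    MemFile *self = obj;
    if (self->len - self->pos < n)
        return -1;
    memcpy(buf, self->data + self->pos, (size_t)n);
    self->pos += n;
    return n;
}
static const FileVTable MemFileVT = { MemFileRead };

/* Polymorphic call: the object's own table selects the function. */
int GetBlock(void *fp, void *buf, int n)
{
    return ((File *)fp)->methods->Read(fp, buf, n);
}
```

GetBlock never names StdFile or MemFile. The cast to File* is valid only because File is the first member of each derived structure.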

Notice that with this implementation the cost, in time and space, of a virtual function call is comparable to calling a function which must switch on a type tag. The time cost is just accessing and calling the function via the method table, versus making a normal function call followed by accessing and switching on the tag. The space cost is just one table of pointers for each class and one extra pointer in each object, versus the switch logic in each function and one type tag in each object.

It is far from convenient to directly implement objects in this way, but it can be done. To implement a large class hierarchy you must properly declare, define, and initialize a host of class data structures and method tables. Using the resulting objects requires a cryptic syntax with lots of unsafe type casting. If you do choose to implement objects in this way I recommend you follow the lead of the Xt Intrinsics: establish and follow strict conventions for laying out and naming your data structures, and hide the ugly details of object creation and method invocation behind a wall of access functions.

Extending C Syntax with Macros

The features of C++ which I have chosen to support with my extensions are type safety, public classes, single inheritance, virtual functions (with a pointer to this), constructors, destructors, runtime type identification, and separate declaration and implementation of classes. These features were sufficient to meet our needs, and I was unable to go much farther within the constraints of the C preprocessor. I chose not to support multiple inheritance, and cannot see how to provide operator overloading, function overloading, default arguments, or access specifiers using macros. The industrial strength predecessor to this extension did provide simple templates and exception handling, but I have omitted those features from this version.

The syntax presented here did not spring full grown from my forehead. It has evolved over six years of trial and error. I have not been able to provide as concise a syntax or as much type safety as C++, but I have tried to ensure that incorrect syntax will cause compiler or runtime diagnostics. I use runtime type checking where compile-time checking is impractical.

Listing 1 (objects.h) and Listing 2 (objects.c) provide the full source code for my extension. The code disk for this article also includes a simple File facility implemented in my extension.

Declaring Classes

Just as with a C structure, you must declare a class before it is used. As the syntax shows, a class is declared by separately declaring first the methods and then the members of the class, using the macros

DCL_METHODS( class-name,base-name,constructor-parameters )
DCL_MEMBERS( class-name,base-name )
and associated macros. Methods are declared as pointers to functions using the DCL_METHOD( method-name, method-parameters, return-type ) and REDCL_METHOD macros, and members are declared just like C structure members. Note that the base-name for the methods and members declarations must be the same. A single base class, Object, is provided to serve as the starting point for other classes. For example, the StdFile class is declared as:

DCL_METHODS (StdFile, File, (const char *nam, const char *acc))
REDCL_METHOD(Seek,(long off, int pos),long);
REDCL_METHOD(Read,(void *buf, int n),int);
REDCL_METHOD(Write,(void *buf, int n),int);
END_METHODS
DCL_MEMBERS(StdFile,File)
FILE *hdl;
END_MEMBERS
The class declaration macros expand to declarations of two structures, one for the class method table and one for the class data structure. The names for these structures are derived from the class-name via token pasting. Each of these structures includes the corresponding structure from its base class by token pasting with the base-name.
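
The following sketch shows one way such declaration macros could be written with token pasting. It is not the code from objects.h: the SK_ macro names, the Methods suffix, and the member name super are all assumptions made for illustration, and the root Object class is written by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical declaration macros; not the ones in objects.h. */
#define SK_DCL_METHODS(cls, base) \
    typedef struct cls##Methods cls##Methods; \
    struct cls##Methods { base##Methods super;
#define SK_DCL_METHOD(name, params, ret) ret (*name) params
#define SK_END_METHODS };

#define SK_DCL_MEMBERS(cls, base) \
    typedef struct cls cls; \
    struct cls { base super;
#define SK_END_MEMBERS };

/* Root class, written by hand so the macros have something to build on. */
typedef struct ObjectMethods { const char *name; } ObjectMethods;
typedef struct Object { const ObjectMethods *methods; } Object;

/* Expands to struct FileMethods and struct File. */
SK_DCL_METHODS(File, Object)
    SK_DCL_METHOD(Read, (void *obj, void *buf, int n), int);
SK_END_METHODS
SK_DCL_MEMBERS(File, Object)
SK_END_MEMBERS

/* Expands to struct StdFileMethods and struct StdFile. */
SK_DCL_METHODS(StdFile, File)
SK_END_METHODS
SK_DCL_MEMBERS(StdFile, File)
    FILE *hdl;
SK_END_MEMBERS
```

Because each generated structure begins with its base structure, a StdFile* has the same address as the File and the Object nested inside it.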

Defining and Implementing Classes

The macro

METHOD( class-name, method-name, (parameter-list), return-type)
begins the implementation of a method, the body of which is pretty much the same as for a C function. Within a method implementation the keyword this is defined as a pointer to an object of the same class as the method, just as in C++. The object pointers for each active method invocation are pushed on a separate stack (rather than being passed as parameters), which allows for runtime type checking and casting. This separate stack does increase the space and time for method invocation. In the MS-DOS version, I avoid this overhead by using Borland's inline 8086 assembler to pass the object pointer in a register. For instance, the Read method for the StdFile class is implemented as:

METHOD(StdFile,Read,(void*buf,int n),int)
if (fread(buf,n,1,this->hdl) < 1)
    return -1;
return n;
END_METHOD
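
The separate stack of object pointers described above can be sketched roughly as follows. This is an assumption about the general technique, not the code from the listings: every name here (sk_push_this, SK_THIS, Counter, and so on) is hypothetical, the depth is fixed arbitrarily, and the runtime type checking the article mentions is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define SK_STACK_MAX 64           /* hypothetical fixed depth */

static void *sk_this_stack[SK_STACK_MAX];
static int   sk_this_top = 0;     /* number of active method invocations */

static void sk_push_this(void *obj)
{
    assert(sk_this_top < SK_STACK_MAX);
    sk_this_stack[sk_this_top++] = obj;
}

static void sk_pop_this(void)
{
    assert(sk_this_top > 0);
    --sk_this_top;
}

/* Inside a method body, the receiver is whatever is on top of the stack. */
#define SK_THIS(type) ((type *)sk_this_stack[sk_this_top - 1])

typedef struct { int count; } Counter;

/* A method: note that the object is not among its parameters. */
static int CounterAdd(int n)
{
    Counter *self = SK_THIS(Counter);
    self->count += n;
    return self->count;
}

/* Invoke a method on obj: push the receiver, call, then pop. */
int CounterSendAdd(Counter *obj, int n)
{
    int result;
    sk_push_this(obj);
    result = CounterAdd(n);
    sk_pop_this();
    return result;
}
```

The push and pop around every call are the extra time and space cost the text refers to.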
Two special methods must be implemented for each class: the constructor and the destructor. Constructors begin with the CONSTRUCTOR( class-name, method-parameters) macro, followed by the CONSTRUCT(base-name, constructor-arguments) macro to invoke the base class constructor and set the methods pointer to the class-name method table. At the entry to each constructor the methods pointer points to the method table for that class. Construction is from the base class out to the derived classes. The constructor for the StdFile class is implemented as:

CONSTRUCTOR(StdFile, (const char*nam,const char*acc))
CONSTRUCT (File,()); 
this->hdl = fopen(nam,acc);
assert (this->hdl );
END_CONSTRUCTOR
Destructors begin with the DESTRUCTOR(class-name) macro. At the entry to each destructor the methods pointer points to the method table for that class. At the exit of each destructor the object destruction function automatically invokes the base class destructor. Destruction is from the derived class back to the base classes. The destructor for the StdFile class is implemented as:

DESTRUCTOR(StdFile)
fclose(this->hdl);
END_DESTRUCTOR
Once you have implemented a class's methods you must define the class itself. The class definition begins with the DEF_CLASS( class-name, base-name ) macro, and ends with the END_DEF macro. Within the class definition each new class method is defined with the DEF_METHOD( base-name, method-name ) macro, and inherited methods are redefined with the REDEF_METHOD( base-name, class-name, method-name ) macro. The REDEF_METHOD macro can, in the C version of the macro, be used at any time to change the behavior of a class. (This is arguably a bug rather than a feature.) The StdFile class is defined as:

DEF_CLASS (StdFile,File)
REDEF_METHOD(StdFile,File,Read);
REDEF_METHOD(StdFile,File,Write);
REDEF_METHOD(StdFile,File,Seek);
END_CLASS
The definition of a class is actually implemented as the definition of a method table and a runtime function to initialize the table with pointers to functions. This function is called by the USE(class-name) macro, which must be invoked before any actual use of the class.
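
A class definition of this kind, a method table plus a function that fills it in at run time, might look like the sketch below. The names FileUse and StdFileUse, the ready flags, and the placeholder method bodies are assumptions for illustration only. The derived table starts as a copy of the base table and then overrides one entry.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int (*Read)(void *obj, void *buf, int n);
    int (*Write)(void *obj, void *buf, int n);
} FileMethods;

/* Base-class default: every operation fails. */
static int FileFail(void *obj, void *buf, int n)
{
    (void)obj; (void)buf; (void)n;
    return -1;
}

static FileMethods FileTable;
static int FileReady = 0;

/* Initialize the base table on first use and return it. */
const FileMethods *FileUse(void)
{
    if (!FileReady) {
        FileTable.Read = FileFail;
        FileTable.Write = FileFail;
        FileReady = 1;
    }
    return &FileTable;
}

/* Derived class: overrides Read, inherits Write. */
static int StdRead(void *obj, void *buf, int n)
{
    (void)obj; (void)buf;
    return n;                           /* placeholder body */
}

static FileMethods StdFileTable;
static int StdFileReady = 0;

const FileMethods *StdFileUse(void)
{
    if (!StdFileReady) {
        StdFileTable = *FileUse();      /* start from the base table */
        StdFileTable.Read = StdRead;    /* redefine one method */
        StdFileReady = 1;
    }
    return &StdFileTable;
}
```

Because the table is an ordinary writable structure, any code can assign to its members later, which is how a method could be redefined at any time.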

Creating and Destroying Objects

With these macros, you can construct objects in two places: on the stack or on the heap. Objects on the stack can exist only within the scope of their declaration, whereas objects on the heap can exist until explicitly destroyed.

To create an object on the stack, use the PUSH(object-pointer, class-name, constructor-arguments) macro. (This statement declares object-pointer as a pointer to an object of the specified class.) Destroy an object on the stack with the POP(object-pointer) macro before it goes out of scope.

To create an object on the heap, use the NEW(class-name, constructor-arguments) macro. This expression evaluates to a pointer to an object of the specified class. Destroy an object on the heap with the DELETE(object-pointer) macro when it is no longer needed. Both POP and DELETE invoke the class destructor method, which should release any resources used by the object.
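
The difference between the two lifetimes can be sketched in plain C without the macros. The Buffer class and its functions below are hypothetical and exist only for illustration. The stack case pairs construction and destruction by hand inside one scope, while the heap case wraps them around malloc and free.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { char *data; size_t size; } Buffer;

static int live_buffers = 0;   /* counts constructed-but-not-destroyed objects */

/* Constructor: acquire the resource the object owns. */
static void BufferConstruct(Buffer *self, size_t size)
{
    self->data = malloc(size);
    assert(self->data != NULL);
    self->size = size;
    ++live_buffers;
}

/* Destructor: release it. */
static void BufferDestruct(Buffer *self)
{
    free(self->data);
    self->data = NULL;
    --live_buffers;
}

/* Heap object: allocate storage, then construct. */
Buffer *BufferNew(size_t size)
{
    Buffer *self = malloc(sizeof *self);
    assert(self != NULL);
    BufferConstruct(self, size);
    return self;
}

/* Heap object: destruct, then free storage. */
void BufferDelete(Buffer *self)
{
    BufferDestruct(self);
    free(self);
}

/* Stack object: storage is an automatic variable, so construct and
   destruct must be paired before the variable goes out of scope. */
int StackDemo(void)
{
    Buffer local;
    int during;
    BufferConstruct(&local, 16);
    during = live_buffers;      /* count while the local object is alive */
    BufferDestruct(&local);
    return during;
}
```

Forgetting the destruct call in StackDemo would leak the buffer, since C runs nothing automatically when local goes out of scope.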

Using Objects

You use objects primarily by invoking their methods, and also by accessing their data. You can invoke object methods polymorphically with the SEND macro, or monomorphically with the CALL macro. You can access object data through the object pointer.

The SEND( object-pointer, base-name, method-name, method-arguments ) macro invokes the named method of the specified base class as redefined for the class of the object-pointer. The SEND macro calls the function pointed to by the method-name member of the method table pointed to by the first member of the object pointed to by object-pointer. Thus the runtime type of object-pointer determines which function is called.

The CALL( object-pointer, class-name, base-name, method-name, method-arguments ) macro invokes the named method of the specified base class as redefined for the specified class. The CALL macro calls the function pointed to by the method-name member of the method table of the specified class. Thus the compiler determines which function to call.

Dynamic binding with SEND is the most common form of method invocation. Indeed, I created all of this macro machinery primarily to make dynamic binding possible. Static binding with CALL is most useful when a redefined class method must invoke an earlier definition of that method.
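
The contrast between the two forms of invocation can be sketched with two simplified, hypothetical macros. SK_SEND and SK_CALL below are illustrations only: they take fewer arguments than the article's SEND and CALL and pass the object as an ordinary parameter. The redefined Size method uses the static form to reach the earlier definition.

```c
#include <assert.h>

typedef struct { int (*Size)(void *obj); } FileMethods;
typedef struct { const FileMethods *methods; } File;

/* Base definition of Size. */
static int FileSize(void *obj) { (void)obj; return 0; }
static const FileMethods FileTable = { FileSize };

/* Derived class with one data member. */
typedef struct { File base; int cached; } BufFile;

static int BufFileSize(void *obj);
static const FileMethods BufFileTable = { BufFileSize };

/* Dynamic binding: look the function up in the object's own table.
   Note that obj is evaluated twice. */
#define SK_SEND(obj, method) (((File *)(obj))->methods->method(obj))

/* Static binding: name the class table explicitly. */
#define SK_CALL(obj, cls, method) (cls##Table.method(obj))

/* Redefinition that first invokes the earlier definition statically. */
static int BufFileSize(void *obj)
{
    BufFile *self = obj;
    return SK_CALL(obj, File, Size) + self->cached;
}
```

If BufFileSize used SK_SEND here instead of SK_CALL it would call itself without end, which is why the static form is needed to invoke an earlier definition.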

Discipline

As the implementer of a class library you should always provide a public interface which defines access macros for every SEND, CALL, and data access operation needed by a user of a class. If enforcing data-hiding is more important than performance, then access functions, rather than macros, can be declared in a separate set of interface headers, and the library can be provided as compiled code only.

As the user of a class you should never directly access the methods and data of that class. You should always treat a class as a black box, using only the public interface, so that the implementer is free to change the private implementation details. C compilers cannot automatically enforce this discipline, as can C++ compilers, but discipline is needed nonetheless.

Porting C Classes to C++

Even if you choose to implement your classes in C, you may want or need a C++ implementation in the future. You can accomplish this port by simply using my macros to generate C++ instead of C. For many of the macros, like POP and END_METHOD, the translation amounts to a no-op. In others, like METHOD, SEND, and CALL, simple C++ code is substituted for more cryptic C code. Listing 2 provides both C and C++ versions of all these macros. Interestingly enough, Borland's Turbo C++ compiler produces smaller and faster output than does its Turbo C compiler for the example File classes, so don't be fooled into thinking that C++ is necessarily less efficient than C.

When we moved our project to C++, I ran our code through a Standard C preprocessor and a code beautifier, and followed that with some hand editing. Using this approach I was able to port several thousand lines of our C class code to C++ in a few days. The resulting code did not, of course, take full advantage of the more powerful C++ features, but did provide a clean, working starting point for further development.

Conclusion

C++ remains my first choice for implementing an object-oriented design. Nonetheless, there can be good reasons to choose C, including contractual obligations, the need to use an industry standard language, severe portability or performance constraints, and the availability of trained staff.

Although Standard C does not support object-oriented programming, it can be extended — using its own preprocessor — to support the essentials of object-oriented programming in a way that is upwardly compatible with C++.

Sidebar: What is an Object? A Quick Review

Sidebar: Extended C Syntax

Sidebar: The Hazards of Macros

