Dr. Dobb's | It's Easy, Class!

It's Easy, Class!

Streams under Turbo Vision are a prime example of what Jeff calls the Rubber Pipe fallacy.

July 01, 1992
URL:http://www.drdobbs.com/architecture-and-design/its-easy-class/184408807

JUL92: STRUCTURED PROGRAMMING

Last week someone said that I had to have gone to Catholic school, because she could read my signature. Guilty! And proud of it.

I've heard Catholic schools blamed for everything from frigidity to morbid fear of rulers, and I just don't buy any of it. In eight years I never saw a nun lift a hand against a child, and I suspect it happens more in urban legend than it ever did in reality.

Come on, already. The nuns worked for God, and God was the guy who turned Arius the Heretic to worms before he was even dead or anything. The lesson was not lost on us.

No. What the poor old dears were guilty of was demanding that we work, and learn, and excel, without regard to things like self-esteem, which (if they had ever heard the term) they rightfully assumed was something like self-abuse. They knew that literacy was possible, and expected without exception that we would become literate.

Hey, it worked.

I'll admit the road was rocky at times. In third grade, Sister Agnes Eileen explained what definitions were, in that it was always possible to explain a word in terms of other words so that no one would mistake its meaning.

"It's easy, class!" she said with that boundless Irish enthusiasm. "Like this: The definition of 'hose' is, 'a rubber pipe.' How hard can that be?"

Not too, we agreed. So she passed out worksheets with a list of words that we were to define before the lunch bell.

And what was the first word on the list?

"Love."

Drowning in the Stream

I've been fussing with Turbo Vision streams for a while now, and that same old feeling keeps coming back, that I felt in Room 1 at Immaculate Conception Grade School. What's easy in theory is not always easy in practice -- and not everything sums up as easily as "a rubber pipe."

Streams, for instance. Streams are perhaps the most abysmally documented part of the abysmally documented Turbo Vision, with the sole exception of the standard dialogs, which are not documented at all. The Turbo Vision Guide is guilty of what I might as well call the Rubber Pipe Fallacy: Demonstrating that something is easy by giving a trivial example, and then entirely avoiding the issue of what happens when truly useful things need to be done.

Having read the TV Guide explanations and run the code examples, I felt that I understood how stream I/O was to be done. Then I attempted to add stream I/O to HCALC.PAS.

Hello, wall.

Oh, I figured it out, with the help of some people who make their living writing Turbo Vision code. And while I freely admit that I'll probably be glad someday that I learned it (as the nuns would relentlessly remind us), from here, well...

Let me see if I can save you some grief.

Filing Objects

An early Turbo Pascal disappointment for people who don't read the fine print in their manuals is that you can't create a FILE OF OBJECT. It seems a little arbitrary until you think for a while about the nature of objects and the nature of traditional Pascal file I/O. Record-oriented I/O is easy. A record is all data, and you can write the whole thing to disk without fear of violating any beneath-the-surface connections to other parts of the application. (You may have gotten a hint of the problems with filing objects if you ever tried to save a linked list -- or, worse, a more complex data structure -- from the heap to disk with a hope of later bringing it back to the heap intact.)

The #1 complication with objects and files is that when objects go to disk, they don't take their code with them. The threads of connection between object instances and method code are broken in the act of writing an object to disk as though it were a record, and reconnecting those threads in bringing back objects from disk is not trivial.

The #2 complication with objects and files is that we'd like to be able to read and write objects to disk polymorphically. In other words, if we have a collection of objects of different types, we'd like to be able to iterate over the collection and write each object to disk without necessarily knowing its exact type at run time. And that implies that the file system must support variable-length records and do it well, because the size of all those different object types is certainly not going to be identical.

The #3 complication with objects and files is that, especially under Turbo Vision, object-oriented programming hangs heavily on pointers and objects linked by pointers, in what can become pretty hairy dynamic structures. If your application makes heavy use of numerous objects on the heap, you'll end up reconstructing most of the heap every time you read the application and its objects from disk. This I find ugly work, rather like standing in the dark, swinging a hammer at a nail you can't see. Now and then you're bound to miss, and the misses will hurt.

The Streams Solution

Streams were designed to solve these problems, or at least allow them to be addressed. Streams actually predate Turbo Vision, and were present in Turbo Pascal 5.5. Because of scheduling and production constraints, streams didn't quite make it into the OOP Guide, and many people never looked closely enough at the example programs to discover them.

A stream is itself an object, encapsulating physical file support with the ability to wrestle objects out to disk and bring them back alive.

It's a two-way street, however. If they are to be written to a stream, objects must know about streams, and have methods within them to write themselves to a stream. It works like this: You instruct a stream to put a specified object onto itself. The stream then instructs that object to store itself as required onto the stream.

The stream has a pair of virtual methods called Get and Put. Put takes a single parameter, which can be a pointer to any object type that descends from TObject. Put puts an object onto the stream. Get is a function method, which brings an object back from the stream and returns a PObject pointer to that object.

To be streamable, an object must (among other things) have a pair of virtual methods called Load and Store. Store writes the object's data onto the stream by making a call to a method named Write once for each data item in the object. Similarly, Load brings the object's data back from disk by reading the object's fields, each with a separate call to a method named Read. And whose methods are Read and Write? The stream's, of course.

Assume you have an open stream S, and a pointer PP to some object on the heap. The following call writes object PP onto stream S:

  S.Put(PP);

Whew. Your application calls the stream's method Put. Put calls the Store method belonging to the object, and Store calls -- perhaps repeatedly -- the Write method belonging to the stream.

You make the following call to bring back an object from stream S and allocate it on the heap as the referent of pointer PP:

  PP:=S.Get;

Here, Get calls the Load method of the next object stored on the stream, and Load recreates the object it belongs to by allocating space for itself on the heap and then retrieving the stored values of its various fields by repeated calls to S.Read. Load can do this because Load is a constructor, and represents an alternate way to build an object, different from your old familiar Init constructor but ending up with the same result: a new object on the heap that wasn't there before.

That's how the stream process works from a height. It's easy, class! Well -- sort of.

Preparation

If you're sharp, you may be asking some pretty pointed questions about now. Like -- how does the stream know which constructor method to call when Get fetches the next object from the stream? Don't make the naive mistake of asking how the stream can call the methods of an object that doesn't really exist yet. The object doesn't yet exist, but its methods are always in the code segment, whenever the application that uses the objects is running. The true question is, how does the stream find the right constructor among the many in the code segment? The short answer is that it has to peek a little, and for that essential peeking to happen, you have to set things up just so.

First and most fundamental, to use the stream's mechanisms to store objects as objects, those objects must be descended from TObject. In other words, virtually all objects in Turbo Vision are already eligible, because they all descend directly or indirectly from TObject. However, if you create your own "mute objects" (my TMortgage object type from HCALC is a good example), you must explicitly make them descend from TObject.

The reason for this may surprise you. Most of the time you make an object descend from a particular parent object in order to inherit some particular methods or fields from the parent object. In this case, what your objects inherit from TObject is not any specific method or field (in fact, TObject has no fields of its own) but only an assurance that the first field in the child object will be the pointer to its virtual method table (VMT).

VMTs First!

The TObject type may well have some other purpose than to guarantee the position of the VMT pointer, but in truth I've never heard of one. Some quick recap here on VMTs: All virtual method tables are present in Turbo Pascal's single data segment, and there is one VMT there for every object type that contains virtual methods. Every instance of that object type contains a 16-bit pointer to its VMT in the data segment. This pointer, which we call the VMT pointer, is nothing more than the offset into the data segment at which the VMT itself exists.

Consider this: Unless an object has virtual methods, it has no VMT and hence no VMT pointer. If a child object descends from a parent object without a VMT, but the child object defines one or more virtual methods, a VMT pointer will be added to the child object's structure. However, the parent object's fields will be present in the object's image before the VMT pointer is. In other words, if you're mapping out an object's fields in memory, the parent's fields will exist at lower memory addresses than the child object's VMT pointer.

Now, TObject has no fields of its own at all. It does have a virtual method, however (its Done destructor), and therefore it has a VMT. Because there are no fields in TObject to come before the VMT pointer, the VMT pointer is right there at offset 0 from the start of the object. And this will always be the case in any object that inherits from TObject, because parent fields are always "ahead" of child fields in the object's image.

Therefore, if some object you define ultimately descends from TObject as the root of its inheritance tree, your object is guaranteed to have a VMT pointer at the very start of its image. This is important, because (as I'll explain a little later) the stream doesn't have the ability to go searching through an object's fields to find its VMT pointer. The VMT pointer must be in a totally predictable place -- like at the very start of any object -- to be considered streamable. This is the reason that all streamable objects must trace back to TObject as their ultimate ancestor.

The Registration Record

Another requirement for a streamable object type is that it be registered with the stream. This sounds more exotic than it actually is. When an object type is registered with a stream, it only means that the stream has obtained a small amount of information about that object type. This information allows the stream to connect an incoming object with its methods and its VMT, none of which go out to disk with the object's fields.

When you define an object type and want to make it streamable, you must also define a registration record for that object type. This record is usually created as a typed constant, since once defined, it's generally not altered at run time. The record's definition is shown in Example 1.

Example 1: The stream-registration record definition.

  TStreamRec = RECORD
    ObjType : Word;     { You define a unique code for this field     }
    VMTLink : Word;     { The offset of the type's VMT in the dataseg }
    Load    : Pointer;  { The full address of the type's Load method  }
    Store   : Pointer   { The full address of the type's Store method }
  END;

You must define one of these records for each object type you intend to make streamable. The ObjType field is -- literally -- key; it's a unique code that you the programmer define, and cannot be present in any other registration record. It is how the stream tells registration records apart, and how it identifies the one it needs. Any word-sized value greater than 99 is legal here. I picked 1100 out of my hat when streamizing HCALC, and started numbering my registration records from that value.

The VMTLink field contains the offset portion of the registered object type's VMT pointer. This can be derived by using the built-in TypeOf function, which returns a 32-bit pointer to the VMT belonging to the object type passed as its parameter. The segment portion of the pointer is discarded, and only the offset is used. See Example 2 for the actual syntax I used to derive the VMTLink value for my TMortgage type. You'll need a typed constant definition like this for every object type you intend to make streamable. (Note that Turbo Vision provides its own registration records for all of its provided types. You create registration records only for object types that you create from scratch or derive from the "stock" object types.)

Example 2: The registration record for TMortgage.

  CONST
   RMortgage : TStreamRec =
    (ObjType : 1200;
     VMTLink : Ofs (TypeOf (TMortgage)^);
     Load    : @TMortgage.Load;
     Store   : @TMortgage.Store);

Finally, the Load and Store fields simply contain pointers to the Load and Store methods of the registered type. The address-of operator @ is used to derive these pointers; see Example 2.

The stream-registration system contains a serious design pitfall: There's no promise that the ObjType you select for your own objects won't conflict with objects you may be using that were designed by others. It's particularly sticky when you don't have the source to the objects you're using and are linking from TPUs. You can read the ObjType code from a source-code file, but I know of no easy way to divine the ObjType codes embedded in a compiled .TPU file. Keep this in mind, since the compiler will not warn you when two ObjType codes conflict!

Registering Types

Defining a registration record for an object type is not enough. You must explicitly pass that record to a registration procedure in order to register the object type. It's easy enough to do: RegisterType(RMortgage);

The RegisterType procedure is global if you USE the OBJECTS.TPU unit, and it adds your stream-registration record to a list of such records that it maintains. Once registered through RegisterType, your object type is registered for any and all stream objects your application uses.

Another of the countless sources of confusion in Turbo Vision is that while Turbo Vision provides stream-registration records for all of the "stock" object types, you must still explicitly register all object types you intend to write to a stream. This includes all the collections and views and controls provided with the product, not only those that you subclass and modify!

Fortunately, there are canned routines that gather together the registration calls from each unit and allow you to register all types defined in that unit with one call. Look in the interface of each unit for a routine beginning with Register, such as RegisterApp, RegisterMenus, and so on.

My opinion is that all this rigmarole is totally unnecessary. The runtime library could easily handle registration by itself, beneath the surface, including creating its own registration records and assigning unique ObjType codes. (The RTL is actually the only entity with enough information about an application to avoid ObjType code collisions.) This is just another area where Turbo Vision's black box needs to be a great deal blacker.

Going Out to Disk

Since Turbo Vision insists on keeping all of this registration stuff in your face while you work with streams, you might as well understand what all the funny numbers do. In particular, knowing how streams work internally can be essential when you're trying to debug a streams problem that seems like it came from Mars.

In the first step in the Put process (writing an object onto a stream) the stream fetches the VMT pointer from offset 0 of the object to be written onto the stream. This is the benefit of having all streamable objects descend from TObject -- the stream doesn't have to search for the VMT pointer. The stream searches its linked list of registration records to find a registration record containing a VMT reference that matches the VMT pointer in the object. It then takes the ObjType code from the found registration record and writes this out to the stream as a sort of "who am I" header value. This header value is crucial when we go to read the object back into memory, as I'll discuss shortly.

The VMT pointer cannot itself be used as the ObjType code, in case you suspect (as I did) that the VMT pointer makes a separate ObjType code unnecessary. It's true that every object type has its own unique VMT, and thus within a single application every VMT pointer should in fact be unique. The kicker is this: You may want to write out a stream of objects from one application and read them back in to another application. And while the second application must have the code linked into itself for any objects it reads from a stream, there's nothing to indicate the order in which those objects were linked into the second application. All the VMTs have to be there in the first part of the data segment, but they do not have to be in the same order. Hence, a VMT pointer is not enough to identify an object uniquely from one application to another.

The ObjType code-header value serves to say, "What follows is a Widget object type." The stream then calls the object's Store method to write the individual data fields of the object to the stream. Store, in turn, uses the stream's Write method to write the individual fields to the stream.

Who Calls Store?

Another Turbo Vision confusion seed: In a couple of places in the TV Guide, you're told that you never call the Load or Store methods directly.

Wrong!

If you derive a new object type from an existing object type that has its own Load or Store methods, your child object's Load and Store methods must call its parent object's Load and Store methods. Each object takes care of writing its own fields to the stream -- and Put calls the child object's Store method, which only writes the fields defined within the child object. If the parent object (or grandparent object, or further ancestor objects up the line) has its own fields, it must take responsibility for writing those fields to the stream.

So when you write a Store method for some object that descends from an object that has its own Store method, you must call the parent's Store method before writing any of your own fields to the stream. It's very much like calling the parent object's constructor before executing your own constructor code, so that the parent object can get its own house built before you build yours.

So the TV Guide has it half right, sorta. It's true that you never call your own object's Load or Store methods directly. But it's just as true that you must call the parent object's Load and Store methods from within your own Load and Store methods if you expect the stream mechanism to work correctly. I'll provide a solid code example next month, once I've laid a little more groundwork.

Bringing an Object Back from Disk

When you call a stream's Get method to bring the next object in from the stream, Get reads the first word from the stream and assumes it to be the ObjType code of the object that follows. It looks up the code in its list of registration records until it finds a registration record containing a matching ObjType code. Recall that the registration record contains a full 32-bit pointer to the object's Load method. The stream can thus use this pointer to call the object's own Load without having yet read the object in from the stream.

The Load method (which is a constructor) allocates space on the heap for the new object and begins reading it in from the stream, field by field, using the stream's Read method. And while I don't see it mentioned in the TV Guide anywhere, the object gets a new VMT pointer when it comes in from disk. Remember, an object can be read from a stream into any application that contains the code implementing that object. The VMTs in different applications may be in a different order and at different offsets from the start of the data segment. So the newly read object gets a new VMT pointer from the VMTLink field of its registration record.

Partway There

Well, I'm bumping my head on my word count, and we're a long way from getting streams under control. This column is at best an overview of how streams operate -- and as such, it's probably pretty easy to understand. The worst part of streams is all the little details and the multitude of ways to go wrong. We'll take on some of those next month, and start to see how streamability can be added to MORTGAGE.PAS and finally HCALC.PAS.



_STRUCTURED PROGRAMMING COLUMN_
by Jeff Duntemann

[Example 1: The stream registration record definition]


TStreamRec = RECORD
  ObjType : Word;    { You define a unique code for this field     }
  VMTLink : Word;    { The offset of the type's VMT in the dataseg }
  Load    : Pointer; { The full address of the type's Load method  }
  Store   : Pointer  { The full address of the type's Store method }
END;






[Example 2: The registration record for TMortgage]

CONST
  RMortgage : TStreamRec =
    (ObjType : 1200;
    VMTLink : Ofs(TypeOf(TMortgage)^);
    Load    : @TMortgage.Load;
    Store   : @TMortgage.Store);