Tagged Data Storage


Dr. Dobb's Journal September 1997: Tagged Data Storage

Less coding, more flexibility

Jeremy, a senior software engineer for Viperware, can be contacted at [email protected].


One of the most common problems software developers face is object storage. And one of the most common solutions to this problem is encoding the object into a specialized data-storage record that holds the information necessary to recreate the stored object at a later time. The disadvantage of this approach is that, for every object you want to store, you need a new class/struct definition for the data. In addition to the data-storage record, you need storage and restoration methods for every object. If the storage format changes, you must code conversion routines that read old stored objects and translate them into the new format -- a lengthy and complicated process.

You can greatly improve the efficiency and robustness of your data-storage architecture with a technique called "tagged data storage," which greatly reduces the time spent writing code to store and restore data. Tagged data storage turns one of the most painstaking tasks in software engineering into something natural, flexible, and powerful.

For instance, I use this architecture in "Grayscale Kitchen," a user-interface prototyping tool that lets you build views visually and save them as tagged objects for later restoration. To edit view object properties, I've written a tag editor that can edit any tagged object. (One of the key advantages of tagged data is that all objects can be treated the same way.) In short, Grayscale Kitchen lets you design user interfaces without writing code.

An implementation of a tagged data storage architecture is available electronically; see "Availability," page 3. The source code, developed with Metrowerks CodeWarrior for Macintosh, includes code for attaching tags to and retrieving tags from taggable objects. It also includes code for packing tags into, and unpacking tags from, a chunk of data.

Object Properties

To illustrate tagged data storage, I'll begin by breaking down an object to be stored into its basic properties. Figure 1, for example, shows the properties of a typical window object. Once you have code to store and restore these properties, you can recreate a window object from the stored values.

It's a good idea to have classes that provide common interfaces to shared methods and member variables, because this saves time when designing classes with similar properties. For example, for objects whose size is defined by a rectangle, you create a class called CSizable from which every sizable object descends; see Listing One. By inheriting CSizable's methods and member variables, all sizable objects use exactly the same interface for dealing with their size. This technique will help you when implementing tagged data storage. In addition to the interface methods, each property class has storage and restoration methods, so you write the store/restore code for each basic property only once.
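
The following C++ sketch shows a property class that carries its own store/restore routines. The Rect struct stands in for the article's CRect. The StoreSize/RestoreSize names and the default size are hypothetical choices made for illustration. A real implementation would store the size as a tag rather than as a bare byte vector.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for the article's CRect (assumed: four 32-bit edges).
struct Rect {
    int32_t left, top, right, bottom;
};

// Property class: every object whose size is a rectangle inherits this
// interface, so the size store/restore code is written only once.
class CSizable {
protected:
    Rect cs_dimensions{0, 0, 100, 100};  // hypothetical default size
public:
    virtual ~CSizable() = default;
    virtual Rect GetDimensions() const { return cs_dimensions; }
    virtual void Resize(const Rect& rNewSize) { cs_dimensions = rNewSize; }

    // Hypothetical helper: copy the size property into a byte buffer.
    std::vector<uint8_t> StoreSize() const {
        std::vector<uint8_t> bytes(sizeof(Rect));
        std::memcpy(bytes.data(), &cs_dimensions, sizeof(Rect));
        return bytes;
    }

    // Hypothetical helper: restore the size, leaving it unchanged if the
    // data has the wrong length.
    bool RestoreSize(const std::vector<uint8_t>& bytes) {
        if (bytes.size() != sizeof(Rect)) return false;
        std::memcpy(&cs_dimensions, bytes.data(), sizeof(Rect));
        return true;
    }
};

// Both classes inherit the same size interface and the same storage code.
class CWindow : public CSizable {};
class CButton : public CSizable {};
```

Any class derived from CSizable, whether a window, a button, or a pane, gets the same size storage for free.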

Tags

Simply breaking an object down into its properties doesn't help you define a data storage/restoration format. For this, you must extend the concept to include tags. A tag is a property of an object that can be removed from or attached to an object, hence the name "tag."

A tagged object is an object with a collection of tags attached to it, where the tags represent the object's basic properties; see Figure 2.

There are several features of the tagged data-storage architecture that make it powerful and flexible:

Flexible storage format. Tags can be attached to an object in any order, and each tag is retrieved by its name rather than by its position. Therefore, if the data storage format changes and the tags are stored in a different order, you can still restore the object.

Defined default values. Every storable object defines default values for each of its basic properties. If one of the basic property tags for the object is removed, no crashes or data loss will occur. The storable object simply looks for the tag and, if no tag is there, uses the default value. Notice that there is no attached "Has Resize Box" tag in Figure 2. The window was reconstructed correctly because the default value for "Has Resize Box" is False.

Common data format. Every object stored with the tagged data-storage architecture uses exactly the same data format. This reduces the amount of code you must write, because all objects share the same data storage and restoration code.
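
Here is a minimal sketch of the first two features. It assumes tags are held in a std::vector and looked up by a four-character name. Because retrieval is by name, tag order doesn't matter, and a missing tag falls back to the property's default. The Tag struct, FourCC(), the 'clos' and 'grow' names, and the default of true for the close box are all hypothetical. Only the False default for "Has Resize Box" comes from the text.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical in-memory tag: a four-character name plus raw data bytes.
struct Tag {
    uint32_t name;
    std::vector<uint8_t> data;
};

// Packs a four-character code such as "grow" into 32 bits, first char high.
constexpr uint32_t FourCC(const char (&s)[5]) {
    return (uint32_t(uint8_t(s[0])) << 24) | (uint32_t(uint8_t(s[1])) << 16) |
           (uint32_t(uint8_t(s[2])) << 8) | uint32_t(uint8_t(s[3]));
}

// Tags are found by name, so the order in which they were attached is irrelevant.
const Tag* FindTag(const std::vector<Tag>& tags, uint32_t name) {
    for (const Tag& t : tags)
        if (t.name == name) return &t;
    return nullptr;
}

// Returns the stored flag, or the default if the tag is missing or malformed.
bool GetBoolTag(const std::vector<Tag>& tags, uint32_t name, bool defaultValue) {
    const Tag* t = FindTag(tags, name);
    if (t == nullptr || t->data.size() != 1) return defaultValue;
    return t->data[0] != 0;
}

struct WindowProps {
    bool hasCloseBox = true;    // hypothetical default
    bool hasResizeBox = false;  // "Has Resize Box" defaults to False
};

WindowProps RestoreWindow(const std::vector<Tag>& tags) {
    WindowProps p;  // starts out holding every default value
    p.hasCloseBox = GetBoolTag(tags, FourCC("clos"), p.hasCloseBox);
    p.hasResizeBox = GetBoolTag(tags, FourCC("grow"), p.hasResizeBox);
    return p;
}
```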

Extending Data Type With Tags

Even if the object-restoration system knows only about certain tags, you can extend the object's definition by attaching a new tag without disrupting the data format.

Assume, for example, that you have two programs that edit .GIF files. One editor stores only the .GIF data, while the other also stores special hot-click and transparency information that is not part of the .GIF data format. When you create a .GIF file in the simple editor, you store only one tag: the .GIF data. If you reopen the file in the advanced editor, it attaches the hot-click and transparency tags before the .GIF tag. Open the object in the simple editor again, and the added tags are still there. Neither the extra data nor the change in the order of the data causes any problems, because the simple editor looks up only the tag it knows about.
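
One way for a simple editor to honor tags it doesn't understand is to keep them and write them back unchanged. The sketch below assumes tag names are four-character strings. The names 'GIF ', 'hotc', and 'trns' are illustrative. It also assumes that the simple editor preserves foreign tags on save, which is a design choice and not something the article specifies.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tag with a four-character name kept as a string for clarity.
struct Tag {
    std::string name;
    std::vector<uint8_t> data;
};

// A "simple editor" that understands only the 'GIF ' tag.
class SimpleEditor {
public:
    void Load(const std::vector<Tag>& tags) {
        tags_ = tags;  // keep every tag, recognized or not
        gif_.clear();
        for (const Tag& t : tags)
            if (t.name == "GIF ") gif_ = t.data;  // other tags are ignored
    }
    const std::vector<uint8_t>& Gif() const { return gif_; }
    void SetGif(std::vector<uint8_t> gif) { gif_ = std::move(gif); }

    // Writes foreign tags back unchanged, followed by the (possibly edited)
    // GIF data. Tag order does not matter to either editor.
    std::vector<Tag> Save() const {
        std::vector<Tag> out;
        for (const Tag& t : tags_)
            if (t.name != "GIF ") out.push_back(t);
        out.push_back(Tag{"GIF ", gif_});
        return out;
    }

private:
    std::vector<Tag> tags_;
    std::vector<uint8_t> gif_;
};
```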

Implementing Tagged Objects

To implement a tagged object, you need a class that attaches tags to, and retrieves tags from, the object in memory. Every object that can have tags attached to it is an FT_TaggableObject (Listing Two), and each attached tag is an FT_Tag (Listing Three). You also need a collection class to hold the tags attached to the object. For illustration, a simple linked-list class will suffice. I don't supply one here, but linked-list code is easy to find.

As you can see, the most important methods of FT_TaggableObject are for attaching, retrieving, and removing tags. FT_Tag holds the necessary information for each tag, and includes accessor/mutator methods to get/set the tag field values. If you are unhappy with the linked-list implementation used by FT_TaggableObject, override/rewrite the appropriate methods.
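
Here is a minimal C++ sketch of the two classes. It uses std::list in place of the unspecified linked-list class, and bool/pointer return values in place of the article's SN_Error codes. Tag and TaggableObject are hypothetical stand-ins for FT_Tag and FT_TaggableObject. Replacing an existing tag that has the same name is an assumed policy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <utility>
#include <vector>

// Hypothetical stand-in for FT_Tag: name, type, and owned data bytes.
class Tag {
public:
    Tag(uint32_t name, uint32_t type, std::vector<uint8_t> data)
        : name_(name), type_(type), data_(std::move(data)) {}
    uint32_t GetName() const { return name_; }
    uint32_t GetType() const { return type_; }
    const std::vector<uint8_t>& GetData() const { return data_; }
    std::size_t GetDataSize() const { return data_.size(); }

private:
    uint32_t name_, type_;
    std::vector<uint8_t> data_;
};

// Hypothetical stand-in for FT_TaggableObject, backed by a linked list.
class TaggableObject {
public:
    virtual ~TaggableObject() = default;

    // Attaching a name that is already present replaces the old tag (assumed).
    void AttachTag(uint32_t name, uint32_t type, std::vector<uint8_t> data) {
        RemoveTag(name);
        tags_.emplace_back(name, type, std::move(data));
    }
    bool RemoveTag(uint32_t name) {
        for (auto it = tags_.begin(); it != tags_.end(); ++it)
            if (it->GetName() == name) { tags_.erase(it); return true; }
        return false;
    }
    void RemoveAllTags() { tags_.clear(); }

    // Linear search through the list; returns nullptr if the tag is absent.
    const Tag* RetrieveTag(uint32_t name) const {
        for (const Tag& t : tags_)
            if (t.GetName() == name) return &t;
        return nullptr;
    }
    std::size_t CountTags() const { return tags_.size(); }

private:
    std::list<Tag> tags_;
};
```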

Storing Tagged Objects

Storing a tagged object requires several new classes. You need a class that packs the tags attached to an object into a data format and unpacks tags from packed tag data; see TK_PackerObject in Listing Four.

The class TK_StorableObject (Listing Five) descends from FT_TaggableObject, and represents every object that can have tags attached to it and can be stored/restored. The class LB_DataChunk (Listing Six) represents the chunk of data in memory that will be the packed representation of the stored object. The purpose of LB_DataChunk is to abstract OS-dependent memory routines to a single class, allowing you to make the other classes as portable as possible.

I have chosen a storage format that is suitably generic, but if you are unhappy with it, you can override and rewrite TK_PackerObject to pack the tags differently. The tagged data format consists of:

  • Tag header (see Listing Seven), which has four long word fields: tagType, tagName, unused, and tagDataSize. tagType is a four-character long word that represents the tag type (CRect is 'cRct', boolean is 'bool', and so on). tagName is a four-character long word that uniquely identifies the tag. unused is reserved to allow future extension of the data format. tagDataSize gives the size, in bytes, of the tag's data.
  • Tag data (see Figure 3). Every tag header with a tagDataSize value greater than 0 is followed by the tag's data of that size. The tag's data may be the tags of another object that has been packed.
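
The header layout can be written down with fixed-width types. This sketch assumes 32-bit fields, since the original uses 32-bit long words on the Macintosh. Listing Eleven copies the header struct in native byte order. This version instead writes each field explicitly in big-endian order, so packed data would read the same on any machine. FourCC(), AppendU32(), and AppendTag() are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Four-character code, first character in the high byte.
constexpr uint32_t FourCC(const char (&s)[5]) {
    return (uint32_t(uint8_t(s[0])) << 24) | (uint32_t(uint8_t(s[1])) << 16) |
           (uint32_t(uint8_t(s[2])) << 8) | uint32_t(uint8_t(s[3]));
}

// Listing Seven's header with fixed-width fields: 4 x 32 bits = 16 bytes.
struct TagHeader {
    uint32_t tagType;      // e.g. FourCC("bool")
    uint32_t tagName;      // unique name of this tag
    uint32_t unused;       // reserved for extensions; written as zero
    uint32_t tagDataSize;  // number of data bytes that follow the header
};
static_assert(sizeof(TagHeader) == 16, "header is four 32-bit fields");

// Appends a 32-bit value most-significant byte first (big-endian).
void AppendU32(std::vector<uint8_t>& out, uint32_t v) {
    out.push_back(uint8_t(v >> 24));
    out.push_back(uint8_t(v >> 16));
    out.push_back(uint8_t(v >> 8));
    out.push_back(uint8_t(v));
}

// Appends one packed tag: the header followed by exactly tagDataSize bytes.
void AppendTag(std::vector<uint8_t>& out, const TagHeader& h, const uint8_t* data) {
    AppendU32(out, h.tagType);
    AppendU32(out, h.tagName);
    AppendU32(out, h.unused);
    AppendU32(out, h.tagDataSize);
    if (h.tagDataSize > 0)
        out.insert(out.end(), data, data + h.tagDataSize);
}
```

A one-byte boolean tag therefore occupies 17 bytes: 16 for the header and 1 for the value.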

Each packed object is a collection of packed tags, with each tag header coming directly after the previous tag's data. As with most coding systems, it's a good idea to establish naming conventions for your tag types. I use the following convention:

  • All basic data types (short, long, boolean, and so on) are stored as all lowercase.
  • All classes (CWindow, CMenu, and the like) are stored in mixed case, with the first two characters of the long word representing the class system; see Listing Eight.
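
The naming convention can be checked mechanically. This sketch assumes codes are packed with the first character in the high byte. IsBasicTypeCode() and ClassSystem() are hypothetical helpers, and 'caOP' is only an illustrative class code.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>

// Four-character code, first character in the high byte.
constexpr uint32_t FourCC(const char (&s)[5]) {
    return (uint32_t(uint8_t(s[0])) << 24) | (uint32_t(uint8_t(s[1])) << 16) |
           (uint32_t(uint8_t(s[2])) << 8) | uint32_t(uint8_t(s[3]));
}

// Unpacks a code back into its four characters.
std::string FourCCToString(uint32_t code) {
    std::string s(4, ' ');
    for (int i = 0; i < 4; ++i)
        s[i] = static_cast<char>((code >> (24 - 8 * i)) & 0xFF);
    return s;
}

// Basic data types use all-lowercase codes.
bool IsBasicTypeCode(uint32_t code) {
    for (char c : FourCCToString(code))
        if (!std::islower(static_cast<unsigned char>(c))) return false;
    return true;
}

// For class codes, the first two characters name the class system.
std::string ClassSystem(uint32_t code) {
    return FourCCToString(code).substr(0, 2);
}
```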

Assuming the object you want to store doesn't have storage tags attached, packing tags into the tagged data storage format involves two steps:

1. Create a temporary TK_PackerObject and attach the object's storage tags to it by calling TK_StorableObject::AttachStorageTags(tempObject); see Listings Nine and Ten.

2. Iterate through the collection of tags, appending the tag header and tag data for each; see Listing Eleven.
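
The two steps can be sketched without the article's exception macros. This version assumes the tags are held in a std::vector. It mirrors PackMyData() by computing the total size before filling the buffer, and it copies each header in native byte order, as Listing Eleven does. PackTags() and the Tag struct are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct Tag { uint32_t type, name; std::vector<uint8_t> data; };  // hypothetical
struct TagHeader { uint32_t tagType, tagName, unused, tagDataSize; };

std::vector<uint8_t> PackTags(const std::vector<Tag>& tags) {
    // First pass: one header plus the data for every tag.
    std::size_t total = 0;
    for (const Tag& t : tags) total += sizeof(TagHeader) + t.data.size();
    std::vector<uint8_t> chunk(total);

    // Second pass: stuff each header, then its data, at a running offset.
    std::size_t offset = 0;
    for (const Tag& t : tags) {
        TagHeader h{t.type, t.name, 0, static_cast<uint32_t>(t.data.size())};
        std::memcpy(chunk.data() + offset, &h, sizeof h);
        offset += sizeof h;
        if (!t.data.empty())
            std::memcpy(chunk.data() + offset, t.data.data(), t.data.size());
        offset += t.data.size();
    }
    return chunk;
}
```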

There are two types of tags you can store: basic data types (short, boolean, and the like) and object classes (CRect, CWindow, and so on). Storing a basic data type is simple: Set the tag header's data-size field to sizeof(basicType) and append the value after the tag header.

Storing object classes is just an extension of what you are already doing. If you want to store subobjects (such as panes inside a view), pack the subobject data and attach that packed data as a tag; see Figure 4.
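
The following sketch shows both cases. It relies on a hypothetical Tag struct, native byte order, and the hypothetical helpers MakeBasicTag() and MakeObjectTag(). A basic value becomes sizeof(T) bytes of tag data. A subobject's packed tags become the data of a single tag on its parent.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

struct Tag { uint32_t type, name; std::vector<uint8_t> data; };  // hypothetical
struct TagHeader { uint32_t tagType, tagName, unused, tagDataSize; };

// Packs tags back to back: header (native byte order), then data.
std::vector<uint8_t> Pack(const std::vector<Tag>& tags) {
    std::vector<uint8_t> out;
    for (const Tag& t : tags) {
        TagHeader h{t.type, t.name, 0, static_cast<uint32_t>(t.data.size())};
        const uint8_t* p = reinterpret_cast<const uint8_t*>(&h);
        out.insert(out.end(), p, p + sizeof h);
        out.insert(out.end(), t.data.begin(), t.data.end());
    }
    return out;
}

// Basic data type: the data size is sizeof(T) and the value's bytes follow.
template <typename T>
Tag MakeBasicTag(uint32_t type, uint32_t name, T value) {
    static_assert(std::is_trivially_copyable<T>::value, "basic types only");
    Tag t{type, name, std::vector<uint8_t>(sizeof(T))};
    std::memcpy(t.data.data(), &value, sizeof(T));
    return t;
}

// Object class: the subobject's packed tags become one tag's data.
Tag MakeObjectTag(uint32_t classType, uint32_t name, const std::vector<Tag>& subTags) {
    return Tag{classType, name, Pack(subTags)};
}
```

Because a packed subobject is just another tag's data, nesting can go as deep as the object hierarchy does, for example panes inside a view inside a window.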

Restoring Tagged Objects

Restoring a basic data type is as simple as instantiating that data type and stuffing the tag data into it. Restoring object classes is not as simple. Because the object classes are themselves packed data, you must retrieve the tag data and unpack it into the instantiated object class variable.

There is an additional complication for restoring object classes. Because all you have of the stored object class is an identifier that represents its class type, you must have a method for instantiating the correct object class when you unpack the data. There are at least a few good methods for doing this:

  • Use factory classes that store template classes that create other objects based on their type. Pass the object's class type to the factory, which searches for a template that will create the appropriate object.
  • Use an extended switch statement (see Listing Twelve) that creates any object based on the object class type.
  • Use a hash table in which the object class identifiers correspond to a method that instantiates the correct object class. A hash table will most likely be faster than an extremely long switch statement, but requires significantly more coding.
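
In modern C++, the hash-table approach might look like this. std::unordered_map stands in for a hand-written hash table, and std::function stands in for the creation method. StorableObject, ObjectPane, View, ObjectFactory, and MakeObject() are hypothetical. They are not the article's CA_ObjectPane and VP_View.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical storable classes.
class StorableObject {
public:
    virtual ~StorableObject() = default;
    virtual std::string ClassName() const = 0;
};
class ObjectPane : public StorableObject {
public:
    std::string ClassName() const override { return "ObjectPane"; }
};
class View : public StorableObject {
public:
    std::string ClassName() const override { return "View"; }
};

// Hash table mapping a four-character class ID to a creation function.
class ObjectFactory {
public:
    using Creator = std::function<std::unique_ptr<StorableObject>()>;
    void Register(uint32_t classID, Creator create) {
        creators_[classID] = std::move(create);
    }
    // Returns nullptr for unknown IDs, like the default case of a switch.
    std::unique_ptr<StorableObject> Create(uint32_t classID) const {
        auto it = creators_.find(classID);
        if (it == creators_.end()) return nullptr;
        return it->second();
    }

private:
    std::unordered_map<uint32_t, Creator> creators_;
};

// Generic creation function for any default-constructible storable class.
template <typename T>
std::unique_ptr<StorableObject> MakeObject() {
    return std::unique_ptr<StorableObject>(new T());
}
```

Each class registers itself once, for example with factory.Register(id, [] { return MakeObject<View>(); }). Adding a new class then needs no edit to a central switch statement.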

The process of unpacking tagged data involves two steps:

1. Parse the packed data and attach the packed tags to a temporary taggable object. This taggable object may be the object being restored; see Listing Thirteen.

2. Once all tags are attached, translate the tag values into the object's member variables. If a specific tag is not attached, use the default value for that tag; see Listing Fourteen.
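
Here is a sketch of both steps. It assumes native-byte-order headers and collects the tags in a std::map keyed by name. Unlike Listing Thirteen, it checks that each header and its data fit inside the buffer before reading them, so truncated or corrupt data is rejected. UnpackTags(), ExtractColor(), and the Color struct are hypothetical. The black default mirrors SetColor(0, 0, 0) in Listing Fourteen.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <vector>

struct TagHeader { uint32_t tagType, tagName, unused, tagDataSize; };
struct Tag { uint32_t type; std::vector<uint8_t> data; };  // hypothetical

// Step 1: parse packed data into tags keyed by name. Returns false on
// truncated input instead of reading past the end of the buffer.
bool UnpackTags(const std::vector<uint8_t>& packed, std::map<uint32_t, Tag>& out) {
    out.clear();
    std::size_t offset = 0;
    while (offset < packed.size()) {
        if (packed.size() - offset < sizeof(TagHeader)) return false;
        TagHeader h;
        std::memcpy(&h, packed.data() + offset, sizeof h);
        offset += sizeof h;
        if (packed.size() - offset < h.tagDataSize) return false;
        const uint8_t* p = packed.data() + offset;
        out[h.tagName] = Tag{h.tagType, std::vector<uint8_t>(p, p + h.tagDataSize)};
        offset += h.tagDataSize;
    }
    return true;
}

// Step 2: translate a tag into a member value, falling back to the default.
struct Color { uint16_t red = 0, green = 0, blue = 0; };  // default: black

Color ExtractColor(const std::map<uint32_t, Tag>& tags, uint32_t colorName) {
    Color c;  // keeps the default if the tag is missing or malformed
    auto it = tags.find(colorName);
    if (it != tags.end() && it->second.data.size() == sizeof(Color))
        std::memcpy(&c, it->second.data.data(), sizeof(Color));
    return c;
}
```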

Storing References

How do you store a reference to another object? For example, if you have a member variable that is a pointer, this pointer has a value of something like "0x12345321." Yes, you can store that value, but it will be invalid when you later load it.

Most of the time, you can't store references because you can't assume that the object being referred to has been instantiated in the system restoring the object. There are many solutions to this problem.

One approach is to attach a name tag to each object that uniquely identifies the object. If you have all of your objects in some sort of collection (linked list, array), you can search through that collection for a specifically named object and assign a reference to the object at run time. Another solution, particularly for objects that are already collection based, such as views, is to store the object's index in the collection.
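
The name-tag approach can be sketched as a fix-up pass that runs after every object has been restored. Node, targetName, and ResolveReferences() are hypothetical. Only the name is stored, and the pointer is rebuilt at run time.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical restorable object with a name tag and one reference.
struct Node {
    Node(std::string n, std::string t) : name(std::move(n)), targetName(std::move(t)) {}
    std::string name;        // stored as the object's name tag
    std::string targetName;  // stored in place of the raw pointer
    Node* target = nullptr;  // rebuilt at run time, never stored
};

// Turns stored names back into pointers once all objects exist.
// Returns the number of references whose target could not be found.
int ResolveReferences(std::vector<std::unique_ptr<Node>>& nodes) {
    std::map<std::string, Node*> byName;
    for (auto& n : nodes) byName[n->name] = n.get();
    int unresolved = 0;
    for (auto& n : nodes) {
        if (n->targetName.empty()) continue;  // no reference stored
        auto it = byName.find(n->targetName);
        n->target = (it != byName.end()) ? it->second : nullptr;
        if (n->target == nullptr) ++unresolved;
    }
    return unresolved;
}
```

An unresolved reference is reported rather than left dangling. This matches the article's point that the referenced object may not exist in the restoring system.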

Disadvantages of Tagged Data

The foremost disadvantage of the tagged data format is that you must store a 16-byte tag header for each tag. The overhead is most evident when storing objects with lots of small tags; a 2-byte short, for example, takes 18 bytes once its header is included. However, disk space is seldom a major factor in design decisions today. Don't be afraid to come up with a tagged data format that better suits your needs. Tag headers, for instance, contain a good deal of redundancy (the unused field, repeated tag types, small tag sizes), which makes compression an ideal extension to the data format.

Another disadvantage of the tagged data architecture is that you cannot access a tag directly. You must search through the list of tags attached to the object to find the one you want. The slowdown is most noticeable when an object has many tags attached and the tag you want is near the end of the list. This is where advanced data structures such as hash tables can speed things up.
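
One way to remove the linear search is to keep the tags in attach order for packing and add a hash index from tag name to position. IndexedTagList is a hypothetical class. Removing a tag, omitted here, would also require updating the stored positions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

struct Tag { uint32_t name; std::vector<uint8_t> data; };  // hypothetical

// Tags in attach order (used when packing) plus a name -> position index
// so that lookups no longer walk the whole list.
class IndexedTagList {
public:
    void Attach(Tag t) {
        auto it = index_.find(t.name);
        if (it != index_.end()) {  // same name: replace in place
            tags_[it->second] = std::move(t);
            return;
        }
        index_[t.name] = tags_.size();
        tags_.push_back(std::move(t));
    }
    // Average constant-time lookup instead of a linear search.
    const Tag* Find(uint32_t name) const {
        auto it = index_.find(name);
        return it == index_.end() ? nullptr : &tags_[it->second];
    }
    const std::vector<Tag>& InOrder() const { return tags_; }

private:
    std::vector<Tag> tags_;
    std::unordered_map<uint32_t, std::size_t> index_;
};
```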

Conclusion

By using a common, flexible, and expandable data storage architecture, developers can save hundreds of hours otherwise spent designing data-storage formats for objects and writing storage/restoration code. Since all objects are represented the same way, every object can be manipulated with the same code, greatly reducing the amount of code that needs to be written. Although tagged data storage has a few disadvantages, the positives far outweigh the negatives.


Listing One

class CSizable{
protected:
    CRect    cs_dimensions;   // Assume we already have a CRect class.
public:
    virtual CRect GetDimensions();
    virtual void Resize(CRect& rNewSize);
    ...
};

Listing Two

class FT_TaggableObject{
public:
    // Basic tag routines.  
    virtual SN_Error AttachTag(FT_TagName tagName, FT_TagType tagType,
                      FT_TagData tagData, FT_TagDataSize tagDataSize);
    virtual SN_Error RemoveTag(FT_TagName tagName);
    virtual SN_Error RemoveAllTags();
    virtual SN_Error RetrieveTag(FT_TagName tagName, FT_Tag*& rpTag);
    virtual SN_Error CountTags(FT_TagIndex& rTagCount);
    ...
};

Listing Three

class FT_Tag{
protected:
    FT_TagData          ftt_tagData;
    FT_TagDataSize      ftt_tagDataSize;
    FT_TagName          ftt_tagName;
    FT_TagType          ftt_tagType;
    ... 
};

Listing Four

class TK_PackerObject{
public:
    virtual SN_Error PackData(LB_DataChunk& rDataChunk);
    virtual SN_Error UnpackData(LB_DataChunk& rPackedData);
    ...
};

Listing Five

class TK_StorableObject{
public:
    virtual SN_Error PackData(LB_DataChunk& rDataChunk);
    virtual SN_Error UnpackData(LB_DataChunk& rPackedData);
public:
    virtual SN_Error AttachStorageTags(TK_PackerObject& rObject);
    virtual SN_Error ExtractStorageTags(TK_PackerObject& rObject);
};

Listing Six

class LB_DataChunk{
public:
    virtual void* GetDataReference();
    ...
};

Listing Seven

class TK_TagHeader{
public:
    FT_TagType          tkt_tagType;
    FT_TagName          tkt_tagName;
    long                tkt_unused;
    FT_TagDataSize      tkt_tagDataSize;
public: 
    TK_TagHeader();
};

Listing Eight

// This is the class ID of CA_ObjectPane.
const long ca_kObjectPane   = 'caOP';

Listing Nine

SN_Error TK_StorableObject::PackData(LB_DataChunk& rDataChunk){
    SN_Error            result;
    TK_PackerObject     tempObject;
    
    EX_TRY
    {
        EX_THROW_ERROR(AttachStorageTags(tempObject));
        EX_THROW_ERROR(tempObject.PackData(rDataChunk));
    }
    catch (SN_Exception& rException)
    {
        result = rException;
    }
    return result;  
}

Listing Ten

SN_Error CA_ColoredObject::AttachStorageTags(TK_PackerObject& rObject){
    SN_Error            result;
    EX_TRY
    {
        EX_THROW_ERROR(TK_AttachRGBColor(rObject, kColorName, 
                                       cac_color.GetRGBColor()));
    }
    catch (SN_Exception& rException)
    {
        // Return the error instead of silently discarding it.
        result = rException;
    }
    return result;
}

Listing Eleven

SN_Error TK_PackerObject::PackData(LB_DataChunk& rDataChunk){
    SN_Error            result;
    EX_TRY
    {
        // Pack the object's tags into a chunk of data.
        EX_THROW_ERROR(PrepareForPackingData(rDataChunk));
        EX_THROW_ERROR(PackMyData(rDataChunk));
        EX_THROW_ERROR(FinishPackingData(rDataChunk));
    }
    catch (SN_Exception& rException)
    {
        result = rException;
        FailPackingData(rDataChunk);
    }
    return result;
}
SN_Error TK_PackerObject::PackMyData(LB_DataChunk& rDataChunk)
{
    LB_DataSize             totalSize, chunkSize;
    SN_Error                result;
    EX_TRY
    {
        EX_THROW_ERROR(CalculateTotalDataSizeWithHeader(totalSize,
                                                   sizeof(TK_TagHeader)));
        // Resize the chunk of data to fit all of the packed tag 
        // data.
        chunkSize = totalSize;
        rDataChunk.SetDataSize(chunkSize);
        // Pack the object's tags into a chunk of data.
        long            stuffOffset = 0;
        EX_THROW_ERROR(KeepPackingData(rDataChunk, stuffOffset));
    }
    catch (SN_Exception& rException)
    {
        result = rException;
    }
    return result;
}
SN_Error TK_PackerObject::KeepPackingData(LB_DataChunk& rDataChunk, 
                                                       long& rStuffOffset)
{
    TK_TagHeader        tagHeader;
    FT_TagIndex         tagCount;
    FT_Tag*             pTag;
    FT_TagDataSize      headerSize = sizeof(TK_TagHeader);
    SN_Error            result;
    
    // KeepPackingData() stuffs tag info at specified offset into data.
    EX_TRY
    {
        EX_THROW_ERROR(CountTags(tagCount));
        for (FT_TagIndex tagIndex = 1; tagIndex <= tagCount; tagIndex++)
        {
            // Retrieve the current tag.
            EX_THROW_ERROR(RetrieveTagWithIndex(tagIndex, pTag));
            EX_THROW_NIL(pTag);
            // Stuff the tag information into the tag header.
            tagHeader.tkt_tagType = pTag->GetType();
            tagHeader.tkt_tagName = pTag->GetName();
            tagHeader.tkt_tagDataSize = pTag->GetDataSize();
            // Stuff the tag info into the data chunk.
            rDataChunk.StuffAndOffsetData((Ptr) &tagHeader, 
                                                rStuffOffset, headerSize);
            rDataChunk.StuffAndOffsetData(FT_GetTagDataReference(
             pTag->GetData()), rStuffOffset, pTag->GetDataSize());
        }
    }
    catch (SN_Exception& rException)
    {
        result = rException;
        // The data chunk is no longer valid.
        rDataChunk.DestroyData();
    }
    return result;
}

Listing Twelve

void* CreateAnyObject(long objectType){
    switch (objectType)
    {
        case ca_kObjectPane:
            return new CA_ObjectPane;
            break;  
        case vp_kView:
            return new VP_View;
            break;  
        ...
        default:
            return nil;
    }
}

Listing Thirteen

SN_Error TK_PackerObject::UnpackMyData(LB_DataChunk& rPackedData){
    TK_TagHeader*   pTagHeader;
    FT_TagData      tagData = nil;
    SN_Error        result;
    LB_DataSize     unpackOffset = 0;
    LB_DataSize     dataSize = rPackedData.GetDataSize();
    // Unpack the packed data to restore the object. Add each tag as you come 
    // upon it in data chunk. For now, assume all data is packed tag data.
    EX_TRY
    {
        // Make sure that if the data is a handle, it doesn't get purged.
        LB_ChunkData            chunkData;
        EX_THROW_ERROR(rPackedData.RetrieveData(chunkData));
        EX_THROW_NIL(chunkData.GetData());
        
        // Remove all existing tags.
        EX_THROW_ERROR(RemoveAllTags());
        
        // This object has been stored with tags- extract them.
        while (unpackOffset < dataSize)
        {
            // Extract the current tag header.
            pTagHeader = (TK_TagHeader*) ((long) chunkData.GetData() + 
                                                               unpackOffset);
            tagData = nil;
            // Advance past the tag header.
            unpackOffset += sizeof(TK_TagHeader);
            if (pTagHeader->tkt_tagDataSize > 0)
            {
                // This tag has data. Allocate space in memory for tag data.
                tagData = GetAllocatedTagData(pTagHeader->tkt_tagDataSize);
                EX_THROW_NIL(tagData);
                EX_THROW_NIL(FT_GetTagDataReference(tagData));
                {
                  FT_UseTagData   useTagData(tagData);
                  // Copy the tag data into the pointer.
                  BlockMove((Ptr) ((long) pTagHeader + sizeof(TK_TagHeader)),
                                      FT_GetTagDataReference(tagData), 
                                      FT_GetTagDataSize(tagData));
                  // Advance past the tag data.
                  unpackOffset +=
                              pTagHeader->tkt_tagDataSize;
                }
            }
            // Attach each tag.
            EX_THROW_ERROR(AttachTag(pTagHeader->tkt_tagName, 
              pTagHeader->tkt_tagType, tagData, FT_GetTagDataSize(tagData)));
        }
    }
    catch (SN_Exception& rException)
    {
        result = rException;
        if (tagData)
            // Release the unused tag data from memory.
            DisposeTagData(tagData);
    }
    return result;
}

Listing Fourteen

SN_Error CA_ColoredObject::ExtractStorageTags(TK_PackerObject& rObject){
    SN_Error            result;
    EX_TRY
    {
        RGBColor            rgbColor;
        if (!TK_RetrieveRGBColor(rObject, kColorName, rgbColor).NoError())
            // Use the default value.           
            SetColor(0, 0, 0);      
        else
            cac_color.SetRGBColor(rgbColor);
    }
    catch (SN_Exception& rException)
    {
        result = rException;
    }
    return result;
}

DDJ


Copyright © 1997, Dr. Dobb's Journal

