Cross-Platform Database Programming

If you want to write software that's portable on platforms from supercomputers to embedded systems, you'll want to use the programming techniques presented here.


March 01, 1995
URL:http://www.drdobbs.com/open-source/cross-platform-database-programming/184409516

Here's how one developer supports more than 100 platforms

How many different combinations of operating systems and hardware platforms are used today? Fifty, a hundred--does anyone really know? How many are in the mainstream? DOS, Windows, NT, OS/2, Solaris, UNIX--Intel, Motorola, DEC, Sun SPARC, IBM. With all of these choices, how are you to develop truly portable applications?

The good news is that if an application is approached correctly and with foresight, writing portable code does not have to be a chore. In this article, we'll discuss coding strategies for developing truly portable database applications, focusing on the strategies you can implement to ease the movement of code and data between computer platforms. The topics include code portability, function wrappers, size and alignment of data objects, binary word order, and true multiplatform portability. All of the techniques we'll cover here are drawn from real-world practice: they're what FairCom uses to make its c-tree Plus File Handler highly portable. c-tree Plus is a C-function library of database calls designed from the ground up with portability in mind. The c-tree family has been ported to well over 100 platforms, ranging from Cray supercomputers to embedded systems and virtually all machines in between.

Code Portability

The way you organize the modules that make up your application can greatly affect the time required to port it. We suggest explicitly dividing your application modules into two sets: one that is system independent and one that is system dependent. In c-tree Plus, for example, about 98 percent of the code resides in system-independent modules, and not one line of these modules has to be touched when we port from platform to platform. The remaining code, in the system-dependent modules, covers those aspects of c-tree Plus that depend on system specifics. For c-tree, virtually all of this system-specific code relates to low-level file operations.

To achieve this degree of separation, certain sections of a system-independent module may depend on a configuration setting in a system-dependent header file. However, these dependencies should reflect generic concepts, not platform-specific issues. In c-tree Plus, for instance, there are #ifdefs in the system-independent modules that depend on the word order of binary values. Each system-dependent configuration header specifies the word order found on that platform, so the system-independent code needs #ifdefs only for the word-order choices, not for each platform.
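As a minimal sketch of that separation, the fragment below puts the platform's word-order choice in a configuration section and lets the portable code test only LOW_HIGH or HIGH_LOW (the option names that Listing One checks). The functions swap32 and host_to_disk32, and the assumption that disk data is kept in Little-endian order, are illustrative; this is not c-tree Plus source.

```c
#include <stdint.h>

/* ---- system-dependent configuration header (one per platform) ----
 * Hypothetical: this port declares a little-endian CPU.
 * A big-endian port would say #define HIGH_LOW instead. */
#define LOW_HIGH

/* ---- system-independent code: tests the concept, never the platform ---- */

/* Reverse the four bytes of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);
}

/* Convert a host-order value to the (assumed) little-endian disk order. */
uint32_t host_to_disk32(uint32_t v)
{
#if defined(HIGH_LOW)
    return swap32(v);      /* big-endian host: reverse the bytes */
#elif defined(LOW_HIGH)
    return v;              /* little-endian host: already in disk order */
#else
#error "Configuration header must define LOW_HIGH or HIGH_LOW"
#endif
}
```

Because reversing the bytes twice restores the original value, the same function also converts from disk order back to host order. Porting to a big-endian machine changes only the configuration line.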

To minimize unexpected problems when moving your C source code from one platform to another, use a well-defined set of typedefs for the basic computational objects as well as for application-specific objects. For example, in c-tree Plus we use three different typedefs for integers: COUNT, LONG, and NINT. They are, respectively, 2-byte integers, 4-byte integers, and the platform's natural integer. (Of course, we also support unsigned versions of these integers.) Then, on any platform, c-tree can always rely on a COUNT being two bytes and a LONG being four bytes. This is implemented in a manner typical of our portability strategy: Default typedefs are supplied in a system-independent header module, and an optional entry in a system-dependent module can override the default. For example, default typedefs like those in Example 1 are found in a system-independent header file. On those few platforms where a short int is not two bytes or a long int is not four bytes, the system-dependent header file codes these typedefs explicitly and also defines INTEGER_OVERRIDE, so the #ifndef in the system-independent module is false and the defaults are skipped.

Example 1: Default typedefs like this are found in a system-independent header file.

#ifndef INTEGER_OVERRIDE
  typedef short int COUNT;
  typedef long  int LONG;
#endif
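The override can be sketched in a single file. Here the "system-dependent" section is simulated with a <limits.h> test, an assumption made only so the sketch compiles unchanged on both 32-bit and 64-bit hosts; a real port would hard-code its typedefs in its own header. The size checks use a negative array size, which is a compile-time error in C89 and later, complementing a run-time test program.

```c
#include <limits.h>

/* ---- system-dependent header (sketch) ----
 * Where long is wider than 4 bytes (for example, many 64-bit UNIX
 * systems), supply overriding typedefs and announce the override. */
#if LONG_MAX > 2147483647L
#define INTEGER_OVERRIDE
typedef short int COUNT;
typedef int       LONG;
#endif

/* ---- system-independent header: the defaults from Example 1 ---- */
#ifndef INTEGER_OVERRIDE
typedef short int COUNT;
typedef long  int LONG;
#endif

/* Compile-time size checks: a false condition yields an array of
 * size -1, which the compiler rejects, so a wrong typedef stops the build. */
typedef char count_must_be_2_bytes[(sizeof(COUNT) == 2) ? 1 : -1];
typedef char long_must_be_4_bytes[(sizeof(LONG) == 4) ? 1 : -1];
```

If a future platform breaks either assumption, the build fails at the typedef line, pointing the porter straight at the header that needs attention.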

To shorten porting time and avoid problems that are difficult to isolate, we use the C program in Listing One, which tests each of the system-specific dependencies found in c-tree Plus. By compiling and executing this module (which does not require any of the c-tree API), you can determine, among many other things, whether your definition of COUNT really results in 2-byte integers and whether the memcmp function performs signed or unsigned byte-wise comparisons. When we port to a new environment, executing this test program is one of the first steps we take.

Listing One
/* Copyright (c) 1984 - 1994 FairCom Corporation. ALL RIGHTS RESERVED 
 * FairCom Corporation, 4006 West Broadway, Columbia, MO 65203.
 * 314-445-6833
 */

#include <stdio.h>
#include <stdlib.h>
#include "ctstdr.h"
#include "ctoptn.h"

#define CTF " 1 3 "
#define S377    "'\\377'"

typedef struct {
    TEXT    mb1;
    } MB;
TEXT getbuf[128];
TEXT *align[5] = {
    "strange: call FairCom (314) 445-6833",
    "byte",
    "word (2 bytes)",
    "strange: call FairCom (314) 445-6833",
    "double word (4 bytes)"
    };
#ifdef PROTOTYPE
int main (void)
#else
main ()
#endif
{
    TEXT   buffer[8];
    TEXT   t255,*tp;
    COUNT  i,done,d[4],afactor;
    UCOUNT tu;

    TEXT *th,*tl;
    MB    mb[2];
    struct {
        ctRECPT pa1;
        TEXT    pa2;
        ctRECPT pp;
        ctRECPT pa3;
        ctRECPT pa4;
    } p;
    struct {
        ctRECPT ca1;
        TEXT    ca2;
        COUNT   cc;
    } c;
    struct {
        ctRECPT aa1;
        TEXT    aa2;
        TEXT    aa[3];
    } a;
    struct {
        ctRECPT ta1;
        TEXT    ta2;
        TEXT    tt;
    } t;

    if (SIZEOF(COUNT) != 2 ||
        SIZEOF(UCOUNT) != 2 ||
        SIZEOF(LONG) !=4 ||
        SIZEOF(VRLEN) != 4) {
        printf(
"\n\nBefore continuing with CTTEST be sure that the following types are");
        printf(
  "\ncorrectly sized. Make the necessary changes in CTPORT.H.\n\n");
        printf("              COUNT     UCOUNT    LONG     VRLEN\n");
        printf("              -------   -------   -------  -------\n");
        printf("  Should be:  2 bytes   2 bytes   4 bytes  4 bytes\n");
        printf("Actual size:  %d         %d         %d        %d\n\n",
             SIZEOF(COUNT),SIZEOF(UCOUNT),SIZEOF(LONG),SIZEOF(VRLEN));
        exit(0);
    } else {
        printf("\n\nCOUNT, UCOUNT, LONG & VRLEN are properly sized.");
        tu = 40000;
        if (!(tu > 0)) {
            printf(
"\n\nBefore continuing with CTTEST, be sure that UCOUNT is an");
            printf("\nunsigned short integer. See CTPORT.H\n\n");
            exit(0);
        }
    }
    t255 = '\377';
    printf("\n\nC255 Test for CTCMPL.H:");
    printf("\n\tUse the following setup in CTCMPL.H - \t#define C255\t");
    if (t255 == -1)
        printf("%d",t255);
    else
        printf(S377);
    printf("\n\t\t\t    Current setting - \t#define C255\t");
    if (C255 == 0x00ff)
        printf(S377);
    if (C255 == -1)
        printf("%d",C255);
    i  = 0x0201;
    tp = (TEXT *) &i;
    printf("\n\nLOW HIGH Test for CTOPTN.H:");
    printf("\n\tUse the following setup in CTOPTN.H - \t#define ");
    if ((*tp & 0x00ff) > (*(tp + 1) & 0x00ff))
        printf("HIGH_LOW");
    else
        printf("LOW_HIGH");
    printf("\n\t\t\t    Current setting - \t#define ");
#ifdef LOW_HIGH
    printf("LOW_HIGH");
#endif
#ifdef HIGH_LOW
    printf("HIGH_LOW");
#endif
    /* NULL size test */
    printf("\nNULL Size Test: ");
    if (SIZEOF(NULL) == SIZEOF(tp))
        printf("ok (%d bytes)",SIZEOF(NULL));
    else
        printf("inconsistent (NULL is %d bytes & ptr's are %d bytes)",
            SIZEOF(NULL),SIZEOF(tp));
    /* test of compar function for byte-wise comparisons */
    for (i = 0; i < 4; i++) {
        buffer[i]   = 'A';
        buffer[i+4] = '\377';
    }
    tp   = buffer;
#ifndef FASTCOMP
#ifdef ctDS
    done = ((COUNT) *tp & 0x00ff) - ((COUNT) *(tp + 4) & 0x00ff);
#else
    done = (*tp & 0x00ff) - (*(tp + 4) & 0x00ff);
#endif
    if (done >= 0) {
        printf(
"\n\n\nBefore continuing with CTTEST, call FairCom (314) 445-6833 concerning");
        printf(
"\nthe critical compar function in CTCOMP.C. Please report the following");
        printf(
"\nthree numbers to FairCom: %d %d %d\n",(*tp & 0x00ff),(*(tp + 4) & 0x00ff),
            done);
        exit(0);
    } else
#endif /* ~FASTCOMP */
        printf("\n\ncompar function (CTCOMP.C) test is successful.");
#ifndef ctNOMEMCMP
    done = ctrt_memcmp(tp,tp + 4,1);
    if (done >= 0) {
        printf(
"\n\n\nBefore continuing with CTTEST, add '#define ctNOMEMCMP' to ctcmpl.h.");
        printf(
"\nThis indicates that your memcmp function cannot be used in our high speed");
        printf(
"\nkey loading routine since it treats bytes as signed quantities.\n");
        exit(0);
    }
#endif
    /* PAUSE IN OUTPUT */
    printf("\n\nHit RETURN (or ENTER) to continue...");
    fgets((char *) getbuf, sizeof(getbuf), stdin); /* gets() cannot limit input length */

    printf("\n\nAlignment test for help in computing key segment offsets.");

    th = (TEXT *) &p.pa4;
    tl = (TEXT *) &p.pa3;
    i  = th - tl;
    if (i == 1) {
        printf(
"\n\n*** This machine addresses 32 bit words (not bytes). Call        ***");
        printf(
  "\n*** FairCom at (314) 445-6833. STATUS & HDRSIZ must be changed.  ***");
        afactor = 4;
    } else if (i == 2) {
        printf(
"\n\n*** This machine addresses words (not bytes). Add 2 to STATUS in ***");
        printf(
  "\n*** CTOPTN.H and add 4 to HDRSIZ in CTOPTN.H. Also each member   ***");
        printf(
  "\n*** of a structure will be at least word aligned. In particular,");
        afactor = 2;
    } else
        afactor = 1;
    printf("  Members of\nstructures will be aligned as follows:\n\n");
    printf("\tMember Type      Alignment\n");
    printf("\t-----------      -----------------\n"); 

    th = (TEXT *) &p.pp;
    tl =          &p.pa2;
    i  = (th - tl) * afactor;
    if (i > 4) i = 0;
    printf("\t4 byte int       %s\n",align[i]);

    th = (TEXT *) &c.cc;
    tl =          &c.ca2;
    i  = (th - tl) * afactor;
    if (i > 4) i = 0;
    printf("\t2 byte int       %s\n",align[i]);
    if (i > 2)
        printf(
"\nCall FairCom (314) 445-6833 concerning 2 byte integer alignment.\n");

    th = (TEXT *)  a.aa;
    tl =          &a.aa2;
    i  = (th - tl) * afactor;
    if (i > 4) i = 0;
    printf("\tchar array       %s\n",align[i]);

    th = &t.tt;
    tl = &t.ta2;
    i  = (th - tl) * afactor;
    if (i > 4) i = 0;
    printf("\tchar             %s\n",align[i]);

    printf("\n\nStructure 'SIZEOF' Test: ");
    i = SIZEOF(MB);
    th = &mb[1].mb1;
    tl = &mb[0].mb1;
    if (i == (th - tl))
        printf(" OK");
    else
        printf(
"\nCall FairCom at (314) 445-6833 to report these two numbers: %d %d\n",
            i, (int) (th - tl));
    printf("\n\nShort Integer Input Test for CTOPTN.H:");

    done = NO;
    d[1] = d[3] = 5;
    if (sscanf(CTF,"%h %h",d,d+2) == 2 &&
        d[0] == 1 && d[2] == 3 && d[1] == 5 && d[3] == 5) {
        printf(
"\n\tUse the PERC_H option in CTOPTN.H.\n");
        done = YES;
    }
    if (!done) {
        d[1] = d[3] = 5;
        if (sscanf(CTF,"%d %d",d,d+2) == 2 &&
            d[0] == 1 && d[2] == 3 && d[1] == 5 && d[3] == 5) {
            printf(
"\n\tUse the PERC_D option in CTOPTN.H.\n");
            done = YES;
        }
    }
    if (!done) {
        d[1] = d[3] = 5;
        if (sscanf(CTF,"%hd %hd",d,d+2) == 2 &&
            d[0] == 1 && d[2] == 3 && d[1] == 5 && d[3] == 5) {
            printf(
"\n\tUse the PERC_HD option in CTOPTN.H.\n");
            done = YES;
        }
    }
    if (!done)
        printf(
"\n\n*** COMPILER DOES NOT CONFORM TO KNOWN CONVENTIONS ***\n");

    printf("\tCurrent setting - ");
#ifdef PERC_H
    printf("PERC_H");
#endif
#ifdef PERC_D
    printf("PERC_D");
#endif
#ifdef PERC_HD
    printf("PERC_HD");
#endif
    /* PAUSE IN OUTPUT */
    printf("\n\nHit RETURN (or ENTER) to continue...");
    fgets((char *) getbuf, sizeof(getbuf), stdin); /* gets() cannot limit input length */

    printf("\n\nCTOPTN.H SUMMARY -\n");

#ifdef FPUTFGET
    printf("\nFPUTFGET:\tnon-server, multi-user application");
#endif
#ifdef NOTFORCE
    printf("\nNOTFORCE:\tsingle-user or server based application");
#endif

#ifdef RESOURCE
    printf("\nRESOURCE:\tresources are supported");
#else
    printf("\nNO_RESOURCE:\tresources are NOT supported");
#endif

#ifdef CTBATCH
    printf("\nCTBATCH:\tbatch retrieval is supported");
#else
    printf("\nNO_BATCH:\tbatch retrieval is NOT supported");
#endif

#ifdef CTSUPER
    printf("\nCTSUPER:\tsuperfiles are supported");
#else
    printf("\nNO_SUPER:\tsuperfiles are NOT supported");
#endif

#ifdef LOW_HIGH
    printf("\nLOW_HIGH:\tLSB to MSB ordering (ala Intel 8086 family)");
#endif
#ifdef HIGH_LOW
    printf("\nHIGH_LOW:\tMSB to LSB ordering (ala Motorola 68000 family)");
#endif

#ifdef VARLDATA
    printf("\nVARLDATA:\tvariable length data records are supported");
#else
    printf("\nNO_VARLD:\tvariable length data records are NOT supported");
#endif

#ifdef PERC_H
    printf("\nPERC_H:\t\t%%h");
#endif
#ifdef PERC_D
    printf("\nPERC_D:\t\t%%d");
#endif
#ifdef PERC_HD
    printf("\nPERC_HD:\t%%hd");
#endif
    printf(" short integer format specification");

#ifdef VARLKEYS
    printf("\nVARLKEYS:\tkey compression supported");
#else
    printf("\nNO_VARLK:\tkey compression is NOT supported");
#endif

#ifdef PARMFILE
    printf("\nPARMFILE:\tISAM parameter files are supported");
#else
    printf("\nNO_PARMF:\tISAM parameter files are NOT supported");
#endif

#ifdef RTREE
    printf("\nRTREE:\t\tr-tree supported");
#else
    printf("\nNO_RTREE:\tr-tree is NOT supported");
#endif

#ifdef CTS_ISAM
    printf("\nCTS_ISAM:\tISAM functionality supported");
#else
    printf("\nNO_ISAM:\tISAM functionality is NOT supported");
#endif

#ifdef CTBOUND
    printf("\nCTBOUND:\tnon-server mode of operation");
#else
    printf("\nNO_BOUND:\tserver mode of operation");
#endif

#ifdef PROTOTYPE
    printf("\nPROTOTYPE:\tfunction prototypes are supported");
#else
    printf("\nNO_PROTOTYPE:\tfunction prototypes are NOT supported");
#endif
    printf("\n\nEnd Of CTTEST\n");
    exit(0);
}


C lends itself to run-time library support. Many developers turn to third-party libraries to assist with database I/O, report generation, and other application necessities. When examining a third-party library, it is important to investigate its portability. If the library is well designed and used properly, it can save a great deal of time in both developing and porting the application. Of course, if the library is not portable, or is not available on all of your target platforms, it may become an obstacle.

Function Wrappers

Another way to isolate your application from the specifics of the system or of third-party run-time libraries is to use function wrappers. These act as a layer between what your application needs to accomplish (say, adding a record to a database) and the particular function which will perform the desired action.

By placing all the wrapper functions in one module, you can change the underlying operations without affecting the many application modules that use these functions. However, while C++ features such as function overloading and default arguments make it easy to modify the parameters used to invoke an action, C is more rigid. Therefore, to keep your application well insulated from the underlying functions, you must choose the parameters of your wrapper functions carefully. While you cannot ignore the parameter requirements of the underlying functions, make sure the wrapper parameters reflect the essential nature of your application and of the function being called. A wrapper should not simply repeat the exact parameters of the underlying function.

For example, the c-tree Plus AddRecord function identifies the data file involved by a small integer value. You may prefer to refer to data files by symbolic names. In that case, you would pass the symbolic name to the wrapper function, which would in turn call your own function to translate the name into a c-tree Plus file number. The same translation function would be used in many of the wrapper functions that call the c-tree Plus library.

A carefully chosen naming convention makes your wrapper functions easy to locate if they must be modified. We suggest, for example, that all your database wrapper functions begin with dbw_ followed by the desired action: dbw_AddRecord, say, for the function that adds a record.
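A minimal sketch of such a wrapper follows. Only the dbw_AddRecord name comes from the text; lib_add_record is a hypothetical stand-in for the underlying library call (the real c-tree Plus AddRecord signature is not reproduced here), and dbw_FileNumber and the file_map table are illustrative assumptions.

```c
#include <string.h>

/* Hypothetical stand-in for the library's add-record call. Like the
 * c-tree Plus function described above, it identifies the data file by
 * a small integer; the name and signature here are invented. */
static int last_filno = -1;              /* remembered for demonstration */
static int lib_add_record(int filno, const void *record)
{
    (void) record;
    last_filno = filno;
    return 0;                            /* 0 = success in this sketch */
}

/* Application-side table mapping symbolic file names to file numbers. */
typedef struct {
    const char *name;
    int         filno;
} FILE_MAP;

static const FILE_MAP file_map[] = {
    { "customer", 1 },
    { "vendor",   2 },
    { "invent",   3 }
};

/* Shared translation function used by every dbw_ wrapper. */
static int dbw_FileNumber(const char *name)
{
    size_t i;
    for (i = 0; i < sizeof file_map / sizeof file_map[0]; i++)
        if (strcmp(file_map[i].name, name) == 0)
            return file_map[i].filno;
    return -1;                           /* unknown file name */
}

/* Wrapper: callers say which file by name, not by library file number. */
int dbw_AddRecord(const char *file_name, const void *record)
{
    int filno = dbw_FileNumber(file_name);
    if (filno < 0)
        return -1;                       /* unknown names reported uniformly */
    return lib_add_record(filno, record);
}
```

Application modules call dbw_AddRecord("vendor", &rec) and never see a file number; if the underlying library or its numbering changes, only this module is edited.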

Size and Alignment of Data Objects

The three most pressing issues related to moving data across platforms are structure alignment, size of data objects, and byte order of binary values.

Different hardware architectures and different C compilers enforce different alignment restrictions on various data types. An alignment restriction defines the addresses at which a data object may legitimately be referenced. For instance, if a CPU can address integers only on even boundaries, integers are "word-aligned." Attempting to reference an integer on an odd boundary (that is, at an odd starting address) would probably cause a system exception. Generally, a data object's alignment restriction is no larger than the object itself. For instance, a 4-byte integer will at most be required to fall on a 4-byte (double-word) boundary, while a 2-byte integer on the same machine will, at worst, be restricted to a 2-byte boundary.
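To see the restrictions a particular compiler enforces, you can measure them the way Listing One does: place a 1-byte member ahead of the type being probed and see where the compiler puts it. A minimal standalone sketch, assuming the C99 <stdint.h> types; the probe structures and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Each probe puts a 1-byte char ahead of the type being measured. The
 * compiler places the second member at the first offset that satisfies
 * its alignment restriction, so in practice the offset (1, 2, 4, 8, ...)
 * is that restriction. */
struct probe_int16  { char c; int16_t x; };
struct probe_int32  { char c; int32_t x; };
struct probe_double { char c; double  x; };

size_t align_int16(void)  { return offsetof(struct probe_int16,  x); }
size_t align_int32(void)  { return offsetof(struct probe_int32,  x); }
size_t align_double(void) { return offsetof(struct probe_double, x); }
```

On a byte-aligned platform all three functions return 1; on a double-word-aligned platform align_int32 returns 4. In each case the result divides the size of the type, matching the rule of thumb above.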

For information that exists only temporarily in memory, alignment restrictions are not a concern, because the compiler arranges the data to suit the platform. But if your data structures are not carefully planned, information stored on disk may not be usable on other platforms: the position of members within a data structure may change between platforms, the size of the structure may differ, or both. To avoid these dilemmas, we:

  1. Create a set of constant size typedefs for basic data items (as discussed earlier for COUNT and LONG).
  2. Place members in structures to encourage "automatic" alignment, and use explicit padding between members as necessary.
  3. Add padding to the end of a structure, if necessary, to keep the size of the structure a multiple of its largest-sized data type.

The first step means we discourage the use of natural integers (NINT, in c-tree Plus terms) in data structures used for permanent storage. If moving your data across platforms is not important, this is not an issue. (Some application developers will be more than satisfied if the application itself is portable, with no regard for the portability of the data; they do not expect the data to move from platform to platform.)

The second step means placing the largest data items first in the structure, or grouping shorter data objects into clusters whose total size matches the most restrictive alignment requirement. For example, if the largest member of a data structure is a 4-byte integer, the 4-byte integers should come at the beginning of the structure. If you wish to place shorter members at the beginning, group them in clusters that are multiples of four bytes. Note that character arrays are treated (along with individual characters) as the smallest data types and should come at the end of the structure.

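The third step ensures that a structure's size does not vary across platforms, regardless of whether padding was required between its members. Without it, a compiler on a double-word-aligned platform may add trailing padding of its own, so that each element of an array of such structures stays aligned, while a byte-aligned compiler adds none.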

Two good examples of proper alignment techniques are shown in Example 2. (Note that UTEXT represents a 1-byte unsigned character and TEXT, a signed character.)

Example 2: Techniques for maintaining proper alignment.

(a)

typedef struct invent_record {
      LONG       invent_id;
      LONG       invent_level;
      LONG       invent_reorder;
      COUNT      invent_status;
      COUNT      invent_bin;
} INVENT_RECORD;

(b)

typedef struct vendor_record {
      COUNT      vendor_type;       /* The first three members  */
      UTEXT      vendor_status;     /* of this structure use    */
      UTEXT      vendor_reserved;   /* precisely four bytes.    */
      LONG       vendor_acc_pay;
      TEXT       vendor_name[58];
      TEXT       vendor_padding[2]; /* Keep struct multiple of 4*/
} VENDOR_RECORD;
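Once a record layout such as Example 2 is settled, a port can guard it at compile time. This sketch assumes C99 <stdint.h> types stand in for COUNT and LONG and that TEXT is a signed char, as the note above states; LAYOUT_CHECK is a hypothetical helper that yields a negative array size, and therefore a compile error, if an offset or size ever drifts.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed fixed-size types; a real port takes these from its headers. */
typedef int16_t       COUNT;
typedef int32_t       LONG;
typedef signed char   TEXT;
typedef unsigned char UTEXT;

typedef struct invent_record {
    LONG  invent_id;
    LONG  invent_level;
    LONG  invent_reorder;
    COUNT invent_status;
    COUNT invent_bin;
} INVENT_RECORD;

typedef struct vendor_record {
    COUNT vendor_type;
    UTEXT vendor_status;
    UTEXT vendor_reserved;
    LONG  vendor_acc_pay;
    TEXT  vendor_name[58];
    TEXT  vendor_padding[2];
} VENDOR_RECORD;

/* Hypothetical helper: a false condition gives a negative array size,
 * which every C compiler rejects, so layout drift stops the build. */
#define LAYOUT_CHECK(name, cond) typedef char name[(cond) ? 1 : -1]

LAYOUT_CHECK(invent_size_is_16,   sizeof(INVENT_RECORD) == 16);
LAYOUT_CHECK(vendor_acc_pay_at_4, offsetof(VENDOR_RECORD, vendor_acc_pay) == 4);
LAYOUT_CHECK(vendor_name_at_8,    offsetof(VENDOR_RECORD, vendor_name) == 8);
LAYOUT_CHECK(vendor_size_is_68,   sizeof(VENDOR_RECORD) == 68);
```

Because both records already follow the three steps, the checks hold on byte-, word-, and double-word-aligned platforms alike; they exist to catch a later edit that breaks the layout.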

If you do not follow this strategy, compilers on various platforms may be forced to insert padding bytes in front of some structure members to force the required alignment. Further, the size of the structures may vary from platform to platform. The structure in Example 3 may result in an 8-byte structure on a byte-aligned platform and a 12-byte structure on a double-word-aligned platform. On a double-word platform, three bytes of padding would be inserted before the customer_acc_rcv member and one byte of padding before the customer_zone member.

Example 3: The structure size in this example depends upon the byte alignment of the platform.

typedef struct customer_record {
     UTEXT     customer_status;
     LONG      customer_acc_rcv;
     UTEXT     customer_priority;
     COUNT     customer_zone;
} CUSTOMER_RECORD;
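Applying the three steps to Example 3 removes the platform dependence. In the sketch below, which again assumes <stdint.h> types for COUNT and LONG, CUSTOMER_RECORD2 is a hypothetical reordering: the 4-byte member comes first, and the two 1-byte members are clustered behind the 2-byte member. It occupies 8 bytes, with every member at the same offset, on byte-, word-, and double-word-aligned platforms.

```c
#include <stddef.h>
#include <stdint.h>

typedef int16_t       COUNT;   /* assumed 2-byte integer */
typedef int32_t       LONG;    /* assumed 4-byte integer */
typedef unsigned char UTEXT;

/* Example 3 as written: padding depends on the platform. */
typedef struct customer_record {
    UTEXT customer_status;
    LONG  customer_acc_rcv;
    UTEXT customer_priority;
    COUNT customer_zone;
} CUSTOMER_RECORD;

/* Same members, reordered: 4 + 2 + 1 + 1 = 8 bytes, no padding needed,
 * and the total is already a multiple of the largest member. */
typedef struct customer_record2 {
    LONG  customer_acc_rcv;    /* offset 0 */
    COUNT customer_zone;       /* offset 4 */
    UTEXT customer_status;     /* offset 6 */
    UTEXT customer_priority;   /* offset 7 */
} CUSTOMER_RECORD2;
```

On a typical current host with 4-byte-aligned 32-bit integers, sizeof(CUSTOMER_RECORD) is 12, as the text predicts for a double-word-aligned platform, while sizeof(CUSTOMER_RECORD2) is 8.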

Finally, we strongly suggest omitting pointers to other structures within data structures used for permanent disk storage. While the use of pointers within structures is a very powerful and useful technique in C programming, we discourage it for actual data-storage structures. The size of pointers varies across platforms from as small as two bytes to as large as eight bytes, and the values of address pointers lose their meaning once the structure is placed on disk.
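A common replacement for a stored pointer is a fixed-size record number (or file offset) that the program resolves after loading. The sketch below is illustrative only: INVOICE_DISK, VENDOR, and resolve_vendor are hypothetical names, and a linear table search stands in for whatever lookup (an index, for instance) a real application would use.

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t LONG;   /* assumed 4-byte integer, as in the text */

/* On disk: refer to the vendor by a fixed-size record number instead of
 * a struct vendor * whose size (2 to 8 bytes) and value vary by platform. */
typedef struct invoice_disk {
    LONG invoice_vendor_recno;
    LONG invoice_amount;
} INVOICE_DISK;

/* In memory only: vendors loaded by the application, keyed by record number. */
typedef struct vendor {
    LONG recno;
    char name[32];
} VENDOR;

/* Resolve a stored record number to an in-memory object after loading.
 * Returns NULL if no loaded vendor has that number. */
const VENDOR *resolve_vendor(const VENDOR *table, size_t n, LONG recno)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (table[i].recno == recno)
            return &table[i];
    return NULL;
}
```

The pointer still exists, but only in memory and only after resolution; the disk record holds nothing whose meaning depends on where or on which machine it was loaded.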

Binary Word Order

CPUs differ in the manner in which integers and floating-point values are stored in memory. On Little-endian machines, the lowest-order byte is stored in the first byte of the integer, and the most significant byte is stored last. Such CPUs include the Intel family of processors and the new DEC Alpha processors. On Big-endian machines, the highest-order byte is stored in the first byte, and the least-significant byte is stored last. These CPUs include the Motorola 68000 family of processors and the IBM RS/6000 family. (In some unusual circumstances, a binary value may be a mixture of these strategies.)

While most application code is totally independent of the internal word ordering, this difference does pose a problem when moving application data across platforms: if the two platforms' word orders differ, binary values written on one are misread on the other. c-tree Plus uses two different strategies to deal with this problem. One is to store binary data on disk in the same order regardless of the platform's internal order. c-tree Plus uses this approach for its nonserver implementations and stores the data in Little-endian order (because of the great preponderance of Intel processors). The second strategy, employed with client/server implementations of c-tree Plus, stores the data in the server's native ordering. This places the burden of transforming byte order on the client processors, relieving the server processor of this overhead.
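One portable way to implement the first strategy, a fixed on-disk order, is to assemble each stored byte with shifts, so the code never needs to know the host's word order at all. This is a widely used technique offered as an alternative to the #ifdef approach, not a description of c-tree Plus internals; the put_ and get_ function names are hypothetical.

```c
#include <stdint.h>

/* Store values least-significant byte first, whatever the host CPU's
 * own byte order. Shifts operate on values, not memory layout, so the
 * same code is correct on Little-endian and Big-endian machines. */
void put_le16(unsigned char *out, uint16_t v)
{
    out[0] = (unsigned char)( v       & 0xFFu);
    out[1] = (unsigned char)((v >> 8) & 0xFFu);
}

void put_le32(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)( v        & 0xFFu);
    out[1] = (unsigned char)((v >>  8) & 0xFFu);
    out[2] = (unsigned char)((v >> 16) & 0xFFu);
    out[3] = (unsigned char)((v >> 24) & 0xFFu);
}

/* Read the bytes back; the shifts rebuild the value on any host. */
uint16_t get_le16(const unsigned char *in)
{
    return (uint16_t) (in[0] | (in[1] << 8));
}

uint32_t get_le32(const unsigned char *in)
{
    return  (uint32_t) in[0]
         | ((uint32_t) in[1] <<  8)
         | ((uint32_t) in[2] << 16)
         | ((uint32_t) in[3] << 24);
}
```

The trade-off is a few shift and mask operations per field on every platform, including those whose native order already matches the disk order.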

To permit c-tree Plus to automatically perform the byte-order transformations on application data, we take advantage of c-tree's ability to store resources in data files. c-tree Plus allows you to specify the field types of your data records in a resource stored within the data file. When the data is accessed, the field type information directs any necessary transformations. Also, if the data file is moved, it is still possible to interpret the data properly.
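The idea of letting stored field types drive the conversion can be sketched with a small descriptor table. The type codes, FIELD_DESC, and flip_record are hypothetical and do not reflect the actual c-tree Plus resource format; the sketch also assumes the record has no padding, so each field starts where the previous one ends.

```c
#include <stddef.h>

/* Hypothetical field-type codes stored alongside the data. */
enum field_type { FT_TEXT, FT_INT2, FT_INT4 };

typedef struct {
    enum field_type type;
    size_t          length;   /* bytes the field occupies in the record */
} FIELD_DESC;

/* Reverse the bytes of one field in place. */
static void reverse_bytes(unsigned char *p, size_t n)
{
    size_t i;
    for (i = 0; i < n / 2; i++) {
        unsigned char tmp = p[i];
        p[i] = p[n - 1 - i];
        p[n - 1 - i] = tmp;
    }
}

/* Walk the record using its field descriptions: binary integers are
 * byte-reversed, text is left alone. Reversal is its own inverse, so
 * the same call converts in either direction between the two orders. */
void flip_record(unsigned char *rec, const FIELD_DESC *fields, size_t nfields)
{
    size_t i, offset = 0;
    for (i = 0; i < nfields; i++) {
        if (fields[i].type == FT_INT2 || fields[i].type == FT_INT4)
            reverse_bytes(rec + offset, fields[i].length);
        offset += fields[i].length;
    }
}
```

A program would call flip_record only when the file's stored order differs from the host's, which is exactly the decision the stored field information makes possible after a file has moved.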

Summary

Careful organization and isolation of your application code from user and file-handling interfaces can significantly reduce the effort required to move your application code from one platform to another. Creating a test program sensitive to the platform-dependent elements of your application will further reduce the time and problems encountered in moving the code. With each port, you become more attuned to the issues of portability, and can further refine your strategy.

By defining basic computational data objects that are size invariant across platforms, and by constructing stable, well-organized data structures, you can even enable your applications to share data across different platforms or to use data stored on different platforms.


William is the founder of FairCom Corp. and senior developer of c-tree, c-tree Plus, r-tree, and the FairCom Server. Randal is FairCom's director of technical operations.
