Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

C/C++

Virtual Array in C


MAY88: VIRTUAL ARRAY IN C

VIRTUAL ARRAY IN C

The concept of virtual arrays is not new, but it can now be applied to microcomputers

This article contains the following executables: VARTEST.EXE

Mark Tichenor

Mark Tichenor is an instructor in the Business Information Processing Dept., Holland College, Charlottetown Centre, Weymouth St., Charlottetown, PE C1A 4Z1, Canada. He is currently developing interactive software design tools.


The Code

This article presents a virtual-array management scheme for C programmers. This simple and flexible approach uses the power of the C language to manipulate arrays that extend themselves automatically and that are limited in size only by the file-size constraints of the operating system. The concept of virtual arrays is not new, but it can now be applied to microcomputers.

Highlights of the scheme are:

  • Virtual arrays are stored on disk but are accessed as though they are in memory.
  • Automatic file management is provided without explicit reads or writes.
  • Disk records can be accessed simply by referencing an array element in any C expression.
  • Data can be written to any record in the file simply by using an assignment statement to place a value in an array element.

The Problem

Many problems lend themselves naturally to simple solutions based on the use of data arrays because arrays are easy to manipulate using simple notation. Data arrays can only be as big as available memory allows, however. This is an aggravating limitation because, once arrays outgrow available memory, they are no longer useful.

In implementing an application prototype involving complex tree structures, I chose an array approach using data arrays that incorporated pointers. These pointers were the array indices of other array elements. As the application provided the means to manage a growing volume of information, it was necessary to overcome the memory limitation imposed by the array model.

A lot of thought went into solutions based on dynamic memory allocation of data structures as this approach appeared to promise an elegant and general solution. The need to swap data between memory and disk still remained, however. In fact a whole new problem arose: that of resolving the differences between pointers in memory and pointers on disk.

When dynamically allocated data structures are linked by memory pointers these pointers become meaningless when written to disk. The linkages must be recreated when the structures are brought back into memory. This problem dramatically increases the complexity of any program that incorporates the swapping of data between memory and disk.

It would be a great advantage to keep using data arrays if their size were not limited by available memory.

The Solution

The memory limitation associated with arrays can be overcome by using virtual arrays that are stored on disk and accessed as though they were in memory. Thus, virtual arrays offer a simple, elegant solution. The C programming language provides the power to manage virtual arrays automatically through the creative use of pointer notation and #define macros.

It turns out that virtual-array elements can be accessed by simple reference using a predefined alias and an array index---for example, item_number(i) = 5; could be used to set the item_number field of the ith array element to the value 5.

Because virtual arrays reside on disk, with paging to memory automatically accomplished behind the scenes, you have a disk-based data management system that operates without any explicit reads or writes. These operations are performed by the virtual-array access function. which is invoked by the predefined access macros.

The Code

Listing One, page 63, contains the virtual-array header file, and Listing Two, page 63, contains the virtual-array access routines. Four C functions handle all the mechanics of managing the virtual arrays.

The Initialization Routine

int init_v_array(filename, record_size, filchar) char *filename, filchar; int record_size;

Init_v_array creates a new virtual-array file named by filename. File headers are written initializing the record size of the file to record_size and setting the number of records to 0. The specified fill character is also placed in the file header for later use in initializing new array elements. The file is then closed. This routine returns a value of 1 if successful, 0 if not.

An example is:

if (init_v_array("DATA.VAR",128)) printf("Success!");

which will try to create a new virtual array named "DATA.VAR" with elements 128 bytes long and will print a message if successful.

The Open Routine

VACB *open_v_array(filename,buffer_size) char *filename; int buffer_size;

open_v_array prepares an existing virtual array for use, buffer_size specifies how many array elements to allocate space for in memory. The routine returns NULL if unsuccessful; otherwise, it returns a pointer to the created virtual-array control block (VACB).

An example is:

VACB *item array; item_array = open_v_array("DATA.VAR",100);

which opens the "DATA.VAR" array file and reserves enough buffer space for 100 array elements.

The Close Routine

void close_v_array(array) VACB *array;

close_v_array writes elements from buffer to disk, closes the array file, and frees allocated memory.

An example is:

close__v array(item_array);

The Access Routine

void *access_v_array(array_index) VACB *array; long index;

This routine performs the low-level file management for virtual arrays. It makes sure the array element referenced by index is in memory and returns a pointer to it in memory. If the specified array element is already in memory, access is immediate; if not, it is read into memory after saving the record it displaces in the buffer. If the referenced array element does not exist, the routine automatically extends the array file so that it does.

For an example, see the listings and the section entitled "Access Notation."

Figure 1: Virtual-array file layout

Figure 2: Virtual-array control block and array buffer

The Access Algorithm

The key to easy reference to array elements (or to fields within them) is the access_v_rec function. This function returns a void pointer to the location of the element in memory after making sure it is there.

In the interests of demonstrating the feasibility of this approach quickly, I have paid no regard to optimization. The only stipulation was that access to array elements already in memory be as fast as possible.

To meet this requirement, I chose a simple modulus buffering scheme. Each array element has a fixed position in the buffer calculated by element number modulus buffer size. Each element in the buffer is prefixed with a long array index containing the number of the element currently present or a -1 if no element is in that buffer position.

If the referenced array index does not match the index of the element currently occupying the calculated buffer location, that array element in the buffer (if any) is written to disk and the referenced element is read in before the buffer address is returned. This scheme avoids searching for records in memory.

Cautions

You should use only long integer indices to access array elements. Be sure to #include <varray.h>.

You should also take care not to accidentally reference array items far beyond the end of the array unless you really intend to do so. When an array element beyond the end of the array (that is, it does not exist) is referenced, the access routine automatically extends the file so that the referenced element does exist. This process can cause a lot of disk activity and take considerable time because all the elements from the end of the array up to the referenced element must be initialized and written to disk.

Memory copy functions, such as strcpy, used to copy data directly from one array element to another are unreliable because the two may occupy the same buffer location. For example:

strcpy(desc(n).desc(m));

will not work when n modulus buffer_size equals m modulus buffer_size. However:

qty(n) = qty(m) + 1;

will work because memory copying is not invoked. (This is true because the compiler calculates the assignment value before it calculates the address for assignment.)

To avoid buffer collision, use:

strcpy(temp desc,desc(m));
strcpy(desc(n),temp desc);

The buffer collision problem associated with the simple modulus buffering scheme could be avoided with the development of a robust least recently used buffering scheme.

Buffered access of any kind also precludes the use of such things as in-memory sort utilities (for example, qsort).

Take care not to overrun the end of the array elements when copying strings into them. This would corrupt the index information in the next buffer slot, with unpredictable (probably bad) results.

Access Notation

#define statements are used to simplify access notation both to array elements and to fields within them. One #define is required for each virtual array used. Optionally, one #define may be used to simplify access to each field within a virtual array of structures. Alternately, pointer notation may be used to access fields.

In the example program in Listing Three, page 66:

#define VREC(i) ((items *) access_v_rec(item_array,i))

sets up easy access to elements in the virtual array called item-array (stored in "ITEMS. VAR"). In this example. (items *) casts the void pointer returned by access_vrec to the type pointer to items, where items is the defined type of the virtual-array element structure. item_array is the virtual-array handle returned by open v_array.

In use, VREC(i) returns a memory pointer to the ith array element wherever it is in the virtual-array buffer. *VREC(i) is a reference to the ith items structure in the array. It can be used in any expression as a variable of type items.

The expression:

#define item(i) VREC(i)->v_item

in Listing Three sets up easy access to the v_item field in the ith element of the item_array virtual array, item(i) becomes its alias. This allows you to say:

item(i) = 24;

which is much more intuitive than:

((items *)  access_v_rec(item array,i))-> v_item = 24;

This ease of reference is what makes this virtual-array system possible.

The size of a virtual array is always available because it is stored in the virtual-array control block. The expression:

i = item array->size;

sets i to the current size of the virtual array referred to by item array. You should never change this value but can refer to it to determine the index value to use in order to extend the array by one element. For example:

= item array_size;  item(i) = any value;

will extend the array by one element.

Multidimension arrays are also possible; however, you can extend them only in one dimension. In the example program, the v_desc field is a character array that can be regarded as two dimensional when extended over the virtual array. Particular elements in the two-dimensional array can be referenced by desccv][x], where y is the element number and x is the character number.

If you wished to formalize this two-dimensional access, you could add a new macro:

#define D(x,y) VREC(y)->v_desc[x]

in which D(x,y) would refer to column x, row y. This construct is useful for creating very large virtual-display spaces that can be quickly mapped to the physical display to accomplish fast panning.

Using Virtual Arrays

    1. Create a virtual-array file by calling init_v_array and passing it the DOS file name, the record length of array elements (in bytes), and a fill character for initializing new elements in the array as they are referenced. This creates a new file, overwriting any existing file of the same name, and writes out the array header information: 4 bytes for array size (initialized to 0), 2 bytes for the size of array elements, and 1 byte for the fill character. Perform this step only once.

    2. Typedef your array element structure.

    3. Define your array element access macro---for example:

#define VBEC(i) ((items *) access_v_rec(item_array,i))

    4. Define each field's access macro for the array.

    5. Call open_v_array, passing it the file name and buffer size. This call must assign the return value to a variable of type (VACB *) This must be the same variable name used in your array access macro---for example:

item_array = open_v_array("ITEMS.VAR",10);

    6. Do whatever you want with your virtual array using the access macros you defined in steps 3 and 4.

    7. Close your virtual array by calling close_v_array and passing it your virtual-array handle---for example, close_virtual_array(item_array);.

Implementation

These routines were tested using Borland's Turbo C development system with both the large and small memory models. Using a standard 4.77-MHz PC-compatible with a cheap (slow) hard disk and MS-DOS 3.2. 200 records of 128 bytes each were swapped in 15 seconds and 200 4-byte records were swapped in less than 2 seconds. Though these access speeds may be slow for massive matrix multiplication, they are adequate for many data-access applications.

The beauty of this virtual-array system is that arrays are no longer limited by available memory and all the mechanics of file management take place automatically behind the scenes.

[LISTING ONE]

<a name="00e8_0012">

/* Virtual Array Header File */
#include <alloc.h>
#include <stdio.h>

/* Virtual Array Control Block typedef */

typedef struct {


   FILE *file;           /* pointer to file control block */
   long size;            /* number of array elements in file */
   int  elsize;          /* number of bytes in each element */
   char *buffer;         /* pointer to array buffer */
   int  buf_elsize;      /* size of element in buffer including index */
   int  buf_size;        /* number of elements in buffer */
   char *blank_rec;      /* pointer to initialization record */
                         /* used for extending file */
} VACB ;                 /* virtual array control block type name */

/* Virtual Array Access Prototypes */

int init_v_array(char *filename,int rec_size,char filchar);
VACB *open_v_array(char *filename,int buffer_size);
void close_v_array(VACB *v_array);
void *access_v_rec(VACB *v_array,long index);


<a name="00e8_0013"><a name="00e8_0013">
<a name="00e8_0014">
[LISTING TWO]
<a name="00e8_0014">

/*  Virtual Array Access Routines  */

#include <varray.h>

#define header 7

/****************
 * init_v_array *
 ****************/

int init_v_array(filename,rec_size,filchar)
char *filename, filchar;
int  rec_size;
{
   long size;
   FILE *f;
   f = fopen(filename,"wb");
   if (f != NULL) {
      size = 0;
      fwrite(&size,4,1,f);         /* write array size of zero */
      fwrite(&rec_size,2,1,f);     /* and array element size   */
      fwrite(&filchar,1,1,f);      /* and fill char            */
      fclose(f);                   /* to file header */
      return(1);
   }
   else
      return(NULL);
}

/****************
 * open_v_array *
 ****************/

VACB *open_v_array(filename,buffer_size)
char *filename;
int  buffer_size;
{
   VACB *v_array;
   char *buf_ptr;
   int  i;
   char filchar;

   /* allocate virtual array control block */

   v_array = (VACB *) malloc(sizeof(VACB));
   if (v_array == NULL) return(NULL);

   /* open virtual array file */

   v_array->file = fopen(filename,"r+b");
   if (v_array->file == NULL) {
      free(v_array);
      return(NULL);
   };

   /* get array size and element size for control block */

   fread(&v_array->size,4,1,v_array->file);
   fread(&v_array->elsize,2,1,v_array->file);
   fread(&filchar,1,1,v_array->file);
   v_array->buf_elsize = v_array->elsize + 4;

   /* allocate buffer */

   v_array->buffer = (char *) malloc(v_array->buf_elsize * (buffer_size + 1));
   if (v_array->buffer == NULL) {
      fclose(v_array->file);
      free(v_array);
      return(NULL);
   };
   v_array->buf_size = buffer_size;

   /* set up blank_rec using the fill character in array header */
   /* for initializing new array elements */

   buf_ptr = v_array->buffer + v_array->buf_elsize * v_array->buf_size;
   v_array->blank_rec = buf_ptr + 4;
   for (i=0; i < v_array->buf_elsize; i++)
      *buf_ptr++ = filchar;

   /* set record index negative for all buffer elements   */

   buf_ptr = v_array->buffer;
   for (i = 0; i < v_array->buf_size; i++) {
      *((long *)buf_ptr) = -1L;
      buf_ptr += v_array->buf_elsize;
   };
   return(v_array);
}

/*****************
 * close_v_array *
 *****************/

void close_v_array(v_array)
VACB *v_array;
{
   int  i;
   char *buf_ptr;
   long rec_index, file_offset;

   buf_ptr = v_array->buffer;

   /*  flush buffer */

   for (i=0; i < v_array->buf_size; i++) {

      /* check each element index */
      /* if element present, write it to disk */

      rec_index = *((long *)buf_ptr);
      if (rec_index >= 0) {
    file_offset = header + rec_index * v_array->elsize;
    fseek(v_array->file,file_offset,0);
    fwrite(buf_ptr + 4, v_array->elsize,1, v_array->file);
      };
      buf_ptr += v_array->buf_elsize;
   };
   free(v_array->buffer);     /* de-allocate buffer */
   fclose(v_array->file);     /* close array file   */
   free(v_array);             /* de-allocate VACB   */
}

/****************
 * access_v_rec *
 ****************/

void *access_v_rec(v_array,index)
VACB *v_array;
long index;
{
   char *buf_ptr;
   int  buf_index;
   long rec_index, temp_index;

   /* calculate buffer address of referenced element */

   buf_index = index % v_array->buf_size;
   buf_ptr = v_array->buffer + buf_index * v_array->buf_elsize;
   rec_index = *(long *)buf_ptr;

   /* if element present, return its buffer address */

   if (rec_index == index) return(buf_ptr + 4);

   /* if element doesn't exist, extend the file */

   if (index >= v_array->size) {
      fseek(v_array->file,0,2);
      for (temp_index = v_array->size; temp_index++ <= index; )
    fwrite(v_array->blank_rec, v_array->elsize, 1, v_array->file);
      v_array->size = index + 1;
      fseek(v_array->file,0,0);
      fwrite(&v_array->size, 4, 1, v_array->file);
   };

   /* if buffer slot is occupied by another element, */
   /* save it to disk */

   if (rec_index >= 0) {
      fseek(v_array->file, rec_index * v_array->elsize + header, 0);
      fwrite(buf_ptr + 4, v_array->elsize, 1, v_array->file);
   };

   /* read referenced element into buffer slot */

   fseek(v_array->file, index * v_array->elsize + header, 0);
   fread(buf_ptr + 4, v_array->elsize, 1, v_array->file);
   *((long *)buf_ptr) = index;

   /* return address of element in buffer */

   return(buf_ptr + 4);
}


<a name="00e8_0015"><a name="00e8_0015">
<a name="00e8_0016">
[LISTING THREE]
<a name="00e8_0016">

/* Example Program Using Virtual Arrays */

#include <varray.h>

/* Access Macros */

#define VREC(i) ((items *)access_v_rec(item_array,i))
#define item(i) VREC(i)->v_item
#define qty(i) VREC(i)->v_qty
#define desc(i) VREC(i)->v_desc

/* Array element structure typedef */

typedef struct {
   int v_item,
       v_qty;
   char v_desc[24];
} items;

main()
{
   VACB *item_array;
   long i;

   /* create a virtual array setting element size to */
   /* the size of items structure and setting the    */
   /* initialization char to the space char          */

   init_v_array("ITEMS.VAR",sizeof(items),' ');

   /* open the virtual array, reserve buffer space for 10 elements */

   item_array = open_v_array("ITEMS.VAR",10);


   /* create 50 array items */

   for (i = 0; i < 50 ; i++) {
      item(i) = i + 1;
      qty(i)  = 0;
      sprintf(desc(i)," Item # %ld", i + 1);
   }

   /* print contents of the 50 array items */
   /* plus the ascii code of last char in v_desc */

   for (i = 0; i < 50; i++)
      printf("Element # %ld  Item = %d  Qty = %d Desc = %s  %d\n",
             i, item(i), qty(i), desc(i), (int) desc(i)[23]);

   /* close virtual array */

   close_v_array(item_array);
}






Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.