Extensible Hashing

NOV89: EXTENSIBLE HASHING

Steve Heller has been programming for more than 18 years and is the president of Chrysalis Software Corp., P.O. Box 0335, Baldwin, NY 11510. He can be reached through CompuServe: 71101, 1702.

Next to every DOS programmer's terminal must be a fortune cookie with a fortune that reads "640K is the mother of invention." And if you've ever had to manage large files of records, you've probably cracked this cookie more than once.

The goal of this article is to show how you can retrieve any record in a multimegabyte file with one disk access, and any record in any size file with a maximum of two accesses. KRAM.PAS, Listing One, page 116, (written using Turbo Pascal 5.0) allows any record in a file of up to approximately 1.4 Mbytes of user data to be retrieved with one disk access. But before we look at the code, let's examine the basic algorithm.

Extensible Hashing

Extensible hashing uses a record's key to compute a hash code. The first n bits of the hash code (n = 10 in KRAM.PAS) are used as an index into a table of block numbers. All records whose hash codes are the same in their first n bits are stored in the same block. For faster access, a possible optimization is to use the full hash code again to look up the record within the block. This is especially valuable when using large data blocks, as a significant amount of time can be consumed in the search for empty slots when adding records.

When a block is full, it must be split into two new blocks. The distribution of records between the two new blocks is based on one more bit of the full hash code than was used before the split. For example, if the hash code used to look up the record in the old block was 4 bits long, with the value 1011, then the first new block would contain records with hash codes of 10110, and the second new block would contain records with hash codes of 10111.

Index-in-Memory

There are two ways of extending the capacity of a KRAM file. For maximum speed and simplicity of programming, you can keep the entire index table in memory, writing it out to the KRAM file only when it is modified as blocks are split, as is done in KRAM.PAS. This "index-in-memory" approach, however, limits the maximum size of the files that you can use: For example, if you are willing to set aside about 140K bytes for the index and the data buffers, you could retrieve any record from a file of about 128 Mbytes in one access.

The other possibility is to keep the index table mostly on disk in the KRAM file, and read portions of it into memory as needed. This requires two disk accesses, one for the index and one to retrieve the data, but only 16K bytes of memory would suffice for any size file. Of course, as a compromise, you could keep a few of the most recently used index records in memory. This would reduce the number of accesses to the index if your references tend to be repetitious.

While discussing the code, I will mention other possible extensions and optimizations that were not included in the code, primarily because the listings would have become too large.

Data Structures

First, let's examine the data structures used to keep track of the current status of a KRAM file. The sizes of these structures are parameterized so that you can adapt them to your particular application. This is important primarily when you use the "index-in-memory" approach, as the maximum file size is then limited by the number of index entries and the size of each data block. The maximum data storage available in this case can be approximated by the formula:

  DATASIZE*INDEXCOUNT*.67;

DATASIZE refers to the number of bytes in a data block, and INDEXCOUNT refers to the number of entries in the index. (Note that INDEXCOUNT must be a power of two because it must have one entry for each possible hash code of the maximum length allowed.)

The constant .67 is an approximation of the packing factor or the proportion of the file that is occupied by data records assuming random keys. The exact value of the packing factor depends on the distribution of keys, but shouldn't vary much from this value.

For example, if you needed to store 100,000 100-byte records for a total data storage requirement of 10 Mbytes, you could set INDEXCOUNT to 8192 and DATASIZE to 2048, giving a maximum accessible file size of 16 Mbytes. This provides an approximate storage capacity of 11.2 Mbytes.

The memory requirement for the data and index buffers in the "index-in-memory" approach is:

  (2*INDEXCOUNT) + (3*DATASIZE);

as three data buffers are required when splitting a block. The example of 100,000 100-byte records would require 2*8192 + 3*2048, or 22K bytes of memory for the index and data buffers.

Initialization

KramInit is used to create a new KRAM file. Once created, KramInit initializes the first data block to all zeros, and sets all pointers in the index block to point to that first data block. KramInit also initializes the parameter block, which contains the data length, the key length, and the current high block number.

Once the file has been initialized, call KramOpen to open the file, read the parameter block and the index block into memory, and allocate space for the temporary data block. The declaration for the FileRec record type shows how the file information is associated with the file pointer.

Turbo Pascal 5.0 represents an untyped file as a record type, called "FileRec." The UserData field, however, is never accessed by Turbo Pascal, and is free for user-written routines to store data in. Therefore, I have redefined that field as:

  UserData: array [1 . . 4] OF pointer;

Currently, only the first element is used to store the address of the KramParam block for the file. Therefore, any number of KRAM files may be open at once, subject to system limitations on file handles, with only the file handle itself being passed to the KRAM routines.

Adding Records to the File

KramAdd is used to write records to the KRAM file. KramAdd returns TRUE if the record was added successfully (the key was not a duplicate of one in the file), and FALSE otherwise.

KramAdd first calls HashCode with the key value to calculate which index entry corresponds to the record to be stored. It then retrieves the block number pointed to by that index entry. If that block is not the one currently in memory, it must be retrieved from disk. A possible optimization here is to keep more than one data block in memory, discarding the least recently used block when its buffer is needed for another block.

The next task is to discover whether the key is a duplicate of one already in the block, and if not, whether there is an empty slot in the block to store the new record. If an empty slot is found (and there is no duplication), then the data is moved from the function arguments KeyValue and DataValue to that slot.

If no empty slot is available, and there is no duplication of keys, we must split this block to make room for the new record. The first subtask here is to scan the index table to determine which index entries are affected by the split (that is, which index entries point to this block). Note that if only one index entry points to this block, the block cannot be split because one entry cannot point to more than one block. Splitting a block under these conditions would cause one of the new blocks to be inaccessible.

To prevent this from occurring, page the index from disk, allowing it to be increased in size as needed. When using a paged index, you should also keep in each data block a record of the number of bits of the key that are used to access that block. This way, you won't have to page through a large part of the index table to find out which entries are affected by a split. For example, if a data block contains all records with a hash code starting with the 5 bits 11101, splitting it would affect all index entries that start with 11101. After determining that at least two entries point to this block, allocate another block to accommodate the overflow, and update the higher numbered half of the affected entries so they point to the new block. Next, allocate two temporary block buffers, LowDataPtr^ and HighDataPtr^ (so called because they will receive the records with lower- and higher-hash codes, respectively). Next, initialize them to zeros so they are ready to receive the records from the full block.

The distribution of records from the full block to the two temporary buffers depends on the updated entries in the index table. The key of each record is used to calculate the hash code for that record and the block number is looked up in the index table. If the block number in the table is the same as the block number of the full block, the record will end up in the old block when we are through because the old block number was kept for the first (lower) half of the updated index table entries.

For example, suppose that before block 12 became full, the relevant section of the index table looked like that shown in Figure 1.

Figure 1: The index table before a split

  Element  Value
  Number
----------------
  .   .   .
  100      12
  101      12
  102      12
  103      12
  104      12
  105      12
  106      12
  107      12
  .   .

If, after the split, the highest block number in use was 22, the table would look like that shown in Figure 2.

Figure 2: The index table after a split

  Element  Value
  Number
----------------
  .   .   .
  100      12
  101      12
  102      12
  103      12
  104      23
  105      23
  106      23
  107      23
  .   .

All records with hash codes from 100 to 103 would remain in block 12, and records with hash codes from 104 to 107 would be moved to block 23.

After the records have been moved to their new places in LowDataPtr^ and HighDataPtr^, the file is updated. First, the old block (now approximately half-empty) is written back to its old position. Next, the new block is added to the end of the file. The parameter and index buffers are then written back to the file to keep the high block number and index tables up-to-date. Finally, we return to the top of the WHILE loop to handle the insertion of the record, now that there is room in the block it belongs in.

The Rest of the Code

KramRead is basically the same as KramAdd (except that it doesn't have to deal with the splitting of blocks) and requires little explanation. KramClose does what its name implies; it closes the file, after making sure that anything that has been changed is written out to the file. Be sure to call this routine if you have added any records to the file. KramInfo is a procedure that you can call to find out the key and data lengths of a KRAM file.

HashCode takes the key of a record and returns a 32-bit hash code, from which the calling program extracts the number of bits it needs to access the index table. SeekBlock takes a data block number and positions the file to enable that block to be read or written.

The main program is a driver that illustrates how to call these routines. It will read an ASCII file of lines, each consisting of a string key and numeric data, optionally initializing and loading the KRAM file. Then it will retrieve all the records, checking that they all exist and have the same data in them as when they were created. The data I used for testing is the same as the record numbers of the keys; the test records were created by a small test data generating program, KRAMDATA.PAS (see Listing Two, page 121).

In many applications, it is necessary to be able to delete records from a KRAM file. The simplest way to do this is to zero out the key and data of the record you want to delete, so that KramAdd can reuse its slot. This would not work properly, however, if you decided to use hashing to speed up the lookup within a block. In that case, you could change the key to something that would not occur in your application, such as FFh. You would then change KramAdd to reuse a slot that had that value, and change the block-splitting function to ignore records that had that key. Thus, whenever a block was split, any deleted records that had not already been reused would disappear.

Conclusion

Lest you get the impression that KRAM files are the best possible organization for keyed access, I should mention two limitations of KRAM files, which make them unsuitable for certain applications. First, this is not an indexed sequential access method, but truly a keyed random access method. There is no convenient way of retrieving the records other than by supplying the exact key for the record you need.

The second limitation is that KRAM files are not very efficient in their use of disk storage. You can expect that a file containing 1 Mbyte of your data will occupy approximately 1.5 Mbytes on the disk. This is an unavoidable side effect of the dynamic storage allocation method that gives KRAM files their tremendous speed. If these limitations do not adversely affect your application, KRAM files are probably the best way of organizing data that requires random access by key.

By the way, a shareware version of KRAM is also available. For more information, contact Chrysalis Software Corporation, at the address at the beginning of this article. Registered users will receive an updated version of the program, incorporating all of the optimizations and extensions mentioned in this article.

_EXTENSIBLE HASHING_ by Steve Heller

[LISTING ONE]

<a name="023a_000e">

{KRAM.PAS - A Keyed Random Access Method for Turbo Pascal 5.0.}
{Copyright (c) 1989 by Chrysalis Software Corp.  All rights reserved.}
{Written by Steve Heller.}

{$V-}

program KramTest;

uses Crt;

const
  PARAMSIZE = 128; {the size of the parameter block, in bytes}

  DATASIZE = 2048; {the size of a data block, in bytes}

  INDEXCOUNT = 1024; {the number of index entries.  Must be a power of 2.}
  INDEXSIZE = INDEXCOUNT * Sizeof(integer); {the size of the index block}

type
  KramDataType = array[1..DATASIZE] of byte;
  KramDataPtr = ^KramDataType;
  KramIndexType = array[1..INDEXCOUNT] of integer;
  KramIndexPtr = ^KramIndexType;

{The variant record type below is used so that we can add new}
{parameters if needed, without changing the size of the parameter}
{block in the data file.}
  KramParamType = record
                    case integer of
                      0: (

{the items below are saved from one run to another}
                          KeyLength:integer;
                          DataLength:integer;
                          HighBlock:integer;
{the items above are saved from one run to another}

{the ones below are valid only during the current run}
                          CurrentBlock:integer;
                          BlockModified:boolean;
                          DataPtr:KramDataPtr;
                          IndexPtr:KramIndexPtr;
                         );
                      1: (Dummy:array[1..PARAMSIZE] of byte);
                  end;
  KramParamPtr = ^KramParamType;

  FileRec = RECORD
          Handle : word;
          Mode   : word;
          RecSize: word;
          Private: array [1..26] OF byte;

{note: this is a nonstandard declaration}
          UserData: array [1..4] OF pointer;

          Name   : array [0..79] OF char;
          END;




FUNCTION HashCode(Key : string):longint;

{Use this function to calculate a pseudo-random number based on the}
{value of its argument.  The result is used to determine which index}
{entry will be used to access the record with the given key.  The}
{algorithm used here is appropriate for ASCII key values, as it uses}
{the low five bits of each byte of the key.}

VAR
  i : integer;
  result : longint;
  temp1 : longint;
  temp2 : longint;
  bytetemp : byte;

BEGIN

  result := 0;

  FOR i := 1 TO length(Key) DO
    BEGIN
      temp1 := result shl 5;
      temp2 := result shr 27;
      result := temp1 or temp2;
      bytetemp := ord(Key[i]);
      result := result xor bytetemp;
    END;

  HashCode := result;

END;




PROCEDURE SeekBlock(VAR KramFile : file; BlockNum : integer);

{Use this procedure to position the file pointer to a particular data}
{block.  In order to do this, we must skip the parameter block and }
{the index block.  Also note that the first data block is #1.}

VAR
  BlockPosition : longint;

BEGIN

  BlockPosition := PARAMSIZE + INDEXSIZE + (DATASIZE * (BlockNum-1));
  Seek(KramFile,BlockPosition);

END;




PROCEDURE KramInit(FileName : string; KeyLength, DataLength: integer);

{Use this procedure to initialize a new KRAM file.  It sets up the}
{key length and the data length according to the input arguments.}
{The high data block number is set to 1 and data block #1 is}
{initialized to zeroes (an empty block).  The index block is set to}
{all 1's, so that all accesses will go to the empty data block.}

VAR
  KramFile:file;
  Index : KramIndexType;
  Data : KramDataType;
  Params : KramParamType;
  i : integer;


BEGIN

     Assign(KramFile,FileName);
     Rewrite(KramFile,1);

     FillChar(Params.Dummy,SizeOf(Params.Dummy),0);

     Params.KeyLength := KeyLength;
     Params.DataLength := DataLength;

{the highest data block number in use is #1}
     Params.HighBlock := 1;

     BlockWrite(KramFile,Params,SizeOf(Params));

{Initialize the index block to all 1's, as only data block #1 exists}

     FOR i := 1 TO INDEXCOUNT DO
       Index[i] := 1;

     BlockWrite(KramFile,Index,SizeOf(Index));

{Initialize the first data block to all zeroes}

     FOR i := 1 TO DATASIZE DO
       Data[i] := 0;

     BlockWrite(KramFile,Data,SizeOf(Data));

     Close(KramFile);

END;




PROCEDURE KramOpen(VAR KramFile:file;KramFileName:string);

{Use this procedure to open the file.  It reads the parameter block}
{and the index block from the beginning of the file and allocates}
{space for the data block.}

VAR
   RecsRead : integer;
   ParamPtr : KramParamPtr;

BEGIN

  Assign(KramFile,KramFileName);
  Reset(KramFile,1);

  New(ParamPtr);

  KramParamPtr(FileRec(KramFile).UserData[1]) := ParamPtr;

  BlockRead(KramFile,ParamPtr^,SizeOf(KramParamType),RecsRead);
  IF RecsRead <> SizeOf(KramParamType) THEN
     BEGIN
       WriteLn('Invalid KRAM file: ',KramFileName);
       Halt(1);
     END;

  New(ParamPtr^.IndexPtr);
  New(ParamPtr^.DataPtr);
  ParamPtr^.CurrentBlock := 0;

  BlockRead(KramFile,ParamPtr^.IndexPtr^,
  SizeOf(KramIndexType),RecsRead);

  IF RecsRead <> SizeOf(KramIndexType) THEN
     BEGIN
       WriteLn('Invalid KRAM file: ',KramFileName);
       Halt(1);
     END;

END;




PROCEDURE KramClose(var KramFile:file);

{Use this procedure to close the file after use. It also deallocates}
{the dynamic storage used for the parameter, index, and data blocks.}

{IMPORTANT NOTE: If you have modified the file, by adding records,}
{for example, you MUST close the file to ensure that all the records}
{have been written to the file.}

VAR
  ParamPtr : KramParamPtr;

BEGIN

  ParamPtr := KramParamPtr(FileRec(KramFile).UserData[1]);
  SeekBlock(KramFile,ParamPtr^.CurrentBlock);
  BlockWrite(KramFile,ParamPtr^.DataPtr^,DATASIZE);

  Dispose(ParamPtr^.DataPtr);
  Dispose(ParamPtr^.IndexPtr);
  Dispose(ParamPtr);

  Close(KramFile);

END;



FUNCTION KramAdd(VAR KramFile : file; KeyValue : string;
                  DataValue : string):boolean;

{Use this procedure to add records to a KRAM file that has been opened}
{by KramOpen.  You must supply the file pointer that was returned by}
{KramOpen in "KramFile",the key in "KeyValue", and the data in}
{"DataValue".  The algorithm is known as "extensible hashing".}

VAR
  ParamPtr : KramParamPtr;
  LowDataPtr : KramDataPtr;
  HighDataPtr : KramDataPtr;
  HashVal  : longint;
  BlockNumber : integer;
  TempBlockNumber : integer;
  RecordLength : integer;
  i,j,k    : integer;
  SlotFound: boolean;
  FoundFirst : boolean;
  FoundLast : boolean;
  FirstBlock : integer;
  LastBlock : integer;
  MidBlock : integer;
  KramFileName : string;
  TempKeyValue : string;
  OldOffset : integer;
  LowOffset : integer;
  HighOffset : integer;
  Duplicate : boolean;
  KeepLooking : boolean;

BEGIN

  ParamPtr := KramParamPtr(FileRec(KramFile).UserData[1]);

  KramFileName := FileRec(KramFile).Name;

  RecordLength := ParamPtr^.KeyLength + ParamPtr^.DataLength;

  REPEAT

    HashVal := HashCode(KeyValue) and (INDEXCOUNT - 1);

    BlockNumber := ParamPtr^.IndexPtr^[HashVal+1];

    IF ParamPtr^.CurrentBlock <> BlockNumber THEN
      BEGIN
        IF (ParamPtr^.BlockModified)
        and (ParamPtr^.CurrentBlock <> 0) THEN
          BEGIN
            ParamPtr^.BlockModified := FALSE;
            SeekBlock(KramFile,ParamPtr^.CurrentBlock);
            BlockWrite(KramFile,ParamPtr^.DataPtr^,DATASIZE);
          END;
        SeekBlock(KramFile,BlockNumber);
        BlockRead(KramFile,ParamPtr^.DataPtr^,DATASIZE);
        ParamPtr^.CurrentBlock := BlockNumber;
      END;


    Duplicate := FALSE;
    KeepLooking := TRUE;
    SlotFound := FALSE;
    i := 1;
    OldOffset := 1;

    {initialize length of temporary string used for comparison}
    TempKeyValue[0] := char(ParamPtr^.KeyLength);

    WHILE KeepLooking AND (i <= DATASIZE DIV RecordLength) DO

      BEGIN
        IF ParamPtr^.DataPtr^[OldOffset] = 0 THEN
          BEGIN
            KeepLooking := FALSE;
            SlotFound := TRUE;
          END
        ELSE
          BEGIN
            Move(ParamPtr^.DataPtr^[OldOffset],TempKeyValue[1],ParamPtr^.KeyLength);
            IF TempKeyValue = KeyValue THEN
              BEGIN
                Duplicate := TRUE;
                KeepLooking := FALSE;
              END
            ELSE
              BEGIN
                OldOffset := OldOffset + RecordLength;
                i := i + 1;
              END;
          END;
      END;

    IF SlotFound THEN
      BEGIN
        Move(KeyValue[1],ParamPtr^.DataPtr^[OldOffset],
        ParamPtr^.KeyLength);

        Move(DataValue[1],
        ParamPtr^.DataPtr^[OldOffset+ParamPtr^.KeyLength],
        ParamPtr^.DataLength);

        ParamPtr^.BlockModified := TRUE;
      END
    ELSE IF Duplicate = FALSE THEN
      BEGIN
{First we must determine how many index entries are affected}
{by the split}
        FoundFirst := FALSE;
        FoundLast := FALSE;
        LastBlock := INDEXCOUNT;
        FOR i := 1 TO INDEXCOUNT DO
          BEGIN
            IF (ParamPtr^.IndexPtr^[i] = BlockNumber)
            and (not FoundFirst) THEN
              BEGIN
                FirstBlock := i;
                FoundFirst := TRUE;
              END;
            IF (ParamPtr^.IndexPtr^[i] <> BlockNumber)
            and FoundFirst and (not FoundLast) THEN
              BEGIN
                LastBlock := i - 1;
                FoundLast := TRUE;
              END
          END;
        IF FirstBlock >= LastBlock THEN
{we are out of room}
          BEGIN
            WriteLn('KRAM file: ',KramFileName,' is full.');
            Halt(1);
          END;

{Now we have to allocate another block for the split.}
        ParamPtr^.HighBlock := ParamPtr^.HighBlock + 1;
        MidBlock := (FirstBlock + LastBlock - 1) DIV 2;

{We will assign the items that have the higher hash code}
{to the new block}
        FOR i := MidBlock+1 TO LastBlock DO
          ParamPtr^.IndexPtr^[i] := ParamPtr^.HighBlock;

{Now we have to go through all the items in the block to be split}
{and assign them to the old block or the new one according to their}
{new hash codes, 1 bit longer than the previous ones.  An extra}
{temporary block makes this easier.}
        New(LowDataPtr);
        New(HighDataPtr);

        FillChar(LowDataPtr^,SizeOf(LowDataPtr^),0);
        FillChar(HighDataPtr^,SizeOf(HighDataPtr^),0);

        OldOffset := 1;
        LowOffset := 1;
        HighOffset := 1;
        FOR i := 1 TO DATASIZE DIV RecordLength DO
          BEGIN
            Move(ParamPtr^.DataPtr^[OldOffset],TempKeyValue[1],
            ParamPtr^.KeyLength);

            TempKeyValue[0] := char(ParamPtr^.KeyLength);
            HashVal := HashCode(TempKeyValue) and (INDEXCOUNT - 1);
            TempBlockNumber := ParamPtr^.IndexPtr^[HashVal+1];
            IF TempBlockNumber = BlockNumber THEN
{send to the lower one}
              BEGIN
                Move(ParamPtr^.DataPtr^[OldOffset],
                LowDataPtr^[LowOffset],RecordLength);

                LowOffset := LowOffset + RecordLength;
              END
            ELSE
              BEGIN
                Move(ParamPtr^.DataPtr^[OldOffset],
                HighDataPtr^[HighOffset],RecordLength);

                HighOffset := HighOffset + RecordLength;
              END;
            OldOffset := OldOffset + RecordLength;
          END;

        SeekBlock(KramFile,BlockNumber);
        BlockWrite(KramFile,LowDataPtr^,DATASIZE);

        SeekBlock(KramFile,ParamPtr^.HighBlock);
        BlockWrite(KramFile,HighDataPtr^,DATASIZE);

        Dispose(LowDataPtr);
        Dispose(HighDataPtr);

{Make sure the same block isn't used the the next time we have to}
{expand the file.  Also note that the data block is out of date.}
        ParamPtr^.CurrentBlock := 0;
        Seek(KramFile,0);
        BlockWrite(KramFile,ParamPtr^,SizeOf(KramParamType));
        BlockWrite(KramFile,ParamPtr^.IndexPtr^,
        SizeOf(KramIndexType));


      END;

  UNTIL SlotFound or Duplicate;

  IF Duplicate THEN
    KramAdd := FALSE
  ELSE
    KramAdd := TRUE;

END;




FUNCTION KramRead(var KramFile:file; KeyValue: string;
                  var DataValue: string):boolean;

{Use this function to read records from a KRAM file that has been}
{opened by KramOpen.  You must supply the key in "KeyValue" and the}
{file pointer that was returned by KramOpen in "KramFile".  The data}
{stored under that key will be returned in "DataValue".  The return}
{value is TRUE if the record was found, and FALSE if it wasn't.}

VAR
  ParamPtr : KramParamPtr;
  HashVal  : longint;
  BlockNumber : integer;
  RecordLength : integer;
  i,j,k    : integer;
  SlotFound: boolean;
  KeepLooking : boolean;
  KramFileName : string;
  TempKeyValue : string;
  DataOffset : integer;

BEGIN

  ParamPtr := KramParamPtr(FileRec(KramFile).UserData[1]);

  KramFileName := FileRec(KramFile).Name;

  RecordLength := ParamPtr^.KeyLength + ParamPtr^.DataLength;

  HashVal := HashCode(KeyValue) and (INDEXCOUNT - 1);

  BlockNumber := ParamPtr^.IndexPtr^[HashVal+1];

  IF ParamPtr^.CurrentBlock <> BlockNumber THEN
    BEGIN
      IF (ParamPtr^.BlockModified)
      and (ParamPtr^.CurrentBlock <> 0) THEN
        BEGIN
          ParamPtr^.BlockModified := FALSE;
          SeekBlock(KramFile,ParamPtr^.CurrentBlock);
          BlockWrite(KramFile,ParamPtr^.DataPtr^,DATASIZE);
        END;
      SeekBlock(KramFile,BlockNumber);
      BlockRead(KramFile,ParamPtr^.DataPtr^,DATASIZE);
      ParamPtr^.CurrentBlock := BlockNumber;
    END;


    KeepLooking := TRUE;
    SlotFound := FALSE;
    i := 1;
    DataOffset := 1;

    {initialize length of temporary string used for comparison}
    TempKeyValue[0] := char(ParamPtr^.KeyLength);

    WHILE KeepLooking AND (i <= DATASIZE DIV RecordLength) DO
      BEGIN
        IF ParamPtr^.DataPtr^[DataOffset] = 0 THEN
          KeepLooking := FALSE
        ELSE
          BEGIN
            Move(ParamPtr^.DataPtr^[DataOffset],TempKeyValue[1],ParamPtr^.KeyLength);
            IF TempKeyValue = KeyValue THEN
              BEGIN
                SlotFound := TRUE;
                KeepLooking := FALSE;
              END
            ELSE
              BEGIN
                DataOffset := DataOffset + RecordLength;
                i := i + 1;
              END;
          END;
      END;


  IF SlotFound THEN
    BEGIN
      Move(ParamPtr^.DataPtr^[DataOffset+ParamPtr^.KeyLength],
      DataValue[1],ParamPtr^.DataLength);

      DataValue[0] := char(ParamPtr^.DataLength);
      KramRead := TRUE;
    END
  ELSE
    KramRead:= FALSE;

END;




PROCEDURE KramInfo(var KramFile:file; var KeyLength: integer;
                  var DataLength: integer);

{Use this procedure to get the key and data lengths from a KRAM file that has}
{been opened by KramOpen.  You must supply the file pointer that was returned}
{by KramOpen in "KramFile".}

VAR
  ParamPtr : KramParamPtr;

BEGIN

  ParamPtr := KramParamPtr(FileRec(KramFile).UserData[1]);

  KeyLength := ParamPtr^.KeyLength;

  DataLength := ParamPtr^.DataLength;

END;




VAR
  KramTestFile : file;
  FileName : string;
  DataFileName : string;
  KeyLength : integer;
  DataLength : integer;
  KramFile : file;
  Ok : boolean;
  InputFile : text;
  Key : string;
  RecNum : integer;
  TestRecNum : integer;
  RecNumStr : string;
  InputFileBuf : array [1..10240] of char;
  Create : string;
  Status : integer;
  ReadStatus : boolean;
  AddStatus : boolean;
  Temp : string;
  Data : string;
  TempKey : string;
  TempData : string;

BEGIN

     ClrScr;
   WriteLn('KRAM.PAS - A Keyed Random Access Method for Turbo Pascal 5.0.');
   WriteLn('Copyright (c) 1989 by Chrysalis Software Corp.  All rights reserved.');
   WriteLn('Written by Steve Heller.');
     WriteLn;
   WriteLn('This program is shareware: if you find it useful, you should');
   WriteLn('register it by sending a check in the amount of $39.95, ');
   WriteLn('payable to Chrysalis Software Corporation, to the following address');
   WriteLn;
   WriteLn('Chrysalis Software Corporation');
   WriteLn('P. O. Box 0335');
   WriteLn('Baldwin, NY 11510');
   WriteLn;
   WriteLn('Registered users will receive an updated version of the program,');
   WriteLn('incorporating all of the optimizations and extensions mentioned');
   WriteLn('in this article.');
   WriteLn;
   Write('Please hit ENTER to continue.');
   ReadLn(FileName);
   ClrScr;

     Write('Please enter the name of the data file to be used for input: ');
     ReadLn(DataFileName);

     Write('Please enter the name of the KRAM file: ');
     ReadLn(FileName);

     Write('Create the KRAM file (Y/N): ');
     ReadLn(Create);
     IF (Create[1] = 'Y') or (Create[1] = 'y') THEN
       BEGIN

         Write('Please enter the key length: ');
         ReadLn(KeyLength);
         Write('Please enter the data length: ');
         ReadLn(DataLength);

         Assign(InputFile,DataFileName);
         Reset(InputFile);
         SetTextBuf(InputFile,InputFileBuf);

         KramInit(FileName,KeyLength,DataLength);
         KramOpen(KramTestFile,FileName);

         RecNum := 0;
         AddStatus := TRUE;
         WHILE NOT EOF(InputFile) DO
           BEGIN
             IF RecNum Mod 10 = 0 THEN
               Write(RecNum:5);
             RecNum := RecNum + 1;
             ReadLn(InputFile,Temp);
             Key := Copy(Temp,1,KeyLength);
             Data := Copy(Temp,KeyLength+1,length(Temp));
             AddStatus := KramAdd(KramTestFile,Key,Data);
             IF AddStatus = FALSE THEN
               BEGIN
                 WriteLn;
                 WriteLn('Unable to add key: ',Key);
               END;
           END;

         WriteLn;
         Close(InputFile);
         KramClose(KramTestFile);
       END;

       Assign(InputFile,DataFileName);
       Reset(InputFile);
       SetTextBuf(InputFile,InputFileBuf);

       KramOpen(KramTestFile,FileName);

       KramInfo(KramTestFile,KeyLength,DataLength);

       RecNum := 0;
       WHILE NOT EOF(InputFile) DO
         BEGIN
           IF RecNum Mod 10 = 0 THEN
             Write(RecNum:5);
           RecNum := RecNum + 1;
           ReadLn(InputFile,Temp);

           Key := copy(Temp,1,KeyLength);
           TempData := copy(Temp,KeyLength+1,DataLength);

           ReadStatus := KramRead(KramTestFile,Key,Data);
           IF Not ReadStatus THEN
             BEGIN
               WriteLn;
               WriteLn('Key ',TempKey,' not found.');
             END;
           IF TempData <> Data THEN
             BEGIN
               WriteLn;
               WriteLn('Record number ',RecNum,' does not match');
             END;
         END;

       WriteLn;

       Close(InputFile);
       KramClose(KramTestFile);

END.




<a name="023a_000f"><a name="023a_000f">
<a name="023a_0010">

[LISTING TWO]

<a name="023a_0010">

{KRAMDATA.PAS - generates data for KRAM testing}
{890226:1245}

VAR
  s : String[255];
  t : Text;
  n : LongInt;
  i : LongInt;
  j : Integer;
  KeyLength : Integer;
  DataLength : Integer;
  FName   : String[80];

BEGIN

  Randomize;

  WriteLn;

  Write('Name of data file to be generated: ');
  ReadLn(FName);

  Write('Number of items to generate: ');
  ReadLn(n);

  Write('Length of Keys: ');
  ReadLn(KeyLength);

  Write('Length of Data: ');
  ReadLn(DataLength);

  Assign(t,Fname);
  Rewrite(t);

  FOR i := 1 TO N DO
    BEGIN
      IF i = 1000*int(i/1000) THEN Write(i:10);
      s := '';
      FOR j := 1 TO KeyLength DO
        s := s + chr(random(26)+65);
      Write(t,s);
      WriteLn(t,i:DataLength);
    END;

  Close(t);

END.

Database

Extensible Hashing

Extensible Hashing

Index-in-Memory

Data Structures

Initialization

Adding Records to the File

Figure 1: The index table before a split

Figure 2: The index table after a split

The Rest of the Code

Conclusion

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

Database

Extensible Hashing

Related Reading

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content