Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Parallel

Fast Sorting Using Large String Buffers


JUN91: FAST SORTING USING LARGE STRING BUFFERS

Dale is a programmer at AGC Corp., where he specializes in optimization and portability of Basic programs. He can be reached at 1001 AGC Drive, Cleveland, TN 37312.


Basic has a reputation for making it possible to get small programs up and running quickly. But on the other hand, the language has gained a reputation for limited functionality, particularly when it comes to pointers and large-scale memory management.

Most of what needs to be done regarding pointer types can be accomplished with ordinary 16- and 32-bit integers; when formal identification is required, variable names such as ptr.databuffer or ptr.indexbuffer will suffice.

As for memory management, the sorting routine described in this article provides additional functionality by using large, single-string buffers and integer variables as pointers to the buffers. The design objective of this sort routine was to read data from files or file keys, send it one record at a time to the sort, and begin retrieval when the time between the first record sent and the first record retrieved is reduced to the absolute minimum. The current version of the routine will accommodate more than 32,000 records; you can modify the routine to handle several million records by creating index buffers with a segment length of three instead of two.

This version is compatible with Microsoft Basic 4.0 (and up) where the binary file mode provides the advantage of not closing and reopening files when writing to them with different lengths. Porting to Basic implementations compatible with Basic 4.0 and up shouldn't be a problem; moving to older dialects will likely involve changes. (For instance, you may have to change deflng x to defsng x if long integers are not available, then change the affected clng, mod, and \ statements accordingly; use gotos without changing the visible structure of the code when block if/then/else isn't available; and make the main buffers [sbuf$, sndx$] the field variables in corresponding random file opens if binary file mode is not available.) Porting to other languages (especially C) may involve more work, but I've written the code in a format designed to minimize porting problems. For example, I use nonstring operations (such as memcpy) and buffers and other entities that allow maximum flexibility in moving memory blocks.

Sorting String Data

Most Basic programs that sort string data use string arrays to hold the data to be sorted. While individual array elements can be swapped quickly as the result of a comparison, other problems (multilevel sorting, array element assignment, writing arrays to disk, and so on) tend to slow the process to the point where assembler routines are invariably required for performance. The sort routine in Listing Two (page 95) takes a somewhat radical approach to the problem by combining several techniques into one.

The first step in building this routine was to create a large string buffer to hold all the sort data, where every n character is a data element including blank-space padding, then insert each data element in sequence after first shifting any greater values upward in the buffer with a mid$ command: mid$ (buffer$, x + n) = mid$(buffer$, x). This technique has two problems that led to the addition of an indexing buffer in the current routine. First, the mid$ command requires swap space in string memory equal to the size of the shift on the right side of the equal sign, which could double the memory requirement or slow the process down by having to shift in segments. Secondly, the time required to shift the buffer for each insertion was prohibitive.

More Details.

My solution was to write each data element into the main buffer (sbuf$) in sequential order while performing the aforementioned insert on the associated index buffer (sndx$). The Basic stringstack technique used here can best be described by analogy: Imagine a library with three trays of index cards (ordered by author, title, and subject) where each card contains the data and an exact physical location for the book. Now remove all data from the cards except the physical location pointer. This makes for a very compact index (in terms of the size of index data stored). The major disadvantage is that to find a specific title, a library patron will have to perform a binary search on an index tray and walk over to the shelves for each index card examined, just to make a comparison. Then the patron will have to determine whether to move up or down the index stack before making another trip to the bookshelves. Although impractical for library patrons, this approach works very well on computers because computers get the data record directly from the index pointer.

The third major technique used here is the group merge, where memory is not sufficient to sort all data in one pass and each group is dumped to disk files until the sort process is completed. The sort routine then loads one record from each group and outputs the lowest (highest if descending order) of the batch.

String Variables

Probably the biggest time-killer in most Basic programs is the creation of string variables, both real and virtual. Creation of real strings cannot be avoided, but when a string is recreated to the same length over and over instead of just being cleared, it wastes time and forces more garbage collection. The virtual strings, as I call them, do not waste as much time as the assigned strings because they disappear after the current execution statement and do not work their way down into the string heap. Mid$ commands, str$, chr$, and so forth, all create virtual strings; when the number of incidents is very large, we have, in certain cases, an opportunity to save some time.

More Details.

The first technique is to create a substitute for the Basic function chr$. The char$ array in Listing One (page 94) requires about 800 bytes of memory, and once created (see Listing Two), a call to char$ should be about four times faster than a call to chr$.

The second technique is not available to current versions of Basic, but because it is available in a relatively inexpensive replacement library (see "Declarations for the Sample Program"), I think it worthwhile to describe here. The midchar function gets the ASCII value of any character in a string without having Basic create the single mid$ character first, and does it about five times faster if linking with Crescent Software's PDQ replacement library.

The third technique used in this sort is based on the second, and converts 16-bit integer values from string to numeric - cvi(mid$(xxx$, midpos, 2))--without having Basic create the mid$ characters; get the midchar of the first byte and add the midchar of the second byte after multiplying it by 256. This third technique is approximately three times faster than the cvi(mid$(...function. For maximum portability, I keep a set of these replaceable routines handy to append to Basic programs when compiling in situations where PDQ and other proprietary add-ons are not available or applicable.

Declarations for the Sample Sort Program

The sample program in Listing One (which call the sort in Listing Two) has four main functions: send data to sort, create external index (optional), resort from workfile (optional), and retrieve data from sort. From the top of the listing, you will note the midchar declaration. The mid$ call it replaces could be ignored in the code without requiring a separate function, but when the function (see the end of the listing) is eliminated and the program is compiled with Crescent Software's PDQ library, the speed increase is substantial, so I keep in this format for maximum portability.

The next bit of code is the DIM line; the subscripts equal to 10 may be raised or lowered if desired, and the subscript of 100 may have to be increased if the number of sort groups could be greater than 100. As an example, if the length of each sort string (sdat$) were 100 bytes, and the maximum sort buffer (sbuf$) size were 30,000, only 300 records could be held in memory in a single sort group. If the total number of records to be sorted were more than 30,000, more than 100 groups would be required, and the aforementioned subscript would have to be increased, although this would be an extreme case. The subscript of 255 for char$ cannot be lessened because it is a replacement for the Basic chr$ function, and requires all 256 characters of the ASCII collating set.

Next, I used common shared to pass a group of variables to the main sorting routine (Listing Two), but you will want to employ a named common block in real applications so that the variables are not passed when chaining. Most of the variable declarations below the common lines are required. The same list, along with a few others, can be found at the top of Listing Two. The three lines marked with an asterisk at the right side are the only lines that required the programmer's attention, the sort-string length, and the integer masking.

Two of the data sections (sortdata1 and sortdata3) are examples only and are usually provided to the executing program by configuration files or by a data dictionary. The data in sortdata2 is actual sort data and will generally come from database files and the like. The remaining information for configuring a sort can be found at the top of Listing Two.

--D.T.

Technical Details of the Sort Subprogram

The sortsq variable in Listing Two indicates the order of output (ascending/ descending), and resets the output pointer on each retrieval. nvflag is used to minimize the character inversions for ascending/descending sequences, and when nvflag matches the iseq() flag for a sort data segment, the characters in that segment are inverted (subtracted from 255) to facilitate the straightforward string comparisons that are the heart of the routine (see the fillproc subroutine). For reasons that pertain to Basic's internal structure, this inversion method, along with the swapping of bytes in 16-bit integer string, provides better performance than a multistage comparison where each segment is compared in turn, and the comparison terminates early if a difference is found. This advantages applies even when the char$ and midchar techniques are not included.

One place in the code where you will notice some cryptic and mostly undocumented variables is about 85 lines after the beginning line sub nsort...); ixx1, ixx2, and so on. These variables were built from a complex set of sort-group initialization parameters, and they represent the only efficient means I could find to put the intended idea into code. If this had to be rewritten, I'd recommend finding another algorithm and avoiding any modifications here.

The calls to memfree (25, 70, and 72 lines from the beginning) specify a byte exclusion value as the first parameter. For reasons I could not deduce, older versions of Basic seemed to want at least 4000 bytes overhead in the string heap beyond any amount calculated for temporary (virtual) strings and so on in order to prevent a string space corrupt error. Other Basic versions may not have this limitation, but on the other hand, they have the memory to spare, so I prefer to stay with the full exclusion for maximum portability. --D.T.


_FAST SORTING USING LARGE STRING BUFFERS_
by Dale Thorn


[LISTING ONE]
<a name="0150_000b">

'===========================================================================
'NSORT.BAS  Sort/retrieve/index data; ascending/descending; mixed data types
'           By: Dale Thorn
'           Rev. 03/26/91
'===========================================================================
main:

defint a-w
deflng x
defsng y
defdbl z

declare function midchar(i$, i)  'use Basic function (listed) if PDQ not avail.

dim ibeg(10), ilen(10), iptx(100, 1), iseq(10), char$(255)

common shared compln, ddunit, grpptr, grptot, maxrcd, memndx, ndunit
common shared ndxgrp, ndxlen, nosegs, nvflag, offset, opcode, opinit
common shared outptr, outtot, rcdptr, rcdtot, recptr, sdunit, sortln
common shared sortsq, subtot, ibeg(), ilen(), iptx(), iseq(), char$()

compln = 0    'comparison length in sort data (sdat$); may be less than sortln
ddunit = 0    'file channel/unit number for index-building (opcode = -3)
grpptr = 0    'sort group record pointer/sort buffer pointer
grptot = 0    'internal sort group size
maxrcd = 0    'internal maximum sort group size
memndx = 0    'internal index-load flag
ndunit = 0    'file channel/unit number for sort index files
ndxgrp = 0    'internal index file group record counter
ndxlen = 0    'internal index file record size
nosegs = 0    'no. of sort segments in sdat$; total length of segments = compln
nvflag = 0    'internal optimization for least ascending/descending inversions
offset = 0    'internal group-to-record offset counter
opcode = 0    'sort operation (0 to -3)
opinit = 0    'internal sort operation data initialization flag
outptr = 0    'internal data output record pointer
outtot = 0    'internal data output record counter
rcdptr = 0    'internal sort data record counter (all records)
rcdtot = 0    'internal sort data record total (final count)
recptr = 0    'internal sort data record counter (group records)
sdunit = 0    'file channel/unit number for sort data file (.sdx)
sortln = 0    'length of sort data buffer (sdat$); may be greater than compln
sortsq = 0    'internal sort sequence (ascending/descending) flag
subtot = 0    'internal partial group data record total (final count)

drcd$ = ""    'temp. sort data record buffer
nrcd$ = ""    'sort index file buffer
sbuf$ = ""    'main sort group memory buffer
sdat$ = ""    'main sort data record buffer
smsk$ = ""    'sort data mask (must be uppercased) [BBXXXBBXXXXXBB.....]
sndx$ = ""    'sort index-pointer memory buffer

'// NOTE: Any lines below with an asterisk (*) on the extreme /////
'         right will require a modification or replacement.   /////
'///////  Modification applies to DATA statements as well.    /////

sortln = 40                                          'total sort buffer length*
pfmt$ = space$(5)                    'output format buffer for integer strings
sdat$ = space$(sortln)                                'sort data record buffer

restore sortdata1                                'first tablespec to sort from
read sdunit, ndunit, ddunit       'file channel/unit numbers used by NSORT.SUB
read ibeg(0), ilen(0), iseq(0)               'test values from table sortdata1
nosegs = 0                              'initialize total no. of sort segments
while ibeg(0)                   'begin loop to load segment pointers and flags
   nosegs = nosegs + 1                          'increment total sort segments
   ibeg(nosegs) = ibeg(0)              'segment begin pointer for sdat$ buffer
   ilen(nosegs) = ilen(0)                                      'segment length
   iseq(nosegs) = iseq(0)        'segment sort sequence (ascending/descending)
   compln = compln + ilen(0)                        'total sort compare length
   read ibeg(0), ilen(0), iseq(0)                'read next set of test values
wend
smsk$ = string$(compln, "X") 'allocate masking buffer (default type=character)
mid$(smsk$, 21) = "BB"                            '"binary" position specified*
mid$(smsk$, 33) = "BB"                            '"binary" position specified*

restore sortdata2                                      'sample sort data table
opcode = 0                'set flag to add records to sort (initial operation)
nrcds  = 0                                'number of records added to the sort
do                                    'begin loop to read data and add to sort
   segptr = 1                          'set segment position pointer for sdat$
   lset sdat$ = ""                'clear the sort data buffer prior to loading
   for segno = 1 to nosegs               'begin loop to load each data segment
       read segdata$                   'read data segment from table sortdata2
       if len(segdata$) = 0 then exit do   'exit read-data loop at end-of-data
       if midchar(smsk$, segptr) = 66 then        '16-bit integer <BB> segment
          mid$(sdat$, segptr) = mki$(val(segdata$))   'convert data to integer
       else                                        'character <XX....> segment
          mid$(sdat$, segptr) = segdata$ 'put character segment to sort buffer
       end if
       segptr = segptr + ilen(segno)       'increment segment position pointer
   next
   call nsort(drcd$, nrcd$, sbuf$, sdat$, smsk$, sndx$)    'add record to sort
   nrcds  = nrcds + 1                         'total records added to the sort
loop

opcode = -3          'set flag to build an external index to the sortdata file
call nsort(drcd$, nrcd$, sbuf$, sdat$, smsk$, sndx$)     'build the index file

open "sortdata.ddx" for binary as #ddunit        'open the external index file
ddxrcd$ = space$(2)                                 'allocate the index buffer
for rcdno = 1 to nrcds        'begin loop to retrieve and display indexed data
    call fileio(ddunit, 2, clng(rcdno), ddxrcd$, 0)  'retrieve an index record
    call fileio(sdunit, sortln, clng(cvi(ddxrcd$)), sdat$, 0)   'retrieve data
    for segno = 1 to nosegs               'begin loop to display sort segments
        if midchar(smsk$, ibeg(segno)) = 66 then  '16-bit integer <BB> segment
           rset pfmt$ = right$(str$(cvi(mid$(sdat$, ibeg(segno), 2))), 5)
           print pfmt$; " ";                               'print integer data
        else                                                'character segment
           print mid$(sdat$, ibeg(segno), ilen(segno)); " "; 'print char. data
        end if
    next
    print                                                'terminate print line
next
call killfile("sortdata.ddx", ddunit)           'index file closed and removed

restore sortdata3                                 'next tablespec to sort from
read ibeg(0), ilen(0), iseq(0)               'test values from table sortdata3
compln = 0                             'comparison length in sort data (sdat$)
nosegs = 0                              'initialize total no. of sort segments
while ibeg(0)                   'begin loop to load segment pointers and flags
   nosegs = nosegs + 1                          'increment total sort segments
   ibeg(nosegs) = ibeg(0)              'segment begin pointer for sdat$ buffer
   ilen(nosegs) = ilen(0)                                      'segment length
   iseq(nosegs) = iseq(0)        'segment sort sequence (ascending/descending)
   compln = compln + ilen(0)                        'total sort compare length
   read ibeg(0), ilen(0), iseq(0)                'read next set of test values
wend

opcode = -1                   'set flag to resort data from existing sort file
call nsort(drcd$, nrcd$, sbuf$, sdat$, smsk$, sndx$)          'resort the data

opcode = -2          'set flag to retrieve records from sort (final operation)
call nsort(drcd$, nrcd$, sbuf$, sdat$, smsk$, sndx$) 'retrieve 1st data record
while len(sdat$)                              'begin loop to display sort data
   for segno = 1 to nosegs                'begin loop to display sort segments
       if midchar(smsk$, ibeg(segno)) = 66 then   '16-bit integer <BB> segment
          rset pfmt$ = right$(str$(cvi(mid$(sdat$, ibeg(segno), 2))), 5)
          print pfmt$; " ";                                'print integer data
       else                                                 'character segment
          print mid$(sdat$, ibeg(segno), ilen(segno)); " ";  'print char. data
       end if
   next
   print                                                 'terminate print line
   call nsort(drcd$, nrcd$, sbuf$, sdat$, smsk$, sndx$)  'retrieve next record
wend

close                                                         'close all files
system                                                          'return to DOS

'---------------------------------------------------------------------------
sortdata1:                'initial sort specifications
'---------------------------------------------------------------------------

'_____datafile____indexfile____buildfile      :'File channel/unit numbers;
data        1,           2,           3       :'may be found using FREEFILE


'_____segbegin____seglength____segsequence    :'Segment begin pointers, lengths
data        1,          20,             1     :'and sort sequences for sort
data       21,           2,            -1     :'data buffer (sdat$).
data       23,          10,             1     :'  sequence =  1; ascending
data       33,           2,            -1     :'  sequence = -1; descending
data       35,           6,             1     :'
data        0,           0,             0     :'end-of-data markers

'---------------------------------------------------------------------------
sortdata2:                   'example sort data
'---------------------------------------------------------------------------

'_______Alpha data, len=20______Num.(2)______Alpha (10)____Num.(2)____Alpha (6)
data  "Petrol Chemicals Ltd",    "3576",    "London SW3",    "588",   "A23456"
data  "Associated Factories",     "112",    "Richmond",     "1313",   "XNA"
data  "Dale's Containers",      "12343",    "Devonshire",     "55",   "DALE"
data  "",                            "",    "",                 "",   ""

'---------------------------------------------------------------------------
sortdata3:        'specifications for alternate sorting order
'---------------------------------------------------------------------------

'_____segbegin____seglength____segsequence    :'Segment begin pointers, lengths
data       33,           2,             1     :'and sort sequences for sort
data        1,          10,             1     :'data buffer (sdat$).
data        0,           0,             0     :'end-of-data markers

function midchar (i$, i) static   'find ASCII value of a single character in i$
   midchar = asc(mid$(i$, i, 1))                             'set midchar value
end function                                         'return to calling program

rem $include: 'nsort.sub'




<a name="0150_000c">
<a name="0150_000d">
[LISTING TWO]
<a name="0150_000d">


'===========================================================================
'NSORT.SUB  Sort/retrieve/index data; ascending/descending; mixed data types
'           By: Dale Thorn
'           Rev. 03/24/91
'---------------------------------------------------------------------------
' compln - comparison length in sort data (sdat$); may be less than sortln
' ddunit - file channel/unit number for index-building (opcode = -3)
' grpptr - sort group record pointer/sort buffer pointer
' grptot - internal sort group size
' maxrcd - internal maximum sort group size
' memndx - internal index-load flag
' ndunit - file channel/unit number for sort index files
' ndxgrp - internal index file group record counter
' ndxlen - internal index file record size
' nosegs - no. of sort segments in sdat$; total length of segments = compln
' nvflag - internal optimization for least ascending/descending data inversions
' offset - internal group-to-record offset counter
' opcode - sort operation (0 to -3)
' opinit - internal sort operation data initialization flag
' outptr - internal data output record pointer
' outtot - internal data output record counter
' rcdptr - internal sort data record counter (all records)
' rcdtot - internal sort data record total (final count)
' recptr - internal sort data record counter (group records)
' sdunit - file channel/unit number for sort data file (.sdx)
' sortln - length of sort data buffer (sdat$); may be greater than compln
' sortsq - internal sort sequence (ascending/descending) flag
' subtot - internal partial group data record total (final count)
'
' ibeg()  - segment begin pointers for sort data buffer (sdat$)
' ilen()  - segment length pointers for sort data buffer (sdat$)
' iptx()  - pointers used if merge-sort req'd. (set internally)
' iseq()  - segment sequence pointers for sort data buffer (sdat$)
'            1 = ascending;  -1 = descending
' char$() - high-performance substitute for Basic chr$() function
'
' drcd$  - temp. sort data record buffer (set to "" on first call)
' nrcd$  - sort index file buffer (set to "" on first call)
' sbuf$  - main sort group memory buffer (set to "" on first call)
' sdat$  - main sort data record buffer (set to actual value on first call)
' smsk$  - sort data mask (must be uppercased)
'            BB = integer string; XXX.... all other bytes
' sndx$  - sort index-pointer memory buffer (set to "" on first call)
'
'
' set opcode =  0 on first call to add records to sort.
' set opcode = -1 to resort data from existing sort work file (sortdata.sdx).
' set opcode = -2 on first call to retrieve records from sort.
' set opcode = -3 to build index file (sortdata.ddx).
'
' *** Notes:  opcode =  0 is always the first process (add records).
'             opcode = -1 may be set to resort data, but only following
'                         the creation of an index with opcode set to -3.
'             opcode = -2 may be set to retrieve records once all records
'                         have been added with opcode set to 0, or after
'                         a resort with opcode set to -1.  Once opcode is
'                         set to -2 and all records are retrieved, the
'                         sort routine is terminated and all sort memory
'                         is returned to the calling program.  If further
'                         sorting is required, begin anew with opcode = 0.
'             opcode = -3 may be set to build an index file following an
'                         initial sort with opcode set to 0, or a resort
'                         with opcode set to -1.  If more than 2 sorting
'                         sequences are required, where 2 or more index
'                         files are needed, rename each .ddx file to save it.
'                         The final sort sequence may be obtained using
'                         opcode = -2, and thus eliminate the need for a
'                         corresponding index file.  Each 2 bytes in the index
'                         file are a pointer to a record in the .sdx file.
'
'             For the first sort (opcode = 0), place all sort segments of sdat$
'             into the left part of sdat$ in sequential order (1, 2, 3, etc.).
'             When re-sorting using opcode = -1, segments may be in any order.
'             All data stored in sortdata.sdx will be in the original sequence.
'
'             ***** Important:  Minimum sort length is 2 bytes.
'             *****             If free memory is minimal, more sort groups may
'             *****              be needed, and dim iptx(nnn) may be too small.
'             *****             Each opcode process must be completed for all
'             *****              records before switching to another process.
'             *****             Use named common block if chaining programs.
'---------------------------------------------------------------------------
sub nsort (drcd$, nrcd$, sbuf$, sdat$, smsk$, sndx$) static
    if opcode > -2 then                      'insert a record <add to the sort>
       if opinit mod 2 = 0 then               'first-sort-record initialization
          opinit = opinit - 1                       'adjust initialization flag
          sortsq = iseq(1)                             'primary output sequence
          nvflag = 0                                       'data inversion flag
          for segno = 1 to nosegs                    'build data inversion spec
              nvflag = nvflag + ilen(segno) * iseq(segno)  'bytes above/below 0
          next
          if nvflag < 0 then                       'data inversion optimization
             nvflag = 1                                'set inversion flag plus
          else
             nvflag = -1                              'set inversion flag minus
          end if                        '[see fillproc & writeproc subroutines]
          if nvflag = sortsq then sortsq = -sortsq     'primary output sequence
          call killfile("sortdata.ndx", ndunit)           'kill work index file
          open "sortdata.ndx" for binary as #ndunit       'open work index file
          if opcode = 0 then                   'initial (add records) operation
             call killfile("sortdata.sdx", sdunit)         'kill work data file
             open "sortdata.sdx" for binary as #sdunit     'open work data file
             drcd$ = space$(sortln)                 'temporary sort data buffer
             for ichr = 0 to 255               'create substitute character set
                 char$(ichr) = chr$(ichr) 'substitute for Basic chr$() function
             next
          end if
          call memfree(clng(4096), clng(195840), xfree)    'reserve 4 kb memory
          maxrcd = xfree \ (sortln + 4)       'maximum records per memory group
          if maxrcd > 32640 \ sortln then maxrcd = 32640 \ sortln  'buffer size
          sbuf$  = space$(maxrcd * sortln)               'main sort data buffer
          sndx$  = space$(maxrcd * 2 + 2)   'reorderable/shiftable index buffer
          rcdptr = 1                               'used to count total records
          recptr = 1                 'used to count records within a sort group
          grpptr = 1                                       'sort buffer pointer
       end if
       if opcode = -1 then                'resort from existing workfile (.sdx)
          ndxgrp = 0                               'total number of sort groups
          offset = 0                   'internal group-to-record offset counter
          while rcdptr <= rcdtot               'loop until all records are read
             call fileio(sdunit, sortln, clng(rcdptr), sdat$, 0) 'get sort data
             gosub putproc                    'add records in new sort sequence
          wend
       else                                         'original (insert) sequence
          gosub putproc                                    'add records to sort
       end if
    else                                   'retrieve a record or build an index
       offset = 0                               'group-to-record offset counter
       if opinit mod 2 then              'first retrieval record initialization
          opinit = opinit - 1                       'adjust initialization flag
          if opinit = -2 then              'first operation after original sort
             rcdtot = rcdptr - 1              'total records from original sort
             subtot = recptr - 1     'partial-group subtotal from original sort
          end if
          outptr = 1                         'beginning pointer for data output
          outtot = rcdtot                              'total records to output
          if ndxgrp then                            'sorting was done in groups
             gosub writeproc       'save data left over from previous operation
          else                                  'all sorting was done in memory
             maxrcd = rcdtot              'reset maximum records for file write
             ndxlen = maxrcd * 2                 'length of index data to write
             gosub writeproc                                    'save sort data
             ndxgrp = 0                        'reset index group count to zero
          end if
          sbuf$ = ""                            'erase buffer to reclaim memory
          sndx$ = ""                            'erase buffer to reclaim memory
          if ndxgrp then                                   'merge-sort required
             grplen = ndxlen                                    'group size * 2
             sbuf$  = space$(ndxgrp * sortln)  'buffer holds 1 record per group
             sndx$  = space$(ndxgrp * 2 + 2)   'buffer holds 1 record per group
          end if
          if opcode = -3 then                     'build index from sorted data
             call memfree(clng(6144), clng(32640), xfree) 'reserve 2kb for .ddx
          else        'normal retrieval [return each record to calling program]
             call memfree(clng(4096), clng(32640), xfree)  'reserve normal 4 kb
          end if
          xsize  = clng(outtot) * 2                          'total records * 2
          memndx = (xsize <= 32640 and xsize <= xfree)    'index-in-memory flag
          if memndx then               'retrieval index fits entirely in memory
             ndxlen = xsize                 'buffer length is index file length
          else                          'retrieval index does not fit in memory
             ndxlen = 2                 'buffer length is 16-bit integer length
          end if
          nrcd$ = space$(ndxlen)                    'allocate index file buffer
          if memndx then call fileio(ndunit, ndxlen, clng(1), nrcd$, 0)'fill it
          if ndxgrp then                             'merge-sort initialization
             ixx1 = (sortsq > 0)                  'used locally to shorten line
             ixx2 = (sortsq < 0)                  'used locally to shorten line
             ixx3 = (memndx and ixx1)             'used locally to shorten line
             ixx4 = (memndx and ixx2)             'used locally to shorten line
             iyy1 = 1 - memndx                    'used locally to shorten line
             iyy2 = grplen \ (1 - not memndx)     'used locally to shorten line
             for recptr = 1 to ndxgrp               'loop thru each index group
                 grpptr = recptr                     'sort group record pointer
                 iyy3   = (grptot - subtot) * (ixx2 and (recptr = ndxgrp))
                 iyy4   = (grptot - subtot) * (ixx1 and (recptr = ndxgrp))
                 ircd   = (recptr + ixx1) * iyy2 + iyy3 * iyy1 + ixx4 - ixx1
                 ircx   = (recptr + ixx2) * iyy2 + iyy4 * iyy1 + ixx3 - ixx2
                 if memndx then           'get index pointer from memory buffer
                    ichr   = midchar(nrcd$, ircd + 1) * 256 'high byte of index
                    rcdptr = midchar(nrcd$, ircd) + ichr  'same as cvi(mid$(...
                 else                              'get index pointer from file
                    call fileio(ndunit, ndxlen, clng(ircd), nrcd$, 0)
                    rcdptr = cvi(nrcd$)           'set pointer to retrieve data
                 end if
                 call fileio(sdunit, sortln, clng(rcdptr), sdat$, 0)  'get data
                 gosub fillproc    'add 1 record from each sort group to buffer
                 iptx(recptr, 0) = ircd  'begin ptr.to load ndx.rcd. from group
                 iptx(recptr, 1) = ircx    'end ptr.to load ndx.rcd. from group
             next
             recptr = ndxgrp              'reset groups-pointer to begin output
             if sortsq < 0 then outptr = recptr  'begin output in reverse order
          else                               'non-merge; all output from memory
             if sortsq < 0 then outptr = outtot  'begin output in reverse order
          end if
       end if
       if opcode = -3 then                        'build index from sorted data
          call killfile("sortdata.ddx", ddunit)           'kill user index file
          open "sortdata.ddx" for binary as #ddunit       'open user index file
          ddxrcd$ = space$(2048)             'collection buffer for index-build
          filptr  = 0           'record pointer for writing .ddx buffer to file
          ddxptr  = 1        'buffer pointer for adding index values to ddxrcd$
          gosub getproc                                 'get first index record
          while not closed       'retrieve index pointers and save to .ddx file
             mid$(ddxrcd$, ddxptr) = mki$(rcdptr)    'copy index to .ddx buffer
             ddxptr = ddxptr + 2                      'increment buffer pointer
             if ddxptr > 2048 then               'write a group of data to file
                filptr = filptr + 1                     'increment file pointer
                call fileio(ddunit, 2048, clng(filptr), ddxrcd$, -1)  'put data
                ddxptr = 1         'reset buffer pointer to beginning of buffer
             end if
             gosub getproc                              'get next index records
          wend
          if ddxptr > 1 then                      'save leftover index pointers
             call fileio(ddunit, 2048, clng(filptr + 1), ddxrcd$, -1) 'put data
          end if
          close #ddunit                                    'close the .ddx file
          ddxrcd$ = ""                         'reclaim memory from .ddx buffer
       else        'retrieve a single sort record and return to calling program
          gosub getproc                                   'get a record pointer
          if not closed then    'retrieval OK as long as more records available
             call fileio(sdunit, sortln, clng(rcdptr), sdat$, 0) 'retrieve data
          end if
       end if
       if closed then                                'retrieval/index completed
          if opcode = -2 then         'final (single-record retrieval) sequence
             call killfile("sortdata.ndx", ndunit)    'kill sort index workfile
             call killfile("sortdata.sdx", sdunit)         'kill sort data file
             sdat$ = ""                                  'kill sort data buffer
          end if
          nrcd$ = ""                                    'kill index file buffer
          sbuf$ = ""                               'kill main sort group buffer
          sndx$ = ""                                    'kill sort index buffer
       end if
    end if
    exit sub                                         'return to calling program
    '----------------------------------------------------------------------
    fillproc:          'put sort data into sbuf$, sndx$
    '----------------------------------------------------------------------
    if opcode = 0 then lset drcd$ = sdat$            'load all segments at once
    iptr = 1                                    'initialize work buffer pointer
    for segno = 1 to nosegs    'load segments into work buffer and/or do invert
        if midchar(smsk$, ibeg(segno)) = 66 then 'invert 16-bit integer strings
           ichr = midchar(sdat$, ibeg(segno))       'save first byte, then swap
           mid$(drcd$, iptr) = char$(midchar(sdat$, ibeg(segno) + 1)) '2nd byte
           mid$(drcd$, iptr + 1) = char$(ichr)    'put 1st byte in 2nd position
        else                              'non-integer (character) sort segment
           if opcode then       'segments not in original (contiguous) sequence
              mid$(drcd$, iptr) = mid$(sdat$, ibeg(segno), ilen(segno))
           end if           'insert each sort segment into temp. buffer [above]
        end if
        if iseq(segno) = nvflag then   'invert data for ascend/descend sequence
           for ichr = iptr to iptr + ilen(segno) - 1   'do each byte in segment
               mid$(drcd$, ichr) = char$(255 - midchar(drcd$, ichr))
           next                'data will be re-inverted before writing to file
        end if
        iptr = iptr + ilen(segno)        'increment work buffer segment pointer
    next                          'begin binary search for sort compare [below]
    topptr = recptr                               'set top end of binary search
    lowptr = 0                                    'set low end of binary search
    while topptr - lowptr > 1  'search work data buffer using work index buffer
       midptr = lowptr + (topptr - lowptr) \ 2       'set mid point for compare
       ichx   = midptr * 2       'mid-position incorporating 16-bit index width
       ichr   = midchar(sndx$, ichx) * 256            'same as cvi(mid$(.....))
       iptr   = (midchar(sndx$, ichx - 1) + ichr - offset - 1) * sortln   'mid-
       if left$(drcd$, compln) <= mid$(sbuf$, iptr + 1, compln) then '-buff.pos
          topptr = midptr                                    'move search lower
       else            'sort record value > compare value in sort memory buffer
          lowptr = midptr                                   'move search higher
       end if
    wend
    iptr = topptr * 2 - 1                'current index-"stack" insert position
    mid$(sbuf$, (grpptr - 1) * sortln + 1) = drcd$   'write sort data to buffer
    mid$(sndx$, iptr + 2) = mid$(sndx$, iptr, (recptr - topptr) * 2) 'shift ndx
    mid$(sndx$, iptr) = mki$(grpptr + offset)   'write current pointer to index
    return                                           'return to calling routine
    '-----------------------------------------------------------------------
    getproc:            'retrieve a record from the sort
    '-----------------------------------------------------------------------
    if ndxgrp then                            'merge-retrieval from sort groups
       if recptr then                         'sort records are still available
          ichr   = outptr * 2         'mid-position based on 16-bit index width
          grpptr = midchar(sndx$, ichr - 1) + midchar(sndx$, ichr) * 256
          if memndx then             'get group pointer from work index [above]
             ichr   = midchar(nrcd$, iptx(grpptr, 0) + 1) * 256 'get record ptr
             rcdptr = midchar(nrcd$, iptx(grpptr, 0)) + ichr 'from memory-index
          else                              'get record pointer from index file
             call fileio(ndunit, ndxlen, clng(iptx(grpptr, 0)), nrcd$, 0)
             rcdptr = cvi(nrcd$)              'nrcd$ is a 16-bit integer record
          end if
          if sortsq > 0 then mid$(sndx$, 1) = mid$(sndx$, 3)  'shift work index
          if iptx(grpptr, 0) = iptx(grpptr, 1) then       'end of group reached
             recptr = recptr - 1                 'decrement group stack pointer
             if sortsq < 0 then outptr = recptr    'set output pointer if appl.
          else                                    'end of group not yet reached
             iptx(grpptr, 0) = iptx(grpptr, 0) + (1 - memndx) * sortsq'move ptr
             if memndx then      'get a data record using a pointer from memory
                ichr = midchar(nrcd$, iptx(grpptr, 0))  'get the record pointer
                ichx = midchar(nrcd$, iptx(grpptr, 0) + 1) * 256 '..from memory
                call fileio(sdunit, sortln, clng(ichr + ichx), sdat$, 0)
             else        'get a data record using a pointer from the index file
                call fileio(ndunit, ndxlen, clng(iptx(grpptr, 0)), nrcd$, 0)
                call fileio(sdunit, sortln, clng(cvi(nrcd$)), sdat$, 0)
             end if
             gosub fillproc              'add the data record to the merge-sort
          end if
          closed = 0                              'retrieval process not closed
       else                                          'no more records available
          closed = not 0                              'retrieval process closed
       end if
    else                       'non-merge sort retrieval; all data is in memory
       if outtot then                         'sort records are still available
          ichr   = outptr * 2         'mid-position based on 16-bit index width
          rcdptr = midchar(nrcd$, ichr - 1) + midchar(nrcd$, ichr) * 256
          outptr = outptr + sortsq        'increment or decrement index pointer
          outtot = outtot - 1                      'decrement remaining records
          closed = 0                              'retrieval process not closed
       else                                          'no more records available
          closed = not 0                              'retrieval process closed
       end if
    end if
    return                                           'return to calling routine
    '----------------------------------------------------------------------
    putproc:               'add a record to the sort
    '----------------------------------------------------------------------
    if recptr > maxrcd then                  'too many records to fit in memory
       if ndxgrp = 0 then        'first group; initialize index group variables
          grptot = recptr - 1                      'number of records per group
          ndxlen = grptot * 2                        'size of index file buffer
       end if
       gosub writeproc                         'save data group and index group
       offset = rcdptr - 1                      'group-to-record offset counter
       recptr = 1                                   'reset group record counter
       grpptr = 1                                          'sort buffer pointer
    end if
    gosub fillproc                                  'add current record to sort
    rcdptr = rcdptr + 1                        'increment total records counter
    recptr = recptr + 1                         'increment group record counter
    grpptr = recptr                                        'sort buffer pointer
    return                                           'return to calling routine
    '-----------------------------------------------------------------------
    writeproc:        'write index and sort data to files
    '-----------------------------------------------------------------------
    ndxgrp = ndxgrp + 1                       'increment the index group number
    call fileio(ndunit, ndxlen, clng(ndxgrp), left$(sndx$, ndxlen), -1)
    if opinit > -3 then         'initial sequences; save sort data to .sdx file
       for iptr = 0 to (maxrcd - 1) * sortln step sortln  'loop thru mem.buffer
           for segno = 1 to nosegs               're-invert data as appropriate
               iptz = iptr + ibeg(segno)      'sort group memory buffer pointer
               if midchar(smsk$, ibeg(segno)) = 66 then  'invert integer string
                  ichr = midchar(sbuf$, iptz)       'save first byte, then swap
                  mid$(sbuf$, iptz) = char$(midchar(sbuf$, iptz + 1)) '2nd byte
                  mid$(sbuf$, iptz + 1) = char$(ichr) 'put 1st byte in 2nd pos.
               end if
               if iseq(segno) = nvflag then 'invert data for ascend/descend seq
                  for ichr = iptz to iptz + ilen(segno) - 1   'invert each byte
                      mid$(sbuf$, ichr) = char$(255 - midchar(sbuf$, ichr))
                  next
               end if
           next
       next
       sdxlen = maxrcd * sortln                    'size of group memory buffer
       xflptr = lof(sdunit) \ sdxlen + 1                 'current data "record"
       call fileio(sdunit, sdxlen, xflptr, sbuf$, -1)   'put data group to file
    end if
    return
end sub                                              'return to calling program

sub fileio (fcno, flen, xrec, fbuf$, fopr) static         'read/write file data
   'int fcno                                             'file unit/channel no.
   'int flen                         '"record" length used for positioning only
   'int fopr                                           '0 = read; non-0 = write
   'long xrec                                          'logical "record" number
   'char fbuf$                                          'read/write data buffer
    xpos = (xrec - 1) * flen + 1                'absolute byte position in file
    if fopr then                                             'operation = write
       put #fcno, xpos, fbuf$                               'write data to file
    else                                                      'operation = read
       get #fcno, xpos, fbuf$                              'read data from file
    end if
end sub                                              'return to calling program

sub killfile (ffil$, fcno) static                              'kill a DOS file
   'int fcno                                             'file unit/channel no.
   'char ffil$                                                       'file name
    close #fcno                                             'close file if open
    open ffil$ for binary as #fcno                    'open file in binary mode
    close #fcno                                                 'close the file
    kill ffil$                                                   'kill the file
end sub                                              'return to calling program

sub memfree (xexc, xmax, xfree) static     'get max. free memory less exclusion
   'long xexc                              'amount of memory to reserve/exclude
   'long xmax                                  'upper limit for xfree (or zero)
    xfree = fre("") - xexc                    'total free memory less exclusion
    if xmax > 0 and xfree > xmax then xfree = xmax   'set maximum if applicable
end sub                                              'return to calling program


Copyright © 1991, Dr. Dobb's Journal


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.