Dale is a programmer at AGC Corp., where he specializes in optimization and portability of Basic programs. He can be reached at 1001 AGC Drive, Cleveland, TN 37312.
Basic has a reputation for making it possible to get small programs up and running quickly. But on the other hand, the language has gained a reputation for limited functionality, particularly when it comes to pointers and large-scale memory management.
Most of what needs to be done regarding pointer types can be accomplished with ordinary 16- and 32-bit integers; when formal identification is required, variable names such as ptr.databuffer or ptr.indexbuffer will suffice.
As for memory management, the sorting routine described in this article provides additional functionality by using large, single-string buffers and integer variables as pointers to the buffers. The design objective of this sort routine was to read data from files or file keys, send it one record at a time to the sort, and begin retrieval when the time between the first record sent and the first record retrieved is reduced to the absolute minimum. The current version of the routine will accommodate more than 32,000 records; you can modify the routine to handle several million records by creating index buffers with a segment length of three instead of two.
This version is compatible with Microsoft Basic 4.0 (and up) where the binary file mode provides the advantage of not closing and reopening files when writing to them with different lengths. Porting to Basic implementations compatible with Basic 4.0 and up shouldn't be a problem; moving to older dialects will likely involve changes. (For instance, you may have to change deflng x to defsng x if long integers are not available, then change the affected clng, mod, and \ statements accordingly; use gotos without changing the visible structure of the code when block if/then/else isn't available; and make the main buffers [sbuf$, sndx$] the field variables in corresponding random file opens if binary file mode is not available.) Porting to other languages (especially C) may involve more work, but I've written the code in a format designed to minimize porting problems. For example, I use nonstring operations (such as memcpy) and buffers and other entities that allow maximum flexibility in moving memory blocks.
Sorting String Data
Most Basic programs that sort string data use string arrays to hold the data to be sorted. While individual array elements can be swapped quickly as the result of a comparison, other problems (multilevel sorting, array element assignment, writing arrays to disk, and so on) tend to slow the process to the point where assembler routines are invariably required for performance. The sort routine in Listing Two (page 95) takes a somewhat radical approach to the problem by combining several techniques into one.
The first step in building this routine was to create a large string buffer to hold all the sort data, where every n character is a data element including blank-space padding, then insert each data element in sequence after first shifting any greater values upward in the buffer with a mid$ command: mid$ (buffer$, x + n) = mid$(buffer$, x). This technique has two problems that led to the addition of an indexing buffer in the current routine. First, the mid$ command requires swap space in string memory equal to the size of the shift on the right side of the equal sign, which could double the memory requirement or slow the process down by having to shift in segments. Secondly, the time required to shift the buffer for each insertion was prohibitive.
My solution was to write each data element into the main buffer (sbuf$) in sequential order while performing the aforementioned insert on the associated index buffer (sndx$). The Basic stringstack technique used here can best be described by analogy: Imagine a library with three trays of index cards (ordered by author, title, and subject) where each card contains the data and an exact physical location for the book. Now remove all data from the cards except the physical location pointer. This makes for a very compact index (in terms of the size of index data stored). The major disadvantage is that to find a specific title, a library patron will have to perform a binary search on an index tray and walk over to the shelves for each index card examined, just to make a comparison. Then the patron will have to determine whether to move up or down the index stack before making another trip to the bookshelves. Although impractical for library patrons, this approach works very well on computers because computers get the data record directly from the index pointer.
The third major technique used here is the group merge, where memory is not sufficient to sort all data in one pass and each group is dumped to disk files until the sort process is completed. The sort routine then loads one record from each group and outputs the lowest (highest if descending order) of the batch.
String Variables
Probably the biggest time-killer in most Basic programs is the creation of string variables, both real and virtual. Creation of real strings cannot be avoided, but when a string is recreated to the same length over and over instead of just being cleared, it wastes time and forces more garbage collection. The virtual strings, as I call them, do not waste as much time as the assigned strings because they disappear after the current execution statement and do not work their way down into the string heap. Mid$ commands, str$, chr$, and so forth, all create virtual strings; when the number of incidents is very large, we have, in certain cases, an opportunity to save some time.
The first technique is to create a substitute for the Basic function chr$. The char$ array in Listing One (page 94) requires about 800 bytes of memory, and once created (see Listing Two), a call to char$ should be about four times faster than a call to chr$.
The second technique is not available to current versions of Basic, but because it is available in a relatively inexpensive replacement library (see "Declarations for the Sample Program"), I think it worthwhile to describe here. The midchar function gets the ASCII value of any character in a string without having Basic create the single mid$ character first, and does it about five times faster if linking with Crescent Software's PDQ replacement library.
The third technique used in this sort is based on the second, and converts 16-bit integer values from string to numeric - cvi(mid$(xxx$, midpos, 2))--without having Basic create the mid$ characters; get the midchar of the first byte and add the midchar of the second byte after multiplying it by 256. This third technique is approximately three times faster than the cvi(mid$(...function. For maximum portability, I keep a set of these replaceable routines handy to append to Basic programs when compiling in situations where PDQ and other proprietary add-ons are not available or applicable.
Declarations for the Sample Sort Program
The sample program in Listing One (which call the sort in Listing Two) has four main functions: send data to sort, create external index (optional), resort from workfile (optional), and retrieve data from sort. From the top of the listing, you will note the midchar declaration. The mid$ call it replaces could be ignored in the code without requiring a separate function, but when the function (see the end of the listing) is eliminated and the program is compiled with Crescent Software's PDQ library, the speed increase is substantial, so I keep in this format for maximum portability.
The next bit of code is the DIM line; the subscripts equal to 10 may be raised or lowered if desired, and the subscript of 100 may have to be increased if the number of sort groups could be greater than 100. As an example, if the length of each sort string (sdat$) were 100 bytes, and the maximum sort buffer (sbuf$) size were 30,000, only 300 records could be held in memory in a single sort group. If the total number of records to be sorted were more than 30,000, more than 100 groups would be required, and the aforementioned subscript would have to be increased, although this would be an extreme case. The subscript of 255 for char$ cannot be lessened because it is a replacement for the Basic chr$ function, and requires all 256 characters of the ASCII collating set.
Next, I used common shared to pass a group of variables to the main sorting routine (Listing Two), but you will want to employ a named common block in real applications so that the variables are not passed when chaining. Most of the variable declarations below the common lines are required. The same list, along with a few others, can be found at the top of Listing Two. The three lines marked with an asterisk at the right side are the only lines that required the programmer's attention, the sort-string length, and the integer masking.
Two of the data sections (sortdata1 and sortdata3) are examples only and are usually provided to the executing program by configuration files or by a data dictionary. The data in sortdata2 is actual sort data and will generally come from database files and the like. The remaining information for configuring a sort can be found at the top of Listing Two.
--D.T.
Technical Details of the Sort Subprogram
The sortsq variable in Listing Two indicates the order of output (ascending/ descending), and resets the output pointer on each retrieval. nvflag is used to minimize the character inversions for ascending/descending sequences, and when nvflag matches the iseq() flag for a sort data segment, the characters in that segment are inverted (subtracted from 255) to facilitate the straightforward string comparisons that are the heart of the routine (see the fillproc subroutine). For reasons that pertain to Basic's internal structure, this inversion method, along with the swapping of bytes in 16-bit integer string, provides better performance than a multistage comparison where each segment is compared in turn, and the comparison terminates early if a difference is found. This advantages applies even when the char$ and midchar techniques are not included.
One place in the code where you will notice some cryptic and mostly undocumented variables is about 85 lines after the beginning line sub nsort...); ixx1, ixx2, and so on. These variables were built from a complex set of sort-group initialization parameters, and they represent the only efficient means I could find to put the intended idea into code. If this had to be rewritten, I'd recommend finding another algorithm and avoiding any modifications here.
The calls to memfree (25, 70, and 72 lines from the beginning) specify a byte exclusion value as the first parameter. For reasons I could not deduce, older versions of Basic seemed to want at least 4000 bytes overhead in the string heap beyond any amount calculated for temporary (virtual) strings and so on in order to prevent a string space corrupt error. Other Basic versions may not have this limitation, but on the other hand, they have the memory to spare, so I prefer to stay with the full exclusion for maximum portability. --D.T.
_FAST SORTING USING LARGE STRING BUFFERS_ by Dale Thorn[LISTING ONE]
<a name="0150_000b"> '=========================================================================== 'NSORT.BAS Sort/retrieve/index data; ascending/descending; mixed data types ' By: Dale Thorn ' Rev. 03/26/91 '=========================================================================== main: defint a-w deflng x defsng y defdbl z declare function midchar(i$, i) 'use Basic function (listed) if PDQ not avail. dim ibeg(10), ilen(10), iptx(100, 1), iseq(10), char$(255) common shared compln, ddunit, grpptr, grptot, maxrcd, memndx, ndunit common shared ndxgrp, ndxlen, nosegs, nvflag, offset, opcode, opinit common shared outptr, outtot, rcdptr, rcdtot, recptr, sdunit, sortln common shared sortsq, subtot, ibeg(), ilen(), iptx(), iseq(), char$() compln = 0 'comparison length in sort data (sdat$); may be less than sortln ddunit = 0 'file channel/unit number for index-building (opcode = -3) grpptr = 0 'sort group record pointer/sort buffer pointer grptot = 0 'internal sort group size maxrcd = 0 'internal maximum sort group size memndx = 0 'internal index-load flag ndunit = 0 'file channel/unit number for sort index files ndxgrp = 0 'internal index file group record counter ndxlen = 0 'internal index file record size nosegs = 0 'no. of sort segments in sdat$; total length of segments = compln nvflag = 0 'internal optimization for least ascending/descending inversions offset = 0 'internal group-to-record offset counter opcode = 0 'sort operation (0 to -3) opinit = 0 'internal sort operation data initialization flag outptr = 0 'internal data output record pointer outtot = 0 'internal data output record counter rcdptr = 0 'internal sort data record counter (all records) rcdtot = 0 'internal sort data record total (final count) recptr = 0 'internal sort data record counter (group records) sdunit = 0 'file channel/unit number for sort data file (.sdx) sortln = 0 'length of sort data buffer (sdat$); may be greater than compln sortsq = 0 'internal sort sequence (ascending/descending) flag subtot = 0 'internal partial group data record total (final count) drcd$ = "" 'temp. sort data record buffer nrcd$ = "" 'sort index file buffer sbuf$ = "" 'main sort group memory buffer sdat$ = "" 'main sort data record buffer smsk$ = "" 'sort data mask (must be uppercased) [BBXXXBBXXXXXBB.....] sndx$ = "" 'sort index-pointer memory buffer '// NOTE: Any lines below with an asterisk (*) on the extreme ///// ' right will require a modification or replacement. ///// '/////// Modification applies to DATA statements as well. ///// sortln = 40 'total sort buffer length* pfmt$ = space$(5) 'output format buffer for integer strings sdat$ = space$(sortln) 'sort data record buffer restore sortdata1 'first tablespec to sort from read sdunit, ndunit, ddunit 'file channel/unit numbers used by NSORT.SUB read ibeg(0), ilen(0), iseq(0) 'test values from table sortdata1 nosegs = 0 'initialize total no. of sort segments while ibeg(0) 'begin loop to load segment pointers and flags nosegs = nosegs + 1 'increment total sort segments ibeg(nosegs) = ibeg(0) 'segment begin pointer for sdat$ buffer ilen(nosegs) = ilen(0) 'segment length iseq(nosegs) = iseq(0) 'segment sort sequence (ascending/descending) compln = compln + ilen(0) 'total sort compare length read ibeg(0), ilen(0), iseq(0) 'read next set of test values wend smsk$ = string$(compln, "X") 'allocate masking buffer (default type=character) mid$(smsk$, 21) = "BB" '"binary" position specified* mid$(smsk$, 33) = "BB" '"binary" position specified* restore sortdata2 'sample sort data table opcode = 0 'set flag to add records to sort (initial operation) nrcds = 0 'number of records added to the sort do 'begin loop to read data and add to sort segptr = 1 'set segment position pointer for sdat$ lset sdat$ = "" 'clear the sort data buffer prior to loading for segno = 1 to nosegs 'begin loop to load each data segment read segdata$ 'read data segment from table sortdata2 if len(segdata$) = 0 then exit do 'exit read-data loop at end-of-data if midchar(smsk$, segptr) = 66 then '16-bit integer <BB> segment mid$(sdat$, segptr) = mki$(val(segdata$)) 'convert data to integer else 'character <XX....> segment mid$(sdat$, segptr) = segdata$ 'put character segment to sort buffer end if segptr = segptr + ilen(segno) 'increment segment position pointer next call nsort(drcd$, nrcd$, sbuf$, sdat$, smsk$, sndx$) 'add record to sort nrcds = nrcds + 1 'total records added to the sort loop opcode = -3 'set flag to build an external index to the sortdata file call nsort(drcd$, nrcd$, sbuf$, sdat$, smsk$, sndx$) 'build the index file open "sortdata.ddx" for binary as #ddunit 'open the external index file ddxrcd$ = space$(2) 'allocate the index buffer for rcdno = 1 to nrcds 'begin loop to retrieve and display indexed data call fileio(ddunit, 2, clng(rcdno), ddxrcd$, 0) 'retrieve an index record call fileio(sdunit, sortln, clng(cvi(ddxrcd$)), sdat$, 0) 'retrieve data for segno = 1 to nosegs 'begin loop to display sort segments if midchar(smsk$, ibeg(segno)) = 66 then '16-bit integer <BB> segment rset pfmt$ = right$(str$(cvi(mid$(sdat$, ibeg(segno), 2))), 5) print pfmt$; " "; 'print integer data else 'character segment print mid$(sdat$, ibeg(segno), ilen(segno)); " "; 'print char. data end if next print 'terminate print line next call killfile("sortdata.ddx", ddunit) 'index file closed and removed restore sortdata3 'next tablespec to sort from read ibeg(0), ilen(0), iseq(0) 'test values from table sortdata3 compln = 0 'comparison length in sort data (sdat$) nosegs = 0 'initialize total no. of sort segments while ibeg(0) 'begin loop to load segment pointers and flags nosegs = nosegs + 1 'increment total sort segments ibeg(nosegs) = ibeg(0) 'segment begin pointer for sdat$ buffer ilen(nosegs) = ilen(0) 'segment length iseq(nosegs) = iseq(0) 'segment sort sequence (ascending/descending) compln = compln + ilen(0) 'total sort compare length read ibeg(0), ilen(0), iseq(0) 'read next set of test values wend opcode = -1 'set flag to resort data from existing sort file call nsort(drcd$, nrcd$, sbuf$, sdat$, smsk$, sndx$) 'resort the data opcode = -2 'set flag to retrieve records from sort (final operation) call nsort(drcd$, nrcd$, sbuf$, sdat$, smsk$, sndx$) 'retrieve 1st data record while len(sdat$) 'begin loop to display sort data for segno = 1 to nosegs 'begin loop to display sort segments if midchar(smsk$, ibeg(segno)) = 66 then '16-bit integer <BB> segment rset pfmt$ = right$(str$(cvi(mid$(sdat$, ibeg(segno), 2))), 5) print pfmt$; " "; 'print integer data else 'character segment print mid$(sdat$, ibeg(segno), ilen(segno)); " "; 'print char. data end if next print 'terminate print line call nsort(drcd$, nrcd$, sbuf$, sdat$, smsk$, sndx$) 'retrieve next record wend close 'close all files system 'return to DOS '--------------------------------------------------------------------------- sortdata1: 'initial sort specifications '--------------------------------------------------------------------------- '_____datafile____indexfile____buildfile :'File channel/unit numbers; data 1, 2, 3 :'may be found using FREEFILE '_____segbegin____seglength____segsequence :'Segment begin pointers, lengths data 1, 20, 1 :'and sort sequences for sort data 21, 2, -1 :'data buffer (sdat$). data 23, 10, 1 :' sequence = 1; ascending data 33, 2, -1 :' sequence = -1; descending data 35, 6, 1 :' data 0, 0, 0 :'end-of-data markers '--------------------------------------------------------------------------- sortdata2: 'example sort data '--------------------------------------------------------------------------- '_______Alpha data, len=20______Num.(2)______Alpha (10)____Num.(2)____Alpha (6) data "Petrol Chemicals Ltd", "3576", "London SW3", "588", "A23456" data "Associated Factories", "112", "Richmond", "1313", "XNA" data "Dale's Containers", "12343", "Devonshire", "55", "DALE" data "", "", "", "", "" '--------------------------------------------------------------------------- sortdata3: 'specifications for alternate sorting order '--------------------------------------------------------------------------- '_____segbegin____seglength____segsequence :'Segment begin pointers, lengths data 33, 2, 1 :'and sort sequences for sort data 1, 10, 1 :'data buffer (sdat$). data 0, 0, 0 :'end-of-data markers function midchar (i$, i) static 'find ASCII value of a single character in i$ midchar = asc(mid$(i$, i, 1)) 'set midchar value end function 'return to calling program rem $include: 'nsort.sub' <a name="0150_000c"> <a name="0150_000d">[LISTING TWO]
<a name="0150_000d"> '=========================================================================== 'NSORT.SUB Sort/retrieve/index data; ascending/descending; mixed data types ' By: Dale Thorn ' Rev. 03/24/91 '--------------------------------------------------------------------------- ' compln - comparison length in sort data (sdat$); may be less than sortln ' ddunit - file channel/unit number for index-building (opcode = -3) ' grpptr - sort group record pointer/sort buffer pointer ' grptot - internal sort group size ' maxrcd - internal maximum sort group size ' memndx - internal index-load flag ' ndunit - file channel/unit number for sort index files ' ndxgrp - internal index file group record counter ' ndxlen - internal index file record size ' nosegs - no. of sort segments in sdat$; total length of segments = compln ' nvflag - internal optimization for least ascending/descending data inversions ' offset - internal group-to-record offset counter ' opcode - sort operation (0 to -3) ' opinit - internal sort operation data initialization flag ' outptr - internal data output record pointer ' outtot - internal data output record counter ' rcdptr - internal sort data record counter (all records) ' rcdtot - internal sort data record total (final count) ' recptr - internal sort data record counter (group records) ' sdunit - file channel/unit number for sort data file (.sdx) ' sortln - length of sort data buffer (sdat$); may be greater than compln ' sortsq - internal sort sequence (ascending/descending) flag ' subtot - internal partial group data record total (final count) ' ' ibeg() - segment begin pointers for sort data buffer (sdat$) ' ilen() - segment length pointers for sort data buffer (sdat$) ' iptx() - pointers used if merge-sort req'd. (set internally) ' iseq() - segment sequence pointers for sort data buffer (sdat$) ' 1 = ascending; -1 = descending ' char$() - high-performance substitute for Basic chr$() function ' ' drcd$ - temp. sort data record buffer (set to "" on first call) ' nrcd$ - sort index file buffer (set to "" on first call) ' sbuf$ - main sort group memory buffer (set to "" on first call) ' sdat$ - main sort data record buffer (set to actual value on first call) ' smsk$ - sort data mask (must be uppercased) ' BB = integer string; XXX.... all other bytes ' sndx$ - sort index-pointer memory buffer (set to "" on first call) ' ' ' set opcode = 0 on first call to add records to sort. ' set opcode = -1 to resort data from existing sort work file (sortdata.sdx). ' set opcode = -2 on first call to retrieve records from sort. ' set opcode = -3 to build index file (sortdata.ddx). ' ' *** Notes: opcode = 0 is always the first process (add records). ' opcode = -1 may be set to resort data, but only following ' the creation of an index with opcode set to -3. ' opcode = -2 may be set to retrieve records once all records ' have been added with opcode set to 0, or after ' a resort with opcode set to -1. Once opcode is ' set to -2 and all records are retrieved, the ' sort routine is terminated and all sort memory ' is returned to the calling program. If further ' sorting is required, begin anew with opcode = 0. ' opcode = -3 may be set to build an index file following an ' initial sort with opcode set to 0, or a resort ' with opcode set to -1. If more than 2 sorting ' sequences are required, where 2 or more index ' files are needed, rename each .ddx file to save it. ' The final sort sequence may be obtained using ' opcode = -2, and thus eliminate the need for a ' corresponding index file. Each 2 bytes in the index ' file are a pointer to a record in the .sdx file. ' ' For the first sort (opcode = 0), place all sort segments of sdat$ ' into the left part of sdat$ in sequential order (1, 2, 3, etc.). ' When re-sorting using opcode = -1, segments may be in any order. ' All data stored in sortdata.sdx will be in the original sequence. ' ' ***** Important: Minimum sort length is 2 bytes. ' ***** If free memory is minimal, more sort groups may ' ***** be needed, and dim iptx(nnn) may be too small. ' ***** Each opcode process must be completed for all ' ***** records before switching to another process. ' ***** Use named common block if chaining programs. '--------------------------------------------------------------------------- sub nsort (drcd$, nrcd$, sbuf$, sdat$, smsk$, sndx$) static if opcode > -2 then 'insert a record <add to the sort> if opinit mod 2 = 0 then 'first-sort-record initialization opinit = opinit - 1 'adjust initialization flag sortsq = iseq(1) 'primary output sequence nvflag = 0 'data inversion flag for segno = 1 to nosegs 'build data inversion spec nvflag = nvflag + ilen(segno) * iseq(segno) 'bytes above/below 0 next if nvflag < 0 then 'data inversion optimization nvflag = 1 'set inversion flag plus else nvflag = -1 'set inversion flag minus end if '[see fillproc & writeproc subroutines] if nvflag = sortsq then sortsq = -sortsq 'primary output sequence call killfile("sortdata.ndx", ndunit) 'kill work index file open "sortdata.ndx" for binary as #ndunit 'open work index file if opcode = 0 then 'initial (add records) operation call killfile("sortdata.sdx", sdunit) 'kill work data file open "sortdata.sdx" for binary as #sdunit 'open work data file drcd$ = space$(sortln) 'temporary sort data buffer for ichr = 0 to 255 'create substitute character set char$(ichr) = chr$(ichr) 'substitute for Basic chr$() function next end if call memfree(clng(4096), clng(195840), xfree) 'reserve 4 kb memory maxrcd = xfree \ (sortln + 4) 'maximum records per memory group if maxrcd > 32640 \ sortln then maxrcd = 32640 \ sortln 'buffer size sbuf$ = space$(maxrcd * sortln) 'main sort data buffer sndx$ = space$(maxrcd * 2 + 2) 'reorderable/shiftable index buffer rcdptr = 1 'used to count total records recptr = 1 'used to count records within a sort group grpptr = 1 'sort buffer pointer end if if opcode = -1 then 'resort from existing workfile (.sdx) ndxgrp = 0 'total number of sort groups offset = 0 'internal group-to-record offset counter while rcdptr <= rcdtot 'loop until all records are read call fileio(sdunit, sortln, clng(rcdptr), sdat$, 0) 'get sort data gosub putproc 'add records in new sort sequence wend else 'original (insert) sequence gosub putproc 'add records to sort end if else 'retrieve a record or build an index offset = 0 'group-to-record offset counter if opinit mod 2 then 'first retrieval record initialization opinit = opinit - 1 'adjust initialization flag if opinit = -2 then 'first operation after original sort rcdtot = rcdptr - 1 'total records from original sort subtot = recptr - 1 'partial-group subtotal from original sort end if outptr = 1 'beginning pointer for data output outtot = rcdtot 'total records to output if ndxgrp then 'sorting was done in groups gosub writeproc 'save data left over from previous operation else 'all sorting was done in memory maxrcd = rcdtot 'reset maximum records for file write ndxlen = maxrcd * 2 'length of index data to write gosub writeproc 'save sort data ndxgrp = 0 'reset index group count to zero end if sbuf$ = "" 'erase buffer to reclaim memory sndx$ = "" 'erase buffer to reclaim memory if ndxgrp then 'merge-sort required grplen = ndxlen 'group size * 2 sbuf$ = space$(ndxgrp * sortln) 'buffer holds 1 record per group sndx$ = space$(ndxgrp * 2 + 2) 'buffer holds 1 record per group end if if opcode = -3 then 'build index from sorted data call memfree(clng(6144), clng(32640), xfree) 'reserve 2kb for .ddx else 'normal retrieval [return each record to calling program] call memfree(clng(4096), clng(32640), xfree) 'reserve normal 4 kb end if xsize = clng(outtot) * 2 'total records * 2 memndx = (xsize <= 32640 and xsize <= xfree) 'index-in-memory flag if memndx then 'retrieval index fits entirely in memory ndxlen = xsize 'buffer length is index file length else 'retrieval index does not fit in memory ndxlen = 2 'buffer length is 16-bit integer length end if nrcd$ = space$(ndxlen) 'allocate index file buffer if memndx then call fileio(ndunit, ndxlen, clng(1), nrcd$, 0)'fill it if ndxgrp then 'merge-sort initialization ixx1 = (sortsq > 0) 'used locally to shorten line ixx2 = (sortsq < 0) 'used locally to shorten line ixx3 = (memndx and ixx1) 'used locally to shorten line ixx4 = (memndx and ixx2) 'used locally to shorten line iyy1 = 1 - memndx 'used locally to shorten line iyy2 = grplen \ (1 - not memndx) 'used locally to shorten line for recptr = 1 to ndxgrp 'loop thru each index group grpptr = recptr 'sort group record pointer iyy3 = (grptot - subtot) * (ixx2 and (recptr = ndxgrp)) iyy4 = (grptot - subtot) * (ixx1 and (recptr = ndxgrp)) ircd = (recptr + ixx1) * iyy2 + iyy3 * iyy1 + ixx4 - ixx1 ircx = (recptr + ixx2) * iyy2 + iyy4 * iyy1 + ixx3 - ixx2 if memndx then 'get index pointer from memory buffer ichr = midchar(nrcd$, ircd + 1) * 256 'high byte of index rcdptr = midchar(nrcd$, ircd) + ichr 'same as cvi(mid$(... else 'get index pointer from file call fileio(ndunit, ndxlen, clng(ircd), nrcd$, 0) rcdptr = cvi(nrcd$) 'set pointer to retrieve data end if call fileio(sdunit, sortln, clng(rcdptr), sdat$, 0) 'get data gosub fillproc 'add 1 record from each sort group to buffer iptx(recptr, 0) = ircd 'begin ptr.to load ndx.rcd. from group iptx(recptr, 1) = ircx 'end ptr.to load ndx.rcd. from group next recptr = ndxgrp 'reset groups-pointer to begin output if sortsq < 0 then outptr = recptr 'begin output in reverse order else 'non-merge; all output from memory if sortsq < 0 then outptr = outtot 'begin output in reverse order end if end if if opcode = -3 then 'build index from sorted data call killfile("sortdata.ddx", ddunit) 'kill user index file open "sortdata.ddx" for binary as #ddunit 'open user index file ddxrcd$ = space$(2048) 'collection buffer for index-build filptr = 0 'record pointer for writing .ddx buffer to file ddxptr = 1 'buffer pointer for adding index values to ddxrcd$ gosub getproc 'get first index record while not closed 'retrieve index pointers and save to .ddx file mid$(ddxrcd$, ddxptr) = mki$(rcdptr) 'copy index to .ddx buffer ddxptr = ddxptr + 2 'increment buffer pointer if ddxptr > 2048 then 'write a group of data to file filptr = filptr + 1 'increment file pointer call fileio(ddunit, 2048, clng(filptr), ddxrcd$, -1) 'put data ddxptr = 1 'reset buffer pointer to beginning of buffer end if gosub getproc 'get next index records wend if ddxptr > 1 then 'save leftover index pointers call fileio(ddunit, 2048, clng(filptr + 1), ddxrcd$, -1) 'put data end if close #ddunit 'close the .ddx file ddxrcd$ = "" 'reclaim memory from .ddx buffer else 'retrieve a single sort record and return to calling program gosub getproc 'get a record pointer if not closed then 'retrieval OK as long as more records available call fileio(sdunit, sortln, clng(rcdptr), sdat$, 0) 'retrieve data end if end if if closed then 'retrieval/index completed if opcode = -2 then 'final (single-record retrieval) sequence call killfile("sortdata.ndx", ndunit) 'kill sort index workfile call killfile("sortdata.sdx", sdunit) 'kill sort data file sdat$ = "" 'kill sort data buffer end if nrcd$ = "" 'kill index file buffer sbuf$ = "" 'kill main sort group buffer sndx$ = "" 'kill sort index buffer end if end if exit sub 'return to calling program '---------------------------------------------------------------------- fillproc: 'put sort data into sbuf$, sndx$ '---------------------------------------------------------------------- if opcode = 0 then lset drcd$ = sdat$ 'load all segments at once iptr = 1 'initialize work buffer pointer for segno = 1 to nosegs 'load segments into work buffer and/or do invert if midchar(smsk$, ibeg(segno)) = 66 then 'invert 16-bit integer strings ichr = midchar(sdat$, ibeg(segno)) 'save first byte, then swap mid$(drcd$, iptr) = char$(midchar(sdat$, ibeg(segno) + 1)) '2nd byte mid$(drcd$, iptr + 1) = char$(ichr) 'put 1st byte in 2nd position else 'non-integer (character) sort segment if opcode then 'segments not in original (contiguous) sequence mid$(drcd$, iptr) = mid$(sdat$, ibeg(segno), ilen(segno)) end if 'insert each sort segment into temp. buffer [above] end if if iseq(segno) = nvflag then 'invert data for ascend/descend sequence for ichr = iptr to iptr + ilen(segno) - 1 'do each byte in segment mid$(drcd$, ichr) = char$(255 - midchar(drcd$, ichr)) next 'data will be re-inverted before writing to file end if iptr = iptr + ilen(segno) 'increment work buffer segment pointer next 'begin binary search for sort compare [below] topptr = recptr 'set top end of binary search lowptr = 0 'set low end of binary search while topptr - lowptr > 1 'search work data buffer using work index buffer midptr = lowptr + (topptr - lowptr) \ 2 'set mid point for compare ichx = midptr * 2 'mid-position incorporating 16-bit index width ichr = midchar(sndx$, ichx) * 256 'same as cvi(mid$(.....)) iptr = (midchar(sndx$, ichx - 1) + ichr - offset - 1) * sortln 'mid- if left$(drcd$, compln) <= mid$(sbuf$, iptr + 1, compln) then '-buff.pos topptr = midptr 'move search lower else 'sort record value > compare value in sort memory buffer lowptr = midptr 'move search higher end if wend iptr = topptr * 2 - 1 'current index-"stack" insert position mid$(sbuf$, (grpptr - 1) * sortln + 1) = drcd$ 'write sort data to buffer mid$(sndx$, iptr + 2) = mid$(sndx$, iptr, (recptr - topptr) * 2) 'shift ndx mid$(sndx$, iptr) = mki$(grpptr + offset) 'write current pointer to index return 'return to calling routine '----------------------------------------------------------------------- getproc: 'retrieve a record from the sort '----------------------------------------------------------------------- if ndxgrp then 'merge-retrieval from sort groups if recptr then 'sort records are still available ichr = outptr * 2 'mid-position based on 16-bit index width grpptr = midchar(sndx$, ichr - 1) + midchar(sndx$, ichr) * 256 if memndx then 'get group pointer from work index [above] ichr = midchar(nrcd$, iptx(grpptr, 0) + 1) * 256 'get record ptr rcdptr = midchar(nrcd$, iptx(grpptr, 0)) + ichr 'from memory-index else 'get record pointer from index file call fileio(ndunit, ndxlen, clng(iptx(grpptr, 0)), nrcd$, 0) rcdptr = cvi(nrcd$) 'nrcd$ is a 16-bit integer record end if if sortsq > 0 then mid$(sndx$, 1) = mid$(sndx$, 3) 'shift work index if iptx(grpptr, 0) = iptx(grpptr, 1) then 'end of group reached recptr = recptr - 1 'decrement group stack pointer if sortsq < 0 then outptr = recptr 'set output pointer if appl. else 'end of group not yet reached iptx(grpptr, 0) = iptx(grpptr, 0) + (1 - memndx) * sortsq'move ptr if memndx then 'get a data record using a pointer from memory ichr = midchar(nrcd$, iptx(grpptr, 0)) 'get the record pointer ichx = midchar(nrcd$, iptx(grpptr, 0) + 1) * 256 '..from memory call fileio(sdunit, sortln, clng(ichr + ichx), sdat$, 0) else 'get a data record using a pointer from the index file call fileio(ndunit, ndxlen, clng(iptx(grpptr, 0)), nrcd$, 0) call fileio(sdunit, sortln, clng(cvi(nrcd$)), sdat$, 0) end if gosub fillproc 'add the data record to the merge-sort end if closed = 0 'retrieval process not closed else 'no more records available closed = not 0 'retrieval process closed end if else 'non-merge sort retrieval; all data is in memory if outtot then 'sort records are still available ichr = outptr * 2 'mid-position based on 16-bit index width rcdptr = midchar(nrcd$, ichr - 1) + midchar(nrcd$, ichr) * 256 outptr = outptr + sortsq 'increment or decrement index pointer outtot = outtot - 1 'decrement remaining records closed = 0 'retrieval process not closed else 'no more records available closed = not 0 'retrieval process closed end if end if return 'return to calling routine '---------------------------------------------------------------------- putproc: 'add a record to the sort '---------------------------------------------------------------------- if recptr > maxrcd then 'too many records to fit in memory if ndxgrp = 0 then 'first group; initialize index group variables grptot = recptr - 1 'number of records per group ndxlen = grptot * 2 'size of index file buffer end if gosub writeproc 'save data group and index group offset = rcdptr - 1 'group-to-record offset counter recptr = 1 'reset group record counter grpptr = 1 'sort buffer pointer end if gosub fillproc 'add current record to sort rcdptr = rcdptr + 1 'increment total records counter recptr = recptr + 1 'increment group record counter grpptr = recptr 'sort buffer pointer return 'return to calling routine '----------------------------------------------------------------------- writeproc: 'write index and sort data to files '----------------------------------------------------------------------- ndxgrp = ndxgrp + 1 'increment the index group number call fileio(ndunit, ndxlen, clng(ndxgrp), left$(sndx$, ndxlen), -1) if opinit > -3 then 'initial sequences; save sort data to .sdx file for iptr = 0 to (maxrcd - 1) * sortln step sortln 'loop thru mem.buffer for segno = 1 to nosegs 're-invert data as appropriate iptz = iptr + ibeg(segno) 'sort group memory buffer pointer if midchar(smsk$, ibeg(segno)) = 66 then 'invert integer string ichr = midchar(sbuf$, iptz) 'save first byte, then swap mid$(sbuf$, iptz) = char$(midchar(sbuf$, iptz + 1)) '2nd byte mid$(sbuf$, iptz + 1) = char$(ichr) 'put 1st byte in 2nd pos. end if if iseq(segno) = nvflag then 'invert data for ascend/descend seq for ichr = iptz to iptz + ilen(segno) - 1 'invert each byte mid$(sbuf$, ichr) = char$(255 - midchar(sbuf$, ichr)) next end if next next sdxlen = maxrcd * sortln 'size of group memory buffer xflptr = lof(sdunit) \ sdxlen + 1 'current data "record" call fileio(sdunit, sdxlen, xflptr, sbuf$, -1) 'put data group to file end if return end sub 'return to calling program sub fileio (fcno, flen, xrec, fbuf$, fopr) static 'read/write file data 'int fcno 'file unit/channel no. 'int flen '"record" length used for positioning only 'int fopr '0 = read; non-0 = write 'long xrec 'logical "record" number 'char fbuf$ 'read/write data buffer xpos = (xrec - 1) * flen + 1 'absolute byte position in file if fopr then 'operation = write put #fcno, xpos, fbuf$ 'write data to file else 'operation = read get #fcno, xpos, fbuf$ 'read data from file end if end sub 'return to calling program sub killfile (ffil$, fcno) static 'kill a DOS file 'int fcno 'file unit/channel no. 'char ffil$ 'file name close #fcno 'close file if open open ffil$ for binary as #fcno 'open file in binary mode close #fcno 'close the file kill ffil$ 'kill the file end sub 'return to calling program sub memfree (xexc, xmax, xfree) static 'get max. free memory less exclusion 'long xexc 'amount of memory to reserve/exclude 'long xmax 'upper limit for xfree (or zero) xfree = fre("") - xexc 'total free memory less exclusion if xmax > 0 and xfree > xmax then xfree = xmax 'set maximum if applicable end sub 'return to calling program
Copyright © 1991, Dr. Dobb's Journal