Tools

Accessing Large Data Arrays With X-Array

By Barr E. Bauer, June 01, 1992

Barr examines X-arRAY, a Fortran library that manages extended memory and math operations.


Barr uses high-performance computers to design pharmaceuticals for Schering-Plough Research Institute. He can be reached at 60 Orange St., B-1-3-85, Bloomfield, NJ 07003.

The primary barrier to using contemporary 386-based PCs for tackling large-data scientific and engineering problems is the artificial limitation imposed by conventional memory. All other factors considered, they are far faster, pack more memory and disk, and are substantially cheaper than standard platforms like the early Sun workstations and the MicroVAX-II of just a few years ago. Except for the lack of multitasking and virtual-memory support, even DOS is not a major limitation. Yet, the anachronistic 640-Kbyte conventional memory limit, a holdover from the original IBM-PC design, effectively blocks their use for all but the smallest of problems. You can access extended memory using a DOS extender, a PC version of UNIX, or Windows as a DOS extender for Microsoft Fortran 5.0--or you can use libraries like X-arRAY.

This article examines X-arRAY routines for handling megabyte-sized data arrays. The X-arRAY package is a tiny (84 Kbytes) Microsoft Fortran 5.0-compatible library of subroutines that manage access to extended memory and perform mathematical operations on data stored in arrays located within either extended or conventional memory. X-arRAY is therefore a combination of an extended-memory manager and a general-purpose array-manipulation package, and that combination sets it apart from DOS extenders.

X-arRAY Memory Management

The first call to the X-arRAY memory-management routines places the program in protected mode; the details of protected-mode operation are handled entirely by X-arRAY. Extended-memory access is either through XMS via the Microsoft HIMEM.SYS driver (preferred) or the modified LIM control block. HIMEM.SYS is standard with DOS 5.0 or Windows 3.0, making it a convenient choice. X-arRAY can use whichever memory manager is available or be forced to use a specific manager.

The extended-memory management routines (see Table 1) operate in a manner analogous to C memory management: Memory blocks are requested by size, referenced through a key that serves as a pointer to the allocated memory, and freed. In contrast to memory management in C, getxtd returns both an integer*4 handle and a modified integer*4 key associated with the successfully allocated extended-memory block. The handle is used by relxtd and endxtd to free the memory allocation. The key is the absolute address of the first byte of the allocated memory block, with bits 30 and 31 set to mark it as a legitimate key referencing extended memory. All of the library routines use this key to access and manipulate extended memory. The key itself behaves like a pointer and can be conveniently manipulated by address arithmetic. The maximum allocation is 1 Gbyte, which should be enough for most applications. (If you really need huge amounts of memory, you ought to seriously consider relocating your application to a more appropriate computer.)
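
The key encoding can be illustrated with a short sketch. It assumes that bits are numbered from 0 at the least-significant end, so that bits 30 and 31 are the top two bits of an integer*4. The address value is hypothetical, and the program uses only standard bit intrinsics, not X-arRAY routines. It is written in free-form Fortran 90 rather than the MS Fortran 5.0 dialect of the listings; the integer*4 notation is kept from the article and is a widely supported extension.

```fortran
program keybits
  implicit none
  integer*4 :: addr, key, key1

  ! hypothetical absolute address of an allocated block (1 Mbyte + 64 Kbytes)
  addr = 1114112

  ! assumed encoding: mark the address as a key by setting bits 30 and 31
  key = ibset(ibset(addr, 30), 31)

  ! both marker bits should now test as set
  print *, 'marked as key: ', btest(key, 30) .and. btest(key, 31)

  ! address arithmetic: advance by one column of 512 real*4 elements
  key1 = key + 512*4

  ! clearing the marker bits recovers the advanced address
  print *, 'byte offset from block start: ', ibclr(ibclr(key1, 30), 31) - addr
end program keybits
```

Because the marker bits sit above any address the arithmetic reaches here, adding a byte count to the key leaves them untouched, which is why the key can be treated like a pointer.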

Table 1: X-arRAY extended-memory access routines.

  Routine        Description
  ------------------------------------------------------------

  getxtd         Allocate blocks of available extended memory
  bufxtd         Allocate memory in memory-mapped hardware
  inqxtd         Report status of extended memory allocations
  relxtd         Free a single allocation
  endxtd         Free all allocations
  rzmxtd         Restore linkage to existing allocation(s)
  a2axtd         Array-to-array copy
  a2fxtd         Extended-memory allocation to file copy
  f2axtd         File to extended-memory allocation copy
  sgtrnm         Get a real*4 from extended memory
  sgtcnm         Get a complex*8 from extended memory
  igt[1/2/4]im   Get an integer*[1/2/4] from extended memory
  sptrnm         Put a real*4 into extended memory
  sptcnm         Put a complex*8 into extended memory
  ipt[1/2/4]im   Put an integer*[1/2/4] into extended memory
  flashr         Flash extended-memory access on console

  *[1/2/4] means either 1,2, or 4 at that position in the name
  corresponding to the variable type employed.

Allocation size can be specified by indicating the array dimensionality, width of each dimension (passed as an array), and the size of the variable in bytes. Alternatively, you can simply specify the total number of bytes desired. For example, the two getxtd calls in Figure 1 are equivalent. Both allocate enough extended memory for a 512x512 array of real*4 variables. The actual allocated memory is structureless--that is, not associated with any array dimensionality or variable type. Structure and variable types are imposed by the manipulation routines that themselves can use either mode to address specific array elements or subarrays in the allocated block. This turns out to be very handy (and makes accessing extended memory straightforward) when retention of array addressing is important. Also, the array can be manipulated in portions using address arithmetic.

Figure 1: Equivalent calls using getxtd.

  call getxtd(0,0,1048576,0,ihandle,key,kbytes,iret,ier)

  and

  iwidth(1) = 512
  iwidth(2) = 512
  call getxtd(2,iwidth,4,0,ihandle,key,kbytes,iret,ier)
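
The equivalence of the two calls rests on simple arithmetic, sketched below in free-form Fortran 90. The program does not call getxtd; the variable names are illustrative only.

```fortran
program allocsize
  implicit none
  integer*4 :: iwidth(2), ndim, elemsize, total, i

  ndim = 2
  iwidth(1) = 512
  iwidth(2) = 512
  elemsize = 4               ! size of a real*4 in bytes

  ! total bytes = product of the dimension widths times the element size
  total = elemsize
  do i = 1, ndim
     total = total * iwidth(i)
  enddo

  print *, 'total bytes  = ', total        ! 512*512*4 = 1048576
  print *, 'total Kbytes = ', total/1024   ! 1024
end program allocsize
```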

Unlike C, program termination does not automatically deallocate extended-memory blocks. In fact, allocated memory blocks persist intact, including their data, until deallocated by another program or machine reboot. Memory allocations are under the control of the XMS or LIM memory manager, which is external to the program. endxtd provides convenient end-of-program allocation cleanup and ensures that all blocks are freed; see Listing Five and Listing Six (page 114).

The persistence of extended-memory allocations beyond program termination can be used to advantage. rzmxtd reestablishes the linkage to extended memory previously allocated by an earlier program. rzmxtd uses a snapshot of the active handles and keys (provided by inqxtd) passed between the programs in a binary file. inqxtd also determines free and allocated memory, memory management in use, and other useful data. Although I do not have a specific example of this, I can envision a large-data/large-code Fortran program broken into smaller modules that each operate on the data passed between the modules in extended memory.

Routines are provided to shuttle data between extended memory and conventional memory, either as blocks or as individual variables (Table 1). The block-copy routine a2axtd determines the data type by its size in bytes, while the individual element routines are specific to the variable types. Routines are also provided to copy data between extended memory and binary files.

a2axtd uses extended-memory keys and/or conventional memory array names to specify source and destination. This requires the MS-Fortran "interface to" directive in order to pass keys by value and to properly declare real and complex arrays. The multiple contexts for a2axtd in a program that shuttles blocks of data between extended and conventional memory created a problem that was solved by interfacing a2axtd twice. The first version of a2axtd was interfaced at the top of the example for copying data from extended memory (via a key) into a real*4 array in conventional memory. The second version of a2axtd was aliased by the subroutine putback in a separate source file (see Listing Four, page 113) and interfaced to copy from a real*4 array in conventional memory to extended memory pointed to by the key. Yes, Fortran has no alias (but should), so putback merely passes its arguments through to the different versions of a2axtd. When you see putback in the examples, think a2axtd.

Finally, data stored in extended memory can be manipulated in place using a number of unary and binary routines (Table 2). The routines ssmrnm (array scaling) and smprnm (element-by-element product of two arrays) are used in Listing One (page 112). Note that the binary array product is element-by-element; it is not the matrix product. Each routine operates on a specific variable type, currently limited to integer*1, integer*2, integer*4, real*4, and complex*8. Of interest to those who do fast Fourier transforms, for which X-arRAY is finely tuned, are access routines that handle floating-point numbers in a decimated form and manipulate the bits of array elements. As with a2axtd, the keys must be passed by value, which necessitates the "interface to" directive.
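
The difference between an element-by-element product and a matrix product is easy to see on a 2x2 case. The sketch below uses ordinary Fortran 90 arrays and intrinsics, not X-arRAY routines; the array `*` operator stands in for the kind of product that Table 2 attributes to smprnm.

```fortran
program products
  implicit none
  real*4 :: a(2,2), b(2,2), c(2,2), d(2,2)

  ! reshape fills in column-major order, giving
  !   a = | 1 2 |    b = | 5 6 |
  !       | 3 4 |        | 7 8 |
  a = reshape((/ 1.0, 3.0, 2.0, 4.0 /), (/ 2, 2 /))
  b = reshape((/ 5.0, 7.0, 6.0, 8.0 /), (/ 2, 2 /))

  c = a * b           ! element-by-element: c(i,j) = a(i,j)*b(i,j)
  d = matmul(a, b)    ! matrix product: d(i,j) = sum over k of a(i,k)*b(k,j)

  print *, 'element-by-element, row 1: ', c(1,:)   ! 5, 12
  print *, 'element-by-element, row 2: ', c(2,:)   ! 21, 32
  print *, 'matrix product,     row 1: ', d(1,:)   ! 19, 22
  print *, 'matrix product,     row 2: ', d(2,:)   ! 43, 50
end program products
```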

Table 2: Extended-memory data-manipulation array routines.

  Routine        Description
  ----------------------------------------------------------------------

  sabcnm         Absolute value of a complex*8 array
  scjcnm         Conjugate a complex*8 array
  szicnm         Zero the imaginary part of a complex*8 array
  szrcnm         Zero the real part of a complex*8 array
  sngrnm         Negate a real*4 array
  sngcnm         Negate a complex*8 array
  ssmrnm         Scalar multiply a real*4 array
  ssmcnm         Scalar multiply a complex*8 array
  ism[1/2/4]sm   Scalar multiply a signed integer*[1/2/4] array
  ism[1/2/4]um   Scalar multiply an unsigned integer*[1/2/4] array
  imn[1/2/4]sm   Location and value of min element of signed
                 integer*[1/2/4] array
  imn[1/2/4]um   Location and value of min element of unsigned
                 integer*[1/2/4] array
  imx[1/2/4]sm   Location and value of max element of signed
                 integer*[1/2/4] array
  imx[1/2/4]um   Location and value of max element of unsigned
                 integer*[1/2/4] array
  sadrnm         Element-by-element sum of real*4 arrays
  sadcnm         Element-by-element sum of complex*8 arrays
  iad[1/2/4]im   Element-by-element sum of integer*[1/2/4] arrays
  smprnm         Element-by-element product of real*4 arrays
  smpcnm         Element-by-element product of complex*8 arrays
  imp[1/2/4]sm   Element-by-element product of signed
                 integer*[1/2/4] arrays
  imp[1/2/4]um   Element-by-element product of unsigned
                 integer*[1/2/4] arrays
  ssbrnm         Element-by-element difference of real*4 arrays
  ssbcnm         Element-by-element difference of complex*8 arrays
  isb[1/2/4]im   Element-by-element difference of integer*[1/2/4]
                 arrays
  sflcnmp        Product of dissimilar complex*8 arrays
  iln[1/2/4]sm   Arbitrary linear combination of signed integer*[1/2/4]
                 arrays
  iln[1/2/4]um   Arbitrary linear combination of unsigned integer*[1/2/4]
                 arrays

  *[1/2/4] means either 1,2, or 4 at that position in the name
  corresponding to the variable type employed.

Extended-memory Strategy

X-arRAY arrays located in extended memory are not arrays from a conventional Fortran-array point of view. The elements are stored in extended memory structured like an array, but cannot be manipulated except through the supplied access routines. One approach might be to replace all array element references with sgtrnm and sptrnm calls in your algorithm to shuttle element values into conventional memory for processing. Although this preserves algorithm structure, data stored in multidimensional arrays is generally accessed by nested loops, in which array-element access occurs in the innermost loop, and large arrays (the reason for using extended memory) will often have many iterations. The overhead of the repeated sgtrnm or sptrnm calls is cumulative, and its effect on performance is lethal.

The strategy shifts to moving blocks of array elements between extended and conventional memory. This dramatically diminishes the overhead, even though the block move done with a2axtd itself takes longer to complete. Because Fortran stores data in column-major order, the ideal unit of movement is a column vector. A 512x512 array in extended memory is read into conventional memory with 512 calls to a2axtd, each moving the nth column vector (,n) of 512 elements, rather than 262,144 calls to sgtrnm. The temporary array receiving the column vector is small enough to not tax the available conventional memory, but the use of a temporary array and pieces of the total array will force an algorithm change that might have to be made anyway for data arrays exceeding the size of conventional memory. Vector supercomputers use this same scheme to boost performance, the difference being that column-vector movement is from conventional memory into an array of special CPU registers. The savings, however, still accrue from moving groups rather than individual elements.

The block-move strategy implements smoothly using the X-arRAY primitives. The 2-D summation in Figure 2(a) becomes that shown in Figure 2(b). The extended memory can be conveniently and temporarily redimensioned from the viewpoint of a2axtd to access 1-D arrays of 512 real*4 elements. The address arithmetic is analogous to that routinely done in C--key1 points to the start of the next column vector to be accessed by the loop. This is perfectly legal as long as key1 points to a legitimate extended-memory allocation and the requested block resides within the allocation; otherwise, a2axtd reports an error.

Figure 2: (a) Summing a two-dimensional array; (b) using the block-move strategy to sum a two-dimensional array.

  (a)

  sum = 0.0
  do i = 1,512
      do j = 1,512
          sum = sum + arr(i,j)
      enddo
  enddo

  (b)

  iwidth(1) = 512
  iwidth(2) = 512       ! declared as a 2D array
  call getxtd(2,iwidth,4,0,ihandle,key,kbret,iret,ier)
  :
  sum = 0.0
  key1 = key            ! used for address arithmetic
  ichunk = 4 * 512      ! size of 512 real*4 elements
  do i = 1,512          ! loop over column vectors
      call a2axtd (1,512,4,key1,temp,iret,ier) ! bring in as 1D
      do j = 1,512      ! loop down temp array doing sum
          sum = sum + temp (j)
      enddo
      key1 = key1 + ichunk  ! advance to the next column vec
  enddo
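
The block-move pattern can be tried without X-arRAY by letting an ordinary 1-D array stand in for the extended-memory block. In the sketch below, getcol is a hypothetical stand-in for a2axtd, and offsets are counted in elements rather than bytes; neither is part of the library. The program is free-form Fortran 90.

```fortran
program blocksum
  implicit none
  integer*4, parameter :: n = 512
  real*4, allocatable :: xmem(:)     ! stand-in for the extended-memory block
  real*4 :: temp(n), total
  integer*4 :: i, j, offset, ncopies

  ! treat the flat block as a(n,n) stored in column-major order
  allocate(xmem(n*n))
  xmem = 1.0

  total = 0.0
  ncopies = 0
  offset = 0                         ! element offset of the current column
  do j = 1, n                        ! loop over column vectors
     call getcol(xmem, offset, n, temp)
     ncopies = ncopies + 1
     do i = 1, n                     ! sum down the temporary array
        total = total + temp(i)
     enddo
     offset = offset + n             ! advance to the next column vector
  enddo

  print *, 'block copies: ', ncopies, ' element fetches avoided: ', n*n
  print *, 'sum = ', total
contains
  ! hypothetical stand-in for a2axtd: copy nelem contiguous elements
  subroutine getcol(mem, off, nelem, dest)
    real*4, intent(in) :: mem(:)
    integer*4, intent(in) :: off, nelem
    real*4, intent(out) :: dest(nelem)
    dest = mem(off+1 : off+nelem)
  end subroutine getcol
end program blocksum
```

The loop makes n block copies where an element-at-a-time version would make n*n calls, which is the source of the savings described in the text.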

Listing Two (page 112) tests this by performing the same summation twice, first by column-vector moves and second by individual-element accesses. The results are dramatic. The column-vector pass processes the 1-Mbyte array in 3.16 seconds and produces sum=3.436025E+10. The individual-element pass, done in row order so that the second index varies in the inner loop and successive accesses touch noncontiguous array elements, requires 126.4 seconds and produces sum=3.434290E+10. These results are from a 16-MHz 386/387SX computer. Clearly, the column-vector approach works well with only a small restructuring of the algorithm.

The different sums produced are normal for floating-point calculations, but are also a concern. The difference is due to different cumulative round-off errors that are the result of elements being summed in a different order. Reverse the indexes in Listing Two into column order for the individual-element summation and it gives an answer identical to the column-vector version. Note that we are not talking about a correct or pure answer; the reality of floating-point calculations is that they have an unavoidable round-off error that manifests differently, depending on the order of calculations. If you need the same answer independent of method, be sure to process the array elements in column order to produce the same round-off error. Column ordering in arrays is, in my opinion, a flaw in Fortran (or the teaching of Fortran) because most programmers write multidimensional arrays with the index order following loop nesting; see Figure 3.

Figure 3: A multidimensional array with the index order following loop nesting.

  do outer = 1,n
      do inner = 1,n
          sum = sum + a(outer,inner)
      enddo
  enddo

The loop in Figure 3 does not access array elements in the order in which they are stored in memory. For maximum performance, array indexing should be a(inner,outer). The inner loop then references contiguous array elements stored in memory, and the outer loop references the column vector. This facilitates easy conversion to the column-vector transfer strategy discussed here. It also makes vectorization and parallelization possible, but that is a story for another day.
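
Readers who want to see the effect of summation order for themselves can try a sketch like the following. It fills an ordinary array with the same values as Listing Two and sums it in both orders in real*4, with a double-precision value for reference. It is free-form Fortran 90, and the printed sums depend on the compiler and floating-point hardware, so no particular output is claimed.

```fortran
program sumorder
  implicit none
  integer*4, parameter :: n = 512
  real*4, allocatable :: a(:,:)
  real*4 :: sum_col, sum_row
  real*8 :: sum_ref
  integer*4 :: i, j

  allocate(a(n,n))
  ! same fill as Listing Two: element (i,j) holds i + n*(j-1)
  do j = 1, n
     do i = 1, n
        a(i,j) = real(i) + real(n*(j-1))
     enddo
  enddo

  ! column order, a(inner,outer): contiguous accesses
  sum_col = 0.0
  do j = 1, n
     do i = 1, n
        sum_col = sum_col + a(i,j)
     enddo
  enddo

  ! row order, a(outer,inner): accesses n elements apart
  sum_row = 0.0
  do i = 1, n
     do j = 1, n
        sum_row = sum_row + a(i,j)
     enddo
  enddo

  ! reference: the values are 1..n*n, so the sum is n*n*(n*n+1)/2
  sum_ref = 0.5d0 * dble(n*n) * dble(n*n + 1)

  print *, 'column order: ', sum_col
  print *, 'row order:    ', sum_row
  print *, 'reference:    ', sum_ref
end program sumorder
```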

A triply nested lower triangular array (see Listing Three, page 113) in which the inner-loop bounds depend on the current value of an outer-loop index presents a challenge. Although only one array is used, two column vectors are manipulated, and the number of elements used in the column vector varies. The strategy is similar to that in Listing Two. Two column vectors (,k) and (,j) must be moved into their corresponding temporary arrays and processed. Then the (,j) column vector is put back into its original place in the array in extended memory. This is shown schematically in Figure 4 and completely in Listing Three.

Figure 4: Two column vectors (,k) and (,j) must be moved into their corresponding temporary arrays and processed. Then the (,j) column vector is put back into its original place in the array in extended memory.

  keyj = key
  keyk = key ! both temporary pointers point to the same array
  do j = 1,512
      ! get column vector (,j) pointed to by keyj into arrj ()
      keyk = key ! restart at the first column vector for each j
      do k = 1,j-1
          ! get column vector (,k) pointed to by keyk into arrk()
          do i = k+1, 512
              arrj(i) = arrj(i) + arrk(i) *arrj(k)
          enddo
          ! increment keyk to next column vector
      enddo
      ! put arrj() back into extended memory pointed to by keyj
      ! increment keyj to next column vector
  enddo
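
One way to gain confidence in this restructuring is to run both forms on a small array held entirely in ordinary memory and compare the results. In the sketch below, a flat array stands in for the extended-memory block, array-section copies stand in for a2axtd and putback, and offsets are counted in elements. The size and fill value are arbitrary test choices. It is free-form Fortran 90 and does not call X-arRAY.

```fortran
program triangle
  implicit none
  integer*4, parameter :: n = 8          ! small so the check runs instantly
  real*4 :: arr(n,n), xmem(n*n), arrj(n), arrk(n)
  integer*4 :: i, j, k, offj, offk

  arr = 0.25                             ! arbitrary test values
  xmem = 0.25                            ! same data as a flat column-major block

  ! reference: the original triply nested loop on an ordinary array
  do j = 1, n
     do k = 1, j-1
        do i = k+1, n
           arr(i,j) = arr(i,j) + arr(i,k)*arr(k,j)
        enddo
     enddo
  enddo

  ! column-vector version working on the flat block
  offj = 0
  do j = 1, n
     arrj = xmem(offj+1 : offj+n)        ! get column (,j)
     offk = 0                            ! restart at column 1 for each j
     do k = 1, j-1
        arrk = xmem(offk+1 : offk+n)     ! get column (,k)
        do i = k+1, n
           arrj(i) = arrj(i) + arrk(i)*arrj(k)
        enddo
        offk = offk + n                  ! next (,k) column
     enddo
     xmem(offj+1 : offj+n) = arrj        ! put column (,j) back
     offj = offj + n                     ! next (,j) column
  enddo

  print *, 'largest difference: ', maxval(abs(reshape(xmem, (/ n, n /)) - arr))
end program triangle
```

Both versions perform the same operations in the same order, so the reported difference should be zero or at the level of round-off.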

The address arithmetic is kept simple by copying entire column vectors, even though only part of a vector may be used for any given iteration. Improved performance might be eked out by moving only the required portion of the column vector but at the price of more overhead from the additional address arithmetic. Listing Three runs as expected, steadily slowing as the simulation proceeds, but still completing within 13 minutes. Note that the basic algorithm structure was not mangled beyond recognition.

The shuttling of array blocks into conventional memory for processing breaks down when the algorithm is fatally row oriented, as in the case of an array inversion using Gaussian elimination. I was interested, for example, in a megabyte-sized array-inversion routine for reconstructing 2-D stereo graphics projections into 3-D. The inverter I created was sadly too slow, due to the large amount of single-element shuttling to and from extended memory. The basic algorithm also became unrecognizable. When this happens, the best bet is to use a DOS extender (in my case, the Windows version of Microsoft Fortran 5.1). X-arRAY manipulation of extended memory should be targeted at contiguous array elements for the best performance, as demonstrated in Listing Two.

Manipulating Data in Extended Memory

Clearly, shuttling portions of a megabyte-sized array in and out of conventional memory for processing is feasible, even efficient. It is far more desirable, however, to manipulate the data directly in extended memory wherever possible. Consider a case in which a megabyte-sized array is duplicated in extended memory, all members of the duplicate array are multiplied by a scale factor, and then the two arrays are multiplied element-by-element with the results placed into a third array. This was done in Listing One, with the added wrinkle that the array copy was done by copying the source array from extended memory directly to a binary file and then reading the file directly into the newly allocated destination in extended memory. I also used inqxtd to assess available extended memory and determine which extended-memory manager was active at the start of the example. All phases of the resulting program were quick: one to three seconds, even on my relatively slow 386SX.
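
The arithmetic of that test case can be checked with ordinary Fortran 90 arrays, as in the sketch below. It reproduces only the duplicate, scale, and multiply steps in conventional memory; it does not call X-arRAY or perform the file copy.

```fortran
program scalemul
  implicit none
  integer*4, parameter :: n = 512
  real*4, allocatable :: a(:,:), b(:,:), c(:,:)

  allocate(a(n,n), b(n,n), c(n,n))
  a = 1.0          ! source array, all ones as in Listing One
  b = a            ! duplicate the array
  b = 5.0 * b      ! scale every element of the duplicate by 5.0
  c = a * b        ! element-by-element product into a third array

  ! 512*512 elements, each equal to 5.0
  print '(a,f10.1)', 'sum = ', sum(c)
end program scalemul
```

Every partial sum here is a whole number well below the limit of exact real*4 integers, so the program should print 1310720.0, the same check value that Listing One reports as correct.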

Conclusion

Frankly, the ability to access extended memory from within a DOS program free of DOS extenders was refreshing. Compared to the Windows extensions to Microsoft Fortran, X-arRAY addresses more extended memory, memory can be managed in a manner familiar to C programmers, and the resulting programs run faster and are independent of Windows. I liked the performance delivered by X-arRAY even though effort was required to incorporate the extended-memory routines into programs. That effort will often lead to optimizations that might otherwise be overlooked. What I would like to see in future versions of the X-arRAY library is an expanded list of array primitives such as a true-array product, determinant, array inverter, swap elements or columns, and fill with value; all of course supporting all Fortran data types. I would even like to see this functionality in a C-language library.

Incorporation of X-arRAY into applications will depend on the application. I have found that programs ultimately intended for UNIX computers can be successfully developed and tested with their full-sized (multimegabyte) arrays using the Windows version of Microsoft Fortran. Performance is not great, but that is not the point of cross-platform program development. On the other hand, a large-memory, array-based Fortran application undergoing a one-way port onto a DOS-based PC will benefit from incorporation of X-arRAY routines.

Products Mentioned

X-arRAY 1.0 Release 2
Davis Associates Inc.
43 Holden Road
West Newton, MA 02165
617-244-1450
$99.00
Minimum requirements: 80386 with 387 math coprocessor; MS-DOS 2.0 or higher; Microsoft Fortran 5.0




[LISTING ONE]


* Extended memory manipulation using X-arRAY Fortran Library.
* Does the following: 1. allocates a 1 Mbyte real*4 array a(512,512); 2. loads
*   array a with real*4 values; 3. saves the data in array a to disk;
*   4. allocates two 1 Mbyte real*4 arrays b and c; 5. loads data from file
*   (step 3) into array b; 6. scales all members of array b by 5.0; 7. does an
*   element-by-element array multiplication of arrays a and b, results into
*   array c; 8. sums all members of array c, reports results.
* Compile with Microsoft Fortran 5.1 using:
*    fl /FPi87 /G2 example1.for putback.for bagit.for /link xarray
* B. E. Bauer 3/20/92

      interface to subroutine a2axtd(i1,i2,i3,i4[VALUE],r1,i5,i6)
      integer*4 i1,i2,i3,i4,i5
      integer*2 i6
      real*4 r1
      end

      interface to subroutine sgtrnm(i1,i2,i3[VALUE],i4,r1,i5)
      integer*4 i1,i2,i3,i4
      integer*2 i5
      real*4 r1
      end

      interface to subroutine sptrnm(i1,i2,i3[VALUE],i4,r1,i5)
      integer*4 i1,i2,i3,i4
      integer*2 i5
      real*4 r1
      end

      interface to subroutine smprnm(i1,i2,i3[VALUE],i4[VALUE],
     +  i5[VALUE],i6)
      integer*4 i1,i2,i3,i4,i5
      integer*2 i6
      end

      interface to subroutine ssmrnm(i1,i2,i3[VALUE],r1,i4)
      integer*4 i1,i2,i3
      real*4 r1
      integer*2 i4
      end

      include 'bagit.inc'  ! error codes and other symbols
      integer*4 kb_total, kb_unallocated, number_allocations
      integer*4 memory_manager, required_memory, shortage
      integer*4 handle_array(1), key_array(1)
      integer*4 ARRAY_SIZE(ARRAY_DIM), allocated_array(1)

      integer*4 handle, key, key1, kb_allocated
      integer*4 bytes_moved, increment
      integer*4 keyb, keyc, handleb, handlec
      real*4 temp, a(SIZE)
      integer*2 return_status, eflag
      character*13 tempfile
      data tempfile /'tempfile.dat'C/ ! C string format
      data ARRAY_SIZE / SIZE, SIZE /

* enable extended memory routine flashing
      call flashr(ON,LOWER_RIGHT,eflag)
      if (eflag .ne. 0) call bagit(FLASHR_ERROR)
      required_memory = 3*SIZE*SIZE*REAL4/1024 ! need 3 Mbytes
* determine status of extended memory
      call inqxtd(kb_total, kb_unallocated, number_allocations,
     +      memory_manager, handle_array, key_array,
     +      allocated_array, return_status, eflag)
      if (eflag .ne. 0) call bagit(INQXTD_ERROR)
      if ((memory_manager .eq. 0) .or.
     +    (memory_manager .gt. 2)) then
            call bagit(WRONG_MMANAGER)
      else if (memory_manager .eq. 1) then
        print *,'XMS in use'
      else
        print *,'Modified LIM in use'
      endif
      print *,'Extended memory available ',kb_unallocated,' kb'
      if (kb_unallocated .lt. required_memory) then
            shortage = required_memory - kb_unallocated
            print *,'insufficient memory, need',shortage,'kb'
            call bagit(STOPPING)
      endif
* enough memory present, allocate memory for 1st array
      print *,'just ahead of memory allocation'
      ! allocate a 2D array of real*4 dimensioned 512 by 512
      call getxtd(ARRAY_DIM,ARRAY_SIZE,REAL4,XMS,handle,key,
     1      kb_allocated,return_status, eflag)
      if (eflag .ne. 0) call bagit(GETXTD_ERROR)
* load extended memory array (X,Y) with 1.0 using column vector approach
      print *,'at loading stage'
      key1 = key
      temp = 0.0
      increment = SIZE*REAL4
      do j = 1,SIZE
            do k = 1,SIZE
                  a(k) = 1.0 ! fills the 1D array with values
            enddo
            ! move the 1D into extended memory by columns
            ! putback is a2axtd interfaced for
            ! conventional -> extended memory transfers
            call putback(1,SIZE,REAL4,a,key1,bytes_moved,eflag)
            if (eflag .ne. 0) call bagit(PUTBACK_ERROR)
            if (bytes_moved .ne. increment) then
                call bagit(PUTBACK_BADCNT)
            endif
            key1 = key1 + increment
      enddo
* save a copy of this array to disk
      print *,'saving array to file'
      call a2fxtd(ARRAY_DIM,ARRAY_SIZE,REAL4,tempfile,key,
     +      ibytes_moved,eflag)
      if (ibytes_moved.ne.SIZE*SIZE*REAL4) then
          call bagit(A2FXTD_BADCNT)
      endif
      if (eflag.ne.0) call bagit(A2FXTD_ERROR)
* allocate extended memory for arrays b and c
      call getxtd(ARRAY_DIM,ARRAY_SIZE,REAL4,XMS,handleb,keyb,
     +      kb_allocated,return_status, eflag)
      if (eflag .ne. 0) call bagit(GETXTD_ERROR)
      call getxtd(ARRAY_DIM,ARRAY_SIZE,REAL4,XMS,handlec,keyc,
     +      kb_allocated,return_status, eflag)
      if (eflag .ne. 0) call bagit(GETXTD_ERROR)
* read file into extended memory for array b
      print *,'reading tempfile'
      call f2axtd(ARRAY_DIM,ARRAY_SIZE,REAL4,tempfile,keyb,
     1      ibytes_moved,eflag)
      if (eflag.ne.0) call bagit(F2AXTD_ERROR)
      if (ibytes_moved.ne.SIZE*SIZE*REAL4) then
          call bagit(F2AXTD_BADCNT)
      endif
* scale array b by 5.0
      print *,'scaling array b elements by 5.0'
      call ssmrnm(ARRAY_DIM,ARRAY_SIZE,keyb,5.0,eflag)
      if (eflag.ne.0) call bagit(SSMRNM_ERROR)
* element-by-element mult of a and b, results to c
      print *,'ahead of array multiplication'
      call smprnm(2,ARRAY_SIZE,key,keyb,keyc,eflag)
      if (eflag .ne. 0) call bagit(SMPRNM_ERROR)
* sum all elements of array c to check results by using column vectors to
* bring data from extended into conventional memory, where sum is performed.
      key1 = keyc
      temp = 0.0
      increment = SIZE*REAL4
      do j = 1,SIZE
        call a2axtd(1,SIZE,REAL4,key1,a,bytes_moved,eflag)
        if (eflag.ne.0) call bagit(A2AXTD_ERROR)
        if (bytes_moved.ne.increment) call bagit(A2AXTD_BADCNT)
           do i=1,SIZE
               temp = temp + a(i)
           enddo
        key1 = key1 + increment ! advance to next column vector
      enddo
      print *,'done, sum = ',temp,' (correct = 1310720.000000)'
* done, remove all allocations through ENDXTD in bagit
      call bagit(DONE)
      stop
      end





[LISTING TWO]


* Performs a sum reduction first using column vector moves then individual
* element accesses
* Compile with Microsoft Fortran 5.1
*  fl /FPi87 /G2 example1.for putback.for bagit.for /link xarray
* B. E. Bauer 3/20/92
*
      interface to subroutine a2axtd(i1,i2,i3,i4[VALUE],r1,i5,i6)
      integer*4 i1,i2,i3,i4,i5
      integer*2 i6
      real*4 r1
      end

      interface to subroutine sgtrnm(i1,i2,i3[VALUE],i4,r1,i5)
      integer*4 i1,i2,i3,i4
      integer*2 i5
      real*4 r1
      end

      interface to subroutine sptrnm(i1,i2,i3[VALUE],i4,r1,i5)
      integer*4 i1,i2,i3,i4
      integer*2 i5
      real*4 r1
      end

      include 'bagit.inc'

      integer*4 kb_total, kb_unallocated, number_allocations
      integer*4 memory_manager, required_memory, shortage
      integer*4 handle_array(1), key_array(1), allocated_array(1)
      integer*4 ARRAY_SIZE(2)

      integer*4 handle, key, key1, kb_allocated, increment
      integer*4 bytes_moved, index(2), keyj
      integer*4 chunk  ! integer so key arithmetic is not done in real*4

      real*4 temp, a(SIZE), arrj(SIZE)
      integer*2 return_status, eflag

      data ARRAY_SIZE / SIZE, SIZE /   ! 2D 512x512 array used
* enable console flashing when extended memory is accessed
      call flashr(1,3,eflag)
      if (eflag .ne. 0) call bagit(FLASHR_ERROR)
      required_memory = SIZE*SIZE*REAL4/1024
* check for adequate XMS memory, quit if inadequate
      call inqxtd(kb_total, kb_unallocated, number_allocations,
     +      memory_manager, handle_array, key_array,
     +      allocated_array, return_status, eflag)
      if (eflag.ne.0) call bagit(INQXTD_ERROR)
      if (required_memory .gt. kb_unallocated) call bagit(NOT_ENOUGH)
* allocate a 512 by 512 array of real*4
      print *,'just ahead of memory allocation'
      call getxtd(2,ARRAY_SIZE,REAL4,XMS,handle,key,
     1      kb_allocated,return_status, eflag)
      if (eflag .ne. 0) call bagit(GETXTD_ERROR)
* load extended memory array (X,Y) using column vectors
      print *,'at loading stage'
      key1 = key
      temp = 0.0
      increment = SIZE*REAL4
      do j = 1,SIZE
        do k = 1,SIZE
           a(k) = float(k) + float(SIZE*(j-1))
        enddo
        call putback(1,SIZE,REAL4,a,key1,bytes_moved,eflag)
        if (eflag .ne. 0) call bagit(PUTBACK_ERROR)
        if (bytes_moved .ne. increment) then
          call bagit(PUTBACK_BADCNT)
        endif
        key1 = key1 + increment
      enddo
* column vector summation
      print *,'start column vector sum reduction'
      sum_col = 0.0
      chunk = SIZE*REAL4
      do j=1,SIZE
        keyj = key + chunk*(j-1)  ! address arithmetic
        ! put (,j) into arrj
        call a2axtd(1,SIZE,REAL4,keyj,arrj,bytes_moved,eflag)
        if (eflag.ne.0) call bagit(A2AXTD_ERROR)
        if (bytes_moved.ne.chunk) call bagit(A2AXTD_BADCNT)
        do k=1,SIZE ! process the column vector
          sum_col = sum_col +arrj(k)
        enddo
      enddo
      print *,'done with column vector sum reduction'
* individual element access
      print *,'start individual access sum reduction'
      sum_ind = 0.0
      do i=1,SIZE
        do j=1,SIZE
          index(1)=i   ! row of element
          index(2)=j   ! column of element
          ! get the element into retval
          call sgtrnm(2,ARRAY_SIZE,key,index,retval,eflag)
          if (eflag.ne.0) call bagit(SGTRNM_ERROR)
          sum_ind = sum_ind + retval
        enddo
      enddo
      print *,'done with individual access sum reduction'
      print *,'column sum =',sum_col,', individual sum =',sum_ind
      call bagit(DONE)
      stop
      end







[LISTING THREE]


* Triangular array manipulation of a single 1 Mbyte real*4 array arr(512,512)
*  using X-arRAY routines
* Does the following:
*    do j=1,512
*        do k = 1, j-1
*            do i = k+1, 512
*                arr(i,j) = arr(i,j) + arr(i,k) * arr(k,j)
*            enddo
*        enddo
*    enddo
* Compile in Microsoft Fortran 5.1 using:
* fl /FPi87 /G2 example2.for putback.for bagit.for /link xarray
* B. E. Bauer 3/20/92
*
      interface to subroutine a2axtd(i1,i2,i3,i4[VALUE],r1,i5,i6)
      integer*4 i1,i2,i3,i4,i5
      integer*2 i6
      real*4 r1
      end

      interface to subroutine sgtrnm(i1,i2,i3[VALUE],i4,r1,i5)
      integer*4 i1,i2,i3,i4
      integer*2 i5
      real*4 r1
      end

      interface to subroutine sptrnm(i1,i2,i3[VALUE],i4,r1,i5)
      integer*4 i1,i2,i3,i4
      integer*2 i5
      real*4 r1
      end

      include 'bagit.inc'

      integer*4 kb_total, kb_unallocated, number_allocations
      integer*4 memory_manager, required_memory
      integer*4 handle_array(1), key_array(1), allocated_array(1)
      integer*4 ARRAY_SIZE(ARRAY_DIM)

      integer*4 handle, key, key1, kb_allocated, increment
      integer*4 bytes_moved, index(2), keyj, keyk, chunk

      real*4 temp, retval, a(SIZE), arrj(SIZE), arrk(SIZE)
      integer*2 return_status, eflag

      data ARRAY_SIZE / SIZE, SIZE /
      call flashr(ON,LOWER_RIGHT,eflag)
      required_memory = SIZE*SIZE*REAL4/1024
      call inqxtd(kb_total, kb_unallocated, number_allocations,
     +      memory_manager, handle_array, key_array,
     +      allocated_array, return_status, eflag)
      if (eflag.ne.0) call bagit(INQXTD_ERROR)
      if (kb_unallocated .lt. required_memory) then
        call bagit(NOT_ENOUGH)
      endif
* allocate 1 Mbyte of extended memory
      print *,'just ahead of memory allocation'
      call getxtd(ARRAY_DIM,ARRAY_SIZE,REAL4,XMS,handle,key,
     +      kb_allocated,return_status, eflag)
      if (eflag .ne. 0) call bagit(GETXTD_ERROR)
      print *,'loading extended memory'
      key1 = key
      temp = 0.0
      increment = SIZE*REAL4
      do j = 1,SIZE
        do k = 1,SIZE
          a(k) = 0.00025
        enddo
        call putback(1,SIZE,REAL4,a,key1,bytes_moved,eflag)
        if (eflag .ne. 0) call bagit(PUTBACK_ERROR)
        if (bytes_moved .ne. increment) call bagit(PUTBACK_BADCNT)
        key1 = key1 + increment
      enddo
* process triangular array
      print *,'processing triangular array'
      keyj = key
      keyk = key
      chunk = SIZE*REAL4
      do j=1,SIZE
        print *,'outer loop j = ',j
        ! get arr(x,j) from extended into arrj(x)
        call a2axtd(1,SIZE,REAL4,keyj,arrj,bytes_moved,eflag)
        if (eflag.ne.0) call bagit(A2AXTD_ERROR)
        if (bytes_moved.ne.chunk) call bagit(A2AXTD_BADCNT)
        do k=1,j-1
          keyk = key + (k-1)*chunk
          ! get arr(x,k) from extended into arrk(x)
          call a2axtd(1,SIZE,REAL4,keyk,arrk,bytes_moved,eflag)
          if (eflag.ne.0) call bagit(A2AXTD_ERROR)
          if (bytes_moved.ne.chunk) call bagit(A2AXTD_BADCNT)
          ! do the manipulation
          do i=k+1,SIZE
            arrj(i) = arrj(i) + arrk(i)*arrj(k)
          enddo
        enddo
        ! put arrj(x) back to extended memory
        call putback(1,SIZE,REAL4,arrj,keyj,bytes_moved,eflag)
        if (eflag.ne.0) call bagit(PUTBACK_ERROR)
        if (bytes_moved.ne.chunk) call bagit(PUTBACK_BADCNT)
        keyj = keyj + chunk
      enddo
* sample selected members of the array in extended memory
      do i=1,SIZE,125
        do j=1,SIZE,125
          index(1)=i
          index(2)=j
          call sgtrnm(ARRAY_DIM,ARRAY_SIZE,key,index,retval,eflag)
          if (eflag.ne.0) call bagit(SGTRNM_ERROR)
          print *,i,j,retval
        enddo
      enddo
      call bagit(DONE)
      stop
      end
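
The column-buffer strategy used in this listing can be sanity-checked without extended memory. The sketch below is a stand-alone illustration under stated assumptions, not the author's test: an ordinary array ext(n,n) plays the role of the extended-memory array, plain copy loops stand in for the a2axtd and putback block moves, and the program name tricheck, the array ref, and the small size n = 8 are hypothetical. It runs the direct triple loop from the header comment on one copy and the buffered form on the other, then reports the largest difference between them.

```fortran
! Hypothetical check: ext(n,n) stands in for extended memory.
      program tricheck
      implicit none
      integer n
      parameter (n = 8)
      real*4 ext(n,n), ref(n,n), arrj(n), arrk(n), maxdiff
      integer i, j, k
! load both copies with the constant the listing uses
      do j = 1, n
        do i = 1, n
          ext(i,j) = 0.00025
          ref(i,j) = 0.00025
        enddo
      enddo
! direct form, as in the listing's header comment
      do j = 1, n
        do k = 1, j-1
          do i = k+1, n
            ref(i,j) = ref(i,j) + ref(i,k)*ref(k,j)
          enddo
        enddo
      enddo
! buffered form: fetch column j, update it, write it back
      do j = 1, n
        do i = 1, n
          arrj(i) = ext(i,j)
        enddo
        do k = 1, j-1
! fetch column k (already final, since k is less than j)
          do i = 1, n
            arrk(i) = ext(i,k)
          enddo
          do i = k+1, n
            arrj(i) = arrj(i) + arrk(i)*arrj(k)
          enddo
        enddo
        do i = 1, n
          ext(i,j) = arrj(i)
        enddo
      enddo
! compare the two results element by element
      maxdiff = 0.0
      do j = 1, n
        do i = 1, n
          maxdiff = max(maxdiff, abs(ext(i,j) - ref(i,j)))
        enddo
      enddo
      print *, 'max difference =', maxdiff
      end
```

The two forms apply the same operations in the same order, so any difference should be zero or at the level of floating-point rounding. The buffered form works because column k is never modified once the outer loop has moved past it, so only column j needs to be written back.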






[LISTING FOUR]


* putback.for--interface a2axtd for conventional to extended memory block moves
* B. E. Bauer  3/20/92
*
      interface to subroutine a2axtd(i1,i2,i3,r1,i4[VALUE],i5,i6)
      integer*4 i1,i2,i3,i4,i5
      integer*2 i6
      real*4 r1
      end

      subroutine putback(i1,i2,i3,r1,i4,i5,i6)
      integer*4 i1, i2, i3, i4, i5
      real*4 r1(*)
      integer*2 i6
      call a2axtd(i1,i2,i3,r1,i4,i5,i6)
      return
      end





[LISTING FIVE]


* bagit.inc--symbols and declarations used for error handling and the examples.
* B. E. Bauer  3/20/92
*
      integer*4 INQXTD_ERROR,WRONG_MMANAGER,STOPPING,GETXTD_ERROR
      integer*4 PUTBACK_ERROR,PUTBACK_BADCNT,A2AXTD_BADCNT
      integer*4 A2AXTD_ERROR,A2FXTD_BADCNT,A2FXTD_ERROR
      integer*4 F2AXTD_ERROR,F2AXTD_BADCNT,SSMRNM_ERROR
      integer*4 SMPRNM_ERROR,NOT_ENOUGH,SGTRNM_ERROR
      integer*4 FLASHR_ERROR,DONE

      integer*4 ARRAY_DIM,REAL4,XMS,SIZE,ON,LOWER_RIGHT


      parameter (INQXTD_ERROR=1)
      parameter (WRONG_MMANAGER=2)
      parameter (STOPPING=3)
      parameter (GETXTD_ERROR=4)
      parameter (PUTBACK_ERROR=5)
      parameter (PUTBACK_BADCNT=6)
      parameter (A2AXTD_BADCNT=7)
      parameter (A2AXTD_ERROR=8)
      parameter (A2FXTD_BADCNT=9)
      parameter (A2FXTD_ERROR=10)
      parameter (F2AXTD_ERROR=11)
      parameter (F2AXTD_BADCNT=12)
      parameter (SSMRNM_ERROR=13)
      parameter (SMPRNM_ERROR=14)
      parameter (NOT_ENOUGH=15)
      parameter (SGTRNM_ERROR=16)
      parameter (FLASHR_ERROR=17)
      parameter (DONE=99)

      parameter (ARRAY_DIM = 2)       ! 2D array
      parameter (REAL4 = 4)           ! size of real*4
      parameter (XMS = -1)            ! use available mmanager
      parameter (SIZE = 512)          ! size of array
      parameter (ON = 1)              ! convenient symbol
      parameter (LOWER_RIGHT = 3)     ! where flashr flashes





[LISTING SIX]


* bagit.for--error handler. Prints an appropriate message then calls endxtd
*   to ensure allocations are freed.
* B. E. Bauer  3/20/92
*
      subroutine bagit(iflag)
      integer*4 iflag
      integer*2 return_status, eflag

      include 'bagit.inc'

      select case (iflag)
        case (INQXTD_ERROR)
          print *,'error reported by inqxtd'
        case (WRONG_MMANAGER)
          print *,'XMS or Modified LIM memory manager not found'
        case (STOPPING)
          print *,'stopping...'
        case (GETXTD_ERROR)
          print *,'error reported by getxtd'
        case (PUTBACK_ERROR)
          print *,'error in putback(a2axtd)'
        case (PUTBACK_BADCNT)
          print *,'wrong number of bytes moved by putback(a2axtd)'
        case (A2AXTD_BADCNT)
          print *,'wrong number of bytes moved by a2axtd'
        case (A2AXTD_ERROR)
          print *,'error in a2axtd'
        case (A2FXTD_BADCNT)
          print *,'wrong number of bytes moved by a2fxtd'
        case (A2FXTD_ERROR)
          print *,'error in a2fxtd'
        case (F2AXTD_ERROR)
          print *,'error in f2axtd'
        case (F2AXTD_BADCNT)
          print *,'wrong number of bytes moved by f2axtd'
        case (SSMRNM_ERROR)
          print *,'error in ssmrnm (scalar multiply)'
        case (SMPRNM_ERROR)
          print *,'error in smprnm (el-by-el multiply)'
        case (NOT_ENOUGH)
          print *,'inadequate extended memory available'
        case (SGTRNM_ERROR)
          print *,'error in sgtrnm (real*4 get)'
        case (FLASHR_ERROR)
          print *,'error in flashr'
        case (DONE)
          print *,'freeing extended memory'
      end select
      call endxtd(return_status, eflag)
      stop 'done, exiting...'
      end

Table 1: X-arRAY extended-memory access routines.

Figure 1: Equivalent calls using getxtd.

Table 2: Extended-memory data-manipulation array routines.

Figure 2: (a) Summing a two-dimensional array; (b) using the block-move strategy to sum a two-dimensional array.

Figure 3: A multidimensional array with the index order following loop nesting.

Figure 4: Two column vectors (,k) and (,j) must be moved into their corresponding temporary arrays and processed. Then the (,j) column vector is put back into its original place in the array in extended memory.
