The Five Levels of RAID

Discrete-event simulation lets you predict system performance of "Redundant Arrays of Inexpensive Disks"--small disks grouped together forming the virtual image of larger disks.


January 01, 1992
URL:http://www.drdobbs.com/parallel/the-five-levels-of-raid/184408691



Mike Wiebel and Steve Johnson

Mike is a development engineer for StorageTek's Louisville Modeling Group. He studies the performance of advanced disk and tape system architectures through simulation and can be reached via Internet at [email protected] or at 303-673-7082. Steve manages the Longmont Modeling Group for StorageTek. He is involved in the simulation of sophisticated real-time systems and teaches programming at the University of Colorado. Steve can be contacted at StorageTek.


In the competitive world of sophisticated, real-time fault-tolerant systems, it is imperative to know what kind of performance a system will give before proceeding with its development. Imagine investing in a new computer system only to discover that it doesn't perform any better than the old one! Fortunately, programmers can use discrete-event simulation to model both existing and projected real-time system performance, thus saving development time and money. This article describes a revolutionary concept in data storage and retrieval known as RAID (Redundant Arrays of Inexpensive Disks) and shows how discrete-event simulation can be used not only to predict the performance of different RAID architectures, but also to project performance over a product's life cycle.

RAID

In the last five years, CPU speeds have dramatically increased, memory costs have significantly dropped, and storage capacity has more than doubled. Mechanical magnetic disk access times, dominated by seek and latency delays, have improved at a snail's pace compared with the speed improvements of electronic devices. From 1971 to 1981, the raw seek time for a high-end IBM disk improved by a factor of only two, while the rotation time did not change. Clearly, users are paying for speedy CPUs that spend too much time waiting for I/O devices. RAID meets the challenge of improving I/O performance by increasing the back-end bandwidth of the disk subsystem while simultaneously improving its reliability.

RAID technology is comprised of a disk subsystem based on the smaller, less expensive disks developed for the PC. Several small disks grouped together into an array form the virtual image of a larger, more expensive disk. Because the failure rate of small disks is higher than that of large disks, the data must be protected to guard against its loss. Several methods exist for protecting the data in the subsystem, from simple mirroring to more sophisticated error correction codes.

Five Levels of RAID

A few years ago, researchers at the University of California, Berkeley published a paper describing five classifications of RAID, each offering different advantages in reliability and performance. Combining many small disks forms an array of disks. Dividing this array into "parity" groups improves system performance by reducing error-recovery time and the length of the disk accesses needed to perform an I/O. A parity group contains data disks plus additional check disks that hold redundant data. When a disk fails, the redundant information recreates the lost data on a backup disk. To speed recovery, the backup disk is "hot," residing in the subsystem and activating immediately after the failure occurs. The failed disk's replacement then becomes the new hot backup disk. An overview of the five levels of RAID points out the advantages of each architecture.

The first level of RAID (shown in Figure 1) is a mirrored disk architecture, in which every data disk has a duplicate backup disk. Synchronously writing data to both disks ensures the availability and reliability of the data when disk failure occurs. In a pure mirrored system, the check disks are invisible to the software. Adding additional disk controllers and data paths makes these disks visible and effectively increases the bandwidth of the subsystem. This allows read operations to occur in parallel, slightly increasing subsystem performance. Still, on the average, multiple arms seeking to the same cylinder and waiting for the specified sector to rotate underneath the head take longer than a single arm performing the same task, as the entire process waits for the slowest disk. The real value of level one lies in its ability to protect valuable data. The tremendous reliability advantage of mirrored disks has a significant price, as half the storage space is effectively unused. This expense was the driving force behind other RAID organizations.

The Hamming code and other forward error detection/correction codes are veteran performers in the battle against data transmission errors. Second level RAID architectures (see Figure 2) apply Hamming code principles to the disk array to reduce the overhead of data reliability and allow for some error correction without retransmission delays. The Hamming code associates even-parity bits with unique combinations of data bits. By bit-interleaving the data across the disks of a group and adding a sufficient number of check disks, this technique can be duplicated in a RAID architecture. According to Hamming code principles, ten data disks require four check disks, while 25 data disks require five. There is a significant performance advantage in simultaneously streaming data from multiple drives. Level-two RAID outperforms level one in large write and read-modify-write transactions and falls slightly behind level one in doing large reads. Level-one architectures outclass level two in all small data transfers. This is due to level two's need to access all disks in a group to modify part of a sector, causing a reduction in the number of small transactions processed in parallel. Level-two architectures are more appropriate for supercomputers than intensive transaction-oriented applications.
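
The disk counts above follow directly from the single-error-correcting Hamming relation, which requires the number of check disks c to satisfy 2^c >= d + c + 1 for d data disks. The short C fragment below is only an illustration of that relation; it is not part of the article's listings.

/* Illustrative sketch: minimum check disks c for d bit-interleaved
   data disks, per the Hamming relation 2^c >= d + c + 1.          */
#include <stdio.h>

static int check_disks(int data_disks)
{
   int c = 0;
   while ((1L << c) < (long)(data_disks + c + 1))
      c++;
   return c;
}

int main(void)
{
   printf("10 data disks need %d check disks\n", check_disks(10));  /* 4 */
   printf("25 data disks need %d check disks\n", check_disks(25));  /* 5 */
   return 0;
}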

Level-three RAID (single check disk per group) eliminates most of the overhead associated with error detection. Level two used parity checking to identify the erroneous disk and correct it; for a disk subsystem this is grossly redundant, because the disk controller can determine if its disk has failed. When level three determines a disk has failed, a single check disk recovers the lost data (see Figure 3). Level-three RAID does not perform better than a level-two architecture; its advantage is the greater storage capacity brought about by the reduced overhead. On a per-disk basis, level three is slightly better than level two because fewer check disks lead to better disk performance. Level three is a poor performer in intensive, transaction-oriented applications for the same reason as level two: dismally small I/O capabilities.
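
Because the controller already knows which disk failed, a single parity disk is enough to rebuild it: the check disk holds the XOR of the corresponding sectors on the data disks, so the missing sector is simply the XOR of the survivors. The fragment below is a simplified illustration of that recovery step, not code taken from the listings.

#define GRP_SIZE   4        /* data disks in the group, as in lvl3.c */
#define SECTOR_SZ  512      /* illustrative sector size in bytes     */

/* Rebuild the failed disk's sector by XORing the surviving data
   disks and the check disk (disk index GRP_SIZE).                  */
void rebuild_sector(unsigned char disk[GRP_SIZE + 1][SECTOR_SZ], int failed)
{
   int d, i;
   for (i = 0; i < SECTOR_SZ; i++)
      disk[failed][i] = 0;
   for (d = 0; d <= GRP_SIZE; d++) {
      if (d == failed)
         continue;
      for (i = 0; i < SECTOR_SZ; i++)
         disk[failed][i] ^= disk[d][i];
   }
}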

In levels two and three, the data is bit-interleaved across the disks in the group, and the Hamming code can then be used to detect and correct errors. Level-three RAID advances the technology by using the disk controller to detect and correct errors. The inability to do more than one I/O at a time handicaps level-two and level-three RAIDs. Level four (see Figure 4) stores data in block-interleaved fashion (a single transfer unit in a single sector), enabling multiple data reads in the same group at one time. Additionally, a single read accesses only one disk, so the entire transfer takes place at device speeds. Small writes are expensive in this architecture because of the number of disk accesses required. A small data transfer needs at least four accesses: reading the old data and old parity, then writing the new data and new parity. Every write must access the check disk, so it clearly becomes the bottleneck in this architecture.
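
The four accesses follow from how the parity is maintained: the new check information can be computed from the old data, the new data, and the old parity, so even a one-sector write costs two reads and two writes. A minimal sketch of that update (illustrative only, not taken from the listings):

/* Level-four small write, in outline: read the old data and old parity,
   compute new parity as old_parity ^ old_data ^ new_data, then write
   the new data and the new parity -- four disk accesses in all.        */
unsigned char new_parity(unsigned char old_data,
                         unsigned char old_parity,
                         unsigned char new_data)
{
   return (unsigned char)(old_parity ^ old_data ^ new_data);
}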

Fifth-Level RAID: No Single Check Disk

The level-five architecture (Figure 5) distributes check information across all disks, giving level five the ability to do parallel writes. Writes to different sectors are accompanied by check information written to different disks, which removes the bottleneck of a single check disk. The improvement in small read-modify-write operations makes level five a viable alternative for transaction processing. Level five also inherits the excellent large read/write performance achieved by the previous architectures, making it ideal for supercomputer activities as well. Another improvement, not specific to level five, can be made in the disk subsystem. Adding buffers to each drive and increasing the size of the smallest transfer unit to the size of a physical track renders the time lost to rotational delays insignificant. Instead of waiting for the requested sector to rotate into position, the drive begins the transfer with the next sector and reads the entire track, making allowances for data received out of order. Another advantage is that if the next sector requested was on the track just read and still resides in a buffer, no disk access is needed to obtain it.
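
One common way to spread the check information (assumed here purely for illustration; the article's figure may show a different layout) is to rotate the parity block across the disks stripe by stripe, so writes to different stripes update parity on different drives and can proceed in parallel:

/* Rotated parity placement: stripe s keeps its check block on disk
   (s mod ndisks); the data blocks occupy the remaining disks.      */
static int parity_disk(long stripe, int ndisks)
{
   return (int)(stripe % ndisks);
}

static int data_disk(long stripe, int block, int ndisks)
{
   int d = block;                       /* block index 0 .. ndisks-2 */
   if (d >= parity_disk(stripe, ndisks))
      d++;                              /* skip the disk holding parity */
   return d;
}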

Study

StorageTek, the company we work for, is currently developing a RAID device and is successfully applying discrete-event simulation to the development process. By modeling subsystems, we can determine their potential benefits during the development phase and project the effects of design decisions. The StorageTek Corporate Modeling Group develops its models in C and CSIM, a portable C function library that supports discrete-event simulation constructs such as multiple threads of execution. Because CSIM is a C function library, C programmers can create models without having to struggle with unknown languages and support tools. Using C in discrete-event simulation modeling lets us take advantage of the language's power, portability, and efficiency.

Developing models of RAID levels one, three, and five and comparing the results to those of a typical disk-device model demonstrates the usefulness of discrete-event simulation in the development of real-time systems. Of the five architectures discussed here, levels one, three, and five hold the most promise. Running simulations of each during subsystem backup helps demonstrate their potential. Even in RAID architectures, backup is necessary because of the potential hazards associated with human interaction (delete all, for example). In mainframe environments, MIS managers try to shorten the window of downtime due to backups, allowing them to maximize their multimillion-dollar investment for commercial applications. Disk performance is the limiting factor in performing backups, and RAID technology significantly reduces backup time requirements.

Model Descriptions

SLED (Single Large Expensive Disk) simulates the performance of a typical single large expensive disk, such as the IBM 3390, during backup. The code implementing SLED is shown in sled.c (Listing One, page 78). The system boundaries for this experiment extend only around the disk subsystem, its controller, and the access to these devices. This simulation has one transaction that enters the subsystem, reads a track from the device, and sends the data out through the bus to the control unit and out through the channel toward the host. The transaction continues to read tracks until it finishes backing them all up. In the model, LATENCY defines the time to read one track from the device, because the fastest a track can be read from the disk is the time it takes the disk to rotate once. Predicting the finish time analytically with the formula Backup time = number_cylinders * tracks_per_cylinder * LATENCY gives an expected time of 7.9 minutes. The simulation, which also includes latency delays incurred during cylinder switching, predicts a finish time of 8.09 minutes, well within range of the analytical prediction. Total device capacity is 1.90 gigabytes, so the device backs up at a rate of 3.9 Mbytes per second.
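
Expressed as a small helper (a sketch of the article's formula, with the device geometry passed in as parameters rather than taken from the listing's #defines):

/* Analytical backup time: every track costs one full rotation, so
   backup time = cylinders * tracks_per_cylinder * LATENCY.         */
double backup_minutes(long cylinders, long tracks_per_cyl, double latency_ms)
{
   return (double)cylinders * (double)tracks_per_cyl * latency_ms
          / 1000.0 / 60.0;
}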

Lvl1.c (Listing Two, page 78) is a simulation model of a level-one subsystem. It involves two identical devices in a backup simulation, one a mirror of the other. Adding additional controllers and channels to the subsystem allows access to different tracks simultaneously, aiding in the realization of the subsystem's potential. In this model, each device has its own thread of execution operating upon it. Each thread reads the next track to be backed up and sends it up through the control unit toward the host. As in sled.c, the rotational speed of the device limits the transfer times. The advantage lies in this subsystem's ability to transfer two tracks simultaneously, nearly doubling the back-up rate of sled.c. Total device capacity is 1.90 gigabytes as in sled.c, but increased bandwidth on the back end reduces backup time to 4.48 minutes. RAID level one backs up at a rate of 7.06 Mbytes per second, an 80 percent improvement over the typical device.

Lvl3.c (Listing Three, page 79) simulates a level-three implementation of RAID. This model has a data group of four disks with one check disk. The disks are smaller and slightly slower than the disks used in the previous models, but their combined storage capacity is nearly three times that of sled.c and lvl1.c. The model operates along the same principles as the previous simulations, generating a separate thread of execution for each disk device in the subsystem. These threads then access the track to be backed up and transfer the data toward the host. The models differ at the control-unit level: the typical disk (SLED) and level one have one control unit for each storage device, while the RAID architectures used here have one main control unit that oversees activity on all disks in the array. The previous models used 4.5-Mbytes-per-second channels, one per disk device, to transfer the data from the control unit to the host. Level three uses only two channels to transfer information to the host; replacing the 4.5-Mbyte channels with 10-Mbyte channels prevents the transfer to the host from becoming a bottleneck. A transaction must wait at the control unit for a channel to become available before transferring the track toward the host. One of the five devices contains only redundant check data, so the subsystem holds 5.77 gigabytes of data, transferred in 12.55 minutes--a rate of 7.66 Mbytes per second. At this point, the need for buffering at the control-unit level becomes clear. By buffering the tracks at the control unit, the same amount of data can be transferred in 10.15 minutes, or 9.47 Mbytes per second. This is a significant improvement over both the typical device and the level-one architecture.
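
The backup rates quoted throughout these comparisons are simply capacity divided by elapsed time (a trivial helper, not part of the listings):

/* Backup rate in Mbytes per second from capacity and elapsed minutes. */
double backup_rate(double capacity_mbytes, double minutes)
{
   return capacity_mbytes / (minutes * 60.0);
}
/* 5770 Mbytes in 12.55 minutes is about 7.7 Mbytes/sec;
   the same data in 10.15 minutes is about 9.5 Mbytes/sec. */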

The level-five advantage is parallelism: More than one transfer can take place at a time. The concept of "staging" takes advantage of this new capability. Reading data from the RAID device causes the loading, or staging, of the data adjacent to the desired data. If the next piece of information called for resides in the buffer, a disk access is eliminated and the transfer takes place at channel speeds. In lvl5.c (Listing Four, page 80), two separate operations take place concurrently: a staging operation joins the backup operation. The backup operation performs as before, obtaining data (tracks) and moving it through the system toward the host. The staging operation constantly monitors the control-unit buffer to decide whether another track should be staged. When a track lies in the buffer, device speeds are no longer the limiting factor; the track transfers at channel speeds. Factors that affect the performance of the backup operation are group size, buffer size, and the number of channels. Increasing the group size and the buffer size reduces the time spent waiting for data from the device. If the data is on one device, the speed of the transfer is the speed of that device; by spreading the data out over many devices, the speed of the transfer becomes the combined power of the devices. This concept is known as "scaling." A level-five architecture simulation with a group size of 50 devices using staging achieved a backup rate of 20 Mbytes per second, a 150-percent improvement over a staged level five with a group size of five devices.

Conclusion

Discrete-event simulation greatly simplifies the prediction and comparison of the five levels of RAID. Through discrete-event simulation, this article examined the performance capabilities of three RAID architectures and compared the results with a model of a SLED device performing the same operation. Simulation makes it possible to determine the effect of channel speed improvements, increased group sizes, greater track capacities, and many other important design decisions. The effect of many of these changes can be seen here using the simplified models presented in this article; most modifications are as easy as changing a variable.

Simulation used to be an expensive luxury because of the long runtimes and enormous memory requirements of the models. High-powered languages such as C and C++, the plummeting price of memory, and the increasing speed of microprocessors all add up to make simulation an increasingly available design tool. The use of simulation enables an engineer to find the strengths and weaknesses of concepts such as RAID, avoiding the pitfalls before it's too late.

Acknowledgments

The authors would like to recognize the research efforts of David A. Patterson, Garth Gibson, and Randy H. Katz and UC Berkeley's Department of Electrical Engineering and Computer Sciences. It is through their efforts that RAID technology is becoming a reality.

References

Hamming, R.W. "Error Detecting and Correcting Codes." The Bell System Technical Journal (April, 1950).

Harker, J.M. "A Quarter Century of Disk File Innovation." IBM Journal of Research and Development (September, 1981).

Patterson, D.A., G.A. Gibson, and R.H. Katz. "A Case for Redundant Arrays of Inexpensive Disks (RAID)." ACM SIGMOD 88 (Chicago, 1988).




[LISTING ONE]



 /* sled.c */

#include <stdio.h>
#include <assert.h>
#include <csimdefs.h>

#define HMSEC(x)        ((TIME)((x)))                   /* 1.0e-05 second */
#define MSEC(x)         ((TIME)(100.0 * (x)))           /* milli second */
#define SEC(x)          MSEC((1000.0 * (x)))            /* second */
#define MINUTE(x)       SEC((60.0 * (x)))               /* minute */
#define HOUR(x)         MINUTE((60.0 * (x)))

#define NUM_DISKS   1   /* num. disks in simulation */
#define NUM_FE      1L   /* num. front end channels */
#define NUM_BE      1L   /* num. back end channels */
#define SEEK_TIME   3.83    /* avg. num. of milliseconds spent seeking */
#define LATENCY      14.22   /* milliseconds required for one rotation */
#define NUM_CYLINDERS   1113   /* num. cylinders on disk */
#define TRACKS_PER_CYL   15    /* num. tracks per cylinder */

SS_PTR    cu;      /* CSIM control unit pointer       */
SS_PTR    disk_drive;   /* CSIM disk drive pointer         */
MS_PTR    fe_bus;      /* CSIM front end channel pointer  */
MS_PTR  be_bus;      /* CSIM back end channel pointer   */

static long track=0L;   /* current track backing */
static int cylinder=0;  /* current cylinder backing */
static long tot_tracks=0L; /* total tracks backed */

void backup_status(done)
int *done;
{

   /* determine if this disk drive is backed up. */

   track++;
   tot_tracks++;
   if (tot_tracks == NUM_DISKS * NUM_CYLINDERS * TRACKS_PER_CYL)
      *done = 1;
}

ECODE seek()
{
long   num_fe; /* num. front end channels available */

   if(track == 1){   /* first track requires seek */
      XacAdvExponen(MSEC(SEEK_TIME));
   }
   if(track == TRACKS_PER_CYL){
      XacAdvance(MSEC(LATENCY));/* switch cylinder and rotate */
      track = 0;
      cylinder++;
   }

   /* if channel is available transmit data; else rotational delay */

   MAvail(fe_bus,&num_fe);
   while(!num_fe){
      XacAdvance(MSEC(LATENCY));
      MAvail(fe_bus,&num_fe);
   }
   return(SUCCESS);
}

ECODE send_data()
{
double xfr_time;

   /* transfer data back to host */

   xfr_time = LATENCY;
   MSeize(fe_bus,1L);
   XacAdvance(MSEC(xfr_time));
   MRelease(fe_bus,1L);
   return(SUCCESS);
}

int sim_main(argc, argv)
int argc;
char *argv[];
{
int    done=0;
int    i;

   /* configure system and set up CSIM statistics */

   SServer(&cu,"control unit statistics");
   SServer(&disk_drive,"disk drive statistics");
   MServer(&fe_bus,NUM_FE,"front end channels");
   MServer(&be_bus,NUM_BE,"back end channels");

   /* set simulation time */

   SimWarmUp(0,0);
   SimRun(MINUTE(30),1);

   /* simulate backup */

   SSeize(disk_drive);
   SSeize(cu);
   MSeize(be_bus,1L);
   for(i=0; i<NUM_DISKS; i++){
      do{
         backup_status(&done);
         assert(seek() == SUCCESS);
         assert(send_data() == SUCCESS);
      }while(!done);
      done = 0;
      track = 0;
   }
   printf("Backup complete\n");
   SimPrint();
   SRelease(cu);
   MRelease(be_bus,1L);
   SRelease(disk_drive);
   XacTerminate();

}






[LISTING TWO]


 /* RAID level 1 */
#include <stdio.h>
#include <assert.h>
#include <csimdefs.h>

#define HMSEC(x)        ((TIME)((x)))                   /* hund.milli second */
#define MSEC(x)         ((TIME)(100.0 * (x)))           /* milli second */
#define SEC(x)          MSEC((1000.0 * (x)))            /* seconds */
#define MINUTE(x)       SEC((60.0 * (x)))               /* minutes */
#define HOUR(x)         MINUTE((60.0 * (x)))

#define NUM_DISKS       1   /* one disk; other disk is mirror */
#define NUM_FE          2L      /* num. front end channels */
#define NUM_BE          2L      /* num. back end channels */
#define NUM_CU      2L   /* num. of control units */
#define SEEK_TIME       3.83    /* avg. num. of milliseconds spent seeking */
#define LATENCY         14.22   /* milliseconds required for one rotation */
#define NUM_CYLINDERS   2226
#define TRACKS_PER_CYL  15

MS_PTR  cu;             /* CSIM control unit pointer */
MS_PTR  disk_drive;     /* CSIM disk drive pointer   */
MS_PTR  fe_bus;         /* CSIM front end channel pointer */
MS_PTR  be_bus;         /* CSIM back end channel pointer */
Q_PTR   tsend;

static long track=0L;
static int curr_cylinder=0;

void backup_status(done)
int *done;
{
        /* determine if this disk drive is backed up. */

   track++;
   if (track == TRACKS_PER_CYL){
      track = 0;
      curr_cylinder++;
   }
        if (curr_cylinder == NUM_CYLINDERS * NUM_DISKS)
                *done = 1;
}

ECODE seek(cylinder_backing)
int *cylinder_backing;
{
long    num_fe;

        if(curr_cylinder == 0 && track == 1){    /* first track requires seek */
                XacAdvExponen(MSEC(SEEK_TIME));
        }
        if(*cylinder_backing != curr_cylinder){
                XacAdvance(MSEC(LATENCY));
      *cylinder_backing = curr_cylinder;
        }
        return(SUCCESS);
}





ECODE send_data()
{
double xfr_time;


   /* transfer data back to host */
   xfr_time = LATENCY;

   MSeize(fe_bus,1L);
        XacAdvance(MSEC(xfr_time));
   MRelease(fe_bus,1L);
        return(SUCCESS);
}

int sim_main(argc, argv)
int argc;
char *argv[];
{
XAC_PTR xp;
BOOLEAN orig;
static int done=0;
int cylinder_backing=0;
int i;

/* configure system and set up CSIM utilities */
   MServer(&disk_drive,2L,"Disk drive server statistics");
   MServer(&cu,NUM_CU,"Control unit server statistics");
   MServer(&fe_bus,NUM_FE,"front end channel statistics");
   MServer(&be_bus,NUM_BE,"back end channel statistics");
   Queue(&tsend,"Time to send data");
 /* set simulation time */
        SimWarmUp(0,0);
        SimRun(MINUTE(17),1);
/* simulate backup */
   XacSplit(&xp,&orig);   /* allow each disk subsys to transfer */
   MSeize(cu,1L);
   MSeize(be_bus,1L);
   MSeize(disk_drive,1L);
        for(i=0; i<NUM_DISKS; i++){
                do{
                        assert(seek(&cylinder_backing)==SUCCESS);
         QEnter(tsend);
                        assert(send_data()==SUCCESS);
         QLeave(tsend);
                        backup_status(&done);
                }while(!done);
                done = 0;
                track = 0;
        }
   SimPrint();
        MRelease(cu,1L);
        MRelease(be_bus,1L);
        MRelease(disk_drive,1L);
        XacTerminate();
}





[LISTING THREE]


 /* RAID level 3 */

#include <stdio.h>
#include <csimdefs.h>

#define HMSEC(x)        ((TIME)((x)))                   /* 1.0e-05 second */
#define MSEC(x)         ((TIME)(100.0 * (x)))           /* milli second */
#define SEC(x)          MSEC((1000.0 * (x)))            /* second */
#define MINUTE(x)       SEC((60.0 * (x)))               /* minute */
#define HOUR(x)         MINUTE((60.0 * (x)))

#define GRP_SIZE   4   /* no. data disks in parity group */
#define NUM_CHECK   1   /* no. check disks in group */
#define NUM_CHAN   2L   /* no. of channels */
#define NUM_BUSS   5L   /* no. busses */
#define SEEK_TIME       11.5    /* avg. num. of milliseconds spent seeking */
#define LATENCY         16.66   /* milliseconds required for one rotation */
#define TRACK_CAPACITY  40.0    /* capacity in KB */
#define XFR_RATE       10000.0  /* KB transferred per sec */
#define NUM_CYLINDERS   1900   /* no. useable cylinders on device */
#define TRACKS_PER_CYL  19   /* no. tracks per cylinder */
#define SWITCH_TIME    4   /* milliseconds */

SS_PTR    raid_cu;   /* CSIM RAID control unit pointer */
MS_PTR   data_disk;   /* CSIM array of data disks pointer */
SS_PTR   check_disk;   /* CSIM check disk pointer */
MS_PTR   buss;      /* CSIM array of busses pointer */
MS_PTR   channel;   /* CSIM array of channels pointer */


ECODE seek(track_tally,original,track_count)
long track_tally;
BOOLEAN   original;
int *track_count;
{

   if (original){
      SSeize(check_disk);
   }
   else{
      MSeize(data_disk,1L);
   }
   if (!track_tally){
      XacAdvExponen(MSEC(SEEK_TIME));
   }
   if (!(*track_count) && track_tally){/* switch cylinders */
      XacAdvance(MSEC(SWITCH_TIME));
   }
   (*track_count)++;
   if (*track_count == TRACKS_PER_CYL)
      *track_count = 0;

   if (original){
      SRelease(check_disk);
   }
   else{
      MRelease(data_disk,1L);
   }
   return(SUCCESS);
}


ECODE transfer()
{
double xfr_time;
XAC_PTR   xp;
BOOLEAN orig;

   /* send data to control unit */

   MSeize(buss,1L);
   XacAdvance(MSEC(LATENCY));
   MRelease(buss,1L);

   /* send data to host */

   XacSplit(&xp,&orig);
   if (!orig){
      xfr_time = TRACK_CAPACITY/XFR_RATE;
      MSeize(channel,1L);
      XacAdvance(SEC(xfr_time));
      MRelease(channel,1L);
      XacTerminate();
   }
   return(SUCCESS);
}

int sim_main(argc, argv)
int argc;
char *argv[];
{
static    int    done=0;      /* flag indicating completion */
int    track_count=0;
long    track_tally=0;      /* count of tracks backed up */
long    backup_goal;       /* total number of tracks to back up */
int    i;         /* counter */
XAC_PTR xp;         /* CSIM transaction pointer */
BOOLEAN   original;      /* flag indicating presence of original xac */

   /* configure system and set up CSIM */

   SServer(&raid_cu,"Control unit server statistics");
   SServer(&check_disk,"check disk statistics");
   MServer(&data_disk,GRP_SIZE,"data disk statistics");
   MServer(&buss,NUM_BUSS,"buss statistics");
   MServer(&channel,NUM_CHAN,"channel statistics");

   /* set simulation period */

   SimWarmUp(0,0);
   SimRun(MINUTE(30),1);

   for(i=0; i<GRP_SIZE; i++){        /* generate xac for each disk */
      XacSplit(&xp,&original);
      if (!original)
         break;
   }

   /* backup subsystem */



   backup_goal = TRACKS_PER_CYL * NUM_CYLINDERS;
   while(!done){
      seek(track_tally,original,&track_count);
      transfer();
      track_tally++;
      if (track_tally == backup_goal){
         done = 1;
         SimPrint();
      }
   }
   XacTerminate();
}






[LISTING FOUR]


 /* RAID level 5 */

#include <stdio.h>
#include <assert.h>
#include <csimdefs.h>

#define HMSEC(x)        ((TIME)((x)))                   /* 1.0e-05 second */
#define MSEC(x)         ((TIME)(100.0 * (x)))           /* milli second */
#define SEC(x)          MSEC((1000.0 * (x)))            /* second */
#define MINUTE(x)       SEC((60.0 * (x)))               /* minute */
#define HOUR(x)         MINUTE((60.0 * (x)))

#define GRP_SIZE   50   /* no. data disks in parity group */
#define NUM_CHAN   2L   /* no. of channels */
#define NUM_BUSS   50L   /* no. busses */
#define SEEK_TIME       11.5    /* avg. num. of milliseconds spent seeking */
#define LATENCY         16.66   /* milliseconds required for one rotation */
#define TRACK_CAPACITY  40.0    /* capacity in KB */
#define XFR_RATE       10000.0  /* KB transferred per sec */
#define NUM_CYLINDERS   1900   /* no. useable cylinders on device */
#define TRACKS_PER_CYL  19   /* no. tracks per cylinder */
#define SWITCH_TIME    4   /* milliseconds */
#define BUF_SZ      50   /* number of tracks buffer will hold */

SS_PTR    raid_cu;   /* CSIM RAID control unit pointer */
SAS_PTR   data_disk;   /* CSIM array of data disks pointer */
MS_PTR   buss;      /* CSIM array of busses pointer */
MS_PTR   channel;   /* CSIM array of channels pointer */
long    buffer[BUF_SZ]; /* RAID control unit buffer */
long    update[BUF_SZ]; /* RAID buffer lock */
static  long last_buf=0L; /* last track written to buffer */

void buffer_search(track,bingo,slot)
long track;
int *bingo;
int *slot;
{
int i;

        *bingo = 0;
        for(i=0; i<BUF_SZ; i++)
                if(buffer[i] == track){
                        *bingo = 1;
                        *slot = i;
                }
}

ECODE buf_get(slot)
int slot;
{
double   xfer_time;
XAC_PTR xp;
BOOLEAN orig;

   xfer_time = TRACK_CAPACITY / (double)XFR_RATE;
   XacSplit(&xp,&orig);
   MSeize(channel,1L);
   XacAdvance(SEC(xfer_time/2.0));
   MRelease(channel,1L);


   if(!orig){
      XacTerminate();
   }
   buffer[slot] = -1;   /* mark slot as writeable */
   return(SUCCESS);
}

int find_track(track)
long track;
{
int number;

        number = track;
        while(number >= GRP_SIZE)
                number -= GRP_SIZE;
   return(number);
}

ECODE track_get()
{

   MSeize(buss,1L);
   MSeize(channel,1L);
   XacAdvance(MSEC(LATENCY));
   XacAdvance(MSEC(1));
   MRelease(channel,1L);
   MRelease(buss,1L);
   return(SUCCESS);
}

ECODE update_buf(track)
long track;
{
XAC_PTR   xp;
BOOLEAN   orig;
static int i;
static int j;
int slot;
long next;
int d_num;

   j = 0;
   for(i=0; i<BUF_SZ; i++){
      XacSplit(&xp,&orig);
      if(!orig) break;
   }
   if(!orig){
      slot = j++;
      if(buffer[slot] < track && update[slot] != 1){
         update[slot] = 1;
         next = last_buf++;
         d_num = find_track(next);
         SASeize(data_disk,d_num);
         XacAdvance(MSEC(LATENCY));
         SARelease(data_disk,d_num);
         buffer[slot] = next;
         update[slot] = 0;
      }


      XacTerminate();
   }
   return(SUCCESS);
}

int sim_main(argc, argv)
int argc;
char *argv[];
{
XAC_PTR xp;         /* CSIM transaction pointer */
BOOLEAN   original;      /* flag indicating presence of original xac */
static  int done=0;
static  long track=0L;
int    disk_num;
int    i;
int   slot;
int   bingo;
BOOLEAN avail;

   /* configure system and set up CSIM */

   SServer(&raid_cu,"Control unit buffer statistics");
   SArray(&data_disk,GRP_SIZE,"data disk statistics");
   MServer(&buss,NUM_BUSS,"buss statistics");
   MServer(&channel,NUM_CHAN,"channel statistics");

   for(i=0; i<BUF_SZ; i++){ /* mark buffer as empty */
      buffer[i] = -1;
      update[i] = 0;
   }

   /* set simulation period */

   SimWarmUp(0,0);
   SimRun(MINUTE(30),1);

   while(!done){
      XacSplit(&xp,&original);
      if(original){
         buffer_search(track,&bingo,&slot);
         if(bingo){
            assert(buf_get(slot) == SUCCESS);
         }
         else{
            disk_num = find_track(track);
            SAAvail(data_disk,disk_num,&avail);
            SASeize(data_disk,disk_num);
            buffer_search(track,&bingo,&slot);
            if(bingo){
               assert(buf_get(slot) == SUCCESS);
            }
            else{
               assert(track_get() == SUCCESS);
            }
            SARelease(data_disk,disk_num);
         }



         if (track == GRP_SIZE * TRACKS_PER_CYL * NUM_CYLINDERS)
                                done = 1;
         track++;
      }
      else{            /* maintain buffer */
         assert(update_buf(track) == SUCCESS);
         XacTerminate();
      }
   }
   printf("Backup complete\n");
   SimPrint();
}


Copyright © 1992, Dr. Dobb's Journal
