Implementing the Rhealstone Real-Time Benchmark

It's been over a year since DDJ first introduced the Rhealstone, a set of benchmarking operations for real-time multitasking systems. Robin presents the "refined" definition, along with a suite of C programs to implement the benchmark.

April 01, 1990
URL:http://www.drdobbs.com/cpp/implementing-the-rhealstone-real-time-be/184408332

Figure 2

APR90: IMPLEMENTING THE RHEALSTONE REAL-TIME BENCHMARK

IMPLEMENTING THE RHEALSTONE REAL-TIME BENCHMARK

Where a proposal's rubber meets the real-time road

Rabindra P. Kar

Robin is a senior engineer with the Intel Systems Group and can be reached at 5200 N.E. Elam Young Parkway, Hillsboro, OR 97124-6497.

In February 1989, the late Kent Porter and I proposed a set of benchmarking operations for real-time multitasking systems (see "Rhealstone: A Real-Time Benchmarking Proposal," DDJ, February 1989). That article generated a lot of interest and many valuable suggestions from DDJ readers, as well as others in the real-time software community. The reader response made it possible for us to refine and clarify several important aspects of the original benchmark proposal. I am very grateful to those of you who shared your insights with us.

This article presents the refined definition of the Rhealstone benchmark. It also contains a suite of C programs that implement the benchmark under iRMX, a real-time operating system from Intel. Refer to the original proposal for the rationale behind proposing Rhealstones in the first place, and for background information on the real-time multitasking operations that comprise it.

First, I'll give a quick summary of what the Rhealstone benchmark is and what it seeks to measure. The benchmark identifies the execution times (or time delays) associated with six operations that are vital indicators of real-time multitasking system performance. These six operations are named "Rhealstone components." When a real-time system is benchmarked, each of these components is measured separately. The empirical results are combined into a single figure of merit (Rhealstones per time unit). There are two ways of calculating the Rhealstone number: One of them generic and one of them appropriately weighted for a particular type of application (an application-specific Rhealstone).

Rhealstones are intended to provide a standard of comparison between real-time computers across the industry. Hence, their specification is: a. Independent of the features found in any CPU; b. Independent of any computer bus architecture; c. Independent of the features or primitives of any operating system or kernel (collectively referred to hereafter as real-time executives).

More Details.

The C language implementation of the benchmarks is, of course, specific to an operating system (iRMX in this case). However, the OS-specific part of the code is confined to a few system calls (to create tasks, put them to sleep, accept/relinquish semaphores, and so on) that are found in almost every multitasking executive. The C benchmark source is easily portable to most other executives if the iRMX system calls in it are replaced by the equivalent system calls of the target executive.

When a computer is used in a real-time-control situation, the solution usually fits the model described in Figure 1.

Figure 1: Typical real-time control model

  Real-time     Real-time       Human-machine and/or
  computer  +  application  +  machine-machine       =  Real-time solution
  system        code            interface(s)

"Real-time computer system" refers jointly to the CPU and system software (OS, kernel, or combination thereof) that provide a base execution vehicle for the application software. The Rhealstone benchmark helps the real-time solution designer choose the highest-performance real-time computer available as the execution base for the project. Rhealstones are not designed to measure how good the complete solution is, and they may not be an appropriate measure for the end user.

Rhealstone Components

The specifications for the six real-time operations (Rhealstone components) that comprise the benchmark are detailed in the following section. For a graphical specification of each component, refer to Figure 2 through Figure 7. Each component's specification is followed by a paragraph (or two) describing the C benchmark program used to measure the Rhealstone component on iRMX II. It is important to realize that the verbal and graphical specifications, not the C programs, are the essential core of the benchmark. It is entirely possible to obtain more accurate Rhealstone component values by using different algorithms or programming languages or special performance-analysis hardware.

The task-switch time (see Figure 2) is the average time to switch between two active tasks of equal priority. The tasks should be independent -- that is, there should not be contention between them for hardware resources, semaphores, and so on. Task switching must be achieved synchronously (without preemption) -- for example, when the running task puts itself to sleep or when the executive implements a round-robin scheduling algorithm for equal-priority tasks.

Listing One (tswit.c), page 100, is a program to measure task-switch time. The code of task1 and task2 is identical and very simple: A loop in which rqsleep gets called in every iteration. The iRMX system call rqsleep( sleep_time, return_code_pointer) lets a task put itself to sleep (suspend execution) for sleep_time system clock ticks (in iRMX, the default clock tick is 10 milliseconds long). If sleep_time is 0, the task is not necessarily put to sleep; rather, iRMX will switch execution to any other equal-priority task that is ready to run (which is why this program calls rqsleep).

The return_code_ pointer is the last parameter of every iRMX system call. It points to an unsigned variable in which iRMX places the status of the call. If the status returned is 0, the call has been executed correctly; if this is not so, the status returned is an error or warning code indicating why the call did not execute as it should have.

The call rqgettime ( return_code_pointer) returns the number of seconds that have elapsed since a fixed point in time. It is called twice, to measure the elapsed time (in seconds) between any two points in a program.

The call rqcreatetask ( priority_level, start_address, data_seg, stack_ pointer, stack_size, task_ flags, return_code_pointer) creates a new task, as its name suggests. The first parameter sets the new task's priority between 0 (highest priority) and 255 (lowest priority). The other parameters are either self-explanatory or iRMX programming details. This call returns a task_token that becomes iRMX's identifier for the task. Consequently, the rqdeletetask( task_token, return_code_pointer) call uses task_token to identify which task is to be deleted. If task_token is NULL or 0, the task deletes itself.

The rqgetpriority( task_token, return_code_pointer) call lets a task find out the priority level of any existing task in the system. If task_token is NULL or 0, the priority of the calling task itself is returned.

Finally, the call rqsetpriority( task_token, priority_level, return_code_pointer) lets a task dynamically change its own or any other task's priority. I've used it to set the main task's priority to be lower than those of task1 and task2 so those tasks can run to completion without interference from the main program.

The preemption time (see Figure 3) is the average time for a high-priority task to preempt a running low-priority task. Preemption usually occurs when the high-priority task goes from a suspended or sleeping state to a ready state because either the high-priority task wakes up from a previously initiated sleep, or some event or signal that the task was waiting for is recognized. The first case will likely yield a lower preemption time on most systems, and that value is acceptable for the Rhealstone benchmark.

Listing Two (preempt.c), page 100, measures preemption time in an iRMX II system. task1 (lower priority) merely sits in a delay loop, waiting to be preempted. task2 loops on an rqsleep call. Every time it calls rqsleep a (non-preemptive) switch to task1 takes place. When one sleep period is over, task2 wakes up and preempts task1.

Interrupt latency (see Figure 4) is the average delay between the CPU's receipt of an interrupt request and the execution of the first application-specific instruction in an interrupt-service routine. The time required to execute machine instructions that save the CPU's context (CPU's and coprocessor's data registers, mode registers, and so on) are part of the interrupt latency.

As just defined, interrupt latency reflects only the delay introduced by the executive and the CPU itself. It does not include delays that occur on the system bus or electrical delays in control circuitry external to the computer system because control circuitry is usually specific to the application.

Interrupt latency can be measured under iRMX II on a PC/AT-compatible computer using ltncy.c (Listing Three, page 100) and latch.asm (Listing Four, page 101). The benchmarking technique used here is different from the one I've used for the other Rhealstone components because interrupt latency is about an order of magnitude smaller than those components (under iRMX, hence the need for greater accuracy in measurement. The first difference is that this benchmark involves a C and an assembler program. Secondly, the latency is measured by reading the 8254 timer chip directly (bypassing iRMX). Of course, this makes the benchmark hardware-dependent, which is the major disadvantage of this technique.

The code in ltncy.c sets up a new interrupt vector (pointing to the assembler code in latch.asm) with the rqsetinterrupt( encoded_int_level, int_task_flag, int_handler, int_handler_data_seg, return_code_pointer ) call, reads the timer chip, and simulates a hardware interrupt with an INT causeinterrupt) instruction in software. The processor vectors to latch.asm, which saves context (by pushing CPU registers on the stack) and reads the timer again. The difference in the two timer count values is used to calculate interrupt latency in microseconds.

The call rqresetinterrupt(encoded_interrupt_level, return_code_pointer) merely restores the interrupt vector to its default value (before the rqsetinterrupt call).

Semaphore-shuffle time is the delay/overhead, within the executive, before a task acquires a semaphore that is in the possession of another task when the acquisition request is made. Figure 5 illustrates what is being measured. task2 requests a semaphore that task1 owns. Semaphore-shuffle time is the delay within the executive (excluding the run time of task1 before it relinquishes the semaphore) between task2s request and its receipt of the semaphore.

The objective here is to measure overhead when a semaphore is used to implement mutual exclusion. Many real-time applications involve multiple tasks needing access to the same resource. Semaphore-based mutual exclusion is a convenient way to ensure that the resource is not interrupted by a second task before it finishes an operation started by the first task.

Listing Five (semshuf.c, page 101) shows a program that measures semaphore-shuffle time. task1 and task2 are first executed a fixed number of times, without any semaphore-related calls. Then these tasks are executed again, the same number of times, with a semaphore being shuffled between them at every iteration. The difference in execution time with and without semaphore shuffling is the overhead within the executive. This program uses the following three iRMX system calls associated with semaphores.

The call sem_token = rqcreatesemaphore( initial_value, max_value, queuing_method, return_code_pointer ) creates a new counting semaphore. A counting semaphore can be incremented up to the max_value parameter. Because this program sets max_value to 1, it is using a simple binary semaphore. The parameter queuing_method is set to 0 for a FIFO queuing scheme when more than one task is waiting on the semaphore; a value of 1 would make it a priority-based queue.

The rqsendunits( sem_token, units_sent, return_code_pointer ) call increments the value of the semaphore. In this program, the calling task uses it to relinquish the semaphore.

Finally, the call remaining_units = rqreceiveunits( sem_token, units_requested, max_wait_time, return_code_pointer ) decrements the value of the semaphore. The task makes this call to acquire the semaphore if available. The max_wait_time parameter specifies (in system clock ticks) the period it is willing to wait for the semaphore. A value of 0xffff means "wait forever."

Deadlock-break time is the average time to break a deadlock caused when a high-priority task preempts a low-priority task that is holding a resource the high-priority task needs.

Figure 6 illustrates how a deadlock situation occurs. Task 1 gains control of the resource and is preempted by medium-priority task 2. High-priority task 3 preempts task 2 and requests the resource from the executive at some point. Because task 1 still holds the resource, task 3 is now blocked. At this point, an unsophisticated executive would resume task 2. Task 2 knows nothing about the critical resource and may block higher-priority task 1 indefinitely (a deadlock situation!). A good real-time executive would not resume task 2 here (below the ? in Figure 6). To avoid the deadlock, the executive might temporarily raise task 1's priority to the same level as task 3's, until it relinquishes the resource. In any case, the benchmark measures the delay between task 3's request and its acquisition of the resource, excluding task 1's run time before it relinquishes the resource.

The program deadbrk.c (Listing Six, page 102) measures deadlock-break time in iRMX II. The algorithm is similar to the semaphore-shuffle benchmark. Three tasks of different priorities are executed a fixed number of times without competing for a critical resource. The same tasks are executed again, but this time task1 and task3 both access the same resource, with a potential deadlock situation occurring in each iteration. The difference in total execution time between the two cases is a measure of the deadlock break time.

Access to a critical resource is guarded by an iRMX object called a "region." The region is created by the region_token = rqcreateregion( queuing_method, return_code_pointer ) system call. The queuing_method parameter has the same function here as in the rqcreatesemaphore call. Because only one task should access the critical resource at a time (mutually exclusive access), each task waits at the rqreceivecontrol(region_token, return_code_pointer) call until the region is free. The task must relinquish control with the rqsendcontrol ( return_code_pointer ) call when it has finished with the resource.

Intertask message latency is the latency/delay within the executive when a nonzero-length data message is sent from one task to another (see Figure 7 ). To best measure intertask message latency, the sending task should stop executing immediately after sending the message and the receiving task should be suspended while waiting for it.

The message-passing mechanism must obey two important conditions: First, the intertask message-passing link must be established at run time. (Passing data in a predefined memory area, such as a global variable, is not permitted.) Second, if multiple messages are sent on the same link, the sending task must not be allowed to overwrite an old message with a new one before the receiving task gets a chance to read it. Multitasking executives typically offer mechanisms such as pipes, queues, and stream files for intertask data communications.

The program it_msg.c (Listing Seven, page 104) measures message latency in iRMX II. Data messages are passed between tasks in an iRMX mailbox. The mailbox is created by the mailbox_token = rqcreatemailbox( type_flags, return_cod_pointer ) system call. A task sends a data message to another task by calling rqsenddata( mailbox_token, message_pointer, message_length, return_code_pointer ).

The receiving task calls message_length = rqreceivedata( mailbox_token, receive_buffer, max_wait_time, return_code_pointer ) to receive the message, if available. If not, the task is made to wait until max_wait_time clock ticks have elapsed (if this parameter is 0xffff, the receiving task is willing to wait as long as is necessary).

Note: This Rhealstone component is a modification of "datagram throughput," which was proposed in the original article. The specification of intertask message latency partly reflects reader input (see acknowledgment 1).

Computing a Rhealstone Performance Number

Measurement of all six Rhealstone components yields a set of time values (in the tens of microseconds to milliseconds range, for most PCs). Although the individual measurements are of significance by themselves, it is useful to combine them into a single real-time figure of merit, so overall comparisons between real-time computers can be made. To get a single Rhealstone performance number, the following computational steps are necessary:

1. All Rhealstone component time values should be expressed in the same unit (seconds).

2. The arithmetic mean of the components must be computed.

3. The mean (from step 2) must be arithmetically inverted to obtain a consolidated real-time figure of merit, in Rhealstones/second.

Given the following set of measured values:

task-switch time = t1 seconds preemption time = t2 seconds interrupt latency = t3 seconds semaphore-shuffle time = t4 seconds deadlock-break time = t5 seconds intertask message latency = t6 seconds

the arithmetic average of the Rhealstone components is:

  t' = (t1 + t2 + t3 ..... + t6)/6

and the system's consolidated real-time performance number is:

1/t' Rhealstones/second

Application-Specific Rhealstones

The operational definition of Rhealstones, in the previous section, is generic for any application that may be executed on a real-time computer. It treats all the Rhealstone components as equally important parameters of real-time performance. Generic benchmarks are useful when evaluating real-time system performance without a particular application in mind.

When a real-time computer is "dedicated" to a type of application, the Rhealstone figure can be computed in a way that is appropriate to it. This performance figure is called an "application-specific Rhealstone." It gives unequal weight to different Rhealstone components because the application's performance is not influenced by all of them equally. For example, the application may be heavily interrupt-driven, and the software may not use semaphores at all. The application's designer can compute the application-specific number if he/she knows (or can estimate) the relative frequency of different Rhealstone components in the application.

The steps for computing application-specific Rhealstones are as follows:

1. Measure the individual components (t1 to t6) as before.

2. Estimate the relative frequency of each Rhealstone component's occurrence when the application is executed, and assign nonnegative real coefficients (n1 to n6) proportional to the frequencies. For example, if interrupts occur five times more often than task switches do, and semaphores and intertask communication are not used in the application code, the value of n3 should be three times the value of n1, and n4 and n6 should be set to 0.

3. Compute a weighted average of the Rhealstone components:

t' = (n1*t1 + n2*t2 + n3*t3 + .... +         n6*t6) / (n1 + n2 + n3 ... + n6)

4. Invert the average to get the result 1/t' application-specific Rhealstones/second.

The procedure for computing generic and application-specific Rhealstones outlined in this article is different from that specified in the original proposal. The original procedure specified that each Rhealstone component should be arithmetically inverted separately and the average taken thereafter. I am grateful to the many readers who wrote to point out that, with the previous procedure, a computer system with high performance in one or two components and bad performance in the others would outshine one with moderately good performance in all categories.

The revised algorithm specifies that the components be averaged first and then arithmetically inverted (see acknowledgment 2). This algorithm ensures that if a real-time system shows bad performance in even one category, its overall score will suffer badly. This is intentional, because to guarantee quick response time, moderately good performance is needed in all Rhealstone categories. In other words, a real-time system that takes several seconds to respond to an interrupt will have a low Rhealstone rating, even if it delivers microsecond-range performance when switching context or exchanging semaphores.

Acknowledgment

1. Robert Wilson, GE Huntsville, Alabama, and Ketan Sampat, Intel Hillsboro, Oregon, for their input on measurement of inter-task communication performance.

2. Marco Pellegrino, Siemens AG Munich, Federal Republic of Germany, for suggestions on computation of Rhealstone performance number.

Reader's Rhealstone Recommendations

When the original Rhealstone proposal was put forth more than a year ago, DDJ solicited comments, suggestions, and recommendations from readers. In addition to the individuals Robin has acknowledged in this article, the following readers contributed comments. If any contributors were left off this list, it is unintentional and we apologize -- please let us know who you are. The version of the benchmark presented in this article does not necessarily represent Rhealstone as it will look in years to come. We look forward to your suggestions and recommendations for this version too.

-- Eds.

Mark Smotherman, Clemson University; Glenn Yeager, Applied Integration Management Corp.; Colburn L. Norton, Baytown, Texas; Gary Osborne, Apricot Computers; Jim D. Hart, Papillion, Nebraska; John Morgan, Bellingham, Washington; G. Bruce Lott, Real Time Systems; Michael S. Sossi, Leo Burnett USA; Rudi Borth, Stratford, Ontario; Carol Sigda, Industrial Programming Inc.; Phil Daley; Hillsboro, New Hampshire; Tim Olson, Advanced Micro Devices.

IMPLEMENTING THE RHEALSTONE REAL-TIME BENCHMARK by Rabindra Kar

[LISTING ONE]



/*************************************************************************\
 *  tswit.c  -- iRMX II task switch time measurement.
 *  Compiler: iC-286 V4.1 (LARGE model). Q1 1989, by R. P. Kar
\************************************************************************/

#include <stdio.h>
#include <rmxc.h>

#define MAX_LOOPS 500000L

unsigned       el_time, pri, status;
unsigned long  strt_sec, end_sec;
selector       task1_t, task2_t;
unsigned long  count1, count2;
float          ts_time;

/* "union" used to decompose a pointer into segment:offset */
typedef struct {unsigned offset; selector sel;} ptr_s;
union { unsigned *pointer; ptr_s ptr; } ptr_u;

void task1()
{
  for (count1 = 0; count1 < MAX_LOOPS; count1++)
    rqsleep(0, &status);                    /* Task switch happens here */
  rqdeletetask(NULL, &status);                           /* delete self */
}

void task2()
{
  for (count2 = 0; count2 < MAX_LOOPS; count2++)
    rqsleep(0, &status);                   /* Task switch happens here */
  rqdeletetask(NULL, &status);                          /* delete self */
}

/*************************  MAIN PROGRAM  *************************/

main()
{

printf("\nTask Switch measurement\n   Each task runs %D times...\n\n",
           MAX_LOOPS);

/* Measure execution time of task1 and task2 when they are executed
   serially (without task switching). */
strt_sec = rqgettime(&status);                   /* Start of timing period */

  for (count1 = 0; count1 < MAX_LOOPS; count1++)
    /* rqsleep(0, &status) */ ;
  for (count2 = 0; count2 < MAX_LOOPS; count2++)
    /* rqsleep(0, &status) */ ;
end_sec = rqgettime(&status);                      /* End of timing period */

el_time = (unsigned)(end_sec - strt_sec);

/* Place a pointer to any variable in union "ptr_u", so the data segment
   of this program becomes known. */
ptr_u.pointer = &status;

/* Get main program's priority level */
pri = rqgetpriority (NULL, &status);

/* Create two (identical) tasks, which just switch between themselves */
task1_t = rqcreatetask (pri+1, task1, ptr_u.ptr.sel, 0L, 512, 0, &status);
if (status != 0) printf("rqcreatetask error\n");

task2_t = rqcreatetask (pri+1, task2, ptr_u.ptr.sel, 0L, 512, 0, &status);
strt_sec = rqgettime(&status);                  /* Start of timing period */

/* Set main program's priority below task 1,2 so they run to completion */
rqsetpriority( (selector)0, pri+2, &status );
rqsleep( 0, &status );

end_sec = rqgettime(&status);                     /* End of timing period */

/* Set main program back to initial priority */
rqsetpriority( (selector)0, pri, &status );

el_time = (unsigned)(end_sec - strt_sec) - el_time;
ts_time = ( (float)el_time * 1000000.0 ) / ((float)MAX_LOOPS * 2.0) ;
printf("   Task switch time = %5.1f microseconds\n", ts_time);
dqexit(0);
}

[LISTING TWO]



/*************************************************************************\
 *  preempt.c  -- iRMX II preemption time benchmark.
 *  Measures the time for 1 preemptive task switch + 1 non-preemptive task
 *  switch. Compiler: iC-286 V4.1 (LARGE model). Q4 1989, by R. P. Kar
\*************************************************************************/

#include <stdio.h>
#include <rmxc.h>

/* NOTE: 100,000 iterations takes about 35 minutes on a 16 MHz 386 PC */
#define MAX_LOOPS 100000L

/* Note: This is a CPU-dependent value. It must be set such that
 * the execution time for this loop: for (j=0; j < ONE_TICK; j++) spare++
 * is slightly longer than one iRMX sleep period.
*/
#define ONE_TICK 4200

unsigned       pri, status, i, spare, el_time;
unsigned long  strt_sec, end_sec;
selector       task1_t, task2_t, co_conn;
unsigned long  count1, count2;
float          preempt_time;

/* "union" used to decompose a pointer into segment:offset */
typedef struct {unsigned offset; selector sel;} ptr_s;
union { unsigned *pointer; ptr_s ptr; } ptr_u;

/* The lower priority task. It sits in delay loop waiting to be preempted. */
void task1()
{
  unsigned loc_status;
  for (count1 = 0; count1 < MAX_LOOPS; count1++)
    for (i = 0; i < ONE_TICK; i++) ++spare;              /* Waste time */
  printf("deleting task 1\n\n");
  rqdeletetask(NULL, &loc_status);                       /* delete self */
}

/* The higher priority task. When it goes to sleep (once in every loop) iRMX
 * makes a non-preemptive switch to the other task; when the sleep period ends
 * this task preempts the other task.
 */
void task2()
{
  unsigned loc_status;
  for (count2 = 0; count2 < MAX_LOOPS; count2++)
    /* When rqsleep is called, task switch to lower priority task happens.
     * When 1 clock period is over, other task is preempted and control
     * returns to the next line.
     */
    rqsleep(1, &loc_status);
  printf("\ndeleting task 2\n");
  rqdeletetask(NULL, &loc_status);                 /* delete self */
}

/*************************  MAIN PROGRAM  *************************/
main()
{

printf("\nPreemption time benchmark\n   Each task runs %D times...\n\n",
           MAX_LOOPS);

/* Measure execution time of task1 and task2 when they are executed
 * serially (without task switching or preemption).
 */
strt_sec = rqgettime(&status);                   /* Start of timing period */
  for (count1 = 0; count1 < MAX_LOOPS; count1++)
    for (i = 0; i < ONE_TICK; i++) ++spare;
  for (count2 = 0; count2 < MAX_LOOPS; count2++)
end_sec = rqgettime(&status);                    /* End of timing period */

el_time = (unsigned)(end_sec - strt_sec);

printf("    Execution without premption & task switching took %u seconds\n",
               el_time);

/* Place a pointer to any variable in union "ptr_u", so the data segment
   of this program becomes known.
 */
ptr_u.pointer = &status;

/* Get main program's priority */
pri = rqgetpriority (NULL, &status);

task1_t = rqcreatetask (pri+2, task1, ptr_u.ptr.sel, 0L, 512, 0, &status);
if (status != 0) printf("rqcreatetask error\n");

task2_t = rqcreatetask (pri+1, task2, ptr_u.ptr.sel, 0L, 512, 0, &status);

strt_sec = rqgettime(&status);                  /* Start of timing period */

/* Set main program's priority below task 1,2 so they run to completion */
rqsetpriority( (selector)0, pri+3, &status );
rqsleep( 0, &status );

end_sec = rqgettime(&status);                  /* End of timing period */

/* Set main program back to initial priority */
rqsetpriority( (selector)0, pri, &status );
el_time = (unsigned)(end_sec - strt_sec) - el_time;
preempt_time = ( (float)el_time / (float)MAX_LOOPS ) * 1000000.0;
printf("     Preemption time + task switch time = %5.1f microseconds\n",
                   preempt_time);
dqexit(0);
}

[LISTING THREE]



/**************************************************************************\
 *  ltncy.c -- iRMX II interrupt latency benchmarking program.
 *  Method:  This program first sets up an interrupt handler for an unused
 *    interrupt level. It then reads the count in the system timer (timer
 *    0 on the 8254 chip) and simulates an external interrupt to the CPU by
 *    a cause$interrupt instruction. The interrupt handler latches timer 0,
 *    so this program can read it again after the handler returns control.
 *    The difference in the two timer-count values is the interrupt latency.
 *                    Oct 1989, by R. P. Kar
\**************************************************************************/

#include <stdio.h>
#include <rmxc.h>

/* Define base address of 8254 (Programmable Interval Timer) chip */
#define        PIT_ADDR   0x40

unsigned       status, ticks, timer_cnt1, timer_cnt2;
unsigned       dummy_w;
unsigned char  pri, lo_cnt1, hi_cnt1, lo_cnt2;

extern void int_hndlr();

/***************************   MAIN PROGRAM   **************************/
main ()

{
printf("                   ***  WARNING ***\n\n");
printf("  This program assumes that timer and interrupt controller\n");
printf("  hardware is fully compatible with the IBM PC/AT\n\n");

/* Set up local handler for IRQ3 on master 8259 */
rqsetinterrupt( 0x38, 0, int_hndlr, (selector)0, &status );
disable();                                         /* Disable interrupts */

/* Latch and read timer 0 value. Interrupt handler will latch it again */
outbyte( PIT_ADDR + 3, 0 );

/* The following two instructions read the value latched in counter 0. They
    are unavoidable measurement overhead and inflate the interrupt latency
    by a few clock cycles.
 */
lo_cnt1 = inbyte( PIT_ADDR );
hi_cnt1 = inbyte( PIT_ADDR );

/* Activate the interrupt handler. It will latch timer 0 and return. */
causeinterrupt(59);

/* The interrupt handler has latched the timer 0 count. Now read it. */
lo_cnt2 = inbyte( PIT_ADDR );
dummy_w = (unsigned int)inbyte( PIT_ADDR );
timer_cnt2 = (unsigned int)lo_cnt2 + (dummy_w << 8);

enable();                                        /* Re-enable interrupts */

dummy_w = (unsigned int)hi_cnt1;
timer_cnt1 = (unsigned int)lo_cnt1 + (dummy_w << 8);

/* Calculate difference in timer counts (timer counts DOWN to 0) */
if (timer_cnt1 > timer_cnt2)
    ticks = timer_cnt1 - timer_cnt2;
else                           /* Rare case when timer has wrapped around */
     ticks = timer_cnt1 + (0xffff - timer_cnt2 + 1);

/* Display results */
printf("   Interrupt latency = %u timer ticks\n", ticks);

/* Note that timer is pulsed by 1.19 MHz crystal */
printf("                     = %4.1f microseconds\n\n", ((float)ticks)/1.19 );

rqresetinterrupt( 0x38, &status );
dqexit(0);
}

[LISTING FOUR]




; latch.asm -- Interrupt handler. Merely latches timer 0 in a
; PC/AT (or hardware compatible computer).

      NAME latch

latch SEGMENT PUBLIC

int_hndlr PROC FAR
PUBLIC int_hndlr
  PUSHA
  XOR  AX,AX
  OUT  43H, AL    ; Latch 8254 counter 0
  POPA
  IRET
int_hndlr ENDP
latch ENDS
      END

[LISTING FIVE]



/************************************************************************\
 *  semshuf.c -- iRMX II semaphore shuffle measurement.
 *  Measures the latency (within iRMX) for a task to acquire
 *  a sempahore that is owned by another equal-priority task.
 *  Compiler: Intel iC-286 V4.1 (LARGE model). Q3 1989, by R. P. Kar
\************************************************************************/

#include <stdio.h>
#include <rmxc.h>

#define MAX_LOOPS 100000L

enum     YESNO {NO, YES} sem_exch;
unsigned       el_time, status;
selector       task1_t, task2_t, sem_t;
unsigned char  pri;
unsigned long  count1, count2, maxloop2;
unsigned long  strt_sec, end_sec;
float          semshuf_time;

/* "union" used to decompose a pointer into segment:offset */
typedef struct {unsigned offset; selector sel;} ptr_s;
union { unsigned *pointer; ptr_s ptr; } ptr_u;

void task1()
{
  unsigned rem_units, t1_status;
  for (count1 = 0; count1 < MAX_LOOPS; count1++)
  {
    /* Task waits here until other task relinquishes semaphore */
    if (sem_exch == YES)
      rem_units = rqreceiveunits( sem_t, 1, 0xffff, &t1_status );
    rqsleep(0, &t1_status);
    if (sem_exch == YES)
      rqsendunits( sem_t, 1, &t1_status );
    rqsleep(0, &t1_status);
  }
  rqdeletetask( (selector)0, &t1_status );                 /* delete self */
}

void task2()
{
  unsigned rem_units, t2_status;
  for (count2 = 0; count2 < MAX_LOOPS; count2++)
  {
    /* Task waits here until other task relinquishes semaphore */
    if (sem_exch == YES)
      rem_units = rqreceiveunits( sem_t, 1, 0xffff, &t2_status);
    rqsleep(0, &t2_status);
    if (sem_exch == YES)
      rqsendunits( sem_t, 1, &t2_status );
    rqsleep(0, &t2_status);
  }
  rqdeletetask( (selector)0, &t2_status );                /* delete self */
}

/*************************  MAIN PROGRAM  *************************/
main()
{

printf("\nSemaphore shuffle benchmark\n   %U shuffles...\n\n", MAX_LOOPS*2);

/* Get priority of main program */
pri = rqgetpriority( (selector)0, &status );


/* Create 2 tasks; measure their execution time WITHOUT semaphore shuffling */
 sem_exch = NO;

task1_t = rqcreatetask( pri+1, task1, ptr_u.ptr.sel, 0L, 512, 0, &status );
if (status != 0) printf("Create task error\n");

task2_t = rqcreatetask( pri+1, task2, ptr_u.ptr.sel, 0L, 512, 0, &status );

strt_sec = rqgettime(&status);                /* Start of timing period */

/* Set main program's priority below task 1,2 so they run to completion */
rqsetpriority( (selector)0, pri+2, &status );
rqsleep( 0, &status );

end_sec = rqgettime(&status);                 /* End of timing period */

el_time = (unsigned)(end_sec - strt_sec);
printf("    Execution time without semaphore shuffle = %u secs\n",el_time);

/* Set main() back to original priority level */
rqsetpriority( (selector)0, pri, &status );

sem_t = rqcreatesemaphore( 1, 1, 0, &status );
if (status != 0) printf("Create sem error\n");

/* Re-create 2 tasks. This time they will shuffle semaphore between them. */
sem_exch = YES;

task1_t = rqcreatetask( pri+1, task1, ptr_u.ptr.sel, 0L, 512, 0, &status );
if (status != 0) printf("Create task error\n");

task2_t = rqcreatetask( pri+1, task2, ptr_u.ptr.sel, 0L, 512, 0, &status );

strt_sec = rqgettime(&status);                   /* Start of timing period */

/* Set main program's priority below task 1,2 so they run to completion */
rqsetpriority( (selector)0, pri+2, &status);
rqsleep( 0, &status );

end_sec = rqgettime(&status);                   /* End of timing period */

el_time = (unsigned)(end_sec - strt_sec) - el_time;
printf("    %U semaphore exchanges took %u seconds\n", MAX_LOOPS*2, el_time);

semshuf_time = ( (float)el_time / ((float)MAX_LOOPS * 2.0) ) * 1000000.0;
printf("    ..... %5.1f microseconds per shuffle\n\n", semshuf_time);

dqexit(0);
}

[LISTING SIX]



/************************************************************************\
 *  deadbrk.c -- iRMX II Deadlock break-time measurement.
 *  A low, medium and high priority task is created. Deadlock occurs
 *  when the following chronological sequence happens:
 *    (1) low priority task takes exclusive control of a critical resource
 *    (2) medium or high priority task preempts it.
 *    (3) high priority task requests resource; gets suspended
 *    (4) Medium priority task runs, blocking other two tasks indefinitely
 *  This situation is handled in iRMX by acquiring a "region" before
 *  using critical resource, and relinquishing it after use. This benchmark
 *  measures the overhead involved in "breaking the deadlock".
 *  Compiler: Intel iC-286 V4.1 (LARGE model). Q3 1989, by R. P. Kar
\************************************************************************/

#include <stdio.h>
#include <rmxc.h>

#define MAX_LOOPS 10000
/* Note: This is a CPU-dependent value. It must be set such that the
 * execution time for this loop: for (j=0; j < DELAY; j++) spare++
 * is slightly longer than one iRMX sleep period.
 */
#define DELAY     4000

unsigned       el_time, spare, status;
selector       task1_t, task2_t, task3_t, region_t;
unsigned char  pri;
enum           YESNO {NO, YES} dead_brk;
unsigned long  count1, count2, count3, max_loops;
unsigned long  strt_sec, end_sec;
float          deadbrk_time;

/* "union" used to decompose a pointer into segment:offset */
typedef struct {unsigned offset; selector sel;} ptr_s;
union { unsigned *pointer; ptr_s ptr; } ptr_u;

/* Low priority task */
void task1()
{
  unsigned t1_status, j;
  while (1)
  {
    if (count1 == max_loops)
    { printf("deleting task1\n");
      rqdeletetask( (selector)0, &t1_status );            /* delete self */
    }
    /* Get control over critical region */
    rqreceivecontrol( region_t, &t1_status );

    for (j = 0; j < DELAY; j++) spare++;                   /* delay loop */

    count1++;
    rqsendcontrol( &t1_status );
  }
}

/* Medium priority task. Only uses CPU time and sleep periodically. */
void task2()
{
  unsigned j, t2_status;

  while (1)
  {
    if (count2 == max_loops)
    { printf("deleting task2\n");
      rqdeletetask( (selector)0, &t2_status );           /* delete self */
    }
    for (j = 0; j < DELAY/4; j++) spare++;                /* delay loop */

    rqsleep(1, &t2_status);
    count2++;
  }
}

/* High priority task. Potential deadlock when it tries to gain control
   of the "region" resource, because low-priority task holds region mostly.
 */
void task3()
{
  unsigned t3_status;

  while (1)
  {
    if (count3 == max_loops)
    { printf("deleting task3\n");
      rqdeletetask( (selector)0, &t3_status );            /* delete self */
    }
    rqsleep(1, &t3_status);

    /* Ask for control of the region. Relinquish control immediately after
       receiving it. If task1 is not already holding region, this should
       take very little time. Otherwise, OS must break deadlock.
     */
    if (dead_brk == YES)
    { rqreceivecontrol( region_t, &t3_status );
      rqsendcontrol( &t3_status );
    }

    count3++;
  }
}

/********************** Main program ***********************/
main( argc, argv )

unsigned argc;
char *argv[];

{
if (argc > 1)
  max_loops = (unsigned)atoi(argv[1]);
else max_loops = MAX_LOOPS;

printf("\nDeadlock break time benchmark\n   %U loops...\n\n",max_loops);

/* Get priority of main program */
pri = rqgetpriority( (selector)0, &status );

/* Create three tasks. Task1 has lowest priority, task3 has highest.
 * Measure their execution time WITHOUT deadlocks.
 */
count1 = count2 = count3 = 0;
dead_brk = NO;

task1_t = rqcreatetask( pri+3, task1, ptr_u.ptr.sel, 0L, 512, 0, &status );
if (status != 0) printf("Create task error\n");

task2_t = rqcreatetask( pri+2, task2, ptr_u.ptr.sel, 0L, 512, 0, &status );

task3_t = rqcreatetask( pri+1, task3, ptr_u.ptr.sel, 0L, 512, 0, &status );

strt_sec = rqgettime(&status);                  /* Start of timing period */

/* Set main program's priority below task 1,2,3 so they run to completion */
rqsetpriority( (selector)0, pri+4, &status );
while ( (count1 < max_loops) || (count2 < max_loops) || (count3 < max_loops) )
  rqsleep( 10, &status );

end_sec = rqgettime(&status);                  /* End of timing period */

el_time = (unsigned)(end_sec - strt_sec);
printf("    Execution time without deadlocks = %u secs\n\n",el_time);

/* Set main() back to original priority level */
rqsetpriority( (selector)0, pri, &status );

/* Create a "region". To ensure mutually exclusive access to a critical
   resource a task must acquire the region first */
region_t = rqcreateregion( 1, &status );
if (status != 0) printf("Create region error\n");

count1 = count2 = count3 = 0;
dead_brk = YES;

/* Re-create tasks 1,2,3. Now tasks 1 & 3 will compete for region */
task1_t = rqcreatetask( pri+3, task1, ptr_u.ptr.sel, 0L, 512, 0, &status );
if (status != 0) printf("Create task error\n");

task2_t = rqcreatetask( pri+2, task2, ptr_u.ptr.sel, 0L, 512, 0, &status );

task3_t = rqcreatetask( pri+1, task3, ptr_u.ptr.sel, 0L, 512, 0, &status );

strt_sec = rqgettime(&status);                  /* Start of timing period */

/* Set main program's priority below tasks 1,2,3 so they run to completion */
rqsetpriority( (selector)0, pri+4, &status);
while ( (count1 < max_loops) || (count2 < max_loops) || (count3 < max_loops) )
  rqsleep( 10, &status );

end_sec = rqgettime(&status);                  /* End of timing period */

el_time = (unsigned)(end_sec - strt_sec) - el_time;
printf("    %U deadlock resolutions took %u seconds\n", count3, el_time);

deadbrk_time = ( (float)el_time/(float)count3 ) * 1000000.0;
printf("    ..... %6.1f microseconds per resolution\n\n", deadbrk_time);

dqexit(0);
}

[LISTING SEVEN]



/***********************************************************************\
 *  it_msg.c -- iRMX II inter-task data message latency measurement.
 *  First run the code of two tasks serially (no messages sent). Then
 *  create two tasks and a "mailbox" and measure how much extra time is
 *  needed to send a fixed number of messages from task 1 to task 2.
 *      Compiler: iC-286 V4.1 (LARGE model). Q4 1989, by R. P. Kar
\***********************************************************************/

#include <stdio.h>
#include <rmxc.h>

#define MAX_LOOPS 200000L

unsigned long  strt_sec, end_sec;
selector       task1_t, task2_t, mbox_t;
unsigned       pri, el_time, msg_length, status;
unsigned long  count1, count2;
float          it_msg_time;
char           msg_buf[10] = "MESSAGE\0",
               recv_buf[];

/* "union" used to decompose a pointer into segment:offset */
typedef struct {unsigned offset; selector sel;} ptr_s;
union { unsigned *pointer; ptr_s ptr; } ptr_u;

/* This task sends data messages, to task 2 that is waiting to receive */
void task1()
{
  unsigned loc_status;

  for (count1 = 0; count1 < MAX_LOOPS; count1++)
  {  /* Put a serial # on the message */
     msg_buf[8] = (unsigned char)count1 / 256;
     rqsenddata( mbox_t, msg_buf, 10, &loc_status );
  }
  printf("Task 1 exiting....\n");
  rqdeletetask(NULL, &status);                           /* delete self */
}

/* This task receives the data messages */
void task2()
{
  unsigned loc_status;

  for (count2 = 0; count2 < MAX_LOOPS; count2++)
    msg_length = rqreceivedata( mbox_t, recv_buf, 0xffff, &loc_status );
  printf("    Last message received... %s %u (length %u)\n", recv_buf,
                          (unsigned)recv_buf[8], msg_length );
  rqdeletetask(NULL, &status);                          /* delete self */
}

/***************************  MAIN PROGRAM  ***************************/
/* First parameter to "rqcreatemailbox" ==> data mailbox, FIFO queues */
#define MBOX_FLAG 0x0020

main()
{
printf("   Inter-task message latency measurement\n");
printf("      Sending %D data messages...\n\n", MAX_LOOPS);

/* Set up a mailbox for inter-task data communication */
mbox_t = rqcreatemailbox( MBOX_FLAG, &status );
if (status != 0) printf("rqcreatemailbox error\n");

/* Measure serial execution time of tasks 1,2 (without messages) */

strt_sec = rqgettime(&status);                  /* Start of timing period */
  for (count1 = 0; count1 < MAX_LOOPS; count1++)
  {  /* Put a serial # on the message */
     msg_buf[8] = (unsigned char)count1 / 256;
     /* rqsenddata( mbox_t, msg_buf, 10, &loc_status ); */
  }
  for (count2 = 0; count2 < MAX_LOOPS; count2++)
    /* msg_length = rqreceivedata( mbox_t, recv_buf, 0xffff, &loc_status ) */;
end_sec = rqgettime(&status);                   /* End of timing period */

el_time = (unsigned)(end_sec - strt_sec);

/* Place a pointer to any variable in union "ptr_u", so the data segment
   of this program becomes known.
 */
ptr_u.pointer = &status;

/* Get main program's priority level */
pri = rqgetpriority (NULL, &status);

task1_t = rqcreatetask (pri+2, task1, ptr_u.ptr.sel, 0L, 512, 0, &status);
if (status != 0) printf("rqcreatetask error\n");

/* Task 2 is created with a higher priority than task 1. This ensures that if
 * it is waiting at a mailbox for a message from task 1, it will be scheduled
 * as soon as the message is sent.
 */
task2_t = rqcreatetask (pri+1, task2, ptr_u.ptr.sel, 0L, 512, 0, &status);

strt_sec = rqgettime(&status);                  /* Start of timing period */

/* Set main program's priority below task 1,2 so they run to completion */
rqsetpriority( (selector)0, pri+3, &status );
rqsleep( 0, &status );

end_sec = rqgettime(&status);                  /* End of timing period */

/* Set main program back to initial priority */
rqsetpriority( (selector)0, pri, &status );

el_time = (unsigned)(end_sec - strt_sec) - el_time;

it_msg_time =  ( (float)el_time * 1000000.0 ) / (float)MAX_LOOPS ;
printf("    Inter-task message latency + task switch time = %6.1f microsecs\n",
         it_msg_time);

/* Delete mailbox */
rqdeletemailbox( mbox_t, &status );

dqexit(0);
}