Benchmarking Real-Time Operating Systems

The benchmarks Eric describes give you an intuitive feel for real-time operating-system performance and help you match an operating system to your design requirements.


May 01, 1996
URL:http://www.drdobbs.com/embedded-systems/benchmarking-real-time-operating-systems/184409881


Using a modified Dhrystone to represent application workload

Eric McRae

Eric, an embedded-systems consultant, can be contacted at [email protected] or by telephone at 206-885-4107.


As an embedded-systems consultant, I'm regularly confronted with questions like "Should we use a commercial operating system? If so, which one?" While I always have an opinion, I remain frustrated with the lack of useful, objective data to help answer these questions. While there are numerous test suites for real-time operating systems (RTOS), they generally don't give prospective customers an intuitive feeling for how a product will perform on their hardware. For example, knowing that Vendor A's RTOS can task switch in 26 microseconds on a 60-MHz SuperZotsCPU doesn't tell me a whole lot about whether I can use that RTOS on my 20-MHz ZotsCPU.

Consequently, designers waste time reinventing RTOSs even though they probably would use a commercial RTOS if they understood its benefits and performance drawbacks. I began to study existing benchmark work with the intent of developing a more useful set of metrics. After conducting numerous e-mail, phone, and fax conversations with other engineers and vendor representatives, I came up with the suite of benchmarks described here. I hope that your comments will lead to further refinements.

Target Class

Although they could easily be modified for other targets, these benchmarks are designed to approximate the demands placed on an RTOS by what I call "medium-sized target systems." These systems have the following characteristics:

The tests are intended to characterize those aspects of an RTOS most important to embedded-systems designers:

The Dhrystone Metric

While the Dhrystone benchmark is well known, it is generally not associated with measuring RTOS performance. The standard Dhrystone benchmark (available as file dhry-c at http://www.netlib.org/benchmark/) was designed to be run in an environment with an interactive user interface. The test has three execution phases (initialization, looping, and reporting):

During initialization, the program asks users to enter the desired number of Dhrystones to execute, then records execution start time. During the looping phase, the program executes the specified number of Dhrystone loops. Upon entering the reporting phase, the execution stop time is recorded and the Dhrystone results are computed and displayed based on the time required for execution.

I use a modified version of the Dhrystone benchmark to represent the application workload for an RTOS. The Dhrystone benchmark as modified for RTOS benchmarking does not interact with a user. When the hard-coded initialization is complete, the program commences an endless Dhrystone loop which, instead of decrementing the traditional limit count, will pulse a digital "DhryPass" output. (Interrupts will be disabled before and enabled after the instructions used to pulse the output.) Additionally, the code examines the state of certain global variables and may branch depending on their state. For each test, the RTOS will be restarted with specific values placed in global variable locations. The mcp_main() function in the test suite (see Listing One) will initialize all devices and start up tests appropriate to the benchmark being run.
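
To make that structure concrete, here is a minimal sketch of the modified outer loop, using the DISABLE_INTERRUPTS/ENABLE_INTERRUPTS macros, the dhrypulseV() board function, and the per-test callback mechanism declared in Listing One. The Dhrystone computations themselves are omitted, and the exact placement of the pulse within the pass is my assumption.

/* Sketch only: modified Dhrystone outer loop (computations omitted).
** callbackPF is the per-test callback supplied when the task was started. */
static void dhrystoneLoopV( void (*callbackPF)(void) )
{
    /* ... one-time Dhrystone data initialization ... */
    while( 1 )
    {
        (*callbackPF)();        /* test-specific work: alloc, messages, etc. */
        /* ... one pass of the standard Dhrystone computations ... */
        DISABLE_INTERRUPTS;     /* keep the pulse atomic */
        dhrypulseV();           /* toggle the DhryPass output */
        ENABLE_INTERRUPTS;
    }
}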

The Benchmark Control/Monitor hardware counts pulses on the DhryPass output for arbitrary intervals (see the accompanying text box entitled "Benchmark Laboratory Setup"). The measurement interval can be increased to any reasonable time to gain resolution. The average rate at which the output pulses reflects the speed of the hardware and the time spent executing RTOS code. The average pulse rate drops as the RTOS consumes more processing time.

The key to this approach is that the benchmark results are presented as a percentage of the CPU/hardware resource consumed by the RTOS; for example, "9 percent of the CPU is consumed by managing 20 time-sliced tasks." This ratio approach allows designers using different hardware configurations to better understand the performance to expect on their own platforms. The ratio is determined by running the system with the RTOS disabled to get a baseline and then running tests during which the RTOS must perform certain duties.

RTOSB Baseline

The RTOSB Baseline test establishes the baseline performance of the system. It is used to normalize subsequent test results so they are relatively independent of the hardware used for benchmarking.

The modified Dhrystone loop is allowed to run as a stand-alone function with interrupts disabled and no operating-system calls made. The number of DhryPass transitions is counted over a 20-second period.

The resulting number is the baseline Dhrystone performance of the system. The value is scaled to 100. The scale factor will be applied to the results of subsequent tests so that they can be stated in percentage of the baseline performance. The difference between subsequent results and 100 represents the percentage of the processing resource consumed by the RTOS.
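
As a sketch of that arithmetic (the function names are hypothetical; the actual reduction is performed by the Benchmark Control/Monitor software), the scale factor and the RTOS overhead percentage could be computed as follows:

/* Hypothetical reduction of raw DhryPass counts to percentages. */
static double scaleFactorD;     /* set once from the RTOSB Baseline run */

void setBaselineV( unsigned long baselineCountN )
{
    scaleFactorD = 100.0 / (double)baselineCountN;  /* baseline scales to 100 */
}
double rtosOverheadPctD( unsigned long burdenedCountN )
{
    /* the difference from 100 is the share of the CPU consumed by the RTOS */
    return 100.0 - ( (double)burdenedCountN * scaleFactorD );
}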

The graphical representation of this benchmark is a single-valued result. The value measured here is published only for subsequent verification of results.

It is important that any RTOS benchmark be largely independent of the hardware platform used to measure performance. Since CPU performance in different platforms varies widely, benchmarks that are valid only for a specific target are fairly useless to those who are not using that platform in their projects. By measuring both baseline and burdened performance, a relative metric can be created that will hold true for any system that is not specifically designed for a particular RTOS product.

RTOSB RoundRobin

The RTOSB RoundRobin test determines the CPU resource consumed by the RTOS when it has to manage task switching. This test uses the same benchmark function as the RTOSB Baseline test, but with a periodic interrupt and one or more tasks ready to run at the same priority level. Control switches from task to task once every interrupt. The test will be made with 1, 5, 10, and 20 tasks, each of which will be an instance of the same basic Dhrystone routine. Each task still pulses the same DhryPass output. The DhryPass output could pulse twice very quickly if a task that has just pulsed the output is preempted by a task that immediately pulses the output. The Benchmark Control/Monitor, however, can safely account for transitions occurring 2 microseconds apart, which should be substantially shorter than any task-switch time.

The length of time that a particular task has control of the CPU is determined by the period of the timer interrupt. If the interrupt period is short, the RTOS will consume more processing resources than if the period is long because of the time required to manage task switching. In order to cover a range of nominal timer-interrupt periods, the test described in the next paragraph will be run for periods of 1, 5, 10, 25, and 50 milliseconds.

The results of this benchmark show the processing resource consumed by the RTOS as a function of the number of running tasks and the task-switch rate.

According to the graph in Figure 1, if 7 tasks are being time-sliced with a 10-millisecond timer period, you can assume that this RTOS will consume approximately 5 percent of the available processing resource in order to manage task switching.

This benchmark makes use of round-robin tasking and therefore does not specifically involve priority-management influences. In spite of this, round-robin tasking exercises a significant portion of RTOS task-management code and thus gives a measure of the implementation.

RTOSB Priority

The RTOSB Priority metric reveals the performance resource consumed by the RTOS when it uses priority-based task management.

This test is similar to RTOSB RoundRobin except that, instead of round-robin scheduling, each of the 1, 5, 10, or 20 tasks will be started with a different priority. Tests will be run with timer-interrupt periods of 10 and 20 milliseconds. The timer-interrupt handler will suspend or resume tasks according to the following algorithm:

    1. The initial mode will be "suspend." In this mode, the timer-interrupt handler suspends the highest-priority task in the active set.

    2. When the size of the active set reaches 1, the mode is switched to "resume." In this mode, the timer-interrupt handler resumes the lowest-priority task in the suspended list.

    3. When no tasks remain in the suspended list, the mode returns to suspend.

    4. In the case of one task, the mode will always be resume. This last case is somewhat degenerate in that no suspend or resume calls will be made. However, the results from this case will identify the overhead associated with the RTOS timer-interrupt processing. This will help calibrate the results from the cases of 5, 10, and 20 tasks.

This benchmark shows the processing resource consumed by the RTOS's management of tasks with differing priorities, as a function of the number of tasks being run and the rate of priority change.

According to the graph in Figure 2, if 10 tasks are running, you could assume that this RTOS will consume approximately 7 percent of the available processing resource in order to manage priority switching at a rate of 100 priority changes per second. A similar system that changes task priorities only 10 times per second should consume only 0.7 percent of the CPU.

It is difficult to define a priority-management benchmark that produces universal results since the use and manipulation of task priority can be varied. This benchmark assumes that the overhead of priority management is linearly proportional to the rate at which task priorities are changed. If the expected rate of priority change is known, this benchmark produces data from which linear interpolation or extrapolation may provide a reasonable estimate of the resulting overhead.
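
As an illustration of that interpolation, a hypothetical helper (not part of the benchmark suite) could estimate the overhead at an application's expected priority-change rate from two measured points on the Figure 2 curves:

/* Hypothetical helper: linear interpolation/extrapolation of RTOS overhead.
** rate1D/pct1D and rate2D/pct2D are two measured points (changes per second,
** percent of CPU); targetRateD is the application's expected change rate. */
double estimatePriorityOverheadD( double rate1D, double pct1D,
                                  double rate2D, double pct2D,
                                  double targetRateD )
{
    double slopeD = (pct2D - pct1D) / (rate2D - rate1D);
    return pct1D + slopeD * (targetRateD - rate1D);
}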

RTOSB Semaphore

The RTOSB Semaphore test reveals the performance resource consumed by the RTOS when it has to manage semaphores and occasional priority inversion.

The test uses one normal and one high-priority task. The normal task is free running but it requests and releases a semaphore once during each Dhrystone pass. The high-priority task requests the same semaphore, runs one Dhrystone pass, releases the semaphore, and suspends itself. The timer-interrupt handler will resume the high-priority task. Tests will be run with timer-interrupt periods of 5 and 20 milliseconds and with no timer interrupts. This benchmark shows the processing resource consumed by the RTOS's implementation of semaphores.

For example, Figure 3 shows heavy semaphore usage and occasional priority inversion. You could assume that this RTOS will consume approximately 1 percent of the available processing resource in order to manage 200 inversions/sec.

Semaphore performance is difficult to benchmark because overhead is highly dependent on the state of the system. For this test, I assume that in an average system, semaphore conflicts are rare and they never require more than a single level of priority inversion. Constant requesting of semaphores is probably not the average case. Since semaphore overhead is generally low, this test makes heavy use of them in order to obtain significant result data.

RTOSB Memory

The RTOSB Memory benchmark determines the overhead associated with allocation and deallocation of memory blocks. Many RTOS vendors supply both fixed and variable-sized block management. Where available, this benchmark will be run against both types of allocation.

The basic task load is the same as in RTOSB RoundRobin. The timer interrupt, however, sets global variables that cause the current task to either allocate or free memory. Pointers to allocated memory are maintained in a global allocation table.

To cause allocation of memory, the interrupt handler places the index of an allocation table entry in a global alloc variable. The Dhrystone tasks check this variable at the end of each pass; if it is not -1, they will request the allocation and place the returned pointer in the allocation table. To cause a block to be freed, the timer handler sets a global free variable to the index of the table entry containing the pointer to the block. The Dhrystone tasks also check this variable; if it is not -1, the pointer stored in the allocation table at the given index is passed to the free function.

When the test begins, no blocks are allocated and the mode is alloc. For every timer interrupt, one block will be allocated. When the total number of allocated blocks reaches 64, the mode will be set to free. In free mode, the interrupt handler causes deallocation of blocks from the list using a Gray code counter to supply the index of the block to be freed. (A Gray code counter changes only 1 bit as the counter is incremented or decremented. It still can identify the same number of objects as a straight binary counter. However, when used as a set index, it will "jump around" rather than progressing linearly through the set.) This creates an allocation table with "holes." When the number of allocated blocks drops to 16, the mode will be set back to alloc.
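
The Gray code sequence used here (grayTabAN in Listing One) is the standard reflected binary Gray code, so the table could equally be generated at startup with the usual i ^ (i >> 1) conversion, as in this sketch:

/* Sketch: generate the Gray code index table rather than hard-coding it.
** Produces the same 64 entries as grayTabAN in Listing One. */
#define MAXALLOC 64                 /* as in Listing One */
static int grayTabN[MAXALLOC];

void buildGrayTableV( void )
{
    int i;
    for( i = 0; i < MAXALLOC; i++ )
        grayTabN[i] = i ^ (i >> 1); /* adjacent entries differ in exactly 1 bit */
}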

This test shows the processing resource consumed by the RTOS while managing multiple tasks and memory allocation as a function of the number of tasks and the rate of allocation activity. The actual results are determined by subtracting the results determined in the RTOSB RoundRobin from this benchmark's corresponding test results. This gives a value that reflects the overhead due to memory management.

In Figure 4, if 10 tasks are running, you can assume that this RTOS will consume approximately 10 percent of the available processing resource in order to manage memory allocation at a rate of 100 operations/second.

If we can assume that processing associated with memory management is independent of that associated with priority management, then someone who is doing both can add the results shown in RTOSB Priority and RTOSB Memory to get a reasonable estimate of the CPU resources consumed by the RTOS.

Baseline Interrupt Latency

The Baseline Interrupt Latency test establishes the baseline interrupt latency inherent in the hardware. The results of this test will be used to normalize subsequent interrupt-to-task latency tests. Task switching and other RTOS operations are disabled during this test.

The RTOSB Baseline test is rerun, but with the addition of a high-level interrupt handler that will pulse the MEASURE output. The interrupt-request signal will be asserted by the Benchmark Control/Monitor with a period sweeping from 200 microseconds to 10 milliseconds over a span of 100 seconds, in steps of 2 seconds. For example, during the first 2 seconds, there will be an interrupt every 200 microseconds. For the nth subsequent 2-second period, the interrupt period in microseconds will be (n+1)x200.
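
Stated as code (a hypothetical helper; the actual sweep is generated by the Benchmark Control/Monitor), the period used during each 2-second step of the sweep is:

/* Hypothetical helper: interrupt period for the nth 2-second step of the
** 100-second sweep. stepN = 0..49 yields 200 microseconds up to 10 ms. */
unsigned long sweepPeriodUsN( int stepN )
{
    return (unsigned long)( stepN + 1 ) * 200UL;
}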

The Benchmark Control/Monitor will measure the time from the assertion of the interrupt until the assertion of the MEASURE signal, resulting in minimum, maximum, and average interrupt-response times. These values will provide a baseline against which subsequent interrupt-to-task latency can be compared. The graphical representation of the test result is three scalar values.

It is necessary to determine this baseline performance experimentally even though the interrupt latency is predominantly a function of the particular CPU and instructions used in the target. Variations in the hardware environment supporting the CPU can also have significant effect on interrupt latency. Determining this baseline will assure the hardware independence of the next benchmark, RTOSBInterrupt. In addition, this test will serve as a validation of the Benchmark Control/Monitor test measurements, since the results here should be in general agreement with latencies published by the CPU vendor.

RTOSB Interrupt

The RTOSB Interrupt test measures the overhead incurred when using the RTOS to manage the connection between an interrupt and the routine that must respond to the interrupt.

This test uses the load specified in RTOSB Memory at a fixed timer-interrupt period of one millisecond. In addition, a highest-priority handler task will initially start and then block on a signal. The interrupt service routine (ISR) is activated directly by the interrupt and behaves according to RTOS rules in that it makes use of any required entry and exit wrapper functions. The ISR will assert a signal to the handler function and then exit. The handler function, upon receipt of the signal, will immediately pulse the MEASURE output and reblock, awaiting the next signal.

The time from the assertion of the interrupt-request signal until the assertion of the MEASURE signal by the handler will be recorded. Minimum, maximum, and average times for the series of 131,072 interrupts, as described in the Baseline Interrupt Latency test, will be recorded for task loads of 1, 5, 10, and 20 tasks.

The results of this test are the times required for the RTOS to receive an interrupt and invoke a handler while managing a variety of task loads and memory allocation. These results are tabulated differently in that they are presented as multipliers of the baseline interrupt latency (see Figure 5). If the baseline interrupt latency average is 10 microseconds and the average latency from this test is 35 microseconds, the result would be tabulated as 3.5. The distance between the minimum and maximum curves represents the predictability or determinism of the RTOS's response to an interrupt. When the minimum, average, and maximum are close together, the RTOS's interrupt-response time is very predictable. The latency factors for the three curves are determined by dividing the measured response times by the corresponding baseline response times.

This test produces an approximation of the performance of the RTOS. The minimum and maximum values recorded may not represent the actual best and worst cases, which depend on the synchronization of many small, independent time windows.

Since the baseline latency of the hardware may be quite small, the calculated latency factors may be large. It is still necessary to avoid using the actual response times here to maintain result independence from the hardware test platform. For a baseline interrupt latency of 3.2 microseconds, for example, it may be desirable to add a footnote to a result chart saying, "A latency factor of 30 implied a response time of 96 microseconds on the test platform."

RTOSB Message

The RTOSB Message metric reveals the performance of the RTOS when it has to manage queue-based messaging.

This test uses a fixed set of 10 tasks and timer-interrupt periods of 50, 25, 10, 5, and 1 milliseconds. The tasks all have assigned message queues. Nine tasks are instances of a simple retrieve and forward routine, which blocks until it receives a message and then immediately forwards that message to the next task. The remaining task is the Dhrystone task. It will start and remain in the Dhrystone loop until a global-message flag value changes. The Dhrystone task will then send one or two messages to the first message-forwarding task, depending on the new state of the global-message flag. The Dhrystone task will then block until it has received the same number of messages from its queue.

At each timer interrupt, the global-message flag is toggled between 0 and 1. The total number of messages passed cycles between 10 and 20 during every timer interrupt, corresponding to the current state of the flag. No Dhrystone looping takes place until all messages have passed through all tasks.

This benchmark shows the overhead consumed by the RTOS when managing message origination and retrieval as a function of message rate. In Figure 6, a system expecting to manage 1500 messages/second will consume approximately 10 percent of the CPU resources.

This test forces the RTOS to manage bursts of messages. It is difficult to predict whether an RTOS will behave differently if the messages are dispersed in time. It is assumed that the difference is small.

Summary

These benchmarks provide an intuitive sense of an RTOS's performance. The benchmarks are not designed for comparing different vendors' offerings, and they are not intended to be combined into a single figure of merit. The task of evaluating an RTOS still is complex, but I hope these benchmarks will assist you in matching an RTOS product to your system requirements.

I would like to thank the many people who have shown support for this project. Special thanks to Linda Thompson, Robin Kar, and John Fogelin, each of whom provided significant technical input.

After modifying the current benchmark suite as indicated by feedback from you, I might run the tests against the various RTOS products, create a large-system equivalent of these tests, or create an equivalent for POSIX-compliant RTOSs. I welcome your comments and suggestions.

Benchmark Laboratory Setup

The figure shown here illustrates the components necessary to implement the benchmarks I describe. In general, these include the following:

Test Target System. The target for this test series is any average microcontroller meeting the aforementioned target-class assumptions. Three I/O connections to this system are required. The first is a digital "Measure" output used to indicate that the test system is ready for measurement. The second is an interrupt request (IRQ) input that will be used to invoke asynchronous interrupts. The remaining signal is a DhryPass output. Most microcontroller evaluation boards make suitable test targets.

Benchmark Control/Monitor. An off-the-shelf evaluation board with custom software will be used to provide stimulus and take performance measurements on the RTOS test target. The Benchmark Control/Monitor (BenchMon) consists of a 20-MHz Motorola 68332 Evaluation Board. The MC68332 contains an integral module called the Time Processor Unit (TPU), capable of high-resolution timing and control of discrete I/O signals. Custom TPU microcode developed for this benchmark will be used to measure the time between the assertion of signals and to count the transitions of a signal. Minimum, maximum, and average time values can be calculated with 200-nanosecond resolution over millions of test cycles. Control software running on the BenchMon CPU will interact with the TPU microcode to control the onset of each test and to display results via the serial data port.

In-Circuit Emulator. An in-circuit emulator (ICE) is connected to the target CPU as a suggested part of the test setup. It provides a means of control and visibility into the target without the inclusion of an RTOS monitor task in the system. Also, the trace and overlay features of the emulator facilitate implementing each RTOS and verifying its real-time behavior.

Other Equipment. A workstation capable of compiling target code and operating the ICE and BenchMon program must be available. Oscilloscopes and other test equipment may also be needed.

--E.M.

Figure 7: Benchmark Laboratory Setup

Figure 1: RTOSB RoundRobin.

Figure 2: RTOSB Priority.

Figure 3: RTOSB Semaphore test.

Figure 4: RTOSB Memory test.

Figure 5: RTOSB Interrupts.

Figure 6: RTOSB Message.

Listing One

/* RTOS Benchmark Master Control Program
** The main() function here is invoked as the first user task after the RTOS
** has been configured and started. It is responsible for configuring the
** task set and operation mode based on the global test configuration
** parameters. The Dhrystone code is not included here due to its size. 
** However its flow is modified to be as follows.
**  Initialize data;
**  while( 1 )
**  {
**     Invoke callback routine supplied when this task started;
**     Do normal Dhrystone computations;
**  }
** When a Dhrystone task is started, it is supplied with a callback argument
** pointing to a function in this file.  That callback is invoked at the end
** of every pass through the Dhrystone loop. Some RTOS may be very restrictive
** in what can be done during an interrupt. If task control cannot be exerted 
** during that time parts of this code will have to be restructured.
** Caveat emptor: This code has not yet been tested on any RTOS.
** There are subtle differences between this code and the article text.
** The code is more recent.
*/
/* RTOS specific routines compiled separately */
extern void enableSliceModeV();     /* Enable RoundRobin multitasking */
extern void enableTaskModeV();      /* Enable multitasking */
                /* Install a callback in the timer interrupt */
extern void installTimerHandlerV( void (*callbackPF)(void) );
                /* Start a high priority task */
extern void startHighTaskV( void (*callbackPF)(void) );
extern void waitForSignalV( void ); /* For High Priority latency task */
extern void signalHighTask( void ); /* signal high priority task */
                /* Install interrupt handler */
extern void installIntHandler( void (*callbackPF)(void) );
extern void killSelfV( void );      /* Stop current thread */
extern void suspendTaskV( int );    /* Suspend this task */
extern void suspendSelfV( void );   /* Suspend current task */
extern void resumeTaskV( int );     /* Resume a given task */
extern void createSemaphoreV( void *semaphorePV );/* construct a semaphore */
extern void releaseSemaphoreV( void * );
extern void * requestSemaphorePV( void );
                /* Starts one task at a given priority,
                ** passes it the given callback, rtns ID */
extern int startTaskN( int priorityN, void (*callbackPF)(void) );
                /* Starts countN tasks at same priority */
extern void startRRTasksV(int priorityN, int countN, void (*callbackPF)(void));
                /* Starts countN tasks at different priority */
extern void startTasksV(int priorityN, int countN, void (*callbackPF)(void) );
extern void rtosFreeV( void * );    /* free a block */
extern void *rtosAllocPV( void );   /* allocate a block */
extern void waitMsgV( void );       /* wait for message from someone */
extern void sendMsgV( void );       /* send message to next task */
extern void sendFirstMsgV( void );  /* send message to first task */
/* CPU specific macros */
#define DISABLE_INTERRUPTS      /* Assembler statement for disable */
#define ENABLE_INTERRUPTS       /* Assembler statement for enable */
/* Target board specific functions compiled separately */
extern void resetMeasureV( void );  /* de-asserts the Measure output */
extern void startMeasureV( void );  /* Asserts the Measure output */
extern void dhrypulseV( void );     /* Toggles the Dhrypulse output */
/* Compiler specific definitions */
#define interrupt
#define MAXTASKS 20     /* Maximum number of tasks in list */
#define MAXALLOC 64     /* Maximum number of allocated blocks */
#define MINALLOC 16     /* Minimum number of allocated blocks */
enum TEST
{
    BASELINE =      1,
    ROUNDROBIN =    2,
    PRIORITY =      3,
    SEMAPHORE =     4,
    MEMORY =        5,
    BASEINTLAT =    6,
    INTLATENCY =    7,
    MESSAGE =       8
};
/* Global test configuration values set by emulator at startup */
enum TEST   testN;          /* Determines which test to run */
int taskCountN;             /* Number of tasks (where appropriate) */
int highPriorityIsBig;      /* Selects priority direction for OS */
int basePriN;               /* Starting task priority */
enum STATE
{
    NotStarted = 0,
    Running,
    Suspended
};
enum MODE
{
    Suspend,
    Resume
};
struct TASKLISTENT
{
    int idN;                    /* numeric task ID */
    int priN;                   /* task priority (bigger is higher) */
    enum STATE stateN;          /* Current state */
} taskListAH[MAXTASKS];
static void *semaphorePV;       /* generic semaphore pointer */
static int resumeN;             /* resume flag for semaphore test */
static int taskN;               /* Task ID */
static int allocN, allocFillN;  /* allocation table indices */
static int messageCountN;       /* determines when and how many */
static void *allocAP[MAXALLOC]; /* Allocation table */
/* Gray Decode Table */
static int const grayTabAN[MAXALLOC] = 
{
     0,  1,  3,  2,  6,  7,  5,  4, 12, 13, 15, 14, 10, 11,  9,  8,
     24, 25, 27, 26, 30, 31, 29, 28, 20, 21, 23, 22, 18, 19, 17, 16,
     48, 49, 51, 50, 54, 55, 53, 52, 60, 61, 63, 62, 58, 59, 57, 56,
     40, 41, 43, 42, 46, 47, 45, 44, 36, 37, 39, 38, 34, 35, 33, 32
};
/* Function declarations */
static void performanceHandlerV( void );
static void nullCB( void );
static void semaSuspendV( void );
static void semaphoreV( void );
static void semaphoreHandlerV( void );
static void memoryHandlerV( void );
static void memoryCB( void );
static void interrupt intHandlerV( void );
static void interrupt latencyHandlerV( void );
static void latencyTaskV( void );
static void waitSendMsgV( void );
static void messageCB( void );
static void messageHandlerV( void );
/* The entry point for the first user task */
void mcp_main( void )
{
    int i;
    resetMeasureV();            /* Reset Measure output port */
    switch(testN)
    {
    case BASELINE:
    startTaskN( basePriN, nullCB);  /* Start 1 task, minimal callback */
    DISABLE_INTERRUPTS;             /* No more interrupts */
    startMeasureV();                /* Begin measurement */
    break;
    case ROUNDROBIN:            /* Run with various timer tick rates */
    enableSliceModeV();         /* Enable RR multitasking */
                                /* Start countN tasks, min. callback */
    startRRTasksV(basePriN, taskCountN, nullCB);
    startMeasureV();            /* Begin measurement */
    break;
    case PRIORITY:
    enableTaskModeV();      /* Enable multitasking */
                            /* Start countN tasks, min. callback */
    startTasksV(basePriN, taskCountN, nullCB);
                           /* Install routine in timer int. handler */
    installTimerHandlerV( performanceHandlerV );
    startMeasureV();       /* Begin measurement */
    break;
    case SEMAPHORE:
    createSemaphoreV( &semaphorePV );   /* construct a semaphore */
                            /* Start a normal task that uses a semaphore */
    enableTaskModeV();      /* Enable multitasking */
    startTaskN( basePriN, semaphoreV );
                /* Then start a higher priority task that suspends itself
                ** and then resumes after every timer tick. */
    resumeN = 0;        /* prevent next task from going far */
    if( highPriorityIsBig )
        taskN = startTaskN( basePriN + 1, semaSuspendV );
    else
        taskN = startTaskN( basePriN - 1, semaSuspendV );
    installTimerHandlerV( semaphoreHandlerV );
    startMeasureV();        /* Begin measurement */
    break; 
    case MEMORY:
    for( i = 0; i < MAXALLOC; i++ )
        allocAP[i] = 0;     /* clear allocation table */
    allocFillN = 0;         /* initialize index */
    allocN = -1;            /* No allocations yet */
    enableSliceModeV();     /* Enable RR multitasking */
    startRRTasksV(basePriN, taskCountN, memoryCB);
    installTimerHandlerV( memoryHandlerV );
    startMeasureV();        /* Begin measurement */
    break;
    case BASEINTLAT:
    startTaskN( basePriN, nullCB );
    installIntHandler( intHandlerV );   /* Handler pulses Measure */
    break;
    case INTLATENCY:
    for( i = 0; i < MAXALLOC; i++ )
        allocAP[i] = 0;     /* clear allocation table */
    allocFillN = 0;         /* initialize index */
    allocN = -1;            /* No allocations yet */
    enableSliceModeV();     /* Enable RR multitasking */
    startRRTasksV(basePriN, taskCountN, memoryCB);
    installTimerHandlerV( memoryHandlerV );
    /* At this point we have a good system load.  Now
    ** set up the high priority task and the interrupt handler. */
    startHighTaskV( latencyTaskV );
    installIntHandler( latencyHandlerV );
    break;
    case MESSAGE:
    startRRTasksV(basePriN, 9, waitSendMsgV );  /* start msg tasks */
                          /* start Dhrystone/initiator task */
    startRRTasksV(basePriN, 1, messageCB );
    installTimerHandlerV( messageHandlerV );    /* message trigger */
    break;
    default:
    break;
    }
    killSelfV();        /* Remove this thread */
}
/* Function:    addNewTaskV
** Purpose: Callback from RTOS specific task start routines. This function
** updates the active task list.
*/
void addNewTaskV( int idN, int priorityN )
{
    static int startedTasksN = 0;
    taskListAH[startedTasksN].idN = idN;
    taskListAH[startedTasksN].priN = priorityN;
    taskListAH[startedTasksN].stateN = Running;
    startedTasksN++;
}
/* Function:    performanceHandlerV
** Purpose: Called from the RTOS timer interrupt code.  This routine
**      suspends and resumes tasks in the task list.
*/
void performanceHandlerV( void )
{
    static enum MODE modeN = Suspend;
    int i, theTaskN, itsPriN, tasksN;
    if( taskCountN == 1 ) return;    /* don't do anything if just 1 task */
    tasksN = taskCountN;
    if( modeN == Suspend )
    {   /* search for the highest priority task that is still running */
    if( highPriorityIsBig ) itsPriN = 0;
    else itsPriN = 32767;
    for( i = 0; i < taskCountN; i++ )
    {
        if( taskListAH[i].stateN == Running )
        {       /* if the task is running, check its priority */
        if( highPriorityIsBig )
        {
            if( taskListAH[i].priN > itsPriN )
            {
            itsPriN = taskListAH[i].priN;
            theTaskN = i;   /* remember which task is highest */
            }
        }
        else
        {           /* Low numbers are higher priority */
            if( taskListAH[i].priN < itsPriN )
            {
            itsPriN = taskListAH[i].priN;
            theTaskN = i;   /* remember which task is highest */
            }
        }
        }
    }
    suspendTaskV( taskListAH[theTaskN].idN );    /* suspend highest task */
    taskListAH[theTaskN].stateN = Suspended;
    tasksN--;
    if( tasksN == 1 ) modeN = Resume;
    }                /* End of mode == suspend */
    else             /* Mode must be Resume */
    {
    if( highPriorityIsBig ) itsPriN = 32767;
    else itsPriN = 0;
    tasksN = 0;
    for( i = 0; i < taskCountN; i++ )
    {
        if( taskListAH[i].stateN == Suspended )
        {       /* if the task is suspended, check its priority */
        if( highPriorityIsBig )
        {
            if( taskListAH[i].priN < itsPriN )
            {
            itsPriN = taskListAH[i].priN;
            theTaskN = i;   /* remember which task is lowest */
            }
        }
        else
        {           /* Low numbers are higher priority */
            if( taskListAH[i].priN > itsPriN )
            {
            itsPriN = taskListAH[i].priN;
            theTaskN = i;   /* remember which task is lowest */
            }
        }
        }
        else tasksN++;  /* Count number of running tasks */
    }
    resumeTaskV( taskListAH[theTaskN].idN ); /* resume the task */
    taskListAH[theTaskN].stateN = Running;   /* mark it running again */
    tasksN++;
    if( tasksN == taskCountN )      /* If all tasks are now running */
        modeN = Suspend;            /* Switch back to suspend mode */
    }
}
/* Function:    nullCB
** Purpose: Do nothing call back
*/
void nullCB( void )
{
}
/* Function:    semaphoreV
** Purpose: Callback for normal priority semaphore task. Releases semaphore
**   (if owned), then grabs it right back. If a higher priority task has 
**   requested it, then I guess we won't come right back from the release.
*/
static void semaphoreV( void )
{
    static int haveIt = 0;
    if( haveIt )    /* No semaphore to release the first time */
    releaseSemaphoreV( semaphorePV );
    semaphorePV = requestSemaphorePV();
    haveIt = 1;
}
/* Function:    semaSuspendV
** Purpose: Callback for high priority semaphore task. Releases semaphore 
**   (if owned) and then suspends itself. When resumed, it requests semaphore.
*/
static void semaSuspendV( void )
{
    static int haveIt = 0;
    if( haveIt )
    {
    releaseSemaphoreV( semaphorePV );
    suspendSelfV();
    }
    semaphorePV = requestSemaphorePV();
    haveIt = 1;
}
/* Function:    semaphoreHandlerV
** Purpose: Called during a timer interrupt.  Resumes the desired task. */
static void semaphoreHandlerV( void )
{
    resumeTaskV( taskN );
}
/* Function:    memoryHandlerV
** Purpose: Invoked from the timer interrupt handler.  This routine places an
**   index of an allocation table entry in a location that can be read by the 
**   memory callback routine. This routine has alloc and free modes. In alloc 
**   mode, it finds the index of the first non-allocated entry.  In free mode,
**   it uses a Gray code hash to get the index of an entry to be freed.
*/
static void memoryHandlerV( void )
{
    static int freeModeN = 0;
    int i;
    if( ! freeModeN )
    {               /* If allocating blocks */
    for( i = 0; allocAP[i]; i++ ) ; /* Scan for unallocated slot */
    allocN = i;     /* set this for callback routine */
    allocFillN++;       /* track number of allocated blocks */
    if( allocFillN == MAXALLOC )
        freeModeN = 1;
    }
    else
    {           /* Must be freeing blocks, use Gray code index */
    allocN = grayTabAN[--allocFillN];
    if( allocFillN == MINALLOC )    /* if we hit lower allocation limit */
        freeModeN = 0;
    }
}
/* Function:    memoryCB
** Purpose: Called from the dhrystone loop. Looks at the allocAP[allocN] entry
**   and requests an alloc or free depending on whether the value is 0.
*/
static void memoryCB( void )
{
    if( allocAP[allocN] )
    {               /* if this entry is allocated */
    rtosFreeV( allocAP[allocN] );
    allocAP[allocN] = 0;
    }
    else            /* entry was null, assign it a block */
    allocAP[allocN] = rtosAllocPV();
}
/* Function:    intHandlerV
** Purpose: invoked directly by the discrete interrupt input
**      Used for establishing baseline interrupt latency
*/
static void interrupt intHandlerV( void )
{
    startMeasureV();    /* Asserts the Measure output */
    resetMeasureV();    /* de-asserts the Measure output */
}
/* Function:    latencyHandlerV
** Purpose: invoked directly by the discrete interrupt input. Used for 
**  establishing baseline interrupt latency. Signals task waiting for interrupt
*/
static void interrupt latencyHandlerV( void )
{
    signalHighTask();   /* assert signal to waiting task */
}
/* Function:    latencyTaskV
** Purpose: Runs as a thread, waits on a signal and then pulses the
**      measure output. The test measures the time from the onset
**      of the interrupt to the assertion of the measure output.
*/
static void latencyTaskV( void )
{
    while( 1 )
    {
    waitForSignalV();   /* Hang here waiting for interrupt */
    startMeasureV();    /* Asserts the Measure output */
    resetMeasureV();    /* de-asserts the Measure output */
    }
}
/* Function:    waitSendMsgV
** Purpose: Self-contained thread that lives for message exchange. The RTOS-
**   specific message functions determine the recipient of each message sent,
**   such that all running tasks send and receive messages. Started like a
**   normal Dhrystone task but never returns; does not contribute to the
**   Dhrystone rate.
*/
static void waitSendMsgV( void )
{
    while( 1 )
    {
    waitMsgV();     /* wait for message from someone */
    sendMsgV();     /* send message to next task */
    }
}
/* Function:    messageCB
** Purpose: Called from the dhrystone loop.  If the messageCountN
**      value has changed, initiate a message loop.
*/
static void messageCB( void )
{
    static int lastCountN = 0;
    if( messageCountN != lastCountN )
    {           /* if it's time to do some messages */
    if( (lastCountN = messageCountN) == 2 )
    {       /* if sending/receiving 2 messages */
        sendFirstMsgV();
        sendFirstMsgV();
        waitMsgV();
        waitMsgV();
    }
    else
    {   /* just sending one message */
        sendFirstMsgV();
        waitMsgV();
    }
    }
}
/* Function:    messageHandlerV
** Purpose: Called from the timer interrupt handler.  Bounces the global
**      message counter between 1 and 2.
*/
static void messageHandlerV( void )
{
    if( ++messageCountN == 3 )
    messageCountN = 1;
}
