Examining Windows CE 3.0 Real-Time Capabilities

Dec01: Examining Windows CE 3.0 Real-Time Capabilities

Bart is a project manager at Dedicated Systems Experts. He can be contacted at b.van.beneden@dedicated-systems.com.


What does the term "real time" truly mean? When can an operating system be deemed an RTOS? All too often, real-time behavior is associated with raw speed. Popular conclusions are that the faster a system responds or processes data, the more real time it is. Or vice versa, a "slow" system could never be real time. Statements like these are incorrect. The key issues in a real-time system are predictability and reliability. What is important to know is how a system will respond, and how long at most it will take to do so. Such features are essential in industrial applications such as process control systems, aerospace, and the like.

These were some of the issues we faced at Dedicated Systems Experts, a company specializing in the verification and validation of real-time systems, when Microsoft commissioned us to provide an independent assessment of the real-time performance of the preview release of Windows CE 3.0. We'd like to think that Microsoft, after accepting our report, incorporated our findings into the final OS.

Since the production version of CE 3.0 has been available for a while, we decided to apply our standard real-time operating system (RTOS) test suite to the system, resulting in a detailed evaluation report. The test suite covers thread handling, advanced interrupt handling (simultaneous and nested interrupts), synchronization mechanisms, file system and network stack performance, as well as various stress tests that monitor the system for memory leaks and performance degradation under loaded conditions.

In this article, I summarize some key elements and findings of our report. (The complete report is commercially available at http://www.dedicated-systems.com/encyc/buyersguide/rtos/rtosmenu.htm.)

Real Time

A system is dubbed "hard" real time when deadline misses may result in a catastrophe — a fly-by-wire system, for example. In a "soft" real-time system, on the other hand, consequences are less severe, and are more often than not a business issue. Just think of a laser printer where a deadline miss may cause blank or only partially printed lines. Obviously, such an error would be unacceptable if it occurred every five pages or so. However, would it be significant if it only affected one out of 10,000 pages?

So how can you be certain that a particular deadline will never be missed? This requires a system with a response time that never exceeds a defined maximum. No matter what the average response time is, all that really matters in deadline management is the maximum response time. Consider a test where a thread is repeatedly created and deleted, thousands of times. Figure 1 shows that Windows CE 3.0 takes an average of about 100 µs to create a thread. However, on two occasions, which only constitute 0.01 percent of the samples, it takes longer than 1 s to create this thread. The two samples that represent the maximum latency deserve far more attention from the real-time system designer than all the other samples combined.

An ideal system is one where average and maximum response times are the same. This is not at all the case in Figure 1. Luckily, the thread creation latency is not that important in this context because it is not a good design practice to dynamically create/delete threads in the real-time part of an application.
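The difference between average and worst-case behavior can be made concrete with a small helper. The following is only a sketch, not part of our test suite: the names `latency_summary` and `summarize_latency` are hypothetical, and the samples are assumed to be already converted to microseconds. It reports the mean, the maximum, and the number of samples that exceed a given deadline.

```c
#include <assert.h>
#include <stddef.h>

/* Summary of a set of latency samples (all values in microseconds).
   Hypothetical type, introduced for illustration only. */
typedef struct {
    double mean_us;        /* average latency */
    double max_us;         /* worst-case latency */
    size_t over_deadline;  /* number of samples above the deadline */
} latency_summary;

/* Compute mean, maximum, and number of deadline misses for n samples. */
latency_summary summarize_latency(const double *samples_us, size_t n,
                                  double deadline_us)
{
    latency_summary s = { 0.0, 0.0, 0 };
    double sum = 0.0;
    size_t i;

    assert(samples_us != NULL && n > 0);
    for (i = 0; i < n; i++) {
        sum += samples_us[i];
        if (samples_us[i] > s.max_us)
            s.max_us = samples_us[i];
        if (samples_us[i] > deadline_us)
            s.over_deadline++;
    }
    s.mean_us = sum / (double)n;
    return s;
}
```

A handful of outliers barely moves `mean_us`, yet they alone determine `max_us` and `over_deadline`, which are the figures a real-time designer has to budget for.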

Real Time and Windows CE

With Windows CE 2.x, Microsoft made its first attempt to enter the RTOS market. However, the OS was never completely accepted because of its shortcomings. It suffered from memory leaks, didn't have nearly enough thread priority levels, did not support nested interrupts, and lacked some basic synchronization primitives (such as semaphores).

All these issues were addressed in the latest version of the system. Compared to its predecessor, CE 3.0 is a new operating system.

Measurement Method

The most popular way to measure real-time system performance is by using the software timers provided by the RTOS. However, there are several reasons why this method is not ideal:

  • Timers in different RTOSs do not necessarily have the same resolution, nor do they always have the necessary precision.
  • The RTOS is responsible for both running the tests and making timing measurements. This adds overhead that is likely to render the results unreliable.

To avoid such problems, we used external equipment. Time intervals are measured using a PCI bus analyzer. When a certain system operation needs to be timed, it is preceded and followed in the test source code by an operation that writes a trace to a particular PCI address. Each trace written to that PCI address is captured and timestamped by the analyzer's high-precision clock. Hence, the time difference between these two traces is the execution time of the system operation. This method allows for a 100-ns resolution on the measurements. Each test loops a minimum of 1000 times, generating sufficient samples for analysis using statistical tools.
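As an illustration of how such a capture could be post-processed, the sketch below pairs each "before" trace with the next "after" trace and converts the difference to nanoseconds. The record layout (`trace_record`), the function name `intervals_ns`, and the assumption that the analyzer reports timestamps as counts of 100-ns ticks are all hypothetical; the actual export format of the analyzer is not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TICK_NS 100u  /* measurement resolution stated in the text: 100 ns */

/* One captured trace: the value written to the PCI address and the
   analyzer timestamp, assumed here to be a count of 100-ns ticks. */
typedef struct {
    uint32_t value;
    uint64_t ticks;
} trace_record;

/* Pair each "before" trace with the next "after" trace and store the
   elapsed time in nanoseconds. Assumes before != after. Returns the
   number of intervals written to out (at most out_cap). */
size_t intervals_ns(const trace_record *records, size_t n,
                    uint32_t before, uint32_t after,
                    uint64_t *out, size_t out_cap)
{
    size_t i, count = 0;
    int have_start = 0;
    uint64_t start = 0;

    assert(records != NULL && out != NULL);
    for (i = 0; i < n && count < out_cap; i++) {
        if (records[i].value == before) {
            start = records[i].ticks;      /* operation begins */
            have_start = 1;
        } else if (records[i].value == after && have_start) {
            out[count++] = (records[i].ticks - start) * TICK_NS;
            have_start = 0;                /* operation ends */
        }
        /* Any other trace value is ignored. */
    }
    return count;
}
```

Because the timestamps come from the analyzer's clock rather than from the system under test, the only overhead added to the measured operation is the pair of PCI writes.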

The system is also stimulated by means of an external device, the PCI bus exerciser. Exercisers are programmable, standalone devices that generate carefully controlled interrupt sequences while operating independently of, and asynchronously with, the system under test. This provides the necessary flexibility to simulate realistic load conditions and peripherals.

Every RTOS we evaluate is submitted to the same test suite and is tested on the same platform — an Intel Pentium 200-MHz MMX-based PC with a Chaintech motherboard. This allows for a correct comparison between different RTOSs.

Interrupt Handling

The test suite includes a host of interrupt handling tests. The basic interrupt test generates a stream of more or less periodic interrupts and measures the interrupt latency. This latency is the time the system needs to respond to an external event (in this test case, the event is the interrupt generated by the PCI exerciser, but in a field application, this could be the signal from a sensor or another peripheral). Industrial applications require this latency to be finite and never exceed a defined maximum. Predictability is crucial.

Figure 2 displays the result for the interrupt latency test and compares it with those of VxWorks/x86 5.3.1 and pSOSystem/x86 2.2.6 from Wind River Systems, and QNX 6.0 from QNX Software Systems. This test clearly shows that Windows CE's maximum interrupt latency remains under control at all times.

The evaluation report also covers more sophisticated interrupt handling tests involving interrupts with different priority levels being generated by two independent interrupt sources. This was accomplished by adding a second PCI exerciser to the test system.

In the first test of this kind, both exercisers were programmed to generate interrupts nearly simultaneously (less than 300-ns delay between both interrupts). The test measures the time it takes to service both interrupts. CE 3.0 handled this test very well; the system never took longer than 11 µs to service both interrupts.

The next step is to verify that interrupts can be nested and that the handling is prioritized. This means the exercisers were programmed to generate the low-priority interrupt first, with its high-priority counterpart following shortly thereafter. During the test, the time interval between both interrupts was bumped up from 1.5 µs to 7.5 µs, in 1.5-µs increments. The results clearly showed that the RTOS is capable of handling nested interrupts in a prioritized way, which is an absolute necessity for real-time operating systems.

Thread Handling

Windows CE 3.0 is a multiprocess and multithread system. A thread represents a path of execution in a process. Every time the OS creates a process, it creates at least one thread for it. To make the system as robust as possible, Microsoft opted to let every process run in its own virtual memory space. This makes it impossible for faulty applications to compromise system stability by accidentally writing in another component's address space. A thread, on the other hand, shares all of its process's resources, including the address space, with the other threads in the same process. This allows for fast interthread communication.

A basic test is to measure the thread switch latency; this is the time it takes to switch from one thread to another. The fact that nearly every real-time application is multithreaded makes the thread switch latency an important piece of information to system designers. It is of the utmost importance that not too much time is lost in switching to a high-priority task when it is ready to start or resume execution.

Figure 3 shows the results of the thread switch latency test executed with 10 threads of equal priority belonging to the same process. As soon as one of these threads becomes active during the test, it yields the processor so the next thread in the ready queue will take over. Two important conclusions were drawn from this test:

  • The thread switch latency is independent of the number of threads being switched between. This was verified by repeating the same test with two and 128 threads, and the results were similar. This is the way it should be in an RTOS. In RT-Linux 2.2 (from FSMLabs), for example, we observed a thread switch latency that was directly proportional to the number of threads being used. This behavior is caused by an oversimplified (linear) ready-queue structure and is unacceptable in a real-time system.
  • Again, the maximum (worst case) latency is far more important than the average in a real-time system. The maximum thread switch latency of 32 µs only occurred in the first sequence of the test; that is, when switching to a thread that becomes active for the very first time.
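To show why the ready-queue structure matters, here is a minimal sketch of a queue whose selection cost depends on the number of priority levels rather than on the number of threads. It is not the Windows CE scheduler, whose internals are not described in this article; the type and function names (`rq_thread`, `ready_queue`, `rq_push`, `rq_pop`) and the choice of 32 levels are assumptions made for illustration. Each level keeps its own FIFO, and a bitmap records which levels are non-empty.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_LEVELS 32   /* hypothetical; 0 is the highest priority */

typedef struct rq_thread {
    int id;
    int priority;               /* 0 .. NUM_LEVELS-1 */
    struct rq_thread *next;     /* next thread at the same priority */
} rq_thread;

typedef struct {
    uint32_t nonempty;          /* bit p set => level p has threads */
    rq_thread *head[NUM_LEVELS];
    rq_thread *tail[NUM_LEVELS];
} ready_queue;

void rq_init(ready_queue *q)
{
    int p;
    q->nonempty = 0;
    for (p = 0; p < NUM_LEVELS; p++)
        q->head[p] = q->tail[p] = NULL;
}

/* Append a thread to the FIFO of its priority level: constant time. */
void rq_push(ready_queue *q, rq_thread *t)
{
    int p = t->priority;
    assert(p >= 0 && p < NUM_LEVELS);
    t->next = NULL;
    if (q->tail[p])
        q->tail[p]->next = t;
    else
        q->head[p] = t;
    q->tail[p] = t;
    q->nonempty |= (uint32_t)1 << p;
}

/* Remove and return the first thread of the highest non-empty level.
   The scan is bounded by NUM_LEVELS, never by the number of threads. */
rq_thread *rq_pop(ready_queue *q)
{
    int p;
    rq_thread *t;

    if (q->nonempty == 0)
        return NULL;
    for (p = 0; !(q->nonempty & ((uint32_t)1 << p)); p++)
        ;
    t = q->head[p];
    q->head[p] = t->next;
    if (q->head[p] == NULL) {
        q->tail[p] = NULL;
        q->nonempty &= ~((uint32_t)1 << p);
    }
    t->next = NULL;
    return t;
}
```

A yield in this model is an `rq_pop` followed by an `rq_push` of the same thread, which takes the same time with two threads as with 128. A single linear list searched on every switch would instead grow with the thread count, which is the behavior criticized above.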

Priority Inversion Recovery

To synchronize the access of different tasks to a shared, common resource, an RTOS provides synchronization primitives. A mutex (MUTual EXclusion) is an example of such a synchronization object. On some occasions when a resource is shared between three tasks, a situation called "priority inversion" can occur. Simply stated, priority inversion means that a high-priority task has to wait for the completion of a lower priority task before it can resume execution.

Although it is often claimed that priority inversion can be avoided by carefully designing your application, an RTOS should nevertheless have the capability to recover from such a situation should it occur (there is no such thing as a perfect design). Remember that it was a priority-inversion problem that caused the Mars Pathfinder lander to reset itself repeatedly during its 1997 mission.

Windows CE 3.0 has implemented the priority-inheritance mechanism to recover from such a situation. We tested this by creating a situation with three threads where the priority-inversion problem occurs: A high-priority thread (thread A) wants to acquire a mutex that is owned by a low-priority thread (thread C). A medium-priority thread (thread B), however, keeps thread C from running and releasing the mutex, so thread A can't acquire it and continue with its work.

Priority inheritance implies that when the priority-inversion situation is detected, the system boosts thread C's priority level to that of thread A, so it can continue its work and release the mutex. As soon as this is accomplished, the system restores the thread's priority level to its original (low) value so thread A can resume execution and acquire the mutex. Problem solved.
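The scheduling decision described above can be modeled in a few lines. This is a simplified, single-level model for illustration only, not how the CE kernel implements inheritance; `sim_thread`, `effective_priority`, and `pick_next` are hypothetical names. As in Listing One, a lower number means a higher priority.

```c
#include <assert.h>
#include <stddef.h>

/* Lower number = higher priority, as in Listing One. */
typedef struct sim_thread {
    const char *name;
    int base_priority;
    int ready;                          /* 1 = runnable, 0 = blocked */
    const struct sim_thread *waits_for; /* owner of the mutex this thread
                                           is blocked on, or NULL */
} sim_thread;

/* Effective priority of t. With inheritance enabled, t runs at the best
   priority among the threads blocked on a mutex it owns (one level only;
   chains of owners are not followed in this sketch). */
int effective_priority(const sim_thread *t, const sim_thread *all,
                       size_t n, int inherit)
{
    int prio = t->base_priority;
    size_t i;

    if (!inherit)
        return prio;
    for (i = 0; i < n; i++)
        if (all[i].waits_for == t && all[i].base_priority < prio)
            prio = all[i].base_priority;
    return prio;
}

/* Pick the runnable thread with the best effective priority. */
const sim_thread *pick_next(const sim_thread *all, size_t n, int inherit)
{
    const sim_thread *best = NULL;
    int best_prio = 0;
    size_t i;

    assert(all != NULL);
    for (i = 0; i < n; i++) {
        int p;
        if (!all[i].ready)
            continue;
        p = effective_priority(&all[i], all, n, inherit);
        if (best == NULL || p < best_prio) {
            best = &all[i];
            best_prio = p;
        }
    }
    return best;
}
```

With thread A (priority 0) blocked on a mutex owned by thread C (priority 10) and thread B (priority 5) runnable, `pick_next` returns B when `inherit` is 0, so C never gets to release the mutex. With `inherit` set to 1, C runs at A's priority and is selected ahead of B.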

In this test (see Listing One), we measured the time it takes for the highest priority thread to acquire the mutex. This includes the time it takes to boost the priority of the lowest priority thread, have it release the mutex, and switch back to the highest priority thread so it can acquire the mutex. Figure 4 shows the test results. The results of VxWorks/x86 5.3.1 are included for the sake of comparison.

The User Experience

The user friendliness of Windows CE can be looked at from two different perspectives. On the one hand, there is the end user who will be familiar with the typical Windows GUI. Things won't even be that different for the application developers, since Windows CE uses a subset of the well-known Win32 API.

On the other hand, the OEM's mission is to build a custom CE-based platform for its devices. To accomplish this, Microsoft developed a tool called "Platform Builder." Although Platform Builder is a graphical tool, the majority of the configuration work is still done by manually editing registry files, manipulating environment variables, and modifying other configuration scripts. Configuring Windows CE 3.0 is certainly not a trivial task, even more so because the standard documentation does not provide a clear and structured overview.

Conclusion

We found that Windows CE 3.0 did indeed exhibit real-time behavior during our evaluation. The system keeps responding and performing predictably, independent of the system load. Stress tests did not reveal any problems concerning robustness. At this point, Windows CE 3.0 seems ready to be a candidate RTOS for use in a variety of real-time and industrial applications.

DDJ

Listing One

///////////////////////////////////////////////////////////////////////////// 
// PrioInversion.c  This file defines an entry point for the DLL application.
//       The test code creates a priority inversion situation.
//       The test code is part of a DLL and is loaded, locked, and called
//       from an external application to eliminate all paging overhead.
//////////////////////////////////////////////////////////////////////////////

#include "stdafx.h"
#include "test.h"
#include "trace.h"

#define LOW_PRIORITY       10
#define MEDIUM_PRIORITY     5
#define HIGH_PRIORITY       0

BOOL    VirtualCopy( LPVOID, LPVOID, DWORD, DWORD );
int     MediumPriorityThread(LPVOID);
int     HighPriorityThread(LPVOID);

// volatile: written by one thread and polled in a busy loop by another
volatile int    iTest           = 0;
unsigned long*  pulpPCIMemory   = NULL;
HANDLE          HandleMutex;
HANDLE          HandleSemA, HandleSemB;
HANDLE          HandleMediumPrioThread, HandleHighPrioThread;

////////////////////////////////////////////////////////////////////////////// 
// This is the start of the program, and is an exported function of the DLL 
// that is called by an external application.
// This is also the low-priority thread in the test; after it has set up 
// all the necessary items for the test, it will grab the mutex and release 
// higher priority threads.
///////////////////////////////////////////////////////////////////////////// 
TEST_API int Start(void)
{
    DWORD   dwThreadId;
    int     i;
    // Set the priority of this thread to a low value
    CeSetThreadPriority(GetCurrentThread(), LOW_PRIORITY);
    // Set the thread quantum to 0, i.e., run to completion
    if (CeSetThreadQuantum(GetCurrentThread(), 0) == 0)
    {
        RETAILMSG( 1, (TEXT( "Failed to set thread quantum\r\n" )));
        return -1;
    }
    // Allocate virtual memory region and bind physical PCI memory to it
    pulpPCIMemory = 
         (unsigned long *) VirtualAlloc( 0, 0x4, MEM_RESERVE, PAGE_NOACCESS );
    if ( !pulpPCIMemory )
    {
        RETAILMSG( 1, (TEXT( 
                        "Failed to allocate virtual address space\r\n" )));
        return -1;
    }
    if ( !VirtualCopy((LPVOID)pulpPCIMemory, (LPVOID)(0xE1000000 / 256), 
                       0x40, PAGE_READWRITE | PAGE_NOCACHE | PAGE_PHYSICAL ) )
    {
        RETAILMSG( 1, (TEXT( "Failed to map PCI memory\r\n" )));
        return -1;
    }
    // Create the mutex
    HandleMutex = CreateMutex(NULL, FALSE, NULL);
    if (HandleMutex == NULL) 
    {
        RETAILMSG( 1, (TEXT( "Failed to create mutex\r\n" )));
        return -1;
    }
    // Create the semaphores necessary to synchronize the threads
    HandleSemA = CreateSemaphore(NULL, 0, 1, NULL);
    HandleSemB = CreateSemaphore(NULL, 0, 1, NULL);
    if ((HandleSemA==NULL)||(HandleSemB==NULL)) 
    {
        RETAILMSG( 1, (TEXT( "Failed to create semaphores\r\n" )));
        return -1;
    }
    // Create the other threads
    HandleMediumPrioThread  = 
               CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE) 
               MediumPriorityThread, NULL, 0, &dwThreadId);
    HandleHighPrioThread    = 
               CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE) 
               HighPriorityThread, NULL, 0, &dwThreadId);
    if ((HandleMediumPrioThread==NULL)||(HandleHighPrioThread==NULL)) 
    {
        RETAILMSG( 1, (TEXT( "Failed to create threads\r\n" )));
        return -1;
    }
    // Let the other threads initialize properly
    Sleep(1);
    for (i=0; i<11000; i++)
    {
        // This variable set to 1 so medium priority thread has work to do.
        iTest = 1;
        // Acquire mutex so it is not available for high-priority thread
        WaitForSingleObject(HandleMutex, INFINITE);
        // Release semaphore B so medium priority thread can start running
        ReleaseSemaphore(HandleSemB, 1, NULL);
        // Write trace to check that this thread's priority is boosted when 
        // system attempts to resolve the priority inversion situation
        *pulpPCIMemory = TRC(0, 0, TRC_MT, TRC_RLS, 0, 0);
        // Release the mutex so the high-priority thread can grab it
        ReleaseMutex(HandleMutex);  
    }
    return 0;
}
///////////////////////////////////////////////////////////////////////////// 
// Medium-priority thread. Its main purpose is to keep the low-priority
// thread from running once it has grabbed the mutex.
///////////////////////////////////////////////////////////////////////////// 
int MediumPriorityThread(LPVOID pArg)
{
    // Set the priority of this thread to the medium value
    CeSetThreadPriority(GetCurrentThread(), MEDIUM_PRIORITY);
    // Set the thread quantum to 0, i.e., run to completion
    if (CeSetThreadQuantum(GetCurrentThread(), 0) == 0)
    {
        RETAILMSG( 1, (TEXT( "Failed to set thread quantum\r\n" )));
        return -1;
    }
    for (;;)
    {
        // Wait here until we are sure low-priority thread has grabbed mutex
        WaitForSingleObject(HandleSemB, INFINITE);
        // Release semaphore A to unblock the high-priority thread
        ReleaseSemaphore(HandleSemA, 1, NULL);
        // While this variable is set, perform a busy loop. This is crucial: 
        // if system does not implement priority inheritance properly, this 
        // loop prevents low-priority thread from releasing mutex, resulting 
        // in deadlock
        while(iTest);
    }
    return 0;
}
///////////////////////////////////////////////////////////////////////////// 
// High-priority thread. It will try to grab a mutex that was acquired
// by the low-priority thread earlier.
///////////////////////////////////////////////////////////////////////////// 
int HighPriorityThread(LPVOID pArg)
{
    // Set the priority of this thread to the highest value
    CeSetThreadPriority(GetCurrentThread(), HIGH_PRIORITY);
    // Set the thread quantum to 0, i.e., run to completion
    if (CeSetThreadQuantum(GetCurrentThread(), 0) == 0)
    {
        RETAILMSG( 1, (TEXT( "Failed to set thread quantum\r\n" )));
        return -1;
    }
    for (;;)
    {
        // Wait here until you are sure medium-priority thread is set 
        // to execute its busy loop.
        WaitForSingleObject(HandleSemA, INFINITE);
        // Write trace before acquiring mutex
        *pulpPCIMemory = TRC(0, 2, TRC_MT, TRC_ACQ, 0, 0);
        // Acquire mutex (system will now detect priority inversion situation
        // and resolve it).
        WaitForSingleObject(HandleMutex, INFINITE);
        // Write trace after acquiring mutex (the difference between both 
        // traces is the time it took to recover from the deadlock caused 
        // by priority inversion).
        *pulpPCIMemory = TRC(0, 2, TRC_MT, TRC_ACQ, 0, 1);
        // This test cycle is finished; let the medium-priority thread 
        // know to stop looping
        iTest = 0;
        // Release the mutex to start the next sequence
        ReleaseMutex(HandleMutex);
    }
    return 0;
}
