Web Development

Thread Pools and Server Performance

By John Calcote, July 01, 1997

As John points out here, thread pools provide one way of improving server performance.

Dr. Dobb's Journal July 1997: Internet Programming

John, an engineer on Novell's Directory Services team, can be contacted at [email protected].

Network service protocols such as HTTP, NNTP, and SMTP have traditionally been implemented on UNIX platforms as heavyweight process servers. When such a server receives a request, it starts a child process, passing it the request and the client connection handle via some form of interprocess communication -- usually command-line arguments, environment variables, and inherited file handles. This child process services the request, then terminates. These UNIX environments are optimized for this sort of activity. Processes are cached in memory and code segments are reused for much-improved efficiency. Still, it is fairly time consuming in CPU terms to create the process-control block, and maintain the run-time resources and bookkeeping information for each executing process.

Today's operating systems emphasize threads over processes. Under Windows NT, for instance, creating a new process is much more CPU intensive then creating a new thread of execution in an existing process. Furthermore, operating systems now give you more control over the scheduling of threads than of processes. Even so, starting a new thread can carry a fair amount of overhead with it as well. Some HTTP servers on the World Wide Web service hundreds of thousands of requests per day. If such a server processes only 100,000 requests in a 24-hour period, each request must be serviced in an average of 0.864 seconds. Such servers cannot afford to use even a quarter second creating a new thread for each incoming client request. In reality, many of these requests are being serviced simultaneously. Even UNIX process-oriented servers run several copies of the child process request handler at a time.

In actuality, the number of requests per second varies from minute to minute. Sometimes, requests come in so fast the server doesn't have time to accept the connection in a reasonable amount of time. Clients experience this in the form of a timeout, usually accompanied by a message to the effect that the server is currently unavailable. This all depends, of course, on the ability of the client to correctly interpret a lack of response from a server. Some client applications simply tell users that the server cannot be found, which can be misleading, if not annoying.

A well-written server application is optimized in two important areas. First, it can process connection requests as fast as possible. Note that a connection request refers specifically to the connection indication event generated by the underlying transport service provider. Second, it can be tailored by the administrator to politely reject all simultaneous connection requests above a certain threshold number.

Tried and True

Typically, a server application is written with a listen thread for each port or socket to be monitored. This thread might sit in a loop sleeping on a call to an operating-system socket monitor function, such as the Winsock select or TLI listen function. When a connect indication is received, this function wakes up and returns control to the monitor thread, which then attempts to receive the indication (or data, in the case of a datagram protocol). Once the data or indication is retrieved, it must be processed, but the monitor thread needs to do this as quickly as possible to get back to listening for more connection indications. Remember, if the monitor thread is not blocking on a call to select, then it's not listening for connection requests. Example 1 is pseudocode that depicts a commonly used monitor algorithm.

What the server thread does with the client's connection handle is of no concern to the monitor thread. As far as the monitor is concerned, the client has been serviced. Although this looks fairly efficient at first glance, the acts of retrieving the connect indication, establishing the connection, and starting the new server thread can be overly time consuming for a truly high-performance server. It would be nice if one could write code such as Example 2, a thread-pool-based monitor. Essentially, the server application maintains a pool of threads that may be used to execute the monitor function. Each time the currently executing monitor thread returns from a call to select, the first thing it does is reschedule itself with the thread pool, starting a new copy of the monitor thread and blocking on select. Meanwhile, the original monitor thread continues, servicing the client request to completion.

Listing One is a simple thread-pool-based server program that monitors a well-known socket for client requests. This server uses a connectionless protocol, which, in this case, means that the client passes all request data in a single initial packet (called a "datagram"). The server responds by also returning a single packet. In this protocol, the client must send the five-character string "Ping" (including the terminating NULL character). If the client sends the correct request, the server answers by returning another five-character string ("Pong"); otherwise, the error message "101 Invalid request" is returned. Granted, this is a bit contrived, but the general process is the basis for nearly all standard high-level, text-based Internet protocols. (Ping.h and related files are available electronically; see "Availability," page 3.)

Pools of Threads

I've based my thread pool's interface on Novell's Asynchronous Event Services (AES) API. The AES API essentially contains the two functions in Example 3(a), with AESEVENT being defined as Example 3(b).

In Novell's implementation, the AESEVENT structure passed to ScheduleAESEvent must be declared in a manner that assures its validity for the life of the event. This is because the data structure instance itself is linked (by way of the next field) into the system's internal list of events waiting to be serviced, then passed to the event handler as event data. This may seem a bit odd, but the reasons for this sort of one-track, efficiency-minded attitude at Novell are historical in nature -- and not necessarily bad -- in most instances. There are significant advantages to passing the event's AESEVENT data structure to the event handler. In the first place, nothing can go wrong with ScheduleAESEvent -- thus, the void return value. It doesn't even allocate memory. To pass private event data, you can allocate your structure instance larger and define your own data fields after the system-defined fields.

Cancelling an event is simply a matter of passing the address of the same structure instance to CancelAESEvent. If the event has not already been scheduled, and thus removed, from the waiting event list, it can be found and removed using the pointer address as a lookup field. If it has already been scheduled, then the allocated thread is already running your handler, and must be terminated in a more proprietary manner. This is indicated by the return value of CancelAESEvent.

My version of this interface encapsulates all of the necessary data for an entire thread pool in a C++ class called AES (see Listing Two). This class contains two private internal class definitions, one called PoolThread and the other, Event. PoolThread is used to manage a single pool thread, while Event is used to store event information for each scheduled event. Events are scheduled by the ScheduleEvent member function, which takes a delay time in seconds, a pointer to an event-handler function, and a generic pointer to data to be passed through to the event handler. It returns a handle to an event, which is implemented internally as a cast from the allocated data-block pointer. This handle may be passed to the CancelEvent member function. My version of ScheduleEvent can fail in low-memory conditions. One of the main reasons I chose to implement this API as a C++ class is the freedom it allows users to reimplement internal functionality without noticeably affecting the interface.

Listing Two shows my implementation of the AES class and, subsequently, its private classes. The pool monitor and pool thread-control functions are static members of the PoolThread and AES classes (respectively, Monitor and Worker). Because they are static members, they have no implied this pointer. I solve this problem by passing the instance pointer as the thread data for each thread created. The pool monitor gets a pointer to the AES instance, while the pool thread gets a pointer to its PoolThread instance.

The AES constructor takes a single parameter -- the number of threads to create on instantiation. This parameter defaults to 8 if left unspecified. When an event is scheduled, a new event block is allocated, initialized, and inserted into the AES waiting list. Then the monitor semaphore is released and an event handle is returned to the client. Releasing the monitor semaphore wakes up the monitor thread, which rescans the list of waiting events for the nearest event due. If there are no events ready to be scheduled, the monitor sleeps on its semaphore for a time period calculated to wake it up when the closest event is due. If events are due, but no pool threads are available, a debug message is posted, indicating that the pool needs to be tuned, and the monitor sleeps until a new thread becomes available. Simply sleeping may seem inefficient for a low-thread scenario, but when tuned properly, the pool will rarely run out of threads during a session. If it runs out too often, you need to increase the number of threads in the pool for your application.

More than Meets the Eye

The uses of this AES-based implementation of a thread pool extend beyond those I've established here. This thread pool may also be used to schedule timed events. For example, Novell's Service Advertising Protocol (SAP) uses AES to broadcast a service-advertising packet every 60 seconds. Using class AES, the broadcast function can be written like Example 4. In this example, presumably there is a SAP class instance, which contains an AES instance member and whose pointer has been passed as event data to this event handler. The AES object might just as well have been declared as a static global, accessible to this handler by virtue of its definition in this module.

There is a lot of room for improvement in this thread pool. The stack size might be an issue in your application, and thus become a tunable parameter to the AES class or the individual threads. The one problem this might present would show up during creation of the thread pool in the AES constructor. The pool is allocated as an array of PoolThread objects. If you need to pass a stack-size parameter to the constructor for the PoolThread objects, you will not be able to use array allocation. One solution to this limitation is to allocate raw memory for the thread-pool array using operator new, then use placement new to instantiate a PoolThread object for each block in the array. If you do this, you should call the destructors for each PoolThread directly, then deallocate the raw memory using operator delete. Another possible feature that the pool could provide is thread-local storage on platforms that support it for maintaining thread-specific data.

However you choose to implement a thread pool, it's important to keep a few things in mind. First, this pool's monitor thread does not poll the waiting list during idle time. Neither do the pool threads poll their respective event pointers. When not working, they are always sleeping on a semaphore. Virtually no CPU overhead is unnecessarily maintained. The whole idea is to improve efficiency. In the high-performance server discussed earlier, it was critically important that the least amount of cycles be used between the current thread's return from select and the next thread's call to select. Wasting cycles during a call to ScheduleEvent defeats the purpose of the thread pool. For this reason alone, you might consider reimplementing the thread pool using Novell's strategy of linking the event blocks right into the system's waiting list.

Listing One

/* Ping.cpp */

</p>
#define WIN32_EXTRA_LEAN
#include <windows.h>
#include <winsock.h>
#include <wsipx.h>
#include <conio.h>
#include "aes.h"


</p>
#define SERVER_SOCKET   0x3200


</p>
SOCKET g_skt;
BOOL g_serverUp;
AES *g_aesp;
AES::AESHandle g_hEvent;


</p>
void ServerThread(void *)
{
    if (!g_serverUp)
       return;
    fd_set rdSkts;
    FD_ZERO(&rdSkts);
    FD_SET(g_skt, &rdSkts);
    struct timeval tv = {5,0};
    int rval = select(0, &rdSkts, 0, 0, &tv);


</p>
    // schedule another pool thread to listen
    g_hEvent = g_aesp->ScheduleEvent(0, ServerThread, 0);


</p>
    // use this thread to process client request to completion
    if (rval < 0)
        printf("Winsock: select returned %d.\n", WSAGetLastError());
    else if (rval > 0)
    {
        SOCKADDR_IPX addr;
        int recvLen, addrLen = sizeof addr;
        char recvBuf[5];
        if ((recvLen = recvfrom(g_skt, recvBuf, sizeof recvBuf, 0,
            (struct sockaddr *)&addr, &addrLen)) == SOCKET_ERROR) 
        {
          printf("Winsock: recvfrom returned %d.\n",WSAGetLastError());
          return;
        }
        char *resp = strncmp(recvBuf,"Ping",5)?"101 Invalid Request.":"Pong";
        int respLen = strlen(resp) + 1;
        if (sendto(g_skt, resp, respLen, 0, (struct sockaddr *)&addr, 
                sizeof addr) == SOCKET_ERROR)
            printf("Winsock: sendto returned %d.\n", WSAGetLastError());
    }
    return;
}
BOOL StartServer(WORD wSocket) 
{
    int err;
    SOCKADDR_IPX addr;
    WSADATA wsaData;


</p>
    if (!(g_aesp = new AES))
    {
        printf("Unable to create AES object.\n");
        return FALSE;
    }
    if (err = WSAStartup(MAKEWORD(1,1), &wsaData))
    {
        printf("WSAStartup returned %d.\n", err);
        goto errout1;
    }
    if (LOBYTE(wsaData.wVersion) < 1 || HIBYTE(wsaData.wVersion) < 1)
    {
        printf("Winsock version too low.\n");
        goto errout2;
    }
    if ((g_skt = socket(AF_IPX, SOCK_DGRAM, NSPROTO_IPX)) == INVALID_SOCKET)
   {
        printf("Windsock: socket returned %d.\n", WSAGetLastError());
        goto errout2;
    }
    memset(&addr, 0, sizeof(SOCKADDR_IPX));
    addr.sa_family = AF_IPX;
    addr.sa_socket = htons(wSocket);
    if (bind(g_skt, (struct sockaddr *) &addr, sizeof(SOCKADDR_IPX)) != 0)
    {
        printf("Winsock: bind returned %d.\n", WSAGetLastError());
        goto errout3;
    }
    g_serverUp = TRUE;
    if (!(g_hEvent = g_aesp->ScheduleEvent(0, ServerThread, 0)))
    {
        printf("AES: Unable to schedule event.\n");
        g_serverUp = FALSE;
        goto errout3;
    }
    return TRUE;
errout3:
    closesocket(g_skt);
errout2:
    WSACleanup();
errout1:
    delete g_aesp;
    return FALSE;
}
void StopServer(void) 
{
    g_serverUp = FALSE;
    g_aesp->CancelEvent(g_hEvent);
    delete g_aesp;
    closesocket(g_skt);
    WSACleanup();
    return;
}
void main(void)
{
    if (StartServer(SERVER_SOCKET))
    {
        printf("Server running, press any key to terminate...\n");
        getch();
        StopServer();
    }
    printf("Server terminated, press any key to exit...\n");
    getch();
    return;
}

Back to Article

Listing Two

/* AES.cpp */

</p>
#include "AES.h"


</p>
#include <process.h>
#include <limits.h>
#include <crtdbg.h>


</p>
unsigned __stdcall AES::PoolThread::Worker(void *ref)
{
    PoolThread& This = *(PoolThread *)ref;
    while (!This.dying)
    {
        Event *event = This.event;
        This.event = 0;
        if (!event)
            WaitForSingleObject(This.sem, INFINITE);
        else
        {
            event->Work();
            delete event;
        }
    }
    return 0;
}
AES::PoolThread::PoolThread(int stack /* = 0 */ ) : dying(FALSE), 
        sem(0), handle(0), event(0), InitState(0) 
{
    if (!(sem = CreateSemaphore(0, 1, 0, 0))
            || !(handle = (HANDLE)_beginthreadex(0, stack, 
                    Worker, 0, 0, 0)))
        InitState = ERR_CANT_CREATE_HANDLE;
    return;
}
AES::PoolThread::~PoolThread(void)
{
    dying = TRUE;
    if (sem)
    {
        if (handle)
        {
            ReleaseSemaphore(sem, 1, 0);
            WaitForSingleObject(handle, INFINITE);
            CloseHandle(handle);
        }
        CloseHandle(sem);
    }
    return;
}
unsigned __stdcall AES::Monitor(void *ref)
{
   AES &This = *(AES*)ref;
    _CrtSetReportMode(_CRT_WARN, _CRTDBG_MODE_DEBUG);
    while (!This.dying)
    {
        int eventsWaiting = 0;
        ULONG now = ULONG(clock() / CLOCKS_PER_SEC);
        ULONG nextEventTime = ULONG_MAX, lastComplaintTime = 0;
        EnterCriticalSection(&This.critSec);
        for (EventList::iterator eli = This.eventList.begin(); 
                eli != This.eventList.end(); /**/ )
        {
            for (int i = 0; i < This.size && This.pool[i].Busy(); i++)
                ;   // find next available worker
            if (i < This.size && (*eli)->When() <= now)
            {
                Event *event = *eli;
                eli = This.eventList.erase(eli);
                This.pool[i].Alloc(event);
            }
            else
            {
                if ((*eli)->When() <= now)
                    eventsWaiting++;
                else if ((*eli)->When() < nextEventTime)
                    nextEventTime = (*eli)->When();
                eli++;
            }
        }
        LeaveCriticalSection(&This.critSec);
        if (!eventsWaiting)
            WaitForSingleObject(This.sem, (nextEventTime - now) * 1000);
        else
        {
            if (lastComplaintTime + 3 < now)
            {
                _CrtDbgReport(_CRT_WARN, 0, 0, 0,
                    "Thread pool overflow. %d events delayed.", eventsWaiting);
                lastComplaintTime = now;
            }
            Sleep(100);
        }
    }
    return 0;
}
AES::AESHandle AES::ScheduleEvent(ULONG ulDelay, 
                                           void (*pFunc)(void*), void *pData)
{
    Event *event;
    _ASSERT(pFunc);
    if (!(event = new Event(ulDelay, pFunc, pData)))
       return AESHandle(0);
    EnterCriticalSection(&critSec);
    eventList.push_front(event);
    LeaveCriticalSection(&critSec);
    ReleaseSemaphore(sem, 1, 0);
    return AESHandle(event);
}
BOOL AES::CancelEvent(AESHandle hEvent)
{
    Event *event = (Event *)hEvent;
    BOOL cancelled = FALSE;
    EnterCriticalSection(&critSec);
    for (EventList::iterator eli = eventList.begin(); 
            eli != eventList.end(); eli++)
        if (*eli == event)
        {
            eventList.erase(eli);
            delete event;
            cancelled = TRUE;
            break;
        }
    LeaveCriticalSection(&critSec);
    return cancelled;
}
AES::AES(int threads /* = 10 */) : dying(FALSE), sem(0), 
        handle(0), size(threads), pool(0), InitState(0)
{
    // Initialize event list critical section 
    InitializeCriticalSection(&critSec);
    // Allocate thread pool
    if (!(pool = new PoolThread[size]))
    {
        InitState = ERR_INSUFFICIENT_MEMORY;
        return;
    }
    // Check init state of threads in pool
    for (int i = 0; i < size; i++)
        if (pool[i].InitState)
        {
            InitState = pool[i].InitState;
            return;
        }
    // Create monitor semaphore and thread
    if (!(sem = CreateSemaphore(0, 0, 1, 0))
            || !(handle = (HANDLE)_beginthreadex(0, 0, Monitor, this, 0, 0)))
        InitState = ERR_CANT_CREATE_HANDLE;
   return;
}
AES::~AES(void)
{
    dying = TRUE;
    if (sem)
    {
        if (handle)
        {
            ReleaseSemaphore(sem, 1, 0);
            WaitForSingleObject(handle, INFINITE);
            CloseHandle(handle);
        }
        CloseHandle(sem);
    }
    delete [] pool;
    DeleteCriticalSection(&critSec);
    return;
}

Back to Article

DDJ

1 2 3 4 5 Next

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

Web Development