Channels ▼
RSS

C/C++

The NewOS Operating System

Source Code Accompanies This Article. Download It Now.


Dec01: The NewOS Operating System

Travis is a firmware engineer at Danger Research Inc. He can be reached at geist@foobox.com.


NewOS is a freely available, open-source, lightweight operating system written from the ground up for a variety of platforms. At this stage of the project, NewOS (http://newos.sourceforge.org/) is mostly a kernel, with little user-space interaction. As the project moves forward, however, the user-space applications and facilities will improve.

I had numerous design goals in mind when launching the project, most of which are common in modern day operating systems:

  • Preemptive multiprocessing/multithreading.
  • Symmetric multiprocessing (multiple CPUs).

  • Full kernel reentrancy.

  • Modern virtual memory system with protected memory.

  • Modular filesystem layer.

  • Architecture independence, easily ported to different platforms.

The kernel is written mostly in C, with some assembly. It should be possible to write user-space applications in just about any language, but for now, support is only in place for C and assembly.

At this writing, I've ported NewOS to the common Intel- and AMD-based PC platform, as well as the Sega Dreamcast — a gaming console running on a Hitachi SH-4 processor. Ports to systems using MIPS and UltraSPARC processors are underway, with plans (but little progress) for ports to systems using Alpha and Motorola 68030 processors.

The build environment for the project consists of the GNU gcc, make, and binutils development tools. These tools can be built for the majority of common operating systems and target every platform that has a reasonable shot of being ported to. The project is built regularly on BeOS, Linux, FreeBSD, Solaris, Irix, and Windows NT. The current targeted processors are Intel IA-32 and Hitachi SH-4. The flexibility of these tools is a great boon to the project.

Design Highlights

NewOS implements the traditional multiple threads per process model. Each process is defined as a set of threads sharing a single address space, file descriptors, and permissions. A special process exists that all of the kernel threads belong to.

The first thread in a process is treated in a slightly different way from the others. This thread is considered the primary thread and the process cannot exist without it. If the primary thread ever dies for any reason, the entire process will be killed. In the case of single-threaded applications, this has no effect, but in a multithread app, the primary thread must be kept alive, or the app can quickly shut down by simply exiting the primary thread. There is no facility for changing the primary thread currently, but it could be implemented if necessary.

The act of creating and deleting processes differs from UNIX in many ways. The model implemented is not based on the UNIX fork/exec model, but a simpler model whereby a process simply creates another with a proc_create_proc call, without the complexities of file descriptor inheritance or address space copy details that traditional UNIX systems have to deal with or are blessed with, depending on your perspective. This, however, adds some limits to the flexibility of the system, and may be expanded in the future. Listing One contains function declarations for the threading API.

Threads are created in a similar way, with a call that simply creates a new one in the target address space. Most of the time this will be the current address space, but it does allow a thread to be created as part of the kernel process for system tasks and the like.

As each thread dies or exits with a sys_exit system call, the user space and kernel stack that had been created for it is destroyed and the thread is removed from the process and all structures returned. I took care to allow the thread that is exiting to clean up after itself completely. This manifests itself in the way it removes its own kernel stack. The last thing the thread does is switch to a temporary stack and remove its old kernel stack. Many other operating systems get around this complexity by having a worker thread clean up the kernel stack, but this can lead to bottlenecks in a busy system. The last thread in the process returns all of the process's resources and removes the last traces of the process.

Each thread dies or quits with a return code. The return code is passed into the sys_exit system call or the kernel supplies it if the thread is killed. Any thread in the system can obtain the return code of a thread if it was blocked on the thread's destruction using the sys_thread_wait_on_thread system call. The return address of this function is the return address of the thread. The return address of an entire process is simply the return address of the process's primary thread. The system call sys_proc_wait_on_proc is largely a special case of sys_thread_wait_on_thread that blocks on the primary thread. To implement blocking on a thread, a semaphore is associated with each thread with a count of 0, so that any thread that attempts to acquire it will immediately block. When the thread is killed, the semaphore is destroyed as well, releasing all threads blocked on it. A special argument can be passed to the semaphore destruction function that passes the argument's value on to anyone blocking on the semaphore. This facilitates passing the return code on to the blocking threads without having to deal with any sort of leftover death records or zombie processes.

Timer Services

NewOS utilizes a somewhat different timer mechanism than many traditional UNIXs. The traditional method for keeping track of time is to have a timer go off at a fixed interval and increment the current time counter by the timer's interval value in the timer's interrupt handler. NewOS's system is more dynamic and utilizes some more interesting processor features found on many modern processors.

The NewOS timer code is simple, relying on a few features of many architectures. First, the platform must have at least one variable timer, preferably at least one per CPU. Also, the platform must have some ability to determine the system time to a reasonable accuracy relative to some fixed point in the past, without using the system timer itself. Reasonable accuracy is preferably one millisecond or lower. On the Intel IA-32 architecture, for example, the system time can be determined by reading the CPU's instruction count register using the rdtsc instruction. This returns a 64-bit value, which is a counter of the number of CPU cycles since some arbitrary point in the past, usually power on time. This then has some math applied to it to convert it to microseconds since the OS was booted. Other architectures either have similar functionality, or have it emulated.

The timer queue is a list of events, sorted by the absolute system time at which the event needs to be triggered. As the system timer goes off, the head of this list is consulted, has its scheduled time compared with the current time and acted on if its scheduled time is in the past. This is repeated until the head of the list is not ready to be triggered. A new timer interrupt is then scheduled to fire at the time of the new head of the list. Utilizing this method, the system timer can be accurate to the microsecond, and no extra timer events go off, wasting CPU cycles. Listing Two presents function and structure declarations for the timer module.

A timer event is simply defined as a callback function pointer with an optional pointer argument that needs to be called. The system reschedules if the callback returns INT_RESCHEDULE in the same way any other interrupt handler forces a reschedule. This is used by the thread scheduler to arrange a reschedule at the end of the current thread's quantum.

A separate timer queue with a separate timer is kept per CPU, triggering independently of each other. When an event is set to fire, it is put into the list of the CPU that made the call to timer_set_event. The separate list is kept because in some cases, the timer has to go off on the same CPU, such as the rescheduling event.

Locking Primitives

To keep things simple, NewOS is built around a single type of locking primitive: the multicount semaphore. This type of semaphore is composed of a counter and a thread queue. The counter is initialized to whatever the user wants at the semaphore creation time. This semaphore model is borrowed heavily from the semaphore design used on the BeOS.

The semaphore works as follows: When a thread acquires it using sem_acquire, a count is passed. The semaphore's counter is decremented by this value. If the new value of the counter is less than 0, the thread that acquired it is blocked by being placed on the semaphore's wait queue. The count that the thread had acquired the semaphore to be blocked is stored in the thread structure. When a thread releases the semaphore with a particular count, the semaphore's counter is incremented by the passed-in counter value, and the appropriate number of blocked threads in the semaphore's wait queue are released. Listing Three presents the semaphore API.

A semaphore can also be acquired with a timeout value. This allows the call to sem_acquire to return after a specified amount of time if the semaphore cannot be acquired. This is accomplished by setting a timer event to fire and release the calling thread from the semaphore's queue. Using this facility, the threading code suspends a thread temporarily on behalf of thread_snooze by acquiring a reserved semaphore with an initial count of zero with the appropriate timeout.

When a semaphore is deleted, all threads blocked on it are released immediately. If sem_delete is called with an optional return code argument, and the blocked threads acquire the semaphore with a different form of sem_acquire, the return code is placed into a variable in the blocked threads' structures as they are awakened, and the blocked threads return with the return code. This functionality is utilized to get the return code of a thread or process.

Symmetric Multiprocessing

Implementing symmetric multiprocessing, or SMP for short, turned out to be relatively easy, since it was taken into account at the beginning of the project. It's easier to implement if the design of the entire system is taken into account; all proper locking is placed at key locations.

The primary implementation details of SMP support that require some time are proper locking of key data structures and inter-CPU interrupts.

Locking data structures against multi-CPU corruption is covered by utilizing "spinlocks" — a simple locking primitive consisting of a single Boolean value, an int in this case. Acquiring a spinlock consists of using the "test and set" atomic operation of the CPU to attempt to set the lock to 1, only passing if the value had previously been 0. If the value is already 1, simply "spin" forever until it changes to 0. Releasing a spinlock is done by simply setting the lock value to 0. Listing Four contains declarations of the spinlock functions.

Since any CPU spinning on a spinlock is using CPU cycles and memory bus cycles doing so, spinlocks must be held as short as possible. They are used largely in areas of the kernel where no other synchronization primitives are available, or where rescheduling is not permissible, such as in the scheduling code itself or the semaphore implementation. Due to the nature of the spinlock, they can only be acquired and released with interrupts disabled, because it would be disastrous to reschedule with one held and potentially have the lock be reacquired for another reason.

Inter-CPU interrupts (ICI) are relatively easy to implement. The interrupts themselves are done in an architecturally dependent way. Once the interrupt is triggered, the ICI code looks in a "mailbox" for a message to the interrupted CPU, performs the action defined on the message, and returns. The ICI messages range from simple thread reschedules, to memory management unit page flushes, to an immediate CPU halt. There is a separate mailbox for each CPU, and a global mailbox for all CPUs for broadcast messages.

Memory Management

Entire books can and have been written on effective memory management. This is an area of OS design that is always expanding. The virtual memory system (VM) design NewOS uses is relatively complex, but fairly straightforward and not particularly unique. The design borrows heavily from UNIX/BSD. The API interface of the VM borrows heavily from BeOS.

The VM operation consists of managing separate address spaces for each process and the kernel. Each address space is broken into smaller regions, which are simply areas of pages that share similar characteristics. A region backs a particular type of VM store object, such as a memory-mapped file, a memory-mapped device, or memory backed by a swap file. Much of the VM code deals with setting up these complex data structures.

The other half of the VM deals with handling page faults on particular pages. In this case, the VM has to determine whether or not to kill the thread that caused the page fault, map a new page into the spot, copy an old page and map it, bring the page into existence from a swap file, or fill the page with data from a memory-mapped file.

Filesystem Layer

To support multiple filesystems as well as filesystem-mapped devices, I had to design a flexible and extensible filesystem layer. This is an area of OS design that seems to vary somewhat between different implementations. The NewOS implementation is no different, borrowing heavily from other implementations. Still, it does a few things differently.

The filesystem layout is similar to most UNIX variants in that all mounted volumes are laid out in a single tree, starting with the root mount point /. A default memory-based filesystem, rootfs, is mounted at the root of the tree and used as a placeholder for other mount points.

Another similarity to UNIX is the placement of device files in the /dev/ directory. However, NewOS differs from other UNIX variants by forcing the device driver itself to provide the filesystem code and mount point within the /dev/ hierarchy. As a result, there are many different filesystems present in a running NewOS system. The filesystem interface is fairly simple, and many common functions can be shared between filesystems to keep the memory overhead down.

The API that a filesystem must implement is relatively simple. As it currently stands, the filesystem must provide 12 calls, most of which are implemented in complementary pairs such as open/close or read/write. The individual filesystem can choose not to implement some of the calls, such as read or write, and a default one will be provided by the filesystem layer.

The main implementation difference in the NewOS filesystem API is the definition of a file stream. An individual file in NewOS has a lot of flexibility in what can be opened or dealt with within that file. In a traditional UNIX system, a file is a sequence of bytes, thus a single stream that can be opened and manipulated. In NewOS, however, a file can have any number of named streams, differentiated by a name, the default being the Null stream "". In addition, a 32-bit stream type identifier is needed, with a few default stream type constants such as STREAM_TYPE _FILE, STREAM_TYPE_DIR, STREAM_TYPE _DEVICE, and so on. This is used to replace the concept of different modes in UNIX, and having to deal with the same file using a different API depending on the mode. In the NewOS method, the treatment of streams is much more explicit, and ultimately more flexible. This allows the filesystem API to be simplified in some ways by using a common set of manipulators (open/close, read/write) to deal with any type of stream. The traditional UNIX opendir/closedir calls do not exist in this API. A directory is simply opened with a STREAM_TYPE_DIR type and dealt with read/write on a record-based interaction. Whether or not all of this flexibility is needed is another story, but existing single-stream applications can work within this framework by opening the default Null stream, and hiding the stream type identifier by implementing open and opendir. Listing Five contains the user-space API to the filesystem layer.

At the time of writing, the filesystem API is under heavy development, and may mutate into something else altogether. A few of the required calls are still missing, like any sort of ability to unlink a file. For the most part, my plans are to simply finish it by implementing the rest of the necessary calls. Also, no disk-based filesystem has been written at this point, so it may turn out that a redesign is in order after unseen obstacles are found.

Future Designs

Another plan for the near future is to merge the NewOS's object IDs of different object types into a single namespace. In this case, the namespace will be the same series of 32-bit or possibly 64-bit numeric IDs. The object IDs this would cover would include file descriptors, semaphores IDs, region/address space IDs, and thread and process IDs. This simplifies many things, such as resource location at process teardown time, or the sharing of objects between processes.

Additionally, I'd like to add the ability to wait or signal multiple objects at a time. This is partially implemented in UNIX via the select call and fully implemented in the Windows operating system. The flexibility this brings is enormous. The current scheme involves blocking a single thread on a single object, most of the time a semaphore. The implementation of large servers or other complex applications usually involves creating too many threads to block on objects, which becomes very inefficient.

Conclusion

NewOS has come a long way in the short time that it has been in existence. It has a long way to go before it can be very useful as a tool, but that day is steadily approaching. The main goal of the NewOS project, however, is to have fun implementing it, try some new ideas, and hopefully act as a source of information and inspiration for other "home brew" operating systems.

Many aspects of the NewOS design were not touched upon here, and many things may have changed since the time of writing. To stay up to date with the NewOS project and for further information and source code, check out http://newos.sourceforge.org/.

DDJ

Listing One

proc_id proc_create_proc(const char *path, const char *name, int priority);
int proc_kill_proc(proc_id id);
int proc_wait_on_proc(proc_id id, int *retcode);
thread_id thread_create_user_thread(char *name, proc_id pid, 
                                                int priority, addr entry);
thread_id thread_create_kernel_thread(const char *name, 
                                          int (*func)(void), int priority);
int thread_suspend_thread(thread_id id);
int thread_resume_thread(thread_id id);
void thread_snooze(time_t time);
void thread_exit(int retcode);
thread_id thread_get_current_thread_id();
int thread_wait_on_thread(thread_id id, int *retcode);

Back to Article

Listing Two

typedef int (*timer_callback)(void *);
typedef enum {
    TIMER_MODE_ONESHOT = 0,
    TIMER_MODE_PERIODIC
} timer_mode;
typedef struct timer_event {
    struct timer_event *next;
    timer_mode mode;
    time_t sched_time;
    time_t periodic_time;
    timer_callback func;
    void *data;
} timer_event;
// caller provides the timer structure
void timer_setup_timer(timer_callback func, void *data, timer_event *event);
int timer_set_event(time_t relative_time, 
                                       timer_mode mode, timer_event *event);
int timer_cancel_event(timer_event *event);

Back to Article

Listing Three

#define SEM_FLAG_NO_RESCHED 1 // force sem_release_etc to not reschedule
#define SEM_FLAG_TIMEOUT 2
sem_id sem_create(int count, const char *name);
int sem_delete(sem_id id);
int sem_delete_etc(sem_id id, int return_code);
int sem_acquire(sem_id id, int count);
int sem_acquire_etc(sem_id id, int count, int flags, 
                                 time_t timeout, int *deleted_retcode);
int sem_release(sem_id id, int count);
int sem_release_etc(sem_id id, int count, int flags);

Back to Article

Listing Four

// lock must be pre-initialized to 0
void acquire_spinlock(int *lock);
void release_spinlock(int *lock);

Back to Article

Listing Five

typedef enum {
    STREAM_TYPE_NULL = 0,
    STREAM_TYPE_FILE,
    STREAM_TYPE_DIR,
    STREAM_TYPE_DEVICE,
    STREAM_TYPE_STRING
} stream_type;
typedef enum {
    SEEK_SET = 0,
    SEEK_CUR,
    SEEK_END
} seek_type;
int open(const char *path, const char *stream, stream_type type);
int seek(int fd, off_t pos, seek_type seek_type);
int read(int fd, void *buf, off_t pos, size_t *len);
int write(int fd, const void *buf, off_t pos, size_t *len);
int ioctl(int fd, int op, void *buf, size_t len);
int close(int fd);
int create(const char *path, const char *stream, stream_type type);
// not complete, needs facility to unlink files among others


Back to Article


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video