Linux Symposium 2005

By Ed Nisley, December 01, 2005

Ed attends the 7th Annual Linux Symposium to find out what's up with Linux.

Ed's an EE, PE, and author in Poughkeepsie, NY. Contact him at [email protected] with "Dr Dobbs" in the subject to avoid spam filters.

The 7th Annual Linux Symposium took place in Ottawa. In addition to formal presentations and tutorials, informal Birds of a Feather Sessions covered specific topics of interest to small groups.

Unlike larger trade conferences, the Linux Symposium (LS) has no exhibit floor, no commercial presence, and no hustle. This makes for a much smaller and quieter show, as the attendees come for education rather than entertainment. Well, at least during the day, as the Whiskey-Purchasers BoFS met every evening.

I was impressed by the intensity of the presentations and eventually figured out what's different. Unlike some speakers at commercial conferences, these folks haven't been picked for their speaking ability or stage presence. They speak from deep, first-hand knowledge, rather than reciting bullet items generated by someone else. In fact, they may be the "center of competence" for one particular part of the Linux system.

The audience also carries knowledge and interest that transcends language barriers. A question from the ranks often triggered an esoteric discussion between people who obviously spoke English as a second or third or fourth language, but who had no trouble at all communicating.

It was refreshing, to say the least.

The Big Picture

The single most significant thing I observed throughout all four days was that Big Business is now pushing Linux development hard. Perhaps the best indication of that can be seen in the companies employing the presenters.

To judge from the 53 Linux Symposium speaker biographical sketches, IBM sent 17 speakers, Intel 8, HP 3, Red Hat 3, and other Linux distros about 10. Yes, half the speakers work at various IBM, Intel, and HP locations around the globe.

In contrast, the 43 speakers at the 2nd Annual Linux Symposium in 2000 came from smaller operations. LinuxCare sent 7, Red Hat 5, VALinux 4, ZeroKnowledge 3. On the big-company end, IBM and HP sent one speaker apiece.

As you might expect, big-company programmers work on big-company problems. That's not to say the rest of us don't benefit from the work, but you probably don't do a lot of memory hot-plugging, either. The era of the lone, unsupported coder seems to be drawing to an end, if only because increasing complexity requires a team of experts. In fact, mere mortals can't afford the gargantuan hardware that exercises the new features.

Of those 53 speakers, about 38 gave talks on kernel or system topics. The remaining presentations covered a wide variety of business, application, and embedded topics. The Linux kernel and its infrastructure are undergoing heavy development, with extensions in a number of useful directions.

Shrinking the Large

Given that many of the LS papers deal with large-scale kernel topics, what's in it for embedded systems? Jonathan Corbet (Linux Weekly News) said it best in his keynote address: "Today's big iron is tomorrow's laptop." As we've seen, today's laptop becomes next week's embedded system.

The majority of embedded systems run 8-bit microcontrollers that simply lack the moxie for Linux. The small slice of the market that can afford the megabytes of memory and megahertz of speed required to run a complex operating system currently supports a mix of commercial RTOS and Linux vendors, plus the usual home-grown solutions.

Within that slice, commercial RTOS vendors continue to hold their own in those projects requiring software behavior certified to specific safety and operational standards. Most systems don't require that level of certification, however, providing a perfect entrée for Linux. As nearly as I can tell from the few numbers I've seen, the growth rate for embedded Linux is far higher than anything else and the absolute usage is at least approaching that of traditional RTOS designs.

Although it's an oversimplification to say that any gadget that can run Linux will run Linux, that's probably not far from the truth. The Consumer Electronics Linux Forum exhibited several such trinkets at its CELF BoFS, including a minuscule mobile phone camera. Problems remain with memory footprint, power consumption, and real-time performance. The examples showed that good-enough solutions are available now and better ones are pending.

Suresh Siddha (Intel) described the changes required for a "Chip Multi Processing Aware Linux Kernel Scheduler." The current scheduler ably wrings maximum performance from multiple separate CPUs, but lacks the information to meet other goals that will become critical in the giga-transistor CPUs now looming over the horizon.

Up to this point, maximum-performance scheduling simply meant keeping each CPU busy while the hardware sorts out nuances such as cache contention and bus bandwidth. That works well for CPUs optimized for single-instruction-stream processing.

Intel's Hyper-Threading technology made one physical CPU look like a pair of CPUs sharing one memory interface, so memory contention could already affect overall performance. When Intel's Pentium roadmap imploded due to terrible power-versus-performance numbers, the ensuing multicore CPUs vastly complicated the memory interfaces. System boards with multiple chips, each with multiple CPU cores, each capable of multithreaded execution, can have spectacular performance, if only we figure out how to take advantage of it.

A multicored, multithreaded gaggle of CPUs can achieve high throughput with a slower clock, simply because more results emerge per tick. That's assuming enough memory bandwidth to keep the pipes full, software amenable to parallel execution, and a scheduler competent to orchestrate the whole affair. What's new and different is the requirement to simultaneously minimize power consumption for a given software-performance level, with the upper power limit and the lower performance limit chosen on the fly.

Power consumption in CMOS circuits varies almost linearly with the hardware clock speed, so slowing a core's clock reduces its power consumption in proportion. Shutting off the clock to unused hardware helps further, and stopping a core drops its power consumption nearly to zero. Unfortunately, the vagaries of chip design may force different cores to share a single power source. The lowest overall power may therefore require clocking a single core at top speed, slow-clocking two cores sharing a power source, or some even more bizarre combination. The rules depend on the exact chip, so the scheduler must use deep hardware knowledge that's currently unavailable.

The eventual solution will involve percolating power and clock rules up to the scheduler so that it can determine the cost of various policies, rather than simply keeping all of the hardware busy all the time.
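To make the trade-off concrete, here is a minimal Python sketch of what a power-aware policy has to compute. Everything in it is an assumption for illustration: the clock steps, the wattage constants, the two-cores-per-power-source layout, and the names power_watts and cheapest_setting are invented, not taken from the talk or from any real chip. It also assumes the work scales perfectly with the summed clock rate, which real software rarely manages.

```python
from itertools import product

# Hypothetical numbers for illustration only; real values are chip-specific.
FREQS_MHZ = (0, 500, 1000)    # allowed clock steps; 0 means the core is stopped
WATTS_PER_MHZ = 0.01          # dynamic power, assumed linear in clock speed
CORE_LEAKAGE_W = 1.0          # fixed cost of keeping one core running at all
DOMAIN_OVERHEAD_W = 3.0       # fixed cost of keeping a shared power source on
DOMAINS = ((0, 1), (2, 3))    # cores 0-1 share one power source, 2-3 another


def power_watts(freqs):
    """Estimate total power for a tuple holding one clock setting per core."""
    total = 0.0
    for domain in DOMAINS:
        active = [freqs[core] for core in domain if freqs[core] > 0]
        if active:
            # The shared source stays on if any core in the domain runs.
            total += DOMAIN_OVERHEAD_W
            total += sum(CORE_LEAKAGE_W + WATTS_PER_MHZ * f for f in active)
    return total


def cheapest_setting(required_mhz):
    """Brute-force the lowest-power setting that delivers the required work.

    Returns None when no combination of clocks can meet the demand.
    """
    n_cores = sum(len(domain) for domain in DOMAINS)
    candidates = (
        setting
        for setting in product(FREQS_MHZ, repeat=n_cores)
        if sum(setting) >= required_mhz
    )
    return min(candidates, key=power_watts, default=None)
```

With these made-up numbers, a demand of 1000 MHz is cheapest on one core at full speed, while 1500 MHz is cheapest on two cores that share a power source. Change the constants and the answer changes, which is why the scheduler needs the chip's own rules rather than a fixed policy.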

Although this talk concerned Intel CPUs, various DSP and embedded-processor vendors have introduced multicore chips in the last year or so. Most of these appear in applications that require both high performance and low power, so the trend is clear: The end of single-threaded CPU hardware is at hand. It's not clear just how many embedded applications can realize a major performance boost from hardware multiprocessing, but maybe yours will.

Getting It Right

When the Linux kernel was the exclusive domain of enthusiasts, reliability and stability were less important than bare functionality. Big businesses with the Linux kernel at the core of their operation have begun placing major emphasis on getting the bugs out of both existing code and new functions.

Rusty Russell (IBM) described "nfsim: Untested Code Is Buggy Code." The project resurrected the netfilter test suite from the bitrot morass, created a more comprehensive test-case collection, and rebuilt the testing environment. The netfilter code manages the kernel's network interface by massaging incoming and outgoing packets, so this code is at the heart of any networked system.

The nfsim program allows "netfilter developers to build, run, and test code without having to touch a real network, or having superuser privileges." In effect, nfsim is a virtual environment running actual kernel code with the ability to inject errors into simulated network traffic and observe the results. It is now feasible to test error paths in a way that simply isn't possible on a real network.

Injecting and tracking errors is expensive, turning a five-second mainline test run into a 44-minute exhaustive-error marathon. While developing the simulator and the test cases, they also flushed many longstanding bugs out of the netfilter code. In fact, one run using Valgrind to verify their own memory-allocation code lasted 18.5 hours, but exposed a kernel bug.
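A toy Python sketch suggests why exhaustive error injection costs so much. It is not nfsim code and makes no claim about how nfsim schedules its failures; FaultInjector, handle_packet, and exhaustive_run are hypothetical names, and the "allocator" is a stand-in for any call that can fail. The idea is simply to run the test once to count the failure points, then rerun it once per point with that one call forced to fail.

```python
class InjectedFailure(Exception):
    """Raised by the fake allocator when a fault is injected."""


class FaultInjector:
    """Counts allocation calls and fails exactly one of them on request."""

    def __init__(self, fail_at=None):
        self.fail_at = fail_at  # index of the call to fail, or None for no fault
        self.calls = 0

    def alloc(self, size):
        index = self.calls
        self.calls += 1
        if index == self.fail_at:
            raise InjectedFailure(f"allocation {index} failed")
        return bytearray(size)


def handle_packet(injector, payload):
    """Toy code under test: needs two allocations and must fail cleanly."""
    try:
        header = injector.alloc(8)
        body = injector.alloc(len(payload))
    except InjectedFailure:
        return "dropped"
    header[0] = len(payload) % 256
    body[:] = payload
    return "accepted"


def exhaustive_run(payload):
    """Run once without faults, then once for every possible failure point."""
    baseline = FaultInjector()
    results = {None: handle_packet(baseline, payload)}
    for n in range(baseline.calls):
        results[n] = handle_packet(FaultInjector(fail_at=n), payload)
    return results
```

A test with N failure points needs N+1 runs under this scheme, so the bill grows with every fallible call the mainline test touches.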

Despite testing all the error paths, the total test coverage hovers around 65 percent. The tested code includes otherwise untestable paths as well as mainline code and provides considerable confidence that actual errors will be reported correctly.

The programmers observe that any code lacking test cases almost certainly has errors, while noting that "developers have a certain antipathy to testing." What's needed is a change in mindset and they hope nfsim will provide an example of how to go about getting the job done.

Hardware drivers form a particularly sore point for testing and verification. The central problem is that Linux supports a tremendous amount of hardware that doesn't exist in any one place. A developer working with, say, a new SCSI driver simply cannot test any existing code for Other People's Hardware. As a result, old code suffers bitrot that may not be seen until the single organization owning that hardware installs the new code.

You can see a certain circular pattern lying in wait, can't you? The only solutions seem to be heaving out truly obsolete hardware features and simplifying the remaining code to the point where it must work. That Quixotic process is ongoing.

Embedded Machinations

The new verb "to brick" has been making the rounds, referring to a flash-memory update rendering a device unbootable, whether due to a power failure or a firmware error. Gilad Ben-Yossef (from Codefidence, as in "Code Confidence") described cfgsh, a shell environment designed for safe in-the-field flash memory updates.

It seems that embedded-system designers take the path of least resistance when planning update procedures. UNIX expatriates tend to manually untar archives and twiddle rc files, a process inevitably leading to completely inconsistent and undocumented firmware states, the bane of tech-support desks everywhere. RTOS veterans produce each firmware build as a huge binary lump, with any subsequent changes applied as a delta to the unpacked files. Regardless of the technique, an untimely reset can trash the firmware and leave the system unbootable: a brick.

The cfgsh firmware automates the update process by completely configuring the new firmware, then flipping from the old firmware to the new in a single atomic operation. It also gracefully handles the case of continuous reboots or hangs due to firmware errors. In short, even if you think you can do this stuff yourself, read the paper to discover several failure modes you didn't consider.
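The "prepare everything, then flip" idea can be sketched in a few lines of Python. This is not how cfgsh is implemented, which this column doesn't describe; it is one common approach, assuming a filesystem where rename is atomic (as POSIX specifies). The slot directories, the "active" marker file, and the function names are all hypothetical.

```python
import os
import tempfile


def activate(root, slot):
    """Atomically point the 'active' marker at a fully prepared slot."""
    if not os.path.isdir(os.path.join(root, slot)):
        raise ValueError(f"slot {slot!r} is not prepared")
    # Write the new marker under a temporary name in the same directory,
    # so the final rename never crosses a filesystem boundary.
    fd, tmp = tempfile.mkstemp(dir=root, prefix=".active-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(slot + "\n")
            f.flush()
            os.fsync(f.fileno())  # push the data to storage before the flip
        # The flip itself: readers see either the old marker or the new one.
        os.replace(tmp, os.path.join(root, "active"))
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def active_slot(root):
    """Return the name of the slot the system should boot from."""
    with open(os.path.join(root, "active")) as f:
        return f.read().strip()
```

A reset before the os.replace call leaves the old marker untouched, and a reset after it leaves the new one complete. A real device would also need to sync the directory and handle a new image that boots but then hangs, which is where the paper's failure modes come in.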

Keith Packard (of HP, but probably no relation) described TWIN, a 100-KB Tiny WINdowing system that replaces the 4-5 MB X Window System, in a talk that I couldn't attend. TWIN is designed for "sub-PDA" devices such as phones and a cute distributed computer node in use at HP Cambridge. The secret to size reduction lies in a relentless concentration on not including every bell and whistle found useful in the last four decades of graphics research.

Chasing the Kernel

Dave Jones (Red Hat) presented the final keynote address on the need for better kernel bug reporting and testing, observing that, although the Linux kernel has an enviable reputation for continuity, it's certainly not error free. Perhaps the fundamental problem is that kernels can't be tested under real-world conditions until they're released for general use, but for some unknown reason, nobody wants to run development kernels on production machines.

Version-to-version kernel stability and backwards compatibility are pretty much ruled out by the Linux kernel development methodology, which is the main reason the kernel now accommodates such a wide range of hardware. However, internal great leaps also affect external interfaces, which poses a considerable problem for applications with projected lifetimes longer than Linux has been around.

The best advice I've read is that embedded systems developers using the Linux kernel must break a long-standing habit by not keeping up with the latest kernel changes. Regardless of how nifty a new feature might be or which bug got squashed, you must learn to ignore those changes if they don't affect your application. Of course, that means you must track and evaluate all kernel changes. But you were doing that anyway, right?

Reentry Checklist

The Linux Symposium differs from commercial conferences in another regard: You can freely fetch the proceedings as PDF files from http://www.linuxsymposium.org/2005/.

You can look up odd words in Wiktionary, a multilingual dictionary at http://www.wiktionary.org/. If that's too tame, try http://www.urbandictionary.com/.

Ottawa also hosted a related pair of two-day Linux conferences prior to the LS: the Kernel Summit and the Desktop Conference. The former, a small invitation-only event for kernel developers, gathered key folks together in one place at one time for the sort of discussion, planning, and schmoozing you can't do electronically. The latter concentrated on issues relevant to application development for the desktop, with a smattering of kernel issues. I didn't attend either meeting, but you can read more at http://lwn.net/Articles/143649/ and http://www.desktopcon.org/.

A comprehensive comparison of Windows and Linux metrics, including some embedded OS design estimation, is at http://www.dwheeler.com/oss_fs_why.html.

The Valgrind project may be just what you need to get bugs out of your own code. It's named after the entrance to Valhalla, pronounced "val-grinned," and found at http://www.valgrind.org/.

Subscribe to the Linux Kernel Mailing List at http://lkml.org/.

Reader Motti Shimoni sends in a plea for you to turn off your monitor. He works in a huge corporation that does third shift remote updates, so all PCs must be left on overnight. It seems everyone also leaves the monitor on and nobody uses screensavers with display power management. Maybe that corporation lacks a clue, but yours shouldn't: Turn on power management and turn off the monitors, okay?

DDJ
