Getting Experienced

Whether it's avoiding an error you've made before or recovering quickly from a new problem, experience matters.


March 13, 2007

Ed is an EE and author in Poughkeepsie, NY. Contact him at [email protected] with "Dr Dobb's" in the subject to avoid spam filters.


One of the hidden themes in last year's Embedded Systems Conference was that experience matters. Whether it's avoiding an error you've made before or recovering quickly from a new problem, knowing your field in depth helps you get the job done. Whether that's a marketable skill depends on circumstances beyond your control, but at least you get some satisfaction from knowing what you're doing.

This month, I'll collect some data points to show how this works out in practice.

Missing One Little Point

The fundamental illusion of all desktop and server application development methods goes something like this: Resources are free, unlimited, and readily available. Should your code run out of something, don't worry, more is on the way. Over the decades, we've seen some truly bizarre hardware designs intended to maintain that illusion.

For example, within the lifetime of those yet living, a "personal computer" with 640 KB of memory should have been, as Bill Gates evidently didn't say, enough for anyone. Intel's 8086 architecture grafted 20 physical address bits onto a 16-bit datapath, inspiring all manner of bizarre memory expansion techniques, culminating in the Lotus-Intel-Microsoft Expanded Memory Specification for paged access to megabytes of agonizingly expensive RAM.

The jump to 32 address bits sufficed for a mere decade, whereupon Intel's Physical Address Extension (another four bits!) gave paged access to 64 GB of physical memory. The jump to 64-bit microprocessors, with current implementations limited to addresses of a mere 40 or 48 bits, should last longer.

On the other hand, we're chewing through two address bits per year, and 2020 isn't all that far away.

Embedded development differs slightly: You don't have enough of at least one resource, and you won't get any more in time to finish the job. In the bad old days you ran out of memory or datapath; nowadays there's not enough connectivity or power (which amounts to the same thing as CPU speed). Sometimes you have enough of everything, but it doesn't work quite the way you expected or has limits that your tools simply don't notice.

Some years ago, a friend called for some advice on a power-line data logger he'd built to track down utility-power glitches. It wrote a running record of voltage disturbances into a nonvolatile EEPROM, but after a week or two the memory failed. In fact, several different memory chips had failed the same way.

The EEPROM datasheet specified an endurance of 100,000 write cycles, which seemed like enough for years of recording at the expected rate. Alas, it transpired that his toolchain had inadvertently put a few "hot" variables, ones updated every few seconds, into the EEPROM's address space. The arithmetic is easy:

168 hour/week × 3600 sec/hour ≈ 605 × 10³ sec/week

Perhaps surprisingly, a variable written every six seconds chews through the EEPROM's entire rated lifetime in a week. The remainder of the EEPROM might still be usable, but those addresses were hammered dead, dead, dead. Once we found the problem, the fix was trivial, but it still required manual intervention.
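If you'd rather let the compiler do the arithmetic, here's a minimal C sketch of the same calculation; the endurance figure and write interval come straight from the story, while the variable names are mine:

#include <stdio.h>

int main(void)
{
    const double endurance     = 100000.0;       /* rated write cycles     */
    const double secs_per_week = 168.0 * 3600.0; /* 604,800 seconds        */
    const double write_period  = 6.0;            /* seconds between writes */

    double writes_per_week = secs_per_week / write_period;

    printf("writes per week: %.0f\n", writes_per_week);
    printf("weeks to rated wearout: %.2f\n", endurance / writes_per_week);
    return 0;
}

Run it and you'll see about 100,800 writes per week: the entire rated lifetime, gone in seven days.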

Contemporary flash memory includes wear-leveling routines in the on-chip controller that automagically shift repeated writes at the same address to different physical cells. That's particularly helpful for applications designed without regard for memory that expires after some number of uses.
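Each controller's algorithm is proprietary, but the core idea fits in a few lines. Here's a toy round-robin sketch, emphatically not any vendor's actual method, that spreads repeated writes to one logical block across every physical block:

/* Toy wear leveling: rotate each logical write onto the next physical
 * block, so a hammered logical address wears the whole array evenly.
 * Real controllers also track erase counts and relocate live data. */
#include <stdio.h>

#define NBLOCKS 8

static unsigned map[NBLOCKS];   /* logical -> physical block          */
static unsigned wear[NBLOCKS];  /* writes seen by each physical block */
static unsigned next_free;

static void write_block(unsigned logical)
{
    map[logical] = next_free;             /* data lands in a fresh block */
    wear[next_free]++;
    next_free = (next_free + 1) % NBLOCKS;
}

int main(void)
{
    for (int i = 0; i < 1000; i++)
        write_block(0);                   /* hammer one logical address */

    printf("logical 0 now lives in physical block %u\n", map[0]);
    for (unsigned p = 0; p < NBLOCKS; p++)
        printf("physical block %u: %u writes\n", p, wear[p]);
    return 0;
}

Instead of 1000 writes landing on one cell, each of the eight physical blocks absorbs 125.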

The FAT filesystem used in essentially all USB flash drives comes to mind. All file additions, deletions, or size changes also update the File Allocation Tables at fixed locations near the start of the disk image. In the absence of wear leveling, those disk sectors can fail very quickly on an active drive.
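A back-of-envelope simulation shows just how lopsided the write traffic gets; the sector layout and counts here are invented purely for illustration:

/* Tally writes per sector for a FAT-like layout: every file update
 * touches both FAT copies plus one data sector somewhere on the disk.
 * The numbers are made up to show the skew, not any real geometry. */
#include <stdio.h>
#include <stdlib.h>

#define FAT1       1     /* fixed location of the FAT */
#define FAT2       2     /* and its backup copy       */
#define DATA_START 10
#define NSECTORS   1000

int main(void)
{
    static unsigned writes[NSECTORS];

    for (int i = 0; i < 100000; i++) {
        writes[FAT1]++;                   /* metadata updated every time */
        writes[FAT2]++;
        writes[DATA_START + rand() % (NSECTORS - DATA_START)]++;
    }

    printf("FAT sector: %u writes\n", writes[FAT1]);
    printf("one typical data sector: %u writes\n", writes[DATA_START]);
    return 0;
}

The FAT sectors collect 100,000 writes while a typical data sector sees around 100: a thousandfold difference, all aimed at cells rated for perhaps 100,000 cycles.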

Small embedded systems often include flash memory rather than a hard disk, trading reduced storage capacity for increased mechanical durability. Although the memory can be connected directly to the CPU for use with a flash-friendly filesystem, the trend seems to be toward hardwired disk emulation through standard PC-oid interfaces.

In fact, you might well be able to plug in a stock IDE hard drive during development and replace it with a flash drive sporting an IDE interface for deployment, all without changing a single line of code. What's not to like?

Several ESC presenters described a common interaction between off-the-shelf hardware, standard operating systems, and developers assigned to their first embedded project. It seems that system event logs, so familiar in desktop and server environments and so vital during development, sometimes find their way into the finished product. That might be handy for debugging, but tends to fill up small flash drives while wearing out the underlying memory cells.

Oops.

The trick, of course, involves turning off the log-to-disk daemon or service before releasing the project. Should you forget, you'll certainly get a reminder!
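On an embedded Linux box, for instance, one common dodge is to aim the logs at RAM instead of flash. A minimal sketch, assuming a stock syslog daemon and an /etc/fstab you control (details vary by distribution):

# /etc/fstab excerpt: keep /var/log in RAM so the logger never touches flash.
# The logs vanish at power-off, which is rather the point on a deployed unit.
tmpfs   /var/log   tmpfs   defaults,size=1m   0 0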

Building Experience

Bill Gatliff describes the fundamental stumbling block for new embedded developers: "Everything must work for anything to work." All those familiar, modern programming and debugging conveniences depend on a vast infrastructure of stuff that we all take for granted, but which must work flawlessly before we can get anything done. Embedded systems development can start with bare silicon and work upward from there, a task that demands intimate familiarity with techniques and gadgetry far removed from the usual high-level language wars.

You may choose one of two paths that lead to a working system—buy or build. Your choice determines the required level of in-house expertise, has surprisingly little to do with the eventual cost or schedule, and interacts strongly with your personnel retention policies.

Conventional wisdom tells companies to concentrate on their core competency, those snippets of knowledge that distinguish whatever it is that they sell from their competitors' offerings. A biz should assemble the largest-possible hunks of generic hardware and software from folks who actually know how to make them work, add some Secret Stuff, and sell the result. This maximizes ROI by not wasting time fiddling with details.

Medium-scale embedded projects, those large enough to justify an OS and small enough to escape the per-unit cost stricture of mass-market gadgets, now tend to start with stock hardware including all the usual interfaces. Coupling such a board with the unique peripherals for your project eliminates the tedious hardware design required to get a microcontroller up and running, while not including too many extraneous parts. The second version may use a custom board that incorporates all the hardware in one lump, but it'll work remarkably like the original collection of parts.

You'll also get a Board Support Package for the OS of your choice; indeed, if the board doesn't come with such a BSP, then you have likely chosen the wrong board. The BSP includes, at a minimum, the boot loader and drivers required to get the OS up and running from power-on and may extend to a preconfigured, ready-to-boot OS upon which you can immediately begin building your application.

In the ideal case, you need never concern yourself with the details hidden in the BSP, drivers, and OS. In actual practice, you'll generally develop drivers for your unique hardware, at which point you must begin tracking down the usual assortment of errors.

Again ideally, those errors will reside entirely in your code and be subject to the same high-level debugging techniques and tools you deploy during desktop and server development. Many errors will be that well-behaved, but you'll surely trace at least one perplexing glitch right out of your code and into the infrastructure, at which point you're in trouble.

The upside of buying huge chunks of proprietary software is that you need not understand how it all works in order to use it. The downside, that you don't understand how it all works, only comes into play after things go wrong. When you cannot read and modify all the source code, you must depend on the vendor to resolve problems, track down errors, and supply patches.

That situation firmly affixes your product plans to your vendor's patch cycle. If your contract is good enough, perhaps even sporting a service-level guarantee, you're in fine shape. If it's not, then the ideal buy-build trade-off can look ugly, indeed.

That trade-off may also affect your future maintenance obligations. When (not if) your vendor discontinues support for the code in your product, you must either drop your support or upgrade the gadgets to whatever the new version requires. In the worst case, you cannot retrofit new software into old hardware, leaving you with orphan products.

This doesn't matter in the consumer electronics field with its near-zero product lifetimes: Phones are now on a 12-month cycle. Classic infrastructure-level embedded systems, however, run for decades, and software lifetimes pose a significant long-term maintenance problem.

Worse, perhaps, is the situation where the vendor's key personnel, the ones who actually know what's going on inside that code, move on to other jobs. Not only do you not know what's happening inside, but the vendor doesn't, either. Scary thought, eh?

The alternative to buying proprietary infrastructure is assembling your own, which in this day and age means adopting FOSS: Free and Open-Source Software. In essence, you get big hunks of Other People's Code, source included, and take on the responsibility of integrating your Magic Ingredients.

This imposes significant up-front costs measured in time, money, and your own personnel. The advantage comes later, as your organization acquires a deep knowledge of the entire body of code inside your product, and need not depend on a vendor for critical fixes.

Several ESC speakers pointed out that you can regard the entire FOSS code corpus much as you would a vendor's offering by treating it as a black box with surprisingly good documentation. The difference, a lack of someone to call when it doesn't work, simply means your folks must develop the skills required to dig into the code and figure out what's broken.

Those debugging skills represent the other end of the buy-versus-build trade-off. Very often, it seems, proprietary (and always high-level) tools impose significant overhead that can obscure low-level problems. Developing the ability to use low-level, sometimes do-it-yourself tools builds a deep knowledge of the inner workings of your system that can't be obtained any other way.

In any event, the consensus seems to be that a project's final cost and schedule don't depend on whether you buy or build, so any advantages represent, at best, specsmanship. In either case, a single showstopper error can torpedo your product, but there's at least anecdotal evidence suggesting such problems don't depend on where you get your infrastructure or who you call for support.

There are, of course, vendors who can help you understand and use FOSS code, get your staff up to speed, and generally get you over the initial hump where nothing works and everything is unfamiliar. Whether you get locked into their offerings or not depends on how much outsourcing you want.

Retaining Experience

A long time ago, in a universe far away, a personnel guy (perhaps in a moment of weakness) related a high truth of corporate hiring policies: "Headcount is interchangeable: You make no difference." Basically, he held that when you needed to get a job done, you simply hired enough engineers with the requisite skills.

Some hallway discussions I overheard at ESC represented much the same attitude from the other side of the paycheck. It seems those attendees regard themselves as employable nearly anywhere and show little loyalty to their current employer. That they believe employers evince little loyalty to them goes without saying, I suppose.

As I see it, though, the core problem comes down to the way the familiar management axiom "Employees are our most valuable resource" plays out against relentless cost cutting. A service business, which software development surely is, has no other resource: The CVS code database won't correct itself or ship itself out the door. Unfortunately, there's little attention paid to keeping good employees around for the long term.

A friend recently related the tale of a group working on a project deemed critical to their company's future products; surprisingly, the group was told that an impending layoff would include some of its members. In an equally surprising show of solidarity, the entire team quit and, as with any key group, their project knowledge simply disappeared.

Purchasing large lumps of knowledge from vendors puts you at their mercy, as you gain no insight into how your system really works. Devoting sufficient resources to develop your own staff's experience means that you must also trust them with your corporate future. On the other hand, that's already true, so perhaps it's just admitting that people really are your most valuable resource that's so difficult these days.

Factoids

Although phones now have a one-year product cycle, they typically include two CPUs, one DSP, two or three operating systems, and require 2000 developer-years of effort. I'm amused to note that most folks really aren't interested in paying money to watch movies on a postage stamp.

The various computers in GM automobiles currently have 1 MLOC and GM predicts 10 MLOC by 2009, most of it for infotainment systems we never knew we needed. By contrast, engine-control computers have a vanishingly small amount of tightly written, firmly embedded code that doesn't flutter in the winds of fashion.

Recent attention to detail has crunched the Linux kernel so that a useful embedded system can run with 1 MB of RAM and 2 MB of flash. You'd want more on a development system, but, heck, it's getting hard to find memory chips that small.

Indeed, Slackware 11 installs and runs happily on an ancient 32-MB 233-MHz ThinkPad 560Z, using 3 GB for development tools, kernel source, and plenty of other goodies. I couldn't find a drive under 12 GB in my drawer, but I'm saving a 2-GB Compact Flash card until I finish some kernel fiddling and get some other code running.

By comparison, the One Laptop Per Child machine has 128 MB of RAM and 512 MB of flash for its native Fedora-oid OS and apps. Evidently, they added an SD Card slot specifically for an additional gig of flash to hold Windows.

Last Tab

You'll find a discussion of Bill Gates's nonquote, including an e-mail from him, down near the bottom of www.nybooks.com/articles/15180. Just search for "640" and you'll be right there. More on a completely different Bill at www.billgatliff.com.

Not all phones require quite so much code, as seen at www.jitterbugdirect.com. They may be on to something that could simplify the software challenge!

Search for "employees are our most valuable resource" and explore the hits. 'Nuff said?

More on OLPC from http://en.wikipedia.org/wiki/One_Laptop_per_Child, www.laptop.org, and www.vnunet.com/vnunet/news/2170209/microsoft-looking-windows-olpc.
