Dr. Dobb's is part of the Informa Tech Division of Informa PLC


Monoculturalism


September, 2005: Monoculturalism

Ed's an EE, PE, and author in Poughkeepsie, NY. Contact him at [email protected] with "Dr Dobbs" in the subject to avoid spam filters.


Even a quick glance at programming magazine ads shows the unrelenting complexity of software development. A program's source code no longer defines its overall logical structure, as it now must connect the various frameworks and scaffolds and infrastructures that actually do most of the heavy lifting. The days when writing a program required nothing more than a pad of paper and a keyboard have vanished with, oh, the slide rule.

The ads promise that, in exchange for giving up control over major chunks of functionality, we can devote more time to the small pieces that actually differentiate our program from its competitors. We're told that there's little enough value in Yet Another User Interface, Database Back End, Network Stack, and so forth and so on, that we may as well use Other People's Code for those parts of the project and get on with our own stuff.

Embedded systems take this notion to an entirely new level, because hardware design has become subject to the same pressures. I think we have four puzzle pieces that, in combination, pose some interesting challenges for embedded systems designers.

Other People's Code

Although it's hard to be certain, the first instance of code reuse probably occurred while laying out the plugboards for ENIAC's second program. If not, then surely the third program recycled parts of the second, establishing the "Write one to throw away" design pattern.

Fast forward four decades, as programming segues from plugging cables to GUI IDEs. The notion of dynamic linking, which long predates Windows DLLs, now lets programs invoke common routines without including them in every program's executable file (let alone its card deck). Programs become nondeterministic on a grand scale, because they depend on Other People's Code that not only isn't included in the executable file, but may differ from run to run.

Such late binding promised that you could fix problems by simply replacing a single common file, rather than relinking all the programs that used it. The reality, known as "DLL Hell," was an ever-mutating series of errors and dependency failures. Of course, by the time the symptoms surfaced, they appeared completely unrelated to the actual causes.

Fast forward another two decades, when software frameworks unifying disparate operating system and utility functions become feasible. Writing a program now requires less overall system knowledge, while simultaneously putting more reliance on vast chunks of Other People's Code. A system-level package management database can help resolve dependencies, although everybody must agree to play by the rules.

Pop quiz: If you weren't such a nice person, what could you do with an arrangement like that? Hint: On forever-running systems, you can delete the file that loads your code.
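The hint works because, on Unix-like systems, deleting a file removes only its directory entry. A process that already has the file open (or mapped, as a loader does with executables and shared libraries) keeps the data until it lets go. Here is a minimal sketch of that behavior, assuming POSIX unlink semantics (Windows normally refuses to delete a file that's open). The function name is hypothetical:

```python
import os
import tempfile


def read_after_unlink(payload: bytes) -> tuple[bool, bytes]:
    """Create a file, open it, delete it, then read it through the open handle.

    Returns (path_still_exists, bytes_read). Assumes POSIX unlink semantics.
    """
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(payload)

    with open(path, "rb") as handle:
        os.unlink(path)                    # the name disappears from the directory...
        still_there = os.path.exists(path)
        data = handle.read()               # ...but the open handle still reaches the data
    return still_there, data


exists, data = read_after_unlink(b"resident code")
print(exists, data)                        # False b'resident code'
```

The storage isn't released until the last handle closes, and on a system that never reboots, that can take a very long time.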

Building Blocks

On the hardware side, classic small-scale embedded-system design started with datasheets for single integrated circuits that were actually comprehensible to one person. The chip designers specified how the external hardware must behave, to the extent that Motorola and Intel microprocessors might as well have been from different planets, and embedded-system designers continued that process to produce completely unique chunks of hardware.

Eventually though, vendors noticed that most of the gadgets looked pretty much the same: A microcontroller, some external memory, an assortment of digital and analog I/O, plus some communications ports. Add a few status LEDs and a debugging port: Shazam, everyone could use a single board. At least if they were willing to buy a few features they didn't really need, which generally turned out to be the dealbreaker: The cost of the board was relatively high compared to the cost of the final product.

Nowadays, chips have become sufficiently complex that reading the datasheet does not give you confidence that you can actually build a working gadget. Chip vendors now produce "evaluation boards" to provide an existence theorem: If you interconnect the hardware using this exact board layout, then run this software, the chip behaves properly.

Once you've seen it work, the theory goes, you can adapt it for your own use, write your own code, and go on your merry way to satisfy your customers. Any problems can be traced back to differences between the eval board's design and your efforts, so figuring out what's going wrong shouldn't take all that long.

In actual practice, however, the adaptation process sometimes goes along these lines: "This works, so we'll just run with it!" Grafting an eval board's layout onto your board is one thing, but shoehorning sample code into a production application is something quite different and rather scary.

Eval board code tends to be written for the very specific purpose of showing how the circuit works. Considerations of reliability, error handling, overall structure, and security tend to fall by the wayside even in ordinary projects, but particularly for sample code that's written very, very early in the chip's design stage. As a result, eval board code comes with a surprisingly high level of cruft, not to mention outright errors in lesser-used functions.

Nevertheless, there's an awful pressure to make as few changes as possible, because you (or your boss) really wants to concentrate on the rest of the project, the part that contains your unique and no-doubt valuable IP, rather than building-block hardware and code.

Sound familiar?

Hammering on the Wall

The next puzzle piece comes from the Internet, with an overall design dating back to the Good Old Days when everyone was on more-or-less friendly terms with everyone else. The degree to which that is a bad assumption has taken many people by surprise, so a quick look at what's going on is in order.

Some consumer-grade firewalling routers, including my D-Link DI-604, can e-mail their status logs on a regular basis. Being that sort of bear, I've kept every one of those status e-mails since installing the firewall in late 2003, knowing that they'd come in handy.

Such a firewall's key job is to block all external packets that do not correspond to traffic originating from the protected "inside" network. From the Internet side, it appears that the firewall's IP address is unused: Blocked packets are not bounced back to the sender, they're simply discarded. These firewalls can also relay external packets to internal servers, but I'm not using that function.
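That filtering rule amounts to a lookup table of connections started from the inside. The following toy sketch assumes a simplified model in which a flow is identified by protocol, inside port, and outside address and port. The class and method names are hypothetical, and real routers also handle NAT rewriting, timeouts, and TCP state:

```python
from dataclasses import dataclass, field

# (protocol, inside port, outside IP, outside port)
Flow = tuple[str, int, str, int]


@dataclass
class StatefulFilter:
    """Toy model of a consumer router's stateful packet filter.

    Deliberately simplified: no timeouts, no NAT address/port rewriting,
    no TCP state tracking. It only remembers flows started from inside.
    """
    flows: set[Flow] = field(default_factory=set)
    dropped: int = 0

    def outbound(self, proto: str, inside_port: int, remote_ip: str, remote_port: int) -> None:
        # Traffic that originates inside opens a return path for the replies.
        self.flows.add((proto, inside_port, remote_ip, remote_port))

    def close(self, proto: str, inside_port: int, remote_ip: str, remote_port: int) -> None:
        # Once the inside application goes away, so does the return path.
        self.flows.discard((proto, inside_port, remote_ip, remote_port))

    def inbound(self, proto: str, remote_ip: str, remote_port: int, inside_port: int) -> bool:
        # Admit replies to flows we started; silently count and discard
        # everything else (no reset, no ICMP), so the address looks unused.
        if (proto, inside_port, remote_ip, remote_port) in self.flows:
            return True
        self.dropped += 1
        return False


fw = StatefulFilter()
fw.outbound("tcp", 50000, "198.51.100.7", 80)
print(fw.inbound("tcp", "198.51.100.7", 80, 50000))   # True: a reply we asked for
print(fw.inbound("tcp", "203.0.113.9", 4321, 135))    # False: unsolicited probe
fw.close("tcp", 50000, "198.51.100.7", 80)
print(fw.inbound("tcp", "198.51.100.7", 80, 50000))   # False: the inside program is gone
```

The last call mirrors what happens when a streaming server keeps sending after the local viewer has shut down: the packets no longer match any outbound flow, so they're discarded.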

The DI-604 also reveals a closed Port 113, the standard Ident (Identification Protocol) port, so you can tell that there's something at my IP address, although attempted connections fail and all the other ports simply don't respond.
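From the outside, those three behaviors look different to a simple TCP connect. This sketch uses Python's standard `socket.create_connection` to sort a port into "open," "closed" (the host answers with a reset, as the DI-604 does on port 113), or "stealth" (nothing comes back before the timeout). The function name and the two-second default timeout are assumptions:

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify a TCP port the way an outside scanner sees it.

    "open":    the handshake completed
    "closed":  the host answered with a reset (connection refused)
    "stealth": nothing came back before the timeout; the packet was dropped
    Other network errors (unreachable host, etc.) propagate to the caller.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "closed"
    except socket.timeout:
        return "stealth"
```

A silently dropped probe can't be distinguished from an empty address, which is why the stealthed ports give nothing away while the closed port 113 does.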

I wrote a Python script to read those e-mail messages, look up the host name corresponding to each dotted-quad IP address, and tally the number of packets received from each one. Because many of the IP addresses are in blocks reserved for ISPs, many different systems may use a single IP address. Conversely, it's also trivially easy to spoof source IP addresses, so you can't trust everything you see. Nevertheless, we can extract some interesting information from those records.
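The column doesn't include the script, so here is a minimal sketch of the same idea. It assumes that each blocked packet appears in an e-mail body as a line containing `from SRC_IP:PORT to DST_IP:PORT`. That pattern is a hypothetical stand-in for the DI-604's actual log format, and the function names are illustrative. The message bodies could come from Python's standard `mailbox` module:

```python
import re
import socket
from collections import Counter, defaultdict
from functools import lru_cache

# Hypothetical log-line shape, e.g.
#   "Blocked incoming TCP packet from 203.0.113.9:4321 to 192.0.2.10:135"
# Adjust the pattern to match your router's real e-mail format.
BLOCKED = re.compile(
    r"from\s+(?P<src>\d{1,3}(?:\.\d{1,3}){3}):\d+"
    r"\s+to\s+\d{1,3}(?:\.\d{1,3}){3}:(?P<dport>\d+)"
)


def tally(bodies):
    """Count discarded packets per source address and note the ports each one hit."""
    packets = Counter()
    ports = defaultdict(set)
    for body in bodies:
        for m in BLOCKED.finditer(body):
            packets[m["src"]] += 1
            ports[m["src"]].add(int(m["dport"]))
    return packets, ports


@lru_cache(maxsize=None)
def host_name(ip: str) -> str:
    """Reverse-DNS lookup, cached; falls back to the dotted quad if it fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return ip


def report(bodies, top=15, resolve=host_name):
    """Print the busiest sources, largest packet count first."""
    packets, ports = tally(bodies)
    print(f"{sum(packets.values())} packets from {len(packets)} addresses")
    for ip, count in packets.most_common(top):
        print(f"{count:6d}  {resolve(ip):40s}  ports {sorted(ports[ip])}")
```

Passing `resolve=str` skips the DNS lookups, which can be slow with thousands of distinct addresses. Given the caveats about shared and spoofed addresses, the host names are hints rather than proof.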

From January through late May 2005, my firewall discarded 32202 packets from 5787 distinct IP addresses, sending 170 status e-mails in the process. Figure 1 shows the top 15 packet sources and the ports they attempted to connect with at my IP address.

Although it looks like NASA is a perp, there's an innocent explanation. It turns out that if you shut down the viewer for QuickTime movies, the server continues to hose down your IP address until it notices that you've vanished. The firewall discards those incoming packets, because they no longer correspond to outbound traffic from your system. Similarly, the 322 packets from Best Buy were probably triggered by something I viewed on their web site.

I used the Shields-Up port scanner at Gibson Research to verify that my firewall was operating correctly. Scanning your own system from the outside makes sense, as long as you can trust the system doing the scanning, but the firewall can't distinguish the scan from an actual attack.

All the remaining entries represent unsolicited probes of my firewall. Although I don't know if they first tested port 113 to verify that the IP address was in use, that seems unlikely on the face of it.

One system in my optonline.net neighborhood steadily plinks away at the long-patched Microsoft Remote Procedure Call vulnerability, another engages in broad-spectrum port scans, and a third still harbors a Slammer worm. While they may not be geographically nearby, they have easy access to my firewall because they're within the Optimum Online subnetwork, behind any ISP-level firewalls.

All but one of the remaining systems (seem to) reside in mainland China, doggedly trying to stuff pop-up spam through MS Messenger's ports around 1024. You might think that, after 2700 attempts, every eight minutes or so, over the course of two weeks, someone would get a clue, but that's not the case.
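A quick consistency check on those figures, taking "two weeks" to mean exactly 14 days:

```python
MINUTES = 14 * 24 * 60               # two weeks: 20,160 minutes

attempts_at_8_min = MINUTES / 8      # one try every eight minutes
interval_for_2700 = MINUTES / 2700   # spacing implied by 2,700 tries

print(attempts_at_8_min, round(interval_for_2700, 1))   # 2520.0 7.5
```

That's close enough to call it "every eight minutes or so."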

Hold that thought for a moment.

Remote Computing

Early this year, a trio of researchers from Shandong and Princeton Universities announced, without giving the details, an attack on the SHA-1 message digest algorithm. Dallas Semiconductor subsequently produced a white paper describing the effect of the attack on its high-security memory devices.

These gizmos use SHA-1 to create a MAC (Message Authentication Code, not to be confused with the Media Access Control address on your network cards) digest over the memory contents, the device's serial number, a challenge string, and a secret key to authenticate and verify both data storage and transmission. They're not your common stick of SDRAM, for sure!
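The following simplified sketch shows which values the MAC binds together. It concatenates them and hashes the result with SHA-1. This is an assumption made for illustration: the actual parts define a fixed message layout and padding in their datasheets, which aren't reproduced here, and the function names are hypothetical. The host shares the secret, so it can recompute the MAC and compare:

```python
import hashlib
import hmac
import os


def device_mac(secret: bytes, memory: bytes, serial: bytes, challenge: bytes) -> bytes:
    """Simplified stand-in for a SHA-1 authenticator's MAC.

    Assumption: the inputs are simply concatenated (a plain hash, not HMAC).
    Real parts use a fixed message layout; this only shows what gets mixed in.
    """
    if len(secret) != 8:
        raise ValueError("the devices described use a 64-bit secret")
    return hashlib.sha1(secret + memory + serial + challenge).digest()


def host_verifies(secret, memory, serial, challenge, mac_from_device) -> bool:
    # The host knows the same secret, recomputes the MAC, and compares
    # in constant time.
    expected = device_mac(secret, memory, serial, challenge)
    return hmac.compare_digest(expected, mac_from_device)


key = os.urandom(8)                   # the device's 64-bit secret
challenge = os.urandom(3)             # fresh from the host each time
mac = device_mac(key, bytes(32), bytes(8), challenge)
print(host_verifies(key, bytes(32), bytes(8), challenge, mac))   # True
```

An attacker who knows everything except `secret` must try 64-bit values in that slot until the digest matches, which is the 2^64 search discussed next.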

Anyhow, Dallas observes that it's still infeasible to determine the 64-bit secret key given all the rest of the information. The computation requires 2^64 SHA-1 operations, each of which takes about 1740 "basic arithmetic operations." They estimate 12.4 years of grinding on a 64-CPU Cray X1 at 819 GFLOPS or 2 months on a 4096-CPU machine.
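The following code takes those figures at face value and assumes that each "basic arithmetic operation" costs one floating-point operation at the X1's 819-GFLOPS rate. The naive result is about a hundred times the quoted 12.4 years, which agrees with the remark in the Reentry Checklist that the Cray numbers don't work out. The 4096-CPU figure, by contrast, is simply the quoted 12.4 years scaled by 64/4096:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

KEYSPACE = 2 ** 64            # every possible 64-bit secret
OPS_PER_SHA1 = 1740           # Dallas's "basic arithmetic operations" per SHA-1
CRAY_X1_FLOPS = 819e9         # 64-CPU Cray X1; assumes 1 basic op == 1 FLOP

total_ops = KEYSPACE * OPS_PER_SHA1
years_64cpu = total_ops / CRAY_X1_FLOPS / SECONDS_PER_YEAR

# The quoted 4096-CPU figure follows from scaling the quoted 12.4 years
months_4096cpu = 12.4 * 12 * 64 / 4096

print(f"{total_ops:.2e} ops, {years_64cpu:,.0f} years on 64 CPUs")   # 3.21e+22 ops, 1,242 years on 64 CPUs
print(f"{months_4096cpu:.1f} months on 4096 CPUs")                  # 2.3 months on 4096 CPUs
```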

While renting a supercomputer isn't feasible for most of us, there's a much cheaper way to get serious computing power. The street price for zombie Windows boxes (aka bots) is now under $0.10 each in lots of 20,000. That price includes preinstalled Trojan software giving you complete control. Spammers send junk mail and MS Messenger popups from zombies, but we can do better than that!

Dallas allows for 20-percent overhead in the SHA-1 computation (1740 × 1.2 ≈ 2100 operations), so, if a typical zombie is a 2-GHz Windows PC, it can run about a million (2×10^9/2100) SHA-1 computations per second. Giving each zombie's owner half the CPU cycles still yields 40×10^9 SHA-1 computations per day, so a single zombie can crack one memory device in 500 million days, worst case.
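Here is the chain of estimates from that paragraph, written out without rounding. The assumption that a 2-GHz PC completes one basic arithmetic operation per clock is implicit in the text's 2×10^9/2100 and is labeled here. The variable names are illustrative:

```python
CLOCK_HZ = 2e9               # a typical 2-GHz zombie; assumes one basic op per clock
OPS_PER_SHA1 = 2100          # 1740 basic ops plus 20% overhead, rounded
DUTY_CYCLE = 0.5             # leave half the cycles so the owner doesn't notice
KEYSPACE = 2 ** 64           # worst case: try every 64-bit secret

sha1_per_second = CLOCK_HZ / OPS_PER_SHA1
sha1_per_day = sha1_per_second * DUTY_CYCLE * 24 * 3600
days_worst_case = KEYSPACE / sha1_per_day

print(f"{sha1_per_second:,.0f} SHA-1/s, {sha1_per_day:.1e} per day, "
      f"{days_worst_case:.1e} days")   # 952,381 SHA-1/s, 4.1e+10 per day, 4.5e+08 days
```

Without rounding, the result is slightly lower than the text's figure: about 450 million days instead of 500 million. That doesn't change the conclusion.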

There are, however, two types of Windows machines: those behind firewalls and zombies. The former divide into two camps: those with good security practices and zombies. I'm not sure the numbers are reliable, but Symantec estimates a quarter of U.S. PCs are zombies and, as nearly as I can tell, the U.S. has about 150 million PCs. That gives a pool of 37×10^6 Windows zombies in the U.S. alone.

Let's derate that by an order of magnitude to account for the usual handwaving assumptions. If you had half a megabuck to spend (What? No bulk discount?), 4×10^6 PCs could crack a SHA-1 MAC in four months. That seems like lots of money and a long time, but, by definition, SHA-1 memories appear in high-value, long-lived applications.
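This sketch works through the fleet arithmetic. It assumes that the keyspace divides evenly among the rented machines and that none of them drop out of the pool. The constants come straight from the text:

```python
US_PCS = 150e6               # rough count of U.S. PCs, per the text
ZOMBIE_FRACTION = 0.25       # Symantec's estimate, per the text
DERATE = 10                  # "an order of magnitude" for handwaving
PRICE = 0.10                 # dollars per zombie, bulk street price
DAYS_ONE_ZOMBIE = 500e6      # single-zombie worst case from the text

pool = US_PCS * ZOMBIE_FRACTION           # 37.5 million zombies
fleet = round(pool / DERATE, -6)          # 3.75 million, rounded to 4 million
cost = fleet * PRICE                      # dollars to rent the fleet
days = DAYS_ONE_ZOMBIE / fleet            # keyspace split evenly across the fleet

print(f"{fleet:,.0f} zombies, ${cost:,.0f}, {days:.0f} days (~{days / 30.4:.1f} months)")
# 4,000,000 zombies, $400,000, 125 days (~4.1 months)
```

Rounded up, the $400,000 total is "half a megabuck."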

Desktop PCs have been stuck around 3 GHz for a few years and, while Moore's Law isn't in abeyance yet, Intel and AMD pretty much admit that more performance requires more CPUs. Assuming that the overall performance per box doubles every two years and the Windows zombie pool remains unchanged, by 2013 we'll crack SHA-1 MACs in a week, just with U.S. PCs—no need for offshoring!
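The projection assumes four doublings between 2005 and 2013. It also assumes that the zombie pool, its price, and the attack all stay unchanged and that the work scales perfectly with per-box performance. The 125-day starting point is the four-month figure from the previous paragraph:

```python
DAYS_2005 = 125                # crack time for the 4-million-zombie fleet
DOUBLING_YEARS = 2             # assumed: performance per box doubles every two years

doublings = (2013 - 2005) / DOUBLING_YEARS
days_2013 = DAYS_2005 / 2 ** doublings

print(doublings, days_2013)    # 4.0 7.8125
```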

If that sounds like it's too far in the future to worry about, then you must have grown up since, oh, about 1998.

Assembling the Pieces

Embedded systems range from simple microcontrollers to command-and-control networks, but they tend to run all day, every day, through a lifetime of years to decades. With few exceptions, they're designed so their users can forget the "embedded computer" part and get on with their jobs. Alas, keeping up with current security patches isn't usually part of that job.

The complexity of current systems ensures that they will be built with vast chunks of Other People's Code atop common hardware blocks. In practical terms, such code comes with no guarantees of reliability or security, regardless of whether it's from proprietary or Free Software sources.

Internet attackers are no longer amateurs. The current goal is control of your system for financial gain and, as always, money changes everything. High-value systems, those protecting resources that can be converted to money, will endure relentless automated attacks designed by folks at least as smart as you are, with more knowledge and better tools.

The computational resources available to a determined attacker with even a modest budget mean that bad crypto will be fatal. SHA-1 is extremely good, but if the system's security depends on "computationally infeasible" tasks, the time horizon must outlast the system: If you're not thinking SHA-256 right now, you're in trouble.
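This minimal sketch shows what "thinking SHA-256" might look like on the host side. It uses Python's standard `hmac` and `hashlib` modules to compute an HMAC-SHA-256 tag over the same kinds of inputs. HMAC is a swapped-in construction, not what the Dallas parts implement. The function names and the 32-byte key requirement are assumptions. The longer key matters as much as the longer digest: exhaustive search runs over the key, so a 64-bit secret remains a 2^64 job no matter which hash wraps it.

```python
import hashlib
import hmac
import os


def mac_sha256(secret: bytes, memory: bytes, serial: bytes, challenge: bytes) -> bytes:
    """HMAC-SHA-256 tag over the device serial, host challenge, and memory contents.

    Assumes serial and challenge are fixed-length, so plain concatenation
    is unambiguous.
    """
    if len(secret) < 32:
        # Illustrative policy: a 256-bit key, so exhaustive search costs ~2^256 trials
        raise ValueError("use at least a 256-bit secret")
    return hmac.new(secret, serial + challenge + memory, hashlib.sha256).digest()


def verify(secret: bytes, memory: bytes, serial: bytes, challenge: bytes, tag: bytes) -> bool:
    # Recompute and compare in constant time
    return hmac.compare_digest(mac_sha256(secret, memory, serial, challenge), tag)


key = os.urandom(32)
challenge = os.urandom(8)
tag = mac_sha256(key, b"calibration data", b"SN000123", challenge)
print(len(tag), verify(key, b"calibration data", b"SN000123", challenge, tag))   # 32 True
```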

Bottom line? Your gizmo's function depends on your code, but its security depends on everything. Any flaw in any of the common building blocks will expose your product, just like all the rest. Worse, any flaw detected in any other product can provide entry to your systems, particularly if the attacker can gain physical control of any gizmo using similar building blocks.

Monocultures exist in more places than just Wintel boxes. Somehow, we must give our embedded gizmos defense in depth, despite building them from large chunks of common code and hardware. Surely, it's time to start paying more attention to security than to fancier features?

Reentry Checklist

My capsule review of the progression from source to frameworks surely mangles a detail you consider vital. Much of computing's history can be found in Wikipedia starting at http://en.wikipedia.org/wiki/Main_Page.

More on stealthing Port 113 at http://www.grc.com/port_113.htm. Top 10 lists of scanned ports and scanning IPs from the Internet Storm Center at http://isc.sans.org/top10.php.

Read the Dallas Semiconductor white paper on SHA-1 attacks at http://www.maxim-ic.com/appnotes.cfm/apnote_number/3522. I cannot make their Cray numbers work out, but it doesn't matter.

See the original papers describing various attacks at http://www.cs.ut.ee/~helger/crypto/link/hash/mdx.php. A news writeup of the price of zombies is at http://www.usatoday.com/technews/computersecurity/2004-09-08-zombieprice_x.htm. The links and hints here will give you an idea of the scale of the problem: http://windowssecrets.com/comp/040923/. Symantec's observations on the current state of the Internet are at http://www.cert-in.org.in/training/21-22april05/internet%20threat.pdf.

DDJ

