Dr. Dobb's is part of the Informa Tech Division of Informa PLC


Recovered Memories


Dec03: Embedded Space

Ed is an EE, PE, and author in Poughkeepsie, New York. You can contact him at [email protected].


An idea has reached general acceptance when your in-laws seek your advice about it. Early this year, they replaced their ancient Aptiva and asked me to ensure a decent burial for its hard drive, saying that they did not want their data passed on with the machine.

I used the Aptiva's drive, a whopping 2.5 GB classic, as a Linux swap filesystem until I rearranged my boxes a few months later and gave the whole thing away. I zeroed the drive before and after using it, reinstalled Mandrake Linux and tucked the Windows 95 CD in the box.

As we saw last month, transient memory errors can grant full access to a rogue program, even in a system protected by full-up Java security. This time, let's take a look at what happens when memories don't go away quite as expected. Surprisingly, the problem extends beyond magnetic media into the heart of the hardware, but let's start out on familiar ground.

Magnetic Traces

The Eckert-Mauchly Computer Corporation designed a magnetic storage unit for UNIVAC (the first commercial computer from the first computer company), so magnetic memory dates back to 1951, very nearly to the beginning. That unit, designed for bulk temporary storage, held a million decimal digits and could read 10,000 digits per second, far faster and much more reliably than the punched cards used by contemporary business machines.

In those days, you could see the bits on the tape or disk or drum surface after you daubed on some visualizing liquid. Each bit corresponded to a magnetic polarity reversal that induced a current in the read-head's coil, which meant that one stored bit required two adjacent magnetic domains of opposite polarity.

A half-century of development reduced the size of the domains, introduced direct magnetic sensors, improved the data coding, and squashed the retail price to the point where drives sell for half a buck per gigabyte. Evolution ensured that nearly any mainstream disk drive from the last decade works in nearly any contemporary PC, which has some very interesting consequences.

Various pretenders to the nonvolatile memory throne have emerged during that same half-century, only to be vanquished by magnetism's unbeatable combination of cost, convenience, capacity, durability, and simple inertia. Rotating magnetic disks may not be the best place to store data while the power's off, but they're better than anything else we've come up with. As a result, magnetic disks tend to be taken for granted—they're part of essentially every desktop-style computer in general use.

The disk drive's programming interface has changed significantly over the decades, too. The earliest drives required detailed knowledge of the exact physical location on the disk, with details of the machinery written directly into the operating system code and timing relationships extending deep into user programs. Nowadays, we assume that the OS handles all the details and simply delivers the megabytes on demand.

We refer to data files by name, not by physical location, and let the OS (or, perhaps, the drive itself) decide where the bits should go. Although a background task may shuffle files around on the disk surface to reduce fragmentation, improve performance, or simply verify the disk's integrity, we no longer care. In fact, the file may migrate from one disk to another, then to offline storage as part of an archiving strategy, without our knowledge.

The OS achieves this abstraction by storing the file's name and attributes, including its location, in a directory on the disk. When we tell the OS to "delete" a file, it simply flips a few bits in the file's directory entry and abandons the magnetic domains that represent the bits on the disk. It may eventually write a new file's data into those locations, but until then, the old file's data can be read by programs that bypass the usual OS interfaces.
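A toy model makes the point concrete. The sketch below is not any real filesystem: the ToyDisk class, its 16-byte sectors, and its directory layout are all invented for illustration. The only thing it shares with the real thing is the behavior described above, namely that a delete touches the directory entry and nothing else.

```python
class ToyDisk:
    """Hypothetical in-memory disk: a flat byte array plus a name directory."""

    SECTOR = 16  # bytes per sector; an arbitrary size chosen for the demo

    def __init__(self, sectors=8):
        self.raw = bytearray(sectors * self.SECTOR)
        self.directory = {}   # name -> {"start", "length", "deleted"}
        self.next_free = 0    # next unused sector (simple bump allocation)

    def write(self, name, data):
        start = self.next_free * self.SECTOR
        if start + len(data) > len(self.raw):
            raise OSError("toy disk is full")
        self.raw[start:start + len(data)] = data
        self.directory[name] = {"start": start, "length": len(data), "deleted": False}
        # Round up to whole sectors, much as a real allocator would.
        self.next_free += -(-len(data) // self.SECTOR)

    def read(self, name):
        entry = self.directory.get(name)
        if entry is None or entry["deleted"]:
            raise FileNotFoundError(name)
        return bytes(self.raw[entry["start"]:entry["start"] + entry["length"]])

    def delete(self, name):
        # Only the directory entry changes; the data sectors are left alone.
        self.directory[name]["deleted"] = True


disk = ToyDisk()
disk.write("secret.txt", b"PIN=4321")
disk.delete("secret.txt")

# The normal interface now refuses to return the file...
try:
    disk.read("secret.txt")
    visible = True
except FileNotFoundError:
    visible = False

# ...but a program that looks at the raw sectors still finds the bytes.
still_on_disk = b"PIN=4321" in disk.raw
```

After the delete, read() fails while the raw byte array still contains the data, which is exactly the gap that recovery tools exploit.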

Decoupling the user interface from the actual hardware is one of the driving forces behind software design. That works well as long as the system is used in the way its designers intended, but peculiar things happen when those expectations aren't met.

Gone But Not Forgotten

Starting in late 2000, a pair of MIT grad students bought 158 used hard drives, plugged each one into a PC running FreeBSD, and attempted to block-copy each drive's contents into an image file. Of the 129 drives that worked correctly (caveat emptor!), 83 had mountable FAT file systems, 46 had damaged FAT file systems, and only 12 had been "sanitized" by writing zeros to all sectors.

Some drives had been reformatted using the DOS format command, some had documents deleted using either DOS or Windows commands, while still others became surplus with their files intact. Much of the data was recoverable, even from the drives without a mountable file system, because formatting, deleting, and even repartitioning don't actually erase the data written on the drive.

Now, had these been ordinary grad students, the story might have died there. Being Simson Garfinkel and Abhi Shelat, they wrote up their findings, published "Remembrance of Data Passed: A Study of Disk Sanitization Practices" in the January/February 2003 issue of IEEE Security and Privacy, and got far more press coverage than anyone expected. Word even reached my in-laws, hence the presence of that Aptiva under my electronics workbench.

Much of the coverage was evidently prompted by a pair of UNIX text filters they applied to those image files. One filter recognized valid credit-card number patterns (16 digits, good mod-10 check digit, good numeric range) and found potentially valid numbers on 42 drives. The second filter located large numbers of e-mail headers on 66 drives, presumably with the corresponding message bodies intact.
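For a feel of how little code such a filter needs, here is a minimal sketch in Python. It is not the authors' filter. The function names are mine, the 16-digit default length is an assumption, and the "numeric range" test (leading digit 3 through 6) is a crude stand-in for whatever range check the study actually used. The mod-10 test is the standard Luhn check.

```python
import re


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the mod-10 (Luhn) check."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9      # same as adding the two digits of the product
        total += d
    return total % 10 == 0


def find_card_candidates(blob: bytes, length: int = 16):
    """Return digit runs of exactly `length` digits that look like card numbers."""
    # Match runs of exactly `length` ASCII digits, not part of a longer run.
    pattern = re.compile(rb"(?<!\d)\d{" + str(length).encode() + rb"}(?!\d)")
    hits = []
    for match in pattern.finditer(blob):
        number = match.group().decode("ascii")
        # Assumed range check: plausible leading digits only.
        if number[0] in "3456" and luhn_ok(number):
            hits.append(number)
    return hits


# 4111111111111111 is a widely used test number that passes the check.
sample = b"junk\x004111111111111111\x00more 4111111111111112 junk"
candidates = find_card_candidates(sample)
```

Run against a raw disk image instead of the sample bytes, the same loop would turn up candidates whether or not the filesystem still admits the file exists.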

One reformatted drive held 3722 credit-card numbers in a deleted file that was easily recovered. Another held 9500 e-mail messages spanning three years. Others held dozens or hundreds of documents in a variety of standard Microsoft formats: Word, Outlook, PowerPoint, Write, Works, Excel.

What piqued my interest was yet another drive holding 2868 credit-card records. By rummaging around in the data, they discovered that the drive came from an Illinois ATM and held account numbers, balances, and transaction dates, plus, as a bonus, the ATM's software.

All the rest were desktop boxes, but an ATM is certainly an embedded system. Got your attention?

Embedded Drives

The overwhelming majority of embedded systems use 4- or 8-bit microcontrollers that generally operate from ROM without disk drives. Simple economic pressure, however, has driven more complex embedded systems into (perhaps ruggedized) PC-oid hardware that's largely compatible with that box beside your desk. Not only is the hardware plug-compatible, but the software generally bears a strong resemblance to desktop code as well.

All of the drives in the Garfinkel and Shelat study originally held FAT filesystems, the type commonly found in DOS and Windows desktop systems. The usual FAT allocation methods reflect desktop usage and, in fact, tend to not immediately reuse deleted sectors. Even when the filesystem structure has been damaged, the directory and file contents can be readily extracted using straightforward tools.

UNIX filesystems, despite their recalcitrant reputation, actually pose little obstacle to data recovery. Even if the file structure cannot be reconstituted, the data remains readable on the drive: If you're looking for only a few dozen characters, you can find them with no trouble at all.
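Finding a few dozen characters really is that easy. The sketch below, assuming only that the drive has already been copied into an image file, scans the raw bytes for a pattern and reports where it occurs, ignoring the filesystem entirely. The find_in_image name and the chunk size are my own choices; the demonstration builds a tiny stand-in "image" in a temporary file.

```python
import os
import tempfile


def find_in_image(path, needle: bytes, chunk_size: int = 1 << 20):
    """Yield the byte offset of every occurrence of needle in the file at path."""
    if not needle:
        raise ValueError("needle must not be empty")
    overlap = len(needle) - 1
    offset = 0      # file offset corresponding to the start of buf
    tail = b""      # bytes carried over from the previous chunk
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf = tail + chunk
            pos = buf.find(needle)
            while pos != -1:
                yield offset + pos
                pos = buf.find(needle, pos + 1)
            # Keep the last len(needle)-1 bytes so a match that straddles
            # two chunks is still seen, without reporting any match twice.
            tail = buf[-overlap:] if overlap else b""
            offset += len(buf) - len(tail)


# Build a small fake image: filler, an e-mail header, zeros, another header.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"A" * 10 + b"From: alice" + b"\x00" * 7 + b"From: bob")
    image_path = tmp.name

# A deliberately tiny chunk size forces matches to straddle chunk boundaries.
offsets = list(find_in_image(image_path, b"From: ", chunk_size=4))
os.remove(image_path)
```

Reading in chunks matters only because a real image runs to gigabytes; the search itself is a plain substring match.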

Recovering data from a drive with mechanical or electronic damage requires expert attention and equipment. Garfinkel and Shelat demonstrated that the vast majority of surplus drives work perfectly and require nothing more than an off-the-shelf PC (admittedly, one running UNIX) and some straightforward finger dancing on the command line.

It gets worse.

Patterned RAM

Some embedded systems with stringent power or mechanical vibration specifications cannot use magnetic disk drives. Not too many years ago, those systems would require an elaborate, custom-designed, nonvolatile memory subsystem. Nowadays, you simply plug a block of Flash memory into the same disk-drive connector that you'd use for a magnetic disk.

In fact, you can install either type of drive depending on how the system is used. Magnetic media might be appropriate for a development environment that needs more storage, while the smaller and more rugged Flash would be deployed in field units.

Remember that the same economic pressures that force PC compatibility on an embedded system dictate that the I/O gear will work with the standard PC BIOS and any stock operating system. We've been trained to reuse software, so what's the point of writing your own driver when you can avoid any software at all?

That Flash memory works just like a magnetic disk, at least as seen from the IDE drive connector. Pull it out of that surplus embedded system, plug it into your own PC, and you've got instant access to all its files.

But it gets still worse.

High-volume systems can afford the development cost of a custom interface, resulting in disk-like Flash memory buried on the system board. For example, a Blackberry might not be a classic embedded system, but it's close enough: a few megabytes of static RAM and four or eight times that much Flash memory.

Pop Quiz: Where does a Blackberry store its address book and e-mail messages?

It seems a used Blackberry passed through eBay to a happy purchaser for a mere $15.50. After popping in a new battery, he discovered that the previous owner, a former Morgan Stanley VP, hadn't deleted any of the exceedingly sensitive data stored in the device. The VP made the incorrect assumption that removing the battery would clear the memory.

Flash memory doesn't work that way. It'll hold its data until the epoxy package disintegrates, plus a few more years. The LCD may get flaky, but you'll still be able to extract the data.

Oops!

Sanitary Disposal

When a piece of equipment reaches its end of life, it also reaches the end of its mindshare. We lavish our attention on the new box, while the old one gets handed down, recycled, or simply discarded. Unfortunately, the nonvolatile memory in that box remembers its contents far better than we do.

Peter Gutmann's 1996 paper on "Secure Deletion of Data from Magnetic and Solid-State Memory" points out that, while government-class organizations may be able to read data directly from the raw platters using exotic equipment, simply overwriting it with zeros or suitably random numbers will suffice for most purposes. He also describes methods for extracting data from nominally volatile dynamic and static RAM chips, although that requires heroic effort.

Garfinkel and Shelat offer convincing evidence that most users either don't care or don't bother with data sanitization, even on desktop systems where it's straightforward. Worse, many users believe that the operating system actually erases their data when it deletes a file, which is demonstrably false. The authors conclude that we'll change our ways only after some high-profile "data recovery" cases.

Sanitizing embedded systems poses an entirely different set of problems, because deeply embedded systems tend to operate with little or no user intervention. Expecting the techs responsible for disposing of a system to remember a data deletion procedure that's probably documented in the back of a PDF-format manual on a disk somewhere isn't reasonable.

Garfinkel and Shelat recommend that disk drives encrypt their contents based on a key held elsewhere in the hardware; deleting that key ensures that the raw bits on the platters can't be used by anyone else. This may be one benign application for digital rights management, but how that will intersect with copyright law remains to be seen. Of course, if the whole system winds up in the surplus channel with the key intact, we're back to where we started.

Designers of embedded systems with nonvolatile memory have an even greater problem, as nobody will ever think to remove secrets held in hardware, and there doesn't seem to be any unambiguous way to determine that a device has reached end-of-life and can destroy its own contents.

Countermeasures

As nearly as I can tell, it's far more profitable to siphon a million credit-card numbers from an under-defended web site's database than to rummage through a few hundred discarded hard drives one at a time. A cache of a few thousand numbers could catapult someone into high-roller territory for a while, but the necessity of learning command-line hacking might deter most of the script kiddies.

There is no security through obscurity, because a whole-disk search will reveal the contents of all the sectors on the disk regardless of their original file attributes. Further, if the security of a system depends on both hardware and software, then plunking the drive in a "hostile" PC pretty much removes any security measures. In fact, you can be sure that the parts of a system will crop up in disparate locations, so each piece must either hold no sensitive data or be self-sanitizing.

Data encryption at the drive level will work, assuming that the drive doesn't show up in surplus with the original hardware and its decryption key. That seems like a completely unreasonable assumption, given that many companies dispose of end-of-life hardware in bulk to the highest bidder.

There may not be, in fact, any engineering countermeasures for this problem. Other than providing an obvious method that permanently and irrevocably clears the entire nonvolatile memory of an embedded device, the best we can hope for is educating organizations about how much trouble their recovered memories can cause.

Discarded any data lately?

Reentry Checklist

Read Garfinkel and Shelat's "Remembrance of Data Passed" article at http://www.computer.org/security/garfinkel.pdf. The story of the Blackberry with left-over data appears at http://www.wired.com/news/print/0,1294,60052,00.html. Gutmann's 1996 paper on secure data deletion is at http://www.usenix.org/publications/library/proceedings/sec96/full_papers/gutmann/.

When a gigabyte means 2^30 bytes, the correct abbreviation is "GiB," it's correctly spelled "gibibyte," and it's pronounced with g-as-in-gag and i-as-in-itch, just like it's spelled. Don't believe me? Check it out in the Embedded Systems Dictionary, by Jack Ganssle and Michael Barr (2003, CMP Books, ISBN 1-57820-120-9). I found just one typo in the review copy, so they've really done their homework!

Zeroing out a hard drive is as much security as most folks will need. Linux aficionados can use good old dd or anyone can boot DiskZapper from http://www.diskzapper.com/.
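If you'd rather see what such a zeroing pass amounts to, here is a minimal Python sketch. The zero_fill name and chunk size are mine, and the demonstration runs against a scratch file rather than a drive. Pointing it at a device node is an untested assumption on my part: it would need the right privileges, and it would be every bit as destructive as dd.

```python
import os
import tempfile


def zero_fill(path, chunk_size=1 << 20):
    """Overwrite every byte of an existing file or device node with zeros.

    Returns the number of bytes overwritten.
    """
    zeros = bytes(chunk_size)
    with open(path, "r+b") as f:
        size = f.seek(0, os.SEEK_END)   # current length in bytes
        f.seek(0)
        remaining = size
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())            # push the zeros out of the OS cache
    return size


# Demonstration on a scratch file standing in for the drive.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"secret data" * 1000)
    scratch = tmp.name

written = zero_fill(scratch, chunk_size=4096)
with open(scratch, "rb") as f:
    wiped = f.read()
os.remove(scratch)
```

Note that overwriting a single file through the filesystem is a weaker guarantee than zeroing the whole device, because the filesystem decides where the new bytes actually land.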

A net search for "data recovery services" solves the converse problem, perhaps even if you've, um, accidentally zeroed a drive (dd is a monster!). The prices make backing up everything to CDs or DVDs every few days look downright attractive.

If you're even faintly interested in the history of computing, read ENIAC: The Triumphs and Tragedies of the World's First Computer, by Scott McCartney (1999, Berkley, ISBN 0-425-17644-4). What we know as "the von Neumann architecture" actually started with Eckert and Mauchly, two largely forgotten pioneers who opened the frontier for the rest of us.

Following up on my June 2003 column, you should read all 248 pages of the Columbia Accident Investigation Board's report at http://www.caib.us/news/report/default.html to see what a real investigation looks like. Could your project withstand a similar inquiry?

DDJ

