Long-Time Memories

By Ed Nisley, March 01, 2005

Memory management is one of the big differences between high-level and embedded languages.

March, 2005: Long-Time Memories

Ed's an EE, PE, and author in Poughkeepsie, NY. Contact him at [email protected] with "Dr Dobbs" in the subject to avoid spam filters.

High-level languages use a fairly straightforward model of system memory: It's arbitrarily large, read-write, uniform, and fast. The hideous details of memory management generally reside in the operating-system kernel, which interfaces with whatever hardware features the CPU and its surrounding chipset might provide. Apart from applying the same number of allocate and free operations to a given block, programmers generally don't need to worry much about the exact type of memory they're using.

Embedded programmers, on the other hand, must know far more about those hideous details, if only because their programs must handle weird hardware. Seemingly simple memory chips sport command interfaces, write operations may be six orders of magnitude slower than reads, and the chips can return bad data even before they wear out from overuse. Get one detail wrong and your product will fail unpredictably. Oh, and that's before you even think of writing a hard real-time application.

Let's see how we got here, then examine the sort of Flash memory one might find in a typical gizmo these days. You'll see how we've been spoiled by newfangled semiconductor memory. Perhaps the Bad Old Days are back?

Earliest Memories

ENIAC, one of the earliest digital computers, stored 1 bit in a pair of vacuum tubes wired as a set-reset flip-flop, a configuration called a "bistable multivibrator" in those days. This being a gadget designed by engineers, every flipflop displayed its bit on a neon bulb. Think of it: You could read the entire memory just by looking at the front panel!

Early programmers found ENIAC's 20 "accumulators" somewhat confining and, in 1953, Burroughs spliced on 100 words of magnetic core memory. That still wasn't nearly big enough, it was far too slow, and its three-phase power supply seemed excessive. More research was certainly needed.

Even before ENIAC, Atanasoff concocted a rotating-drum memory based on 1600 discrete capacitors. Pictures show spine-like contacts sticking out of the drum, so this thing obviously had serious reliability issues.

Replacing those capacitors and contacts with a smooth magnetic surface and a row of read-write heads boosted drum capacity to a few megabits, but the access time remained painfully slow, even by mid-1940's standards. Programmers learned to use the latency of the drum's rotation for I/O timing and compute-bound sections of code.

EDSAC, built in England around 1949, used what I think are the neatest storage devices ever made: mercury delay lines. A piezoelectric transducer launched acoustic pulses through pipes of liquid mercury to a receiving transducer. Analog feedback amplifiers regenerated the pulses on the fly to form a no-moving-parts, serial-access memory holding about 2 kilobytes of evanescent data. The access time, a few tens of milliseconds, was comparable to drum memory and far too slow for the ever-increasing speed of the rest of the system. Sound familiar?

Developed at roughly the same time as the delay lines, the Williams tube stored up to 2 kilobits in charged spots on the face of a cathode-ray tube, with the key advantage of electronic-speed random access to the bits. A pair of bottles held 128 40-bit words in the Manchester Mark I computer, the first machine to store both instructions and data in random-access memory. Yes, both in 128 words!

Magnetic core memory came back from a shaky start to sweep away all contenders in the early '60s. It had three key advantages:

Fast access.
Random addressing.
Relatively low cost per bit, despite hand-threading all those teensy ferrite doughnuts.

Core was also nonvolatile, but most commercial systems didn't really take advantage of that fact, as programs and data were already far larger than any available memory.

Core remained the memory of choice for high-end systems until the late '70s, when integrated-circuit semiconductor memory finally became cheap and reliable enough. In fact, the Intel 1101 256-bit (!) static RAM chips were nothing more than ENIAC's bistable multivibrators: Half a dozen transistors scribed on a silicon chip replacing two hot triodes.

The neon indicators, alas, didn't fit.

Intel's 1702 2-kilobit EPROM, on the other hand, was almost unusably complex, requiring three power supplies (+5, +12, -12V), strobed -48V programming pulses on the data lines, and finicky UV erasing. But EPROM was nonvolatile and could actually form the basis of a recognizable embedded system, after microcontrollers got beyond the initial 4004 architecture.

In fact, small embedded systems through about the mid '90s typically featured a three-chip cluster: microcontroller, EPROM, and RAM. Smaller systems might omit the RAM, tiny microcontrollers might have on-chip ROM, but the overall plan was about the same. In all cases, memory was pretty simple, as long as you remembered that you could read and write RAM, but only read EPROM.

Then came Flash memory.

Flash Flavors

NOR Flash, introduced by Intel in 1988, closely resembles EPROM, at least for read access. Each memory location, holding 8 or 16 bits depending on the chip, can be directly addressed and read in tens of nanoseconds, roughly the speed of large RAM chips.

NAND Flash, introduced by Toshiba in 1989, uses a serial interface that closely resembles a disk drive, if not a mercury delay line. The interface accepts commands, address, and data bits multiplexed over a few pins. Reading any particular location takes tens of microseconds, but reading successive memory addresses happens in a few tens of nanoseconds.

Unlike EPROM and the later EEPROM, Flash memory can be erased in relatively small sections. The CPU can store new data, thus enabling in-system updates and the miracle of in-flight program patching.

The names NOR and NAND vaguely suggest the internal structure of the memory arrays, but should indicate general categories: NOR means "direct access" and NAND means "serial access." As you might expect, there are myriad variations on the theme.

To inject some real-world numbers in the discussion, I'll use the Samsung KAB0xD100M-TxGP multichip memory. It has a 16-bit datapath (a "word") with access to 4M-word NOR Flash, 8M-word NAND Flash, and 2M-word pseudostatic RAM. The "x" characters are placeholders for digits that indicate various options and speeds that aren't relevant here. While you can find bigger and faster versions of each component, this one datasheet has all the pieces: http://tinyurl.com/3wxmt.

Most of the datasheet supplies the details that the hardware folks use to interface the chip with whatever microcontroller will be running the code. A few key specs, however, make life difficult for the software folks who might otherwise regard the Flash as just RAM with a really slow write cycle.

That's a really great way to kill a chip, if not an entire project.

Unlike traditional RAM or EPROM chips, Flash memory chips have both nonuniform addressing and a command structure. The address space includes control registers, data, room for metadata like ECC bits, hidden blocks of secret stuff, configuration settings, and so forth and so on. An on-chip state machine controls access to the chip's innards, so you must ensure that your program's model of that state machine either matches what it's actually doing or can force it into a known state. The datasheet gives the details, but expect to spend some time experimenting.

NAND Flash

NAND Flash may be the easiest to understand, if only because its serial nature alerts you to something unusual. Each operation requires writing a command and address into the chip, then either writing or reading the appropriate data. The commands can reset the chip and read ID bytes and status, in addition to the expected data-read, -erase, and -write operations.

For example, reading a particular data word from the Samsung NOR Flash involves sending a Read1 command followed by three address bytes specifying the page containing the word, all of which takes at least 120ns. The chip then transfers the entire page to an internal latch in a leisurely 10s process, after which you can read all 256 words at a mere 50ns each and extract the particular word you wanted from that stream.

Obviously, NAND Flash is best used for applications that mimic a disk drive. Because it does not provide random access to words within a page, you cannot execute code from it or read widely scattered words with any alacrity.

After reading those 256 words, you must send another Read1 command with the next page address and endure another 10s startup delay. The net transfer rate is thus (10s + 256×50ns)/256 = 90ns per word. That's actually not too shabby, as long as you're consuming data in a stream rather than sip by sip. If you're playing back audio or displaying a picture, it's the right hammer for the job.

Unfortunately, it's not quite that simple, because NAND Flash doesn't have nearly the same reliability as, say, the SDRAM in your server. Each 256-word page has a corresponding 8-word Spare Area for the ECC bits required to correct what's charmingly called "bit flip" in the main data. The ECC algorithm is up to you, but you should have one!

The Spare Area is accessible in two ways. You can read 256 data words, then continue to read the additional eight words, or you can use a Read2 command to access just the Spare Area without slogging through the actual data.

Storing data works similarly, but you must erase an entire page before writing new words. Erasing a page requires up to 3ms (yes, 3000s!), writing data takes (264×5ns), and the actual program operation takes up to 500s, for about 3.6ms. You must compute and store the additional eight words in the Spare Area along with the main data, because there's no other way to write them.

In addition to bit flip, NAND Flash simply wears out with use. Permanent single-bit errors will occur after 1000 erase/ program cycles on any given block. Single-bit ECC pushes the inevitable failure out to 100,000 cycles when you can expect a second permanent bit failure in a page. A transient bit flip in a page with a permanent error will cause an uncorrectable error, so you must factor that probability into your choice of ECC algorithm.

Even with ECC, the NAND Flash chip may report that an erase or programming operation has failed, in which case your code must relocate that page's data somewhere else. NAND Flash really does behave just like a disk drive: It has bad sectors and requires a mapping directory of some sort.

At least it's smaller than a drum memory and less toxic than a mercury delay line.

NOR Flash

Unlike NAND Flash, NOR Flash has a straightforward address and data interface: Present an address, assert the Read control line, and out pops the corresponding data.

It also has a control interface that accepts commands as data values written to specific addresses in a particular order. For example, to erase the whole chip, you write 0xaa to 0x555, 0x55 to 0x2aa, 0x80 to 0x555, 0xaa to 0x555, 0x55 to 0x2aa, and 0x10 to 0x555. Got that?

Now, if your first reaction is to figure the probability that a series of ordinary memory writes would accidentally trigger a chip erase, you've fallen into a classic embedded systems trap. Remember that the type of chip we're discussing was once known as Flash ROM: Under normal circumstances, the chip never sees write operations because it's not RAM.

Pop Quiz: Compute the probability that such a sequence would arise at least once in a system writing to RAM. Hint: Your first guess is probably correct.

NOR Flash blocks must also be erased before being programmed, which the datasheet says takes 700ms (yes, 700,000s) "typical." A block has 32K words, so the average value is 20s per word. Programming a single word with data requires on the order of 10s, making the overall bandwidth pretty dismal. The chip's data output contains various status flags during the programming operation, so your code can simply poll the chip to figure out when it's finished.

See the gotcha? If the memory produces status outputs instead of data, the code that should poll the outputs will crash when it tries to execute status bits fetched as instructions from different addresses in that same chip.

It turns out that early Flash chips had exactly that problem. Systems required either two Flash chips (one to run and one to program), executable RAM, the ability to run from the CPU's instruction cache, or some combination of tricks.

The Samsung part's NOR Flash is divided into two sections and can program or erase one while performing normal read operations from the other. A relatively small section, called the "Boot Block," generally holds the system's startup code, as well as the utility code required to reprogram the other, much larger, section with a new version of the main firmware.

The Samsung module also has a 2M-word RAM chip that can be used for instructions, but that's a feature of this particular multichip part rather than Flash in general.

Unlike NAND Flash, NOR flash is rated for over 100,000 updates to a single block before it fails, so ECC isn't absolutely required and there is no dedicated Spare Area for those bits. You'd be well advised, however, to compute an overall checksum for your data, as it's entirely possible for errors to creep in unannounced.

There is, however, a 32K-word Security Code area hidden in the Boot Block that can be programmed only once, then accessed only by a command sequence. That's where you put the unique Product ID codes that lock software to your gizmo, at least if you also believe in Santa Claus.

Reentry Checklist

I'll pick up the thread of Flash file systems later on. In the meantime, remember that any particular Flash memory chip will have more peculiarities than I've mentioned here. Expect the unexpected!

The History of Computing project is at http://www.thocp.net/index.htm. A 1946 paper on ENIAC at http://pages.cpsc.ucalgary.ca/~williams/509pdffiles/ ENIAC.pdf gives an overview of how it all worked. See a decent picture of the replica Atanasoff-Berry Computer at http://perun.hscs.wmin.ac.uk/JHPC00/; the original ABC drum at http://www.csm.ornl.gov/ssi-expo/abcDrum.gif; and a Williams tube at http://www.cedmagic.com/history/williams-tube.html.

A history of nonmechanical storage, with some nice photos of core memory planes, is at http://www.allaboutcircuits.com/vol_4/chpt_15/4.html and an analysis of core memory development is at http://theory.lcs.mit.edu/classes/6.972/ Core%20Report.html. A bit of core versus IC history is at http://www.eetimes.com/special/special_issues/millennium/ milestones/whittier.html.

The Story of Mel shows how a Real Programmer puts a bizarre instruction set and drum memory to good use: http://www.jargon.net/jargonfile/t/TheStoryofMel.html.

The datasheet for Samsung's multichip NOR- and NAND-Flash plus RAM module is at http://tinyurl.com/3wxmt/, which encodes a nasty jawbreaker of a Samsung URL. If that doesn't work, start at http://www .samsung.com/, search for KAB01D100M, and pick the result with "NOR-based" in the summary.

DDJ

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

Long-Time Memories

Earliest Memories

Flash Flavors

NAND Flash

NOR Flash

Reentry Checklist

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

Long-Time Memories

Earliest Memories

Flash Flavors

NAND Flash

NOR Flash

Reentry Checklist

Related Reading

News

Commentary

Slideshow

Video

Most Popular

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content