FLASH FILE SYSTEMS
A new approach to storing information
This article contains the following executables: FLASH.ARC
Drew Gislason
Drew is vice president of research and development at Datalight and can be contacted at 307N. Olympic Ave., Suite 201, Arlington, WA 98223 or via CompuServe at 70671,3272.
Flash memory promises radical changes in the laptop, palmtop, and embedded-systems markets. Because Flash memory requires little power to write information into its cells--and none at all to retain it--long-life, battery-operated systems with mass storage are possible.
Flash memory, typically packaged as cards (see Figure 1) as defined by the Personal Computer Memory Card International Association (PCMCIA), plug into slots in the computer. These cards store from 1 to 8 megabytes of data, with 64-Mbyte capabilities on the horizon. One of Flash cards' big advantages is that they alleviate power-consumption problems inherent in power-hungry hard and floppy disks, as well as static and dynamic RAM.
Still, Flash memory requires an extremely long time to erase--up to 30 seconds per 128-Kbyte memory chip, and a 1-Mbyte card uses eight chips! Furthermore, erasure can occur only in large discreet-sized blocks. These limitations have given rise to the Flash file system, a new method of storing information.
Two approaches for storing file information in Flash memory on Flash cards have emerged. One method, akin to the familiar RAM disks (of RAMDISK.SYS and VDISK.SYS fame) simply divides the Flash memory into equal-sized chunks called sectors. This article provides full source to just such a FAT-style, sector-oriented Flash file system. The other approach is more byte-conscious. It records file information in a byte-wise, sequential fashion in the Flash memory, thus avoiding the "dead-space" problems associated with FAT file systems and allowing files to be erased and/or modified. But it presents challenges in terms of proprietary schemes and garbage collection.
Regardless of which method works best for your application or system, DOS still manages the higher-level file I/O functions. In the places where DOS file I/O falls short in dealing with Flash memory, a Flash file system fills the gap.
While this article supplies source to a FAT-style Flash system, it isn't a complete solution unto itself like those provided by DOS vendors. Microsoft, for instance, offers a byte-oriented Flash file system for MS-DOS 5.0 embedded-systems developers and Datalight (the company I work for), developer of a DOS 5.0-compatible operating system (a work-alike, not just MS-DOS redistributed), also delivers a complete Flash file system.
The FAT-style Flash File System
DOS thinks of a disk as a large array of discrete-sized chunks called sectors. It organizes these sectors on a disk into various areas, one of which is called the file allocation table (FAT). The sectors are very similar to Flash memory blocks. Sectors are 512 bytes in length. Flash memory blocks range from 2-128K in size, depending on manufacturer. Hence, the natural choice for a Flash file system is the FAT, or sector-oriented method.
This sectored approach is what makes the FAT-style Flash file system so simple. A sector is really just a portion of the Flash memory. A 1-Mbyte Flash card may contain eight 128K Flash memory chips, each of which is divided into 256 sectors of 512 bytes, for a total of 2048 sectors. The sector size of 512 bytes is purely convention. Sector sizes of 128 bytes or even 1024 bytes are possible, but floppies and hard disks both use 512 bytes.
A FAT-style Flash file system is really just a disk device driver for DOS that understands Flash memory. DOS device drivers are well documented in books such as Ray Duncan's Advanced MS-DOS Programming (Microsoft Press, 1988).
Additionally, a FAT-style Flash file system can be hooked into the BIOS via interrupt 13H. Among other things, the BIOS reads and writes floppy-and hard-disk sectors. DOS reads, writes, and formats (erases) those small chunks of data (sectors) using BIOS interrupt 13H. By hooking into interrupt 13H, a FAT-style Flash file system can fool any utility that deals directly with disks (Norton Utilities, for example) into thinking the Flash disk is nothing other than a very fast, odd-sized floppy.
Another advantage of the FAT-style Flash file system is that even non-Flash file systems from any vendor can read a Flash disk. A Flash disk, written in the FAT-style method, contains the same layout as ROM and SRAM disks. PCMCIA Socket Services (that is, the PCMCIA BIOS) are designed to read Flash, ROM, and RAM cards. Writing to Flash memory is another matter, hence the need for a Flash file system.
A problem arises when a sector must be overwritten with new data. The FAT, for instance, is periodically updated as files are added and deleted. FAT entries containing 0s are set to some value representing the location of the file on disk. For a floppy or hard disk, overwriting a sector with new data is not an issue; it simply works. It's impossible to overwrite data in Flash memory.
Flash memory (and ROM, for that matter) consists of binary 1s. Any byte written to the Flash memory changes some of the bits to 0s. Bits that are 1s are ignored. In Flash memory (as in ROMs), once a bit has been changed to 0, it's 0 until the whole chip has been flashed (erased).
Unfortunately, most DOS structures are set to 0 upon initialization (FORMAT), including both the FAT and root directory. This would make a Flash disk completely unusable! The Flash disk device driver presented here overcomes this limitation by inverting each and every bit: All 1s becomes 0s, all 0s become 1s.
Besides the bit problem, other anomalies surface when writing to Flash memory in a FAT-style, sector-oriented fashion. Files cannot be deleted, as the directory entry and FAT entries occupied by the file cannot be changed. Neither can files be updated or appended to. Anything written to Flash memory is there until the entire Flash chip is erased.
DOS commands such as COPY, DIR, and TYPE work as expected, as does the create/write/close sequence in many applications. But ERASE, RENAME, and "ECHO TEXT >>FILENAME" fail. The FAT-style flash disk effectively becomes a write-once/read-many (WORM) drive. When the drive is full, there's no choice but to stop writing or erase the whole thing.
In general, sector-oriented schemes share the disadvantage of sector dead space that occurs when the file length doesn't exactly match the length of one or more sectors. For instance, if a file is exactly 1036 bytes long, it will be stored in three 512-byte sectors. The first two are filled with data, and the final sector contains only 12 bytes of data with the remaining 500 bytes unused. This problem grows to terrible proportions when a file contains just a single byte of data. In this last case, 99.8 percent of the sector is unused.
The Device Driver
The FAT Flash disk device driver FLASHDEV.ASM (see Listing One, page 94) isn't as robust as commercial Flash file systems. I haven't, for example, provided a facility to erase a Flash disk. The FAT-style Flash disk device driver assumes the Flash memory is either erased or contains a valid Flash disk. Listing One presents the bare-bones version of FLASHDEV.ASM. A fully commented version along with the source code for the Flash memory I/O (which reads and programs Intel Flash memory) is available electronically; see "Availability," page 7.
The write routines blindly assume Intel parts. While it's possible to query Flash chips to determine both manufacturer and type, this device driver takes the dumb approach: It's hardcoded to Intel parts. An intelligent algorithm would detect the manufacturer and write or flash (erase) accordingly.
The first half of the FLASHDEV.ASM comprises the actual device-driver interaction with DOS. This code is generic to any disk driver. A function table diverts the "request" from DOS to the code that must fulfill that request. Simple, generic entry and exit routines handle all communiques from DOS.
Each device-driver function--meminit, memchanged, memread, and memwrite--calls one of the real "meat" routines to do the actual work, and all must intimately understand Flash memory. They are prototyped in the comments of the device driver.
The meminit routine formats the Flash memory if it hasn't already been formatted. The formatting process will produce a disk the FAT-style Flash file system can use. The memread and memwrite routines read and write Flash memory. The memchanged routine currently does nothing; it could be expanded to detect whether a Flash card has been removed and proceed accordingly.
The Byte-oriented Flash File System
The byte-oriented Flash file system takes a different approach to storing file information. This method has been refined by a number of vendors, including Microsoft and Datalight.
The byte-oriented method does not store file information in discreet-sized sectors, but in a byte-wise, sequential fashion. Typically, a structure surrounds the actual data, providing the byte-oriented Flash file system with the information necessary to keep track of what data belongs to which file or directory.
At Datalight, for instance, we implement the Flash file system with a linked list of records. Each record contains a header of approximately 12 bytes (describing the data's owner) and a variable-length data field. The overhead of this header is far less than the typical dead space associated with FAT-style Flash file systems.
Take, for example, the 1-byte file described earlier: A 1-byte file in a FAT-style, sector-oriented Flash file system requires one full sector, or 512 bytes. That means that 511 bytes of overhead exist for that one piece of data. Contrast this to 12 bytes of overhead, and you see why the byte-oriented Flash file system saves space. The most compelling reason to use a byte-oriented Flash file system, however, is its ability to append, overwrite, and delete files in Flash memory. A FAT-style Flash file system can't do this.
These advantages don't come without a price. No standards yet exist that define the data layout in physical Flash memory for a byte-oriented Flash disk. That means you're tied to whatever software vendor you choose for the Flash file system. It also means that Flash cards produced on your system can't be shared with systems that use a different Flash file system. This disadvantage, more than any other, makes the byte-oriented Flash file system of limited use.
Other unfortunate things occur when files are deleted, renamed, or changed. As mentioned earlier, Flash memory has the limitation that once data is written to it, the data can't change without flashing (erasing) the whole chip. Or more correctly, bits that contain a 0 cannot change. This has led makers of byte-oriented Flash file systems to simply mark off a section of Flash memory that must change as "no longer used." The data is then changed and written to unused Flash memory elsewhere. From the point of view of DOS or an application program, the data was simply modified and happens invisibly to DOS.
The marked-off Flash memory is now dead space that remains unused until the entire Flash chip is erased. A Flash disk can quickly fill with dead space if files are continuously rewritten or the data in them is frequently modified.
The only way to clear this dead space out of a byte-oriented Flash disk is with garbage collection. Typically, this involves reading much of the data into RAM, erasing various Flash chips, and writing data back out. When the procedure is through, the beginning of the Flash disk contains valid, useful data and the end of the disk is erased, empty, ready-to-be-used Flash memory. Unfortunately, since Flash memory takes a long time to erase, this garbage collection can take minutes on a large Flash disk.
An issue I haven't touched on is how DOS talks to a byte-oriented Flash file system. While FAT-style systems use standard DOS disk device drivers, byte-oriented systems can't. Typically, a byte-oriented Flash file system uses a little-known mechanism in DOS called the "installable file system" (IFS). DOS has two file systems built into it: the FAT file system (FFS) and IFS. These systems talk to a disk in very different ways.
The FAT file system is the most familiar. DOS on a desktop uses it and its built-in device drivers to talk to the floppy and hard disks. All access to the disk is performed via sectors that are buffered by DOS (the BUFFERS= command in CONFIG.SYS).
DOS talks to the IFS in a different way. The IFS answers DOS with high-level, file- and directory-oriented responses. In this way, the physical layout of the data has no constraints. DOS may ask the IFS to open a file or delete a directory, but DOS won't ask anything about the layout of that data.
With this lack of constraints, the IFS allows a byte-oriented Flash file system to store data in any manner it deems appropriate while DOS remains blissfully ignorant of the details. For a good discussion of IFS, see Andrew Schulman's book, Undocumented DOS, second edition (Addison-Wesley, 1993).
Conclusion
Some vendors have created hybrids of the FAT-style and the byte-oriented Flash file systems to produce a Flash disk that looks and feels to applications to be a fast, small floppy disk. Other vendors hook BIOS interrupt 13H to cause such bit-twiddling programs as Norton Utilities to think it is dealing with a standard, sector-oriented, FAT-style disk. Be aware, however, that unless true sectors are written to the Flash memory in a standard FAT-style configuration, the data layout in the Flash memory itself will be proprietary, confining Flash cards written by the system to one vendor's software.
No matter which Flash file system suits your needs, it's important to weigh the advantages and disadvantages of each--and carefully consider how it will be used.
_FLASH FILE SYSTEMS_ by Drew Gislason[LISTING ONE]
<a name="012d_0009"> ;---------------------------------------------------------- ; FLASHDEV.ASM--02/16/93--Drew Gislason ; This FAT-Flash disk driver is designed as a Write Once Read Many (WORM) ; FAT-style disk driver. Normal DOS commands such as COPY, DIR, and TYPE will ; work as expected. DOS commands like RENAME, ERASE and "ECHO text >>file" ; will fail. Part 1 is generic to all disk devices. It could be used as a ; template for any FAT-style disk device driver. Part 2 (available ; electronically) actually reads and programs Flash memory. It is written ; for Intel Flash memory, but can be ported to other Flash memory mechanism. ;---------------------------------------------------------- ;---------------------------------------------------------- ; PART 1: GENERIC DISK DRIVER CODE -- Driver relies on a number of routines ; to read and write Flash memory. All of routines could be rewritten to ; accomodate ANY kind of Flash memory, or even ROM, RAM or an actual disk ; for that matter. Routines that MUST be supplied to make this generic driver ; into a working Flash Disk Driver are defined (in C syntax) as follows: ; bool meminit(char far *cmdline, unsigned *endseg); ; int memchanged(void); ; bool memread(long offset, unsigned len, char far buffer); ; bool memwrite(long offset, unsigned len, char far * buffer); ; Note that all the routines follow C conventions: that ; is, the last item "pushed" on the stack is the 1st ; argument, and each routine MUST preserve SI,DI,BP and ; all segment registers besides ES. ;---------------------------------------------------------- ; Device Requests -- When DOS calls the device driver, the "request" ; (pointed to by ES:BX) may contain some or all of the following fields: REQ_CMDLEN equ 00H ; length of command REQ_UNIT equ 01H ; unit # for disks REQ_CMD equ 02H ; the command itself REQ_STATUS equ 03H ; status word REQ_MEDIA equ 0DH ; media byte (disks) REQ_TRANS equ 0EH ; transfer address REQ_COUNT equ 12H ; # sectors for rd/wr REQ_START equ 14H ; start sector for rd/wr REQ_DRIVE equ 16H ; drive # (0=A, 1=B, etc) ; Device Error Codes -- If the device must return an error, the following ; list defines the possible error conditions: ERR_WRITEPROT equ 00H ERR_BADUNIT equ 01H ERR_NOTREADY equ 02H ERR_BADCMD equ 03H ERR_CRC equ 04H ERR_BADSTRUCT equ 05H ERR_SEEK equ 06H ERR_BADMEDIA equ 07H ERR_SECTORNOTFOUND equ 08H ERR_NOPAPER equ 09H ERR_WRITEFAULT equ 0AH ERR_READFAULT equ 0BH ERR_GENERALFAIL equ 0CH ; C Function Arguments -- In small CODE models (that is, where CS never ; changes), the 1st argument begins at 4 bytes above the Base Pointer. ; +--------+ ; | Return | ; +--------+ ; | BP | ; +--------+ ; P equ 4 ; Segment Ordering -- SEGMENT ORDERING is defined here for convenience. ; The linker gathers segments in the order it first encounters them. By ; defining the segment ordering immediately, segments will automatically ; appear in correct order as they do here. The segment alignment and combine ; type are specified here to simplify segment directives. _TEXT segment byte public 'CODE' ; Code assume CS:_TEXT _TEXT ends _DATA segment word public 'DATA' ; Initialized data _DATA ends _BSS segment word public 'BSS' ; Uninitialized data DGROUP group _DATA, _BSS assume DS:DGROUP _BSS ends _ENDDEV segment para public 'ENDDEV' ; End of device driver _ENDDEV ends ; Device Header -- Every DOS device driver, whether for Character or ; Disk devices, MUST begin with a device header. _TEXT segment public flash_header flash_header: dw -1, -1 ; Pointer to next header, set by DOS dw 0000H ; Disk device, not removable dw strategy ; Strategy routine dw service ; Service routine maxdrv db 1,0 ; For future expansion ; DOS makes "requests" to device driver to perform following actions... cmdtbl: dw flash_init ; 0 - initialize device dw flash_media ; 1 - detect media change dw flash_bpb ; 2 - build a bpb dw dev_cmd_err ; 3 - ioctl input (not implemented) dw flash_read ; 4 - read from device dw dev_cmd_err ; 5 - non-destructive read (unused) dw dev_cmd_err ; 6 - input status (not implemented) dw dev_cmd_err ; 7 - input flush (not implemented) dw flash_write ; 8 - write to device dw flash_write ; 9 - write to device w/ verify cmdend: _TEXT ends ; Device Strategy and Service Routines -- DOS makes two calls to a device ; driver for each request, one to the "strategy" routine and one to ; the service routine. The first call, to the device driver's "strategy" ; routine, was intended for multi-tasking that was never implemented in DOS ; and so, can be ignored. The second call, to the device driver's "service" ; routine, does the real work. _BSS segment public _dos_req_ptr _dos_req_ptr DD ? stack_bottom DW 512 dup (?) ; 1K stack local_stack DW ? org_stack DD ? _BSS ends _TEXT segment public strategy strategy: retf public service service: ; save all registers we might change pushf push AX push BX push CX push DX push SI push DI push DS push ES ; save current stack mov DX, DGROUP mov DS, DX mov WORD PTR DGROUP:org_stack, SP mov WORD PTR DGROUP:org_stack+2, SS ; switch to our stack mov AX, OFFSET DGROUP:local_stack cli mov SS, DX mov SP, AX sti ; ES:BX points to "request" from DOS, save it mov word ptr [_dos_req_ptr], BX mov word ptr [_dos_req_ptr+2], ES ; load some basics that ALL commands use into registers mov CX, word ptr ES:[BX+REQ_COUNT] ; CX = count mov DX, word ptr ES:[BX+REQ_START] ; DX = start sector mov AL, byte ptr ES:[BX+REQ_CMD] ; command code cmp AL, (OFFSET cmdend - OFFSET cmdtbl)/2 jae dev_cmd_err ; bad command cbw shl AX, 1 ; get index into cmd table mov SI, offset cmdtbl add SI, AX les DI, dword ptr ES:[BX+REQ_TRANS] ; ES:DI = transfer buffer ; we've set up the following: ; DS = DGROUP ; CX = count ; DX = start sector ; ES:DI = transfer area ; do the command jmp word ptr CS:[SI] ; Unknown command error public dev_cmd_err dev_cmd_err: mov AL, 3 ; Mark error status, error code is in AL public dev_err_exit dev_err_exit: mov AH, 81H jmp SHORT err1 ; Normal exits (no error) occur through here public dev_exit dev_exit: mov AH, 01H ; done bit set ; save status (error or not) in "request" header from DOS err1: les BX, [_dos_req_ptr] mov WORD PTR ES:[BX+REQ_STATUS], AX ; restore original stack mov AX, WORD PTR DGROUP:org_stack mov DX, WORD PTR DGROUP:org_stack+2 cli mov SS, DX mov SP, AX sti ; restore all registers we may have changed pop ES pop DS pop DI pop SI pop DX pop CX pop BX pop AX popf retf _TEXT ends ; Device Data Area _BSS segment ; if an error occurs, what is it? _memerr db ? PUBLIC start_bpb start_bpb LABEL BYTE sec_size DW ? ; bytes per sector sec_per_clus DB ? ; sectors/cluster res_sec DW ? ; reserved sectors num_fat DB ? ; number of fats (2 for a floppy) dir_ent DW ? ; number of directory entries num_sec DW ? ; total number of sectors media_desc DB ? ; media descriptor sec_per_fat DW ? ; sectors per fat sec_per_trak DW ? ; sectors/track heads DW ? ; number of heads spec_res DW ? ; reserved for disk partitioning end_bpb LABEL BYTE _BSS ends _DATA segment PUBLIC bpbptr bpbptr DW offset DGROUP:start_bpb _DATA ends ; Initialize Device (COMMAND=0) -- "request" defines following fields: ; On Entry ; | 13-Byte header | * set by DOS ; | BYTE Number of units | (undefined) ; | DWORD End Address | (undefined) ; | DWORD Cmdline Pointer | * ptr to "DEVICE=" command line ; | BYTE Block Device # | * this disk is... (0=A, 1=B, etc) ; ; device= \dev\name.sys /1 ; ^ ; Cmdline points here, string ends in a CR (0DH) or LF (0AH) ; On Exit ; 00H | 13-Byte header | (set by DOS on entry) ; 0DH | BYTE Number of units | * set by this driver ; 0EH | DWORD End Address | * set by this driver ; 12H | DWORD Ptr to BPB array | * set by this driver ; 16H | BYTE Block Device # | (set by DOS on entry) ; _DATA segment endseg dw SEG _ENDDEV _DATA ends _TEXT segment public flash_init flash_init: ; initialze the Flash memory hardware ; meminit(char far * cmdline, unsigned *endseg) mov AX, offset DGROUP:endseg push AX les BX, [_dos_req_ptr] push WORD PTR [BX+REQ_COUNT+2] push WORD PTR [BX+REQ_COUNT] call _meminit add SP, 6 or AX, AX jz init_err ; read the BPB from Flash memory into device driver BPB call readbpb or AX, AX jz init_err ; set up the i/o packet les BX, _dos_req_ptr mov BYTE PTR ES:[BX+REQ_MEDIA], 1 ; 1 unit (drive) mov WORD PTR ES:[BX+REQ_COUNT+2], DS ; point to BPB list mov WORD PTR ES:[BX+REQ_COUNT], OFFSET DGROUP:bpbptr mov AX, word ptr endseg mov WORD PTR ES:[BX+REQ_TRANS], 0 mov WORD PTR ES:[BX+REQ_TRANS+2], AX jmp dev_exit ; error occured during init, fail init_err: mov AL, ERR_GENERALFAIL jmp dev_err_exit ; Detect Media Change (COMMAND 1) -- Has flash memory been changed somehow? ; (A PCMCIA card might be pulled out and swapped for another). ; int memchanged(void) ; Returns -1 if Device HAS changed ; Returns 0 if Device MAY have changed ; Returns 1 if Device has NOT changed public flash_media flash_media: call _memchanged les BX, [_dos_req_ptr] mov byte ptr [BX+REQ_TRANS], AL jmp dev_exit ; Build a BPB (COMMAND 2) -- DOS makes this "request" when it believes disk ; has or may have been changed. This will detect public flash_bpb flash_bpb: ; read the BPB from Flash memory into device driver BPB call readbpb or AX, AX jnz gotbpb ; we failed to read BPB mov AL, _memerr jmp dev_err_exit ; return BPB ptr in request header gotbpb: les BX, [_dos_req_ptr] mov WORD PTR ES:[BX+REQ_COUNT], offset DGROUP:start_bpb mov WORD PTR ES:[BX+REQ_COUNT+2], DS jmp dev_exit readbpb: ; read the BPB from the disk into our structure ; memread(long offset, unsigned len, char far * buffer) push DS push bpbptr mov AX, (OFFSET end_bpb - OFFSET start_bpb) push AX xor DX, DX mov AX, 11 ; offset 11L (start of BPB on disk) push DX push AX call _memread add SP, 10 ret ; Read from Device (COMMAND 4) -- Read from Flash memory. DOS "requests" that ; this device driver read 1 or more sectors from Flash memory. Device driver, ; in turn, calls "memread" to do the real work. ; Upon Entry ; DS = ROM-DOS data segment ; ES:DI = buffer data ; CX = # of sectors to read ; DX = starting sector public flash_read flash_read: mov SI, OFFSET _memread ; This generic read/write routine is passed a ; pointer to the read or write routine in "SI" rdwr: ; convert # of sectors to # of bytes push DX mov BX, OFFSET DGROUP:start_bpb mov AX, [BX] ; sector size (512) mul CX mov CX, AX ; CX = bytes to transfer pop DX ; calculate starting offset mov AX, [BX] ; sector size (512) mul DX ; DX:AX = offset (in bytes) ; call memread(offset,len,buffer) to do the work push ES push DI push CX push DX push AX call SI ; call read or write routine add SP, 10 or AX, AX jz rdwr_failed mov AL, 0 jmp dev_exit rdwr_failed: mov AL, byte ptr _memerr jmp dev_err_exit ;------------------------------------------------ ; Write to Device (COMMAND 8 or 9) ; ; ; Write to Flash memory. DOS "requests" that ; this device driver write 1 or more sectors to ; the Flash memory. The device driver, in turn, ; calls "memwrite" to do the real work. ; ; Upon Entry ; DS = ROM-DOS data segment ; ES:DI = buffer data ; CX = # of sectors to read ; DX = starting sector ; public flash_write flash_write: mov SI, OFFSET _memwrite jmp short rdwr _TEXT ENDS
Copyright © 1993, Dr. Dobb's Journal