Pete works for a small consulting firm as a programmer/analyst writing client-server software in Windows 3, OS/2, DOS, and Windows NT. He is working on a book, tentatively titled The Hitchhiker's Guide to Win32 Programming, to be published by Addison-Wesley. Pete can be contacted on CompuServe at 71644,3570.
Introduction
In my introductions to this column for the past few months, I've been desperately seeking someone who would provide us with the file format used by the Windows help system. But even though I knew that .HLP files were important, and that many Windows developers would love to get their hands on this information, I was surprised by the overwhelming response to this request. Thousands of DDJ readers wrote in, explaining they had already "cracked" a bit of the .HLP file format, and that they were anxious to share information with others who might have figured out other parts. Well, actually, it was only about half a dozen readers who had figured out parts of the .HLP format, but that's still pretty good for something like this.
Why is the .HLP file format important? For starters, there are a number of WinHelp-related tools available (RoboHelp and DocToHelp spring to mind). However, all these tools produce Rich Text Format (RTF) files, which must then be fed into Microsoft's extremely slow help compiler to produce a .HLP file. Clearly, it would be nice if someone other than Microsoft knew how .HLP files work, so that we might get better and faster help compilers than the one from Microsoft, just as companies such as SLR produce better and faster linkers than Microsoft's. Knowing the .HLP file format would also make it possible to write help decompilers, and to produce a WinHelp viewer that runs under DOS, without requiring Windows.
It's important to realize that we're not just talking about "help" in the narrow F1 sense. As Ron Burk noted in his article on undocumented WinHelp (DDJ, June 1993), Microsoft has a broader view of WinHelp. The .MVB files used by Microsoft's Multimedia Viewer are really just extended .HLP files. Microsoft in fact is using this extended WinHelp system as the core for a new type of application: Both the Microsoft Developer Network (MSDN) and Cinemania CD-ROM products are essentially huge .MVB files. Yes, huge--MSDNCD.MVB (which we'll examine later on) is 270 megabytes, and CINMANIA.MVB is 139 Mbytes! Using the WinHelp file format, you can see how these applications (both of which are excellent, by the way) are put together.
As I said, a surprising number of you wrote in with information about WinHelp. Many thanks to Brian Walker, Carl Burke, and Lou Grinzo in particular for their contributions! Pete Davis (working with Ron Burk, editor of Windows/DOS Developer's Journal did a great amount of reverse engineering, put together sample code, and wrote this month's article.
The central insight in Pete's article is that .HLP and .MVB files are organized around a B-tree structure he calls WHIFS, the WinHelp Internal File System. In somewhat the same way that a Stacker or DoubleSpace "volume" looks like a single file to DOS, but actually holds all of your files, likewise a single .HLP file is just a collection of WHIFS files. Interestingly, Microsoft documents this in a roundabout way, when the Windows SDK mentions the possibility of "baggage" files and of getting at them via some WinHelp macros.
Pete and Ron figured out that everything in the .HLP file--titles, keywords, phrases, the help text itself--is treated in exactly the same way as third-party baggage. All the help data in a .HLP file is kept in various internal WHIFS files which have names such as |SYSTEM, |Phrases, and |TOPIC. It's the initial pipe character that distinguishes these built-in files from any baggage. The same format is used for .HLP files under Win32, on both Intel and MIPS machines (though there doesn't seem to be a |Phrases file).
This is a big topic, and there will barely be room to squeeze everything into even a two-part article. This month, Pete explains the basics of the WHIFS B-tree system, and explains a few of the internal files, such as |SYSTEM which, for example, contains any WinHelp macros. Next month, Pete enumerates the rest of the internal files, including |TOPIC, which contains the actual help text. Because the text is usually compressed, the article will show how to decompress it. Part Two will also describe a full-blown HelpDump program.
After the conclusion of Pete's .HLP article next month, it looks like the next topic will be Novell's "proprietary" NetWare Core Protocol (NCP). See? We don't just pick on Microsoft in this column. Any company that has an important software interface that is kept under wraps is fair game here. Send your suggestions to me at 76320,302 on CompuServe (that's [email protected] from the Internet).
Who needs to know about the Windows .HLP file format? After all, Microsoft provides a compiler and third-party vendors supply WinHelp development tools. What more could you need?
Well, there are a lot of things that could be done with solid information on .HLP files. Vendors could create development environments with help compilation. New WinHelp viewers could be written with extensions. Using the file system built into WinHelp, you could write a viewer with additional features not supported by WINHELP.EXE or VIEWER.EXE. These features could be added through the use of additional WHIFS files, and still allow the help files to be viewed by WINHELP.EXE (though without the additional features, of course).
One thing I think we'd all like to see is a help compiler that's just plain faster than Microsoft's. Also, there's currently no way to reverse the help compilation as there is with Microsoft's DOS help compiler.
The term "WinHelp file" includes not just .HLP files but also Mulitmedia Viewer (.MVB) files, such as used in Microsoft's Cinemania and MSDN CD-ROM products. The Multimedia Viewer is essentially an extended WinHelp, so many of the things we'll talk about apply to both WinHelp and VIEWER.
While there are important differences between Windows 3.0 and 3.1 help files, I'll focus here almost entirely on 3.1 files.
The key to the WinHelp file format is that it's based on a structure I call the WinHelp Internal File System (WHIFS). WHIFS is like DOS in a way. It provides a directory of files and a way to find the files. The difference is that all of the WHIFS files are kept inside a single .HLP file. As far as DOS is concerned, the .HLP file is just one file, but to WinHelp, the .HLP file contains several (possibly many) files. All that is needed to get at the information in WinHelp is a simple interface to the WHIFS. Once you have access to the WHIFS, everything else is available to you.
The WHIFS is actually documented by Microsoft, in an odd way, in its description of "baggage" files in the Programming Tools manual in the Windows 3.1 SDK. Baggage files are explained a little more clearly in the MSDN CD-ROM, in the section on creating Viewer files. Baggage files are simply files that you can list in the help project file, which the help compiler will then import into the help file and store in its own WHIFS.
What isn't documented is that a .HLP file contains many more internal files than just the baggage files included by the user. In fact, everything is in the WHIFS files, from keyword B-trees, to the Phrases table to the Topic file that contains the actual help text. Even the list of available fonts is kept in a WHIFS file. The key insight for understanding the .HLP file format is what Microsoft says only refers to baggage files, in fact refers to everything in the .HLP or .MVB file. Everything is in the WHIFS except for a 16-byte header at the beginning of the file; this header contains the WHIFS's location within the .HLP file.
This means that you don't necessarily have to write code from outside WinHelp to get at this information. You could access WinHelp's internal file system by using the WinHelp Internal File System commands. This is described in the Multimedia Developer's Kit and the Microsoft Developer's Network CD. Both refer to VIEWER.EXE, but, as Ron Burk explained in the June "Undocumented Corner" (DDJ, June 1993), the same commands are available in WinHelp. Microsoft's intention is that you use these to access your baggage files, but it turns out that you can access the WinHelp internal files in the same way. That's Ron Burk's specialty, though, and if you want to know more, you'll have to read his upcoming book, tentatively titled WinHelp for Programmers and Technical Writers. (All of my work in reverse-engineering WinHelp was done with Ron, who edits the Windows/DOS Developer's Journal.) In this article, I'll be developing DOS programs to manipulate .HLP files, so I won't be able to exploit the built-in Windows functionality.
WHSTRUCT.H in Listing One (page 136) shows all the key data structures that make up a .HLP or .MVB file. The .HLP file starts off with a structure called HELPHEADER (see Listing One). The first field is a 4-byte magic number, 0x00035F3F, which appears to be constant in all .HLP and .MVB files. You can use this field to verify that you have a valid .HLP or .MVB file. The second HELPHEADER field is the long file offset of the crucial Windows Help Internal File System header (called WHIFSBTREEHEADER in Listing One). The other fields don't seem very important.
The WHIFS header (WHIFSTBREEEHEADER) is central to the entire WinHelp file. As with several other parts of WinHelp, it is based on a B-tree. For something that has been kept under wraps so long, the B-tree implementation in WinHelp is actually fairly straightforward.
As with all B-trees, the WHIFS B-tree has two types of nodes, index and leaf nodes. In this case, the leaf nodes and index nodes have slightly different headers, described later. The combination of the header (index or leaf) and the data following the header create what will be referred to as "a page," so a 1K page with a 16-byte header would only have 1024-16=1008 bytes of data.
The B-tree has an initial header that describes the dynamics of the B-tree itself. There are fields for the number of pages, the number of levels, the root page, and so on. With this information you can then traverse the B-tree, and find internal files that make up the .HLP file.
Immediately following the header are the actual nodes of the B-tree. Once the header is read in, you should save the current file location, which marks the beginning of the first WHIFS node. The WHIFS uses 1-Kbyte pages (elsewhere WinHelp uses a 2-Kbytes B-tree page size). Thus, to find the file offset of any node in the WHIFS B-tree, you simply multiply the page number you're looking for by 1024L (note that WinHelp files can be quite large, so that the file offset must be a long), and add in the location of the first node (page 0). This is demonstrated in the sample program HELPDIR.C, Listing Two (page 136).
The NSplits field in the WHIFS header is the number of "splits" that the B-tree has suffered. When a leaf node of a B-tree has filled up and a record is added, then the node is split into two nodes.
The RootPage field tells you which page is the root of the B-tree. Page numbers start at 0. TotalPages is the number of pages in the entire B-tree. This is usually used to let you know when you've read in the last record of the B-tree.
Naturally, NLevels is how many levels there are in the B-tree. All the leaves are at the same level, so NLevels lets you know when you have reached a leaf node.
TotalWHIFSEntries is simply a count of all of the filenames in the WHIFS B-tree. This count has little use, other than to confirm the totals given in each node of the B-tree or else to know, before traversing the tree, how much space to allocate for a file list.
Each page of the B-tree has a node header. The index nodes and leaf nodes have slightly different headers. These node headers are generic and work with all of the B-trees in the entire WHIFS. The index nodes are referred to as nodes and the leaf nodes as leaves.
NEntries in the index nodes (BTREEINDEXHEADER) indicates how many keys are in the data page. As discussed earlier, where n is the number of key values, there will be n+1 pointers to lower nodes (leaf or index).
The leaf node header (BTREELEAFHEADER) has two extra fields, PreviousPage and NextPage, which are part of a double-linked list to other leaf node headers in the B-tree. This makes it easy to traverse all of the filenames alphabetically. You simply follow from one page to the next instead of backing up to the index page and finding the next entry.
If the only node in the B-tree is the root node (in other words, if there's only one page in the tree), it will be a leaf and not an index. You will need to use the leaf header to read it and both PreviousPage and NextPage will be set to -1.
The WHIFS directory is simply a list of files in the data of the leaf nodes of the WHIFS B-tree. Each filename is a variable-length ASCIIZ (null-terminated) string. Following the null is a long integer pointing to the file's header relative to the beginning of the help file. This is the information we really care about! Since the strings are of variable length, this isn't a proper C structure, and isn't shown in Listing One. Figure 1 shows an annotated hex dump of various portions of a .HLP file (it happens to be MSMAIL32.HLP from the MIPS subdirectory of Windows NT).
Files that are used by WinHelp internally are prefixed with the | (pipe) character; see Table 1. For example, the Keyword B-tree file is called |KWBTREE. Baggage files included by the user, on the other hand, are stored under the file's original name as specified in the help project file.
Each file in the WHIFS file system begins with a file header (FILEHEADER in Listing One), holding the size of the file, including the header, and the size of the file without the header, followed by a single-byte null (0x00).
So far we've covered the basic structure of WHIFS. Nothing much here sounds like it has anything to do with WinHelp! It's just a generic file system using btrees. But this is what's required to get at the internal files which are used for everything else in WinHelp.
Listing Two presents HELPDIR.C, which merely lists all of the WHIFS files in the WHIFS B-tree. The method used is to traverse the B-tree to the first leaf node. From there it uses the linked list to traverse the rest of the leaf nodes. Sample output from HELPDIR shown in Figure 2, for the same .HLP file that was hex dumped in Figure 1.
Now that you know how to get to the internal files within WinHelp, it's time to learn something about the files you're going to be reading. Each of these files is found in the WHIFS directory and starts with the standard FILEHEADER structure. Each file has a different purpose and is used by WinHelp in one way or another. For example, the |SYSTEM file holds global information, such as the amount of compression used. The |TOPIC file contains the actual help text (possibly compressed). These internal files provide the actual help display that the user sees.
Applications that interpret the |TOPIC file or |Phrases file should first read the |SYSTEM File (SYSTEMHEADER in Listing One) to determine if any compression is used. It also tells which version of the help compiler was used to compress the file. With a few files I've seen, the flag values appear to take on an almost random value. This has caused me some heartache, so the flags in Listing One (NO_COMPRESSION_310, and so on) should be used with care.
In .HLP generated by the 3.10 help compiler, the SYSTEMHEADER is followed by a group of records (SYSTEMREC) that contain information that is needed within the help file in general. Each record has a record type (HPJ_TITLE, and so on in Listing One). You must keep track of how far you are through the |SYSTEM file to know if there are any more records (you need the file header with the file length). It seems the copyright notice shows up in all 3.10 compiles, although the copyright notice is one null byte if no copyright notice appears in the .HPJ file.
For the 3.00 compiler, the rest of |SYSTEM, following the SYSTEMHEADER is reserved for the TITLE. The 3.0 |SYSTEM file has a fixed length of 54 bytes, including the file header. The 3.0 compiler does not create SYSTEMREC records.
A most interesting record type is MACRO_DATA (0x04). As discussed in Volume 4 (Resources), Chapter 15 of the Windows 3.1 SDK, .HLP files can include calls to macros. For example, RegisterRoutine() takes a DLL name, function name, and format specification for the function's arguments (U represents an unsigned long, S represents a far string, and so on). DLL functions registered this way can then be used in the .HLP file. Because they can link to anything in a DLL, .HLP files can thus act as applications.
Listing Three (page 137) shows WHMACROS.C, which will list all the macros included in a .HLP or .MVB file. Sample output from the MSDN CD-ROM and Microsoft Cinemania are shown in Figure 3. WHMACROS is essentially a mini-disassembler for the code stored in WinHelp files. Its implementation, however, is trivial. Macros are stored verbatim (even abbreviations such as JI for JumpId() are preserved). In Figure 3, you can see that the Search button is attached to ExecFullTextSearch() in FTUI.DLL (a DLL which provides full-text search capabilities). Likewise, in the MSDN CD, the previous and next buttons are attached to the Navigator() function in MSDNCD.DLL.
The |Phrases file is a list of phrases that are used as a sort of compression of the |TOPIC file (which we'll discuss next month). The |Phrases file is simply a collection of phrases that appear in the |TOPIC text a certain number of times. If the help compiler feels that the word appears often enough to warrant replacement, it adds it to the |Phrases file and in the topic file, puts a reference to the phrase in place of the actual phrase.
The basic phrase header (PHRASEHDR) is a simple two-word record. The first word is the number of phrases in the phrase list. The second word is unknown, but seems to always contain 0x0100.
If the 3.10 help compiler was used with COMPRESSION=HIGH, an alternate phrase header (ALTPHRASEHDR) is used. It contains one additional field, PhrasesSize, which is simply the amount of space required by all of the offsets and phrases in the phrase list. The reason is that the phrases file, when COMPRESSION=HIGH, is compressed using a simple compression technique. (Next month I'll discuss the compression technique and provide the code for decompressing the |Phrases file.)
Following the phrase header is an array of offsets, followed by the phrases themselves. The offsets are used to determine the start and end of a phrase in the phrase table. There are NumPhrases+1 offsets, where the last offset points to one byte past the letter of the phrase. This is used to calculate the length of the last phrase.
The entire |Phrases file, except for the header, is kept in one contiguous block of memory. When the phrases are decompressed or loaded into memory, all offsets point correctly to the phrases. For a given phrase number, n, the phrase begins at offset[n] and is of the length offset[n+1]-offset[n].
Well, no. I said there were a lot of structures, and we've covered only a few of the ten system files. For example, all the actual help text is in the |TOPIC, which I haven't covered yet. I'll have to cover as many as I can next month. I'll also be providing HelpDump, a program that will dump information about the various files in a help file. Also, next month, I'll discuss the compression algorithm used on the |Phrases and |TOPIC file when using the 3.10 help compiler with COMPRESSION=HIGH.
Before wrapping it up, I want to give credit where credit is due. Much of the work here was done with or by Ron Burk, the true expert on WinHelp programming. Several other people found and/or corrected a lot of the information--Lou Grinzo, Carl Burke, and Brian Walker.
As with any undocumented features, this information needs to be used with care. For one thing, it probably isn't 100 percent accurate; it certainly isn't complete. I'd be more than happy to hear from anyone who has some more insights into the WinHelp format.
Copyright © 1993, Dr. Dobb's Journalby Andrew Schulman
The .HLP Internal File System
Help File Header
WHIFS Header
The WHIFS B-tree
WHIFS Directory and File Header
From WHIFS to WinHelp
The |SYSTEM File
The |Phrases File
That's It?
Table 1
WinHelp internal files.===========================================================================
Function Description
===========================================================================
bmx Bitmap files, numbered (bm0, bm24, bm12, and so on. Do not
start with a |)
|CONTEXT Context topic table
|CTXOMAP Context mapping to topics
|FONT Fonts available to help file
|KWBTREE Keyword B-tree file
|KWDATA Keyword mappings to topic file
|KWMAP Map into the KWBTREE for quick access
|Phrases A list of phrases used for compression of the |TOPIC file
|SYSTEM Contains mostly information from the .HPJ file
|TOMAP List of pointers to topics
|TOPIC Contains the actual help text (usually compressed)
|TTLBTREE Topic titles B-tree
baggage Appears under the filename exactly as specified in help project
===========================================================================
Figure 1: Annotated hex dump of portions of a .HLP file. (a) All .HLP files start with a HELPHEADER. The first long is the .HLP magic number (0x035F3F). The next long is the file offset of the WHIFS header (in Figure 1(b), that's 0x041F); (b) the WHIFS starts off with a WHIFSBTREEHEADER, immediately followed by the WHIFS directory, which contains null-terminated file names followed by the individual WHIFS file's offset within the larger .HLP file. Here, bag.ini is at offset 0x10, |CONTEXT is at 0x0362B3, and |CTXOMAP is at 0x032B02; (c) each internal file begins with FILEHEADER structure, which specifies the file's size both with and without the header, followed by a 0. Here, bag.ini is 0x040F bytes with the header, and 0x0F06 without. The file data itself (evidentally, some kind of initialization file) starts immediately after the header.
===========================================================================
(a)
D:\MIPS>dump msmail32.hlp -bytes 8
00000000 | 3F 5F 03 00 1F 04 00 00 | ?_......
(b)
D:\MIPS>dump msmail32.hlp -offset 0x041f
0000041f | 2F 04 00 00 26 04 00 00 04 3B 29 02 04 00 04 7A |/...&....;)....z
0000042f | 34 00 00 43 3A 5C 7E 68 63 35 00 09 02 62 6D 00 |4..C:\~hc5...bm.
0000043f | 00 00 00 00 00 FF FF 01 00 01 00 1E 00 00 00 C1 |................
0000044f | 02 1E 00 FF FF FF FF 62 61 67 2E 69 6E 69 00 10 |.......bag.ini..
0000045f | 00 00 00 7C 43 4F 4E 54 45 58 54 00 B3 62 03 00 |...|CONTEXT..b..
0000046f | 7C 43 54 58 4F 4D 41 50 00 02 2B 03 00 7C 46 4F ||CTXOMAP..+..|FO
; ... etc. ...
(c)
D:\MIPS>dump d:\mips\msmail32.hlp -offset 0x10
00000010 | 0F 04 00 00 06 04 00 00 00 0D 0A 5B 62 61 67 2E |...........[bag.
00000020 | 69 6E 69 5D 0D 0A 67 72 6F 75 70 63 6F 75 6E 74 |ini]..groupcount
00000030 | 3D 31 34 0D 0A 67 72 6F 75 70 31 3D 42 61 63 6B |=14..group1=Back
00000040 | 75 70 0D 0A 67 72 6F 75 70 32 3D 43 6C 69 70 62 |up..group2=Clipb
; ... etc. ...
===========================================================================
Figure 2: HELPDIR output for the .HLP file hex dumped in Figure 1.
===========================================================================
D:\MIPS>c:\ddj\helpdir msmail32.hlp
bag.ini 0x00000010
|CONTEXT 0x000362B3
|CTXOMAP 0x00032B02
|FONT 0x000327D2
|KWBTREE 0x00033255
|KWDATA 0x00032ED5
|KWMAP 0x0003323E
|SYSTEM 0x0000084E
|TOPIC 0x00000A53
|TTLBTREE 0x00034A84
|bm0 0x00037AE2
; ... etc. ...
===========================================================================
Figure 3: Selected macros in Microsoft Cinemania and the MSDN
CD-ROM, as displayed by WHMACROS.
===========================================================================
C:\DDJ>dir d:\content\*.mvb
CINMANIA MVB 139104719 08-18-92 12:00a
C:\DDJ>whmacros d:\content\cinmania.mvb
RegisterRoutine("ftui","InitRoutines","SU")
RegisterRoutine("ftui","ExecFullTextSearch","USSS")
; ...
InitRoutines(qchPath,1)
; ...
CreateButton("ftSearch", "&Search", \
"ExecFullTextSearch(hwndApp, qchPath, `', `')")
; ...
C:\DDJ>dir d:\*.mvb
MSDNCD MVB 270353088 04-05-93 5:58p
C:\DDJ>whmacros d:\msdncd.mvb
RegisterRoutine("msdncd", "Navigator", "USS")
Navigator(hwndApp, "Load", qchPath)
; ...
CreateButton("btn_prv","<<I&ndex","Navigator(hwndApp,\"Prev\",\"\")")
CreateButton("btn_nxt","Inde&x>>","Navigator(hwndApp,\"Next\",\"\")")
===========================================================================
_UNDOCUMENTED CORNER_
edited by Andrew Schulman
written by Pete Davis
[LISTING ONE]
/* WHSTRUCT.H--Windows Help File Internal Records--Pete Davis and Ron Burk,
June 1993. See "Undocumented Corner," DDJ, September 1993 */
typedef unsigned long DWORD;
typedef unsigned int WORD;
typedef unsigned char BYTE;
#define HELP_MAGIC 0x00035F3FL
/* Help file Header record */
typedef struct HELPHEADER {
DWORD MagicNumber; /* 0x00035F3F */
long WHIFS; /* File offset of WHIFS header */
long Negative1;
long FileSize; /* Size of entire .HLP File */
} HELPHEADER;
/* File Header for WHIFS files */
typedef struct FILEHEADER {
long FilePlusHeader; /* File size including this header */
long FileSize; /* File size not including header */
char TermNull;
} FILEHEADER;
/* Help Directory BTREE */
typedef struct WHIFSBTREEHEADER {
char Magic[18]; /* Not exactly magic for some .MVB files */
char Garbage[13];
int MustBeZero; /* Probably shows up when Help > ~40 megs */
int NSplits; /* Number of page split Btree has suffered */
int RootPage; /* Page # of root page */
int MustBeNegOne; /* Probably shows up when B-Tree is HUGE!! */
int TotalPages; /* total # to 2Kb pages in Btree */
int NLevels; /* Number of levels in this Btree */
DWORD TotalWHIFSEntries;
} WHIFSBTREEHEADER;
/* Modified B-Tree Node header to handle a pointer to the page */
typedef struct BTREENODEHEADER {
WORD Signature; /* Signature word */
int NEntries; /* Number of entries */
int PreviousPage; /* Index of Previous Page */
int NextPage; /* Index of Next Page */
char *BTData; /* Pointer to B-Tree's data */
} BTREENODEHEADER;
/* Modified B-Tree Index header to handle a pointer to the page */
typedef struct BTREEINDEXHEADER {
WORD Signature; /* Signature word */
int NEntries; /* Number of entries in node */
char *IdxData;
} BTREEINDEXHEADER;
/* Phrase header for uncompressed |Phrases file */
typedef struct PHRASEHDR {
int NumPhrases; /* Number of phrases in table */
WORD OneHundred; /* 0x0100 */
} PHRASEHDR;
/* Phrase header for compressed |Phrases file */
typedef struct ALTPHRASEHDR {
int NumPhrases; /* Number of phrases in table */
WORD OneHundred; /* 0x0100 */
long PhrasesSize; /* Amount of space uncompressed phrases requires */
} ALTPHRASEHDR;
/* Flags for |SYSTEM header Flags field below: Unfortunately, none of these
flags are particularly solid. The 0x0004 works MOST of the time. Another
flag, 0x0008, appears both in Win32 .HLP files, and in files with Phrase
compression but without LZ77 compression. */
#define NO_COMPRESSION_310 0x0000
#define COMPRESSION_310 0x0004
#define SYSFLAG_300 0x000A
/* Header for |SYSTEM file */
typedef struct SYSTEMHEADER {
BYTE Magic; /* 0x6C */
BYTE Version; /* Version # */
BYTE Revision; /* Revision code */
BYTE Always0; /* Unknown */
WORD Always1; /* Always 0x0001 */
DWORD GenDate; /* Date/Time that the help file was generated */
WORD Flags; /* Values seen: 0x0000 0x0004, 0x0008, 0x000A */
} SYSTEMHEADER;
/* Types for SYSTEMREC RecordType below: note that other record types,
such as 0x0A, 0x0B, 0x0C, 0x0D, shown up in the large .MVB files used
by the MSDN CD-ROM and Cinemania products. */
#define HPJ_TITLE 0x0001 /* Title from .HPJ file */
#define HPJ_COPYRIGHT 0x0002 /* Copyright notice from .HPJ file */
#define HPJ_CONTENTS 0x0003 /* Contents= from .HPJ */
#define MACRO_DATA 0x0004 /* RData = 4 nulls if no macros */
#define ICON_DATA 0x0005 /* Data for Icon */
#define HPJ_SECWINDOWS 0x0006 /* Secondary window info in .HPJ */
#define HPJ_CITATION 0x0008 /* Citation= under [OPTIONS] */
/* Secondary Window Record following type 0x0006 System Record */
typedef struct SECWINDOW {
WORD Flags; /* Flags (See Below) */
BYTE Type[10]; /* Type of window */
BYTE Name[9]; /* Window name */
BYTE Caption[51]; /* Caption for window */
WORD X; /* X coordinate to start at */
WORD Y; /* Y coordinate to start at */
WORD Width; /* Width to create for */
WORD Height; /* Height to create for */
WORD Maximize; /* Maximize flag */
BYTE Rgb[3]; /* RGB for background */
BYTE Unknown1; /* No known use */
BYTE RgbNsr[3]; /* RGB for non scrollable region */
BYTE Unknown2; /* No known use */
} SECWINDOW;
/* Values for Secondary Window Flags */
#define WSYSFLAG_TYPE 0x0001 /* Type is valid */
#define WSYSFLAG_NAME 0x0002 /* Name is valid */
#define WSYSFLAG_CAPTION 0x0004 /* Ccaption is valid */
#define WSYSFLAG_X 0x0008 /* X is valid */
#define WSYSFLAG_Y 0x0010 /* Y is valid */
#define WSYSFLAG_WIDTH 0x0020 /* Width is valid */
#define WSYSFLAG_HEIGHT 0x0040 /* Height is valid */
#define WSYSFLAG_MAXIMIZE 0x0080 /* Maximize is valid */
#define WSYSFLAG_RGB 0x0100 /* Rgb is valid */
#define WSYSFLAG_RGBNSR 0x0200 /* RgbNsr is valid */
#define WSYSFLAG_TOP 0x0400 /* On top was set in HPJ file */
/* Help Compiler 3.1 System record. Multiple records possible */
typedef struct SYSTEMREC {
WORD RecordType; /* Type of Data in record */
WORD DataSize; /* Size of RData */
char *RData; /* Raw data (Icon, title, etc) */
} SYSTEMREC;
/* Header for |TOMAP file */
typedef struct TOMAPHEADER {
long IndexTopic; /* Index topic for help file */
long Reserved[15];
int ToMapLen; /* Number of topic pointers */
long *TopicPtr; /* Pointer to all the topics */
} TOMAPHEADER;
[LISTING TWO]<a name="029e_0015">
/* HELPDIR.C -- List all internal files with a Windows .HLP file.
WHIFS = Windows Help Internal File System -- Pete Davis, June 1993
bcc helpdir.c
See "Undocumented Corner," DDJ, September 1993 */
#pragma pack(1)
#include <conio.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include "whstruct.h"
#define PAGE_SIZE 1024L /* 1k pages -- must be long! */
void fail(const char *s) { puts(s); exit(1); }
int main(int argc, char *argv[]) {
HELPHEADER HelpHdr;
WHIFSBTREEHEADER WHIFSHdr;
BTREENODEHEADER WHIFSNode;
int file, aPage, c;
long WHIFSStart, FileOffset;
FILE *HelpFile;
if ((HelpFile=fopen(argv[1], "rb")) == NULL)
fail("can't open file");
/* Get Help header, go to WHIFS and get WHIFS Header */
fread(&HelpHdr, sizeof(HelpHdr), 1, HelpFile);
if (HelpHdr.MagicNumber != HELP_MAGIC)
fail("not a Windows help file");
fseek(HelpFile, HelpHdr.WHIFS, SEEK_SET);
fread(&WHIFSHdr, sizeof(WHIFSHdr), 1, HelpFile);
/* WHIFS starts after the WHIFSHdr */
WHIFSStart = HelpHdr.WHIFS + sizeof(WHIFSHdr);
file=1;
/* Goto WHIFS Root */
fseek(HelpFile, WHIFSStart + (PAGE_SIZE * WHIFSHdr.RootPage), SEEK_SET);
/* Find the first leaf node */
while (file < WHIFSHdr.NLevels) {
/* if it's not a leaf, we don't need last 2 fields */
fread(&WHIFSNode, 4, 1, HelpFile);
/* Find page pointer to first node in index */
fread(&aPage, sizeof(int), 1, HelpFile);
fseek(HelpFile, WHIFSStart + (PAGE_SIZE * aPage), SEEK_SET);
file++;
}
#ifdef DO_MACROS
{
extern void do_macros(FILE *HelpFile, long WHIFSStart);
do_macros(HelpFile, WHIFSStart);
}
#else
/* Go through linked list of leaf nodes */
for (;;) {
if (! fread(&WHIFSNode, sizeof(WHIFSNode)-2, 1, HelpFile))
break;
/* List all entries in node */
for (file = 1; file <= WHIFSNode.NEntries; file ++) {
while (c = fgetc(HelpFile))
putchar(c);
fread(&FileOffset, sizeof(FileOffset), 1, HelpFile);
printf(" \t0x%08lX\n", FileOffset);
}
if (WHIFSNode.NextPage == -1)
break;
else
fseek(HelpFile,WHIFSStart+(WHIFSNode.NextPage*PAGE_SIZE),SEEK_SET);
}
#endif
return 1;
}
<a name="029e_0016"><a name="029e_0017">
[LISTING THREE]<a name="029e_0017">
/* WHMACROS.C -- Get macros from a .HLP file. Used by HELPDIR.C if #define
DO_MACROS -- Pete Davis and Andrew Schulman,
bcc -DDO_MACROS whmacros.c helpdir.c
See "Undocumented Corner," DDJ, September 1993 */
#pragma pack(1)
#include <conio.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include "whstruct.h"
extern void fail(const char *s);
#define PAGE_SIZE 1024L /* 1k pages -- must be long! */
void do_macros(FILE *HelpFile, long WHIFSStart)
{
BTREENODEHEADER WHIFSNode;
SYSTEMHEADER SystemHdr;
SYSTEMREC SystemRec;
FILEHEADER FileHdr;
long SystemOffset=0, FileOffset, FileStart;
char filename[20], *data;
int *Offsets;
int c, i, file, txt;
/* Find the System file. */
do {
fread(&WHIFSNode, sizeof(WHIFSNode) - 2, 1, HelpFile);
/* Search all entries in node */
for (file = 1; file <= WHIFSNode.NEntries; file ++) {
i = 0;
while ( c = fgetc(HelpFile) )
filename[i++]=c;
filename[i] = 0;
fread(&FileOffset, sizeof(FileOffset), 1, HelpFile);
if (strcmp(filename, "|SYSTEM") == 0) {
SystemOffset = FileOffset;
break;
}
}
if (WHIFSNode.NextPage != -1)
fseek(HelpFile, WHIFSStart + (WHIFSNode.NextPage * PAGE_SIZE),
SEEK_SET);
} while (WHIFSNode.NextPage != -1);
if (! SystemOffset)
fail("Can't locate |SYSTEM file");
/* Get System header */
fseek(HelpFile, SystemOffset, SEEK_SET);
fread(&FileHdr, sizeof(FileHdr), 1, HelpFile);
fread(&SystemHdr, sizeof(SystemHdr), 1, HelpFile);
FileStart = SystemOffset + sizeof(FileHdr) + sizeof(SystemHdr);
FileOffset = 0;
while (FileOffset < FileHdr.FileSize) {
fseek(HelpFile, FileStart + FileOffset, SEEK_SET);
fread(&SystemRec, sizeof(SystemRec)-1, 1, HelpFile);
FileOffset += (sizeof(SystemRec) + SystemRec.DataSize - 1);
if (SystemRec.RecordType == MACRO_DATA) {
if (! (data = (char *) malloc(SystemRec.DataSize+1)))
fail("insufficient memory");
fread(data, SystemRec.DataSize, 1, HelpFile);
data[SystemRec.DataSize] = '\0';
printf("%s\n\n", data);
free(data);
}
}
}