Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

.NET

Undocumented Corner


SEP93: UNDOCUMENTED CORNER

Pete works for a small consulting firm as a programmer/analyst writing client-server software in Windows 3, OS/2, DOS, and Windows NT. He is working on a book, tentatively titled The Hitchhiker's Guide to Win32 Programming, to be published by Addison-Wesley. Pete can be contacted on CompuServe at 71644,3570.


Introduction

by Andrew Schulman

In my introductions to this column for the past few months, I've been desperately seeking someone who would provide us with the file format used by the Windows help system. But even though I knew that .HLP files were important, and that many Windows developers would love to get their hands on this information, I was surprised by the overwhelming response to this request. Thousands of DDJ readers wrote in, explaining they had already "cracked" a bit of the .HLP file format, and that they were anxious to share information with others who might have figured out other parts. Well, actually, it was only about half a dozen readers who had figured out parts of the .HLP format, but that's still pretty good for something like this.

Why is the .HLP file format important? For starters, there are a number of WinHelp-related tools available (RoboHelp and DocToHelp spring to mind). However, all these tools produce Rich Text Format (RTF) files, which must then be fed into Microsoft's extremely slow help compiler to produce a .HLP file. Clearly, it would be nice if someone other than Microsoft knew how .HLP files work, so that we might get better and faster help compilers than the one from Microsoft, just as companies such as SLR produce better and faster linkers than Microsoft's. Knowing the .HLP file format would also make it possible to write help decompilers, and to produce a WinHelp viewer that runs under DOS, without requiring Windows.

It's important to realize that we're not just talking about "help" in the narrow F1 sense. As Ron Burk noted in his article on undocumented WinHelp (DDJ, June 1993), Microsoft has a broader view of WinHelp. The .MVB files used by Microsoft's Multimedia Viewer are really just extended .HLP files. Microsoft in fact is using this extended WinHelp system as the core for a new type of application: Both the Microsoft Developer Network (MSDN) and Cinemania CD-ROM products are essentially huge .MVB files. Yes, huge--MSDNCD.MVB (which we'll examine later on) is 270 megabytes, and CINMANIA.MVB is 139 Mbytes! Using the WinHelp file format, you can see how these applications (both of which are excellent, by the way) are put together.

As I said, a surprising number of you wrote in with information about WinHelp. Many thanks to Brian Walker, Carl Burke, and Lou Grinzo in particular for their contributions! Pete Davis (working with Ron Burk, editor of Windows/DOS Developer's Journal did a great amount of reverse engineering, put together sample code, and wrote this month's article.

The central insight in Pete's article is that .HLP and .MVB files are organized around a B-tree structure he calls WHIFS, the WinHelp Internal File System. In somewhat the same way that a Stacker or DoubleSpace "volume" looks like a single file to DOS, but actually holds all of your files, likewise a single .HLP file is just a collection of WHIFS files. Interestingly, Microsoft documents this in a roundabout way, when the Windows SDK mentions the possibility of "baggage" files and of getting at them via some WinHelp macros.

Pete and Ron figured out that everything in the .HLP file--titles, keywords, phrases, the help text itself--is treated in exactly the same way as third-party baggage. All the help data in a .HLP file is kept in various internal WHIFS files which have names such as |SYSTEM, |Phrases, and |TOPIC. It's the initial pipe character that distinguishes these built-in files from any baggage. The same format is used for .HLP files under Win32, on both Intel and MIPS machines (though there doesn't seem to be a |Phrases file).

This is a big topic, and there will barely be room to squeeze everything into even a two-part article. This month, Pete explains the basics of the WHIFS B-tree system, and explains a few of the internal files, such as |SYSTEM which, for example, contains any WinHelp macros. Next month, Pete enumerates the rest of the internal files, including |TOPIC, which contains the actual help text. Because the text is usually compressed, the article will show how to decompress it. Part Two will also describe a full-blown HelpDump program.

After the conclusion of Pete's .HLP article next month, it looks like the next topic will be Novell's "proprietary" NetWare Core Protocol (NCP). See? We don't just pick on Microsoft in this column. Any company that has an important software interface that is kept under wraps is fair game here. Send your suggestions to me at 76320,302 on CompuServe (that's [email protected] from the Internet).

Who needs to know about the Windows .HLP file format? After all, Microsoft provides a compiler and third-party vendors supply WinHelp development tools. What more could you need?

Well, there are a lot of things that could be done with solid information on .HLP files. Vendors could create development environments with help compilation. New WinHelp viewers could be written with extensions. Using the file system built into WinHelp, you could write a viewer with additional features not supported by WINHELP.EXE or VIEWER.EXE. These features could be added through the use of additional WHIFS files, and still allow the help files to be viewed by WINHELP.EXE (though without the additional features, of course).

One thing I think we'd all like to see is a help compiler that's just plain faster than Microsoft's. Also, there's currently no way to reverse the help compilation as there is with Microsoft's DOS help compiler.

The term "WinHelp file" includes not just .HLP files but also Mulitmedia Viewer (.MVB) files, such as used in Microsoft's Cinemania and MSDN CD-ROM products. The Multimedia Viewer is essentially an extended WinHelp, so many of the things we'll talk about apply to both WinHelp and VIEWER.

While there are important differences between Windows 3.0 and 3.1 help files, I'll focus here almost entirely on 3.1 files.

The .HLP Internal File System

The key to the WinHelp file format is that it's based on a structure I call the WinHelp Internal File System (WHIFS). WHIFS is like DOS in a way. It provides a directory of files and a way to find the files. The difference is that all of the WHIFS files are kept inside a single .HLP file. As far as DOS is concerned, the .HLP file is just one file, but to WinHelp, the .HLP file contains several (possibly many) files. All that is needed to get at the information in WinHelp is a simple interface to the WHIFS. Once you have access to the WHIFS, everything else is available to you.

The WHIFS is actually documented by Microsoft, in an odd way, in its description of "baggage" files in the Programming Tools manual in the Windows 3.1 SDK. Baggage files are explained a little more clearly in the MSDN CD-ROM, in the section on creating Viewer files. Baggage files are simply files that you can list in the help project file, which the help compiler will then import into the help file and store in its own WHIFS.

What isn't documented is that a .HLP file contains many more internal files than just the baggage files included by the user. In fact, everything is in the WHIFS files, from keyword B-trees, to the Phrases table to the Topic file that contains the actual help text. Even the list of available fonts is kept in a WHIFS file. The key insight for understanding the .HLP file format is what Microsoft says only refers to baggage files, in fact refers to everything in the .HLP or .MVB file. Everything is in the WHIFS except for a 16-byte header at the beginning of the file; this header contains the WHIFS's location within the .HLP file.

This means that you don't necessarily have to write code from outside WinHelp to get at this information. You could access WinHelp's internal file system by using the WinHelp Internal File System commands. This is described in the Multimedia Developer's Kit and the Microsoft Developer's Network CD. Both refer to VIEWER.EXE, but, as Ron Burk explained in the June "Undocumented Corner" (DDJ, June 1993), the same commands are available in WinHelp. Microsoft's intention is that you use these to access your baggage files, but it turns out that you can access the WinHelp internal files in the same way. That's Ron Burk's specialty, though, and if you want to know more, you'll have to read his upcoming book, tentatively titled WinHelp for Programmers and Technical Writers. (All of my work in reverse-engineering WinHelp was done with Ron, who edits the Windows/DOS Developer's Journal.) In this article, I'll be developing DOS programs to manipulate .HLP files, so I won't be able to exploit the built-in Windows functionality.

Help File Header

WHSTRUCT.H in Listing One (page 136) shows all the key data structures that make up a .HLP or .MVB file. The .HLP file starts off with a structure called HELPHEADER (see Listing One). The first field is a 4-byte magic number, 0x00035F3F, which appears to be constant in all .HLP and .MVB files. You can use this field to verify that you have a valid .HLP or .MVB file. The second HELPHEADER field is the long file offset of the crucial Windows Help Internal File System header (called WHIFSBTREEHEADER in Listing One). The other fields don't seem very important.

WHIFS Header

The WHIFS header (WHIFSTBREEEHEADER) is central to the entire WinHelp file. As with several other parts of WinHelp, it is based on a B-tree. For something that has been kept under wraps so long, the B-tree implementation in WinHelp is actually fairly straightforward.

As with all B-trees, the WHIFS B-tree has two types of nodes, index and leaf nodes. In this case, the leaf nodes and index nodes have slightly different headers, described later. The combination of the header (index or leaf) and the data following the header create what will be referred to as "a page," so a 1K page with a 16-byte header would only have 1024-16=1008 bytes of data.

The B-tree has an initial header that describes the dynamics of the B-tree itself. There are fields for the number of pages, the number of levels, the root page, and so on. With this information you can then traverse the B-tree, and find internal files that make up the .HLP file.

Immediately following the header are the actual nodes of the B-tree. Once the header is read in, you should save the current file location, which marks the beginning of the first WHIFS node. The WHIFS uses 1-Kbyte pages (elsewhere WinHelp uses a 2-Kbytes B-tree page size). Thus, to find the file offset of any node in the WHIFS B-tree, you simply multiply the page number you're looking for by 1024L (note that WinHelp files can be quite large, so that the file offset must be a long), and add in the location of the first node (page 0). This is demonstrated in the sample program HELPDIR.C, Listing Two (page 136).

The NSplits field in the WHIFS header is the number of "splits" that the B-tree has suffered. When a leaf node of a B-tree has filled up and a record is added, then the node is split into two nodes.

The RootPage field tells you which page is the root of the B-tree. Page numbers start at 0. TotalPages is the number of pages in the entire B-tree. This is usually used to let you know when you've read in the last record of the B-tree.

Naturally, NLevels is how many levels there are in the B-tree. All the leaves are at the same level, so NLevels lets you know when you have reached a leaf node.

TotalWHIFSEntries is simply a count of all of the filenames in the WHIFS B-tree. This count has little use, other than to confirm the totals given in each node of the B-tree or else to know, before traversing the tree, how much space to allocate for a file list.

The WHIFS B-tree

Each page of the B-tree has a node header. The index nodes and leaf nodes have slightly different headers. These node headers are generic and work with all of the B-trees in the entire WHIFS. The index nodes are referred to as nodes and the leaf nodes as leaves.

NEntries in the index nodes (BTREEINDEXHEADER) indicates how many keys are in the data page. As discussed earlier, where n is the number of key values, there will be n+1 pointers to lower nodes (leaf or index).

The leaf node header (BTREELEAFHEADER) has two extra fields, PreviousPage and NextPage, which are part of a double-linked list to other leaf node headers in the B-tree. This makes it easy to traverse all of the filenames alphabetically. You simply follow from one page to the next instead of backing up to the index page and finding the next entry.

If the only node in the B-tree is the root node (in other words, if there's only one page in the tree), it will be a leaf and not an index. You will need to use the leaf header to read it and both PreviousPage and NextPage will be set to -1.

WHIFS Directory and File Header

The WHIFS directory is simply a list of files in the data of the leaf nodes of the WHIFS B-tree. Each filename is a variable-length ASCIIZ (null-terminated) string. Following the null is a long integer pointing to the file's header relative to the beginning of the help file. This is the information we really care about! Since the strings are of variable length, this isn't a proper C structure, and isn't shown in Listing One. Figure 1 shows an annotated hex dump of various portions of a .HLP file (it happens to be MSMAIL32.HLP from the MIPS subdirectory of Windows NT).

Files that are used by WinHelp internally are prefixed with the | (pipe) character; see Table 1. For example, the Keyword B-tree file is called |KWBTREE. Baggage files included by the user, on the other hand, are stored under the file's original name as specified in the help project file.

Each file in the WHIFS file system begins with a file header (FILEHEADER in Listing One), holding the size of the file, including the header, and the size of the file without the header, followed by a single-byte null (0x00).

From WHIFS to WinHelp

So far we've covered the basic structure of WHIFS. Nothing much here sounds like it has anything to do with WinHelp! It's just a generic file system using btrees. But this is what's required to get at the internal files which are used for everything else in WinHelp.

Listing Two presents HELPDIR.C, which merely lists all of the WHIFS files in the WHIFS B-tree. The method used is to traverse the B-tree to the first leaf node. From there it uses the linked list to traverse the rest of the leaf nodes. Sample output from HELPDIR shown in Figure 2, for the same .HLP file that was hex dumped in Figure 1.

Now that you know how to get to the internal files within WinHelp, it's time to learn something about the files you're going to be reading. Each of these files is found in the WHIFS directory and starts with the standard FILEHEADER structure. Each file has a different purpose and is used by WinHelp in one way or another. For example, the |SYSTEM file holds global information, such as the amount of compression used. The |TOPIC file contains the actual help text (possibly compressed). These internal files provide the actual help display that the user sees.

The |SYSTEM File

Applications that interpret the |TOPIC file or |Phrases file should first read the |SYSTEM File (SYSTEMHEADER in Listing One) to determine if any compression is used. It also tells which version of the help compiler was used to compress the file. With a few files I've seen, the flag values appear to take on an almost random value. This has caused me some heartache, so the flags in Listing One (NO_COMPRESSION_310, and so on) should be used with care.

In .HLP generated by the 3.10 help compiler, the SYSTEMHEADER is followed by a group of records (SYSTEMREC) that contain information that is needed within the help file in general. Each record has a record type (HPJ_TITLE, and so on in Listing One). You must keep track of how far you are through the |SYSTEM file to know if there are any more records (you need the file header with the file length). It seems the copyright notice shows up in all 3.10 compiles, although the copyright notice is one null byte if no copyright notice appears in the .HPJ file.

For the 3.00 compiler, the rest of |SYSTEM, following the SYSTEMHEADER is reserved for the TITLE. The 3.0 |SYSTEM file has a fixed length of 54 bytes, including the file header. The 3.0 compiler does not create SYSTEMREC records.

A most interesting record type is MACRO_DATA (0x04). As discussed in Volume 4 (Resources), Chapter 15 of the Windows 3.1 SDK, .HLP files can include calls to macros. For example, RegisterRoutine() takes a DLL name, function name, and format specification for the function's arguments (U represents an unsigned long, S represents a far string, and so on). DLL functions registered this way can then be used in the .HLP file. Because they can link to anything in a DLL, .HLP files can thus act as applications.

Listing Three (page 137) shows WHMACROS.C, which will list all the macros included in a .HLP or .MVB file. Sample output from the MSDN CD-ROM and Microsoft Cinemania are shown in Figure 3. WHMACROS is essentially a mini-disassembler for the code stored in WinHelp files. Its implementation, however, is trivial. Macros are stored verbatim (even abbreviations such as JI for JumpId() are preserved). In Figure 3, you can see that the Search button is attached to ExecFullTextSearch() in FTUI.DLL (a DLL which provides full-text search capabilities). Likewise, in the MSDN CD, the previous and next buttons are attached to the Navigator() function in MSDNCD.DLL.

The |Phrases File

The |Phrases file is a list of phrases that are used as a sort of compression of the |TOPIC file (which we'll discuss next month). The |Phrases file is simply a collection of phrases that appear in the |TOPIC text a certain number of times. If the help compiler feels that the word appears often enough to warrant replacement, it adds it to the |Phrases file and in the topic file, puts a reference to the phrase in place of the actual phrase.

The basic phrase header (PHRASEHDR) is a simple two-word record. The first word is the number of phrases in the phrase list. The second word is unknown, but seems to always contain 0x0100.

If the 3.10 help compiler was used with COMPRESSION=HIGH, an alternate phrase header (ALTPHRASEHDR) is used. It contains one additional field, PhrasesSize, which is simply the amount of space required by all of the offsets and phrases in the phrase list. The reason is that the phrases file, when COMPRESSION=HIGH, is compressed using a simple compression technique. (Next month I'll discuss the compression technique and provide the code for decompressing the |Phrases file.)

Following the phrase header is an array of offsets, followed by the phrases themselves. The offsets are used to determine the start and end of a phrase in the phrase table. There are NumPhrases+1 offsets, where the last offset points to one byte past the letter of the phrase. This is used to calculate the length of the last phrase.

The entire |Phrases file, except for the header, is kept in one contiguous block of memory. When the phrases are decompressed or loaded into memory, all offsets point correctly to the phrases. For a given phrase number, n, the phrase begins at offset[n] and is of the length offset[n+1]-offset[n].

That's It?

Well, no. I said there were a lot of structures, and we've covered only a few of the ten system files. For example, all the actual help text is in the |TOPIC, which I haven't covered yet. I'll have to cover as many as I can next month. I'll also be providing HelpDump, a program that will dump information about the various files in a help file. Also, next month, I'll discuss the compression algorithm used on the |Phrases and |TOPIC file when using the 3.10 help compiler with COMPRESSION=HIGH.

Before wrapping it up, I want to give credit where credit is due. Much of the work here was done with or by Ron Burk, the true expert on WinHelp programming. Several other people found and/or corrected a lot of the information--Lou Grinzo, Carl Burke, and Brian Walker.

As with any undocumented features, this information needs to be used with care. For one thing, it probably isn't 100 percent accurate; it certainly isn't complete. I'd be more than happy to hear from anyone who has some more insights into the WinHelp format.

Table 1

WinHelp internal files.===========================================================================
 Function   Description
===========================================================================
 bmx        Bitmap files, numbered (bm0, bm24, bm12, and so on. Do      not

start with a |)
 |CONTEXT   Context topic table
 |CTXOMAP   Context mapping to topics
 |FONT      Fonts available to help file
 |KWBTREE   Keyword B-tree file
 |KWDATA    Keyword mappings to topic file
 |KWMAP     Map into the KWBTREE for quick access
 |Phrases   A list of phrases used for compression of the |TOPIC file
 |SYSTEM    Contains mostly information from the .HPJ file
 |TOMAP     List of pointers to topics
 |TOPIC     Contains the actual help text (usually compressed)
 |TTLBTREE  Topic titles B-tree
 baggage    Appears under the filename exactly as specified in help      project
===========================================================================

Figure 1: Annotated hex dump of portions of a .HLP file. (a) All .HLP files start with a HELPHEADER. The first long is the .HLP magic number (0x035F3F). The next long is the file offset of the WHIFS header (in Figure 1(b), that's 0x041F); (b) the WHIFS starts off with a WHIFSBTREEHEADER, immediately followed by the WHIFS directory, which contains null-terminated file names followed by the individual WHIFS file's offset within the larger .HLP file. Here, bag.ini is at offset 0x10, |CONTEXT is at 0x0362B3, and |CTXOMAP is at 0x032B02; (c) each internal file begins with FILEHEADER structure, which specifies the file's size both with and without the header, followed by a 0. Here, bag.ini is 0x040F bytes with the header, and 0x0F06 without. The file data itself (evidentally, some kind of initialization file) starts immediately after the header.

===========================================================================
(a)
  D:\MIPS>dump msmail32.hlp -bytes 8
  00000000 | 3F 5F 03 00 1F 04 00 00                         | ?_......
(b)
  D:\MIPS>dump msmail32.hlp -offset 0x041f
0000041f | 2F 04 00 00 26 04 00 00 04 3B 29 02 04 00 04 7A |/...&....;)....z
0000042f | 34 00 00 43 3A 5C 7E 68 63 35 00 09 02 62 6D 00 |4..C:\~hc5...bm.
0000043f | 00 00 00 00 00 FF FF 01 00 01 00 1E 00 00 00 C1 |................
0000044f | 02 1E 00 FF FF FF FF 62 61 67 2E 69 6E 69 00 10 |.......bag.ini..
0000045f | 00 00 00 7C 43 4F 4E 54 45 58 54 00 B3 62 03 00 |...|CONTEXT..b..
0000046f | 7C 43 54 58 4F 4D 41 50 00 02 2B 03 00 7C 46 4F ||CTXOMAP..+..|FO
; ... etc. ...
(c)
  D:\MIPS>dump d:\mips\msmail32.hlp -offset 0x10
00000010 | 0F 04 00 00 06 04 00 00 00 0D 0A 5B 62 61 67 2E |...........[bag.
00000020 | 69 6E 69 5D 0D 0A 67 72 6F 75 70 63 6F 75 6E 74 |ini]..groupcount
00000030 | 3D 31 34 0D 0A 67 72 6F 75 70 31 3D 42 61 63 6B |=14..group1=Back
00000040 | 75 70 0D 0A 67 72 6F 75 70 32 3D 43 6C 69 70 62 |up..group2=Clipb
; ... etc. ...
===========================================================================

Figure 2: HELPDIR output for the .HLP file hex dumped in Figure 1.

===========================================================================
  D:\MIPS>c:\ddj\helpdir msmail32.hlp
  bag.ini                 0x00000010
  |CONTEXT                0x000362B3
  |CTXOMAP                0x00032B02
  |FONT                   0x000327D2
  |KWBTREE                0x00033255
  |KWDATA                 0x00032ED5
  |KWMAP                  0x0003323E
  |SYSTEM                 0x0000084E
  |TOPIC                  0x00000A53
  |TTLBTREE               0x00034A84
  |bm0                    0x00037AE2
  ; ... etc. ...
===========================================================================

Figure 3: Selected macros in Microsoft Cinemania and the MSDN

CD-ROM, as displayed by WHMACROS.
===========================================================================
C:\DDJ>dir d:\content\*.mvb
CINMANIA MVB 139104719 08-18-92  12:00a
C:\DDJ>whmacros d:\content\cinmania.mvb
RegisterRoutine("ftui","InitRoutines","SU")
RegisterRoutine("ftui","ExecFullTextSearch","USSS")
; ...
InitRoutines(qchPath,1)
; ...
CreateButton("ftSearch", "&Search", \
    "ExecFullTextSearch(hwndApp, qchPath, `', `')")
; ...
C:\DDJ>dir d:\*.mvb
MSDNCD   MVB 270353088 04-05-93   5:58p
C:\DDJ>whmacros d:\msdncd.mvb
RegisterRoutine("msdncd", "Navigator", "USS")
Navigator(hwndApp, "Load", qchPath)
; ...
CreateButton("btn_prv","<<I&ndex","Navigator(hwndApp,\"Prev\",\"\")")
CreateButton("btn_nxt","Inde&x>>","Navigator(hwndApp,\"Next\",\"\")")
===========================================================================


_UNDOCUMENTED CORNER_
edited by Andrew Schulman
written by Pete Davis

[LISTING ONE]

/* WHSTRUCT.H--Windows Help File Internal Records--Pete Davis and Ron Burk,
   June 1993. See "Undocumented Corner," DDJ, September 1993  */

typedef unsigned long   DWORD;
typedef unsigned int    WORD;
typedef unsigned char   BYTE;

#define HELP_MAGIC      0x00035F3FL

/* Help file Header record */
typedef struct HELPHEADER {
    DWORD   MagicNumber;      /* 0x00035F3F                */
    long    WHIFS;            /* File offset of WHIFS header   */
    long    Negative1;
    long    FileSize;         /* Size of entire .HLP File  */
}   HELPHEADER;
/* File Header for WHIFS files */
typedef struct FILEHEADER {
    long    FilePlusHeader;  /* File size including this header */
    long    FileSize;        /* File size not including header  */
    char    TermNull;
}   FILEHEADER;
/* Help Directory BTREE */
typedef struct WHIFSBTREEHEADER {
    char    Magic[18];      /* Not exactly magic for some .MVB files   */
    char    Garbage[13];
    int     MustBeZero;     /* Probably shows up when Help > ~40 megs  */
    int     NSplits;        /* Number of page split Btree has suffered */
    int     RootPage;       /* Page # of root page                     */
    int     MustBeNegOne;   /* Probably shows up when B-Tree is HUGE!! */
    int     TotalPages;     /* total # to 2Kb pages in Btree           */
    int     NLevels;        /* Number of levels in this Btree          */
    DWORD   TotalWHIFSEntries;
}   WHIFSBTREEHEADER;
/* Modified B-Tree Node header to handle a pointer to the page */
typedef struct BTREENODEHEADER {
    WORD    Signature;      /* Signature word            */
    int     NEntries;       /* Number of entries         */
    int     PreviousPage;   /* Index of Previous Page    */
    int     NextPage;       /* Index of Next Page        */
    char    *BTData;        /* Pointer to B-Tree's data  */
}   BTREENODEHEADER;
/* Modified B-Tree Index header to handle a pointer to the page */
typedef struct BTREEINDEXHEADER {
    WORD    Signature;      /* Signature word            */
    int     NEntries;       /* Number of entries in node */
    char    *IdxData;
}   BTREEINDEXHEADER;
/* Phrase header for uncompressed |Phrases file */
typedef struct PHRASEHDR    {
    int     NumPhrases;   /* Number of phrases in table                    */
    WORD    OneHundred;   /* 0x0100                                        */
} PHRASEHDR;
/* Phrase header for compressed |Phrases file */
typedef struct ALTPHRASEHDR    {
    int     NumPhrases;   /* Number of phrases in table                    */
    WORD    OneHundred;   /* 0x0100                                        */
    long    PhrasesSize;  /* Amount of space uncompressed phrases requires */
} ALTPHRASEHDR;
/* Flags for |SYSTEM header Flags field below:  Unfortunately, none of these
   flags are particularly solid. The 0x0004 works MOST of the time. Another
   flag, 0x0008, appears both in Win32 .HLP files, and in files with Phrase
   compression but without LZ77 compression. */
#define NO_COMPRESSION_310      0x0000
#define COMPRESSION_310         0x0004
#define SYSFLAG_300             0x000A
/* Header for |SYSTEM file */
typedef struct SYSTEMHEADER {
    BYTE    Magic;     /* 0x6C                  */
    BYTE    Version;   /* Version #             */
    BYTE    Revision;  /* Revision code         */
    BYTE    Always0;   /* Unknown               */
    WORD    Always1;   /* Always 0x0001         */
    DWORD   GenDate;   /* Date/Time that the help file was generated    */
    WORD    Flags;     /* Values seen: 0x0000 0x0004, 0x0008, 0x000A    */
    } SYSTEMHEADER;
/* Types for SYSTEMREC RecordType below:  note that other record types,
   such as 0x0A, 0x0B, 0x0C, 0x0D, shown up in the large .MVB files used
   by the MSDN CD-ROM and Cinemania products. */
#define HPJ_TITLE       0x0001      /* Title from .HPJ file            */
#define HPJ_COPYRIGHT   0x0002      /* Copyright notice from .HPJ file */
#define HPJ_CONTENTS    0x0003      /* Contents=  from .HPJ            */
#define MACRO_DATA      0x0004      /* RData = 4 nulls if no macros    */
#define ICON_DATA       0x0005      /* Data for Icon                   */
#define HPJ_SECWINDOWS  0x0006      /* Secondary window info in .HPJ   */
#define HPJ_CITATION    0x0008      /* Citation= under [OPTIONS]       */
/* Secondary Window Record following type 0x0006 System Record */
typedef struct SECWINDOW {
    WORD    Flags;          /* Flags (See Below)        */
    BYTE    Type[10];       /* Type of window           */
    BYTE    Name[9];        /* Window name              */
    BYTE    Caption[51];    /* Caption for window       */
    WORD    X;              /* X coordinate to start at */
    WORD    Y;              /* Y coordinate to start at */
    WORD    Width;          /* Width to create for      */
    WORD    Height;         /* Height to create for     */
    WORD    Maximize;       /* Maximize flag            */
    BYTE    Rgb[3];         /* RGB for background       */
    BYTE    Unknown1;       /* No known use             */
    BYTE    RgbNsr[3];      /* RGB for non scrollable region */
    BYTE    Unknown2;       /* No known use             */
} SECWINDOW;
/* Values for Secondary Window Flags */
#define WSYSFLAG_TYPE       0x0001  /* Type is valid        */
#define WSYSFLAG_NAME       0x0002  /* Name is valid        */
#define WSYSFLAG_CAPTION    0x0004  /* Ccaption is valid    */
#define WSYSFLAG_X          0x0008  /* X    is valid        */
#define WSYSFLAG_Y          0x0010  /* Y    is valid        */
#define WSYSFLAG_WIDTH      0x0020  /* Width    is valid    */
#define WSYSFLAG_HEIGHT     0x0040  /* Height   is valid    */
#define WSYSFLAG_MAXIMIZE   0x0080  /* Maximize is valid    */
#define WSYSFLAG_RGB        0x0100  /* Rgb  is valid        */
#define WSYSFLAG_RGBNSR     0x0200  /* RgbNsr   is valid    */
#define WSYSFLAG_TOP        0x0400  /* On top was set in HPJ file */
/* Help Compiler 3.1 System record. Multiple records possible */
typedef struct SYSTEMREC {
    WORD    RecordType;   /* Type of Data in record      */
    WORD    DataSize;     /* Size of RData               */
    char   *RData;        /* Raw data (Icon, title, etc) */
    } SYSTEMREC;
/* Header for |TOMAP file */
typedef struct TOMAPHEADER {
    long    IndexTopic;   /* Index topic for help file */
    long    Reserved[15];
    int     ToMapLen;     /* Number of topic pointers  */
    long    *TopicPtr;    /* Pointer to all the topics */
    } TOMAPHEADER;


[LISTING TWO]
<a name="029e_0015">

/* HELPDIR.C -- List all internal files with a Windows .HLP file.
   WHIFS = Windows Help Internal File System -- Pete Davis, June 1993
   bcc helpdir.c
   See "Undocumented Corner," DDJ, September 1993 */
#pragma pack(1)
#include <conio.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include "whstruct.h"

#define PAGE_SIZE       1024L        /* 1k pages -- must be long! */

void fail(const char *s) { puts(s); exit(1); }

int main(int argc, char *argv[]) {
   HELPHEADER         HelpHdr;
   WHIFSBTREEHEADER   WHIFSHdr;
   BTREENODEHEADER    WHIFSNode;
   int                file, aPage, c;
   long               WHIFSStart, FileOffset;
   FILE               *HelpFile;

   if ((HelpFile=fopen(argv[1], "rb")) == NULL)
       fail("can't open file");
   /* Get Help header, go to WHIFS and get WHIFS Header */
   fread(&HelpHdr, sizeof(HelpHdr), 1, HelpFile);
   if (HelpHdr.MagicNumber != HELP_MAGIC)
       fail("not a Windows help file");
   fseek(HelpFile, HelpHdr.WHIFS, SEEK_SET);
   fread(&WHIFSHdr, sizeof(WHIFSHdr), 1, HelpFile);
   /* WHIFS starts after the WHIFSHdr */
   WHIFSStart = HelpHdr.WHIFS + sizeof(WHIFSHdr);
   file=1;
   /* Goto WHIFS Root */
   fseek(HelpFile, WHIFSStart + (PAGE_SIZE * WHIFSHdr.RootPage), SEEK_SET);
   /* Find the first leaf node */
   while (file < WHIFSHdr.NLevels) {
       /* if it's not a leaf, we don't need last 2 fields */
       fread(&WHIFSNode, 4, 1, HelpFile);
       /* Find page pointer to first node in index */
       fread(&aPage, sizeof(int), 1, HelpFile);
       fseek(HelpFile, WHIFSStart + (PAGE_SIZE * aPage), SEEK_SET);
       file++;
   }
#ifdef DO_MACROS
{
    extern void do_macros(FILE *HelpFile, long WHIFSStart);
    do_macros(HelpFile, WHIFSStart);
}
#else
   /* Go through linked list of leaf nodes */
   for (;;) {
       if (! fread(&WHIFSNode, sizeof(WHIFSNode)-2, 1, HelpFile))
           break;
       /* List all entries in node */
       for (file = 1; file <= WHIFSNode.NEntries; file ++) {
          while (c = fgetc(HelpFile))
               putchar(c);
          fread(&FileOffset, sizeof(FileOffset), 1, HelpFile);
          printf("  \t0x%08lX\n", FileOffset);
       }
       if (WHIFSNode.NextPage == -1)
          break;
      else
          fseek(HelpFile,WHIFSStart+(WHIFSNode.NextPage*PAGE_SIZE),SEEK_SET);
   }
#endif
   return 1;
}


<a name="029e_0016"><a name="029e_0017">
[LISTING THREE]
<a name="029e_0017">

/* WHMACROS.C -- Get macros from a .HLP file. Used by HELPDIR.C if #define
   DO_MACROS -- Pete Davis and Andrew Schulman,
   bcc -DDO_MACROS whmacros.c helpdir.c
   See "Undocumented Corner," DDJ, September 1993 */

#pragma pack(1)
#include <conio.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include "whstruct.h"

extern void fail(const char *s);

#define PAGE_SIZE       1024L        /* 1k pages -- must be long! */

void do_macros(FILE *HelpFile, long WHIFSStart)
{
   BTREENODEHEADER  WHIFSNode;
   SYSTEMHEADER     SystemHdr;
   SYSTEMREC        SystemRec;
   FILEHEADER       FileHdr;
   long             SystemOffset=0, FileOffset, FileStart;
   char             filename[20], *data;
   int              *Offsets;
   int              c, i, file, txt;
   /* Find the System file. */
   do {
       fread(&WHIFSNode, sizeof(WHIFSNode) - 2, 1, HelpFile);
       /* Search all entries in node */
       for (file = 1; file <= WHIFSNode.NEntries; file ++) {
          i = 0;
          while ( c = fgetc(HelpFile) )
                filename[i++]=c;
          filename[i] = 0;
          fread(&FileOffset, sizeof(FileOffset), 1, HelpFile);
          if (strcmp(filename, "|SYSTEM") == 0) {
              SystemOffset = FileOffset;
              break;
          }
       }
       if (WHIFSNode.NextPage != -1)
          fseek(HelpFile, WHIFSStart + (WHIFSNode.NextPage * PAGE_SIZE),
              SEEK_SET);
   } while (WHIFSNode.NextPage != -1);
    if (! SystemOffset)
        fail("Can't locate |SYSTEM file");
   /* Get System header */
   fseek(HelpFile, SystemOffset, SEEK_SET);
   fread(&FileHdr, sizeof(FileHdr), 1, HelpFile);
   fread(&SystemHdr, sizeof(SystemHdr), 1, HelpFile);

   FileStart = SystemOffset + sizeof(FileHdr) + sizeof(SystemHdr);
   FileOffset = 0;
   while (FileOffset < FileHdr.FileSize)    {
       fseek(HelpFile, FileStart + FileOffset, SEEK_SET);
       fread(&SystemRec, sizeof(SystemRec)-1, 1, HelpFile);
       FileOffset += (sizeof(SystemRec) + SystemRec.DataSize - 1);
       if (SystemRec.RecordType == MACRO_DATA)  {
           if (! (data = (char *) malloc(SystemRec.DataSize+1)))
               fail("insufficient memory");
           fread(data, SystemRec.DataSize, 1, HelpFile);
           data[SystemRec.DataSize] = '\0';
           printf("%s\n\n", data);
           free(data);
       }
   }
}








Copyright © 1993, Dr. Dobb's Journal


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.