Indexing and Microsoft Visual C++ 5.0


Dr. Dobb's Journal October 1997: Indexing and Microsoft Visual C++ 5.0

Al, a DDJ contributing editor, can be contacted at [email protected].


I am writing the first words of this column on the Fourth of July. Judy and I are heading north in the RV for a few weeks away from the grind. The radio just announced the passing on this day of journalist Charles Kuralt, sad news for Americans everywhere. Charles Kuralt had the gift of specialization. He would visit an ordinary place, talk with ordinary people, and uncover something special about the time, the place, and the inhabitants. His reports always made us wish that we were in those special places with those special people. When he reported from extraordinary places -- Independence Hall, Monticello, the Alaskan wilderness -- his vivid accounts gave life and meaning to what had been for many of us little more than long closed pages in long forgotten textbooks. Kuralt's legacy to Americans is an appreciation and recognition of the special properties of ordinary people and places. His legacy to journalists is the realization that the fundamental decency in people and the beauty of the earth are worth reporting.

Delphi Evangelists

A few months back, I described in two successive columns my first impressions of C++Builder and Optima++, two C++-based Win32 visual development environments. C++Builder is based on the Delphi engine, and that particular column generated a lot of mail from Delphi programmers. I concluded from those messages that Delphi is more a culture than a programming language. Its devotees were of a single, evangelical mind. They pleaded, cajoled, insisted, and downright demanded that I give Delphi more of a chance. They extolled its virtues and superiority over anything with a C++ flavor. They related experiences wherein Delphi saved projects from certain failure. They told tales of C++ programmers behind schedule, under stress, near revolt, suicide, or worse, saved from certain doom by a last-minute, in-the-nick-of-time paradigm shift to Delphi. They gently accused me of harboring and promoting a Delphi prejudice. They wondered why I don't write more (and more positively) in my column about Delphi, this panacea, this answer to a maiden's prayer, this solution to the ills of a mostly failure-prone industry. I fully expected a contingent of Delphi programmers to show up on my doorstep with little pamphlets for me to read, quoting chapter and verse from Wirth and Duntemann.

Although I believe in the sincerity and substance in all the readers' endorsements, the answer to their plaintive pleas is found, of course, at the top of this page. This is the "C Programming" column. My beat covers things related to C and its derivatives. They ain't paying me to write about Actor, Ada, Adabas Natural, Algol, APL, ASM, Autocoder, Basic, Cobol, Forth, Fortran, LISP, Macro-11, MUMPS, Pascal, PL/1, Simula, Snobol, SOAP II, or any other languages, extant or otherwise, that languish on my faded résumé or in my fading memory. I know C and C++ well enough to cover my beat fairly well. I also know them well enough to write programs on time, within budget, and responsive to the users' needs. Anyone can be just as productive with C++ if they care to take the time to learn it properly, or so says I. C++ as currently implemented serves me quite well, and, until there are native Java compilers with comparable development environments and application frameworks and generic container class libraries, I, as a working programmer, see no compelling reason to change languages. Maybe not even then.

All of those who wrote overlooked my report that a colleague credits Delphi with saving his product and his company. I thought that was sufficient endorsement for this forum. Apparently not. Therefore, to conciliate those readers who have found in Delphi an answer to their needs and are feeling left out, ignored, or otherwise neglected on these pages, I hereby heartily encourage everyone to give Delphi a try, develop a preference, and use what works best for you. There. How's that?

The Indexer

Every good technical book has an index. Most bad ones do, too. The quality of the index is as much a measure of the quality of the book as its content. The greatest, most scholarly treatise on a subject has little value if you can't find it when you need it.

Where does the index come from? When you write a book, you send an electronic manuscript, usually a word processing file, to the publisher. The publisher turns your raw material into a book. They do many things to reach that end, including hiring someone to read the book in its final page-proof format and generate an index. If you are lucky, the indexer understands the technical content well enough to do a good job. More often than not, the indexer is not a technical person and might not appreciate the significance of your explanation of some arcane topic. Browse the indexes of many computer programming books and virtually all programmers' tools documentation, and you'll see what I mean.

For a good-sized computer book, the publisher pays this usually unqualified indexer person as much as $1500. Here's the catch: The author pays that fee. The publisher deducts the indexer's fee from the author's royalties. Therefore, there are two good reasons to write your own index: first, to ensure the technical quality of the index, and second, to keep your royalties for yourself.

When I wrote my first book, I was paranoid about the publisher's contract, as every new author should be. Consequently, I read the contract word for word and gasped when I realized that if I did not write an index, someone else would get some of my royalties. Can't let that happen. I vowed to learn to index a book.

Today, most word processors generate an automatic index if you embed hidden, encoded index phrases on the same page with the text that the index entry references. Microsoft Word does that, gives you a choice of several index formats, and rebuilds the index whenever it repaginates the document. The problem with this approach is that the page numbers of your manuscript do not coincide with the page numbers of the final page proofs after the publisher has completed the layout. To properly index a book, you must go through the publisher's page proofs page by page, record each index phrase and its page number, sort those records by the index phrase, reconcile different ways that the same phrase might be represented, merge all common index phrases into single entries with all the page numbers listed, and print the final index. Many professional indexers perform this process manually with 3×5 cards. No way, hose-A.
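The sort-and-merge step described above can be sketched in a few lines of modern standard C++. This is only an illustration, not the column's MS-DOS filter code: the names IndexRecord, mergeIndex, and formatEntry are hypothetical, and the "phrase, page, page" output layout is an assumption rather than any publisher's format.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// One raw record: an index phrase and the page it was found on.
// (Hypothetical type; the column does not show its record layout.)
using IndexRecord = std::pair<std::string, int>;

// Merge records into one entry per phrase, with the page numbers sorted
// and de-duplicated. std::map orders phrases by byte value; a real index
// would probably want case-insensitive collation, which is not attempted.
std::map<std::string, std::set<int>>
mergeIndex(const std::vector<IndexRecord>& records)
{
    std::map<std::string, std::set<int>> merged;
    for (const auto& rec : records)
        merged[rec.first].insert(rec.second);
    return merged;
}

// Format one merged entry as "phrase, 4, 12" (assumed layout).
std::string formatEntry(const std::string& phrase, const std::set<int>& pages)
{
    assert(!pages.empty());   // an entry must reference at least one page
    std::string line = phrase;
    for (int page : pages)
        line += ", " + std::to_string(page);
    return line;
}
```

Reconciling different spellings of the same phrase is the part that still needs a human; the sketch only merges phrases that match exactly.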

For years, I've used a simple pair of home-grown, custom MS-DOS filter programs and the MS-DOS SORT filter to generate an index. First, I use a text editor to build a text file with an index phrase on each line. Token text lines with page numbers separate the phrases. Example 1 is a small sample of the start of such a file. The entries with double backslashes represent entries that are indexed twice -- once under a heading and a second time under their own name. The "functions\\printf" entry is an example. Entries with a single backslash, such as "operators\+" are indexed only once, under the heading. The first filter program reads the text file and creates a file with the text phrases and their page numbers appended. I sort this file with the MS-DOS SORT filter. Example 2 is the sorted index database. Then the second filter program creates the index in a format that the publisher can print as Example 3 shows. This procedure works well, but it lacks the smooth seamless integration that contemporary GUI software applications have. When Judy used this cobbled indexing application for one of her projects and complained about its cryptic command-line user interface, I decided to rewrite the indexer application as a Windows 95 program. I'm publishing that program here so that other authors can use it and to encourage qualified programmers -- the readers of this column, for example -- to go after some of those well-paid assignments indexing the books of others. Call a publisher of technical books, ask to speak to a development editor, and offer your services. Tell 'em Al sent you.
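The backslash convention can be illustrated with a short sketch in standard C++. It assumes only what the paragraph above states: a double backslash means the entry is indexed under its heading and again under its own name, and a single backslash means it is indexed under the heading only. The IndexEntry struct and expandEntry function are hypothetical names, and the page number is passed in as a parameter because the format of the page-token lines is not shown here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record: a top-level heading, an optional subentry, a page.
struct IndexEntry {
    std::string heading;
    std::string subentry;   // empty for a top-level entry
    int page;
};

// Expand one line of the index text file into one or two entries.
//   "malloc"             -> malloc
//   "operators\+"        -> operators / +
//   "functions\\printf"  -> functions / printf, and printf on its own
std::vector<IndexEntry> expandEntry(const std::string& line, int page)
{
    assert(!line.empty());
    std::vector<IndexEntry> out;
    const std::string::size_type pos = line.find('\\');
    if (pos == std::string::npos) {
        out.push_back({line, "", page});
        return out;
    }
    const std::string heading = line.substr(0, pos);
    const bool twice = pos + 1 < line.size() && line[pos + 1] == '\\';
    const std::string sub = line.substr(pos + (twice ? 2 : 1));
    out.push_back({heading, sub, page});
    if (twice)
        out.push_back({sub, "", page});   // second listing under its own name
    return out;
}
```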

One of my objectives in the new indexer program was to retain the format of the original text file so that my old index databases would work with the new program. This approach helps me reuse earlier work with newer editions of a book.

The Indexer program is a single-document-interface (SDI) MFC application with a modified text editor control. The derived document class reads the text file and substitutes page-separator text lines for the page tokens. The derived view class keeps the cursor in the left margin of those page separators and handles the details of pagination -- inserting, deleting, and renumbering page breaks.

Most of the Indexer's processing is done by the derived view class. It's not always clear in an MFC application where you should put different parts of the processing. Any menu or toolbar command can be sent to the application, frame, document, and view classes, and the programming model does not suggest where different kinds of processes belong. Presumably, processes that deal with the document and its data belong in the document class, and processes that manage the document's display belong in the view class. But applications like the Indexer have a single view and store the document data directly in a view control. That arrangement tends to make you want to put document processing functions in the view class rather than the document class if only to avoid figuring out from within the document class which view class object receives data changes.

Consequently, most of the document processing is in the CIndexerView class. IndexerView.h and IndexerView.cpp implement the class. You can get the entire project's source code electronically; see "Availability," page 3.

The EditLine Class

The EditLine class in IndexerView.cpp encapsulates a few things common to the application's use of a line of text in the index's text file. The MFC edit classes do not provide a simple way to retrieve a line of text given the line number. First, you get the character index of the first character of the line by calling the CRichEditCtrl::LineIndex function with the line number as an argument. Then you ask the length of the line by calling the CRichEditCtrl::LineLength function with the character index as an argument. Next you allocate a buffer long enough to retrieve the text line. Then you retrieve the line into the buffer by calling the CRichEditCtrl::GetLine function with the line number, the buffer address, and the length as arguments. Finally, after processing the line of text, you return the allocated buffer to the heap. They sure could have simplified such a routine procedure. This series of steps is not restricted to the CRichEditCtrl control class. The CEdit control class works the same way.

I found it simpler to build an EditLine class to encapsulate retrieving an editor text line. The constructor takes as arguments a reference to the CRichEditCtrl object and the line number. It allocates the buffer and does all the other stuff. The CRichEditCtrl::GetLine function uses the first four bytes of the buffer to store the length before it retrieves the text, which overwrites the length data. Consequently, the buffer has to be at least four bytes long, no matter how long the text is. That peculiar behavior cost me several hours of debugging when I allocated three bytes for a two-byte text line. The heap was contaminated and the symptom was a peculiar exception thrown from the depths of the Win32 API that gave no indication of its cause.

The EditLine destructor gives the buffer back to the heap. I used the class to encapsulate two other text-line related operations -- one to determine if the line is empty or consists of white space only and another to retrieve the page number from a text line that is assumed to be one of the indexer's page breaks.
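The shape of such a wrapper can be sketched without MFC. The sketch below is not the EditLine source from IndexerView.cpp. FakeEditCtrl is a hypothetical stand-in that imitates the three calls described above (LineIndex, LineLength, GetLine), including the habit of writing the length into the start of the buffer, so that the code compiles anywhere. It also swaps in std::vector for the allocate-and-free buffer the column describes, so no explicit destructor is needed. The page-number helper is left out because the page-break line format is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for the edit control; not an MFC class.
struct FakeEditCtrl {
    std::vector<std::string> lines;

    // Character index of the first character of line n.
    int LineIndex(int n) const {
        int index = 0;
        for (int i = 0; i < n; ++i)
            index += static_cast<int>(lines[i].size()) + 1;  // +1: line break
        return index;
    }
    // Length of the line that contains the given character index.
    int LineLength(int charIndex) const {
        int start = 0;
        for (const auto& l : lines) {
            const int end = start + static_cast<int>(l.size());
            if (charIndex <= end) return static_cast<int>(l.size());
            start = end + 1;
        }
        return 0;
    }
    // Copies up to maxLen characters, no terminator; returns the count.
    int GetLine(int n, char* buf, int maxLen) const {
        assert(n >= 0 && n < static_cast<int>(lines.size()));
        assert(maxLen >= static_cast<int>(sizeof maxLen));   // the pitfall
        std::memcpy(buf, &maxLen, sizeof maxLen);  // length goes in first
        const int count = std::min(maxLen, static_cast<int>(lines[n].size()));
        std::memcpy(buf, lines[n].data(), count);  // text overwrites it
        return count;
    }
};

// Retrieves one line of text and answers simple questions about it.
class EditLine {
    std::string text_;
public:
    EditLine(const FakeEditCtrl& ctrl, int lineNo) {
        const int charIndex = ctrl.LineIndex(lineNo);
        const int length = ctrl.LineLength(charIndex);
        // Never allocate less than the size of the stored length.
        const int capacity = std::max(length, static_cast<int>(sizeof(int)));
        std::vector<char> buffer(capacity);
        const int copied = ctrl.GetLine(lineNo, buffer.data(), capacity);
        text_.assign(buffer.data(), copied);
    }
    const std::string& text() const { return text_; }
    // True if the line is empty or consists of white space only.
    bool isBlank() const {
        return std::all_of(text_.begin(), text_.end(),
            [](unsigned char c) { return std::isspace(c) != 0; });
    }
};
```

Clamping the buffer size to at least sizeof(int) is the one-line guard against the heap corruption described above.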

CEdit versus CRichEditCtrl

I used the CRichEditCtrl control class rather than the CEdit class for two reasons. First, I wanted to display the page breaks as blue line separators with the page numbers in the horizontal center of the line. The CRichEditCtrl makes it easy to change fonts, colors, and so on. The CIndexerView::InsertPageBreak function shows how that works. Second, the CEdit class behaves in peculiar ways when the text reaches a certain size somewhere around 32K and you try to insert more text. Actually, the behavior is in the underlying Win32 EDITBOX control mechanism. The API simply refuses to insert the text. I verified the problem by trying to insert some text into the text file with the Windows 95 Notepad applet, which is little more than an EDITBOX control with a menu on top. Same effect.

CProgressDlg and Progress Classes

The CRichEditCtrl class has one serious drawback: The loading and saving of CRichEditCtrl objects with text is very slow. Maybe this is because of the way rich-text format encodes text attributes, but it's more likely that not enough care went into the CRichEditCtrl implementation in Redmond. The input serialization of the indexer's text file, which occurs in CIndexerView::Serialize, reads the text a line at a time from the CArchive object, translates page-separator token lines into the blue separator lines that the view displays, and adds each line to the CRichEditCtrl object. For a large index, this process takes quite some time, even on a contemporary, fast PC. I'm willing to put up with that performance behavior, but at first I thought my program was locked up. I decided to implement some so-called "human factors" or "ergonomics" or whatever you call it when you provide user feedback instead of optimizing the program. I decided to put one of those little progress bars on the indexer application's status bar to indicate whenever the program was loading, saving, or paginating an index. Just like Microsoft Word. Listings One and Two are ProgressDlg.h and ProgressDlg.cpp, the files that implement this feature. The CStatusBar control that MFC provides does not support such progress bars, but I figured out how to piece one together.

The CProgressDlg class is derived from the CDialog class and implements a frameless, textless dialog box wide enough to contain its only control, a CProgressCtrl object, and short enough to fit inside the application's status bar. Its overridden OnCreate member function sets the dialog's position based on the position of the CStatusBar object associated with the application's main frame window.

The Progress class is used to instantiate and display the CProgressDlg object, which it does only if the application class has completed its initialization. This approach prevents the application from trying to instantiate the progress control before the application has created the main frame and status bar windows, which can happen when the application is started from double-clicking an associated document.

The Progress class constructor creates the CProgressDlg object and sets the text of the status bar's first pane to the constructor's first argument. This technique allows the program to label the operation whose progress is being reported. The destructor resets the pane to blank text.

The Progress class includes SetRange and StepIt member functions to allow the application to manage the progress control's display.
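The Progress class itself does not appear in the listings, so the following is only a guess at its shape, written in portable C++. FakeStatusBar is a hypothetical stand-in for the status bar pane and the progress control. Passing a null pointer models the case where the main frame does not exist yet. Clamping at the upper bound is a simplification made for this sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the status bar pane plus progress control.
struct FakeStatusBar {
    std::string paneText;
    int lower = 0;
    int upper = 100;
    int position = 0;
    bool progressVisible = false;
};

// Scoped progress reporter: labels the operation on construction and
// clears the label on destruction, so early returns cannot leave it stale.
class Progress {
    FakeStatusBar* bar_;   // null when the UI is not ready: calls do nothing
public:
    Progress(FakeStatusBar* bar, const std::string& label) : bar_(bar) {
        if (!bar_) return;
        bar_->paneText = label;
        bar_->position = bar_->lower;
        bar_->progressVisible = true;
    }
    ~Progress() {
        if (!bar_) return;
        bar_->paneText.clear();          // reset the pane to blank text
        bar_->progressVisible = false;
    }
    Progress(const Progress&) = delete;
    Progress& operator=(const Progress&) = delete;

    void SetRange(int lower, int upper) {
        assert(lower <= upper);
        if (!bar_) return;
        bar_->lower = lower;
        bar_->upper = upper;
        bar_->position = lower;
    }
    // Advance one step, clamping at the upper bound (a simplification).
    void StepIt() {
        if (bar_ && bar_->position < bar_->upper)
            ++bar_->position;
    }
};
```

A caller would declare a Progress object at the top of a load, save, or paginate routine, call SetRange with the number of lines, and call StepIt once per line.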

VC++ 5.0

The Indexer project provided an opportunity to take a first look at Visual C++ 5.0. I installed this new version for three reasons. First, I wanted to see how well the new compiler supports ANSI proposed standard features not supported by earlier versions, particularly the template modifications that STL requires. I reported some of those findings last month. Second, I wanted to develop an applet to run on my Windows CE palmtop, and the Windows CE SDK requires VC++ 5.0. Third, I wanted to evaluate the new, slick user interface in VC++ 5.0's Developer Studio.

The Windows CE SDK

Not much to report here. The Windows CE SDK installs on top of Visual C++ 5.0, but it requires Windows NT to do anything serious. You can develop and download CE programs into the palmtop under Windows 95, but you can't use the palmtop emulator in Developer Studio and you can't do any remote debugging. I haven't tried it, but I'm guessing that the only way to test and debug a program without NT is to use lots of message-box windows for kludged breakpoints and data displays, which is too much like the printf debugger mode we used under DOS before source-level debuggers were available. Tedious, in other words. I'm not ready to make the switch to Windows NT, as much as I'd like to. Several of my hardware and software components (no-brand generic 16X CD-ROM drive, MIRO video capture card, Turtle Beach Multisound, Cakewalk, HP CD-R drive, and so on), all great Windows 95 components, do not work under NT. I'll put the CE SDK off for now.

VC++: Helpless

I do not particularly like VC++ 5.0 and see no reason to switch from 4.2 unless you are into ActiveX development, which I am not, or CE development, which I cannot do without NT. What don't I like? Mainly, it's that new, slick user interface in VC++ 5.0's Developer Studio.

VC++ 5.0 did away with the Help-based on-line documentation and replaced it with a custom Internet Explorer and HTML file formats. Now, every time I start the program or try to get help, I have to tell the system, "No, I do not want you to dial my Internet provider and connect me with some web site at some cryptic URL that I never heard of." There's got to be an option that turns off that stupid feature, but I haven't found it. Maybe the only way is to make that connection once and download (or upload?) whatever it's after, but I don't like features that have no explanation anywhere, particularly when I'm paying for the connect time. If you are on a network all the time, you might not notice this enigmatic behavior. Do you want your computer's innards potentially examined by an undocumented process at an unidentified web site? I sure don't.

By using HTML for the Help database format, they took away a few features. For example, it was easy with the old system to double-click a function name in the help file, copy it to the clipboard, and paste it into a program. It's not as easy now. Double-clicking doesn't select a word in the new HTML help file browser. You've got to drag-click the word, which always seems to get an extra character on one end or the other. Sometimes the selection jumps to one end or the other of a complete phrase. Smart mouse, I guess. And that first window with the help topics doesn't go away when you choose to view one of its topics like it used to. You've got to close that pesky window every time because it's always in the way.

Another thing: This new HTML, custom browser help system is really sloooow! Hey, Microsoft developer persons! Slow is bad. Fast is good. It's not supposed to get slower. It's supposed to get better. Duh. Ask any programmer. How would you like to get a brand new compiler with not many new features other than some marginal ANSI compliance and a new user interface that you have to learn and with a help system so slow that it's painful? How would you like that? Nobody I've talked to ever asked for anything like that.

Yet another thing: Why do the Developer Studio toolbars keep disappearing and reconfiguring themselves? Why don't they stay where I put them? At least five times a day I have to figure out how to restore a toolbar that moved, went away, or lost or gained buttons.

Makes me wonder. Does anyone in Redmond listen to the customers? Do they get real programmers to try this stuff out? I know they have more important things on their minds, but I wish they'd take a little time out from tallying their stock options to build a modicum of quality into the tools they foist on us.

Valley Guy

The Fourth has passed now, and I'm sitting with my laptop finishing this column in the RV, parked beside a barn on the side of a gently sloping hill in a farm valley in central eastern Pennsylvania. Besides my in-laws' farmhouse and outbuildings and those of a neighbor about a mile away, there is no sign of other civilization. No traffic, no rush, no schedules to meet. Just trees, wildflowers, fields, and mountains as far as I can see in every direction. The only sounds are the wind, the birds, and a tractor in a field somewhere nearby. A short walk into the forest at the edge of a cornfield brings me face to face with a doe and her fawns who freeze and stare at me and then bolt into the brush. This is a peaceful place, and I've been coming here for 37 years, one of the benefits of marrying a farmer's daughter. It hasn't changed much in that time. It doesn't need to. Toolbars and publishers and objects and programming languages and paradigm shifts all seem a million miles away. Small farming is the main industry because, although the valley is right in the middle of the coal region, there is no coal here and the strip miners left it alone. It's such an ordinary place. Charles Kuralt would have found it very special indeed.


Listing One

// ProgressDlg.h
#if !defined(AFX_PROGRESSDLG_H__49A4AF82_EA2A_11D0_8FB3_00C0A80034F0__INCLUDED_)
#define AFX_PROGRESSDLG_H__49A4AF82_EA2A_11D0_8FB3_00C0A80034F0__INCLUDED_
#if _MSC_VER >= 1000
#pragma once
#endif // _MSC_VER >= 1000


class CProgressDlg : public CDialog
{
public:
    CProgressDlg(CWnd* pParent = 0);


    //{{AFX_DATA(CProgressDlg)
    enum { IDD = IDD_PROGRESSDLG };
    CProgressCtrl   m_Progress;
    //}}AFX_DATA
    //{{AFX_VIRTUAL(CProgressDlg)
    protected:
    virtual void DoDataExchange(CDataExchange* pDX);
    //}}AFX_VIRTUAL
    //{{AFX_MSG(CProgressDlg)
    afx_msg int OnCreate(LPCREATESTRUCT lpCreateStruct);
    //}}AFX_MSG
    DECLARE_MESSAGE_MAP()
};


//{{AFX_INSERT_LOCATION}}
#endif // !defined(AFX_PROGRESSDLG_H__49A4AF82_EA2A_11D0_8FB3_00C0A80034F0__INCLUDED_)


Listing Two

// ProgressDlg.cpp

#include "stdafx.h"
#include "Indexer.h"
#include "ProgressDlg.h"
#include "MainFrm.h"


#ifdef _DEBUG
#define new DEBUG_NEW
#undef THIS_FILE
static char THIS_FILE[] = __FILE__;
#endif


CProgressDlg::CProgressDlg(CWnd* pParent /*=NULL*/)
    : CDialog(CProgressDlg::IDD, pParent)
{
    //{{AFX_DATA_INIT(CProgressDlg)
    //}}AFX_DATA_INIT
}
void CProgressDlg::DoDataExchange(CDataExchange* pDX)
{
    CDialog::DoDataExchange(pDX);
    //{{AFX_DATA_MAP(CProgressDlg)
    DDX_Control(pDX, IDC_PROGRESS, m_Progress);
    //}}AFX_DATA_MAP
}
BEGIN_MESSAGE_MAP(CProgressDlg, CDialog)
    //{{AFX_MSG_MAP(CProgressDlg)
    ON_WM_CREATE()
    //}}AFX_MSG_MAP
END_MESSAGE_MAP()


int CProgressDlg::OnCreate(LPCREATESTRUCT lpCreateStruct) 
{
    if (CDialog::OnCreate(lpCreateStruct) == -1)
        return -1;
    if (m_hWnd != 0)    {
        // --- get window position of mainframe window
        WINDOWPLACEMENT mfwndpl = { sizeof(WINDOWPLACEMENT) };
        theApp.m_pMainWnd->GetWindowPlacement(&mfwndpl);
        // --- get window position of statusbar
        WINDOWPLACEMENT stwndpl = { sizeof(WINDOWPLACEMENT) };
        ((CMainFrame*)(theApp.m_pMainWnd))->
                           m_wndStatusBar.GetWindowPlacement(&stwndpl);
        int nStatusBarHeight = stwndpl.rcNormalPosition.bottom - 
                               stwndpl.rcNormalPosition.top + 1;
        // --- get window position of progress dialog window
        WINDOWPLACEMENT wndpl = { sizeof(WINDOWPLACEMENT) };
        GetWindowPlacement(&wndpl);
        int nProgressBarHeight = wndpl.rcNormalPosition.bottom - 
                                 wndpl.rcNormalPosition.top + 1;
        // --- relocate the progress dialog to the status bar
        const int marginoffset = 75;
        wndpl.rcNormalPosition.right += marginoffset;
        wndpl.rcNormalPosition.left += marginoffset;
        wndpl.rcNormalPosition.top = stwndpl.rcNormalPosition.top + 
                      (nStatusBarHeight - nProgressBarHeight) / 2 + 2;
        wndpl.rcNormalPosition.bottom = wndpl.rcNormalPosition.top + 
                       nProgressBarHeight - 1;
        SetWindowPlacement(&wndpl);
        ShowWindow(SW_SHOW);
    }
    return 0;
}


DDJ


Copyright © 1997, Dr. Dobb's Journal

