Speeding up database searches
Steve develops e-commerce software for the Information Architects Corporation in Charlotte, NC. He can be reached at firstname.lastname@example.org.
Presenting large amounts of data in an efficient and intuitive manner has always been a challenge for Windows developers. Although Windows provides the listbox control to handle multiple data items, listboxes often fall short when dealing with large quantities of information. These shortcomings are due to the way listboxes manage their entries. For example, they use scrollbars to navigate through data items, and for scrolling to work properly, the list must contain all of the relevant data. In other words, dragging the scrollbar thumb to the bottom of the scrollbar should display the last item in the list. Of course, this proper scrolling behavior requires loading the listbox with the entire dataset. Unfortunately, there are many cases where loading an entire dataset into a listbox just isn't feasible. For example, your dataset might consist of a phone list with more than 100,000 entries. Loading such a large list isn't practical because it's time consuming and memory intensive.
In this article I present CVListCtrl, a class that works in conjunction with the listview control to provide a virtual list implementation. Unlike normal lists, a virtual list contains only a handful of visible items at any one time. It relies on callback routines to retrieve additional items from a secondary storage as needed. The complete source code and related files for CVListCtrl are available electronically (see "Resource Center," page 5).
The Listview Control: An Improved Listbox
The listview control, which was first introduced with Windows 95, offered many improvements over standard listboxes. For example, report-style listviews present data in a manner similar to listboxes, but with the addition of multiple, resizable columns. Because listviews offer many enhancements over listboxes, developers often choose to use listviews in situations where they would have previously used a listbox.
Although the listview control introduced with Windows 95 offered many advanced features not found in listboxes, it still suffered from some of the same weaknesses as listboxes when dealing with large amounts of data. In other words, scrolling through data still required loading the entire dataset into the listview control. But, the version of the listview control first introduced with Internet Explorer 3.0 provided a new feature designed specifically to address these weaknesses.
Unlike its predecessors, the IE 3.0 version of the listview control implements a new virtual mode. This virtual mode lets the listview present a handful of visible data items, while paging additional data items into the visible area as needed. It also manages the listview's scrollbar so its position correctly reflects the data's location within the dataset, even though the entire dataset isn't actually loaded in the listview. For all intents and purposes, virtual listviews appear exactly the same to users as a normal listview. For example, a listview that contains 5000 data items lets users scroll or page through all 5000 items as if all of the data is present in the list. However, unlike normal listviews, a virtual listview contains only a handful of currently visible data items in memory at any one time. It retrieves additional data items as they're made visible through the user's navigation. It retrieves these additional data items by sending a message to the listview's owner. This message requests a specific list item, based on a list entry's index. It's up to the application to associate the index with the proper information and return this information to the caller.
The process of creating a virtual listview begins in much the same way as creating a normal listview. First, you need to add the listview control to your dialog. The next step is to change the listview's style to indicate that it is a virtual listview. If you're using Visual C++ 6.0, you can change the listview's style by checking the Owner Data checkbox in its property sheet. If you're using an older version of Visual C++ or another vendor's compiler, you'll most likely need to modify the listview's properties in the resource file. This is done by ORing in the LVS_OWNERDATA listview style.
Once you've created a virtual list control, the next step is to set the total count of entries that the virtual list contains. This step is necessary so the virtual list can properly set the range of scrollbars and so it can properly retrieve each item of information through callbacks. The total item count is set by sending the virtual list an LVM_SETITEMCOUNT message. My CVListCtrl class (see vlist.cpp, available electronically) contains the SetItemCount() method for accomplishing this task. SetItemCount takes one parameter that represents the total number of entries in the virtual list. If you take a look at my sample code (see maindlg.cpp), you'll see that I used a database call to set my virtual list's total count. Since my sample program uses a DAO database, I obtained the total entry count by calling CDaoRecordset::GetRecordCount(). This call returns the total number of records in the specified dataset.
Once you've set the virtual list's entry count, it's time to begin the loading process. Again, one major advantage of a virtual list is that it loads only the handful of visible items. For example, my sample program uses a database that contains 10,000 records, but the sample's list control contains only nine visible items. The additional entries are loaded by the virtual list control as needed. These additional entries are requested in the form of an LVN_GETDISPINFO message, which is sent to the owner of the listview control. The LVN_GETDISPINFO message is a standard WM_NOTIFY style message, whose first parameter is a pointer to an LV_DISPINFO structure. LV_DISPINFO contains several items of information that specify the type of information requested. It also contains several data fields, which receive the returned information. The mask field contains flags that indicate the type of information to return. For example, a mask value LVIF_TEXT instructs the handler to return the textual portion of a list entry.
My virtual list implementation uses the CVListData data provider class (see vlist.cpp) for handling the LVN_DISPINFO message. You'll need to override the data provider class's OnGetDispInfo method to provide your own data handling capabilities. In addition to intercepting LVN_DISPINFO, OnGetDispInfo also breaks down each of the LV_DISPINFO structure entries into a series of parameters. Before using the data provider class, you'll need to associate it with a virtual listview. You can do this by calling CVListCtrl::SetDataProvider() and supplying a pointer to your data-provider class. You can see an example of setting up a data provider by looking at the CVLDlg::OnInitDialog() (see maindlg.cpp) in my sample program. My sample implements the data provider as a mix-in class to the main dialog class (see maindlg.h). If you look closely at the sample's main dialog class, you'll see that it's derived from both CDialog and CVListData. Because this is the case, my sample program sets the data provider simply by supplying the this pointer as the parameter to SetDataProvider.
The main dialog class also contains an overridden OnGetDispInfo() method (see maindlg.cpp). This method is called each time the virtual list wishes to load an entry. OnGetDispInfo() provides several parameters that are used to load an item. The first parameter is the control ID of the list control. For example, this value might be something like IDC_LIST. It's basically the control ID that you set when you added the list control to your dialog. In cases where your dialog contains only one listview, this value may not be of much use. But, it's useful when your dialog contains two or more virtual lists since it lets you load data for multiple virtual lists without requiring separate OnGetDispInfo() handlers for each.
Paint by Numbers
The iItem parameter of OnGetDispInfo() contains information that's critical for properly loading virtual list entries. This parameter contains the item number of the requested list entry. In other words, iItem indicates which list item the virtual listview wishes to load. For example, in my sample program, iItem ranges from 0 to 9999, because the sample database contains 10,000 entries.
Using index numbers to request list entries works especially well when it's used in conjunction with a database. If you look at things from a database's perspective, a list index corresponds very well to a record number (or record position) in a database. In DAO, you can define a dataset by calling the CDaoRecordSet::Open() method and optionally providing a SQL-style record filter. Once this is done, you have a collection of records, where each record is assigned a record number ranging from 0 to the value returned by CDaoRecordset::GetRecordCount().
The last remaining requirement is to position the database's cursor to the record that corresponds to the specified list index. This is done using the CDaoRecordset::SetAbsolutePosition() call. SetAbsolutePosition() takes one parameter, the record number where you'd like to position the data cursor. For example, if you wish to retrieve the first record in the dataset, you'd call SetAbsolutePosition() with a value of 0.
Another caveat with virtual listviews is that they don't provide sorting capabilities like their nonvirtual counterparts. Fortunately, this isn't a major limitation if you're using a database in conjunction with a virtual list. That's because you can use the database's built-in filtering and sorting capabilities to control the order of entries. For example, my sample program sets the CDaoRecordset::m_strSort member to [Item Text] ASC so that my sample data is sorted in ascending order based on the Item Text field. Likewise, you can also use the m_strFilter member if you wish to filter the data included in the virtual list.
Cache Is Better
Although virtual lists provide performance gains when loading large amounts of data, these performance gains are negated somewhat by frequent calls to SetAbsolutePosition() and loading of records. That's because the SetAbsolutePosition() record requires a lot of overhead. Ideally, it's much more efficient to call SetAbsolutePosition only once, then load a block consisting of several records. Fortunately, virtual listviews offer some assistance with accomplishing this task by providing the LVN_OCACHEHINT message. LVN_ODCACHEHINT requests a block of entries that the virtual listview wishes to load. For example, it could request to load all of the visible items in the list at one time, rather than positioning and loading them one at a time.
My CVLData class provides a virtual method, OnCacheHint(), to handle the LVN_ODCACHEHINT message. OnCacheHint supplies the ID of the list and the block's starting and ending index numbers. The sample program's OnCacheHint() method (see maindlg.cpp) loads a block of database records by first calling SetAbsolutePosition() to position to the starting index. Next, it loads the current record. Then it positions to the next record using the CDaoRecordset::MoveNext() method. The MoveNext() method requires much less overhead for positioning data cursors than SetAbsolutePostion().
The sample program stores cached items in the m_arrCachedData array (see maindlg.h). It also uses two data members, m_nCacheStart and m_nCacheEnd, to keep track of the current starting and ending indices for cached items. Each time OnGetDispInfo() is called to load an entry, it first checks to see if the requested list item is already available in the cache. If it is available, OnGetDispInfo() returns the information from the cache. If the requested item isn't in the cache, OnGetDispInfo() loads it directly from the database.
Searching for entries is another important capability of listviews. Virtual listviews provide this capability through the LVN_FINDITEM message. The lParam of this message contains a partial string entered by users. The return value is the index of the matching item. The CVLData data provider contains the OnFindItem() virtual method for handling the LVN_FINDITEM message. The first parameter is the ID of the virtual listview. The second parameter is the string that the user entered.
It's important to keep in mind that the string in the match request may be a partial string. In other words, users may have entered the letters "Sm" expecting to position to an entry entitled "Smith." You can see this in action in the included sample program by entering a partial text string in the Search field at the bottom of the screen. If you enter the letter "Z," then press the Search button, you're positioned to the first entry in the list that begins with the letter Z.
The sample program's FindPartial() method (see maindlg.cpp) is responsible for locating partial matches. It does so by using the SQL LIKE operator. The LIKE operator instructs the database to return partial matches, beginning with the specified characters. For example, the statement: SELECT * FROM [Data] WHERE [Item Text] LIKE 'Sm*' instructs the database to return the first record that begins with the string Sm.
If a matching record is found, FindPartialMatch() obtains the record's index value by calling CDaoRecordset::GetAbsolutePosition(). The value returned by GetAbsolutePosition() corresponds to the requested entry's index value. FindPartialMatch()returns this index value that is eventually returned in response to the LVN_FINDITEM message. The virtual list then loads the specified item's information by sending an LVN_GETDISPINFO request for the returned index.
After adding advanced features to my virtual list, such as caching and searching capabilities, I felt that it was pretty much "feature complete." One last minor issue was to select an initial entry when the list is first displayed. With normal listviews, this generally isn't a big deal. It's simply a matter of setting a particular item's state to selected. Unfortunately, that didn't turn out to be so simple with a virtual listview. I used the normal listview approach of setting an item's selected state, but much to my dismay the selection never appeared. After much struggle, I eventually came to the conclusion that programmatic selection must not work properly for virtual listviews. But, it was very interesting to note that although selecting an entry through code didn't work properly, selecting it with a mouse always yielded the desired results. Having spent many years developing Windows applications, I immediately thought of a less-than-glamorous workaround.
I theorized that if I could mimic the behavior of a user's mouse click, I might be able to trick the virtual list control into a programmatic selection. I did this by calling the CListCtrl::EnsureVisble() method to make sure the item to select is in the visible area of the list. Next, I called the ClistCtrl::GetItemRect() method to get the rectangle of the item I wanted to select. Then I called CListCtrl::SetItemState() to set the item's state to selected. Finally, I posted a "fake" mouse message to the list control by sending WM_LBUTTONDOWN and WM_LBUTTONUP messages to the virtual list, while supplying the item's rectangle as the location of the mouse clicks. Much to my surprise, this simple solution worked extremely well for programmatically selecting list items. To simplify using this approach, I incorporated the code into the SelectItem() method of my CVListCtrl class (see vlist.cpp).
I've included a sample program (available electronically) to illustrate many of the concepts involved with creating and managing virtual listviews. The sample program uses DAO in conjunction with a Microsoft Access database. You'll need to have DAO 3.5 or later installed to run the sample program. The database included with the sample is initially empty. The first time you run the program it populates the sample database with 10,000 randomly generated entries.
Once the database is created, you can navigate through the list and search for entries as you'd do with any normal listview. In fact, it might be a good experiment to test the virtual listview's loading speed by scrolling and paging rapidly through the list's entries. I think you'll find that the virtual list does a good job of keeping up with such operations. In fact, you might be very surprised to find that virtual lists are pretty hard to distinguish from those of the nonvirtual variety.