Mapping recls to COM Collections


May 04: Delving into the Component Object Model

In previous columns, I introduced you to recls, a platform-independent library that provides recursive filesystem searching, and demonstrated techniques for integrating C/C++ libraries with C++ ("normal" classes, and STL sequences), C#, D, and Java by implementing recls mappings for those languages. The source for all the versions of the libraries and the mappings is available from http://www.cuj.com/code/ and http://recls.org/downloads.html. This month, I focus on mapping the library to COM.

There are four changes to recls for this round:

  • First, it now fully supports Unicode as well as ANSI, in the form of a compile-time preprocessor selection. If the preprocessor symbol UNICODE is defined and the symbol RECLS_NO_UNICODE is not defined, then the recls library is built using Unicode APIs, and the character type recls_char_t is defined to be wchar_t rather than char. At the moment, this is only supported for Win32; Unicode on UNIX is not supported.
  • Second, passing a null pointer for the pattern causes it to default to the given platform's search-all string; for instance, "*" on UNIX and "*.*" on Win32. It's a minor change, but helps simplify client code, which no longer needs to care about how to express this on a platform-specific basis.
  • Third, Comeau 4.3.3 and Intel 8.0 are now supported, and the next version will probably also support Open Watcom.
  • Finally, the recls C API calling convention is now cdecl on Windows, rather than stdcall. The C# and D mappings have been adjusted accordingly.
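
To illustrate the second change, here is a minimal sketch of how a null pattern might fall back to a platform default. The helper effective_pattern() is hypothetical and is not part of the recls API; it only models the behavior described above, using char strings for simplicity.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of the recls API: models how a null
// pattern falls back to the platform's search-all string.
inline std::string effective_pattern(const char* pattern)
{
#if defined(_WIN32)
    const char* const search_all = "*.*";
#else
    const char* const search_all = "*";
#endif
    return (pattern != nullptr) ? pattern : search_all;
}

// Usage: an explicit pattern is kept; a null pattern becomes the default.
inline void effective_pattern_example()
{
    assert(effective_pattern("*.cpp") == "*.cpp");
    assert(!effective_pattern(nullptr).empty());
}
```

Client code written against such a default can pass a null pointer on every platform and leave the choice of wildcard to the library.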

This month's developments correspond to Version 1.3(.1) or later of recls, which requires STLSoft 1.7.1 or later (http://stlsoft.org/). (It actually relies on v1.7.1 Beta 4.)

COM Enumerators and Collections

In the STL world, the Container concept describes the things that hold collections of a given type, and the Iterator concept describes the mechanisms by which we access the elements within a collection. In the COM world, the containers are known as COM Collections, or just Collections, and the iterators are known as COM Enumerators, or just Enumerators.

A COM Enumerator implements a COM enumeration interface, which provides four methods:

interface IEnumTHING
{
  HRESULT Next( unsigned long n
     , THING         *pElements
     , unsigned long *pFetched);
  HRESULT Skip(unsigned long n);
  HRESULT Reset();
  HRESULT Clone(IEnumTHING **ppEnThing);
};

For our fictional THING type, IEnumTHING provides access to a number of THING instances via the Next() method. You specify the number of elements you require, n, along with a pointer to an array of elements, pElements, of at least length n, that receives the elements. You also specify a pointer to an integer, pFetched, which receives the number of elements retrieved. This allows the enumerator to return fewer elements than were requested.

The Skip() method moves the enumeration point on by the given number of elements. Reset() sets the enumeration point back to the start. Clone() provides a new enumerator to the same collection of elements, whose initial enumeration point is the same as the enumerator on which it is called.
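
The semantics of the four methods can be modeled in portable C++. The sketch below is not a real COM object: IntEnumerator, R_OK, and R_FALSE are stand-ins invented for illustration (the latter two play the roles of S_OK and S_FALSE), and the enumerated type is simply int.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-ins for S_OK and S_FALSE, so the sketch builds without <windows.h>.
enum Result { R_OK = 0, R_FALSE = 1 };

// Models the IEnumTHING contract over a vector of ints; not a COM object.
class IntEnumerator
{
public:
    explicit IntEnumerator(std::vector<int> items)
        : m_items(std::move(items)), m_pos(0) {}

    // Copies up to n elements into pElements; *pFetched receives the count.
    Result Next(unsigned long n, int* pElements, unsigned long* pFetched)
    {
        const std::size_t count = std::min<std::size_t>(n, m_items.size() - m_pos);
        std::copy_n(m_items.begin() + static_cast<std::ptrdiff_t>(m_pos), count, pElements);
        m_pos += count;
        if (pFetched != nullptr) *pFetched = static_cast<unsigned long>(count);
        return (count == n) ? R_OK : R_FALSE;   // R_FALSE: fewer than requested
    }

    // Advances the enumeration point by up to n elements.
    Result Skip(unsigned long n)
    {
        const std::size_t count = std::min<std::size_t>(n, m_items.size() - m_pos);
        m_pos += count;
        return (count == n) ? R_OK : R_FALSE;
    }

    // Moves the enumeration point back to the start.
    Result Reset() { m_pos = 0; return R_OK; }

    // The copy starts at the same enumeration point as the original.
    IntEnumerator Clone() const { return *this; }

private:
    std::vector<int> m_items;
    std::size_t      m_pos;
};

// Usage: ask for two elements at a time from a three-element sequence.
inline void int_enumerator_example()
{
    IntEnumerator e(std::vector<int>{1, 2, 3});
    int buf[2] = {0, 0};
    unsigned long fetched = 0;
    assert(e.Next(2, buf, &fetched) == R_OK && fetched == 2);     // gets 1, 2
    assert(e.Next(2, buf, &fetched) == R_FALSE && fetched == 1);  // only 3 left
}
```

The second call to Next() shows why pFetched exists: the caller asked for two elements but the result code and the count together report that only one was available.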

The COM Collection model requires an object to support some or all of the following methods/properties: Count, Item, _NewEnum, Add, Remove, and Clear. The first three are used to elicit information from the collection; the latter three effect changes on the collection. Since the recls mappings provide read-only information about the filesystem contents, we are not concerned with Add, Remove, and Clear.

The Count property returns the number of elements in the collection. The Item property is an indexed property by which one can retrieve an element by index and/or by name. The _NewEnum property provides access to the COM Enumerator for the collection; it's like begin() and end() rolled into one.

The Implementation

I used ATL for the implementation of the COM objects. Although it has some flaws, ATL is an easy framework with which to create most kinds of COM implementations, and is suitable for in-process servers (implemented in a dynamic library), which is what we're using here.

The COM objects in the recls COM mapping are lightweight wrappers over the recls API and its internal implementation, which are pretty lightweight themselves.

I won't go into detail about COM's threading models here; that's way too much information to fit into this column. If you want to learn about that, check the MSDN documentation, or ATL Internals, by Brent Rector and Chris Sells (Addison-Wesley, 1999) or Essential COM, by Don Box (Addison-Wesley, 1997). Thankfully, the recls library makes our choices very simple.

The FileSearch class has no state, and its Search() method effectively acts as a static method. The FileEntry class maintains a single recls entry handle of type recls_info_t and has no other member variables. recls does not change the data associated with an entry handle once it is created. Furthermore, retrieving information from entries, and even closing entry handles, is guaranteed threadsafe by recls. Therefore, there are no problems with different threads making simultaneous calls on the same FileEntry instance, and the mechanics of COM reference counting handle the final release of the entry handle back to recls. There's no need to manage multithreaded access to member state, so both these COM classes follow the "Both" model (which means that they support both free threading and apartment models), and they use the CComMultiThreadModelNoCS ATL threading policy class in their definitions. Essentially, this ensures that the reference counting is done in a threadsafe manner (via InterlockedIncrement()/InterlockedDecrement()), but the access to member data does not use a synchronization object (a Win32 CRITICAL_SECTION).
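
The idea behind this policy (atomic reference counting, with no lock around data that never changes) can be sketched in portable C++. The class below is a hypothetical stand-in, not ATL code: std::atomic takes the place of InterlockedIncrement()/InterlockedDecrement(), and an int takes the place of the recls entry handle.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical model of the "NoCS" approach: the reference count is atomic,
// but the immutable payload is read without any lock.
class RefCountedEntry
{
public:
    explicit RefCountedEntry(int handle) : m_refs(1), m_handle(handle) {}

    unsigned long AddRef() { return ++m_refs; }

    unsigned long Release()
    {
        const unsigned long remaining = --m_refs;
        if (remaining == 0) delete this;   // last owner releases the handle
        return remaining;
    }

    // Read-only after construction, so concurrent readers need no lock.
    int handle() const { return m_handle; }

private:
    ~RefCountedEntry() = default;          // destroyed only via Release()

    std::atomic<unsigned long> m_refs;
    const int                  m_handle;
};

// Usage: the object lives until the final Release().
inline void ref_counted_entry_example()
{
    RefCountedEntry* e = new RefCountedEntry(42);
    assert(e->AddRef() == 2);
    assert(e->Release() == 1);
    assert(e->handle() == 42);
    assert(e->Release() == 0);             // object is deleted here
}
```

The design only works because the payload is immutable; a class with mutable member state would need a lock as well as an atomic count.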

FileSearch

The FileSearch class is the entry point into the library, and provides the single method Search(), defined by the IFileSearch interface:

interface IFileSearch
  : IDispatch
{
   HRESULT Search( [in] BSTR searchRoot
     , [in] BSTR pattern
     , [in] long flags
     , [out, retval] IUnknown **_srch);
};

In other words, the caller specifies a search directory, a search pattern, and search flags, and receives an object that corresponds to those parameters. The object is returned via the generic IUnknown interface. The object returned, an instance of SearchCollection, must support the COM Collection model.

Listing 1 is the implementation of FileSearch::Search(). The important feature of this method is that it starts a search before creating a SearchCollection instance. This is because a search can fail (for reasons other than that no matching entries were found). In such a case, you want the Search() method to fail so that you can deal with it there, rather than have some weird error reported as you start to enumerate through the collection at some later point.

The other main point to note is that the code is written to work correctly whether you are compiling for Unicode (RECLS_CHAR_TYPE_IS_WCHAR is defined) or ANSI (RECLS_CHAR_TYPE_IS_CHAR is defined). This facilitates our working with the ATL Wizard-provided project options of Unicode and ANSI releases. Also note that the ANSI version uses the WinSTL (the STLSoft subproject for the Win32 API; http://winstl.org/) w2a() Unicode to ANSI function, which acts as a conversion shim (see my article "Generalized String Manipulation: Access Shims and Type Tunneling," CUJ, August 2003; http://www.cuj.com/documents/s=8681/cuj0308wilson/). The normal way to do such conversion in ATL is to use the W2CA(), A2T(), and associated macros, but I avoid them because they:

  • Do not work with null pointers, which rules them out for this particular case.
  • Rely on alloca(), which I personally prefer to avoid (for the reasons outlined in my article "Efficient Variable Automatic Buffers," CUJ, December 2003).
  • Require that the string to be converted is null terminated.

The WinSTL components suffer from none of these problems, though they do have the conversion shim return-value lifetime restrictions.

Assuming that the call to Recls_Search() results in a valid search, or fails only for having no matching items, then an instance of SearchCollection is created and passed a SearchInfo instance containing the attributes of the current search.
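
The control flow described here (start the search first, and create the collection only if that succeeds or merely finds nothing) might be sketched as follows. All names are hypothetical stand-ins: start_search() imitates the outcome of starting a recls search, and Collection imitates SearchCollection holding its search details. The real code is in Listing 1.

```cpp
#include <cassert>
#include <memory>
#include <string>

enum class StartResult { Ok, NoMatches, Failed };

// Hypothetical stand-in for starting a recls search. An empty root models a
// genuine failure; "/empty" models a valid search with no matching entries.
inline StartResult start_search(const std::string& root)
{
    if (root.empty())     return StartResult::Failed;
    if (root == "/empty") return StartResult::NoMatches;
    return StartResult::Ok;
}

// Stand-in for the collection and the search details it records.
struct Collection
{
    std::string root;
    bool        empty;
};

// Returns null when the search cannot be started, so that the failure is
// reported here rather than later, during enumeration.
inline std::unique_ptr<Collection> search(const std::string& root)
{
    const StartResult r = start_search(root);
    if (r == StartResult::Failed) return nullptr;
    return std::unique_ptr<Collection>(
        new Collection{root, r == StartResult::NoMatches});
}

// Usage: only a genuine failure prevents the collection from being created.
inline void eager_search_example()
{
    assert(search("") == nullptr);
    assert(search("/empty")->empty);
    assert(!search("/home")->empty);
}
```

An empty result set still yields a collection; enumerating it simply produces no elements.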

SearchCollection

Since a recls search has no means to know ahead of time the number of elements that will be found, the Count method is not provided on the collection. Similarly, we cannot index the items because they are enumerated sequentially, nor can we access them by name, so the Item property is not supported either. We're just left with _NewEnum. The SearchCollection class embodies the COM Collection for the recls mapping, and derives from the ISearchCollection interface:

interface ISearchCollection
  : IDispatch
{
   [propget, id(DISPID_NEWENUM), restricted, hidden] 
   HRESULT _NewEnum([out,retval] IUnknown** pVal);
};

Its definition is straightforward, since it merely implements the get__NewEnum() method and stores an instance of SearchInfo to record details of the search. The SearchInfo class is defined as:

struct SearchInfo
{
  hrecls_t  hSrch;
  CComBSTR  searchRoot;
  CComBSTR  pattern;
  long      flags;
  bool      bEmpty;
  SearchInfo()
    : hSrch(NULL)
    , bEmpty(false)
  {}
};

The reason you need to record the search parameters is that there are several points at which a search may need to be (re)initiated. One of these is within the SearchCollection class itself. When client code calls get__NewEnum(), the current search (perhaps the one started in the FileSearch::Search() method) is given over to the object that represents the COM Enumerator for the collection, an instance of the EnumEntry class (which we'll meet shortly). An example of this is found in one of the accompanying test programs, the imaginatively titled VBClient, which provides an almost identical version of the C# client we saw in the second installment. Listing 2, from the search button handler, demonstrates the use of the collection. The For Each statement in Visual Basic causes the given object to be treated as a collection and queried for its enumerator via the _NewEnum property. In fact, it uses the stock DISPID_NEWENUM dispatch ID (DISPID) to identify the property, so you could give it any name. (You use the standard name so that C++ clients can use it directly, rather than having to use IDispatch::Invoke().)

Since there's nothing to stop any client code doing multiple enumerations through the collection's contents, we need to be able to support additional enumerations. Because the recls API has a single pass semantic, you retain the details and then start a new search if you are requested to do so. You might suggest that this is all needless complexity and that searches should only be initiated at the last minute; there's some merit in that position, but I chose this way since it provides more obvious behavior to the client code.

Listing 3 presents the implementation of SearchCollection::get__NewEnum(). Note that, as is the way with ATL, the COM objects must be initialized after construction. You just have to accept this; trying to buck the system will only bring you grief.

EnumEntry

The EnumEntry class embodies the COM Enumerator for the recls mapping. Because all this stuff's not complicated enough, I've added an extra twist. For Visual Basic (and other automation languages) clients, the collection must be in the form of an IEnumVARIANT; in other words, the items enumerated by the collection are VARIANTs. However, since the items are always instances implementing the IFileEntry interface (see Listing 4), I've implemented the EnumEntry to provide both IEnumVARIANT and IEnumFileEntry interfaces. This means that C++ clients don't have to go through the tedious extra step of eliciting the interface from the VARIANT (which, to be correct, involves checking that the variant type is VT_UNKNOWN or VT_DISPATCH, then querying for IFileEntry).

Listing 5 is the definition of EnumEntry. The rules of COM dictate that once an object has responded positively to a request for an interface, it must always appear to support that interface. The way EnumEntry works, therefore, is that it maintains two members to monitor which interface, if any, it has provided. Once either IEnumFileEntry or IEnumVARIANT has been returned, the other interface cannot be made available.

This is accomplished in the QueryFunc() static method, which checks the status of the instance with respect to the two interfaces and accepts or rejects the interface request accordingly. (Some of the code is rather grisly, but it is correct; if you want to look at it, check out the implementation available in this month's code archive, or online at http://recls.org/downloads.html.) The lesser known INTERFACE_MAP entries catch the requests for IEnumVARIANT/IEnumFileEntry, and also ensure that queries for IUnknown always succeed, as they must by the rules of COM.
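
Stripped of the ATL machinery, the accept-or-reject logic amounts to a small state machine. The sketch below is a simplified model and not the actual QueryFunc() code: it uses a single member rather than two, and the Iface enumeration and ExclusiveInterfaces class are invented for illustration.

```cpp
#include <cassert>

enum class Iface { Unknown, EnumVariant, EnumFileEntry };

// Simplified model: the first enumeration interface granted wins, and the
// other is refused from then on. Requests for Unknown always succeed.
class ExclusiveInterfaces
{
public:
    // Returns true if the request is granted.
    bool query(Iface requested)
    {
        if (requested == Iface::Unknown) return true;
        if (m_granted == Iface::Unknown) m_granted = requested;
        return m_granted == requested;
    }

private:
    Iface m_granted = Iface::Unknown;   // Unknown means "none granted yet"
};

// Usage: once EnumVariant is granted, EnumFileEntry is refused.
inline void exclusive_interfaces_example()
{
    ExclusiveInterfaces q;
    assert(q.query(Iface::Unknown));
    assert(q.query(Iface::EnumVariant));
    assert(!q.query(Iface::EnumFileEntry));
    assert(q.query(Iface::EnumVariant));   // a granted interface stays granted
}
```

Note that the model never withdraws an interface it has already granted, which is the rule of COM that the two-interface design has to respect.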

The other methods in the class are there to define and implement the enumeration interfaces. Since the two interfaces have identical signatures for the Skip() and Reset() methods, they share a single implementation in the class. This is fine in this particular case because the class will only ever support one enumeration interface and, therefore, only one enumerated type: either VARIANT or IFileEntry.

The Next() methods are logically identical, in that they enumerate the entries in the search represented by the hSrch field of the m_info member. The only difference is that one converts to IFileEntry, in the form of a FileEntry instance, whereas the other takes the further step of wrapping that up in a VARIANT.

Clone() is not supported for either interface and returns E_NOTIMPL. Since recls is single pass, there's no guaranteed way to Clone() an enumerator; the rationale for this was discussed in the first installment of the series.

Skip() is implemented by simply calling Recls_GetNext() the requisite number of times. The remaining method, Reset(), is the most interesting of the four (or six). Because the recls API provides single pass enumeration, resetting is implemented as closing the current search and starting a new one. This is the other reason (along with multiple enumerations of the SearchCollection) that we need to record the search criteria. After a call to Reset(), the appropriate Next() method can be called to retrieve the entries of the newly initiated search.
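
The restart-on-Reset() technique can be shown with a portable model. Both classes below are hypothetical: SinglePassSearch stands in for a recls search handle (it can only move forward), and RestartableEnumerator keeps the search criteria so that it can begin a fresh search on demand.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical single-pass source standing in for a recls search handle.
class SinglePassSearch
{
public:
    explicit SinglePassSearch(const std::string& pattern)
        : m_results{pattern + "-1", pattern + "-2", pattern + "-3"}
        , m_next(0) {}

    // Retrieves the next result; returns false when the search is exhausted.
    bool get_next(std::string& out)
    {
        if (m_next == m_results.size()) return false;
        out = m_results[m_next++];
        return true;
    }

private:
    std::vector<std::string> m_results;
    std::size_t              m_next;
};

// Keeps the criteria so that Reset() can discard the search and start anew.
class RestartableEnumerator
{
public:
    explicit RestartableEnumerator(std::string pattern)
        : m_pattern(std::move(pattern))
        , m_search(new SinglePassSearch(m_pattern)) {}

    bool Next(std::string& out) { return m_search->get_next(out); }

    // Skips by retrieving and discarding n results.
    bool Skip(unsigned long n)
    {
        std::string ignored;
        for (unsigned long i = 0; i != n; ++i)
        {
            if (!m_search->get_next(ignored)) return false;
        }
        return true;
    }

    // Closes the current search and begins a new one with the same criteria.
    void Reset() { m_search.reset(new SinglePassSearch(m_pattern)); }

private:
    std::string                       m_pattern;  // declared before m_search
    std::unique_ptr<SinglePassSearch> m_search;
};

// Usage: after Reset(), enumeration begins again from the first result.
inline void restartable_enumerator_example()
{
    RestartableEnumerator e("f");
    std::string s;
    assert(e.Skip(2));
    assert(e.Next(s) && s == "f-3");
    assert(!e.Next(s));
    e.Reset();
    assert(e.Next(s) && s == "f-1");
}
```

One consequence of this design, which the model shares, is that a restarted search is a genuinely new search and could in principle return different results if the underlying data has changed.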

FileEntry

The search entries are represented by the FileEntry class, whose class definition is mostly simple. It implements the IFileEntry interface in Listing 4, and maintains a single recls_info_t member, which it passes to the recls API to retrieve the entry's characteristics.

The only notable aspects of this are the helper methods GetStringProperty_() and GetTimeProperty_().

HRESULT GetStringProperty_(
    size_t (*)(recls_info_t, recls_char_t *, size_t)
   , BSTR *pVal);
HRESULT GetTimeProperty_( recls_time_t recls_fileinfo_t::*pm
                        , DATE *pVal);

They encapsulate all the boilerplate COM for string and time property retrieval and use pointers to recls API functions or structure members. Hence, all such property methods have extremely simple implementations, as in:

STDMETHODIMP FileEntry::get_Path(BSTR *pVal)
{
  return GetStringProperty_(Recls_GetPathProperty, pVal);
}
STDMETHODIMP FileEntry::get_CreationTime(DATE *pVal)
{
  return GetTimeProperty_(&recls_fileinfo_t::creationTime, pVal);
}
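
The shape of these two helpers can be illustrated without COM or recls. In the sketch below every name is a stand-in: Entry replaces the recls entry structure, std::string replaces BSTR, long replaces both recls_time_t and DATE, and a bool replaces HRESULT. It also assumes, for the purposes of the model only, that a getter called with a null buffer returns the required length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Stand-in for the entry structure.
struct Entry
{
    const char* path;
    long        creationTime;
    long        modificationTime;
};

typedef std::size_t (*string_getter_t)(const Entry*, char*, std::size_t);

// Model getter. Assumed convention: a null buffer returns the required
// length; otherwise up to cch characters are copied and the count returned.
inline std::size_t get_path(const Entry* e, char* buffer, std::size_t cch)
{
    const std::size_t len = std::strlen(e->path);
    if (buffer == nullptr) return len;
    const std::size_t n = (len < cch) ? len : cch;
    std::memcpy(buffer, e->path, n);
    return n;
}

// One helper holds the boilerplate for every string property.
inline bool get_string_property(const Entry* e, string_getter_t getter, std::string* pVal)
{
    if (pVal == nullptr) return false;             // plays the role of E_POINTER
    const std::size_t len = getter(e, nullptr, 0); // first call: query the length
    std::vector<char> buffer(len + 1);             // never empty, so data() is valid
    const std::size_t copied = getter(e, buffer.data(), len);
    pVal->assign(buffer.data(), copied);
    return true;
}

// One helper for every time property, selected by pointer-to-member.
inline bool get_time_property(const Entry* e, long Entry::*pm, long* pVal)
{
    if (pVal == nullptr) return false;
    *pVal = e->*pm;
    return true;
}

// Usage: each property accessor reduces to a one-line call.
inline void property_helpers_example()
{
    const Entry e = {"/tmp/a.txt", 100, 200};
    std::string path;
    long when = 0;
    assert(get_string_property(&e, get_path, &path) && path == "/tmp/a.txt");
    assert(get_time_property(&e, &Entry::creationTime, &when) && when == 100);
}
```

Passing a function pointer or a pointer-to-member is what lets a single helper serve every property of the same kind.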

DirectoryPartsCollection

The only complexity in the FileEntry class involves the get_DirectoryParts() method (Listing 6), which creates and returns a COM Collection in the form of the DirectoryPartsCollection class (Listing 7). This derives from a parameterization of the ATL ICollectionOnSTLImpl template, which implements the semantics of a COM Collection using an STL container. A full discussion of this template is outside the scope of this article, but there are good sources on the matter, including some of the COM references I gave earlier.

The important point to note is that the STL container used by ICollectionOnSTLImpl is a parameterization of STLSoft's proxy_sequence template, which I mentioned in an earlier installment when I mapped recls to STL sequences. Here, it is used in combination with the dirparts_proxy_traits traits type to create a proxy sequence over the directory parts pointers for a search entry whose value type is CComVariant. In this way, you avoid any unnecessary copying of data from the entry inside the recls API until it is actually needed during an enumeration of the DirectoryPartsCollection by client code. This efficiency is bought for the almost negligible cost of ensuring that the FileEntry instance is kept "alive" for the duration of the DirectoryPartsCollection instance; this is simply and cheaply accomplished by taking a pointer to the owner in the Init() method (Listing 7).

The implementation of dirparts_proxy_traits::make_value() revealed an interesting quirk (bug) within ATL. The original implementation was:

CComVariant make_value(const recls_strptrs_t &ptr)
{
  return CComVariant( CComBSTR( ptr.end - ptr.begin
                              , ptr.begin));
}

Unfortunately, there is a bug in CComBSTR, or rather, in the helper function that it calls — A2WBSTR() — which assumes that the string is null terminated even when we're passing in a length. The actual implementation (Listing 8) instantiates the VARIANT directly, using the SysAllocStringLen() API function.
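
The underlying hazard is easy to reproduce in portable C++. A directory part is a slice of a longer path, delimited by two pointers, so any code that assumes null termination reads past the end of the slice. In the sketch below, StrPtrs is a stand-in for recls_strptrs_t and std::string is a stand-in for the BSTR; it does not use ATL or SysAllocStringLen().

```cpp
#include <cassert>
#include <string>

// Stand-in for recls_strptrs_t: a slice delimited by two pointers.
struct StrPtrs
{
    const char* begin;
    const char* end;
};

// Correct: builds the value from the explicit range [begin, end).
inline std::string make_value(const StrPtrs& p)
{
    return std::string(p.begin, p.end);
}

// Incorrect, shown for contrast: assumes the slice is null terminated, so it
// runs on to the end of the enclosing string.
inline std::string make_value_assuming_terminator(const StrPtrs& p)
{
    return std::string(p.begin);
}

// Usage: a six-character slice out of the middle of a path.
inline void make_value_example()
{
    static const char path[] = "/usr/local/bin/";
    const StrPtrs part = {path + 5, path + 11};
    assert(make_value(part) == "local/");
    assert(make_value_assuming_terminator(part) == "local/bin/");
}
```

The second function returns more than the slice because the only terminator it can find belongs to the whole path.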

Unicode versus ANSI

As I mentioned in the previous installment, I don't hold with the (Unicode) Release MinDependency versus (Unicode) Release MinSize complexity, so I tend to edit projects to produce just two flavors of release: ANSI and Unicode. In general, I like to go even further and ensure that there is a single version that works optimally on both Windows 9x and NT systems. Unfortunately, time was not my friend this month, so this has not been done. Hence, to use the COM mapping in its current guise, you should select between the ANSI and Unicode forms, and build and install the one suitable to your needs: If you only wish to use the recls COM mapping on NT family systems, you should use the Unicode version; otherwise, choose the ANSI version.

Later on, I will look at a way to obviate this hassle and just build a single library that will work optimally on all Win32 platforms.

Next Steps

I've included with this month's archive two test programs for the COM mapping. There's a Visual Basic 6 GUI client, similar to the C# GUI client from earlier in the series, and a C++ command-line client with similar behavior to the other command-line clients we've seen (for C, C++, D, Java, and STL test programs).

I haven't yet decided what language or technology to map recls to for next time, so it'll be a surprise to all of us — probably either Perl or Python. There are a few other changes I'd like to make as well, such as providing a single binary version to work optimally on both Windows 9x/NT, and the API function name rationalization that I've been promising since the first column. Time will tell.

Feel free to write to me (or post a FAQ at http://recls.org/faq.html) and suggest other languages/technologies for which you'd like to see a recls mapping. recls has already been adopted into the D Standard Library (as the std.recls module), and I'm open to any similar possibilities with other languages.


Matthew Wilson is a software development consultant for Synesis Software, creator of the STLSoft libraries, and author of the forthcoming Imperfect C++ (Addison-Wesley, 2004). He can be contacted at http://stlsoft.org/.


