Introducing recls Mappings

Have you ever wished that STL member function adaptors like mem_fun or mem_fun_ref would let you invoke a method of a class of your choice?


January 01, 2004
URL:http://www.drdobbs.com/introducing-recls-mappings/184401749

In the November column, I introduced you to recls, a platform-independent library that provides recursive filesystem searching. Hopefully, you've downloaded and used Version 1.0 (http://www.cuj.com/code/ or http://recls.org/). In addition to providing a single API for use from C/C++, the library also provides mappings to different languages and technologies. These mappings form the main focus of this column for the next few installments.

This month, I introduce the first three mappings. The C++ wrapper classes mapping was included in the library's 1.0 release. The other two mappings are C++ STL and C#. Together, these make up Version 1.1 of the library.

Building recls

My intention is for recls to be as standalone as possible (except for system and mapping-related libraries), with the proviso that it relies on the STLSoft libraries. Specifically, the code requires Version 1.6.5 (or later) of the STLSoft libraries (http://stlsoft.org/downloads.html and http://recls.org/). The makefiles also require that you define an environment variable STLSOFT_INCLUDE to refer to the absolute path of the directory in which the STLSoft library files reside. As the author of both libraries, I assure you that they work together. And since the raison d'etre of the recls project (and this column) is to learn about language integration, it seems a reasonable shortcut to what would be a daunting coding task. The STLSoft code is almost entirely used within the recls implementation, and does not form part of the public interface. Other than that, it only appears in the C++ STL mapping.

There are new STLSoft components that have not made it into a distribution but are used within the implementation of recls 1.1. (At this writing, the planned release of STLSoft 1.7.1 is months away, lurking behind too many commitments.) These files are, therefore, included in the recls 1.1 archive and in a patch available from the STLSoft and recls web sites. Once installed, you should have no problems. Let me know if you find any and I'll post updates to the recls web site.

I'm providing makefiles for all of the supported compilers, and some IDDE project files (initially Visual Studio 98), but you have to handle your own static/dynamic-library builds and such. Naturally, I'll respond to bugs in the recls source and try to help where I can. However, the only IDDE I'm anything like expert in is Visual Studio 98, so you're largely on your own. But if you want to create project files and submit them, I'll happily post them on the web site for others.

recls Improvements

There are a few changes to the core library between Versions 1.0 and 1.1. The obvious difference is that there are two new functions:

I've also amended how return codes are defined. They were formerly defined as:

#define RECLS_RC_NO_MORE_DATA   ((recls_rc_t)(-1 - 1003))

but because the recls_rc_t type is defined within the recls namespace, C++ client code that might reference the return code but not have used the type (whether by nice using declarations, or naughty using directives) would not compile. I've fixed this by defining the return code via a macro, as in:

#if !defined(RECLS_NO_NAMESPACE)
# define RECLS_QUAL(x)          ::recls::x
#else
# define RECLS_QUAL(x)          x
#endif /* !RECLS_NO_NAMESPACE */
 ...
#define RECLS_RC_NO_MORE_DATA   ((RECLS_QUAL(recls_rc_t))(-1 - 1003))

This works fine with both C and C++ compilation. (Of course, I should probably make it an enum, but it's worth pointing out this technique, which is useful for functions and types shared between C and C++.)
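
The effect can be seen in isolation with a minimal C++ sketch. The one-line namespace and the int typedef below are stand-ins for the real recls headers (an assumption made purely for illustration), and is_end_of_search() is a hypothetical helper; only the shape of the two macros follows the text above.

```cpp
#include <cassert>

namespace recls
{
    typedef int recls_rc_t;   // assumption: stand-in for the real typedef
}

#if !defined(RECLS_NO_NAMESPACE)
# define RECLS_QUAL(x)          ::recls::x
#else
# define RECLS_QUAL(x)          x
#endif /* !RECLS_NO_NAMESPACE */

#define RECLS_RC_NO_MORE_DATA   ((RECLS_QUAL(recls_rc_t))(-1 - 1003))

// Client code with no using declaration or using directive in sight.
// The macro still compiles because it names ::recls::recls_rc_t itself.
inline bool is_end_of_search(int rc)
{
    return RECLS_RC_NO_MORE_DATA == rc;
}

inline void example_rc_check()
{
    assert(is_end_of_search(-1004));  // -1 - 1003
    assert(!is_end_of_search(0));
}
```

With the earlier, unqualified form of the macro, is_end_of_search() would fail to compile here, because recls_rc_t is not visible outside the recls namespace.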

Under the covers, the library has changed in a couple of ways. First, it now returns directories as search items as well as (or instead of) files. In other words, the Recls_Search() function pays attention to the RECLS_F_FILES and RECLS_F_DIRECTORIES flags, where it previously ignored the flags and always enumerated files. A corollary to this is the introduction of a return code RECLS_RC_INVALID_SEARCH_TYPE. This is returned (by the Win32 implementation) if the requested types are nonzero and do not include the files or directories flags. (If the flags are 0, files is always assumed.) Also defined are RECLS_F_LINKS and RECLS_F_DEVICES, which are currently ignored.
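
The rule for interpreting the type flags might be sketched as follows. This is not the recls source: the flag values, the return codes, and the function resolve_search_types() are all hypothetical, and the sketch reads "the flags are 0" as "no type flags are set", which is an assumption.

```cpp
#include <cassert>

// Hypothetical values: the real RECLS_F_* constants and return codes are
// defined in the recls headers and may differ from these.
enum
{
    F_FILES       = 0x01,
    F_DIRECTORIES = 0x02,
    F_LINKS       = 0x04,
    F_DEVICES     = 0x08,
    F_TYPEMASK    = F_FILES | F_DIRECTORIES | F_LINKS | F_DEVICES
};

enum { RC_OK = 0, RC_INVALID_SEARCH_TYPE = -1 };

// Works out which entry types a search should return.
// - No type flags at all: files are assumed.
// - Type flags given, but neither files nor directories: rejected.
inline int resolve_search_types(unsigned flags, unsigned *types)
{
    unsigned requested = flags & F_TYPEMASK;

    if(0 == requested)
    {
        requested = F_FILES;
    }
    else if(0 == (requested & (F_FILES | F_DIRECTORIES)))
    {
        return RC_INVALID_SEARCH_TYPE;
    }

    // Links and devices are accepted here but otherwise ignored.
    *types = requested & (F_FILES | F_DIRECTORIES);
    return RC_OK;
}

inline void example_search_types()
{
    unsigned types = 0;
    int      rc    = resolve_search_types(0, &types);
    assert(RC_OK == rc && F_FILES == types);

    rc = resolve_search_types(F_LINKS, &types);
    assert(RC_INVALID_SEARCH_TYPE == rc);
}
```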

The second behavioral change is that the RECLS_F_RECURSIVE and RECLS_F_DIRECTORY_PARTS flags are acted upon. Hence, if you don't specify the former, you'll only receive matching entries in the search directory. If you don't specify the latter, the directory parts will not be evaluated, resulting in slightly faster searching.

The last change is that Recls_SearchProcess() is now implemented. This function takes a pointer to a function (which takes a pointer to the entry info structure and a user-defined parameter) and conducts an entire (cancelable) search, applying the given function to each matching entry.

int RECLS_CALLCONV_DEFAULT 
EntryFunc(recls_info_t info,
         recls_process_fn_param_t /* param */)
{
  printf("%s\n", info->path.begin);
  return 1; /* Continue search */
}
Recls_SearchProcess("h:\\recls", "*.h", 
             RECLS_F_RECURSIVE, EntryFunc, 0);

This is useful when the operation you wish to apply to the items is simple because it reduces client code considerably, as shown in the new SearchProcess_test test program in the archive.
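
The callback protocol itself (a non-zero return continues the search, zero cancels it) can be sketched without the library. Everything in this sketch is hypothetical: an array of strings stands in for the search results, and process_all() only imitates the general shape of a function like Recls_SearchProcess(), not its implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins; these are not the recls types.
typedef char const *entry_t;
typedef void       *param_t;
typedef int (*process_fn_t)(entry_t entry, param_t param);

// Applies fn to each entry in turn, stopping early if fn returns 0.
// Returns the number of entries visited.
inline std::size_t process_all(entry_t const *entries, std::size_t n,
                               process_fn_t fn, param_t param)
{
    std::size_t visited = 0;
    for(std::size_t i = 0; i != n; ++i)
    {
        ++visited;
        if(0 == fn(entries[i], param))
        {
            break; // the callback cancelled the search
        }
    }
    return visited;
}

// Counts entries via the user-defined parameter and cancels at the second.
inline int stop_after_two(entry_t /* entry */, param_t param)
{
    int *count = static_cast<int*>(param);
    return ++*count < 2; // non-zero continues, zero cancels
}

inline void example_cancel()
{
    entry_t           entries[] = { "a.h", "b.h", "c.h" };
    int               count     = 0;
    std::size_t const visited   = process_all(entries, 3, stop_after_two, &count);

    // The third entry is never visited because the callback cancels.
    assert(2 == visited);
    assert(2 == count);
}
```

The user-defined parameter is what lets a plain function pointer carry state (here, a counter) without resorting to globals.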

Note that the API is still ANSI-only; there's no Unicode yet.

C++

The C++ class mapping is straightforward. There are three classes defined within the recls::cpp namespace (aliased to reclspp [1]). The FileSearch class (see the simplified public interface in Listing 1) manages a search handle hrecls_t and provides enumeration via the HasMoreElements(), GetNext(), and GetCurrentEntry() methods. It presents file entry information in the form of the FileEntry class (Listing 2), which wraps a recls_info_t instance. The string_t type is defined in the C++ mapping root header, reclspp.h, to be std::string, unless you stipulate otherwise via the preprocessor. Once the recls library supports Unicode, this will change to a traits-based approach.

I've kept these classes as simple as possible to represent an efficient and convenient alternative to the raw API for C++ client code. There are no exceptions and no runtime error checking (other than assertions via recls_assert.h). Users of the FileSearch class must call GetCurrentEntry() when they know that an entry is available due to calling HasMoreElements(). A search is initiated in the constructor and cannot be restarted. Hence, a FileSearch instance is just that — a (single) file search. The constructor does not throw an error; rather, you must use the usually crummy construct-and-test technique. In this case, it's reasonable because the HasMoreElements() method must be tested before retrieving elements.
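
The calling pattern this implies can be sketched as follows. MockSearch is a hypothetical stand-in that mimics the three enumeration methods over an in-memory list so that the loop can run on its own; it is not the reclspp FileSearch class, and its GetNext() returns nothing where the real one returns a recls_rc_t.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a search object; not reclspp::FileSearch.
class MockSearch
{
public:
    explicit MockSearch(std::vector<std::string> const &paths)
        : m_paths(paths), m_index(0)
    {}

    bool HasMoreElements() const { return m_index < m_paths.size(); }
    void GetNext()               { ++m_index; }

    // Precondition: HasMoreElements() has just returned true.
    std::string GetCurrentEntry() const
    {
        assert(HasMoreElements());
        return m_paths[m_index];
    }

private:
    std::vector<std::string> m_paths;
    std::size_t              m_index;
};

// Construct-and-test: the first HasMoreElements() call doubles as the
// check that the search started successfully.
inline std::size_t count_entries(MockSearch &search)
{
    std::size_t n = 0;
    for(; search.HasMoreElements(); search.GetNext())
    {
        std::string const path = search.GetCurrentEntry();
        if(!path.empty())
        {
            ++n;
        }
    }
    return n;
}
```

A search that failed to start and a search that matched nothing look the same to this loop: in both cases the body never executes.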

The only other notable part of this mapping is that the FileEntry::GetDirectoryParts() method returns an instance of the DirectoryParts class. This is a thin layer over the shared recls_info_t instance, copied (actually, just a reference-count increment) from the source FileEntry instance. The DirectoryParts class provides two methods to give access to the separate parts of the entry's directory:

size_t size() const;
string_t operator [](size_t index) const;

STL

The C++ STL mapping consists of several cooperating templates, defined within the recls::stl namespace (aliased to reclstl [1]). One declares instances of basic_search_sequence<char or wchar_t>, then accesses and manipulates the asymmetric range defined by [begin(), end()).

typedef reclstl::basic_search_sequence<char>
                                  sequence_t;
sequence_t search("/usr/include", "*h",
            RECLS_F_FILES | RECLS_F_RECURSIVE);
std::for_each(search.begin(), search.end(), . . . );

Rather than detail here how to implement STL sequences over other kinds of enumeration APIs, I opt for the technology evangelist and TV chef "here's one I prepared earlier" approach. Specifically, my article "Adapting Win32 Enumeration APIs to STL Iterator Concepts" (Windows Developer Network, March 2003; http://www.windevnet.com/documents/win0303a/) covers this subject in detail. It also explains the implementation of the WinSTL basic_findfile_sequence template, which is used in the Win32 implementation of recls.
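
For readers without that article to hand, the general shape can be sketched briefly. This is a simplified, hypothetical sequence over an in-memory vector, not the reclstl or WinSTL implementation: the point is only that an iterator which reaches the end of the underlying enumeration resets itself so that it compares equal to a default-constructed end() iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical sequence; a vector plays the part of the enumeration API.
class mock_sequence
{
public:
    class const_iterator
    {
    public:
        typedef std::input_iterator_tag iterator_category;
        typedef std::string             value_type;
        typedef std::ptrdiff_t          difference_type;
        typedef std::string const      *pointer;
        typedef std::string const      &reference;

        const_iterator() : m_src(0), m_pos(0) {}              // end()
        explicit const_iterator(std::vector<std::string> const *src)
            : m_src(src->empty() ? 0 : src), m_pos(0) {}      // begin()

        const_iterator &operator ++()
        {
            assert(0 != m_src);        // must not increment end()
            if(++m_pos == m_src->size())
            {
                m_src = 0;             // exhausted: become end()
                m_pos = 0;
            }
            return *this;
        }
        const_iterator operator ++(int)
        {
            const_iterator ret(*this);
            ++*this;
            return ret;
        }
        value_type operator *() const
        {
            assert(0 != m_src);        // must not dereference end()
            return (*m_src)[m_pos];
        }
        bool operator ==(const_iterator const &rhs) const
        {
            return m_src == rhs.m_src && m_pos == rhs.m_pos;
        }
        bool operator !=(const_iterator const &rhs) const
        {
            return !(*this == rhs);
        }
    private:
        std::vector<std::string> const *m_src;
        std::size_t                     m_pos;
    };

    explicit mock_sequence(std::vector<std::string> const &items)
        : m_items(items) {}
    const_iterator begin() const { return const_iterator(&m_items); }
    const_iterator end() const   { return const_iterator(); }
private:
    std::vector<std::string> m_items;
};

inline void example_sequence()
{
    std::vector<std::string> items;
    items.push_back("a.h");
    items.push_back("b.h");
    mock_sequence const seq(items);

    assert(2 == std::distance(seq.begin(), seq.end()));
}
```

Because the iterator only models the input-iterator category, the range is suitable for single-pass algorithms such as std::for_each, which is all a one-way enumeration can honestly promise.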

In anticipation of the Unicode support, I've used a traits class, recls_traits (Listing 3), to select the appropriate methods from the API. For example, the recls_traits<char>::GetNextDetails method is defined as:

template <>
struct recls_traits<char>
{
  static recls_rc_t GetNextDetails
         (hrecls_t hSrch, entry_type *pinfo)
  {
    return Recls_GetNextDetails(hSrch, pinfo);
  }
  ...
};

This makes it easy to update the STL mapping to handle Unicode as well as ANSI because, when the C API changes to have Recls_GetNextDetailsA() and Recls_GetNextDetailsW(), this method will be changed to be implemented in terms of Recls_GetNextDetailsA(). Similarly, recls_traits<wchar_t>::GetNextDetails will be implemented in terms of Recls_GetNextDetailsW(). All the other reclstl classes will not need to be changed. Voilà!
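
The dispatch can be sketched in miniature. The names below (Mock_LengthA(), Mock_LengthW(), mock_traits, pattern_length()) are all hypothetical; the A/W pair simply measures a string so that the example can run on its own, and stands in for whatever A/W functions the C API eventually provides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical A/W pair standing in for a future C API.
inline std::size_t Mock_LengthA(char const *s)    { return std::strlen(s); }
inline std::size_t Mock_LengthW(wchar_t const *s) { return std::wcslen(s); }

template <typename C>
struct mock_traits; // primary template intentionally left undefined

template <>
struct mock_traits<char>
{
    typedef char char_type;
    static std::size_t length(char_type const *s) { return Mock_LengthA(s); }
};

template <>
struct mock_traits<wchar_t>
{
    typedef wchar_t char_type;
    static std::size_t length(char_type const *s) { return Mock_LengthW(s); }
};

// Generic code names only the traits, never the A/W functions, so it
// needs no change when another character type is supported.
template <typename C>
std::size_t pattern_length(C const *pattern)
{
    return mock_traits<C>::length(pattern);
}

inline void example_traits()
{
    assert(3 == pattern_length("*.h"));
    assert(5 == pattern_length(L"*.cpp"));
}
```

Leaving the primary template undefined means that an unsupported character type fails at compile time rather than silently picking the wrong function.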

Again, this mapping uses some STLSoft headers. stlsoft_iterator.h defines the template iterator_base with which the container implementation is blessedly insulated from the inconsistencies and incompatibilities to be found in the various Standard (!) Library implementations of the last half decade or so. The other header used is stlsoft_proxy_sequence.h, which defines the proxy_sequence template. Let's take a closer look at the reclstl classes:

C#

The first thing you need to do to support .NET Interop via C# is to have a DLL. Thus, I've created a DLL project that's included with Version 1.1. It is implemented by linking the static library with a C file containing a DllMain and a .DEF file containing the exports. The DLL was built with Visual C++ and statically linked to the C Runtime Library. As such, it weighs in at 48 KB. In the long run, I'd like to pare this down because there's lots of cruft in there; it doesn't have any static objects or use stdio.

To use the exported symbols from any DLL in C#, in what is known as "Native Interop," you must use the DllImport attribute from the System.Runtime.InteropServices namespace. There's a lot to learn about Interop [2], but you should be able to glean a fair amount by looking through the implementation of the C# mapping. The essential step is to declare — but not define — your imported functions and decorate them with the DllImport attribute, as in:

 [DllImport( "recls_dll", EntryPoint="Recls_Search"
          , CallingConvention=CallingConvention.StdCall
          , CharSet=CharSet.Ansi, ExactSpelling=true)]
private static extern
  int Recls_Search( string searchRoot, string pattern
                  , uint flags, out hrecls_t hSrch);

This declares a function Recls_Search(), taking two strings and a uint, and returns the search handle via an out parameter. It states that the function resides in recls_dll.dll (if not specified, the extension is assumed to be .DLL), is called Recls_Search, uses the __stdcall convention (callee cleans stack), and expects ANSI rather than Unicode character strings. The ExactSpelling attribute requires that the Interop layer use the exact name, rather than apply A or W postfixes, which it is able to do for you.

One option would be to define the recls_fileinfo_t structure within C#, using the StructLayout attribute, which would arguably be more efficient. But I couldn't face all the mess of dealing with the pointer ranges from within C#. (If you want to do that, I'll be happy to post it on the site.) Also, it would be fragile and difficult to change, especially when expanding the C API to Unicode and ANSI versions.

So the entries are treated as if they are opaque and are defined as IntPtr. One irritant is that, although it's possible to use C#'s alias mechanism to weakly typedef hrecls_t and recls_info_t from IntPtr, they are fundamentally the same type and can be mistakenly interchanged, so it's only a help in porting the code across from C (Listing 6; available at http://www.cuj.com/code/). It'd be better if C# provided a strong typedef, but it doesn't.

One smart move [2] when dealing with Interop is to isolate all the external functions in another class, as I've done by defining a recls_api class within the recls namespace. Hence, the FileSearch, FileEntry, DirectoryParts, and ReclsException types are all implemented in terms of recls_api rather than having to mess around with imported functions. As well as insulating them from change, it also means that they can deal with .NET types only; recls_api handles all the translation from the C API types to .NET types: Win32 FILETIME values to .NET's DateTime, Win32 ULARGE_INTEGER values to C#'s ulong (64-bit unsigned integer), and C-strings to .NET's String. The one strange conversion is when passing character buffers to the C API. This is done by instantiating a StringBuilder instance, ensuring it has sufficient capacity, and passing it to the API as an object reference, as in:

[DllImport("recls_dll", EntryPoint = "Recls_GetDirectoryPartProperty", . . .)]
private static extern
 uint Recls_GetDirectoryPartProperty(recls_info_t fileInfo, int part
                        , StringBuilder buffer, uint cchBuffer);

public static string GetEntryDirectoryPart(recls_info_t entry, int index)
{
  StringBuilder buffer    = new StringBuilder(261);
  uint          capacity  = (uint)buffer.Capacity;
  uint          cch       = Recls_GetDirectoryPartProperty(entry, index
                                                      , buffer, capacity);
  buffer.Length = (int)cch;
  return buffer.ToString();
}

In the first cut, I had the FileEntry instances copy the IntPtr for their entries from the FileSearch class. Thus, some FileEntry instances were holding the structure while others were releasing it back to the recls C API in their finalizers. Embarrassing, certainly, but easy to fix: except that the failure symptom reported a NullReferenceException. Naturally, this makes you think "C# object reference" rather than EXCEPTION_ACCESS_VIOLATION. However, once I'd stuck in more debugging code and hit myself over the head a couple of times, all was right with the world.

There's an interesting design decision in the implementation of FileSearch (Listing 7; available at http://www.cuj.com/code/) that is made enumerable by the provision of a GetEnumerator() method, which returns an object implementing the IEnumerator interface. When creating a FileSearch instance, it is desirable to find out at that time whether the search parameters are going to lead to a valid search. Hence, we wish to start the search in the constructor. However, there are two reasons why this cannot be the case:

The consequence of deferring the search until it's used is that the exception is thrown from within the foreach, rather than from the object's construction — which is unappealing from a common-sense point of view, but necessary. Note that an empty, but otherwise valid, search does not cause an exception to be thrown.

FileSearch instances can be used within a foreach loop, so it's trivial to enumerate through the matching entries:

FileSearch  fs = new FileSearch(searchRoot,
                           pattern, flags);

foreach(FileEntry fe in fs)
{
  // do something with fe
  System.Console.WriteLine(fe.Path);
}

The C# compiler evaluates whether a foreach enumerator is "disposable," i.e., implements the IDisposable interface. If it does, then the compiler guarantees that the Dispose method will be called no matter how the foreach loop terminates. Since our enumerators contain unmanaged resources (search handles and entry structures), it is a good idea to implement the IDisposable interface, as can be seen in the code.

Documentation

From Version 1.0, I've created documentation (http://recls.org/help/) for the library using Doxygen (http://doxygen.org/). You can see some of the Doxygen tags in Listing 1. Documentation is hard to write and it's probably not perfect. I'll gladly hear any comments for improvement.

The exception to using Doxygen is the C# mapping, since the C# compiler can generate (via the /doc flag) XML documentation files directly from the source, assuming you've used the correct tags (Listing 7). The resultant files, when installed alongside their assemblies, can provide Intellisense information to the Visual Studio.NET IDDE, which is nice. Also, the free documentation tool NDoc (available at http://ndoc.sourceforge.net/) can be applied to the XML files to produce compiled HTML Help (.CHM) files that also link to all the requisite .NET SDK online documentation. It produces a professional-looking package, so this is what I'm using in the case of C#. (Alas, Visual C++ does not yet perform the same service for Managed C++, so that mapping will be done using Doxygen along with all the others.)

Next Steps

In the next installment, I will address:

Notes and References

[1] Wilson, Matthew. "Open-Source Flexibility via Namespace Aliasing," C/C++ Users Journal, July 2003.

[2] Clark, Jason. "Calling Win32 DLLs in C# with P/Invoke," MSDN magazine, July 2003.

[3] If someone wants to grant me a sandbox login (with a compiler, of course) to their architecture of choice — Mac, VMS, whatever — I'll be glad to port it. (I have to admit, I just get a big kick out of writing cross-platform code.)


Matthew Wilson is a software development consultant for Synesis Software, creator of the STLSoft libraries, and author of the upcoming Imperfect C++ (Addison-Wesley, 2004). He can be contacted at [email protected] or http://stlsoft.org/.


January 04: Introducing recls Mappings It's STL and C#'s turn

Listing 1: FileSearch public interface.

/// This class provides . . .
/// \ingroup group_recls_cpp
class FileSearch
{
/// \name Construction
/// @{
public:
  /// Creates a search for . . .
  /// \param rootDir The starting directory for the search. If NULL . . .
  /// \param pattern The search pattern, e.g. "*.h". If NULL . . .
  /// \param flags Combination of enumerants from \c RECLS_FLAG enumeration
  FileSearch(char const *rootDir, char const *pattern, recls_uint32_t flags);
  ~FileSearch();
/// @}
/// \name Operations
/// @{
public:
  /// Advances the search to the next position
  recls_rc_t      GetNext();
/// @}
/// \name Attributes
/// @{
public:
  /// Returns non-zero if there is more data available
  recls_bool_t    HasMoreElements() const;
  /// Returns the current entry
  /// \note The behavior is undefined when \c HasMoreElements() returns zero
  FileEntry       GetCurrentEntry() const;
/// @}
// Members
private:
  hrecls_t  m_hSrch;
};





Listing 2: FileEntry class.

class FileEntry
{
// Construction
public:
  FileEntry();
  FileEntry(FileEntry const &rhs);
  ~FileEntry();
  FileEntry &operator =(FileEntry const &rhs);
/// Attributes
public:
  string_t          GetPath() const;
#ifdef RECLS_PLATFORM_API_WIN32
  char              GetDrive() const;
#endif /* RECLS_PLATFORM_API_WIN32 */
  string_t          GetDirectory() const;
  string_t          GetDirectoryPath() const;
  DirectoryParts    GetDirectoryParts() const;
  string_t          GetFile() const;
  string_t          GetShortFile() const;
  string_t          GetFileName() const;
  string_t          GetFileExt() const;
  recls_time_t      GetCreationTime() const;
  recls_time_t      GetModificationTime() const;
  recls_time_t      GetLastAccessTime() const;
  recls_time_t      GetLastStatusChangeTime() const;
  recls_filesize_t  GetSize() const;
  recls_bool_t      IsReadOnly() const;
  recls_bool_t      IsDirectory() const;
  recls_bool_t      IsLink() const;
// Members
private:
  recls_info_t    m_info;
};





Listing 3: recls_traits.

template <typename C>
struct reclstl_traits
{
public:
  typedef void      char_type;       // placeholder type
  typedef void      *entry_type;     // placeholder type
public:
  static hrecls_t   Search( char_type const *searchRoot, 
                          char_type const *pattern, recls_uint32_t flags);
  static recls_rc_t GetDetails(hrecls_t hSrch, entry_type *pinfo);
  static recls_rc_t GetNextDetails(hrecls_t hSrch, entry_type *pinfo);
  static void       CloseDetails(entry_type fileInfo);
  static entry_type CopyDetails(entry_type fileInfo);
  static char_type  *str_copy(char_type *, char_type const *);
};





Listing 4: basic_search_sequence_const_iterator methods.

template< typename C , typename T , typename V >
inline class_type &basic_search_sequence_const_iterator<C, T, V>::operator ++()
{
  recls_message_assert("Attempting to increment invalid iterator", 
                                                         NULL != m_hSrch);
  if(RECLS_FAILED(Recls_GetNext(m_hSrch)))
  {
    Recls_SearchClose(m_hSrch);
    m_hSrch = NULL;
  }
  return *this;
}
template< typename C , typename T , typename V >
inline value_type 
           basic_search_sequence_const_iterator<C, T, V>::operator *() const
{
  entry_type  info;
  if( m_hSrch != NULL &&
      RECLS_SUCCEEDED(traits_type::GetDetails(m_hSrch, &info)))
  {
    return value_type(info);
  }
  else
  {
    recls_message_assert("Dereferencing end()-valued iterator", 0);
    return value_type();
  }
}





Listing 5: Public interface of basic_search_sequence_value_type.

class basic_search_sequence_value_type
{
  ...
  string_t              get_path() const;
#ifdef RECLS_PLATFORM_API_WIN32
  char_type             get_drive() const;
#endif /* RECLS_PLATFORM_API_WIN32 */
  string_t              get_directory() const;
  string_t              get_directory_path() const;
  directory_parts_type  get_directory_parts() const;
  string_t              get_file() const;
  string_t              get_short_file() const;
  string_t              get_filename() const;
  string_t              get_fileext() const;
  recls_time_t          get_creation_time() const;
  recls_time_t          get_modification_time() const;
  recls_time_t          get_last_access_time() const;
  recls_time_t          get_last_status_change_time() const;
  recls_filesize_t      get_size() const;
  recls_bool_t          is_readonly() const;
  recls_bool_t          is_directory() const;
  recls_bool_t          is_link() const;
  ...
};
template <typename C, typename T>
inline class_type
  &basic_search_sequence_value_type<C, T>::operator =( class_type const &rhs)
{
  // Guard against self-assignment, which could otherwise release the
  // entry before it is copied
  if(this != &rhs)
  {
    if(NULL != m_info)
    {
      traits_type::CloseDetails(m_info);
    }
    m_info = traits_type::CopyDetails(rhs.m_info);
  }
  return *this;
}
template <typename C, typename T>
inline string_t basic_search_sequence_value_type<C, T>::get_filename() const
{
  recls_assert(NULL != m_info);
  return string_t(m_info->fileName.begin, m_info->fileName.end);
}
template <typename C, typename T>
inline directory_parts_type 
  basic_search_sequence_value_type<C, T>::get_directory_parts() const
{
  recls_assert(NULL != m_info);
  return directory_parts_type( m_info->directoryParts.begin
                             , m_info->directoryParts.end);
}


        
