Recursive Directory Search in C#


Several years ago I wrote the column Positive Integration for C/C++ Users Journal and later Dr. Dobb's Journal, which discussed issues involved in adapting C/C++ libraries to other languages. The main exemplar project used was recls ("recursive ls") [1], a platform-independent recursive filesystem search library written in C and C++, and with a C API. Adaptation to numerous languages (including Ch, C#/.NET (via P/Invoke), D, Java, Python, and Ruby) was examined, covering the development of the library from versions 1.0 through 1.6. Since that time, the library has continued to evolve, and now stands at 1.8. A new C/C++ version, 1.9, will be released in the coming weeks.

I have long planned to rework the library implementation. The two main changes will be a substantial refactoring of the source files and packaging for the core library and the C++ layer, and a rewrite of some/all of the language mappings in the form of full "100%" implementations. This article describes the first of these, a 100% C# implementation of recls for .NET. For clarity I'll refer to the original stream of work as recls 1.x and the new .NET library as recls 100% .NET in this article.

The reasons for these changes are:

  • The core library has grown to a level of complexity such that I no longer find it easy to make changes.
  • I wanted to introduce diagnostic logging to the core library; this is included in recls 1.9.
  • I wanted to ease the burden of deployment. For example, with the .NET mapping in versions up to 1.8, the recls.dll exporting the core C API (for access via P/Invoke) must be manually packaged along with the C# API in recls.NET.dll. Automated tools (such as Visual Studio) do not automatically copy it to working areas. And organisational security policies may prohibit use of assemblies that call into "unmanaged" code.
  • I wanted to take advantage of language features introduced over the last five years. As we'll see shortly, aspects of C# 3 make for improved syntax in client code for non-trivial search use cases.
  • I wanted to implement two long asked-for features: breadth-first search, and search-depth limiting. recls 1.x provides only depth-first search, and always does a full-depth search.
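To make the last point concrete, here is a minimal sketch of breadth-first search with a depth limit, written against System.IO alone. It is not the recls 100% .NET implementation: the class name BreadthFirstSketch, the method EnumerateFiles, and the convention that a maxDepth of 0 means "the root directory only" are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of recls.
public static class BreadthFirstSketch
{
    // Yields the files under 'root' one directory level at a time,
    // descending at most 'maxDepth' levels below it
    // (assumption: 0 means only the files directly inside 'root').
    public static IEnumerable<string> EnumerateFiles(string root, int maxDepth)
    {
        // Each queue entry pairs a directory with its depth below 'root'.
        var pending = new Queue<KeyValuePair<string, int>>();
        pending.Enqueue(new KeyValuePair<string, int>(root, 0));

        while (pending.Count > 0)
        {
            KeyValuePair<string, int> current = pending.Dequeue();
            string[] files;
            string[] subdirectories;

            try
            {
                files = Directory.GetFiles(current.Key);
                subdirectories = Directory.GetDirectories(current.Key);
            }
            catch (UnauthorizedAccessException)
            {
                continue; // skip directories we are not allowed to read
            }

            foreach (string file in files)
            {
                yield return file;
            }

            // Only queue the next level while the depth limit allows it.
            if (current.Value < maxDepth)
            {
                foreach (string subdirectory in subdirectories)
                {
                    pending.Enqueue(
                        new KeyValuePair<string, int>(subdirectory, current.Value + 1));
                }
            }
        }
    }
}
```

Using a queue rather than recursion is what gives the breadth-first order: every file at one level is reported before any file from the level below it. Swapping the queue for a stack would turn the same loop into a depth-first walk.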

Despite being written entirely in C#, the implementation of recls 100% .NET is larger than can be fully covered here. So I intend to focus on the interesting design points, language features, and the differences in functionality between recls 1.x and recls 100% .NET.

API Differences

The first difference is a cosmetic one. To placate FxCop [2], and also to clearly distinguish the new recls .NET API from the old for anyone who wishes to port their code to it, I changed the old recls namespace to Recls.

Similarly, the RECLS_FLAG enumeration is now SearchOptions (see Listing 1), and its enumerators are Files not FILES, Directories not DIRECTORIES, and so on. There are also fewer enumerators. Notably absent from the original [3] are RECURSIVE, LINKS, DEVICES, NO_FOLLOW_LINKS, DIRECTORY_PARTS, DETAILS_LATER, PASSIVE_FTP, and ALLOW_REPARSE_DIRS. The changes reflect the intended increase in portability and improvements to discoverability and transparency [4, 5] of the new API, based on user feedback.


[Flags]
public enum SearchOptions
{
  None                         = 0x00000000,
  Files                        = 0x00000001,
  Directories                  = 0x00000002,
  IgnoreInaccessibleNodes      = 0x00100000,
  MarkDirectories              = 0x00200000,
  IncludeHidden                = 0x00000100,
  IncludeSystem                = 0x00000200,
  DoNotTranslatePathSeparators = 0x00002000,
}

Listing 1: The SearchOptions enumeration.
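Because SearchOptions carries the [Flags] attribute, its enumerators are meant to be combined with bitwise OR and tested with bitwise AND. The sketch below shows that usage. The enumeration is repeated from Listing 1 only so that the example compiles on its own; the class SearchOptionsSketch and its IsSet helper are hypothetical names and not part of the recls API.

```csharp
using System;

// Repeated from Listing 1 so that the sketch is self-contained.
[Flags]
public enum SearchOptions
{
    None                         = 0x00000000,
    Files                        = 0x00000001,
    Directories                  = 0x00000002,
    IgnoreInaccessibleNodes      = 0x00100000,
    MarkDirectories              = 0x00200000,
    IncludeHidden                = 0x00000100,
    IncludeSystem                = 0x00000200,
    DoNotTranslatePathSeparators = 0x00002000,
}

// Hypothetical helper, not part of recls.
public static class SearchOptionsSketch
{
    // True when every bit of 'flag' is present in 'value'.
    public static bool IsSet(SearchOptions value, SearchOptions flag)
    {
        return (value & flag) == flag;
    }

    public static void Demo()
    {
        // Ask for files and directories, including hidden ones.
        SearchOptions options = SearchOptions.Files
                              | SearchOptions.Directories
                              | SearchOptions.IncludeHidden;

        Console.WriteLine(IsSet(options, SearchOptions.IncludeHidden));
        Console.WriteLine(IsSet(options, SearchOptions.IncludeSystem));
    }
}
```

The explicit mask comparison is used instead of Enum.HasFlag because the latter only arrived in .NET 4.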

The FileEntry class is gone, replaced by the IEntry interface (see Listing 2). The FtpSearch class goes entirely, as the first version of recls 100% .NET does not support FTP search. The DirectoryParts class is no longer externally visible; the DirectoryParts getter-property now returns (an instance implementing) the interface IDirectoryParts; see Listing 3. The FileSearch class goes, and search is now provided by the (static) FileSearcher class.


// in namespace Recls
public interface IEntry
{
  string Path { get; }
  string SearchRelativePath { get; }
  string Drive { get; }
  string DirectoryPath { get; }
  string Directory { get; }
  string SearchDirectory { get; }
  string UncDrive { get; }
  string File { get; }
  string FileName { get; }
  string FileExtension { get; }
  DateTime CreationTime { get; }
  DateTime ModificationTime { get; }
  DateTime LastAccessTime { get; }
  DateTime LastStatusChangeTime { get; }
  long Size { get; }
  FileAttributes Attributes { get; }
  bool IsReadOnly { get; }
  bool IsDirectory { get; }
  bool IsUnc { get; }
  IDirectoryParts DirectoryParts { get; }
}

Listing 2: The IEntry interface.


public interface IDirectoryParts
  : IEnumerable<string>
{
  int Count { get; }
  string this[int index] { get; }
  bool Contains(string item);
  void CopyTo(string[] array, int index);
}

Listing 3: The IDirectoryParts interface.
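For readers wondering what sits behind IDirectoryParts, the following is one way such an interface could be implemented. It is a sketch only: the library's real class is not externally visible, the name SimpleDirectoryParts is hypothetical, and the splitting rule (each part keeps its trailing separator, so that concatenating the parts reproduces the directory) is an assumption rather than documented behaviour. The interface is repeated from Listing 3 so that the example compiles on its own.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Repeated from Listing 3 so that the sketch is self-contained.
public interface IDirectoryParts
  : IEnumerable<string>
{
    int Count { get; }
    string this[int index] { get; }
    bool Contains(string item);
    void CopyTo(string[] array, int index);
}

// Hypothetical implementation, not the class used inside recls.
public sealed class SimpleDirectoryParts : IDirectoryParts
{
    private readonly List<string> parts = new List<string>();

    public SimpleDirectoryParts(string directory)
    {
        // Assumed rule: cut after every separator, keeping it with the part.
        int start = 0;
        for (int i = 0; i < directory.Length; ++i)
        {
            if (directory[i] == '\\' || directory[i] == '/')
            {
                parts.Add(directory.Substring(start, i - start + 1));
                start = i + 1;
            }
        }
        if (start < directory.Length)
        {
            parts.Add(directory.Substring(start)); // trailing part without separator
        }
    }

    public int Count { get { return parts.Count; } }

    public string this[int index] { get { return parts[index]; } }

    public bool Contains(string item) { return parts.Contains(item); }

    public void CopyTo(string[] array, int index) { parts.CopyTo(array, index); }

    public IEnumerator<string> GetEnumerator() { return parts.GetEnumerator(); }

    // The non-generic enumerator is required by IEnumerable<string>.
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
```

Exposing only a getter-based indexer and read-only members keeps the parts immutable from the client's point of view, which is presumably why the interface does not simply derive from IList<string>.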

IEntry vs. FileEntry

Table 1 compares the public interfaces of the old FileEntry class and recls 100% .NET's IEntry interface. The differences, highlighted in bold, involve changes to both syntax and semantics, and result from lessons learned by users of recls 1.x.

Table 1: Mappings Between Old and New Entry class/interface Methods and Properties.

Drive changed from a character to a string to reduce the hassle of manipulating UNC-based paths: users can now deal with a single property, rather than a drive-letter character in one and a (UNC) drive string in another. The spellings of UNCDrive and IsUNC changed to UncDrive and IsUnc to follow .NET idiom. The Size property changed from ulong to long to be CLS-compliant (so that it can be used from .NET languages that don't support unsigned integral types). IsLink and ShortFile had to go by the wayside because the library must be implemented 100% in terms of CLR facilities (and not resort to P/Invoke). The Attributes property was added so that recls stays relevant as the CLR evolves the file attributes it makes available to managed programmers.

There are also some semantic changes. The form of the file extension has changed, and now includes the dot, so "abc.net" will have an extension of ".net", rather than "net" as was the case with recls 1.x. Since this is a breaking change, I've removed the previous name, FileExt, and given the property the new name FileExtension. (This also fits better with the .NET way of doing things, which is to avoid unnecessary contractions in names.)

It's useful to be able to paste the extension to another file name without having to pollute client code with logic to determine whether or not to insert the dot. Now, all of the following combinations will reproduce the full path (and, to be useful, may be used in combination with other strings to build correctly-formed new paths):

  • DirectoryPath + File
  • DirectoryPath + FileName + FileExtension
  • Drive + Directory + File
  • Drive + Directory + FileName + FileExtension
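The four combinations above can be checked with a small stand-in for an entry. The PathPieces class below is purely hypothetical: it reuses the IEntry property names but splits a path with plain string operations, handles only drive-letter and relative paths (not UNC), and makes no claim to match how recls derives these values.

```csharp
using System;

// Hypothetical stand-in for an entry; the field names mirror IEntry,
// the splitting logic is an assumption made for illustration.
public sealed class PathPieces
{
    public readonly string Path;
    public readonly string Drive;          // e.g. "H:"
    public readonly string Directory;      // e.g. "\dev\" (keeps trailing separator)
    public readonly string DirectoryPath;  // Drive + Directory
    public readonly string File;           // e.g. "abc.net"
    public readonly string FileName;       // e.g. "abc"
    public readonly string FileExtension;  // e.g. ".net" (includes the dot)

    public PathPieces(string path)
    {
        Path = path;

        // A drive is recognised only in the "X:" form; UNC is not handled.
        bool hasDrive = path.Length >= 2 && path[1] == ':';
        Drive = hasDrive ? path.Substring(0, 2) : "";

        string rest = path.Substring(Drive.Length);
        int lastSeparator = rest.LastIndexOfAny(new[] { '\\', '/' });
        Directory = rest.Substring(0, lastSeparator + 1);
        File = rest.Substring(lastSeparator + 1);
        DirectoryPath = Drive + Directory;

        // The extension starts at the last dot and keeps it.
        int dot = File.LastIndexOf('.');
        FileName = dot < 0 ? File : File.Substring(0, dot);
        FileExtension = dot < 0 ? "" : File.Substring(dot);
    }

    // Builds a sibling path by decorating the name; no dot logic is needed.
    public string WithSuffix(string suffix)
    {
        return DirectoryPath + FileName + suffix + FileExtension;
    }
}
```

Because the directory keeps its trailing separator and the extension keeps its dot, client code such as WithSuffix can build new paths by plain concatenation, which is the convenience the change is meant to provide.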

