Channels ▼
RSS

C/C++

Recursive Directory Search in C#


Specifying Search Criteria

The change from instance to class methods has probably the greatest impact on usage, so let's look at a pair of examples illustrating the old and the new. To search for all the font files in the Windows directory and its subdirectories with the old recls 1.x .NET mapping we'd write:


using recls;
foreach(FileEntry entry in new FileSearch(@"C:\Windows", "*.fon|*.ttf", RECLS_FLAG.RECURSIVE))
{
  Console.WriteLine(entry);
}

With recls 100% .NET, you would write:


using Recls;
foreach(IEntry in FileSearcher.Search(@"C:\Windows", "*.fon|*.ttf"))
{
  Console.WriteLine(entry);
}

With such a simple example the differences are not huge, which is a good thing. Nonetheless, two of the most significant changes are illustrated:

  • Recursive search is now the default, so the RECLS_FLAG.RECURSIVE flag is not needed (or available)
  • Search sequences are now obtained via static methods on a static class

A search is conducted from a specified directory, in which all entries (files or directories) that match the given pattern(s) and correspond to the given search options, up to a given depth, are retrieved. A directory may be absolute or relative, but must exist. If null (or the empty string) is specified, the current directory is assumed. A pattern may be a file (or directory) name, or may use wildcards, as in "*.ttf". Furthermore, the library supports multi-part patterns, allowing discovery of entries matching different wildcards within the same string, as in "*.fon|*.ttf". A null argument for the patterns parameter is interpreted to mean the "everything" pattern for the given platform (i.e. "*" on UNIX, and "*.*" on Windows).

The search options can select Files, Directories, or both; absence of both is interpreted as Files). Other options allow for tailoring the search policy, as follows:

  • Searching of Hidden and/or System files, both of which are not normally listed
  • Ignoring access-denied conditions, which would otherwise cause the search to be terminated, by specifying IgnoreInaccessibleNodes; this option is ignored if an exception handler is specified. This is often necessary when specifying Hidden and/or System.
  • Marking of the Path, SearchRelativePath and File properties of entries that are directories (via MarkDirectories) with a trailing slash
  • Preventing the automatic translation of per-platform pattern separators -- ':' for UNIX; ':' for Windows -- into the platform-independent pattern separator '|' (via DoNotTranslatePathSeparators)

Exceptions that interrupt the processing may be filtered by specifying an exception handler (see Listing 4).


enum ExceptionHandlerResult
{
  PropagateException = 0,
  ConsumeExceptionAndContinue
}
interface IExceptionHandler
{
  ExceptionHandlerResult OnException(string path, Exception x);
}

Listing 4: Exception Handler Interface.

Returning PropagateException causes the exception to be rethrown, causing the search to be cancelled and the caller to receive the exception. Returning ConsumeExceptionAndContinue consumes the exception (perhaps after logging the condition) and continues the search, skipping the offending directory. Naturally, the purpose of this callback is not to allow users to attempt to suppress unrecoverable conditions, and the library does not invoke the callback in some such cases. Unfortunately, because the .NET exception hierarchy is such an abject mess, discriminating between logical errors, practically unrecoverable conditions, and recoverable runtime conditions is not a simple task, and it is likely that the set of exceptions made suppressible in this regard will change in future implementations. Users are expected to consume only specific expected exceptions; for instance, System.IO.DirectoryNotFoundException, rather than doing anything as unwise as consuming System.Exception.

Finally, processing a large directory tree with highly-specific pattern(s) can lead to a user experience with discernible pauses, due to filesystem latencies. Consequently, a progress callback mechanism is also provided, in the form of the IProgressHandler interface (see Listing 5), which allows callers to be notified as each new (sub-)directory is searched, perhaps to log the directory traversal changes to console, status bar, etc. It also affords the opportunity to apply search policy on a location basis, via return of a control code from the ExceptionHandlerResult enumeration: CancelDirectory causes the given directory and all its sub-directories to be excluded from the search; CancelSearch causes exclusion of all remaining directories, even those at a higher level in the tree.


enum ProgressHandlerResult
{
  Continue = 0,
  CancelDirectory,
  CancelSearch
}
interface IProgressHandler
{
  ProgressHandlerResult OnProgress(string directory, int depth);
}

Listing 5: Progress Handler Interface.

Given this richness in search specification -- directory, patterns, depth, options, exception-handler, progress-handler -- there is clearly a conflict between flexibility and discoverability [4, 5] in the possible overloads for the file search functions. Ignoring parameter ordering and ambiguities caused by some parameters (directory and patterns) sharing the same type, there are 64 possible combinations of the six parameters. If we add in parameter ordering, it becomes 121. To get a handle on the problem consider the case for just three parameters, directory (dir), patterns (ptns) and depth, with and without parameter ordering considerations. (In both lists, type ambiguities are marked with a <-X->.)

Without considering parameter ordering, we have eight combinations, of which six are viable:


()
(dir)
(ptns) <-X-> (dir)
(depth)
(dir, ptns)
(dir, depth)
(ptns, depth) <-X-> (dir, depth)
(dir, ptns, depth)

If we add in parameter ordering, we get 16, of which nine are viable:


()
(dir)
(ptns) <-X->(dir)
(depth)
(dir, ptns)
(ptns, dir) <-X->(dir, ptns)
(dir, depth)
(ptns, depth) <-X->(dir, depth)
(depth, dir)
(depth, ptns) <-X->(depth, dir)
(dir, ptns, depth)
(ptns, dir, depth) <-X->(dir, ptns, depth)
(dir, depth, ptns)
(ptns, depth, dir) <-X->(dir, depth, ptns)
(depth, dir, ptns)
(depth, ptns, dir) <-X->(depth, dir, ptns)

You can imagine the complexity when permuting all six parameters! Clearly we need to make some judicious cuts. Since none of the parameters obviate the need for any of the others, the obvious must-have overload is one in which all six are present. The order is somewhat moot, but I'd suggest it should be either of the following:


(dir, ptns, options, depth, progressHandler, exceptionHandler)
(dir, ptns, options, progressHandler, exceptionHandler, depth)

Let's leave that decision for the moment while we consider the other options.

Another obvious decision is that we can throw out all permutations that don't have both directory and patterns parameters (and in that order). The utility of being able to specify only directory (to search for all files) or only patterns (to search in the current directory), rather simply specifying null in the stead of the omitted argument is vanishingly small. Not to mention the detraction from discoverability. So we can treat them as a mandatory unit. Furthermore, I think we can also stipulate that they'll always come first in the parameter list.

A further simplification that I felt was justified was that, as "advanced" options, we could treat the two handler arguments as a pair. The cost is a slight extra effort in specifying null for whichever is not needed, at the gain of reducing the overload set.

Given that it can make sense to specify depth independently of options, and them both independently of progress+exception, we can now cut the list down to eight (which ignores parameter ordering for the moment):


(dir, ptns)
(dir, ptns, options)
(dir, ptns, depth)
(dir, ptns, progressHandler, exceptionHandler)
(dir, ptns, options, depth)
(dir, ptns, options, progressHandler, exceptionHandler)
(dir, ptns, depth, progressHandler, exceptionHandler)
(dir, ptns, options, depth, progressHandler, exceptionHandler)

Another concern is that, so far, we've discussed the handlers in terms of the interfaces IExceptionHandler and IProgressHandler. But C# allows a different callback construct, the delegate, which is particularly useful with the advent of C# 2 and (even more so) with C# 3. recls 100% .NET defines two delegates, for handling exceptions and progress (see Listing 6).


public delegate ExceptionHandlerResult OnException(string path, Exception x);
public delegate ProgressHandlerResult OnProgress(string directory, int depth);

Listing 6: Handler Delegates.

So, even if we thought eight was a survivable number of overloads (which is in doubt), providing for the delegate forms (which are highly convenient, as we'll see later on) would push this out to a minimum of 12. Unequivocally, this is too much choice, one of the enemies of discoverability.

Consequently, some hard decisions had to be made, and I made the necessary (and somewhat arbitrary) decisions to give the following overload set (where {D} designates a delegate form, as opposed to an interface form):


(dir, ptns)
(dir, ptns, options)
(dir, ptns, depth)
(dir, ptns, options, depth)
(dir, ptns, options, depth, progressHandler, exceptionHandler)
(dir, ptns, options, depth, progressHandler{D}, exceptionHandler{D})

Although six may still feel like a lot, the fact that C# discriminates between int and enumeration types makes it pretty easy to live with it without ambiguity. We couldn't do the same in C++.

There's one final refinement to the overloading. Even though a search involving progress and/or exception handler may not usually require depth, I chose to keep the parameter ordering consistent (i.e. depth follows options) as this is an established principle of interface design [4, 6, 7]. It also fits in better with Visual Studio's Intellisense: when scrolling through the list of options the additional parameters appear naturally at the end of the list, rather than jumping around confusingly. Because of these reasons, I decided to remove the third overload -- (directory, patterns, depth) -- giving a final five. You may demur, but I think these five overloads represent an appropriate balance between flexibility and discoverability.

The class interface for FileSearcher is shown in Listing 7.


public static class FileSearcher
{
// Properties
  public static int UnrestricedDepth { get; }
  public static string WildcardsAll { get; }

// Search Operations
  public static IEnumerable<IEntry> Search(
    string directory
  , string patterns
  );
  public static IEnumerable<IEntry> Search(
    string directory
  , string patterns
  , SearchOptions options
  );
  public static IEnumerable<IEntry> Search(
    string directory
  , string patterns
  , SearchOptions options
  , int depth
  );
  public static IEnumerable<IEntry> Search(
    string directory
  , string patterns
  , SearchOptions options
  , int depth
  , IProgressHandler progressHandler
  , IExceptionHandler exceptionHandler
  );
  public static IEnumerable<IEntry> Search(
    string directory
  , string patterns
  , SearchOptions options
  , int depth
  , OnProgress progressHandler
  , OnException exceptionHandler
  );
  public static class BreadthFirst
  {
    . . . // Search() x 5 same overloads
  }
  public static class DepthFirst
  {
    . . . // Search() x 5 same overloads
  }
// Utility Operations
  public static IEntry Stat(string path);
  public static long CalculateDirectorySize(string directory);
  public static long CalculateDirectorySize(string directory, int depth);
}

Listing 7: FileSearcher class interface.

The first thing to note is that there are 15 search functions, in three groups of five, representing depth-first, breadth-first, and mechanism-agnostic search; each returns an enumerable type implementing the IEnumerable<IEntry> interface. It would certainly have been possible to include BreadthFirst and DepthFirst flags in the SearchOptions enumeration, but it's unlikely that a choice between depth-first and breadth-first search is one you will need to make at runtime, and expressing design-time choices at runtime is best avoided because it detracts from discoverability.

Rather than define 15 methods (5 x Search(), 5 x BreadthFirstSearch(), 5 x DepthFirstSearch()) in the same class, they are segregated them into three groups of 5, with the algorithm-specific overloads associated with the nested (static) classes BreadthFirst and DepthFirst. Obviously this transgresses the accepted wisdom that class names be nouns, but in this case it's acceptable because it engenders transparency of client code in the form of human-readable statements, as in:


foreach(IEntry in FileSearcher.BreadthFirst.Search(@"C:\Windows", "*.fon|*.ttf"))
{
  Console.WriteLine(entry);
}

This is a technique that will be familiar to many .NET programmers. The provision of (static) Write()/WriteLine() methods of the Console class offers a syntactic convenience over calling them via its Out (static) property. Similarly, the notionally algorithm-agnostic FileSearcher.Search() methods simply call corresponding FileSearcher.DepthFirst.Search() methods. (This is consistent with recls 1.x, where depth-first was the only algorithm.)

We've now discussed the nuances of all parameters, with the exception of depth. There are two special values for depth: FileSearcher.UnrestrictedDepth and 0. The former places no restrictions on depth. The latter causes the search to be non-recursive, i.e. it searches only in the specified directory.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video