Recursive Directory Search in C#


Special Search Functions

As UNIX programmers will know, the stat() system call provides status information about a given path, in the form of the struct stat type. The recls core C API provides the analogous function Recls_Stat(), which describes a given path using the recls_info_t type (a multi-attribute type analogous to IEntry). Several recls mappings provide a stat()/Stat() method that returns a file entry object, or null/nil if no such entry exists. I have found this a handy tool over the years, particularly when working in Python and Ruby, and I wanted to continue to offer it to .NET users, as FileSearcher.Stat(). This method returns an instance implementing IEntry that represents the filesystem entry if the entry exists and can be accessed, returns null if the entry does not exist, and throws an exception if it cannot be accessed. (In other words, System.IO.FileNotFoundException and System.IO.DirectoryNotFoundException are caught and turned into a null return; other exceptions propagate.)
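The exception policy just described can be sketched with the standard System.IO types. This is a hypothetical illustration, not recls's implementation: StatSketch is an invented name, and System.IO.FileSystemInfo stands in for recls's IEntry. The point is that the two "not found" exceptions become a null return, while anything else (an access-denied error, say) still reaches the caller.

```csharp
using System;
using System.IO;

// Hypothetical sketch of the null-on-missing contract described for
// FileSearcher.Stat(); FileSystemInfo stands in for recls's IEntry.
public static class StatSketch
{
    public static FileSystemInfo Stat(string path)
    {
        try
        {
            // File.GetAttributes() throws FileNotFoundException or
            // DirectoryNotFoundException when the entry (or a parent) is missing.
            FileAttributes attributes = File.GetAttributes(path);

            return (attributes & FileAttributes.Directory) != 0
                ? (FileSystemInfo)new DirectoryInfo(path)
                : new FileInfo(path);
        }
        catch (FileNotFoundException)
        {
            return null; // no such file: absence is not treated as an error
        }
        catch (DirectoryNotFoundException)
        {
            return null; // a directory on the path is missing
        }
        // Any other exception (e.g. UnauthorizedAccessException) propagates.
    }
}
```

Callers can then write `if (StatSketch.Stat(path) == null) { ... }` instead of wrapping every existence check in a try/catch.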

The other special function, FileSearcher.CalculateDirectorySize(), does exactly what it says on the tin: it calculates the size of a directory as the sum of the sizes of all files in that directory and in any of its subdirectories (up to a given depth). Since this is an expensive operation, I chose not to have directory sizes calculated automatically during a Search()-based enumeration. But it's a useful thing to have available, as in the following example, which displays the sizes of all immediate subdirectories of the current directory:


foreach(IEntry entry in FileSearcher.Search(
  null, null, SearchOptions.Directories,
  0 // Don't recurse
))
{
  Console.WriteLine("{0} : {1}", entry.Path
        , FileSearcher.CalculateDirectorySize(entry.Path));
}

Listing 8: Example using CalculateDirectorySize().
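To show why the calculation is expensive, here is a minimal, hypothetical sketch of a depth-limited directory-size calculation built on System.IO.DirectoryInfo. The class name, the maxDepth parameter, and its semantics (0 for immediate files only, a negative value for no limit) are assumptions for illustration, not recls's actual signature. Every file below the directory has to be visited, which is exactly the work you would not want performed for each entry of an ordinary Search().

```csharp
using System;
using System.IO;

// Hypothetical sketch in the spirit of FileSearcher.CalculateDirectorySize();
// the name and depth semantics are assumptions, not recls's API.
public static class DirectorySizeSketch
{
    // maxDepth == 0 counts only the files directly inside 'path'; each
    // increment descends one more level of subdirectories; a negative
    // value means "no limit".
    public static long CalculateDirectorySize(string path, int maxDepth = -1)
    {
        var directory = new DirectoryInfo(path);
        long total = 0;

        foreach (FileInfo file in directory.GetFiles())
        {
            total += file.Length;
        }

        if (maxDepth != 0)
        {
            int childDepth = maxDepth < 0 ? maxDepth : maxDepth - 1;
            foreach (DirectoryInfo child in directory.GetDirectories())
            {
                total += CalculateDirectorySize(child.FullName, childDepth);
            }
        }

        return total;
    }
}
```

This sketch follows directory symbolic links and junctions, and so could loop forever on a cyclic link; a production version would need a policy for those, and for directories it is not permitted to read.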

Path Utility Functions

As well as the FileSearcher methods, recls 100% .NET provides a number of additional utility functions via the static class PathUtil (see Listing 9).


public static class PathUtil
{
  public static string DeriveRelativePath(string origin, string target);
  public static string CanonicalizePath(string path);
  public static string GetAbsolutePath(string path);
  public static string GetDirectoryPath(string path);
  public static string GetFile(string path);
  public static string GetDrive(string path);
}

Listing 9: PathUtil class interface.

Each of these provides functionality that is essential to recls's searching and that is either not available in the CLR's path-manipulation facilities or corrects defective behaviour in them:

  • DeriveRelativePath(), CanonicalizePath(), and GetDrive() do not have CLR equivalents
  • GetAbsolutePath() corrects "drive-only" UNC paths, such as "\\server\share", by appending a trailing backslash, in the same way that System.IO.Path.GetFullPath() does for drive-only volume paths such as "C:"
  • PathUtil.GetDirectoryPath() yields the directory path (the recls notion that combines the drive, on operating systems that have drives, with the directory) and corrects the (in my opinion) defective behaviour of System.IO.Path.GetDirectoryName(), which returns null when given a root path such as "C:\" or "\\server\share\"
  • PathUtil.GetFile() yields the file component (file name plus extension) of a path and works correctly with UNC paths such as "\\server\share" (for which System.IO.Path.GetFileName() returns "share"!)
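Of the functions without CLR equivalents, DeriveRelativePath() is perhaps the easiest to picture. The following is a hypothetical, string-only sketch, not recls's code. It assumes that both paths are absolute, share the same root, use a single separator character, and should be compared case-sensitively; a Windows-oriented implementation would compare case-insensitively and would have to handle paths on different drives or shares.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of what a DeriveRelativePath() has to do; not recls's
// implementation. Assumes absolute paths with a common root, one separator
// character, and ordinal (case-sensitive) comparison of components.
public static class RelativePathSketch
{
    public static string DeriveRelativePath(string origin, string target, char separator = '\\')
    {
        char[] separators = { separator };
        string[] from = origin.Split(separators, StringSplitOptions.RemoveEmptyEntries);
        string[] to = target.Split(separators, StringSplitOptions.RemoveEmptyEntries);

        // Count the leading components the two paths have in common.
        int common = 0;
        while (common < from.Length && common < to.Length &&
               string.Equals(from[common], to[common], StringComparison.Ordinal))
        {
            common++;
        }

        var parts = new List<string>();
        for (int i = common; i < from.Length; i++)
        {
            parts.Add("..");  // climb out of each remaining origin component
        }
        for (int i = common; i < to.Length; i++)
        {
            parts.Add(to[i]); // then descend into the remaining target components
        }

        return parts.Count == 0 ? "." : string.Join(separator.ToString(), parts);
    }
}
```

For example, deriving C:\a\c\d relative to C:\a\b gives ..\c\d, and deriving /usr/share relative to /usr/local/bin (with '/' as the separator) gives ../../share.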

Extension Methods

With C# 3 comes the ability to extend the (apparent) set of operations available on existing types by means of extension methods [8, 9]. I've taken advantage of this in recls 100% .NET by adding ForEach, Select, and Where methods, as shown in Listing 10. We'll see an example of how these are used (with LINQ [8, 9]) shortly.


using System;
using System.Collections.Generic;

public static class SearchExtensions
{
  public static void ForEach(
    this IEnumerable<IEntry> sequence
  , Action<IEntry> action
  )
  {
    foreach(IEntry entry in sequence)
    {
      action(entry);
    }
  }
  public static IEnumerable<TTarget> Select<TTarget>(
    this IEnumerable<IEntry> sequence
  , Func<IEntry, TTarget>    function
  )
  {
    foreach(IEntry entry in sequence)
    {
      yield return function(entry);
    }
  }
  public static IEnumerable<IEntry> Where(
    this IEnumerable<IEntry> sequence
  , Func<IEntry, bool>       predicate
  )
  {
    foreach(IEntry entry in sequence)
    {
      if(predicate(entry))
      {
        yield return entry;
      }
    }
  }
}

Listing 10: Search Extensions.

In C++ terms, this is akin to a partial template specialization, because the extension methods are defined only for IEnumerable<IEntry>.
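The specialisation analogy can be seen in a small, self-contained example. IEntry below is a hypothetical two-member stand-in for the recls interface, StubEntry is an invented implementation, and TotalSize() is an invented extension method; like the methods in Listing 10, it is offered on any IEnumerable<IEntry> (arrays, lists, iterator results) and on nothing else.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, minimal stand-in for recls's IEntry (the real interface has more members).
public interface IEntry
{
    string Path { get; }
    long Size { get; }
}

public sealed class StubEntry : IEntry
{
    public StubEntry(string path, long size)
    {
        Path = path;
        Size = size;
    }

    public string Path { get; }
    public long Size { get; }
}

public static class EntryExtensions
{
    // Hypothetical extension method. Like those in Listing 10, it is declared
    // for IEnumerable<IEntry> only, so it is offered on sequences of entries
    // but not on, say, IEnumerable<string>.
    public static long TotalSize(this IEnumerable<IEntry> sequence)
    {
        long total = 0;
        foreach (IEntry entry in sequence)
        {
            total += entry.Size;
        }
        return total;
    }
}
```

Calling TotalSize() on a List<string> is a compile-time error. A List<StubEntry>, by contrast, qualifies through the covariance of IEnumerable<T>, although that requires C# 4 or later; under C# 3 only sequences typed as IEnumerable<IEntry> match.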

Predicates or Functions?

There was one interesting twist in implementing Where. Since it requires a predicate (a decision function that returns a Boolean value), I initially defined it in terms of System.Predicate<T>, which is a delegate defined as follows:


namespace System
{
  public delegate bool Predicate<T>(T arg);
}

That works fine with IEnumerable<IEntry>, as in Listing 11.


namespace WhereDemo
{
  using Recls;
  using System;
  class WhereDemo
  {
    public static void Main()
    {
      // with lambda expression
      foreach(IEntry entry in FileSearcher.Search(null, null)
        .Where((e) => e.IsReadOnly))
      {
        Console.WriteLine(entry);
      }
      // with anonymous delegate
      foreach(IEntry entry in FileSearcher.Search(null, null)
        .Where(delegate(IEntry e) { return e.IsReadOnly; }))
      {
        Console.WriteLine(entry);
      }
    }
  }
}

Listing 11: Use of Extension Methods with Predicate(s).

However, if we add a "using System.Linq;" directive to the WhereDemo namespace, we get a compile error (shown here with some namespace qualifications removed for clarity):


error CS0121: The call is ambiguous between the following methods or properties: 'System.Linq.Enumerable.Where<Recls.IEntry>(IEnumerable<IEntry>, System.Func<IEntry,bool>)' and 'Recls.SearchExtensions.Where(IEnumerable<IEntry>, System.Predicate<IEntry>)'

What appears to be happening here is that the compiler resolves the lambda expression (e) => e.IsReadOnly (or the equivalent anonymous delegate expression, also shown in Listing 11) to System.Func<IEntry, bool>, rather than to System.Predicate<IEntry>. System.Func<T, TResult> is defined as follows:


namespace System
{
  public delegate TResult Func<T, TResult>(T arg);
}

Consequently, the two candidate Where extension methods each have one precisely matching argument and one possibly matching argument, hence the ambiguity. This is why I had to implement the recls Where extension in terms of System.Func<IEntry, bool>, giving two precisely matching arguments and removing the ambiguity. Obviously, if the C# team ever decided to change the compiler to interpret one-parameter, Boolean-returning anonymous delegates and lambda expressions as System.Predicate<>, any such "partial specialisations" would be broken, so I'm guessing that will never happen, and we just need to get used to using System.Func<T, bool>, even though a predicate makes more sense.
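The effect of the change can be checked without recls. In the hypothetical sketch below, the SketchRecls namespace stands in for Recls, StubEntry is an invented entry type, and the Where method has the same shape as the one in Listing 10. With both using directives in the same namespace declaration, as in Listing 11, the call compiles and binds to the non-generic method. One way to account for this is C#'s overload tie-break rule: once both candidates have identical parameter types (here IEnumerable<IEntry> and Func<IEntry, bool>), a non-generic method is preferred over a generic one such as System.Linq.Enumerable.Where<TSource>(). With Predicate<IEntry>, the parameter types differ, so that tie-break does not apply and the call is ambiguous.

```csharp
using System;
using System.Collections.Generic;

namespace SketchRecls
{
    // Hypothetical stand-in for recls's IEntry, reduced to the one member used here.
    public interface IEntry
    {
        bool IsReadOnly { get; }
    }

    public sealed class StubEntry : IEntry
    {
        public StubEntry(bool isReadOnly) { IsReadOnly = isReadOnly; }
        public bool IsReadOnly { get; }
    }

    public static class SearchExtensions
    {
        // Demonstration only: records how often overload resolution chose this method.
        public static int WhereCalls;

        // Same shape as Listing 10: non-generic, taking Func<IEntry, bool>.
        // Declaring the second parameter as Predicate<IEntry> instead makes the
        // call in WhereDemoSketch fail with error CS0121.
        public static IEnumerable<IEntry> Where(
            this IEnumerable<IEntry> sequence, Func<IEntry, bool> predicate)
        {
            WhereCalls++;
            return Filter(sequence, predicate);
        }

        private static IEnumerable<IEntry> Filter(
            IEnumerable<IEntry> sequence, Func<IEntry, bool> predicate)
        {
            foreach (IEntry entry in sequence)
            {
                if (predicate(entry))
                {
                    yield return entry;
                }
            }
        }
    }
}

namespace WhereDemoSketch
{
    using SketchRecls;
    using System.Linq; // also brings Enumerable.Where<TSource>() into scope

    public static class Demo
    {
        public static List<IEntry> ReadOnlyEntries(IEnumerable<IEntry> entries)
        {
            // Compiles despite System.Linq: the non-generic Where wins the tie-break.
            return entries.Where(e => e.IsReadOnly).ToList();
        }
    }
}
```

If you already hold a Predicate<IEntry>, it can still be passed to the Func-based Where, either as predicate.Invoke (a method group conversion) or wrapped as new Func<IEntry, bool>(predicate).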

