Channels ▼

Matthew Wilson

Dr. Dobb's Bloggers

Recursive search examples, pt2: C

November 23, 2009

In my previous post,I started looking at a simple recursive-search implementation in C#,using recls 100%.NET.I'd next planned to look at the equivalent C++. However,because the recls (C/C++) library is still in-flux - between versions 1.8 and 1.9 - presenting a C++ example right now will require either using the 1.8 C++ API, which is a little tired, or using the 1.9 C++ API, which is not yet available in the public-domain. So, my current plan is to present an example using recls' C API, while I get the 1.9 released; hopefully in a week or two I'll be able complete the story with the C++ implementation.

Here's the C program code:

 <br />#include <recls/recls.h>  <br />#include <stdio.h>  <br />#include <stdlib.h>  <br />  <br />struct search_results  <br />{  <br />  unsigned          numFiles;  <br />  unsigned <strong>__int64</strong>  totalSize;  <br />};  <br />  <br />int RECLS_CALLCONV_DEFAULT search_func(  <br />    recls_info_t                entry  <br />,   recls_process_fn_param_t    param  <br />)  <br />{  <br />  struct search_results* results = (struct search_results*)param;  <br />  if(0 == <strong>_strcmpi</strong>("AssemblyInfo.cs", entry->fileName.begin))  <br />  {  <br />    struct recls_strptrs_t const* partsEnd = entry->directoryParts.end;  <br />    --partsEnd; /* Move to the last part */  <br />    if(0 == <strong>_strnicmp</strong>("Properties\\", partsEnd->begin, (size_t)(partsEnd->end - partsEnd->begin)))  <br />    {  <br />      return 1;  <br />    }  <br />  }  <br />  ++results->numFiles;  <br />#if RECLS_VER < 0x01090100  <br />  results->totalSize += entry->size.QuadPart;  <br />#else /* ? RECLS_VER */  <br />  results->totalSize += entry->size;  <br />#endif /* RECLS_VER */  <br />  return 1;  <br />}  <br />  <br />int main()  <br />{  <br />  struct search_results results = { 0, 0 };  <br />  recls_rc_t rc = <strong>Recls_SearchProcess</strong>(NULL, "*.cs", RECLS_F_FILES | RECLS_F_RECURSIVE | RECLS_F_DIRECTORY_PARTS, search_func, &results);  <br />  if(RECLS_FAILED(rc))  <br />  {  <br />    fprintf(stderr, "search failed: %s\n", <strong>Recls_GetSearchCodeString</strong>(rc));  <br />    return EXIT_FAILURE;  <br />  }  <br />  else  <br />  {  <br />    printf("%u file(s); <strong>%I64u</strong> bytes\n", results.numFiles, results.totalSize);  <br />    return EXIT_SUCCESS;  <br />  }  <br />} 

For a C program, this is pretty succinct. I think the main reasons for this are the simplicity of the program algorithm (in particular that we don't need to store results in containers) and the fact that recls (C/C++) encapsulates a lot of functional complexity in a relatively straightforward API. There's also a secondary reason, which is that I've relied upon compiler-specific constructs - the __int64 type, the _strcmpi and _strnicmp() functions, and the %I64u 64-bit integer format specifier - that are available only to Visual C++ (and compatible) compilers.

Notwithstanding the relative succinctness, the code is not terribly transparent, given the need to understand how the directory parts structure is represented. There's also the use of the pre-processor to enable the code to compile with both recls 1.8 and recls 1.9 (which will be released in a couple of weeks).

Once again, the development time was pretty short, around 15-20 minutes. It was, necessarily, longer than the .NET version because the code's a little more involved and (of significance only in such simple cases) because specifying compile and link dependencies is a little more involved with C/C++ than it is with .NET, which requires a single "Add Reference" action.

I don't expect that you'll be any more surprised than I was that this version operates with greater speed than the .NET version. What may be surprising, however - it was to me! - was the degree of superiority, as shown in the following table (where each time is the average over five runs, discarding the largest to avoid caching/system effect).

Language Average Elapsed Time (ms) Average Kernel Time (ms) Average User Time (ms) Observed Handles Observed "I/O Other"s
C#/.NET 20648 12375 8078 ~80 ~550,000
C 8093 3964 3906 ~16 ~210,000
C++ (to be determined soon)

Clearly, the times indicate that the .NET recls library is doing a lot more in the kernel than the C/C++ one. Since I wrote both libraries, and cannot clearly see where one has much more kernel activity than the other, my guess is that the way the CLR's directory enumeration functions elicit file/directory information en bloc is contributing to the extra work. Being suspicious of that presumption, I ran again, this time recording the number of maximum number of handles the processes used, and the maximum number of "I/O Other" actions (which incorporates kernel file-system search requests).

I must stress that these values are human-observed (via Task Manager). For both programs, the handles level hit the given mark almost immediately, and then hovered around that level +-2. In regards to the problem area it's probably not meaningful, since I am confident that the .NET runtime is opening a bunch more kernel objects that would be necessary for more sophisticated programs, even if they may be superfluous in this particular case. However, the "I/O Other" counter cannot be so readily dismissed. Clearly, the suspicion of extra kernel activity is borne out. Absent deeper investigation I'm loathe to draw any conclusions, beyond that recls (C/C++) works faster than recls 100% .NET. But, putting my natural interest in performance to one side, are performance differences in the range of 100-300% relevant in file-system search? I'd suggest that, in most cases, they are not. Certainly, now that I have recls 100% .NET in my toolkit, I will be writing certain system tools in .NET, rather than C/C++, when portability is not required.

Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 


Video