Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

.NET

The C5 Generic Collection Library


Unusual Functionality

In addition to what you'd expect from a good collection library, C5 offers some unusual functionality:

  • Slidable sublist views that reconcile array lists (fast access by index) with linked lists (fast insertion and deletion of single items), without exposing implementation details such as linked-list nodes. Sublist views make for good orthogonality because all operations that can be applied to lists can also be applied to sublist views. For instance, you need to provide only one Reverse() method, not both Reverse(int start) and Reverse(int start, int length)—the latter two can be performed as Reverse() on suitable sublist views. The same conceptual economy applies to numerous other operations such as several kinds of search, backwards search, tests, shuffling, and so on.
  • Listenable collection change events allow one to efficiently track changes to collections by attaching event handlers. This makes it easy to build composite collections, such as multidictionaries, using the given components.
  • Fast snapshots of tree-based collections let you have many almost-identical versions of a set or bag. This supports time- and space-efficient implementation of geometric algorithms.
  • Handles can be associated with priority queue items to permit fast access, update, or removal of items. In practice, a priority queue is pretty useless without this functionality, yet many collection library implementations do not provide it.

Using C5

To illustrate the use of C5, consider a simple index of a file—a sorted list of all words that appear in the file, and for each such word, a sorted list of the lines in which it appears. We use a dictionary to map each word to a set of line numbers; this avoids duplicate line numbers for each word. Using a tree dictionary makes sure the words come out sorted when the dictionary is printed, and similarly for the line numbers. For each word, the line numbers are represented by a TreeSet<int>, and hence the entire file index is represented by a TreeDictionary<String, TreeSet<int>>.

Let filename be the name of the file. Listing One(a) then builds the index in the tree-dictionary index. The foreach loop body is not optimal: It makes at least two searches in index for each string s; namely, one call to Contains and one or two calls to the indexer index[s]. Since this is a common pattern, dictionaries in the C5 library provide a method bool Find(K k, out V v) that combines the Contains(k) test with the lookup of the associated value v. Using Find, the number of searches is reduced to one or two; see Listing One(b).

(a)
IDictionary<String, TreeSet<int>> index
   = new TreeDictionary<String, TreeSet<int>>();
Regex delim = new Regex("[^a-zA-Z0-9]+");
using (TextReader rd = new StreamReader(filename)) {
   int lineno = 0;
   for (String line = rd.ReadLine(); line != null; line = rd.ReadLine()) {
      String[] res = delim.Split(line);
      lineno++;
      foreach (String s in res)
         if (s != "") {
            if (!index.Contains(s))
               index[s] = new TreeSet<int>();
            index[s].Add(lineno);
         }
   }
}

(b) 
foreach (String s in res)
   if (s != "") {
      TreeSet<int> theSet;
      if (!index.Find(s, out theSet))
         index[s] = theSet = new TreeSet<int>();
      theSet.Add(lineno);
   }

(c)
foreach (String word in index.Keys) {
   Console.Write("{0}: ", word);
   foreach (int ln in index[word])
      Console.Write("{0} ", ln);
   Console.WriteLine();
}

(d)
 ...
characterized: 150
characters: 229 230 319 321
checks: 78
circular: 348
class: 149 320 402 492 516
classes: 7 105 110 117 152 226 232 393
classic: 331
 ...
Listing One

The number of searches can be further reduced to exactly one for each string s by using the C5 method FindOrAdd, at the expense of creating a single empty TreeSet<int> in addition to those needed for all the words in the index. Regardless of how the index is built, it can be printed in alphabetical order using standard C# idioms; see Listing One(c). Listing One(d) is a fragment of the output.

A similar approach can be used to find all anagrams, such as "generate" and "teenager," in a text. From a word, you can compute the corresponding bag of characters, such as {a, e, e, e, g, n, r, t g}; this represents the anagram class of the word. You can create a dictionary mapping bags of characters to sets of words, and populate the dictionary in much the same way as the previous file index.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.