Writing Filters in an Object-Oriented Language


DEC89: WRITING FILTERS IN AN OBJECT-ORIENTED LANGUAGE

Object-oriented programming with Actor makes writing filter-type programs easy

Marty Franz

Marty Franz is a software specialist for Allen Testproducts in Kalamazoo, Michigan. He is also a freelance author whose articles have appeared in numerous computer magazines. Marty is the author of Object-Oriented Programming featuring Actor, published by Scott Foresman & Co., from which this article is excerpted.


A filter is a program that copies one file into another file while processing it in some way. Filters are a common type of program. For example, you can think of a C language compiler as a filter, because it reads an input file (in this case, a C source program), processes it (translates it), and creates an output file (the object program). A listing generator is another example of a filter: it reads an input file, processes it by paginating it, and outputs it to a printer. Because filters are a common type of program, they have a common structure. This article discusses how the facilities in an object-oriented language, such as Actor, make writing filter programs easy, using several common classes and methods.

Object-Oriented Programming

Object-oriented programming (OOP) treats a program as a set of objects, with the code and data that are needed to make an object exhibit appropriate behavior fused together in the program. By dividing a program into objects that embody both data and operations, the structure of the program more closely represents the structure of the problem being solved. As a result, object-oriented programs are more easily written, read, and maintained than programs in a procedural language.

The creation of abstract data types is an important task in object-oriented programming. This means the actual implementation of an object is encapsulated by high-level operations. Objects therefore have a clear separation into a public and private protocol. For example, a queue object defines a public protocol based on the operations of adding data to or removing data from the queue. The queue may be written as an array with separate variables (called "instance" variables) that maintain the first and last positions in the queue, but how this is done is private to the queue object. By adhering to the public protocol elsewhere in the program, you could later change the implementation of the queue (to a threaded list, for example) and not change the rest of the program that used the queue.
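The queue example can be sketched in a few lines. The sketch below is in Python rather than Actor, and the class name Queue and the method names add, remove, and size are illustrative assumptions, not part of the Actor class library. It keeps the array and its position variables private, so a caller that uses only the public protocol would be unaffected if the array were later replaced by a linked list.

```python
class Queue:
    """Public protocol: add(), remove(), size(). Everything else is private."""

    def __init__(self, capacity=8):
        # Private representation: a fixed array plus position bookkeeping,
        # in the spirit of the array-based queue described in the text.
        self._items = [None] * capacity
        self._first = 0      # index of the oldest element
        self._count = 0      # number of stored elements

    def add(self, value):
        if self._count == len(self._items):
            raise OverflowError("queue is full")
        last = (self._first + self._count) % len(self._items)
        self._items[last] = value
        self._count += 1

    def remove(self):
        if self._count == 0:
            raise IndexError("queue is empty")
        value = self._items[self._first]
        self._items[self._first] = None
        self._first = (self._first + 1) % len(self._items)
        self._count -= 1
        return value

    def size(self):
        return self._count
```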

Programming in an object-oriented language involves creating objects and then sending them messages to do things. Messages in Actor resemble function calls in other programming languages, but they are not. The first parameter in the message is actually the receiver object of the message. The receiver object determines how to respond to the message. Different objects can respond in different ways to the same message name, a property called "polymorphism." Polymorphism allows many objects within a program to respond to the same common protocol, which makes the program easier to understand. It also allows you to write more general, reusable code, because you don't have to worry about the types of objects you're dealing with as long as they follow the same public protocol.

Within an object, methods are defined that determine how the object will respond to a given message. Methods in Actor resemble functions or procedures in C or Pascal: They have local variables and use control structures such as do-while and if-then-else. Blocks can be defined along with methods. You can think of these as methods without names that can be treated as separate objects and passed to objects as parts of messages.

In most object-oriented languages, objects are organized hierarchically into classes. Classes descend from one another in a kind of family tree, with the most general classes being at the top. All classes have a common ancestor called the "grandfather class." In Actor, the grandfather class is the Object class. The Object class specifies safe default behavior for all objects, such as what to do when an unknown message is received (in this case, print an error message). As subclasses descend from classes, they inherit the functions of their parent class. This allows the programmer to use existing behavior in his new objects, specifying only what's different for the new application. This "programming by difference" means that less duplicate code needs to be written.

Actor provides a rich set of classes as part of the base language, including Boolean values (which are similar to Booleans in Pascal), files, Strings (similar to strings in Basic but with a 16K character limit), and Ordered and Sorted Collections for data structures such as arrays and queues. Actor also provides a set of classes for conveniently creating and handling Microsoft Windows objects, such as windows, dialog boxes, and scroll bars. I'll use the text-based objects in Actor only to demonstrate how inheritance and classes can make writing common utility programs much easier.

Plan of Attack

My goal in this article is to develop two filter programs using Actor classes and methods. The first program is an object-oriented version of the Unix utility grep, which searches for and prints all occurrences of a regular expression in a file. (The word "Grep" is a twisted acronym for "globally search for regular expression.") I'll then extend Grep in order to replace the pattern with a new string. This second program is called Reviser. By using Actor's inheritance wisely, the Reviser program will be extremely easy to write once Grep is working.

By using a straightforward loop in a procedural language, you could simply create a Grep function and a Reviser function and write both programs. This is a direct, but not very compact solution. A close look at this problem reveals the many similarities between the two programs. The most obvious is that the main function for both of them is the same, as illustrated in Figure 1.

Figure 1: Grep and Reviser have similar main functions

  initialize regular expression
  open input file
  open output file
  while not at end of input file
              read line from input
              process it
  end while
  close input file
  close output file

For Grep, the output file is the display where you want to see the lines of text that match the regular expression. For the Reviser program, the output file is another file that holds the revised text. The Reviser differs from Grep in its output file, and in the processing that's done to each line. Grep searches the line for the expression and copies it to the output file if it matches, while Reviser goes a step further and changes it before copying it to the output file.

As I said earlier, both Grep and Reviser are examples of a filter, a program that reads an input file and filters it by some sort of processing into an output file. The input file can be filtered a character or a line at a time. For the purpose of this article, a line at a time is sufficient.

Because this is such a common design for a program, it's worthwhile to first develop a Filter class of objects that can be used to write the Grep and Reviser programs with. Later on, you can add more types of filters using this class and save a great deal of code.

Further, because the Reviser is so similar to Grep, you should be able to inherit most of its functions from Grep, add the part that uses change( ), and send the output to a file instead of the display. This will save even more code.

This approach will save the time and effort of developing both the Grep and Reviser from scratch. The first step is to develop the Filter class, and then develop the Grep so it uses a Filter. Most of the Grep can then be inherited in order to write the Reviser.

The RegularExpression Class

Before writing the Filter class, however, let's take a closer look at the problem of searching for strings in the file. This is an important, common function in both Grep and Reviser. Searching for literal strings is easy, but also limiting at times. It would be useful to search not only for a literal string, such as Fred, but also for the four-letter words that begin with the letter "F." This is especially true when editing programs, because you might have used "If" in one place and "if" somewhere else. You want to be able to search for both by entering a single string expression.

A convenient notation for these types of string searches exists, and can be implemented here. Borrowed from the Unix world, these are called "regular expressions." You already use a regular expression when entering a wildcard file specification in MS-DOS, such as DEL *.*.

This regular expression format will be a bit more limited than that of MS-DOS, but still powerful, and will allow regular expression strings to contain normal alphanumeric characters, plus the special characters (sometimes called metacharacters) listed in Table 1.

Table 1: Some special characters

  Character       Meaning
  ---------------------------------------------

      ?           match any character
      %           match the start of the string
      $           match the end of the string

For example, the regular expression F??? matches any string that starts with the letter F and is followed by three characters. The regular expression %F???$ matches only a string that consists of exactly these four characters and nothing else.

You can match sets of characters by using the metacharacters [ ] to enclose a set. For example, the string F[aeiou]?? will match any four-character string that starts with the letter F and is followed first by a vowel, and then by any two characters. If the first character in the set is tilde (~), then the remaining characters enclosed by [ ] are excluded from matching. For example, the string F[~ri]?? matches any strings that start with F but are not followed by r or i. Sets can include ranges of characters: For example, the set [A-Za-z] matches all the upper and lowercase alphabetic characters.

When searching for strings that contain metacharacters, a backslash (\) causes the metacharacter that follows it to be taken literally. Table 2 provides the complete list of metacharacters supported.

Table 2: Characters in regular expressions

  Character  Meaning
  -----------------------------------------------

      ?      match any character
      %      match the start of the string
      $      match the end of the string
      [      begin list
      ]      end list
      ~      exclude set of characters from match
      \      take next character literally

This is an ambitious project, but the rewards will be worth the effort as the use of this class can add efficient pattern matching to virtually any text-based program. The key, however, is to make the pattern allow for fast searching and compact representation.

Although it's tempting to use one of the predefined Ordered or Sorted Collection classes in the Actor language when writing this class, this is slow and wasteful of storage for large patterns and requires a lot of searching. The best approach is to encode and embed the characters shown in Table 2 as control characters in the pattern string. This allows the encoded patterns to be stored anywhere a String can be stored, and searched rapidly with character-by-character comparisons.

I call this class RegularExpression. It contains a String as an instance variable (the encoded pattern), and a Boolean that determines whether the case of an alphabetic character is significant. You must immediately write several methods that access the instance variables without resorting to dot notation: setPattern( ) allows you to send a previously encoded pattern to the RegularExpression as the new pattern to use, and setCaseMatch( ) allows you to set the Boolean instance variable that determines if upper and lowercase versions of the same character are to be treated as equal. So you can save an encoded pattern for later use, pattern( ) will return the pattern instance variable. This allows a single RegularExpression instance to handle multiple patterns.

The first hurdle is encoding the pattern. This is done by makePattern( ). It loops through the String that is the regular expression, building another String that is the encoded pattern. When it finds a metacharacter in the source, it places a control character into the pattern that will be used later for comparisons. These control characters are kept in a header file called REGULARE.H (see Listing One, page 112) and can be changed later if necessary.

Building the pattern is a big job. It requires another message, fillSet( ), which takes a set enclosed in [] and encodes it. Use an INCLUDE_SET or OMIT_SET character for the set delimiter (depending on whether the characters that follow match), followed by a character that is the number of characters in the set, and then by the characters themselves. If a - (hyphen) was used to indicate a range of characters, such as A-Z, this set is expanded and copied into the pattern string. This is done when the pattern is built, and it won't be repeated when matching is performed because this would slow down the program considerably. A sample encoded pattern is shown in Figure 2.

Though this seems like a lot of trouble, the result is that you can compare a String against a pattern in a running program by characters only, which are efficiently handled. There is, however, a limit of 255 characters in a set enclosed by [], because that's the largest count you can place in a single character.
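The encoding scheme just described can be sketched as follows. This is a Python sketch, not a translation of the Actor listing: the function names make_pattern and fill_set are modeled on the method names in the text, the control-character values are those given in REGULARE.H, and the handling of a backslash outside a set follows Table 2 and is an assumption of this sketch.

```python
ANY_CHAR, BEGIN_STR, A_CHAR, END_STR = 1, 2, 3, 5   # values from REGULARE.H
INCLUDE_SET, OMIT_SET = 19, 14

def fill_set(body):
    """Encode the text between [ and ] as: marker, count, members."""
    marker = INCLUDE_SET
    if body.startswith("~"):
        marker, body = OMIT_SET, body[1:]
    members = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            members.append(body[i + 1])      # escaped character, taken literally
            i += 2
        elif ch == "-" and members and i + 1 < len(body):
            # Expand a range such as A-Z once, at encoding time.
            lo, hi = ord(members[-1]), ord(body[i + 1])
            members.extend(chr(c) for c in range(lo + 1, hi + 1))
            i += 2
        else:
            members.append(ch)
            i += 1
    if len(members) > 255:
        raise ValueError("a set may hold at most 255 characters")
    return chr(marker) + chr(len(members)) + "".join(members)

def make_pattern(expr):
    """Turn a regular expression string into an encoded pattern string."""
    out = []
    i = 0
    while i < len(expr):
        ch = expr[i]
        if ch == "?":
            out.append(chr(ANY_CHAR))
        elif ch == "%":
            out.append(chr(BEGIN_STR))
        elif ch == "$":
            out.append(chr(END_STR))
        elif ch == "\\" and i + 1 < len(expr):
            out.append(chr(A_CHAR) + expr[i + 1])   # assumption: escape outside a set
            i += 1
        elif ch == "[":
            close = expr.index("]", i + 1)   # raises ValueError if unclosed
            out.append(fill_set(expr[i + 1:close]))
            i = close
        else:
            out.append(chr(A_CHAR) + ch)     # ordinary character, tagged with A_CHAR
        i += 1
    return "".join(out)
```

Because ranges are expanded when the pattern is built, the matching code never has to interpret a hyphen; it only compares characters.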

The comparison of a RegularExpression against a String is handled by match( ). It calls aMatch( ), which matches the encoded pattern against every possible substring in the source string. If case matching is enabled (caseMatch is non-nil), then match( ) converts the pattern and source to uppercase before calling aMatch( ). The method aMatch( ) in turn calls oneMatch( ), which matches a single character in the pattern against a single character in the string. When a comparison fails, oneMatch( ) fails, and aMatch( ) goes on to the next substring.

Because of the way the pattern is encoded, a control character always determines what type of comparison to perform. Even a single character has an A_CHAR control character preceding it, which directs oneMatch( ) to compare it against a single character in the source. As a result of this property, oneMatch( ) can use a select/case statement to handle each type of control character. Two instance variables, arrow and cursor, hold the indexes of the source string and the pattern during the matching process.
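A matcher driven by the control characters can be sketched as below. This is a Python sketch under stated assumptions, not the Actor code: it folds the work of oneMatch( ) into a single loop, takes the already-encoded pattern as a plain string, and uses the control-character values from REGULARE.H. The local names arrow and cursor mirror the two instance variables described above.

```python
ANY_CHAR, BEGIN_STR, A_CHAR, END_STR = 1, 2, 3, 5   # values from REGULARE.H
INCLUDE_SET, OMIT_SET = 19, 14

def a_match(pattern, s, start):
    """Try the encoded pattern at s[start:]; return the end index or None."""
    arrow, cursor = start, 0          # arrow scans s, cursor scans pattern
    while cursor < len(pattern):
        code = ord(pattern[cursor])
        if code == BEGIN_STR:
            if arrow != 0:
                return None
            cursor += 1
        elif code == END_STR:
            if arrow != len(s):
                return None
            cursor += 1
        elif arrow >= len(s):
            return None               # pattern wants a character, none left
        elif code == ANY_CHAR:
            arrow += 1
            cursor += 1
        elif code == A_CHAR:
            if s[arrow] != pattern[cursor + 1]:
                return None
            arrow += 1
            cursor += 2
        elif code in (INCLUDE_SET, OMIT_SET):
            count = ord(pattern[cursor + 1])
            members = pattern[cursor + 2:cursor + 2 + count]
            # Fail when membership disagrees with the kind of set.
            if (s[arrow] in members) != (code == INCLUDE_SET):
                return None
            arrow += 1
            cursor += 2 + count
        else:
            raise ValueError("unknown control character %d" % code)
    return arrow

def match(pattern, s):
    """Return the first index where the pattern matches, or None."""
    for start in range(len(s) + 1):
        if a_match(pattern, s, start) is not None:
            return start
    return None
```

Since every element of the pattern begins with a control character, the loop always knows how far to advance the cursor, which is the property the select/case statement in oneMatch( ) relies on.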

In Actor, constants such as A_CHAR can be placed in separate files for easier maintenance, just as in C. For the RegularExpression class, the header file is called REGULARE.H (Listing One). The class file for this class is in Listing Two (page 112).

You can test this code by first loading the header file and the class file, then typing:

  Fred := new(RegularExpression, "F???");
  match(Fred, "My name is Fred");
  11

I could have simply returned true or false to indicate whether the pattern was found in the source string. Instead, match( ) returns the position in the source string where the match occurred, or nil if it didn't. Including this extra touch allows the source string to be edited. The other methods in RegularExpression make the change( ) method easy to write and allow a string matching the pattern to be replaced by another:

  change(Fred, "My name is Fred", "Sam");
  My name is Sam

The search-and-replace function of a word processor or text editor becomes easy to write with a RegularExpression class.

The Filter Class

It's important to be able to create and compare regular expressions before moving on to the Filter class. Looking at the structure of a filter program (including Grep and Reviser), we can see that it follows the general design shown in Figure 3. The phrases "initialize processing," "process line," and "terminate processing" will vary from program to program. Also notice that the input and output files from the previous iteration of the design have now become objects. Polymorphism allows an object to be used in place of an input or output file as long as it obeys the protocol of a File, adding even more generality to the idea of a Filter.

Figure 3: Structure of a typical filter program

  initialize processing
  open the input object
  open the output object
  while not at end of input object
              read a line from input object
              process line
  end while
  terminate processing
  close input object
  close output object

In order to change the parts of the Filter that will vary from program to program, blocks can be used to pass the class-independent processing parts to the Filter. This design requires three blocks, one for each of the processing phrases. The first is initBlock, because it receives control before the input and output files are opened. Its job is to initialize anything needed by the object using the Filter. The second block is processBlock. It receives the string read from the input object, plus the output object, and does whatever processing is needed by the Filter. The third is closeBlock, which is called before the input and output objects are closed. It performs any cleanup and termination the Filter needs.
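The three-block design can be sketched as follows. This is a Python sketch, not the Actor class: Python callables stand in for Actor blocks, the input object is assumed to be an already-open file-like object, and the attribute names are illustrative. As in the design above, the output object is passed to each block, and it may be omitted.

```python
import io

class Filter:
    """Copies lines from in_obj to out_obj, delegating the variable parts."""

    def __init__(self, in_obj, out_obj, init_block, process_block, close_block):
        self.in_obj = in_obj
        self.out_obj = out_obj
        self.init_block = init_block
        self.process_block = process_block
        self.close_block = close_block

    def run(self):
        self.init_block(self.out_obj, self.in_obj)       # initialize processing
        for line in self.in_obj:                         # read a line at a time
            self.process_block(self.out_obj, line.rstrip("\n"))
        self.close_block(self.out_obj, self.in_obj)      # terminate processing
        self.in_obj.close()
        if self.out_obj is not None:
            self.out_obj.close()

# Usage: a filter that records an upper-cased copy of each line.
source = io.StringIO("alpha\nbeta\n")
seen = []
f = Filter(source, None,
           lambda out, inp: seen.append("start"),
           lambda out, line: seen.append(line.upper()),
           lambda out, inp: seen.append("done"))
f.run()
```

The Filter itself knows nothing about upper-casing; everything specific to this program arrives through the three callables.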

The file for the Filter class is in Listing Three, page 114. The run( ) method actually does the Filter's processing, according to the design in Figure 3. It also calls checkError( ) at the proper times to ensure that the input and output objects are opened correctly, and that the end of the input object hasn't been reached.

Like the RegularExpression class before it, the Filter class is not particularly useful by itself -- it needs a Grep class to take advantage of the Filter's generality.

The Grep and Reviser Classes

The Grep class will be a descendant of Object, because it must use behavior from three other classes: Files, RegularExpressions, and Filters. If Grep were made a descendant of one of these, it would inherit a lot of code and unwanted behavior. Make it a unique class and use all three of the other objects as instance variables.

The goal of the Grep class is to create a Filter and provide it with the initBlock, processBlock, and closeBlock needed to search a file for a regular expression, and to print the lines that contain it. Formulating the blocks is easy: The blocks will simply call the start( ), process( ), and finish( ) messages in the Grep class. Deciding what these three messages should do takes a bit more time.

The finish( ) method is the easiest. It simply prints a count of the lines that were found in the file. It doesn't even have to do that, but this is easily accomplished with an instance variable (called matches) that keeps a count of the lines that match.

The start( ) method opens the input file that was passed as an argument when the Grep object was initialized with init( ). It also prints a message that tells the user that the program has started, and lists the name of the file being searched.

The process( ) method takes the string read from the input file and compares it against a RegularExpression pattern held in an instance variable. If it matches, it prints the line with printLine( ) and increments the matches instance variable. If not, it doesn't do anything.

Finally, the init( ) method is responsible for setting up the Filter and RegularExpression objects and the blocks that the Filter will use. Two other instance variables are initialized by init( ): fileName, which holds the name of the file being searched, and pattern, the RegularExpression used as the search pattern.

Three other messages in this class might puzzle you. They are: create( ), close( ), and checkError( ). All return self. These are here because the Filter object treats the Grep object as the output file. The output isn't being written to a real file, but to a Grep object that checks the lines read from the input file against the regular expression. For this deception to work, the Grep class must support the File object's public protocol, even down to checkError( ). Because these messages don't have to do anything (the Grep has already been created, and it can't be closed), they can return self. This is a common technique in object-oriented programming, another use for fall-through methods and polymorphism: Making one object obey the protocol of another so they can be used interchangeably. The Grep class file is shown in Listing Four, page 114.
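The idea of an object standing in for the output file can be sketched briefly. This is a Python sketch with several assumptions: run_filter is a hypothetical, stripped-down stand-in for the Filter's run( ) method, a plain substring test stands in for the RegularExpression, and matching lines are collected in a list instead of being printed.

```python
import io

def run_filter(in_obj, out_obj, process):
    """Minimal stand-in for the Filter: it treats out_obj as a file."""
    out_obj.create()
    out_obj.check_error()
    for line in in_obj:
        process(out_obj, line.rstrip("\n"))
    out_obj.close()

class Grep:
    """Collects matching lines while answering the output-file protocol."""

    def __init__(self, needle):
        self.needle = needle      # plain substring, standing in for a pattern
        self.found = []

    # File protocol: nothing to create, check, or close, so return self.
    def create(self):
        return self

    def check_error(self):
        return self

    def close(self):
        return self

    def process(self, line):
        if self.needle in line:
            self.found.append(line)
        return self

grep = Grep("water")
run_filter(io.StringIO("white-water rafting\ncalm lake\nwater level\n"),
           grep, lambda out, line: out.process(line))
```

The driver never checks what kind of object it was given. Any object that answers create, check_error, and close can take the place of a file.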

A run( ) class method is defined in this class that allows you to create and initialize a Grep with a single message, with which you specify the name of the file and the regular expression to search for as arguments:

  run(Grep,"README.TXT", "[Ww]hite-water");

The sample output from Grep is shown in Figure 4.

Reviser can now be written. Most of Reviser is already done, because it does the same thing as Grep except the process( ) method. In a Reviser, the line read from the input file must be searched using match( ), then changed if the pattern is found, and then written to the output file in either case. Therefore Reviser should be a descendant of Grep.

The output file needs instance variables for its name and object, and an instance variable for the text that's going to be substituted when the pattern is found. The other instance variables will be inherited from Grep.

The start( ) and finish( ) messages use the Grep versions of these messages and do their own unique processing. Early binding notation indicates that the Grep version of start should be called:

  ^start(self:Grep, inFile);

In the Grep class, the create( ), checkError( ), and close( ) methods didn't have to do anything, because the output file was really a Grep. In Reviser they have to function as File methods, and are passed on to the File class by the newFile instance variable.
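The relationship between the two classes can be sketched as follows. This is a Python sketch, not the Actor code from the listings: a plain substring stands in for the RegularExpression, the attribute names replacement and new_file are illustrative, and the explicit call Grep.finish(self) plays the role that the early binding notation plays in Actor.

```python
import io

class Grep:
    def __init__(self, needle):
        self.needle = needle          # plain substring, standing in for a pattern
        self.matches = 0

    def close(self):                  # nothing to close for a Grep
        return self

    def process(self, line):
        if self.needle in line:
            self.matches += 1
            print(line)
        return self

    def finish(self):
        return "%d lines matched pattern." % self.matches

class Reviser(Grep):
    """Inherits from Grep; only the differences are written here."""

    def __init__(self, needle, replacement, new_file):
        super().__init__(needle)
        self.replacement = replacement
        self.new_file = new_file      # a real file-like object for the output

    def close(self):                  # now forwarded to the output file
        self.new_file.close()
        return self

    def process(self, line):
        if self.needle in line:
            self.matches += 1
            line = line.replace(self.needle, self.replacement)
        self.new_file.write(line + "\n")   # written whether or not it matched
        return self

    def finish(self):
        # Explicitly reuse the Grep version, then add Reviser's own part.
        return Grep.finish(self) + " Output written."
```

Only process, close, and the extra text in finish are new; the match counter and the rest of the behavior come from Grep unchanged.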

With inheritance, the code for the Reviser is even shorter than for the Grep. The class file is shown in Listing Five, page 115. The Reviser inherits the run( ) message from Grep, so it can be used with a single message:

  run(Reviser, "README.TXT", "[Ff]red", "Sam");

Though building the RegularExpression and Filter classes took a while, Actor's polymorphism and inheritance can now be used to write tools such as Grep and Reviser easily. Both of these utilities take much longer to develop in procedural programming languages, because the functions of the Filter and Grep classes cannot be easily separated.


[LISTING ONE]

/* header file for RegularExpression class */

#define ANY_CHAR  1  /* ^A */
#define A_CHAR 3 /* ^C */
#define BEGIN_STR 2 /* ^B */
#define END_STR 5 /* ^E */
#define INCLUDE_SET 19  /* ^S */
#define OMIT_SET 14 /* ^N */




[LISTING TWO]

/* *************************************************
 *   REGULARE.CLS: RegularExpression class file    *
 ************************************************* */

/* Class used to hold regular expressions for string
matching. */!!

inherit(Object, #RegularExpression, #(
pattern /* a pattern to match */
caseMatch /* nil, case doesn't matter */
arrow /* used to scan strings */
cursor /* used to scan patterns */), 2, nil)!!

now(RegularExpressionClass)!!

/* Create a new RegularExpression from String s. */
Def  new(self, s | re)
{ re := init(new(self:Behavior));
  re.pattern := makePattern(re, s);
  ^re;
}!!

now(RegularExpression)!!

/* Change every occurrence of string s matching the pattern to string t.
  Does not allow recursion: searching occurs after the new string has been
  substituted. The target string t is not a pattern, just another string,
  inserted into the source string at the point of the match.  Returns the
  changed source string s */
Def  change(self, s, t | from, last, source)
{ source :=
  if caseMatch
  then asUpperCase(s);
  else s
  endif;
  from := 0;
  loop
  while from < size(source)
  begin last := aMatch(self, source, from);
    if last
    then s := replace(s, t, 0, size(t), from, last);
      from := from + size(t);
    else from := from + 1;
    endif;
  endLoop;
  ^s;
}!!

/* Set case matching.  Converts pattern to upper-case, too.  You should
  save the old pattern if you plan on toggling case matching a lot.*/
Def  setCaseMatch(self, c)
{ caseMatch := c;
  if caseMatch
  then pattern := asUpperCase(pattern);
  endif;
  ^self;
}!!

/* Match the String s against the pattern.  Calls aMatch() for each
  possible substring in the String. If caseMatch flag is set, then fold case
  to upper before doing the search. */
Def  match(self, s)
{ if caseMatch
  then s := asUpperCase(s);
  endif;
  do(size(s),
  {using(i)
    if aMatch(self, s, i)
    then ^i;
    endif;
  });
  ^false;
}!!

/* Compare string s against pattern starting at position from. Makes
  successive calls to oneMatch.  Returns index of matching character, or
  nil. Remember that oneMatch() advances the arrow in the source string. */
Def  aMatch(self, s, from | found)
{ arrow := from;
  cursor := 0;
  found := true;
  loop
  while found cand (cursor < size(pattern))
  begin
    if oneMatch(self, s)
    then cursor := cursor + patternSize(self, cursor);
    else ^false;
    endif;
  endLoop;
  ^arrow;
}!!

/* Match a single character passed as an argument to the set pointed to by
  the pattern cursor.  Returns nil if not in the set, otherwise non-nil. */
Def  locate(self, c | setSize, fromCurs, toCurs, found)
{ setSize := asInt(pattern[cursor+1]);
  fromCurs := cursor + 2;
  toCurs := fromCurs + setSize;
  found := false;
  do(over(fromCurs, toCurs),
  {using(i)
    if pattern[i] = c
    then found := true;
    endif;
  });
  ^found;
}!!

/* Match a single character in the target string against a single
  character in the pattern.  Returns nil or non-nil.  Also advances arrow
  scanning target string */
Def  oneMatch(self, s | next, c)
{ next := -1;
  if arrow < size(s)
  then c := asInt(pattern[cursor]);
    select
      case c = ANY_CHAR
      is next := 1;
      endCase;
      case c = BEGIN_STR
      is
        if arrow = 0
        then next := 0;
        endif;
      endCase;
      case c = A_CHAR
      is
        if s[arrow] = pattern[cursor+1]
        then next := 1;
        endif;
      endCase;
      case c = INCLUDE_SET
      is
        if locate(self, s[arrow])
        then next := 1;
        endif;
      endCase;
      case c = OMIT_SET
      is
        if not(locate(self, s[arrow]))
        then next := 1;
        endif;
      endCase;
    endSelect;
  else /* at end of string, check for $ */
    if asInt(pattern[cursor]) = END_STR
    then next := 0;
    endif;
  endif;
  if next >= 0
  then arrow := arrow + next;
  endif;
  ^(next >= 0);
}!!

/* Get the pattern String from a Regular Expression. */
Def  pattern(self)
{ ^pattern;
}!!

/* Set the pattern String in a RegularExpression. */
Def  setPattern(self, s)
{ pattern := s;
  ^self;
}!!

/* Initialize a RegularExpression. */
Def  init(self)
{ ^self;
}!!

/* Return the size of the Pattern from position p. */
Def  patternSize(self, p | c)
{ c := asInt(pattern[p]);
  select
    case c = A_CHAR
    is ^2;
    endCase;
    case c = ANY_CHAR cor c = BEGIN_STR cor c = END_STR
    is ^1;
    endCase;
    case c = OMIT_SET cor c = INCLUDE_SET
    is ^asInt(pattern[p+1])+2;
    endCase;
    default ^1;
  endSelect;
}!!

/* Return a set of characters for inclusion in a Pattern. The first and
  last characters should be [ and ].  Returns INCLUDE_SET for set or OMIT_SET
  for excluded set, followed by a count of the characters in the set (just a
  single character), followed by the characters themselves. */
Def  fillSet(self, s | work, i, count, from, to)
{ i := 1;
  count := 0;
  if s[i] = '~'
  then work := asString(asChar(OMIT_SET));
    i := i + 1;
  else work := asString(asChar(INCLUDE_SET));
  endif;
  work := work+asString(asChar(1)); /* holds count later */
  loop
  while i < size(s) and s[i] <> ']'
  begin
    select
      case s[i] = '\'
      is work := work + asString(s[i+1]);
        i := i + 2;
        count := count + 1;
      endCase;
      case s[i] = '-'
      is from := asInt(s[i-1])+1;
        to := asInt(s[i+1])+1;
        do(over(from, to),
        {using(c) work:=work+asString(asChar(c));
        });
        count := count + (to - from); /* count the expanded range characters */
        i := i + 2;
      endCase;
      default work := work + asString(s[i]);
      i := i + 1;
      count := count + 1;
    endSelect;
  endLoop;
  work[1] := asChar(count);
  ^work;
}!!

/* Convert a normal String into a pattern String.  Needs error checking,
  and to use more OOP technique. Note that it does not set the Regular-
  Expression's instance variable. */
Def  makePattern(self, s | work, c, i, j)
{ work := "";
  i := 0;
  loop
  while i < size(s)
  begin c := s[i];
    select
      case c = '?'
      is work := work+asString(asChar(ANY_CHAR));
        i := i + 1;
      endCase;
      case c = '%'
      is work := work+asString(asChar(BEGIN_STR));
        i := i + 1;
      endCase;
      case c = '$'
      is work := work+asString(asChar(END_STR));
        i := i + 1;
      endCase;
      case c = '['
      is j := indexOf(s, ']', i+1);
        work := work+fillSet(self, subString(s, i, j+1));
        i := j + 1;
      endCase;
      default work := work+asString(asChar(A_CHAR))+asString(c);
      i := i + 1;
    endSelect;
  endLoop;
  ^work;
}!!





[LISTING THREE]

/* *************************************************
 *          FILTER.CLS: Filter class file          *
 ************************************************* */

/* This is a class that makes possible filter programs,
like Listers, Greps, etc.  in and out are objects that
respond to Stream or File protocols.  initBlock, processBlock,
and closeBlock are executed when the filter is started,
running, and done, respectively. */!!

inherit(Object, #Filter, #(inObj
outObj
initBlock
processBlock
closeBlock
), 2, nil)!!

now(FilterClass)!!

/* Create a new Filter. */
Def  new(self, input, output, b1, b2, b3 | f)
{ f := init(new(self:Behavior), input, output, b1, b2, b3);
  ^f;
}!!

now(Filter)!!

/* Once a filter has been set up, run it.  Call initBlock to
   initialize anything other than opening inObj and outObj.
   Read a line from inObj and call processBlock.  When done,
   close both objects.  Note that outObj is optional.  Also
   note that outObj is the receiver of the blocks.  This is because
   it's the object that's programmer-defined. */
Def  run(self | str)
{ eval(initBlock, outObj, inObj);
  open(inObj, 0);
  checkError(inObj);
  if outObj
  then create(outObj);
    checkError(outObj);
  endif;
  loop
  while str := readLine(inObj)
  begin eval(processBlock, outObj, str);
  endLoop;
  eval(closeBlock, outObj, inObj);
  close(inObj);
  if outObj
  then close(outObj);
  endif;
}!!

/* Initialize a new Filter.  Fill-in all its instance
   variables. */
Def  init(self, input, output, b1, b2, b3)
{ inObj := input;
  outObj := output;
  initBlock := b1;
  processBlock := b2;
  closeBlock := b3;
  ^self;
}!!





[LISTING FOUR]

/* *************************************************
 *           GREP.CLS: Grep class file             *
 ************************************************* */

/* This class searches a file for lines that match a regular
   expression.  It is also an example of how to use the generic
   Filter class. */!!

inherit(Object, #Grep, #(fileName
pattern
matches), 2, nil)!!

now(GrepClass)!!

/* Create and run Grep for a file and a pattern. */
Def  run(self, file, expr | g)
{ g := init(new(self:Behavior), file, expr);
  ^g;
}!!

now(Grep)!!

/* The Filter will try to close us.  Make sure something
   safe happens. */
Def  close(self)
{ ^self;
}!!

/* We need a checkError message because the Filter will try to
   call one.  For a Grep this doesn't do anything. */
Def  checkError(self)
{ ^self;
}!!

/* A dummy method, needed because the Filter will try to
   create() its output object, which here is the Grep itself.
   Ours doesn't do anything.  If it were a real file, it would
   be created. */
Def  create(self)
{ ^self;
}!!

/* When done, print the number of matches that were found. */
Def  finish(self, inFile)
{ printLine(asStringRadix(matches, 10)+" lines matched pattern.");
  ^self;
}!!

/* Process a single line.  Compare it against the pattern.
   If it matches, print it. */
Def  process(self, s)
{ if match(pattern, s)
  then printLine(s);
    matches := matches + 1;
  endif;
  ^self;
}!!

/* Start grep.  Assign the filename to the TextFile. */
Def  start(self, inFile)
{ setName(inFile, fileName);
  printLine("");
  printLine("Searching file: "+fileName);
  ^self;
}!!

/* Set up the input TextFile and the three blocks, then build a
   Filter and run it to search the file for the expression.
   Matching lines are printed. */
Def  init(self, file, expr | f, start, process, finish,
          input, output)
{ input := new(TextFile);
  setDelimiter(input, CR_LF);
  output := self;
  pattern := new(RegularExpression, expr);
  fileName := file;
  matches := 0;
  start :=
  {using(me, inFile) start(me, inFile);
  };
  process :=
  {using(me, theLine) process(me, theLine);
  };
  finish :=
  {using(me, inFile) finish(me, inFile);
  };
  f := new(Filter, input, output, start, process, finish);
  run(f);
  ^self;
}!!




[LISTING FIVE]

/* *************************************************
 *         REVISER.CLS: Reviser class file         *
 ************************************************* */

/* A Reviser is like a Grep, but it takes an additional
instance variable: the new text to substitute when the
pattern is found.  Another filename is also required,
to hold the revised text.  Finally, there is a handle
for the new file. */!!

inherit(Grep, #Reviser, #(newText
newFileName
newFile), 2, nil)!!

now(ReviserClass)!!

/* Create and run a Reviser. */
Def  run(self, file, outFile, expr, text | r)
{ r := init(new(self:Behavior), file, outFile, expr, text);
  ^r;
}!!

now(Reviser)!!

/* Initialize what's different about the Reviser from
   the Grep. */
Def  init(self, file, outFile, expr, text)
{ newText := text;
  newFileName := outFile;
  ^init(self:Grep, file, expr);
}!!

/* Process a single line.  Compare it against the pattern.  If it matches,
  change it and write it to the new file.  For simplicity, this does not
  handle multiple substitutions in the same line. */
Def  process(self, s)
{ if match(pattern, s)
  then write(newFile, change(pattern, s, newText) + CR_LF);
    matches := matches + 1;
  endif;
  ^self;
}!!

/* Create the TextFile for the revised output and set its name,
   then do whatever else a Grep does. */
Def  start(self, inFile)
{ newFile := new(TextFile);
  setName(newFile, newFileName);
  setDelimiter(newFile, CR_LF);
  ^start(self:Grep, inFile);
}!!

/* Do a checkError on the newFile after creation. */
Def  checkError(self)
{ ^checkError(newFile);
}!!

/* Create the new file. */
Def  create(self)
{ create(newFile);
  ^self;
}!!

/* Close the new file. */
Def  close(self)
{ close(newFile);
  ^self;
}!!
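
The per-line behavior of the Reviser's process method can be illustrated with a short Python sketch. It is an analogue under stated assumptions, not a translation: the function name revise is hypothetical, the pattern uses Python's re syntax (which may differ from Actor's RegularExpression class), and a list stands in for the new file. As in the listing, only lines that match are written, and only the first occurrence on each line is replaced.

```python
import re


def revise(lines, expr, new_text):
    """Return (revised_lines, match_count) for the lines that match expr."""
    pattern = re.compile(expr)
    revised = []
    matches = 0
    for line in lines:
        if pattern.search(line):
            # count=1 replaces only the first occurrence on the line,
            # mirroring the single-substitution limit noted in the listing.
            # The lambda makes new_text a literal, with no escape processing.
            revised.append(pattern.sub(lambda m: new_text, line, count=1))
            matches += 1
    return revised, matches
```

For example, revising the lines "foo bar foo", "nothing", and "a foo" with the pattern "foo" and the new text "X" yields two revised lines and a match count of 2; the second "foo" on the first line is left alone.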













Copyright © 1989, Dr. Dobb's Journal

