Go Tutorial: Object Orientation and Go's Special Data Types


If the user did not request usage information, we check to see if they entered any command-line arguments, and if they did, we set the inFilename return variable to their first command-line argument and the outFilename return variable to their second command-line argument. Of course, they may have given no command-line arguments, in which case, both inFilename and outFilename remain empty strings; or they may have entered just one, in which case inFilename will have a filename and outFilename will be empty.

At the end, we do a simple sanity check to make sure that the user doesn't overwrite the input file with the output file, exiting if necessary; if all is well, we return. (In fact, the user could still overwrite the input file using redirection, for example, $ ./americanise infile > infile, but at least we have prevented an obvious accident.) Functions or methods that return one or more values must have at least one return statement. It can be useful for clarity, and for godoc-generated documentation, to give variable names for return types, as we have done in this function. If a function or method has variable names as well as types listed for its return values, then a bare return is legal (that is, a return statement that does not specify any variables). In such cases, the listed variables' values are returned. We do not use bare returns in this article because they are considered to be poor Go style.
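
The named-result style described above can be sketched as follows. This is only an illustration: splitArgs() is a hypothetical helper, not the article's function, and unlike the function described above it reports the clash as an error instead of exiting.

```go
package main

import (
	"errors"
	"fmt"
)

// splitArgs is a hypothetical helper. Naming the results documents what each
// returned string means, which also shows up in godoc output.
func splitArgs(args []string) (inFilename, outFilename string, err error) {
	if len(args) > 0 {
		inFilename = args[0]
	}
	if len(args) > 1 {
		outFilename = args[1]
	}
	if inFilename != "" && inFilename == outFilename {
		err = errors.New("won't overwrite the infile")
	}
	// An explicit return; a bare "return" here would return the same values.
	return inFilename, outFilename, err
}

func main() {
	in, out, err := splitArgs([]string{"in.txt", "out.txt"})
	fmt.Println(in, out, err) // in.txt out.txt <nil>
}
```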

Go takes a consistent approach to reading and writing data that allows us to read and write to files, to buffers (for example, to slices of bytes or to strings), and to the standard input, output, and error streams — or to our own custom types — so long as they provide the methods necessary to satisfy the reading and writing interfaces.

For a value to be readable it must satisfy the io.Reader interface. This interface specifies a single method with signature, Read([]byte) (int, error). The Read() method reads data from the value it is called on and puts the data read into the given byte slice. It returns the number of bytes read and an error value that will be nil if no error occurred, or io.EOF ("end of file") if no error occurred and the end of the input was reached, or some other non-nil value if an error occurred. Similarly, for a value to be writable, it must satisfy the io.Writer interface. This interface specifies a single method with signature, Write([]byte) (int, error). The Write() method writes data from the given byte slice into the value the method was called on, and returns the number of bytes written and an error value (which will be nil if no error occurred).
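
To make the two interfaces concrete, here is a minimal sketch. The countingWriter type and the drain() and countString() functions are hypothetical names invented for this illustration; only io.Reader, io.Writer, io.EOF, and strings.NewReader() come from the standard library.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// countingWriter is a hypothetical type. It satisfies io.Writer because it
// has a Write([]byte) (int, error) method; it just counts the bytes it gets.
type countingWriter struct {
	total int
}

func (w *countingWriter) Write(data []byte) (int, error) {
	w.total += len(data)
	return len(data), nil
}

// drain calls Read() until io.EOF, passing each chunk on to Write().
func drain(r io.Reader, w io.Writer) error {
	chunk := make([]byte, 4) // deliberately tiny so several reads are needed
	for {
		n, err := r.Read(chunk)
		if n > 0 {
			if _, werr := w.Write(chunk[:n]); werr != nil {
				return werr
			}
		}
		if err == io.EOF {
			return nil // end of input, not a failure
		}
		if err != nil {
			return err
		}
	}
}

// countString feeds a string through drain and reports the byte count.
func countString(s string) (int, error) {
	w := &countingWriter{}
	err := drain(strings.NewReader(s), w)
	return w.total, err
}

func main() {
	n, err := countString("colour")
	fmt.Println(n, err) // 6 <nil>
}
```

Note that drain() handles the bytes it was given before it looks at the error, since a reader may return data together with io.EOF.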

The io package provides readers and writers but these are unbuffered and operate in terms of raw bytes. The bufio package provides buffered input/output where the input will work on any value that satisfies the io.Reader interface (that is, it provides a suitable Read() method), and the output will work on any value that satisfies the io.Writer interface (that is, provides a suitable Write() method). The bufio package's readers and writers provide buffering and can work in terms of bytes or strings, and thus are ideal for reading and writing UTF-8 encoded text files.

var britishAmerican = "british-american.txt"

func americanise(inFile io.Reader, outFile io.Writer) (err error) { 
    reader := bufio.NewReader(inFile) 
    writer := bufio.NewWriter(outFile) 
    defer func() {
        if err == nil { 
            err = writer.Flush()
        }
    }()

    var replacer func(string) string 
    if replacer, err = makeReplacerFunction(britishAmerican); err != nil {
        return err
    } 
    wordRx := regexp.MustCompile("[A-Za-z]+") 
    eof := false 
    for !eof {
        var line string 
        line, err = reader.ReadString('\n') 
        if err == io.EOF {
            err = nil     // io.EOF isn't really an error
            eof = true    // this will end the loop at the next iteration 
        } else if err != nil {
            return err    // finish immediately for real errors
        } 
        line = wordRx.ReplaceAllStringFunc(line, replacer) 
        if _, err = writer.WriteString(line); err != nil { 
            return err 
        }
    }
    return nil
}

The americanise() function buffers the inFile reader and the outFile writer. Then it reads lines from the buffered reader and writes each line to the buffered writer, having replaced any British English words with their U.S. equivalents.

The function begins by creating a buffered reader and a buffered writer through which their contents can be accessed as bytes or, more conveniently in this case, as strings. The bufio.NewReader() construction function takes as argument any value that satisfies the io.Reader interface (that is, any value that has a suitable Read() method) and returns a new buffered reader (a *bufio.Reader, which itself satisfies io.Reader) that reads from the given reader. The bufio.NewWriter() function is analogous: it takes any value that satisfies io.Writer and returns a buffered writer. Notice that the americanise() function doesn't know or care what it is reading from or writing to: the reader and writer could be compressed files, network connections, in-memory byte slices ([]byte, for example, wrapped in a bytes.Buffer), or anything else that supports the io.Reader and io.Writer interfaces. This way of working with interfaces is very flexible and makes it easy to compose functionality in Go.
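
As a rough illustration of that flexibility, the sketch below pushes text through the same function twice: once from a string into a gzip compressor, and once from a gzip decompressor into an in-memory buffer. The copyBuffered() and roundTrip() functions are hypothetical stand-ins for americanise(); they copy the data unchanged.

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"strings"
)

// copyBuffered is a hypothetical stand-in for americanise(): all it knows is
// that it has an io.Reader and an io.Writer.
func copyBuffered(in io.Reader, out io.Writer) error {
	reader := bufio.NewReader(in)
	writer := bufio.NewWriter(out)
	if _, err := io.Copy(writer, reader); err != nil {
		return err
	}
	return writer.Flush()
}

// roundTrip compresses text into memory, then reads it back through
// copyBuffered using a gzip reader as the source.
func roundTrip(text string) (string, error) {
	var compressed bytes.Buffer
	zw := gzip.NewWriter(&compressed)
	if err := copyBuffered(strings.NewReader(text), zw); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil {
		return "", err
	}
	zr, err := gzip.NewReader(&compressed)
	if err != nil {
		return "", err
	}
	defer zr.Close()
	var plain bytes.Buffer
	if err := copyBuffered(zr, &plain); err != nil {
		return "", err
	}
	return plain.String(), nil
}

func main() {
	text, err := roundTrip("colour\nflavour\n")
	fmt.Printf("%q %v\n", text, err) // "colour\nflavour\n" <nil>
}
```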

Next, we create an anonymous deferred function that will flush the writer's buffer before the americanise() function returns control to its caller. The anonymous function will be called when americanise() returns normally, or abnormally due to a panic. If no error has occurred and the writer's buffer contains unwritten bytes, the bytes will be written before americanise() returns. Since it is possible that the flush will fail, we set the err return value to the result of the writer.Flush() call. A less defensive approach would be to have a much simpler defer statement of defer writer.Flush(), which ensures that the writer is flushed before the function returns but ignores any error that occurred before the flush or that occurs during the flush.
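
The effect of the deferred function on the named return value can be seen in a small sketch. The failingWriter type and writeGreeting() function are hypothetical; io.Discard assumes Go 1.16 or later.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
)

// failingWriter is a hypothetical io.Writer whose Write always fails.
type failingWriter struct{}

func (failingWriter) Write([]byte) (int, error) {
	return 0, errors.New("disk full")
}

// writeGreeting buffers its output, so the small write succeeds in memory and
// the underlying failure only shows up when the buffer is flushed.
func writeGreeting(out io.Writer) (err error) {
	writer := bufio.NewWriter(out)
	defer func() {
		if err == nil {
			err = writer.Flush() // the flush error becomes the return value
		}
	}()
	if _, err = writer.WriteString("hello\n"); err != nil {
		return err
	}
	return nil
}

func main() {
	fmt.Println(writeGreeting(io.Discard))      // <nil>
	fmt.Println(writeGreeting(failingWriter{})) // disk full
}
```

With a plain defer writer.Flush() the second call would return nil, and the caller would never learn that nothing was written.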

Go allows the use of named return values, and we have taken advantage of this facility here (err error), just as we did previously in the filenamesFromCommandLine() function. Be aware, however, that there is a subtle scoping issue we must consider when using named return values. For example, if we have a named return value of value, we can assign to it anywhere in the function using the assignment operator (=) as we'd expect. However, if we have a statement such as if value := …, because the if statement starts a new block, the value in the if statement will be a new variable, so the if statement's value variable will shadow the return value variable. In the americanise() function, err is a named return value, so we have made sure that we never assign to it using the short variable declaration operator (:=) to avoid the risk of accidentally creating a shadow variable. One consequence of this is that we must declare the other variables we want to assign to at the same time, such as the replacer function and the line we read in, before the assignment. An alternative approach is to avoid named return values and return the required value or values explicitly, as we have done elsewhere.
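
The shadowing problem is easy to reproduce. In this sketch, mayFail(), shadowed(), and assigned() are hypothetical functions written only to contrast := with = inside an if statement.

```go
package main

import (
	"errors"
	"fmt"
)

// mayFail is a hypothetical helper that fails on request.
func mayFail(fail bool) (int, error) {
	if fail {
		return 0, errors.New("failed")
	}
	return 1, nil
}

// shadowed loses the error: ":=" inside the if statement declares a new err
// that hides the named return value.
func shadowed() (err error) {
	if _, err := mayFail(true); err != nil {
		fmt.Println("inner err:", err)
	}
	return err // the outer err, which is still nil
}

// assigned uses "=" so the named return value itself is set.
func assigned() (err error) {
	if _, err = mayFail(true); err != nil {
		fmt.Println("outer err:", err)
	}
	return err
}

func main() {
	fmt.Println(shadowed()) // <nil>
	fmt.Println(assigned()) // failed
}
```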

One other small point to note is that we have used the blank identifier, _. The blank identifier serves as a placeholder for where a variable is expected in an assignment, and discards any value it is given. The blank identifier is not considered to be a new variable, so if used with :=, at least one other (new) variable must be assigned to.

The Go standard library contains a powerful regular expression package called regexp. This package can be used to create pointers to regexp.Regexp values (that is, of type *regexp.Regexp). These values provide many methods for searching and replacing. Here we have chosen to use the regexp.Regexp.ReplaceAllStringFunc() method which, given a string and a "replacer" function with signature func(string) string, calls the replacer function for every match, passing in the matched text, and replacing it with the text the replacer function returns.

If we had a very small replacer function, say, one that simply upper-cased the words it matched, we could have created it as an anonymous function when we called the replacement function. For example:

line = wordRx.ReplaceAllStringFunc(line,
    func(word string) string { return strings.ToUpper(word) })

However, the americanise() program's replacer function, although only a few lines long, requires some preparation, so we have created another function, makeReplacerFunction(), that, given the name of a file that contains lines of original and replacement words, returns a replacer function that will perform the appropriate replacements.

If makeReplacerFunction() returns a non-nil error, we return it, and the caller is expected to check the returned error and respond appropriately (as it does).
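
The idea of a function that prepares some data and then returns a replacer can be sketched with a closure. The makeMapReplacer() and americaniseLine() functions below are hypothetical and much simpler than makeReplacerFunction(): the word pairs are assumed sample data held in a map, not read from a file, so there is no error to return.

```go
package main

import (
	"fmt"
	"regexp"
)

// makeMapReplacer is a hypothetical, simplified relative of
// makeReplacerFunction(): it takes the word pairs as a map instead of
// reading them from a file.
func makeMapReplacer(usForBritish map[string]string) func(string) string {
	return func(word string) string {
		if usWord, found := usForBritish[word]; found {
			return usWord
		}
		return word // leave unknown words untouched
	}
}

// americaniseLine applies the replacer to every alphabetic word in line.
func americaniseLine(line string) string {
	replacer := makeMapReplacer(map[string]string{
		"colour": "color",
		"centre": "center",
	})
	wordRx := regexp.MustCompile("[A-Za-z]+")
	return wordRx.ReplaceAllStringFunc(line, replacer)
}

func main() {
	fmt.Println(americaniseLine("The colour of the town centre."))
	// prints: The color of the town center.
}
```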

Regular expressions can be compiled using the regexp.Compile() function, which returns a *regexp.Regexp and nil, or nil and an error if the regular expression is invalid. This is ideal when the regular expression is read from an external source, such as a file, or received from the user. Here, though, we have used the regexp.MustCompile() function, which simply returns a *regexp.Regexp, or panics if the regular expression (or "regexp") is invalid. The regular expression used in the example matches the longest possible sequence of one or more English alphabetic characters.
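
For a pattern that arrives at runtime, the regexp.Compile() route might look like the following sketch. The firstMatch() function is a hypothetical helper; the invalid pattern is simply the example's pattern with its closing bracket removed.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch compiles a pattern supplied at runtime, so it uses
// regexp.Compile() and reports a bad pattern as an error, not a panic.
func firstMatch(pattern, text string) (string, error) {
	rx, err := regexp.Compile(pattern)
	if err != nil {
		return "", err
	}
	return rx.FindString(text), nil
}

func main() {
	match, err := firstMatch("[A-Za-z]+", "42 colours")
	fmt.Println(match, err) // colours <nil>

	_, err = firstMatch("[A-Za-z+", "42 colours") // missing closing bracket
	fmt.Println(err != nil)                       // true
}
```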

With the replacer function and the regular expression in place, we start a loop that runs until eof is true and that begins each iteration by reading a line from the reader. The bufio.Reader.ReadString() method reads the underlying reader's raw bytes, which we treat as UTF-8 encoded text (which also works for 7-bit ASCII), up to and including the specified byte (or up to the end of the file). The method conveniently returns the text as a string, along with an error (or nil). If the error returned by the call to the bufio.Reader.ReadString() method is not nil, either we have reached the end of the input or we have hit a problem. At the end of the input, err will be io.EOF, which is perfectly okay, so in this case we set err to nil (because there isn't really an error), and set eof to true to ensure that the loop finishes after the current iteration, so we won't attempt to read beyond the end of the file. We don't return as soon as we get io.EOF, since it is possible that the file's last line doesn't end with a newline, in which case we will have received a line to be processed in addition to the io.EOF error.
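
The reason for processing the line before leaving the loop shows up with input whose last line has no newline. The readLines() function here is a hypothetical helper that reuses the same loop shape on an in-memory string.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// readLines collects every line from text, including a final line that has
// no trailing newline.
func readLines(text string) ([]string, error) {
	reader := bufio.NewReader(strings.NewReader(text))
	var lines []string
	eof := false
	for !eof {
		line, err := reader.ReadString('\n')
		if err == io.EOF {
			eof = true // handle what we were given, then stop
		} else if err != nil {
			return nil, err
		}
		if line != "" {
			lines = append(lines, line)
		}
	}
	return lines, nil
}

func main() {
	lines, err := readLines("colour\nflavour")
	fmt.Printf("%q %v\n", lines, err) // ["colour\n" "flavour"] <nil>
}
```

Had the loop returned as soon as it saw io.EOF, the final "flavour" would have been dropped.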

For each line, we call the regexp.Regexp.ReplaceAllStringFunc() method, giving it the line and the replacer function. We then try to write the (possibly modified) line to the writer using the bufio.Writer.WriteString() method — this method accepts a string and writes it out as a sequence of UTF-8 encoded bytes, returning the number of bytes written and an error (which will be nil if no error occurred). We don't care how many bytes are written so we assign the number to the blank identifier, _. If err is not nil, we return immediately, and the caller will receive the error.
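
As a small aside on the returned count, it is a number of bytes, not characters. In this sketch writeLine() is a hypothetical helper that writes into a bytes.Buffer so the result can be inspected.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// writeLine writes line through a buffered writer into memory and returns
// the byte count reported by WriteString() plus what reached the buffer.
func writeLine(line string) (int, string, error) {
	var sink bytes.Buffer
	writer := bufio.NewWriter(&sink)
	n, err := writer.WriteString(line)
	if err != nil {
		return n, "", err
	}
	if err = writer.Flush(); err != nil {
		return n, "", err
	}
	return n, sink.String(), nil
}

func main() {
	// "\u00ef" (i with diaeresis) occupies two bytes in UTF-8.
	n, written, err := writeLine("na\u00efve\n")
	fmt.Println(n, len(written), err) // 7 7 <nil>
}
```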

