


How Do I Handle Multiline Strings?


Jun01: Java Q&A


It is mystifying that a language like Java, which is quite advanced in most respects, would have such primitive support for a heavily used data type such as strings. Java strings are delimited by double quotes, like this: "This is a string." New-line characters and embedded quotes within strings are forbidden unless rewritten as escape sequences like \" or \n. For sequences of text (such as HTML pages), the best you can do is to write each line as a Java string connected with string concatenation operators like this:

"<html>\n" +
" <body>\n" +
" ...and so forth\n" +
" </body>\n" +
"</html>\n"

In addition, any double-quote characters in the HTML text must be escaped with a backslash. All this just to get ordinary HTML text into a Java program. And I haven't even started on the real work of integrating computed information (executable inclusions) into an outgoing stream of text. Getting all the quotes and plus signs right is tedious enough that most developers quickly give up and turn to HTML-based tools such as JSP, particularly when working with user-interface specialists who expect HTML to look like HTML without extraneous quotes and plus signs.

The MLS preprocessor (see Listing One; also available at http://www.virtualschool.edu/wap/index.html) I present here addresses these two needs — multiline strings and executable inclusions — by means of a single pair of digraphs, {{ and }}, which it processes differently according to the digraph nesting level. Examples 1, 2, and 3 demonstrate the simplest case, where digraphs replace double quotes for delimiting an ordinary string. The program is color coded in alternating red and blue to make the nesting level obvious. MLS simply passes the blue text through unchanged and converts the red text to Java strings. If the red text contains new-line or double-quote characters, MLS converts them to Java conventions. The executable inclusion feature lets the answer be computed at run time instead of being hard coded into a string. Simply enclose any Java expression within digraphs, as in Example 3.

MLS is governed entirely by the nested digraphs. It has no knowledge of Java other than how to emit concatenated Java strings as output. The {{ digraph begins a multiline string and the second, }}, terminates it. The same pair of digraphs serves double duty to begin and end executable inclusions.

Doubled brackets were chosen as the digraphs because:

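  • They rarely occur in conventionally formatted Java programs; nested array initializers such as {{1, 2}, {3, 4}} are one exception, and MLS would misread them unless the braces are separated by a space.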
  • Brackets are recognized by the parenthesis nesting logic of text editors such as vi and emacs, so out-of-balance conditions are readily noticed.

MLS emits exactly one line of Java for each line of MLS source text, so any error diagnostics from the Java compiler match the line numbers in the MLS input. Also notice that new-line characters within multiline strings are reproduced exactly in the generated code.
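The line-for-line property can be illustrated with a minimal sketch. The class LiteralSketch and the helper toJavaLiteral() are hypothetical names that are not part of Listing One; the sketch only mirrors the escaping rules described here, working on an in-memory string rather than a stream.

```java
// Hypothetical helper, not part of Listing One: renders raw text as the
// source code of a concatenated Java string expression.
class LiteralSketch {
    static String toJavaLiteral(String text) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\n':
                    // Close the literal, emit "+", and reopen it on the next
                    // output line, so output lines track input lines.
                    sb.append("\\n\"+\n\"");
                    break;
                case '"':
                    sb.append("\\\"");   // escape embedded double quotes
                    break;
                case '\\':
                    sb.append("\\\\");   // escape embedded backslashes
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // Prints three lines of Java source for three lines of input:
        // "<html>\n"+
        // " <body class=\"x\">\n"+
        // "</html>"
        System.out.println(toJavaLiteral("<html>\n <body class=\"x\">\n</html>"));
    }
}
```

Because every input new-line produces exactly one output new-line, a compiler diagnostic on line N of the generated file points at line N of the input.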

The executable inclusion feature relies on the fact that Java automatically converts any value concatenated with a string to its string representation: objects through the toString() method that every class inherits from Object, and primitive values through the equivalent built-in conversion. Thus, the theAnswer variable in Example 3 could be of any type whatsoever, including built-in types such as int or float and application-specific types.
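As a small illustration of that conversion, the sketch below concatenates a string with an int, a float, and an object. The class ConcatSketch and its nested Answer type are hypothetical and exist only for this example.

```java
class ConcatSketch {
    // Hypothetical application-specific type, used only for illustration.
    static class Answer {
        @Override
        public String toString() {
            return "forty-two";
        }
    }

    // Works for any argument: string concatenation converts it to text.
    static String describe(Object theAnswer) {
        return "The answer is " + theAnswer + ".";
    }

    public static void main(String[] args) {
        int i = 42;
        float f = 4.5f;
        System.out.println("The answer is " + i + ".");  // The answer is 42.
        System.out.println("The answer is " + f + ".");  // The answer is 4.5.
        System.out.println(describe(new Answer()));      // The answer is forty-two.
        System.out.println(describe(null));              // The answer is null.
    }
}
```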

Since Java compilers fold concatenations of constant strings at compile time, the multiline strings themselves add no run-time overhead; only executable inclusions are concatenated at run time.
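One way to observe the compile-time folding is to compare a concatenated constant with the equivalent single literal by identity. The sketch below assumes a standard Java compiler; the class and member names are hypothetical.

```java
class ConstantFoldSketch {
    // A constant expression built the way MLS output is built:
    // one literal per line, joined with +.
    static final String PAGE =
        "<html>\n" +
        " <body>\n" +
        "</html>\n";

    static boolean foldedAtCompileTime() {
        // String constant expressions are computed by the compiler and
        // interned like literals, so both sides refer to the same object.
        return PAGE == "<html>\n <body>\n</html>\n";
    }

    public static void main(String[] args) {
        System.out.println(foldedAtCompileTime());  // true
    }
}
```

The same identity test would generally not hold once a run-time value is spliced into the expression, which is why only the constant portions are free.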

Multilevel Nesting

As implied by the color coding convention, the digraphs can nest to any level. In Examples 1, 2, and 3, the file as a whole (blue) is the 0th level of nesting, the multiline string argument to the System.out.println(); statement (red) is the first level of nesting, and the {{ theAnswer }} executable inclusion (blue) is the second-level nesting. But any Java expression might appear where theAnswer appears now, including subroutine calls that might have String arguments, which might be written as multiline strings, which might contain executable inclusions. In other words, it is possible for the nesting to continue to any depth. MLS supports this even though it is not often encountered in practice.

MLS handles nested digraphs by relying on recursive calls between a pair of subroutines. MLS starts execution in 0th level (blue) mode by passing control to the doCode() subroutine.

  • Even-level nestings (blue) are handled by the doCode() subroutine. This simply passes input to the output unchanged. In this example, this mode applies to the 0th level nesting (the file as a whole), and also to the second level nesting represented by the {{ theAnswer }} expression. If the doCode() subroutine detects a {{ digraph, it invokes doData() to process it. If it finds a }} digraph, it returns to its caller.
  • Odd-level nestings (red) are handled by the doData() subroutine (named doString() in Listing One). This converts the incoming text to Java strings by surrounding each line in quotes, joining the lines with +, and escaping any internal double quotes, backslashes, and new-line characters with \. If doData() finds a {{ digraph, it calls doCode() to process it. If it finds a }} digraph, it returns to its caller.

Both subroutines check for and report unbalanced nesting by throwing exceptions as appropriate.
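The mutual recursion described above can be sketched compactly. This is not Listing One: the class NestingSketch is hypothetical, it works on an in-memory String instead of a PushbackInputStream, and it passes the nesting level as a parameter rather than keeping a static counter.

```java
class NestingSketch {
    private final String src;
    private int pos = 0;
    private final StringBuilder out = new StringBuilder();

    private NestingSketch(String src) {
        this.src = src;
    }

    /** Translates MLS-style source text into plain Java source text. */
    static String translate(String src) {
        NestingSketch s = new NestingSketch(src);
        s.doCode(0);
        return s.out.toString();
    }

    private boolean at(String digraph) {
        return src.startsWith(digraph, pos);
    }

    // Even levels: copy code through unchanged.
    private void doCode(int level) {
        while (pos < src.length()) {
            if (at("{{")) {
                pos += 2;
                doData(level + 1);
            } else if (at("}}")) {
                if (level == 0)
                    throw new IllegalStateException("extraneous }} at offset " + pos);
                pos += 2;
                return;
            } else {
                out.append(src.charAt(pos++));
            }
        }
        if (level > 0)
            throw new IllegalStateException("unterminated {{ inclusion");
    }

    // Odd levels: emit the text as quoted, concatenated Java literals.
    private void doData(int level) {
        out.append('"');
        while (pos < src.length()) {
            if (at("{{")) {
                pos += 2;
                out.append("\"+");   // close the literal, splice in code
                doCode(level + 1);
                out.append("+\"");   // reopen the literal
            } else if (at("}}")) {
                pos += 2;
                out.append('"');
                return;
            } else {
                char c = src.charAt(pos++);
                switch (c) {
                    case '\n': out.append("\\n\"+\n\""); break;
                    case '"':  out.append("\\\"");       break;
                    case '\\': out.append("\\\\");       break;
                    default:   out.append(c);
                }
            }
        }
        throw new IllegalStateException("unterminated {{ string");
    }

    public static void main(String[] args) {
        // Prints: System.out.println("The answer is "+theAnswer+".");
        System.out.println(translate(
            "System.out.println({{The answer is {{theAnswer}}.}});"));
    }
}
```

Passing the level explicitly keeps the balance check local to each call, at the cost of differing from the shared-counter style used in the listing.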

Using MLS

By default, MLS replaces the input file suffix (I use .j as the suffix for MLS files) with a .java suffix and emits each output file into the same directory as the input. In practice, it is more convenient to invoke the preprocessor as mls -d outputDirectory inputFile.j ..., in which case it will emit the output files into the specified outputDirectory.
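The suffix replacement can be sketched as a small helper. The name outputFileFor() is hypothetical; the logic follows the approach in Listing One, with one added guard for input names that have no suffix at all.

```java
import java.io.File;

class OutputNameSketch {
    // Hypothetical helper: maps an input name such as "Page.j" to
    // "<outDir>/Page.java".
    static File outputFileFor(File outDir, String inputName) {
        int dot = inputName.lastIndexOf('.');
        // lastIndexOf returns -1 when there is no suffix; keep the whole name then.
        String base = (dot < 0) ? inputName : inputName.substring(0, dot);
        return new File(outDir, base + ".java");
    }

    public static void main(String[] args) {
        // With the default directory "." the output lands next to the input.
        System.out.println(outputFileFor(new File("."), "Page.j").getName());  // Page.java
        System.out.println(outputFileFor(new File("gen"), "Page.j").getPath());
    }
}
```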

As a convenience feature, MLS will print the name of each output file on stdout to facilitate the typical usage pattern demonstrated in the Makefile in Example 4. This example Makefile simply recompiles the entire web site each time it is run. Better Makefiles could be devised, but I've never bothered: The MLS/Jikes combination is so fast that I've never felt a need for a more selective compilation procedure.

DDJ

Listing One

package com.sdi.tools.mls;
import java.util.*;
import java.io.*;
/** Multiline Java Strings with Executable Inclusions
 * A Java Preprocessor by Brad Cox, Ph.D. [email protected]
 */
public class Main 
{
  private static int nestingLevel;
  private static String fileName = "";
  private static int lineNumber;
  private static PushbackInputStream in;
  private static PrintWriter out;
  private static final String usage = 
    "Usage: java com.sdi.tools.mls.Main [-d outputDirectory] inputFileName...";
/** Copy code (even nesting levels) to the output unchanged until EOF or a closing }}. */
private static void doCode()
  throws Exception
{
  nestingLevel++;
  int thisInt, nextInt;
  while((thisInt = in.read()) != -1)
  {
    switch(thisInt)
    {
      case '\n':
        lineNumber++;
        out.print((char)thisInt);
        break;
      case '{':
        nextInt = in.read();
        if (nextInt == '{')
        {
          doString(lineNumber);
          break;
        }
        else 
        {
          out.print((char)thisInt);
          in.unread(nextInt);
          break;
        }
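      case '}':
        // Keep the raw int: casting to char would turn EOF (-1) into 65535.
        nextInt = in.read();
        if (nextInt == '}')
        {
          if (--nestingLevel <= 1)
            throw new Exception(fileName + 
                              ":: Extraneous }} at line " + lineNumber);
          return;
        }
        else 
        {
          out.print((char)thisInt);
          // Do not push EOF back; it would be re-read as a stray 0xFF byte.
          if (nextInt != -1)
            in.unread(nextInt);
          break;
        }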
      default:
        out.print((char)thisInt);
        break;
    }
  }
}
/** Preprocess one input stream, writing the generated Java source to os. */
public static void doStream(InputStream is, PrintWriter os)
  throws Exception
{
  in = new PushbackInputStream(is);
  out = os;
  lineNumber = 0;
  nestingLevel = 0;
  doCode();
}
private static void doString(int line)
  throws Exception
{
  nestingLevel++;
  int thisInt, nextInt;
  out.print("\"");
  while((thisInt = in.read()) != -1)
  {
    switch(thisInt)
    {
      case '\n':
        lineNumber++;
        out.print("\\n\"+\n\"");
        break;
      case '\\':
        out.print("\\" + (char)thisInt);
        break;
      case '"':
        out.print("\\\"");
        break;
      case '{':
        nextInt = in.read();
        if (nextInt == '{')
        {
          out.print("\"+");
          doCode();
          out.print("+\"");
          break;
        }
        else 
        {
          out.print((char)thisInt);
          in.unread(nextInt);
          break;
        }
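      case '}':
        // Keep the raw int: casting to char would turn EOF (-1) into 65535.
        nextInt = in.read();
        if (nextInt == '}')
        {
          out.print("\"");
          // Undo this call's increment so doCode's balance check stays accurate.
          nestingLevel--;
          return;
        }
        else 
        {
          out.print((char)thisInt);
          // Do not push EOF back; it would be re-read as a stray 0xFF byte.
          if (nextInt != -1)
            in.unread(nextInt);
          break;
        }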
      default:
        out.print((char)thisInt);
        break;
    }
  }
  throw new IOException(fileName + 
                          ": unterminated {{string}} at line " + line);
}
/** Command-line entry point: preprocess each input file into a .java file.
 * @param args optional "-d outputDirectory" followed by input file names
 */
public static void main(String[] args)
{
  File outDirectory = new File(".");
  try
  {
    Vector files = new Vector();
    for (int i = 0; i < args.length; i++)
    {
      if (args[i].startsWith("-"))
      {
        if (args[i].startsWith("-d"))
        {
          outDirectory = new File(args[++i]);
          if (!outDirectory.isDirectory() && !outDirectory.mkdirs())
          {
            System.err.println("Couldn't create " + outDirectory);
            System.exit(-1);
          }
        }
        else
          System.err.println(usage + "\n   invalid switch: " + args[i]);
      }
      else
        files.addElement(args[i]);
    }
    for (Enumeration e = files.elements(); e.hasMoreElements(); )
    {
      fileName = (String)e.nextElement();
      lineNumber = 0;
      File inFile = new File(fileName);
      BufferedInputStream bis = null;
      try
      {
        FileInputStream fis = new FileInputStream(inFile);
        bis = new BufferedInputStream(fis);
      }
      catch (FileNotFoundException ex)
      {
        System.err.println("Cannot read " + fileName);
        continue;
      }
      String base = fileName.substring(0, fileName.lastIndexOf("."));
      File outFile = new File(outDirectory, base + ".java");
      PrintWriter pw = null;
      try
      {
        FileOutputStream fos = new FileOutputStream(outFile);
        BufferedOutputStream bos = new BufferedOutputStream(fos);
        pw = new PrintWriter(bos);
      }
      catch (IOException ex)
      {
        System.err.println("Cannot write " + outFile);
        continue;
      }
      doStream(bis, pw);
      bis.close();
      pw.close();
      /** Print names of output files on stdout to support
       * the usage pattern: jikes `mls inputfiles`
       */
      System.out.println(outFile);
    }
  }
  catch (Throwable e)
  {
    System.err.println(e.getMessage());
    e.printStackTrace();
    System.exit(-1);
  }
}
}






