Easy DOM Parsing in Java
There are a few ways to parse XML in Java:
- SAX parser: An event-based sequential access parser API that only operates on portions of the XML document at any one time.
- DOM parser: The Document Object Model parser is a hierarchy-based parser that creates an object model of the entire XML document, then hands that model to you to work with.
- JAXB: The Java Architecture for XML Binding maps Java classes to XML documents and allows you to operate on the XML in a more natural way.
- String operations: I've seen some people, due to performance or memory constraints, actually perform String operations on a loaded XML document to manually find bits of information within the XML as a String; for instance, using the String class's indexOf and other built-in methods. This is not a scalable or reusable solution.
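For contrast with the DOM approach this article focuses on, here is a minimal sketch of SAX's event-based style using the JDK's standard javax.xml.parsers.SAXParserFactory. The class SaxSketch and its elementNames method are hypothetical names for illustration: the handler just records each start tag as the parser streams past it, so the document is never held in memory as a tree.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: collects element names in document order via SAX events.
class SaxSketch {
    static List<String> elementNames(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // Called once per start tag; only this one element is "in view" here
                names.add(qName);
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNames("<Company><Name>My Company</Name></Company>"));
        // prints [Company, Name]
    }
}
```

Anything beyond this, such as remembering which element you are inside or buffering text delivered through characters(), has to be tracked by hand, which is the bookkeeping a DOM parser does for you.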
In my experience, the most popular way to work with XML is to use the DOM parser. With the DOM, the XML is broken down into three main pieces:
- Elements (sometimes called tags)
- Attributes
- The data (also called values) that the elements and attributes describe
Conceptually, this is simple enough, and most people choose DOM parsing for this very reason. However, when parsing XML, traversing the Document Object Model (DOM) is not always easy. I typically include a set of easy-to-use methods, such as getNode and getNodeValue (shown below), to help me pull data from a parsed XML document. This saves me from rewriting all of the otherwise recursive code to traverse a nested XML hierarchy. Before we dive into the code, I'll give a quick overview of what you need to prepare for when processing XML in Java code.
Look at the XML document below as an example:
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<Company>
    <Name>My Company</Name>
    <Executive type="CEO">
        <LastName>Smith</LastName>
        <FirstName>Jim</FirstName>
        <street>123 Main Street</street>
        <city>Mytown</city>
        <state>NY</state>
        <zip>11234</zip>
    </Executive>
</Company>
```
An element begins with a start tag enclosed in "<" and ">" brackets, such as <Company>, and ends with a matching end tag, such as </Company>. Attributes are additional name/value pairs placed within an element's start tag, after the tag name, such as <Executive type="CEO">. The attribute name is always followed by an equals sign (=) and then the value in quotes. An element can contain zero or more attributes, with each name/value pair separated from the next by whitespace. Elements and attributes themselves make up what is called metadata, which is data that describes data.
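To see how this syntax surfaces in the DOM, the sketch below parses a one-element document and copies the element's attributes (zero or more name/value pairs) into a map. AttributeSketch and rootAttributes are hypothetical names; the parsing calls are the standard JAXP DocumentBuilderFactory and org.w3c.dom APIs, and the XML is read from a String rather than a file purely for convenience.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example class: lists the attributes on a document's root element.
class AttributeSketch {
    static Map<String, String> rootAttributes(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        Map<String, String> result = new LinkedHashMap<>();
        // Attributes live in a NamedNodeMap on the element, one Node per name/value pair
        NamedNodeMap attrs = root.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            result.put(attr.getNodeName(), attr.getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootAttributes("<Executive type=\"CEO\"/>")); // prints {type=CEO}
        System.out.println(rootAttributes("<Company/>"));                // prints {}
    }
}
```

The DOM does not promise any particular order for the entries of a NamedNodeMap, so the resulting map is best treated as unordered.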
When parsing XML via a DOM parser, each of the three important parts of the XML structure (elements, attributes, and the data) is represented by the Node interface. To process this XML in a meaningful way, you need to create a series of nested loops that start from the document's root node and recursively navigate through the child nodes, then each child node's children, and so on. Then, when you've found the node by name, you need to check node types to be sure you're reading a value or an attribute. For instance, the node data (or value) is a child node of type Node.TEXT_NODE, while an attribute has the type Node.ATTRIBUTE_NODE and is not a child node at all: you reach it through the element's getAttributes() method.
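The recursive navigation described above can be sketched as a depth-first search. This is a hypothetical helper pair (findElement and textOf in a class named RecursiveSearchSketch), not the article's own methods: findElement walks every child list until it finds an element with the requested name, and textOf returns the value of the first Node.TEXT_NODE child. The main method also shows that an attribute is reached through getAttributes() rather than through the child list.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class: recursive DOM traversal with node-type checks.
class RecursiveSearchSketch {
    // Depth-first search: returns the first element named tagName at or below node, or null.
    static Node findElement(Node node, String tagName) {
        if (node.getNodeType() == Node.ELEMENT_NODE && node.getNodeName().equals(tagName)) {
            return node;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node found = findElement(children.item(i), tagName);
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    // Returns the value of the element's first TEXT_NODE child, or "" if there is none.
    static String textOf(Node element) {
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                return child.getNodeValue();
            }
        }
        return "";
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<Company><Executive type=\"CEO\">"
                + "<LastName>Smith</LastName></Executive></Company>");
        System.out.println(textOf(findElement(doc, "LastName")));      // prints Smith
        // The attribute is not in getChildNodes(); it comes from getAttributes()
        Node type = findElement(doc, "Executive").getAttributes().getNamedItem("type");
        System.out.println(type.getNodeType() == Node.ATTRIBUTE_NODE);  // prints true
        System.out.println(type.getNodeValue());                        // prints CEO
    }
}
```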
Here are the helper methods I use most often:
```java
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// ...

// Returns the first node in the list whose name matches tagName (case-insensitive), or null.
protected Node getNode(String tagName, NodeList nodes) {
    for (int x = 0; x < nodes.getLength(); x++) {
        Node node = nodes.item(x);
        if (node.getNodeName().equalsIgnoreCase(tagName)) {
            return node;
        }
    }
    return null;
}

// Returns the value of the node's first text child, or "" if it has none.
protected String getNodeValue(Node node) {
    NodeList childNodes = node.getChildNodes();
    for (int x = 0; x < childNodes.getLength(); x++) {
        Node data = childNodes.item(x);
        if (data.getNodeType() == Node.TEXT_NODE) {
            return data.getNodeValue();
        }
    }
    return "";
}

// Finds a node named tagName in the list and returns its text value, or "".
protected String getNodeValue(String tagName, NodeList nodes) {
    for (int x = 0; x < nodes.getLength(); x++) {
        Node node = nodes.item(x);
        if (node.getNodeName().equalsIgnoreCase(tagName)) {
            NodeList childNodes = node.getChildNodes();
            for (int y = 0; y < childNodes.getLength(); y++) {
                Node data = childNodes.item(y);
                if (data.getNodeType() == Node.TEXT_NODE) {
                    return data.getNodeValue();
                }
            }
        }
    }
    return "";
}

// Returns the value of the named attribute on the given node, or "".
protected String getNodeAttr(String attrName, Node node) {
    NamedNodeMap attrs = node.getAttributes();
    if (attrs == null) {
        return "";  // only element nodes carry attributes
    }
    for (int y = 0; y < attrs.getLength(); y++) {
        Node attr = attrs.item(y);
        if (attr.getNodeName().equalsIgnoreCase(attrName)) {
            return attr.getNodeValue();
        }
    }
    return "";
}

// Finds a node named tagName in the list and returns the named attribute's value, or "".
protected String getNodeAttr(String tagName, String attrName, NodeList nodes) {
    for (int x = 0; x < nodes.getLength(); x++) {
        Node node = nodes.item(x);
        if (node.getNodeName().equalsIgnoreCase(tagName)) {
            // Attributes are not child nodes in the DOM, so read them from the
            // element's attribute map instead of from getChildNodes()
            String value = getNodeAttr(attrName, node);
            if (!value.isEmpty()) {
                return value;
            }
        }
    }
    return "";
}
```
To use these methods, create a DocumentBuilder (obtained from DocumentBuilderFactory), have it parse your XML document given its path and name, navigate to the proper place in the XML hierarchy, and call getNodeValue (or getNodeAttr) for each data item you want to pull out, as shown in the sample code below:
```java
try {
    // Parse the file with the standard JAXP DOM parser
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = builder.parse(new File("mydocument.xml"));

    // Get the document's top-level nodes (the root <Company> element is among them)
    NodeList root = doc.getChildNodes();

    // Navigate down the hierarchy to get to the CEO node
    Node comp = getNode("Company", root);
    Node exec = getNode("Executive", comp.getChildNodes());
    String execType = getNodeAttr("type", exec);

    // Load the executive's data from the XML
    NodeList nodes = exec.getChildNodes();
    String lastName = getNodeValue("LastName", nodes);
    String firstName = getNodeValue("FirstName", nodes);
    String street = getNodeValue("street", nodes);
    String city = getNodeValue("city", nodes);
    String state = getNodeValue("state", nodes);
    String zip = getNodeValue("zip", nodes);

    System.out.println("Executive Information:");
    System.out.println("Type: " + execType);
    System.out.println(lastName + ", " + firstName);
    System.out.println(street);
    System.out.println(city + ", " + state + " " + zip);
} catch (Exception e) {
    e.printStackTrace();
}
```
I realize that many of you are probably already XML parsing veterans, but I'm sure there are some newbies as well. Whether you're experienced or not, I hope you find these helper methods, well, helpful.
Happy coding!
-EJB
More on this theme: Helper Methods for Writing XML in Java.