In this article I will examine a Java XML parser in detail. The parser will be used to parse and display a Scalable Vector Graphics (SVG) file.

Two things become apparent when using Xerces or other full-featured parsers. They carry a large overhead in code size and memory use, and they require a significant learning curve even for basic tasks.

A stripped-down parser may be appropriate when a small program needs to keep overhead low. It may also be appropriate for "closed" XML systems. In a closed system, XML created by one program is parsed and handled by another. In that case, the parser does not need to handle every possible scenario. It only needs to handle all possible output from the program that creates the XML. In such cases an application-specific parser may be the best choice.

This parser will handle elements, attributes, and empty tags.  It will not "validate" the document and it will ignore processing instructions and comments.  Validating the document refers to assuring that the document conforms to a specified DTD. 

Elements are expressed in XML documents by using matching tags of the format
<tag>  element data  </tag>.

Elements can also be nested:
<topic>Great Novels
 <book>
  <title>The Fountainhead </title>
  <author>Ayn Rand </author>
 </book>
 <book>
  <title>The Great Gatsby </title>
  <author> F. Scott Fitzgerald </author>
 </book>
</topic>

Attributes can also be used to convey information in an XML document. In this author element, there are two attributes: name and wife. All of the data is contained in attributes.

<author name=" F. Scott Fitzgerald " wife="Zelda"> </author>

In cases where all information in an element is contained in the attributes, the concept of an "empty tag" is used.  An empty tag is a shorthand method that eliminates the second occurrence of the tag name.  This provides precisely the same information as the previous example:

<author name="F. Scott Fitzgerald " wife="Zelda" />

The parser for this article consists of two classes: Parser and Element.

The Parser constructor takes a string as a parameter.  The string is the filename of the XML file to parse.  Parser contains a getElements method that returns an enumeration of all of the elements in the document.

The Element class holds information about an individual element. It has four important methods: getTagName, getValue, getAttribute, and hasAttribute.

The following listing creates a new Parser called test from a file called "test.xml". Calling getElements returns an enumeration. Each element is checked to see if it is a title element. If it is, the title is printed with a call to getValue. If an author element is encountered, a check is made for the wife attribute with a call to hasAttribute. If hasAttribute returns true, the wife attribute data is printed with a call to getAttribute.


           // Requires java.util.Enumeration; Parser and Element are this article's classes.
           Parser test = new Parser("test.xml");
           Enumeration e = test.getElements();
           while (e.hasMoreElements()) {
               Element elem = (Element) e.nextElement();
               // Print the text of every title element.
               if (elem.getTagName().equals("title")) {
                   System.out.println("title " + elem.getValue());
               } // equal title
               // Print the wife attribute of every author element that has one.
               if (elem.getTagName().equals("author")) {
                   if (elem.hasAttribute("wife")) {
                       System.out.println("wife is  " + elem.getAttribute("wife"));
                   } // has attribute wife
               } // equal author
           } // end while


The parser parses in broad strokes.  The first idea is that the less than character '<' indicates that something needs to be parsed.  The parseElementInfo method provides chunks of data determined by '<'.  Each chunk is passed to the parseElement method.
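
The chunking step can be sketched as follows. This is only an illustration of the idea, not the article's parseElementInfo: the class name ChunkSplitter and the method splitOnOpenAngle are hypothetical, and the sketch assumes the whole document is already in memory as a string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not the article's parseElementInfo): cuts raw XML
// text into chunks, each of which begins at a '<' character.
class ChunkSplitter {

    static List<String> splitOnOpenAngle(String xml) {
        List<String> chunks = new ArrayList<>();
        int start = xml.indexOf('<');              // ignore text before the first tag
        while (start >= 0) {
            int next = xml.indexOf('<', start + 1);
            // A chunk runs up to, but not including, the next '<' (or to end of input).
            chunks.add(next >= 0 ? xml.substring(start, next) : xml.substring(start));
            start = next;
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Prints [<title>War and Peace ] and then [</title>]
        for (String chunk : splitOnOpenAngle("<title>War and Peace </title>")) {
            System.out.println("[" + chunk + "]");
        }
    }
}
```

Each chunk therefore carries a tag together with any character data that follows it, which is what lets the next stage read both the tag name and the element's value from a single string.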

The parseElement method contains logic to determine whether it is handling an empty tag, the beginning of an element or the end of an element. It also checks whether it is handling an element at all.  Processing instructions are indicated by "<?" and comments are indicated by "<!".  Both comments and processing instructions are ignored.

The end of an element is indicated by a string that begins with "</".

An empty tag is indicated by the presence of "/>".

All other information passed to the parseElement method indicates the beginning of an element.

If the XML document contains <title>War and Peace </title>,  parseElementInfo will pass <title>War and Peace  to parseElement.  The '>' will indicate to parseElement that the tag name is "title". 
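
The decision rules above can be written down in a few lines. The sketch below is an assumption about how such a classifier might look, not the article's parseElement: ChunkClassifier, Kind, classify, and tagName are hypothetical names. It applies the tests in the order the text gives them.

```java
// Hypothetical sketch of the classification rules described above.
class ChunkClassifier {

    enum Kind { IGNORED, END_TAG, EMPTY_TAG, START_TAG }

    static Kind classify(String chunk) {
        // Processing instructions ("<?") and comments ("<!") are skipped.
        if (chunk.startsWith("<?") || chunk.startsWith("<!")) return Kind.IGNORED;
        if (chunk.startsWith("</")) return Kind.END_TAG;
        // Simplification: assumes "/>" never appears inside an attribute value.
        if (chunk.contains("/>")) return Kind.EMPTY_TAG;
        return Kind.START_TAG;
    }

    // The tag name is the text after '<' (or "</") up to whitespace, '/', or '>'.
    static String tagName(String chunk) {
        int start = chunk.startsWith("</") ? 2 : 1;
        int end = start;
        while (end < chunk.length()) {
            char c = chunk.charAt(end);
            if (c == '>' || c == '/' || Character.isWhitespace(c)) break;
            end++;
        }
        return chunk.substring(start, end);
    }

    public static void main(String[] args) {
        String chunk = "<title>War and Peace ";
        System.out.println(classify(chunk) + " " + tagName(chunk));
    }
}
```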

The Element class stores an instance of a single element.  The Parser class populates a vector of Elements. That is done in the parseElement method.  An Element is defined by a tagName, a string of data containing the element's information, and an integer to indicate the nest level of this element.

The tagName in the preceding example is "title". The element's string information is "War and Peace".

The nest level cannot be determined by examining a single element. It indicates how deeply the element is nested in the XML document. The root element has nest level 0. The nest level is increased and decreased in parseElement as new elements and end tags are encountered.

Error checking is provided for matching start and end tags.  A stack is used and tagNames are pushed onto the stack for new elements.  They are popped off for end tags.  If the stored tag name does not match the current end tag, an error is given. 
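
A minimal sketch of this check follows. It is not the article's code: TagMatcher and its methods are hypothetical, and java.util.ArrayDeque is used as the stack purely for illustration. The depth of the stack at the moment a start tag arrives also gives that element's nest level, with the root at 0.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: a stack of open tag names used both to verify that
// end tags match and to derive each element's nest level.
class TagMatcher {

    private final Deque<String> open = new ArrayDeque<>();

    // Called for a start tag. Returns the nest level of the new element (root = 0).
    int startTag(String name) {
        int level = open.size();
        open.push(name);
        return level;
    }

    // Called for an end tag. Throws if it does not close the most recent start tag.
    void endTag(String name) {
        if (open.isEmpty()) {
            throw new IllegalStateException("Unexpected end tag </" + name + ">");
        }
        String expected = open.pop();
        if (!expected.equals(name)) {
            throw new IllegalStateException(
                "Expected </" + expected + "> but found </" + name + ">");
        }
    }

    boolean allClosed() {
        return open.isEmpty();
    }

    public static void main(String[] args) {
        TagMatcher m = new TagMatcher();
        System.out.println("topic at level " + m.startTag("topic"));
        System.out.println("book at level " + m.startTag("book"));
        m.endTag("book");
        m.endTag("topic");
        System.out.println("all closed: " + m.allClosed());
    }
}
```

An empty tag such as the author example would not touch the stack at all, since it opens and closes in one step; its nest level is simply the current stack depth.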

In the case of elements with attributes, the value stored will contain the string of attributes.  The Element class parses this data when it is requested. The hasAttribute  and getAttribute methods handle this parsing.  If the attribute does exist, it is put into a Hashtable for subsequent look up. 
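
The parse-on-request approach might look like the sketch below. It is an illustration under assumptions, not the article's Element class: LazyAttributes is a hypothetical name, a regular expression stands in for whatever scanning the real code does, and the sketch assumes attribute values are quoted and do not themselves contain text that looks like another attribute.

```java
import java.util.Hashtable;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: keep the raw attribute string, parse an attribute
// only when it is first requested, then cache it in a Hashtable.
class LazyAttributes {

    private final String raw;                                   // e.g. name="..." wife="Zelda"
    private final Hashtable<String, String> cache = new Hashtable<>();

    LazyAttributes(String raw) {
        this.raw = raw;
    }

    boolean hasAttribute(String name) {
        return getAttribute(name) != null;
    }

    // Returns the attribute value, or null if the attribute is absent.
    String getAttribute(String name) {
        String cached = cache.get(name);
        if (cached != null) return cached;
        String value = scan(name);
        if (value != null) cache.put(name, value);              // only found attributes are cached
        return value;
    }

    // Looks for name="value" or name='value' preceded by start-of-string or whitespace.
    private String scan(String name) {
        Pattern p = Pattern.compile(
            "(?:^|\\s)" + Pattern.quote(name) + "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");
        Matcher m = p.matcher(raw);
        if (!m.find()) return null;
        return m.group(1) != null ? m.group(1) : m.group(2);
    }

    public static void main(String[] args) {
        LazyAttributes a = new LazyAttributes("name=\"F. Scott Fitzgerald \" wife=\"Zelda\"");
        if (a.hasAttribute("wife")) {
            System.out.println("wife is  " + a.getAttribute("wife"));
        }
    }
}
```

Deferring the work this way means a document with many attributes that the application never reads costs nothing beyond storing the raw strings.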

This parser can handle common XML tasks.  This will be illustrated by creating a program that parses a Scalable Vector Graphics file and displays the associated image.

Scalable Vector Graphics (SVG) is a W3C standard for describing two-dimensional graphics in XML. SVG 1.0 was finalized in September 2001. In January 2003, SVG 1.1 was released along with two mobile profiles: SVG Tiny and SVG Basic. SVG Tiny is intended for cell phones and SVG Basic for PDAs. SVG 1.1 is a comprehensive specification that covers vector graphics, complex rendering and transformations, and interactivity.

The SVGImage program reads an SVG file and displays the associated image.  It handles shapes, painting, transformations, and grouping of elements in SVG. 

I will review the infrastructure used to display an image and focus on the role of the parser.

SVGImage extends Java's Canvas class. The Canvas class is a GUI component with no default appearance or event handling. Canvas extends the abstract class Component. Canvas includes the public methods addNotify and paint. These methods provide the infrastructure for rendering. When a Component is made displayable, that is, connected to a native screen resource, its addNotify method is called so that any required setup can be done.

In SVGImage, a BufferedImage named svgImageBuffer is created within addNotify.  A BufferedImage is a Java2D object that will accept drawing commands.  Since a BufferedImage must be created with a width and height, the SVG file is read once to get that information.  

The method renderSVG parses the SVG file and calls the graphics commands that write to svgImageBuffer. renderSVG is called from the Canvas's paint method. The last thing the paint method does is draw the image contained in svgImageBuffer.
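
The off-screen part of this arrangement can be tried without any window at all. The sketch below is an assumption-laden illustration rather than SVGImage itself: OffscreenRect and render are hypothetical names, and the rectangle, color, and stroke width are hard-coded where the real program would take them from the SVG file.

```java
import java.awt.BasicStroke;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;

// Hypothetical sketch of drawing into an off-screen BufferedImage.
class OffscreenRect {

    static BufferedImage render(int width, int height) {
        // The buffer needs its size up front, which is why the article
        // reads the SVG file once before rendering.
        BufferedImage buffer = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = buffer.createGraphics();
        try {
            g.setPaint(Color.RED);
            g.setStroke(new BasicStroke(3f));
            // Outline only: x=10, y=10, width=20, height=40.
            g.draw(new Rectangle2D.Float(10f, 10f, 20f, 40f));
        } finally {
            g.dispose();   // release the graphics context when done
        }
        return buffer;
    }

    public static void main(String[] args) {
        BufferedImage img = render(200, 200);
        System.out.println(img.getWidth() + "x" + img.getHeight());
    }
}
```

In a Canvas subclass, paint would then finish with a call such as g.drawImage(buffer, 0, 0, this) to copy the finished buffer to the screen.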

The work of interpreting the SVG file is done in renderSVG.

The main loop in renderSVG reads individual elements, parses the embedded graphics commands, and renders those commands into the BufferedImage svgImageBuffer. The graphics object svgGraphics draws into that BufferedImage.

The Element method getTagName is used to match elements. Attributes are retrieved using getAttribute. In the listing below, the element contains a line. The attributes x1, y1, x2, and y2 are used to define the line. For rendering, a Java2D path is created and drawn into the BufferedImage.

if (elem.getTagName().equals("line")) {
              // Build a two-point path from the x1, y1, x2, and y2 attributes.
              GeneralPath l = new GeneralPath(GeneralPath.WIND_EVEN_ODD);
              fx1 = Float.parseFloat(elem.getAttribute("x1"));
              fy1 = Float.parseFloat(elem.getAttribute("y1"));
              fx2 = Float.parseFloat(elem.getAttribute("x2"));
              fy2 = Float.parseFloat(elem.getAttribute("y2"));
              l.moveTo(fx1, fy1);
              l.lineTo(fx2, fy2);
              // Stroke the path into the BufferedImage using the current style.
              svgGraphics.setPaint(localStyle.getStrokeColor());
              svgGraphics.setStroke(localStyle.getStroke());
              svgGraphics.draw(l);
            }

The rendering of an individual element is straightforward.  XML in general, and SVG in particular, provide the ability to define relationships between elements. 

In SVG, the group element <g> defines graphics commands that should be applied to all elements within the group.

The following example defines 4 rectangles:

<svg width="200px" height="200px">
<g style= "stroke: red;">
<rect x="10" y="10" width="20" height="40" />
<g style= "fill: yellow;">
<rect x="10" y="60" width="20" height="40" />
</g>
<rect x="110" y="10" width="20" height="40" />
</g>
<rect x="110" y="60" width="20" height="40" />
</svg>

The first rectangle has a red border defined by stroke. The second inherits the red border and is also filled with yellow. The third, because the group with the yellow fill definition has already been closed, precisely matches the first. The fourth rectangle is not contained within any group element, so it has default attributes.

The SVG example above creates an image equivalent to the example below. This example explicitly defines the rectangle attributes:

<rect x="10"  y="10" width="20" height="40" style= "stroke: red;" />
<rect x="10"  y="60" width="20" height="40" style= "stroke: red; fill: yellow;" />
<rect x="110" y="10" width="20" height="40" style= "stroke: red;" />
<rect x="110" y="60" width="20" height="40" />

These are both perfectly valid.  If a project called for XML to be generated from an existing file, the explicit method may be more straightforward.  In an SVG drawing program the group element would be a natural item to implement.

The group element requires special consideration for parsing and rendering. 

In a typical Document Object Model (DOM) implementation, the group element would be a node and the rectangles would be considered child nodes. The graphics commands in the parent node would apply to the children. In DOM, the nodes in an XML document are traversed by handling the node, checking for children, and then handling the children. A method is defined for the node and is applied recursively to the children.

In this parser, the nest level associated with each element is used as an indicator for child nodes. The elements directly under the root element have nest level 1. The nest level increases each time an element is embedded within another element.
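
To see how a flat list of nest levels still encodes the parent and child relationships, consider the sketch below. It is hypothetical (NestLevels and directChildren are not part of the article's parser) and assumes the elements are listed in document order, one entry per start tag or empty tag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: recover an element's direct children from nest levels alone.
class NestLevels {

    // levels[i] is the nest level of the i-th element in document order.
    // Returns the indexes of the direct children of the element at index parent.
    static List<Integer> directChildren(int[] levels, int parent) {
        List<Integer> children = new ArrayList<>();
        for (int i = parent + 1; i < levels.length; i++) {
            if (levels[i] <= levels[parent]) break;               // left the parent's subtree
            if (levels[i] == levels[parent] + 1) children.add(i); // exactly one level deeper
        }
        return children;
    }

    public static void main(String[] args) {
        // An svg root holding a group and a rect; the group holds a rect,
        // an inner group with one rect, and another rect.
        String[] tags = { "svg", "g", "rect", "g", "rect", "rect", "rect" };
        int[] levels  = {  0,     1,   2,      2,   3,      2,      1     };
        for (int child : directChildren(levels, 1)) {
            System.out.println(tags[child] + " at index " + child);
        }
    }
}
```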

The application using the parser is responsible for handling nested elements.  For certain XML files, handling nested elements may never be necessary.  Because of the group element, nesting is significant in SVG.  The code must recognize a new group and handle the end of the group. 

Three classes are required for handling group elements: Group, Style, and Transform. The Group class stores information about each group element. The Style class parses and contains style information such as fill and stroke colors. The Transform class parses and contains transformation information such as scaling, rotating, and translating. A Group object is made up of a Transform object and a Style object.
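
As an illustration of the kind of parsing the Style class has to do, the sketch below splits a style attribute into property names and values. It is not the article's Style class: StyleParser and parse are hypothetical names, and the sketch stops at strings, leaving the conversion to Java2D colors and strokes aside.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: split an SVG style attribute such as
// "stroke: red; fill: yellow;" into property/value pairs.
class StyleParser {

    static Map<String, String> parse(String style) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String declaration : style.split(";")) {
            int colon = declaration.indexOf(':');
            if (colon < 0) continue;                       // skip blank or malformed pieces
            String key = declaration.substring(0, colon).trim();
            String value = declaration.substring(colon + 1).trim();
            if (!key.isEmpty()) props.put(key, value);
        }
        return props;
    }

    public static void main(String[] args) {
        System.out.println(parse("stroke: red; fill: yellow;"));
    }
}
```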

As each new group element is encountered, a new Group object is created.  For subsequent group elements, each new Group object retains data from previous Group objects.  In the rectangle example, a Group object is created for the red-bordered rectangle.  A second group object is created for the red-bordered, yellow-filled rectangle.

Each new Group object is stored in a Hashtable. The key to this Hashtable is based on the nest level of the current group element. The Hashtable contains the current state and all previous states.

By tracking the nest levels of the last element and the current element, it is possible to recognize when the nest level decreases. When it decreases, the previous state is restored from the Hashtable.
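
One way to realize this bookkeeping is sketched below. It is a guess at the mechanism, not the article's Group class: GroupStates, openGroup, and styleFor are hypothetical names, and a plain map of style properties stands in for the Group, Style, and Transform objects. Each group's entry already includes what it inherited, so restoring the previous state is just a lookup at a lower nest level.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

// Hypothetical sketch: inherited group state stored in a Hashtable keyed by nest level.
class GroupStates {

    private final Hashtable<Integer, Map<String, String>> byLevel = new Hashtable<>();

    // Call for every element, in document order. Discards the state of any
    // group the element is not nested inside, then returns the style that applies.
    Map<String, String> styleFor(int nestLevel) {
        byLevel.keySet().removeIf(level -> level >= nestLevel);
        for (int level = nestLevel - 1; level >= 0; level--) {
            Map<String, String> state = byLevel.get(level);
            if (state != null) return state;               // nearest enclosing group
        }
        return Collections.emptyMap();                     // not inside any group
    }

    // Call for a <g> element: its state is the enclosing state plus its own properties.
    void openGroup(int nestLevel, Map<String, String> own) {
        Map<String, String> merged = new HashMap<>(styleFor(nestLevel));
        merged.putAll(own);
        byLevel.put(nestLevel, Collections.unmodifiableMap(merged));
    }

    public static void main(String[] args) {
        GroupStates states = new GroupStates();
        states.openGroup(1, Collections.singletonMap("stroke", "red"));
        System.out.println("rect at level 2: " + states.styleFor(2));
        states.openGroup(2, Collections.singletonMap("fill", "yellow"));
        System.out.println("rect at level 3 has fill: " + states.styleFor(3).get("fill"));
        System.out.println("rect at level 2: " + states.styleFor(2));
        System.out.println("rect at level 1: " + states.styleFor(1));
    }
}
```

Walking the four-rectangle example through this sketch gives a red stroke for the first rectangle, a red stroke and yellow fill for the second, a red stroke again for the third, and no group style for the fourth.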

SVGImage shows that the parser can handle a complex XML file. The parser provides an enumeration for handling elements.  It stores attribute data and parses it when required by the application.  Nest level information is available to the application.