SAX Parser in Java provides an API to parse XML documents. The SAX parser differs from the DOM parser because it doesn't load the complete XML into memory; instead, it reads the XML document sequentially.
javax.xml.parsers.SAXParser provides methods to parse an XML document using event handlers. It wraps an org.xml.sax.XMLReader (available through its getXMLReader() method) and provides overloaded versions of the parse() method to read an XML document from a File, an InputStream, a SAX InputSource, or a String URI. The parser reads the document and notifies a handler of each parsing event, so we need to create our own handler class to process the XML content.
A handler implements the org.xml.sax.ContentHandler interface, which contains callback methods that are notified when an event occurs, for example startDocument(), endDocument(), startElement(), endElement(), and characters() for character data. org.xml.sax.helpers.DefaultHandler provides default, do-nothing implementations of the ContentHandler methods, and we can extend this class to create our own handler; the parse() overloads that take a handler expect a DefaultHandler. It's advisable to extend this class because we usually need to override only a few of the methods, and doing so keeps our code cleaner and more maintainable.
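To make the callback model concrete, here is a minimal sketch that is not part of the original project (EventLogDemo and EventLogger are hypothetical names). It extends DefaultHandler, overrides a few ContentHandler callbacks, and records the order in which the parser fires them. It uses the parse(InputStream, DefaultHandler) overload with an in-memory document, so no file is needed.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class EventLogDemo {

    // Hypothetical handler that records each ContentHandler callback as a string.
    static class EventLogger extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startDocument() {
            events.add("startDocument");
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            events.add("startElement " + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            String text = new String(ch, start, length).trim();
            // skip whitespace-only chunks; each non-blank chunk is logged separately
            if (!text.isEmpty()) {
                events.add("characters " + text);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("endElement " + qName);
        }

        @Override
        public void endDocument() {
            events.add("endDocument");
        }
    }

    public static List<String> logEvents(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        EventLogger logger = new EventLogger();
        // parse(InputStream, DefaultHandler) is one of SAXParser's overloaded parse() methods
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            parser.parse(in, logger);
        }
        return logger.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Employee id=\"1\"><name>Pankaj</name></Employee>";
        for (String event : logEvents(xml)) {
            System.out.println(event);
        }
    }
}
```

For the sample string in main(), the log should read startDocument, startElement Employee, startElement name, characters Pankaj, endElement name, endElement Employee, endDocument. The id attribute does not produce an event of its own; it arrives in the Attributes argument of startElement().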
Let's jump to the SAX parser example program now; I will explain the different features in detail later on.
employees.xml
<?xml version="1.0" encoding="UTF-8"?>
<Employees>
    <Employee id="1">
        <age>29</age>
        <name>Pankaj</name>
        <gender>Male</gender>
        <role>Java Developer</role>
    </Employee>
    <Employee id="2">
        <age>35</age>
        <name>Lisa</name>
        <gender>Female</gender>
        <role>CEO</role>
    </Employee>
    <Employee id="3">
        <age>40</age>
        <name>Tom</name>
        <gender>Male</gender>
        <role>Manager</role>
    </Employee>
    <Employee id="4">
        <age>25</age>
        <name>Meghna</name>
        <gender>Female</gender>
        <role>Manager</role>
    </Employee>
</Employees>
So we have an XML file stored somewhere in the file system, and by looking at it we can conclude that it contains a list of employees. Every Employee has an id attribute and the fields age, name, gender, and role. We will use the SAX parser to parse this XML and create a list of Employee objects. Here is the Employee class representing the Employee element from the XML.
package com.journaldev.xml;

public class Employee {
    private int id;
    private String name;
    private String gender;
    private int age;
    private String role;

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getGender() {
        return gender;
    }

    public void setGender(String gender) {
        this.gender = gender;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    public String getRole() {
        return role;
    }

    public void setRole(String role) {
        this.role = role;
    }

    @Override
    public String toString() {
        return "Employee:: ID=" + this.id + " Name=" + this.name + " Age=" + this.age + " Gender=" + this.gender +
                " Role=" + this.role;
    }
}
Let's create our own SAX parser handler class by extending the DefaultHandler class.
package com.journaldev.xml.sax;

import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import com.journaldev.xml.Employee;

public class MyHandler extends DefaultHandler {

    // List to hold Employee objects (created up front so getEmpList() never returns null)
    private final List<Employee> empList = new ArrayList<>();
    // Employee currently being built
    private Employee emp = null;
    // collects the character data of the current element
    private StringBuilder data = null;

    // getter method for employee list
    public List<Employee> getEmpList() {
        return empList;
    }

    // flags that record which field element we are inside
    boolean bAge = false;
    boolean bName = false;
    boolean bGender = false;
    boolean bRole = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        if (qName.equalsIgnoreCase("Employee")) {
            // read the id attribute of the Employee element
            String id = attributes.getValue("id");
            // initialize Employee object and set id attribute
            emp = new Employee();
            emp.setId(Integer.parseInt(id));
        } else if (qName.equalsIgnoreCase("name")) {
            // set boolean values for fields, will be used in setting Employee variables
            bName = true;
        } else if (qName.equalsIgnoreCase("age")) {
            bAge = true;
        } else if (qName.equalsIgnoreCase("gender")) {
            bGender = true;
        } else if (qName.equalsIgnoreCase("role")) {
            bRole = true;
        }
        // create the data container for this element's text
        data = new StringBuilder();
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (bAge) {
            // age element, set Employee age (trim() tolerates surrounding whitespace)
            emp.setAge(Integer.parseInt(data.toString().trim()));
            bAge = false;
        } else if (bName) {
            emp.setName(data.toString());
            bName = false;
        } else if (bRole) {
            emp.setRole(data.toString());
            bRole = false;
        } else if (bGender) {
            emp.setGender(data.toString());
            bGender = false;
        }

        if (qName.equalsIgnoreCase("Employee")) {
            // add Employee object to list
            empList.add(emp);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // may be called several times for one text node, so append instead of overwriting
        data.append(ch, start, length);
    }
}
MyHandler holds the list of Employee objects as a field, with only a getter method. Employee objects are added to this list in the event handler methods. We also have an Employee field that is used to build the current Employee object; once all its fields are set, it is added to the employee list.
The important methods to override are startElement(), endElement(), and characters().
As SAXParser parses the document, it calls startElement() whenever it finds a start tag. We override this method to set the boolean variables that identify the current element. We also use it to create a new Employee object every time an Employee start tag is found; note how the id attribute is read here to set the Employee object's id field.
characters() is called when SAXParser finds character data inside an element. Note that the SAX parser may split the data into multiple chunks and call characters() multiple times (see the documentation of the ContentHandler characters() method). That's why we use a StringBuilder and collect the data with its append() method.
endElement() is where we use the StringBuilder data to set the Employee object's properties, and where we add the Employee object to the list whenever we find the Employee end tag. Below is the test program that uses MyHandler to parse the above XML into a list of Employee objects.
package com.journaldev.xml.sax;

import java.io.File;
import java.io.IOException;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.SAXException;

import com.journaldev.xml.Employee;

public class XMLParserSAX {

    public static void main(String[] args) {
        SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
        try {
            SAXParser saxParser = saxParserFactory.newSAXParser();
            MyHandler handler = new MyHandler();
            saxParser.parse(new File("/Users/pankaj/employees.xml"), handler);
            // Get Employees list
            List<Employee> empList = handler.getEmpList();
            // print employee information
            for (Employee emp : empList) {
                System.out.println(emp);
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            e.printStackTrace();
        }
    }
}
Here is the output of the above program.
Employee:: ID=1 Name=Pankaj Age=29 Gender=Male Role=Java Developer
Employee:: ID=2 Name=Lisa Age=35 Gender=Female Role=CEO
Employee:: ID=3 Name=Tom Age=40 Gender=Male Role=Manager
Employee:: ID=4 Name=Meghna Age=25 Gender=Female Role=Manager
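MyHandler remembers the current element with four boolean flags. For a flat structure like employees.xml, one common alternative is to buffer the text of every element and decide what to do in endElement() by switching on the element name. The sketch below is an illustration, not the project's code: it assumes Java 7 or later (for switch on strings), uses a hypothetical Emp class as a stand-in for the article's Employee so that it compiles on its own, and reads from a String through the parse(InputSource, DefaultHandler) overload.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SwitchHandlerDemo {

    // Hypothetical minimal stand-in for the article's Employee class.
    static class Emp {
        int id;
        int age;
        String name;
        String gender;
        String role;

        @Override
        public String toString() {
            return "Emp:: ID=" + id + " Name=" + name + " Age=" + age + " Gender=" + gender + " Role=" + role;
        }
    }

    // Hypothetical handler: no boolean flags, the element name decides in endElement().
    static class EmployeeHandler extends DefaultHandler {
        final List<Emp> employees = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private Emp current;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            text.setLength(0); // start collecting this element's text from scratch
            if (qName.equals("Employee")) {
                current = new Emp();
                current.id = Integer.parseInt(attributes.getValue("id"));
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may run several times per text node
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String value = text.toString().trim();
            switch (qName) {
                case "age":      current.age = Integer.parseInt(value); break;
                case "name":     current.name = value; break;
                case "gender":   current.gender = value; break;
                case "role":     current.role = value; break;
                case "Employee": employees.add(current); current = null; break;
                default:         break; // e.g. the <Employees> root element
            }
            text.setLength(0); // don't let this text leak into the next element
        }
    }

    public static List<Emp> parse(String xml) throws Exception {
        EmployeeHandler handler = new EmployeeHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.employees;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Employees>"
                + "<Employee id=\"1\"><age>29</age><name>Pankaj</name>"
                + "<gender>Male</gender><role>Java Developer</role></Employee>"
                + "</Employees>";
        for (Emp e : parse(xml)) {
            System.out.println(e);
        }
    }
}
```

Because each field is assigned when its end tag arrives, the buffered text is complete by then, however many characters() calls delivered it. The trade-off is that this simple version assumes each element name has a single meaning; documents that reuse names at different nesting levels would need a stack of open element names.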
SAXParserFactory provides factory methods to get a SAXParser instance. In XMLParserSAX, we pass a File object to the parse() method along with a MyHandler instance that handles the callback events. SAXParser can be a little confusing at first, but if you are working with a large XML document, it provides a more memory-efficient way to read XML than the DOM parser, because it never builds the whole document tree in memory. That's all for SAX Parser in Java.
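As a side note on the factory and parser, a SAXParserFactory can be configured before the parser is created, and a SAXParser exposes the org.xml.sax.XMLReader it wraps through getXMLReader(). The sketch below (XmlReaderDemo and CountingHandler are hypothetical names) turns on namespace awareness so that startElement() also receives the uri and localName values, and registers the handler on the XMLReader directly instead of passing it to SAXParser.parse().

```java
import java.io.StringReader;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class XmlReaderDemo {

    // Hypothetical handler that counts <Employee> elements; other callbacks keep DefaultHandler's no-op behavior.
    static class CountingHandler extends DefaultHandler {
        int employees;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            // With a namespace-aware factory, localName is populated.
            if ("Employee".equals(localName)) {
                employees++;
            }
        }
    }

    public static int countEmployees(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // report uri and localName, not just qName

        // The SAXParser wraps an XMLReader; the handler can be registered on it directly.
        XMLReader reader = factory.newSAXParser().getXMLReader();
        CountingHandler handler = new CountingHandler();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.employees;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Employees><Employee id=\"1\"/><Employee id=\"2\"/></Employees>";
        System.out.println(countEmployees(xml)); // prints 2
    }
}
```

Going through the XMLReader is mainly useful when you want to register other handler types (for example an ErrorHandler) alongside the ContentHandler; for the employee example, parse(File, DefaultHandler) is enough.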
You can download the project from our GitHub Repository.
Reference: SAXParser, DefaultHandler