Sometimes while programming in Java, we get a String that is actually XML, and to process it we need to convert it to an XML Document (org.w3c.dom.Document). Also, for debugging purposes or to pass it to another function, we might need to convert a Document object to a String. Here I am providing two utility functions.
Document convertStringToDocument(String xmlStr): This method will take input as String and then convert it to DOM Document and return it. We will use InputSource and StringReader for this conversion.
String convertDocumentToString(Document doc): This method will take input as Document and convert it to String. We will use Transformer, StringWriter and StreamResult for this purpose.
package com.journaldev.xml;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
public class StringToDocumentToString {

    public static void main(String[] args) {
        final String xmlStr = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n"+
                "<Emp id=\"1\"><name>Pankaj</name><age>25</age>\n"+
                "<role>Developer</role><gen>Male</gen></Emp>";

        // Round trip: String -> Document -> String
        Document doc = convertStringToDocument(xmlStr);
        String str = convertDocumentToString(doc);
        System.out.println(str);
    }

    // Serializes the Document to a String; returns null only if the transformation fails
    private static String convertDocumentToString(Document doc) {
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer transformer;
        try {
            transformer = tf.newTransformer();
            // below code to remove XML declaration
            // transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(writer));
            String output = writer.getBuffer().toString();
            return output;
        } catch (TransformerException e) {
            e.printStackTrace();
        }
        return null;
    }

    // Parses the XML String into a DOM Document; returns null only if parsing fails
    private static Document convertStringToDocument(String xmlStr) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder;
        try {
            builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xmlStr)));
            return doc;
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }
}
When we run the above program, we get back essentially the same XML that we used to create the DOM Document; only the XML declaration differs slightly.
<?xml version="1.0" encoding="UTF-8"?><Emp id="1"><name>Pankaj</name><age>25</age>
<role>Developer</role><gen>Male</gen></Emp>
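The commented-out line in convertDocumentToString hints that the XML declaration can be dropped from the output. The following is a minimal sketch of that idea, not the article's own code: the class name OmitDeclarationExample and the helper toStringWithoutDeclaration are hypothetical names chosen for illustration, and the sample XML is a shortened version of the one used above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class OmitDeclarationExample {

    // Hypothetical helper: serializes a Document without the leading <?xml ...?> declaration.
    public static String toStringWithoutDeclaration(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Ask the serializer to leave out the XML declaration
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        String xmlStr = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<Emp id=\"1\"><name>Pankaj</name></Emp>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xmlStr)));
        // The printed String starts directly with the root element
        System.out.println(toStringWithoutDeclaration(doc));
    }
}
```

Unlike the utility methods above, this sketch lets exceptions propagate instead of returning null, which tends to make parsing and transformation failures easier to spot.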
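You can use replaceAll("\n|\r", "") to remove newline characters from the String and get it in a compact format. Because String is immutable, replaceAll returns a new String, so use the returned value, for example by returning output.replaceAll("\n|\r", "") from convertDocumentToString.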
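Since calling replaceAll without using its result is an easy mistake to make, here is a small sketch of the difference. The class name CompactXmlExample and the helper removeLineBreaks are hypothetical names used only for this illustration.

```java
public class CompactXmlExample {

    // Hypothetical helper: returns a copy of the input with all line breaks removed.
    public static String removeLineBreaks(String xml) {
        // String is immutable, so replaceAll returns a new String and leaves the original untouched
        return xml.replaceAll("\n|\r", "");
    }

    public static void main(String[] args) {
        String str = "<Emp id=\"1\"><name>Pankaj</name><age>25</age>\n"
                + "<role>Developer</role><gen>Male</gen></Emp>";

        // Result is discarded here, so str still contains the newline
        str.replaceAll("\n|\r", "");
        System.out.println(str.contains("\n")); // true

        // Keep the returned value to get the compact form
        str = removeLineBreaks(str);
        System.out.println(str.contains("\n")); // false
        System.out.println(str);
    }
}
```

Note that this approach removes every line break in the String, including any inside element text, so it is best suited to XML where line breaks carry no meaning.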
I'm using the same code, but the StringWriter truncates before printing the entire XML into the string… can you help me understand why this is happening?
- Rishi Naik
Is your XML very long? Are you running it in Eclipse or on the command line? Try writing it to a File and check whether the full content is written.
- Pankaj
I need to convert XML String to XML SAX document…how can that be done?
- simran
Hi, this line Document doc = builder.parse( new InputSource( new StringReader( xmlStr ) ) ); gives me an error when I’m running it: Fatal Error: XML document structures must start and end within the same entity. My xmlStr = " 1 2 3 "; Is there something I’m not doing right? Thank you for your help and the article!
- German
Your string is not valid XML.
- Pankaj
Thanks Pankaj you r a lifesaver
- Deepu
Hi, I am using the above code example but getting a null value returned for the document.
- ragu
Getting the below error for DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
Exception in thread "main" javax.xml.parsers.FactoryConfigurationError: Provider for class javax.xml.parsers.DocumentBuilderFactory cannot be created
at javax.xml.parsers.FactoryFinder.findServiceProvider(Unknown Source)
at javax.xml.parsers.FactoryFinder.find(Unknown Source)
at javax.xml.parsers.DocumentBuilderFactory.newInstance(Unknown Source)
- RInu
Getting the null from builder.parse( new InputSource( new StringReader( xmlStr ) ) ); … I validated my xml, it’s valid
- Anuj
Executed successfully, but I did not find it useful. I want to convert a doc file into XML.
- ahmad
hi can u please tell me how to convert doc file into xml using java code
- sunil
package com.avankia.sunil;

import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import org.w3c.dom.Document;

public class DocToXmlResumeConvertor {

    // get path of xsl file
    private static String styleSheetPath = SystemManager.getInstance()
            .getConfigUrl().getPath() + "xhtml2fo.xsl";
    // static String styleSheetPath = null;
    static java.util.logging.Logger logger = Logger
            .getLogger(DocToXmlResumeConvertor.class.getName());

    private static Document xml2FO(Document xml, String styleSheetPath) throws Exception {
        DOMSource xmlDomSource = new DOMSource(xml);
        DOMResult domResult = new DOMResult();
        Transformer transformer = getTransformer(styleSheetPath);
        if (transformer == null) {
            throw new Exception("Error in creating transformer");
        }
        try {
            transformer.transform(xmlDomSource, domResult);
        } catch (javax.xml.transform.TransformerException e) {
            logger.log(Level.INFO, "Error in transforming xml to xsl-fo: " + e.getMessage());
            return null;
        }
        return (Document) domResult.getNode();
    }

    private static Transformer getTransformer(String styleSheetPath) {
        try {
            TransformerFactory tFactory = TransformerFactory.newInstance();
            DocumentBuilderFactory dFactory = DocumentBuilderFactory.newInstance();
            dFactory.setNamespaceAware(true);
            DocumentBuilder dBuilder = dFactory.newDocumentBuilder();
            Document xslDoc = dBuilder.parse(new File(styleSheetPath));
            logger.log(Level.INFO, xslDoc.getTextContent());
            DOMSource xslDomSource = new DOMSource(xslDoc);
            return tFactory.newTransformer(xslDomSource);
        } catch (javax.xml.transform.TransformerException e) {
            logger.log(Level.SEVERE, "", e);
            return null;
        } catch (java.io.IOException e) {
            logger.log(Level.SEVERE, "", e);
            return null;
        } catch (javax.xml.parsers.ParserConfigurationException e) {
            logger.log(Level.SEVERE, "", e);
            return null;
        } catch (org.xml.sax.SAXException e) {
            logger.log(Level.SEVERE, "", e);
            return null;
        }
    }

    /*
    private static byte[] fo2PDF(Document foDocument) {
        FopFactory fopFactory = FopFactory.newInstance();
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, out);
            TransformerFactory tFactory = TransformerFactory.newInstance();
            Transformer transformer = tFactory.newTransformer();
            Source src = new DOMSource(foDocument);
            Result res = new SAXResult(fop.getDefaultHandler());
            transformer.transform(src, res);
            return out.toByteArray();
        } catch (Exception ex) {
            logger.log(Level.SEVERE, "", ex);
            return null;
        }
    }
    */

    public static byte[] getXmlResumeBytes(byte[] bytes) throws Exception {
        byte[] XmlBytes = null;
        ByteArrayInputStream input = new ByteArrayInputStream(bytes);
        //final HtmlCleaner cleaner = new HtmlCleaner();
        CleanerProperties props = cleaner.getProperties();
        DomSerializer doms = new DomSerializer(props, true);
        Document xmlDoc = null;
        try {
            TagNode node = cleaner.clean(input, "UTF-8");
            xmlDoc = doms.createDOM(node);
            // System.out.println(xmlDoc.getFirstChild().getTextContent());
        } catch (Exception e) {
            throw e;
        }
        Document foDoc = null;
        try {
            foDoc = xml2FO(xmlDoc, styleSheetPath);
            // System.out.println(foDoc.getFirstChild().getTextContent());
        } catch (Exception e) {
            logger.log(Level.INFO, "ERROR: " + e.getMessage());
            throw e;
        }
        //XmlBytes = fo2PDF(foDoc);
        input.close();
        if (XmlBytes != null) {
            logger.log(Level.INFO, "your doc has been converted into xml");
        } else {
            String errorString = "doc File is not converted into xml properly";
            XmlBytes = errorString.getBytes();
        }
        return XmlBytes;
    }

    public static byte[] readBytes(String fileName) {
        FileInputStream fileInputStream = null;
        byte[] bytes = null;
        try {
            File file = new File(fileName);
            System.out.println(fileName);
            bytes = new byte[(int) file.length()];
            fileInputStream = new FileInputStream(file);
            fileInputStream.read(bytes);
            fileInputStream.close();
            return bytes;
        } catch (Exception ie) {
            bytes = null;
            logger.log(Level.SEVERE, "", ie);
            return bytes;
        }
    }

    public static void main(String[] args) {
        // TODO Auto-generated method stub
        String htmlFileName = "C://Users//raktim//Downloads//ava.doc";
        styleSheetPath = "D:/WORKAREA/AVANKIA/ResumeParser/src/www/WEB-INF/conf/xhtml2fo.xsl";
        File htmlFile = new File(htmlFileName);
        byte[] XmlBytes = new byte[(int) htmlFile.length()];
        File XmlFile = new File(htmlFileName.replace(".doc", ".Xml"));
        FileOutputStream fop = null;
        try {
            pdfBytes = readBytes(htmlFileName);
            fop = new FileOutputStream(XmlFile);
            byte[] newBytes = DocToXmlResumeConvertor.getXmlResumeBytes(pdfBytes);
            fop.write(newBytes);
            fop.flush();
            fop.close();
            System.out.println("Done");
        } catch (Exception e) {
            logger.log(Level.SEVERE, "", e);
        }
    }
}
- sunil
The variable doc always returns null
- mmonikm
Even for me :(
- Sridhar Raj
Please check the method carefully, that’s only in case of an exception.
- Pankaj
Where is the replaceAll() method supposed to be used? I was thinking it should be placed on the string str before printing it out, like so: String str = convertDocumentToString(doc); str.replaceAll("\n|\r", ""); System.out.println(str); But the output doesn’t change…
- nekonutchi
Same problem it doesn’t work…!
- Shailesh
return output.replaceAll("\n|\r", ""); in the convertDocumentToString method. Come on guys, use some brains yourself too.
- Pankaj