Parse XML using Multiple Threads In Java

Aqeel Haider :

I'm reading a large XML file (around 4 GB) in Java using JAXB. I have a good system with SSDs, plenty of RAM and multiple CPU cores, and I want to read that XML file using multiple threads. I have researched this but haven't found a solution yet.

My idea was to read the XML using multiple threads and send chunks of bytes to the XML parser, which should be faster. Before I write that myself, is there an existing implementation of this approach?

My code snippet:

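import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public void parseXML() throws Exception {
    // JAXBContext is expensive to create, so build it once per parse
    JAXBContext ctx = JAXBContext.newInstance(XwaysImage.class);
    Unmarshaller unmar = ctx.createUnmarshaller();
    XMLInputFactory xmlif = XMLInputFactory.newInstance();

    // try-with-resources closes the stream, so no explicit is.close() is needed
    try (InputStream is = new BufferedInputStream(new FileInputStream(xmlFile), XML_READ_BUFFER)) {
        XMLStreamReader sr = xmlif.createXMLStreamReader(is);
        try {
            while (sr.hasNext()) {
                while (this.pause.get()) Thread.sleep(100);   // honour pause requests
                if (this.cancel.get()) break;                  // honour cancel requests

                if (sr.getEventType() == XMLStreamConstants.START_ELEMENT
                        && "ImageFile".equals(sr.getLocalName())) {
                    // unmarshal() consumes the whole <ImageFile> subtree and leaves the
                    // reader on the event after its end tag, so don't call next() here,
                    // or an <ImageFile> that follows immediately would be skipped
                    XwaysImage xim = unmar.unmarshal(sr, XwaysImage.class).getValue();
                    // TODO: process xim
                } else {
                    sr.next();
                }
            }
        } finally {
            sr.close(); // XMLStreamReader is not AutoCloseable
        }
    } catch (Exception e) {
        log.error("Failed to parse " + xmlFile, e);
    }
}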
Alex Chernyshev :

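Since this is a streaming (StAX) parser rather than a DOM-style one, the file is read sequentially in buffered blocks without building a tree, and that low-level reading from disk is already fast, especially from an SSD. So I don't think multi-threaded reading will help there. Splitting the bytes between threads is also hard to do correctly: an arbitrary chunk can start in the middle of a tag or a multi-byte character, and its meaning depends on the nesting and namespace declarations that came before it.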
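A quick way to see the problem with handing raw byte chunks to separate parsers: a standard StAX parser (`javax.xml.stream`) accepts the whole document but rejects a slice that starts partway through it. This is a minimal sketch; the class name `ChunkParseDemo`, the method `parsesOnItsOwn` and the sample XML are hypothetical, made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ChunkParseDemo {

    /** Hypothetical helper: true if bytes [from, to) are a well-formed XML document on their own. */
    static boolean parsesOnItsOwn(byte[] data, int from, int to) {
        try {
            XMLStreamReader sr = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new ByteArrayInputStream(data, from, to - from));
            try {
                while (sr.hasNext()) {
                    sr.next(); // well-formedness errors surface here as XMLStreamException
                }
            } finally {
                sr.close();
            }
            return true;
        } catch (XMLStreamException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<Images><ImageFile>a.jpg</ImageFile><ImageFile>photo.png</ImageFile></Images>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(parsesOnItsOwn(xml, 0, xml.length));  // true
        System.out.println(parsesOnItsOwn(xml, 10, xml.length)); // false: chunk starts inside a tag
    }
}
```

To make byte-level splitting work you would first have to scan for safe record boundaries, and that scan is itself a sequential pass over the file.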

However, processing the retrieved data in multiple threads could improve overall performance. So instead of reading the XML with multiple threads and sending chunks of bytes to the parser, try reading and unmarshalling in a single thread and processing the resulting objects in parallel.
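A minimal sketch of "read in a single thread, process in parallel", assuming the per-record work is independent and thread-safe. To keep it runnable without JAXB (which was removed from the JDK in Java 11), it reads each `<ImageFile>` element's text with `XMLStreamReader.getElementText()` where the question calls `unmar.unmarshal(sr, XwaysImage.class)`. `ParallelProcessingSketch`, `processRecord` and the sample XML are hypothetical stand-ins for the real record type and the `//TODO` work.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class ParallelProcessingSketch {

    // Hypothetical per-record work standing in for the question's "TODO code here".
    static int processRecord(String name) {
        return name.length();
    }

    /** Parses <ImageFile> elements on the calling thread and processes them on a pool. */
    static List<Integer> readSequentiallyProcessInParallel(InputStream in, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Integer>> futures = new ArrayList<>();
        XMLStreamReader sr = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            while (sr.hasNext()) {
                if (sr.next() == XMLStreamConstants.START_ELEMENT
                        && "ImageFile".equals(sr.getLocalName())) {
                    // The reader is only ever touched by this thread; only the
                    // finished, immutable String is handed to the worker pool.
                    String name = sr.getElementText();
                    futures.add(pool.submit(() -> processRecord(name)));
                }
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get()); // waits for each task; keeps document order
            }
            return results;
        } finally {
            sr.close();
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Images><ImageFile>a.jpg</ImageFile><ImageFile>photo.png</ImageFile></Images>";
        List<Integer> r = readSequentiallyProcessInParallel(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), 4);
        System.out.println(r); // [5, 9]
    }
}
```

Parsing stays on one thread because neither `XMLStreamReader` nor a JAXB `Unmarshaller` is safe to share between threads; only the finished objects cross over to the pool. For a 4 GB input the `futures` list grows with the number of records, so in practice you would probably want to bound the backlog (for example with a bounded `BlockingQueue` or a `Semaphore`) so the reader cannot run far ahead of the workers.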
