marknorkin - 2 months ago
Java Question

Spring Jaxb2: How to append batch data to an XML file without reading it into memory?

I need to write data to XML in batches.

These are the domain objects:

@XmlRootElement(name = "country")
public class Country {
    @XmlElements({@XmlElement(name = "town", type = Town.class)})
    private Collection<Town> towns = new ArrayList<>();
}

@XmlRootElement(name = "town")
public class Town {
    private String townName;
    // etc
}

I'm marshalling objects with Jaxb2. The configuration is as follows:

marshaller = new Jaxb2Marshaller();
marshaller.setClassesToBeBound(Country.class, Town.class);

Simple marshalling doesn't work here, for example
marshaller.marshal(country, new StreamResult(new File(fileName)))
- it produces malformed XML.
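
The post does not say exactly how the output ends up malformed. One plausible cause (an assumption, not something the question confirms) is that every marshal call emits a complete document, so appending a second one leaves the file with two root elements, which XML does not allow. A minimal sketch of a well-formedness check with the JDK's StAX parser shows the effect; the class name WellFormedCheck and the sample data are made up for illustration:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class WellFormedCheck {

    /** Returns true if the whole input parses as one well-formed XML document. */
    static boolean isWellFormed(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            // Pull every event; the parser throws on the first well-formedness error
            while (reader.hasNext()) {
                reader.next();
            }
            reader.close();
            return true;
        } catch (XMLStreamException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String oneDocument = "<country><town>Lviv</town></country>";
        // A single document, as produced by one marshal call
        System.out.println(isWellFormed(oneDocument));
        // Two documents written back to back, as a naive append would produce
        System.out.println(isWellFormed(oneDocument + oneDocument));
    }
}
```

The second input has markup after the root element has closed, so a conforming parser should reject it.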

Is there a way to tweak the marshaller so that it creates the file with all the marshalled data if it doesn't exist, or, if it does exist, just appends the data at the end of the XML file?

Also, as these files are potentially large, I don't want to read the whole file into memory, append the data, and then write it back to disk.


I used StAX for the XML processing because it is stream-based, consumes less memory than DOM, and can both read and write, whereas SAX can only parse XML data and cannot write it.
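
As a rough illustration of that combined read-and-write capability, here is a minimal in-memory sketch that copies a document event by event and injects one extra element right after the opening root tag. It assumes a root element named country, as in the question; the class name StaxCopyDemo, the method copyWithExtraTown and the sample town names are made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class StaxCopyDemo {

    /** Copies the input event by event, adding a <town> right after the <country> start tag. */
    static String copyWithExtraTown(String xml, String townName) throws XMLStreamException {
        XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        XMLEventFactory events = XMLEventFactory.newInstance();

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            // Pass every event of the source document through unchanged
            writer.add(event);
            // Only one event is held at a time, so memory use stays small
            if (event.isStartElement()
                    && "country".equals(event.asStartElement().getName().getLocalPart())) {
                writer.add(events.createStartElement("", "", "town"));
                writer.add(events.createCharacters(townName));
                writer.add(events.createEndElement("", "", "town"));
            }
        }
        writer.flush();
        writer.close();
        reader.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        String source = "<country><town>Lviv</town></country>";
        System.out.println(copyWithExtraTown(source, "Kyiv"));
    }
}
```

The same loop works with file streams in place of the string reader and writer, which is where the memory advantage over DOM matters.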

This is the approach I came up with:

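import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Collection;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
// StringUtils is assumed to come from Apache Commons Lang 3
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public enum StAXBatchWriter {
    INSTANCE; // an enum needs at least one constant before its fields; this makes it a singleton

    private static final Logger LOGGER = LoggerFactory.getLogger(StAXBatchWriter.class);

    public void writeUrls(File original, Collection<Town> towns) {
        XMLEventReader eventReader = null;
        XMLEventWriter eventWriter = null;
        try {
            // Move the existing file aside; the new file is written at the original path
            String originalPath = original.getPath();
            File from = new File(original.getParent() + "/old-" + original.getName());
            boolean isRenamed = original.renameTo(from);
            if (!isRenamed)
                throw new IllegalStateException("Failed to rename file: " + original.getPath() + " to " + from.getPath());
            File to = new File(originalPath);

            XMLInputFactory inFactory = XMLInputFactory.newInstance();
            eventReader = inFactory.createXMLEventReader(new FileInputStream(from));

            XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
            eventWriter = outFactory.createXMLEventWriter(new FileWriter(to));

            XMLEventFactory eventFactory = XMLEventFactory.newInstance();

            while (eventReader.hasNext()) {
                XMLEvent event = eventReader.nextEvent();
                // Copy every event from the old file to the new one
                eventWriter.add(event);
                // Right after the opening <country> tag, write the new batch of towns
                if (event.getEventType() == XMLEvent.START_ELEMENT && event.asStartElement().getName().toString().contains("country")) {
                    for (Town town : towns) {
                        writeTown(eventWriter, eventFactory, town);
                    }
                }
            }

            boolean isDeleted = from.delete();
            if (!isDeleted)
                throw new IllegalStateException("Failed to delete old file: " + from.getPath());
        } catch (IOException | XMLStreamException e) {
            LOGGER.error(e.getMessage(), e);
            throw new RuntimeException(e);
        } finally {
            try {
                if (eventReader != null)
                    eventReader.close();
            } catch (XMLStreamException e) {
                LOGGER.error(e.getMessage(), e);
            }
            try {
                if (eventWriter != null)
                    eventWriter.close();
            } catch (XMLStreamException e) {
                LOGGER.error(e.getMessage(), e);
            }
        }
    }

    private void writeTown(XMLEventWriter eventWriter, XMLEventFactory eventFactory, Town town) throws XMLStreamException {
        eventWriter.add(eventFactory.createStartElement("", null, "town"));

        // write town id (the getId() accessor is assumed; Town's id field is not shown in the question)
        eventWriter.add(eventFactory.createStartElement("", null, "id"));
        eventWriter.add(eventFactory.createCharacters(String.valueOf(town.getId())));
        eventWriter.add(eventFactory.createEndElement("", null, "id"));

        // write town name
        if (StringUtils.isNotEmpty(town.getName())) {
            eventWriter.add(eventFactory.createStartElement("", null, "name"));
            eventWriter.add(eventFactory.createCharacters(town.getName()));
            eventWriter.add(eventFactory.createEndElement("", null, "name"));
        }

        // write other fields

        eventWriter.add(eventFactory.createEndElement("", null, "town"));
    }
}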

It's not the best approach: although it is stream-based and it works, it has some overhead. Every time a batch is added, the old file has to be re-read.

It would be nice to have an option to append the data at a specific point in the file (like "append data to that file after line 4"), but it seems this can't be done.
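
For what it's worth, when the insertion point is the very end of the file rather than an arbitrary line, something close to this may be possible with java.io.RandomAccessFile: seek to the final closing tag, overwrite it with the new fragments, and write the closing tag again. The sketch below is not the approach used above and rests on several assumptions: the file is UTF-8, it ends with exactly </country> (no trailing newline or whitespace), and the town fragments are already serialized and escaped. The names TailAppendDemo and appendBeforeClosingTag are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class TailAppendDemo {

    // Assumption: the document's root element is <country> and the file ends with this tag
    private static final byte[] CLOSING_TAG = "</country>".getBytes(StandardCharsets.UTF_8);

    /** Writes pre-serialized town fragments just before the final closing tag. */
    static void appendBeforeClosingTag(Path file, String townsXml) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            long tagStart = raf.length() - CLOSING_TAG.length;
            if (tagStart < 0)
                throw new IllegalStateException("File is too short to contain the closing tag");

            // Read only the last few bytes to verify the assumption about the file's tail
            byte[] tail = new byte[CLOSING_TAG.length];
            raf.seek(tagStart);
            raf.readFully(tail);
            if (!Arrays.equals(tail, CLOSING_TAG))
                throw new IllegalStateException("File does not end with </country>");

            // Overwrite the closing tag with the new fragments, then restore it
            raf.seek(tagStart);
            raf.write(townsXml.getBytes(StandardCharsets.UTF_8));
            raf.write(CLOSING_TAG);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("country", ".xml");
        Files.write(file, "<country><town><name>Lviv</name></town></country>"
                .getBytes(StandardCharsets.UTF_8));
        appendBeforeClosingTag(file, "<town><name>Kyiv</name></town>");
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```

Only the last few bytes of the existing file are read, so the cost of adding a batch does not grow with the file size. The trade-off is that nothing validates the result as XML, so the tail check and the escaping of the fragments carry all the safety.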