Java - Update Elements in Large XML Files


I work with very large XML datasets (1 GB+) and need to backtrack and update specific elements per node, depending on the values of other elements that follow.

For example, in this record/node:

<user>
    <role>Associate</role>
    <team>Hufflepuff</team>
    <experience>7</experience>
</user>

Since "experience" is greater than 5 years, the role needs to be updated from "Associate" to "Senior."

I would like to avoid loading the entire file into memory via the DOM.

Ideally, I would process one "user" at a time and append each result to a new XML file. I started by streaming the file with StAX, but I don't know how to turn the content of each XMLEventWriter event into a usable DOM document that is written to the output XML file and then cleared from memory.

If the description is unclear in any way, please let me know. Any help on this will be greatly appreciated.

Thanks.

Answer by Michael Kay:

With streaming in XSLT 3.0, you can write:

<xsl:template match="user" mode="streamed">
  <xsl:apply-templates select="copy-of(.)" mode="unstreamed"/>
</xsl:template>

and in the unstreamed mode you can then process the (copied) user element as a subtree in memory with no streaming restrictions.

I've done the same with SAX: when you hit a startElement event for user, start building a tree, and when the corresponding endElement event occurs, process that tree any way you like.
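To make the SAX idea concrete, here is a minimal sketch of one way it could look, not a definitive implementation. The class name UserRoleUpdater, the transform helper, and the applyRule method are hypothetical names chosen for illustration. It assumes a non-namespace-aware parser, that "experience" always holds an integer, and that only an "Associate" role should be promoted. Everything outside a user element is copied straight through an XMLStreamWriter; each user is held as a small DOM subtree only until its end tag.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class UserRoleUpdater extends DefaultHandler {
    private final XMLStreamWriter out;
    private final Document factory; // used only to create nodes for the current <user>
    private Element current;        // non-null while we are inside a <user> subtree

    UserRoleUpdater(XMLStreamWriter out) throws Exception {
        this.out = out;
        this.factory = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) throws SAXException {
        try {
            if (current == null && !qName.equals("user")) {
                // Outside any <user>: pass the start tag straight through.
                out.writeStartElement(qName);
                for (int i = 0; i < atts.getLength(); i++) out.writeAttribute(atts.getQName(i), atts.getValue(i));
                return;
            }
            // Start of, or inside, a <user>: grow the in-memory subtree.
            Element e = factory.createElement(qName);
            for (int i = 0; i < atts.getLength(); i++) e.setAttribute(atts.getQName(i), atts.getValue(i));
            if (current != null) current.appendChild(e);
            current = e;
        } catch (XMLStreamException ex) { throw new SAXException(ex); }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        try {
            if (current != null) current.appendChild(factory.createTextNode(new String(ch, start, length)));
            else out.writeCharacters(ch, start, length);
        } catch (XMLStreamException ex) { throw new SAXException(ex); }
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        try {
            if (current == null) { out.writeEndElement(); return; }
            Node parent = current.getParentNode();
            if (parent == null) {        // closing </user>: the subtree is complete
                applyRule(current);
                write(current);
            }
            current = (Element) parent;  // null again once the <user> has been written
        } catch (XMLStreamException ex) { throw new SAXException(ex); }
    }

    // Assumed rule from the question: Associate with experience > 5 becomes Senior.
    private static void applyRule(Element user) {
        Node role = user.getElementsByTagName("role").item(0);
        Node exp = user.getElementsByTagName("experience").item(0);
        if (role == null || exp == null) return;
        boolean associate = "Associate".equals(role.getTextContent().trim());
        if (associate && Integer.parseInt(exp.getTextContent().trim()) > 5) role.setTextContent("Senior");
    }

    // Serialize the finished subtree (elements, attributes, text) to the output stream.
    private void write(Node n) throws XMLStreamException {
        if (n.getNodeType() == Node.ELEMENT_NODE) {
            out.writeStartElement(n.getNodeName());
            NamedNodeMap a = n.getAttributes();
            for (int i = 0; i < a.getLength(); i++) out.writeAttribute(a.item(i).getNodeName(), a.item(i).getNodeValue());
            for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) write(c);
            out.writeEndElement();
        } else if (n.getNodeType() == Node.TEXT_NODE) {
            out.writeCharacters(n.getNodeValue());
        }
    }

    static void transform(InputStream in, OutputStream os) throws Exception {
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(os, "UTF-8");
        w.writeStartDocument();
        SAXParserFactory.newInstance().newSAXParser().parse(in, new UserRoleUpdater(w));
        w.writeEndDocument();
        w.close();
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0]); OutputStream os = new FileOutputStream(args[1])) {
            transform(in, os);
        }
    }
}
```

Run it as java UserRoleUpdater input.xml output.xml. Because the rule is applied only at the closing user tag, it can depend on elements that appear after the one being changed, which is the backtracking the question asks for. Comments and processing instructions are not copied in this sketch.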

I wouldn't use a pull API like StAX for this. I'm sure it can be done, but it's probably more effort.