XMLBeam is a nice XML-to-POJO unmarshaller (via XPath), but it only lets you configure a DocumentBuilder or DocumentBuilderFactory.
TagSoup is a nice SAX parser that lets you parse nasty HTML documents as though they were XML.
I would like to use TagSoup as the XML parser for XMLBeam, so that I can unmarshal nasty HTML to POJOs using XPath.
Is there a way to convert or wrap a SAX parser, so that I can use it as a DocumentBuilder or DocumentBuilderFactory?
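You can wrap the SAX parser in a DocumentBuilder. XMLBeam only uses the parse(InputSource) method of DocumentBuilder, so it's pretty simple: subclass DocumentBuilder, let the SAX parser read the InputSource, and build a DOM Document from the SAX events it reports. The other abstract methods only need trivial implementations.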
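Here is a minimal sketch of such a wrapper. `SaxDocumentBuilder` is a hypothetical name, not part of XMLBeam or TagSoup. It takes a `Supplier<XMLReader>` so any SAX2 `XMLReader` can be plugged in, and it builds the DOM itself from the SAX callbacks. As an assumption of this sketch, it produces a non-namespace-aware DOM (plain element names, namespace URIs discarded) so that XPath such as `//p` matches without prefixes. That matters for TagSoup, which normally reports HTML elements in the XHTML namespace.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.function.Supplier;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** A DocumentBuilder that parses with any SAX XMLReader, e.g. TagSoup's Parser. */
public class SaxDocumentBuilder extends DocumentBuilder {
    // Supplies a fresh reader per parse; SAX readers are not thread-safe.
    private final Supplier<XMLReader> readerFactory;
    private EntityResolver entityResolver;
    private ErrorHandler errorHandler;

    public SaxDocumentBuilder(Supplier<XMLReader> readerFactory) {
        this.readerFactory = readerFactory;
    }

    @Override
    public Document parse(InputSource is) throws SAXException, IOException {
        final Document doc = newDocument();
        XMLReader reader = readerFactory.get();
        if (entityResolver != null) reader.setEntityResolver(entityResolver);
        if (errorHandler != null) reader.setErrorHandler(errorHandler);
        reader.setContentHandler(new DefaultHandler() {
            private Node current = doc; // node that new children are appended to

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Plain DOM Level 1 names; namespace URIs are dropped on purpose.
                Element e = doc.createElement(qName.isEmpty() ? localName : qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    String name = atts.getQName(i).isEmpty() ? atts.getLocalName(i) : atts.getQName(i);
                    e.setAttribute(name, atts.getValue(i));
                }
                current.appendChild(e);
                current = e;
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                current = current.getParentNode();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // A Document node cannot hold text, so skip text outside the root.
                if (current != doc) {
                    current.appendChild(doc.createTextNode(new String(ch, start, length)));
                }
            }
        });
        reader.parse(is);
        doc.normalize(); // merge adjacent text nodes from split characters() calls
        return doc;
    }

    // This builder always produces non-namespace-aware DOMs and never validates.
    @Override public boolean isNamespaceAware() { return false; }
    @Override public boolean isValidating() { return false; }
    @Override public void setEntityResolver(EntityResolver er) { entityResolver = er; }
    @Override public void setErrorHandler(ErrorHandler eh) { errorHandler = eh; }

    // Empty documents need no parsing, so delegate to the JDK's default builder.
    @Override public Document newDocument() { return jdkBuilder().newDocument(); }
    @Override public DOMImplementation getDOMImplementation() { return jdkBuilder().getDOMImplementation(); }

    private static DocumentBuilder jdkBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Demo with the JDK's own SAX parser, so it runs without TagSoup on the classpath.
    public static void main(String[] args) throws Exception {
        SaxDocumentBuilder builder = new SaxDocumentBuilder(() -> {
            try {
                return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            } catch (ParserConfigurationException | SAXException e) {
                throw new IllegalStateException(e);
            }
        });
        Document doc = builder.parse(new InputSource(new StringReader("<p>hi <b>there</b></p>")));
        System.out.println(doc.getDocumentElement().getTextContent()); // prints "hi there"
    }
}
```

With TagSoup on the classpath, the supplier would simply be `() -> new org.ccil.cowan.tagsoup.Parser()`, since TagSoup's `Parser` implements `XMLReader`. Building the DOM by hand, rather than piping a `SAXSource` through an identity `Transformer`, keeps full control over how element names and namespaces end up in the tree.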
Then, elsewhere, you can tell XMLBeam to use your DocumentBuilder through the XMLFactoriesConfig you hand to XBProjector.
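A sketch of that wiring, assuming your XMLBeam version's `org.xmlbeam.config.DefaultXMLFactoriesConfig` has an overridable `createDocumentBuilder()` and that `XBProjector` accepts such a config in its constructor (check the javadoc of the version you use). `Projectors` and `withDocumentBuilder` are hypothetical names. It takes a `Supplier<DocumentBuilder>` so that each call can get a fresh builder, since DocumentBuilder instances are not thread-safe. This fragment needs XMLBeam on the classpath.

```java
import java.util.function.Supplier;
import javax.xml.parsers.DocumentBuilder;
import org.xmlbeam.XBProjector;
import org.xmlbeam.config.DefaultXMLFactoriesConfig;

public final class Projectors {
    // Hypothetical helper: builds an XBProjector whose factories config hands
    // out DocumentBuilders from the given supplier instead of the JDK default.
    public static XBProjector withDocumentBuilder(final Supplier<DocumentBuilder> builders) {
        return new XBProjector(new DefaultXMLFactoriesConfig() {
            @Override
            public DocumentBuilder createDocumentBuilder() {
                return builders.get();
            }
        });
    }
}
```

Pass a supplier that constructs your SAX-backed DocumentBuilder (for example, one wired to TagSoup's `org.ccil.cowan.tagsoup.Parser`), then use the returned projector as usual.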