I'm having a hard time escaping XML so that it can be processed by Java. I'm using JTidy to escape unwanted characters, but I can't get it to remove "<" and ">" from element values such as <tag> capacity < 1000 </tag>
I'm using the code below to escape the input:
public String CleanXML(String input) {
    Tidy tidy = new Tidy();
    tidy.setInputEncoding("UTF-16");
    tidy.setOutputEncoding("UTF-16");
    tidy.setWraplen(Integer.MAX_VALUE);
    tidy.setXmlOut(true);
    tidy.setSmartIndent(true);
    tidy.setXmlTags(true);
    tidy.setMakeClean(true);
    tidy.setForceOutput(true);
    tidy.setQuiet(true);
    tidy.setShowWarnings(false);
    StringReader in = new StringReader(input);
    StringWriter out = new StringWriter();
    tidy.parse(in, out);
    return out.toString();
}
Use the following function.
It uses a regular-expression search to get the values between tags, then removes all non-alphanumeric characters from those values. The regular expression and the basic idea come from "Java regex to extract text between tags".
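A minimal sketch of that approach might look like the code below. The class name `TagValueCleaner`, the method name `cleanTagValues`, and the exact pattern are assumptions made for illustration, not the code from the linked question. The sketch also assumes that whitespace inside a value should be kept, that elements carry no attributes, and that a value never contains text that itself looks like a tag (for example `<1000>`).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: the names and the pattern are illustrative assumptions.
final class TagValueCleaner {

    // <name>value</name>, where the value may contain stray '<' or '>' but
    // nothing that looks like an opening or closing tag (so only leaf
    // elements match). \1 forces the closing tag to repeat the opening name.
    private static final Pattern TAG_VALUE =
            Pattern.compile("<(\\w+)>((?:(?!</?\\w+>).)*?)</\\1>", Pattern.DOTALL);

    static String cleanTagValues(String input) {
        Matcher m = TAG_VALUE.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy everything up to the start of the value unchanged.
            out.append(input, last, m.start(2));
            // Drop every character that is neither alphanumeric nor whitespace
            // (keeping whitespace is an assumption of this sketch).
            out.append(m.group(2).replaceAll("[^A-Za-z0-9\\s]", ""));
            last = m.end(2);
        }
        // Copy the remainder of the input after the last cleaned value.
        out.append(input, last, input.length());
        return out.toString();
    }
}
```

Calling `TagValueCleaner.cleanTagValues("<tag> capacity < 1000 </tag>")` returns the same element with the `<` removed from its value, so the result can then be passed to JTidy or an XML parser. If the comparison operator has to survive, replacing `<` and `>` with `&lt;` and `&gt;` in the same loop, instead of deleting them, would also keep the XML well-formed.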