I'm looking for an XPath 1.0 query that would look somewhat like this:
//*[contains(text(), 'EXAMPLE') and translate(text(), translate(text(), '0123456789', ''), '') != '']
But the problem with this query is that there is no longer the text "EXAMPLE" after it performs the second part of the query, so it ultimately fails.
What I need is to match all elements that contain both the text "EXAMPLE" and any number of any digits.
I'm using Octoparse and it only supports XPath 1.0. Is that possible to achieve at all? I've tried asking ChatGPT a thousand times about this, but it keeps giving me the same illogical queries like the one above, which cannot work due to violating basic logic.
You haven't provided your source XML, so it's not possible to be sure why your query doesn't work for you. My guess, though, is that your bug is due to the behaviour of the translate and contains functions when their first parameter is a node-set containing more than one item. The String Functions section of the XPath 1.0 spec says that a node-set is converted to a string by returning the string-value of the node in the node-set that is first in document order.
So if your XML looked like this:
... then the following expression would return false:
... because the expression text() would return two text nodes which are children of <root>, and the EXAMPLE text is contained in the second of those text nodes.
Perhaps you should be checking the string-value of the elements themselves, rather than the value of their first text node? If you pass the element itself as the first parameter to the string functions, then it will be converted to a concatenation of all the text nodes contained within that element (including those inside child elements). In XPath 1.0 it's not possible to concatenate just the child text nodes and exclude the text nodes within child elements.
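As a concrete illustration of both points, take this hypothetical document (made up for the example, since the actual XML wasn't posted):

```xml
<root>123 <b>bold</b> EXAMPLE</root>
```

Here <root> has two child text nodes, "123 " and " EXAMPLE". Of the two separate expressions below, the first is false, because only the first text node is converted to a string, and the second is true, because the string-value of <root> is "123 bold EXAMPLE":

```xpath
contains(/root/text(), 'EXAMPLE')
contains(/root, 'EXAMPLE')
```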
e.g. you could try:
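A sketch of what that could look like, assuming the only change needed is to use the context node (.) in place of text() in the query from the question:

```xpath
//*[contains(., 'EXAMPLE') and translate(., translate(., '0123456789', ''), '') != '']
```

The inner translate strips the digits from the element's string-value, leaving only the non-digit characters. The outer translate then deletes those characters from the same string-value, so only digits remain. A non-empty result therefore means the element contains at least one digit.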
NB the result of this expression would include not just the leaf elements for which this is true, but also all the ancestors of that element, right up to the root element. To exclude those ancestor elements, you could use this expression:
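One possible form, assuming the same test is simply repeated inside a not() predicate applied to the element's child elements:

```xpath
//*[contains(., 'EXAMPLE') and translate(., translate(., '0123456789', ''), '') != ''][not(*[contains(., 'EXAMPLE') and translate(., translate(., '0123456789', ''), '') != ''])]
```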
That would exclude an element whose text value matched your criteria if it also contained a child element whose text value matched those criteria.