How to unescape JSON/Java with support to ampersand?


Asked by android developer on 14 May 2023 at 13:27 · 490 views

Background

I need to parse a URL string out of some HTML (it appears to be embedded in JSON), so I tried org.apache.commons.text.StringEscapeUtils.unescapeJson.

An example input, showing the start of such a URL:

https:\/\/scontent.cdninstagram.com\/v\/t51.2885-19\/40405422_462181764265305_1222152915674726400_n.jpg?stp=dst-jpg_s150x150\\u0026

The problem

Some characters don't seem to be handled. If I run this:

val test="https:\\/\\/scontent.cdninstagram.com\\/v\\/t51.2885-19\\/40405422_462181764265305_1222152915674726400_n.jpg?stp=dst-jpg_s150x150\\\\u0026\n"
Log.d("AppLog", "${StringEscapeUtils.unescapeJson(test)}")

the result is:

https://scontent.cdninstagram.com/v/t51.2885-19/40405422_462181764265305_1222152915674726400_n.jpg?stp=dst-jpg_s150x150\u0026

You can see that "\u0026" is still in the output. I found that this works around it:

StringEscapeUtils.unescapeJson(input).replace("\\u0026","&").replace("\\/", "/")

This works, but I think I should use something more official, since blindly replacing substrings might break on other inputs.

What I've tried

Looking at the unescapeJson code (which seems to be the same for Java and JSON), I thought that maybe I could just add two rules:

/**based on StringEscapeUtils.unescapeJson, but with addition of 2 more rules*/
fun unescapeUrl(input: String): String {
    val unescapeJavaMap = hashMapOf<CharSequence, CharSequence>(
        "\\\\" to "\\",
        "\\\"" to "\"",
        "\\'" to "'",
        "\\" to StringUtils.EMPTY,
        //added rules:
        "\\u0026" to "&",
        "\\/" to "/"
    )
    val aggregateTranslator = AggregateTranslator(
        OctalUnescaper(),
        UnicodeUnescaper(),
        LookupTranslator(EntityArrays.JAVA_CTRL_CHARS_UNESCAPE),
        LookupTranslator(Collections.unmodifiableMap(unescapeJavaMap))
    )
    return aggregateTranslator.translate(input)
}

This doesn't work. It leaves the string with "\u0026" in it.

The questions

1. What did I do wrong here? How can I fix this?
2. Is it true that it's better to use something similar to the original code instead of using "replace"?

BTW, I use this on Android with Kotlin, but the same can be done in Java on a PC.


**Dmitry Khamitov** · Answer 1 · 2023-05-14T15:10:31+00:00

Let me just give you my working example using StringEscapeUtils.unescapeJson(input) without replace. I've also looked into the StringEscapeUtils source code, which might help you a bit.

Here is my working Kotlin code (Java works the same in my test).

import org.apache.commons.text.StringEscapeUtils

fun main(args: Array<String>) {
    val input = "Hello ampersand \\u0026 and forward slash \\/"
    println(input)

    val output1 = StringEscapeUtils.unescapeJson(input)
    println(output1)

    val output2 = StringEscapeUtils.unescapeJson(input).replace("\\u0026", "&").replace("\\/", "/")
    println(output2)
}

Output:

Hello ampersand \u0026 and forward slash \/ - original input
Hello ampersand & and forward slash /       - StringEscapeUtils.unescapeJson(input)
Hello ampersand & and forward slash /       - StringEscapeUtils.unescapeJson(input).replace...

As you can see, the outputs are identical with or without the replace calls. I'm using org.apache.commons:commons-text:1.10.0.

If we look at their source code, it becomes clear that we don't need to add replace("\\u0026", "&").replace("\\/", "/") because:

- The escaped-unicode representation of the ampersand is handled by UnicodeUnescaper. You can see it in your unescapeUrl, which replicates the UNESCAPE_JAVA implementation.
- The \/ sequence (written "\\/" in source code) is handled by another existing UNESCAPE_JAVA rule, unescapeJavaMap.put("\\", StringUtils.EMPTY), which drops the lone backslash and leaves the slash. This rule is also replicated in your unescapeUrl.

So, answering your questions (NB: also see the UPDATE below taking into account the "broken" input from the author, which was posted later):

1. It isn't obvious what is wrong in your example: using just StringEscapeUtils.unescapeJson(input) works in my Kotlin example (and in Java), as shown above. Maybe it's the version of the commons-text library, though I doubt it. I'm also on a PC, not Android. See the UPDATE below, which explains the "broken" input the author posted later and how to deal with it.
2. Yes, I totally agree. And in this particular case you don't even need "something similar": the out-of-the-box method should be fine, with no customisation at all.

I hope this answer helps. Also, as mentioned in the comments, a good example from you would be very helpful!

UPDATE: Looking at the author's example (posted later), I can see that the escaped-unicode representation of the ampersand is effectively double-escaped in the input: \\u0026 instead of \u0026. That is the problem. If you look at the source of UNESCAPE_JAVA (which UNESCAPE_JSON reuses), you will see that the \\ sequence is transformed into a single backslash \ by the rule unescapeJavaMap.put("\\\\", "\\"). The translator loop then advances the index by 2, because two characters were consumed, which places the index at the u character. The backslash has already been written to the output, so UnicodeUnescaper never sees a \u prefix; u0026 is copied through literally and the result is \u0026.
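The index behaviour described in this UPDATE can be reproduced without the library. The sketch below is a simplified, hypothetical model (the name unescapeOnceModel is made up, and this is not the commons-text code); it assumes only three rules matter here: a backslash followed by u and four hex digits, a doubled backslash, and a lone backslash.

```kotlin
// Simplified, hypothetical model of a single-pass unescaper. This is NOT the
// commons-text implementation; it only mimics the rule order described above
// for the few escapes that matter in this question.
fun unescapeOnceModel(input: String): String {
    val out = StringBuilder()
    var i = 0
    while (i < input.length) {
        val c = input[i]
        if (c != '\\') {
            // Ordinary character: copy it and move on.
            out.append(c)
            i += 1
            continue
        }
        val next = input.getOrNull(i + 1)
        val hex = if (next == 'u' && i + 6 <= input.length) input.substring(i + 2, i + 6) else null
        when {
            // Backslash + 'u' + four hex digits: emit the decoded character.
            hex != null && hex.all { it in '0'..'9' || it in 'a'..'f' || it in 'A'..'F' } -> {
                out.append(hex.toInt(16).toChar())
                i += 6
            }
            // Two backslashes: emit one and skip both, so the index lands on
            // whatever follows and that text is never re-examined as an escape.
            next == '\\' -> {
                out.append('\\')
                i += 2
            }
            // Lone backslash (as in "\/"): drop it; the next character is
            // copied on the following iteration.
            else -> {
                i += 1
            }
        }
    }
    return out.toString()
}
```

In this model, a single-escaped ampersand (backslash, u0026) decodes to &, while the double-escaped form (two backslashes, u0026) comes out as one backslash followed by u0026, which matches the result reported in the question.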

I would say this is an upstream problem: whatever produces the string is sending it badly formatted. Ideally, it should be fixed there so that characters in escaped-unicode format are not double-escaped. Then \\u0026 would become \u0026.

You can also compose your own AggregateTranslator so that it handles this scenario properly. There may be a few options, but they could all be error-prone and stop working properly in other scenarios, so you have to be careful with that.
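One possible shape for such a custom translator, as a sketch only. It assumes commons-text 1.10.0 is on the classpath; the names lenientJsonUnescaper and unescapeUrlLenient are hypothetical, and the rule covers only the double-escaped ampersand from the question. The extra LookupTranslator is tried first, and everything else falls through to the stock StringEscapeUtils.UNESCAPE_JSON translator.

```kotlin
import org.apache.commons.text.StringEscapeUtils
import org.apache.commons.text.translate.AggregateTranslator
import org.apache.commons.text.translate.LookupTranslator

// Hypothetical helper, not part of commons-text. AggregateTranslator asks each
// translator in order and uses the first one that consumes input, so the extra
// rule wins over the stock "two backslashes -> one backslash" rule.
val lenientJsonUnescaper = AggregateTranslator(
    // Key is two backslashes followed by u0026 (seven characters).
    LookupTranslator(mapOf<CharSequence, CharSequence>("\\\\u0026" to "&")),
    // Stock JSON rules handle everything else, including "\/" and a
    // correctly single-escaped ampersand.
    StringEscapeUtils.UNESCAPE_JSON
)

fun unescapeUrlLenient(input: String): String = lenientJsonUnescaper.translate(input)
```

For a made-up input such as "https:\\/\\/example.com\\/a.jpg?stp=x\\\\u0026y=1" (Kotlin literal), this should yield https://example.com/a.jpg?stp=x&y=1. The caveat above applies: an input that legitimately contains an escaped backslash followed by the text u0026 would be changed too.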

You can also run the unescapeJson method twice, as in StringEscapeUtils.unescapeJson(StringEscapeUtils.unescapeJson(input)), and it works for your particular example. But obviously, you could easily over-unescape other input that way.
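To make the over-unescaping risk concrete, here is a small sketch, again assuming commons-text 1.10.0 on the classpath. The helper name unescapeTwice and the sample path are made up for illustration.

```kotlin
import org.apache.commons.text.StringEscapeUtils

// Hypothetical helper: applies the stock JSON unescaper two times in a row.
fun unescapeTwice(input: String): String =
    StringEscapeUtils.unescapeJson(StringEscapeUtils.unescapeJson(input))

// Double-escaped ampersand from the question: the second pass decodes it.
val ampersand = unescapeTwice("\\\\u0026")

// A correctly escaped Windows-style path (C:, two backslashes, new).
// One pass gives C:, one backslash, new. The second pass then reads the
// backslash plus n as a newline escape and corrupts the path.
val onePass = StringEscapeUtils.unescapeJson("C:\\\\new")
val twoPasses = unescapeTwice("C:\\\\new")
```

Here ampersand is "&" as intended, but twoPasses contains a line break where the path separator used to be, while onePass is the correct value.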
