Why does URLDecoder.decode() not work properly when I decode a non-ASCII URL and write it to a .txt file?

I tried to decode this URL: %D0%9A%D1%80%D0%B0%D1%81%D0%B0%D0%B2%D0%B8%D1%86%D0%B5-%D0%BA%D0%BE%D1%82%D0%BE%D1%80%D0%B0%D1%8F-%D0%BD%D1%8E%D1%85%D0%B0%D0%BB%D0%B0-%D1%82%D0%B0%D0%B1%D0%B0%D0%BA. When I wrote the decoded result to a .txt file, it came out as Кра�?авице-котора�?-нюхала-табак, with question marks and replacement characters in place of some letters.

However, this only happens when I build my project into an .exe (packaged as an EXE installer from the NetBeans IDE using the Inno Setup Compiler) and run that. In that case the decoded URL contains question marks and replacement characters. If I run the project from NetBeans 8.2 it works fine: no question marks or replacement characters appear in the URL in the .txt file. I use Java version 1.8.0.
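
One way to check whether the two environments differ in their default character encoding (an assumption about the cause, not something confirmed here) is to print it from both the IDE run and the installed .exe. A minimal sketch; the class name `DefaultCharsetCheck` and the method `describe` are made up for illustration:

```java
import java.nio.charset.Charset;

class DefaultCharsetCheck {
    // Builds a one-line summary of the encoding settings the JVM started with.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Run this both from the IDE and from the packaged application
        // and compare the two lines.
        System.out.println(describe());
    }
}
```

If the IDE run reports UTF-8 and the packaged run reports something like windows-1252, any API that silently falls back to the platform default will behave differently in the two cases.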

Here is my code:

File newPath = new File(pathTextField.getText() + "\\" + filename.matcher(sitemapMatcher.group(i)).replaceAll("$2") + "_" + renameFile + ".txt");
FileWriter writeUrl = new FileWriter(newPath);

URL sitemapUrl = new URL(sitemapMatcher.group(i));
URLConnection sitemapUrlConn = sitemapUrl.openConnection();
sitemapUrlConn.setRequestProperty("User-Agent", userAgentArray[randomUserAgent.nextInt(userAgentArray.length)]);
sitemapUrlConn.connect();

BufferedReader sitemapUrlIn = new BufferedReader(new InputStreamReader(sitemapUrlConn.getInputStream(), StandardCharsets.UTF_8.name()));
String sitemapUrlInputLine;

if (clearCheck.isSelected()) {
    while ((sitemapUrlInputLine = sitemapUrlIn.readLine()) != null) {
        final Pattern sitemapUrlPattern = Pattern.compile(regex);
        final Matcher sitemapUrlMatcher = sitemapUrlPattern.matcher(sitemapUrlInputLine);

        while (sitemapUrlMatcher.find()) {
            for (int j = 1; j <= sitemapUrlMatcher.groupCount(); j++) {
                if (sitemapUrlMatcher.group(j) != null) {
                    if (regexArray.length > 1 && regexArray.length == replaceArray.length) {
                        final Pattern resultCleanUrl = Pattern.compile(regexArray[0]);
                        if (resultCleanUrl.matcher(sitemapUrlMatcher.group(j)).replaceAll(replaceArray[0]) != null) {
                            tempCleanURL = URLDecoder.decode(resultCleanUrl.matcher(sitemapUrlMatcher.group(j)).replaceAll(replaceArray[0]));
                            for (int indexArray = 1; indexArray < regexArray.length; indexArray++) {
                                if (indexArray + 1 == regexArray.length) {
                                    tempCleanURL = tempCleanURL.replaceAll(regexArray[indexArray], replaceArray[indexArray]);
                                    String cleanUrlSpaceText = tempCleanURL.replaceAll("(\\s+)", " ");
                                    writeUrl.write(cleanUrlSpaceText.trim() + "\n");
                                } else {
                                    tempCleanURL = tempCleanURL.replaceAll(regexArray[indexArray], replaceArray[indexArray]);
                                }
                            }
                        } else {
                            tempCleanURL = URLDecoder.decode(sitemapUrlMatcher.group(j));
                            writeUrl.write(tempCleanURL.trim() + "\n");
                        }
                    } else {
                        final Pattern resultCleanUrl = Pattern.compile(regexArray[0]);
                        if (resultCleanUrl.matcher(sitemapUrlMatcher.group(j)).replaceAll(replaceArray[0]) != null) {
                            tempCleanURL = URLDecoder.decode(resultCleanUrl.matcher(sitemapUrlMatcher.group(j)).replaceAll(replaceArray[0]));
                            writeUrl.write(tempCleanURL.trim() + "\n");
                        } else {
                            tempCleanURL = URLDecoder.decode(sitemapUrlMatcher.group(j));
                            writeUrl.write(tempCleanURL.trim() + "\n");
                        }
                    }
                } else {
                    writeUrl.write("sitemapUrlMatcher[j] null \n");
                }
            }
        }
    }
} else {
    while ((sitemapUrlInputLine = sitemapUrlIn.readLine()) != null) {
        final Pattern sitemapUrlPattern = Pattern.compile(regex);
        final Matcher sitemapUrlMatcher = sitemapUrlPattern.matcher(sitemapUrlInputLine);

        while (sitemapUrlMatcher.find()) {
            for (int j = 1; j <= sitemapUrlMatcher.groupCount(); j++) {
                if (sitemapUrlMatcher.group(j) != null) {
                    tempCleanURL = URLDecoder.decode(sitemapUrlMatcher.group(j));
                    writeUrl.write(tempCleanURL.trim() + "\n");
                } else {
                    writeUrl.write("sitemapUrlMatcher[j] null \n");
                }
            }
        }
    }
}

writeUrl.flush();
writeUrl.close();
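
A possible explanation, offered as an assumption rather than a confirmed diagnosis: in Java 8 both the single-argument `URLDecoder.decode(String)` and `new FileWriter(File)` use the platform default encoding, so the output depends on how the JVM was launched. The sketch below names UTF-8 explicitly for both the decode step and the file write. The class name `Utf8UrlDecodeSketch`, the helper names, and the temporary file are hypothetical and only there to make the example self-contained:

```java
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.io.Writer;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8UrlDecodeSketch {
    // Decode percent-escapes as UTF-8 instead of relying on the platform default.
    static String decodeUtf8(String encoded) throws UnsupportedEncodingException {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8.name());
    }

    // Write one line to the target file as UTF-8, regardless of file.encoding.
    static void writeLine(Path target, String line) throws IOException {
        try (Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write(line + "\n");
        }
    }

    public static void main(String[] args) throws IOException {
        // First word of the URL from the question.
        String encoded = "%D0%9A%D1%80%D0%B0%D1%81%D0%B0%D0%B2%D0%B8%D1%86%D0%B5";
        Path target = Files.createTempFile("decoded-url", ".txt");
        writeLine(target, decodeUtf8(encoded));

        // Read the bytes back as UTF-8 and compare with the expected text,
        // written with Unicode escapes so the source file encoding does not matter.
        String readBack = new String(Files.readAllBytes(target), StandardCharsets.UTF_8).trim();
        String expected = "\u041A\u0440\u0430\u0441\u0430\u0432\u0438\u0446\u0435";
        System.out.println(readBack.equals(expected));
        Files.delete(target);
    }
}
```

In the code above this would correspond to passing `StandardCharsets.UTF_8.name()` as the second argument of every `URLDecoder.decode` call and replacing `new FileWriter(newPath)` with a writer that is given a charset, for example `Files.newBufferedWriter(newPath.toPath(), StandardCharsets.UTF_8)`.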

This is the incorrect URL produced when running the .exe file:

[Screenshot: Running with exe]

This is the correct URL produced when running through NetBeans 8.2:

[Screenshot: Running in NetBeans]
