All accented characters are turned into question marks


I have a file that contains accented characters: ÇÍââÇÍ

I need to convert them to ISO-8859-15 encoding.

The code:

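    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.Charset;

    public class Transcode {
        public static void main(String[] args) throws IOException {
            String fileName = "C:/Users/User/AppData/Local/Temp/temp6893820181068878551.txt";
            File file = new File(fileName);

            // Read the whole file into a byte array via a memory-mapped buffer.
            byte[] utf8bytes;
            try (FileInputStream fin = new FileInputStream(file);
                 FileChannel ch = fin.getChannel()) {
                int size = (int) ch.size();
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
                utf8bytes = new byte[size];
                buf.get(utf8bytes);
            }

            // Decodes with the platform default charset, not necessarily UTF-8.
            System.out.println(new String(utf8bytes));
            System.out.println();
            System.out.println();

            Charset utf8charset = Charset.forName("UTF-8");
            Charset iso885915charset = Charset.forName("ISO-8859-15");

            // Decode the bytes as UTF-8.
            String string = new String(utf8bytes, utf8charset);
            System.out.println(string);
            System.out.println();
            System.out.println();

            // Re-encode as ISO-8859-15 and dump the bytes in hex.
            byte[] iso885915bytes = string.getBytes(iso885915charset);
            for (byte b : iso885915bytes)
                System.out.printf("%02x ", b);
            System.out.println();
            System.out.println();

            // Round-trip back to a String to check the result.
            String string2 = new String(iso885915bytes, iso885915charset);
            System.out.println(string2);
        }
    }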

But this is the output I get:

ÇÍââÇÍ


??????


3f 3f 3f 3f 3f 3f 

??????

There are 2 answers

0
AudioBubble (BEST ANSWER)

I found the solution!

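The problem was the file itself.

The original file must be written in UTF-8. Otherwise its bytes are not valid UTF-8: decoding them as UTF-8 yields replacement characters, and those cannot be encoded in ISO-8859-15, so each one comes out as ? (0x3f).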
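A minimal sketch of this, not the asker's actual fix: it writes the sample text to a temporary file explicitly as UTF-8, transcodes it to ISO-8859-15, and then repeats the conversion on bytes that were saved in a single-byte encoding instead. The class name `TranscodeSketch`, the helper `utf8ToLatin9`, and the temporary file are hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class TranscodeSketch {
    static final Charset LATIN9 = Charset.forName("ISO-8859-15");

    // Hypothetical helper: decode the bytes as UTF-8, then re-encode as ISO-8859-15.
    static byte[] utf8ToLatin9(byte[] utf8Bytes) {
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        return text.getBytes(LATIN9);
    }

    static void dump(String label, byte[] bytes) {
        System.out.print(label + ": ");
        for (byte b : bytes) {
            System.out.printf("%02x ", b);
        }
        System.out.println();
    }

    public static void main(String[] args) throws IOException {
        // The sample text from the question, written with escapes so this
        // source file's own encoding does not matter.
        String text = "\u00C7\u00CD\u00E2\u00E2\u00C7\u00CD";

        // Case 1: the file is written explicitly as UTF-8.
        Path file = Files.createTempFile("accented", ".txt");
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
        dump("file saved as UTF-8", utf8ToLatin9(Files.readAllBytes(file)));
        Files.delete(file);

        // Case 2: the same text saved in a single-byte encoding and then
        // wrongly decoded as UTF-8. The malformed input becomes U+FFFD,
        // which ISO-8859-15 cannot represent, so it is encoded as 3f.
        dump("file not saved as UTF-8", utf8ToLatin9(text.getBytes(LATIN9)));
    }
}
```

The first line of output should show `c7 cd e2 e2 c7 cd`, while the second shows only `3f` bytes, which matches the symptom in the question.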

1
David Ekholm

Try normalizing the string before calling .getBytes() on it, i.e. call Normalizer.normalize(string, Normalizer.Form.NFC).

The same-looking accented characters can be represented in different Unicode binary forms. Perhaps only the NFC form can be converted to ISO-8859-15?
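To illustrate the suggestion, here is a small sketch that encodes a decomposed "â" (the letter a followed by the combining circumflex U+0302) with and without NFC normalization. The class name `NormalizeSketch` and the helper `encodeNfc` are hypothetical. Note that decomposed input leaves the base letter intact and only turns the combining mark into `3f`, whereas the output in the question consists entirely of `3f` bytes.

```java
import java.nio.charset.Charset;
import java.text.Normalizer;

class NormalizeSketch {
    static final Charset LATIN9 = Charset.forName("ISO-8859-15");

    // Hypothetical helper: compose to NFC first, then encode as ISO-8859-15.
    static byte[] encodeNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC).getBytes(LATIN9);
    }

    static void dump(String label, byte[] bytes) {
        System.out.print(label + ": ");
        for (byte b : bytes) {
            System.out.printf("%02x ", b);
        }
        System.out.println();
    }

    public static void main(String[] args) {
        // 'a' followed by COMBINING CIRCUMFLEX ACCENT (decomposed form).
        String decomposed = "a\u0302";

        // Without normalization the combining mark has no ISO-8859-15
        // mapping, so the result is 61 3f.
        dump("as is", decomposed.getBytes(LATIN9));

        // After NFC the pair is composed into U+00E2, which encodes as e2.
        dump("after NFC", encodeNfc(decomposed));
    }
}
```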