C++ List files with special nonstandard characters


I'm trying to recursively list the files and sub-directories of a given directory. It works so far (using dirent.h), except for files whose names contain special characters such as the en dash or any Japanese or Chinese characters.

Full Code here https://gist.github.com/VikiMaster2/f14a19aa5cf042f0787467a37a616ded

I only get '?'s for files with odd characters in their names. I understand that such characters cannot be displayed properly in a console and that dirent probably doesn't support non-ASCII characters, but how do I then store all the file paths and put them to use?



There is 1 answer

Shiv

The following is a hexdump of some sample output (generated with the simple command ./a.out>abcd.txt):

00000000  20 20 2d 20 61 2e 6f 75  74 0a 20 20 2d 20 61 62  |  - a.out.  - ab|
00000010  63 64 2e 74 78 74 0a 20  20 2d 20 76 69 65 77 73  |cd.txt.  - views|
00000020  6f 75 72 63 65 2e 63 73  73 0a 20 20 2d 20 e0 a4  |ource.css.  - ..|
00000030  b2 e0 a5 87 0a 20 20 2d  20 74 65 73 74 2e 63 0a  |.....  - test.c.|

and the file itself contains:

- a.out
- abcd.txt
- viewsource.css
- ले
- test.c

As you can see, the non-ASCII name is stored as multibyte characters (the bytes e0 a4 b2 and e0 a5 87 in the dump), and from those bytes you can work out which encoding was used. Once you know the encoding, reading the name back is straightforward.
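To inspect the bytes of a name from inside the program instead of using an external hexdump tool, something like the following could be used. This is only a minimal sketch: to_hex is a hypothetical helper name, not part of dirent.h or any library, and it simply prints whatever bytes it is given.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from any library): render each byte of a
// file name as two lowercase hex digits, separated by single spaces.
std::string to_hex(const std::string& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (i != 0) out += ' ';
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}
```

Calling to_hex on each name returned by readdir and comparing the result with the dump above shows whether the bytes that reach your program are the same ones that ended up in the redirected file.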

The simplest way to find out the encoding is to run the file command:

$ file abcd.txt
abcd.txt: UTF-8 Unicode text

However, this is only how the redirection saved it. You can store the names in any encoding you want, with UTF-8 being a very popular and good choice. After that, all you need to handle is UTF-8. There are libraries that will help you with this, but you can always try to do it yourself.
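If you do want to try it yourself, decoding UTF-8 by hand could look roughly like this. It is a sketch under stated assumptions, not a complete decoder: decode_utf8 is a hypothetical name, malformed input is replaced with U+FFFD, and overlong forms and surrogate values are not rejected.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: turn a UTF-8 byte string into Unicode code points.
// Simplification: malformed bytes become U+FFFD; overlong encodings and
// surrogate code points are NOT rejected.
std::vector<char32_t> decode_utf8(const std::string& bytes) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }  // invalid lead byte

        if (extra > bytes.size() - i - 1) {  // sequence cut off at the end
            out.push_back(0xFFFD);
            break;
        }
        bool ok = true;
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }  // not 10xxxxxx
            cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok) { out.push_back(0xFFFD); ++i; continue; }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

For the six bytes e0 a4 b2 e0 a5 87 from the dump, this yields the two code points U+0932 and U+0947, which together render as ले.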

EDIT 1: Sorry, I did not notice that you are on Windows; the file command I used above is from Linux. I do not know whether Windows has a file command, but you can detect the presence of UTF-8 characters yourself in your code. It is simple to write, and I think you will be able to do it.
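A rough sketch of such a check follows. The function names are hypothetical, and it assumes the names reach your program as raw bytes in a std::string; on Windows that depends on how your dirent.h port converts file names, so it may not hold. The check is structural only (lead byte followed by the right number of continuation bytes) and does not reject overlong or surrogate encodings.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if any byte has its high bit set,
// i.e. the name is not plain ASCII.
bool has_non_ascii(const std::string& s) {
    for (unsigned char c : s) {
        if (c >= 0x80) return true;
    }
    return false;
}

// Hypothetical helper: structural UTF-8 check only. Each lead byte must
// be followed by the expected number of 10xxxxxx continuation bytes.
bool looks_like_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;
        if (lead < 0x80)                extra = 0;
        else if ((lead & 0xE0) == 0xC0) extra = 1;
        else if ((lead & 0xF0) == 0xE0) extra = 2;
        else if ((lead & 0xF8) == 0xF0) extra = 3;
        else return false;                          // invalid lead byte
        if (extra > s.size() - i - 1) return false; // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}
```

A name for which has_non_ascii is true but looks_like_utf8 is false is probably in some other encoding. For example, in Windows-1252 the en dash is the single byte 0x96, which is not valid UTF-8 on its own.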