I'm trying to recursively list the files and sub-directories of a given directory, and it's working so far (using dirent.h), except for files whose names contain special characters such as the en dash or any Japanese or Chinese characters.
Full code here: https://gist.github.com/VikiMaster2/f14a19aa5cf042f0787467a37a616ded
I only get '?'s for files containing odd characters in their names. I understand that such characters cannot be displayed properly in a console and that dirent probably doesn't support non-ASCII characters, but how do I store all the file paths and put them to use then?
The following is a hexdump of some sample output (generated with the simple command ./a.out > abcd.txt):
and the file is:
As you can see, the non-ASCII character is stored as a multibyte sequence, and from its bytes you can work out which encoding was used. Once you know the encoding, reading the name back is straightforward.
The simplest way to find out the encoding is to run the file command, like:
However, this is just how the redirection saved it. You can store the paths in any encoding you want, with UTF-8 being a particularly good choice. Then all you need to handle is the UTF-8 encoding. There are libraries that will help you with this, but you can always try to do it yourself.
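As a rough illustration of handling UTF-8 yourself, here is a minimal sketch that counts the characters (code points) in a UTF-8 string rather than its bytes. The function name utf8_length is made up for this example, and the sketch assumes the input is already valid UTF-8; it relies only on the fact that continuation bytes in UTF-8 always have the bit pattern 10xxxxxx.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the code points in a NUL-terminated string.
 * Assumes the string is valid UTF-8. Every byte that is NOT a continuation
 * byte (10xxxxxx) starts a new code point, so we count exactly those bytes. */
size_t utf8_length(const char *s)
{
    assert(s != NULL);

    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p) {
        if ((*p & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For a name such as "a", en dash, "b" (the en dash is the three bytes E2 80 93 in UTF-8), strlen reports 5 bytes while this function reports 3 characters.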
EDIT 1: I am sorry, I did not notice that you are on Windows, and I used Linux for the file command. I do not know whether Windows has a file command, but you can detect the presence of UTF-8 characters yourself in your code. It is simple to write, and I think you will be able to do it.
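One possible way to do that detection is sketched below. The names utf8_sequence_length and has_multibyte_utf8 are hypothetical, chosen for this example only. The check is structural: it verifies that every lead byte is followed by the right number of continuation bytes, but it does not reject overlong encodings or surrogate code points, so treat it as a starting point rather than a complete validator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: number of bytes in the UTF-8 sequence introduced by
 * lead byte b, or 0 if b cannot start a sequence. */
static size_t utf8_sequence_length(unsigned char b)
{
    if (b < 0x80)           return 1;  /* 0xxxxxxx: plain ASCII        */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx: 2-byte sequence    */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx: 3-byte sequence    */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx: 4-byte sequence    */
    return 0;                          /* continuation or invalid byte */
}

/* Returns true if s is structurally well-formed UTF-8 AND contains at least
 * one multibyte (non-ASCII) character. Returns false for pure ASCII and for
 * byte sequences that are not well-formed UTF-8. */
bool has_multibyte_utf8(const char *s)
{
    assert(s != NULL);

    const unsigned char *p = (const unsigned char *)s;
    bool found = false;

    while (*p != '\0') {
        size_t len = utf8_sequence_length(*p);
        if (len == 0) {
            return false;
        }
        /* Each following byte must be 10xxxxxx. A premature NUL fails this
         * test too, so we never read past the end of the string. */
        for (size_t i = 1; i < len; ++i) {
            if ((p[i] & 0xC0) != 0x80) {
                return false;
            }
        }
        if (len > 1) {
            found = true;
        }
        p += len;
    }
    return found;
}
```

Calling has_multibyte_utf8 on each name you read would tell you which entries need UTF-8-aware handling, provided the names really are delivered to your program as UTF-8, which is an assumption you would need to verify on your platform.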