I have an ASCII-encoded Java properties file containing Unicode escapes (e.g. \u0123) that I need to convert to the UTF-8 format supported since Java 9. Control character escapes (\r, \n, ...) need to stay as they are, but e.g. \u00E4 should become ä (UTF-8 encoded).
The motivation to convert them to UTF-8 is that it simplifies the workflow with translators.
I've tried multiple options with iconv and uconv (from ICU) but was unable to get a good result. Asking ChatGPT also didn't yield a fully working solution.
This is not about troubles with UTF-8 Java properties files in editors or how to get UTF-8 properties files to work in Java pre-9.
RS='\\\u[0-9a-fA-F]{4}' sets the record separator to a regex that matches Unicode escapes.
The {...} END {print ""} block is then run for every record (a string without Unicode escapes, followed by one Unicode escape).
ORS="" makes sure the record separator is not replaced by anything when printing $0.
print $0 prints the string before the record separator.
printf "%c",strtonum("0x"substr(RT,3)) first extracts the hex digits from \u0123 (the substr part), then prepends '0x' so that strtonum interprets them as hex, converts the result to a number (strtonum), and prints it as a character code (printf "%c").
END {print ""} is a no-op but is needed to close the previous {} block.
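For comparison, the same idea (walk through the text, copy everything that is not a Unicode escape, and emit the decoded character for each escape) can be sketched in Python. This is not the gawk command described above, only a minimal sketch under a few assumptions of mine: the input is read as ISO-8859-1 (of which ASCII is a subset), escapes below U+0080 are left untouched so that things like \u000A or \u003D do not change the meaning of the file, an escaped backslash such as \\u00E4 is not treated as an escape, and surrogate pairs are combined into one character. The function names unescape_unicode and convert_file are made up for this example.

```python
import re

# One token per backslash sequence, tried in this order:
#   1. a surrogate pair written as two \uXXXX escapes
#   2. a single \uXXXX escape
#   3. any other backslash + character (\n, \t, \\, line continuation, ...)
# Consuming "\\" as a unit keeps a literal "\\u00E4" from being decoded.
_TOKEN = re.compile(
    r"\\u([dD][89abAB][0-9a-fA-F]{2})\\u([dD][c-fC-F][0-9a-fA-F]{2})"
    r"|\\u([0-9a-fA-F]{4})"
    r"|\\.",
    re.DOTALL,
)


def _replace(match):
    high, low, single = match.group(1), match.group(2), match.group(3)
    if high is not None:
        # Combine the UTF-16 surrogate pair into one code point.
        code = 0x10000 + ((int(high, 16) - 0xD800) << 10) + (int(low, 16) - 0xDC00)
        return chr(code)
    if single is not None:
        code = int(single, 16)
        # Assumption: keep ASCII escapes and lone surrogates exactly as written.
        if code < 0x80 or 0xD800 <= code <= 0xDFFF:
            return match.group(0)
        return chr(code)
    # Any other escape sequence stays unchanged.
    return match.group(0)


def unescape_unicode(text):
    """Replace non-ASCII \\uXXXX escapes in text with the characters they encode."""
    return _TOKEN.sub(_replace, text)


def convert_file(src, dst):
    """Read src as ISO-8859-1 and write the unescaped content to dst as UTF-8."""
    # newline="" preserves the original line endings on both sides.
    with open(src, encoding="latin-1", newline="") as f:
        content = f.read()
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(unescape_unicode(content))
```

Calling convert_file("messages_de.properties", "messages_de.utf8.properties") (hypothetical file names) would write the converted copy next to the original, so the result can be diffed before replacing anything.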