How to convert from Java ASCII properties to UTF8 (Java 9) properties

Question

Asked by Florian on 28 March 2024 at 23:13 (34 views)

I have an ASCII-encoded Java properties file containing Unicode escapes (\u0123) that I need to convert to the new Java 9 UTF-8 format. Control character escapes (\r, \n, ...) need to stay as they are, but e.g. \u00E4 should become ä (UTF-8 encoded).

The motivation for converting to UTF-8 is that it simplifies the workflow with translators.

I've tried multiple options with iconv and uconv (from ICU) but was unable to get a good result. Asking ChatGPT also didn't yield a fully working solution.

This is not about problems with UTF-8 Java properties files in editors, or about how to get UTF-8 properties files to work in Java before version 9.

Original Q&A

There is 1 answer

**Florian** · Answer 1 · 2024-03-28T23:13:13+00:00

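cat messages.properties | awk -v RS='\\\\u[0-9a-fA-F]{4}' '{ORS=""; print $0; if (RT != "") printf "%c",strtonum("0x"substr(RT,3)) } END {print ""}'

This needs GNU awk (gawk), because RT and strtonum are gawk extensions, and a UTF-8 locale so that printf "%c" emits the character as UTF-8.

RS='\\\\u[0-9a-fA-F]{4}' sets the record separator to a regex that matches Unicode escapes.

The {...} block is then run once for every record, i.e. for each stretch of text without a Unicode escape, together with the escape that terminated it (available in RT).

ORS="" makes sure nothing is added after each record when printing $0.

print $0 prints the text before the record separator.

if (RT != "") skips the conversion for the final record, which ends at end of file rather than at an escape. Without this check, strtonum("0x") evaluates to 0 and a NUL byte is appended to the output.

printf "%c",strtonum("0x"substr(RT,3)) first extracts the hex digits from \u0123 (the substr part), then prepends "0x" so that strtonum interprets them as hex, converts the result to a number (strtonum) and prints it as the character with that code (printf "%c").

END {print ""} runs once after the last record. Because ORS is empty by then, it prints nothing, so it is effectively a no-op and can be omitted.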
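The awk one-liner above decodes every \uXXXX match, including escapes for ASCII characters and a u that follows an escaped backslash (\\u0041). Where that matters, the same idea can be sketched in Java. This is only an illustration under stated assumptions: the class name PropertiesUnescaper and its methods are hypothetical, the input is assumed to be readable as ISO-8859-1, and only escapes for non-ASCII characters are decoded so that escapes such as \u0020 or \u003D keep their meaning.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the JDK or of the original answer.
class PropertiesUnescaper {

    /** Decodes non-ASCII backslash-u escapes in one line; everything else is copied as-is. */
    static String unescape(String line) {
        StringBuilder out = new StringBuilder(line.length());
        int i = 0;
        while (i < line.length()) {
            char c = line.charAt(i);
            if (c != '\\' || i + 1 == line.length()) {
                // Plain character, or a trailing backslash (line continuation).
                out.append(c);
                i++;
            } else if (line.charAt(i + 1) == 'u' && hasFourHexDigits(line, i + 2)) {
                int code = Integer.parseInt(line.substring(i + 2, i + 6), 16);
                if (code >= 0x80) {
                    out.append((char) code);    // non-ASCII: emit the real character
                } else {
                    out.append(line, i, i + 6); // ASCII: keep the escape untouched
                }
                i += 6;
            } else {
                // Any other escape (such as \n or an escaped backslash): copy both
                // characters, so an escaped backslash never starts a new escape.
                out.append(line, i, i + 2);
                i += 2;
            }
        }
        return out.toString();
    }

    private static boolean hasFourHexDigits(String s, int from) {
        if (from + 4 > s.length()) {
            return false;
        }
        for (int k = from; k < from + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    // Assumed usage: first argument is the input file, second the output file.
    public static void main(String[] args) throws IOException {
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        List<String> converted = new ArrayList<>();
        for (String line : Files.readAllLines(in, StandardCharsets.ISO_8859_1)) {
            converted.add(unescape(line));
        }
        Files.write(out, converted, StandardCharsets.UTF_8);
    }
}
```

With Java 11 or later this can be run directly as java PropertiesUnescaper.java messages.properties messages.utf8.properties (the output file name is just an example). Working line by line keeps comments and key order intact, but line endings are rewritten to the platform default. Surrogate pairs written as two consecutive escapes end up as a single UTF-8 character, because both halves are appended to the same string before encoding.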
