How do I convert ICU formatted strings into a TMX (Translation Memory eXchange) file?

I am attempting to aggregate multiple data sources and locales into a single TMX translation memory file.

I cannot find good documentation or existing tools for converting into TMX format. These converters are the closest thing I have found, but they do not appear to handle ICU syntax.

So far I have extracted my strings into JSON, which looks something like this:

{
  "foo_id": {
    "en": "This is a test",
    "fr": "Some translation"
  },
  "bar_id": {
    "en": "{count, plural, one{This is a singular} other{This is a test for count #}}",
    "fr": "{count, plural, one{Some translation} other{Some translation for count #}}"
  }
}

Many translation vendors accept ICU formatting when content is submitted and then export their TM as .tmx files, so it feels like this must be a solved problem, but information seems scarce. Does anyone have experience with this? I am using formatjs to write the ICU strings.

There is 1 answer

Answered by domspurling

Since TMX only really supports plain segments with simple placeholders (not plural forms), converting from ICU to TMX is not straightforward.
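To make the "plain segments" point concrete, here is a minimal sketch of how the plain entries from the question's JSON could be written out as TMX with Python's standard library. The function name `build_tmx` and the header values (`creationtool`, `o-tmf`, and so on) are placeholders I picked for illustration, not something any particular tool requires. Note that an ICU plural string passed through this would simply be stored as literal text in one `<seg>`, which is exactly the limitation described above.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(entries, srclang="en"):
    """Build a TMX 1.4 document from {id: {lang: text}} and return it as a string."""
    tmx = ET.Element("tmx", version="1.4")
    # Header attribute values below are illustrative assumptions.
    ET.SubElement(tmx, "header", {
        "creationtool": "icu-json-to-tmx",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "json",
        "adminlang": "en",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for tuid, translations in entries.items():
        # One translation unit per string id, one variant per locale.
        tu = ET.SubElement(body, "tu", tuid=tuid)
        for lang, text in translations.items():
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


entries = {"foo_id": {"en": "This is a test", "fr": "Some translation"}}
print(build_tmx(entries))
```

Each string id becomes one `<tu>`, and each locale becomes one `<tuv>` with a single `<seg>`. There is no slot in that structure for "one" versus "other" variants, which is where the plural forms get stuck.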

Support for ICU seems pretty patchy in translation tools, but there is another format that does a similar job and is better supported: gettext .po. Going via .po to get to TMX might work:

  1. Use the ICU2po tool to convert from ICU to .po format
  2. Import the .po file into a TMS (e.g. Phrase) or a CAT tool (e.g. Trados)
  3. Run the human or machine translation process
  4. Export the translation memory as TMX
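For a feel of what step 1 involves, the sketch below splits a single top-level ICU plural message into its branches, so that each branch is a plain string with a simple placeholder. This is not how ICU2po works internally (I have not checked), and `split_plural` is a hypothetical helper. It assumes the whole message is one plural argument, ignores apostrophe escaping and `offset:`, and naively replaces every `#` with `{arg}`. A real converter should use a proper ICU parser.

```python
def split_plural(message):
    """Return {category: text} if message is a single top-level ICU plural
    argument, otherwise None. Simplified: no escaping or offset support."""
    msg = message.strip()
    if not (msg.startswith("{") and msg.endswith("}")):
        return None
    parts = msg[1:-1].split(",", 2)
    if len(parts) != 3 or parts[1].strip() != "plural":
        return None
    arg, rest = parts[0].strip(), parts[2]

    branches = {}
    i = 0
    while i < len(rest):
        if rest[i].isspace():
            i += 1
            continue
        open_at = rest.find("{", i)
        if open_at == -1:
            return None  # category without a body: treat as malformed
        category = rest[i:open_at].strip()
        # Walk to the matching closing brace, allowing nested arguments.
        depth, j = 1, open_at + 1
        while depth and j < len(rest):
            if rest[j] == "{":
                depth += 1
            elif rest[j] == "}":
                depth -= 1
            j += 1
        if depth:
            return None  # unbalanced braces
        text = rest[open_at + 1 : j - 1]
        # Assumption: '#' always stands for the plural argument's value.
        branches[category] = text.replace("#", "{" + arg + "}")
        i = j
    return branches


print(split_plural(
    "{count, plural, one{This is a singular} other{This is a test for count #}}"
))
# {'one': 'This is a singular', 'other': 'This is a test for count {count}'}
```

The resulting per-category strings are the kind of content a .po plural entry holds, and they are plain enough to sit in a TMX segment, though how a given TMS maps them on import and export is something to verify with the tool itself.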