Why is `tr` replacing one character with two?

I am using tr (GNU coreutils v8.32) to transliterate non-basic-Latin characters into basic Latin, but it either replaces them with characters I didn't specify or emits more than one copy of the desired character.

Example:

% echo é | tr é e
> ee

What's going on?

There are 2 answers

Mark Setchell On BEST ANSWER

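The issue is that GNU tr transliterates single bytes, not characters, and in UTF-8 é is two bytes. tr therefore sees a two-byte SET1 and a one-byte SET2, pads SET2 by repeating its last character, and maps each byte of é to e. You can see the two bytes, plus a linefeed, with xxd: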

echo é | xxd
00000000: c3a9 0a                                  ...
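
The byte-level behaviour can be reproduced without typing é at all. The following is only a sketch under stated assumptions: it writes the two UTF-8 bytes of é as octal escapes (303 251) and sets LC_ALL=C so that tr treats its input as raw bytes. The function name map_bytes_to_e is hypothetical.

```shell
# é in UTF-8 is the byte pair 0xC3 0xA9, i.e. octal 303 251.
# SET1 holds two bytes and SET2 only one; GNU tr pads SET2 with its
# last character, so both bytes end up mapped to e.
# (map_bytes_to_e is an illustrative name, not an existing tool.)
map_bytes_to_e() {
    LC_ALL=C tr '\303\251' 'e'
}

printf '\303\251\n' | map_bytes_to_e   # prints: ee
```

Each input byte produces one output byte, which is why a single é comes out as two e's.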

I think you need to look to sed instead, which matches patterns of any length rather than single bytes:

echo éàéà | sed -e 's/é/elephant/g' -e 's/à/antelope/g'
elephantantelopeelephantantelope
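
For the one-to-one transliteration the question asks about, the same sed approach could be wrapped in a small function. This is a minimal sketch, not a complete solution: the name to_basic_latin and the choice of é and à are illustrative assumptions, and every further character would need its own -e expression.

```shell
# Hypothetical helper (the name is an assumption): map a few accented
# letters to basic Latin. Each s command matches the whole multi-byte
# sequence, so é becomes a single e rather than two.
to_basic_latin() {
    sed -e 's/é/e/g' -e 's/à/a/g'
}

echo 'éàéà' | to_basic_latin   # prints: eaea
```

Because the patterns are literal, this works on the byte sequences directly and leaves all other input untouched.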
Philippe On

é is encoded as two bytes in UTF-8, which is likely why tr produces two e's.

You can achieve the expected effect with:

echo 'é' | iconv -t ASCII//TRANSLIT