Does UTF-8 have more than one version?

Question

Asked by user8240761 on 23 February 2023 at 20:05

I read the following in the PHP Manual, Language Reference > Types, section "Details of the String Type":

Given that PHP does not dictate a specific encoding for strings, one might wonder how string literals are encoded. For instance, is the string "á" equivalent to "\xE1" (ISO-8859-1), "\xC3\xA1" (UTF-8, C form), "\x61\xCC\x81" (UTF-8, D form) or any other possible representation?

What do "UTF-8, C form" and "UTF-8, D form" mean? Are they two versions of UTF-8?
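To see what each byte string in the quoted passage actually contains, one can decode it with the encoding named in parentheses and list the resulting code points. The sketch below uses Python's standard library purely for illustration (the manual passage itself is about PHP); `SAMPLES` and `code_points` are hypothetical names introduced here.

```python
import unicodedata

# Byte strings quoted in the PHP manual, each intended to represent "á",
# paired with the encoding they are written in.
SAMPLES = {
    "ISO-8859-1": (b"\xE1", "latin-1"),
    "UTF-8, C form": (b"\xC3\xA1", "utf-8"),
    "UTF-8, D form": (b"\x61\xCC\x81", "utf-8"),
}


def code_points(raw: bytes, encoding: str) -> list[str]:
    """Decode raw bytes with the given encoding and list code points as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in raw.decode(encoding)]


for label, (raw, encoding) in SAMPLES.items():
    print(f"{label}: {raw!r} -> {code_points(raw, encoding)}")

# The two UTF-8 variants decode to different strings, but they are canonically
# equivalent: normalizing the D-form text to NFC makes them compare equal.
c_text = SAMPLES["UTF-8, C form"][0].decode("utf-8")
d_text = SAMPLES["UTF-8, D form"][0].decode("utf-8")
print(c_text == d_text)                                # False
print(c_text == unicodedata.normalize("NFC", d_text))  # True
```

Under these assumptions, the ISO-8859-1 and C-form strings both decode to the single code point U+00E1, while the D-form string decodes to U+0061 followed by U+0301.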

Answer

**Ayb009** · Answer 1 · 2023-02-23T22:48:51+00:00

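No, there is only one UTF-8. "C form" and "D form" refer to the Unicode normalization forms NFC (Normalization Form C, the composed form) and NFD (Normalization Form D, the decomposed form). They are two canonically equivalent sequences of code points for the same text, and both sequences are encoded with the same UTF-8 rules. NFC uses precomposed characters such as U+00E9 where Unicode defines them, while NFD splits such a character into a base letter followed by combining marks. Example:

(é) in NFC is the single code point U+00E9, which UTF-8 encodes as two bytes: 0xC3 0xA9
(é) in NFD is two code points, U+0065 (e) followed by U+0301 (combining acute accent), which UTF-8 encodes as three bytes: 0x65 0xCC 0x81

Both byte sequences are valid UTF-8 and display as the same character, but they are not byte-for-byte equal, which is why the PHP manual lists them separately. (The single byte 0xE9 is é in ISO-8859-1, not in UTF-8.)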
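The byte sequences above can be reproduced with Python's standard `unicodedata` module. This is an illustrative sketch rather than PHP code, and `hex_bytes` is a hypothetical helper: it normalizes é to NFC and to NFD, then encodes each result with the same UTF-8 codec.

```python
import unicodedata


def hex_bytes(text: str, encoding: str = "utf-8") -> str:
    """Return the encoded form of text as space-separated hex bytes."""
    return " ".join(f"0x{b:02X}" for b in text.encode(encoding))


e_acute = "\u00e9"  # é as a single precomposed code point

nfc = unicodedata.normalize("NFC", e_acute)  # stays U+00E9
nfd = unicodedata.normalize("NFD", e_acute)  # becomes U+0065 U+0301

# Same UTF-8 codec in both cases; only the code point sequence differs.
print("NFC:", [f"U+{ord(c):04X}" for c in nfc], hex_bytes(nfc))  # ['U+00E9'] 0xC3 0xA9
print("NFD:", [f"U+{ord(c):04X}" for c in nfd], hex_bytes(nfd))  # ['U+0065', 'U+0301'] 0x65 0xCC 0x81

# For comparison, the single byte 0xE9 comes from ISO-8859-1 (Latin-1), not UTF-8.
print("Latin-1:", hex_bytes(e_acute, "latin-1"))  # 0xE9

# Recomposing the NFD text gives back the NFC text.
assert unicodedata.normalize("NFC", nfd) == nfc
```

In PHP, the intl extension's `Normalizer` class fills the same role as `unicodedata.normalize` here, converting a string between forms before comparison.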

TechQA.
