What caseless matching algorithm does AutoCAD use to compare layer names?


AutoCAD DXF and DWG files use Unicode strings to identify layers. I've determined experimentally that AutoCAD must apply some form of case folding and normalisation: it considers 'groß' and 'GROSS' to be the same, and 'Am\U+00e9lie' and 'Ame\U+0301lie' to be the same. In my own software, I'd like to know whether two layer names are the same according to AutoCAD. The Default Caseless Matching algorithm from the Unicode standard seems to give me the right answer, but I'd like to be sure.

  1. Can anyone confirm that Default Caseless Matching is the algorithm used by AutoCAD? If it isn't, what is?

  2. Are there test inputs I can use to distinguish between different caseless matching algorithms?


There are 2 answers

Peter Graham On BEST ANSWER

I intercepted the API calls and discovered that AutoCAD 2018 on Windows uses CompareStringW(LOCALE_USER_DEFAULT, NORM_IGNORECASE | SORT_STRINGSORT, ...) to check layer names for equality.
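For anyone who wants to reproduce that comparison from their own code, here is a minimal sketch using Python's ctypes. The function name same_layer_name_win32 is made up for illustration. The constant values are the ones from the Windows SDK headers. Passing -1 as the length (treating both strings as NUL-terminated) is an assumption, since the intercepted call's remaining arguments are elided above. It only works on Windows, and the result depends on the user's locale.

```python
import ctypes
import sys

# Values as defined in the Windows SDK headers.
LOCALE_USER_DEFAULT = 0x0400
NORM_IGNORECASE = 0x00000001
SORT_STRINGSORT = 0x00001000
CSTR_EQUAL = 2  # CompareStringW returns 1 (less), 2 (equal) or 3 (greater)


def same_layer_name_win32(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names with the flags reported above."""
    if sys.platform != "win32":
        raise OSError("CompareStringW is only available on Windows")

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    compare = kernel32.CompareStringW
    compare.argtypes = [
        ctypes.c_uint32,   # Locale
        ctypes.c_uint32,   # dwCmpFlags
        ctypes.c_wchar_p,  # lpString1
        ctypes.c_int,      # cchCount1 (-1 means NUL-terminated)
        ctypes.c_wchar_p,  # lpString2
        ctypes.c_int,      # cchCount2
    ]
    compare.restype = ctypes.c_int

    result = compare(
        LOCALE_USER_DEFAULT, NORM_IGNORECASE | SORT_STRINGSORT, a, -1, b, -1
    )
    if result == 0:  # 0 signals failure
        raise ctypes.WinError(ctypes.get_last_error())
    return result == CSTR_EQUAL
```

Because the call uses LOCALE_USER_DEFAULT, two machines with different regional settings could in principle disagree about the same pair of names.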

nwellnhof On

I don't have a definite answer, but the Unicode standard defines four algorithms for caseless matching:

  1. Default Caseless Matching (D144): This only uses (full) case folding but no normalization. Since you mentioned that Am\U+00e9lie and Ame\U+0301lie match, this variant can definitely be ruled out.

  2. Canonical caseless matching (D145): This uses (standard NFC or NFD) normalization in addition to case folding.

  3. Compatibility caseless matching (D146): This uses the "compatibility" (NFKC or NFKD) normalization form in addition to case folding.

  4. Identifier caseless matching (D147): Like compatibility caseless matching but also ignores Default Ignorable characters.
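To make the differences concrete, here is a rough Python sketch of the first three definitions, following my reading of the formulas in the standard. It assumes that str.casefold() is an acceptable stand-in for the standard's toCasefold (full case folding). D147 is left out because the standard library does not expose NFKC_Casefold or the Default_Ignorable_Code_Point property. The function names are mine.

```python
import unicodedata


def default_caseless(a: str, b: str) -> bool:
    # D144: full case folding only, no normalization.
    return a.casefold() == b.casefold()


def canonical_caseless(a: str, b: str) -> bool:
    # D145: NFD(toCasefold(NFD(X))) on both sides.
    def key(s: str) -> str:
        folded = unicodedata.normalize("NFD", s).casefold()
        return unicodedata.normalize("NFD", folded)

    return key(a) == key(b)


def compatibility_caseless(a: str, b: str) -> bool:
    # D146: NFKD(toCasefold(NFKD(toCasefold(NFD(X))))) on both sides.
    def key(s: str) -> str:
        s = unicodedata.normalize("NFD", s).casefold()
        s = unicodedata.normalize("NFKD", s).casefold()
        return unicodedata.normalize("NFKD", s)

    return key(a) == key(b)


print(default_caseless("gro\u00df", "GROSS"))               # True
print(default_caseless("Am\u00e9lie", "Ame\u0301lie"))      # False
print(canonical_caseless("Am\u00e9lie", "Ame\u0301lie"))    # True
print(canonical_caseless("\u0133", "ij"))                   # False
print(compatibility_caseless("\u0133", "ij"))               # True
```

The results depend on the Unicode version bundled with your Python build, but these particular characters have been stable for a long time.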

So I'd suggest the following additional tests:

  • If \U+0133 (LATIN SMALL LIGATURE IJ with a compatibility mapping) and ij match, then AutoCAD seems to use compatibility normalization, and canonical caseless matching (D145) can be ruled out.

  • If A\U+00adB (SOFT HYPHEN with property Default_Ignorable_Code_Point) and AB match, then AutoCAD seems to ignore Default Ignorable characters, and compatibility caseless matching (D146) can be ruled out.
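If you can script the checks, the probe pairs from this thread can be turned into a small classifier. This is only a sketch: same_layer is a placeholder for however you ask the application whether two names collide (that part is up to you), and canonical_stand_in is a made-up matcher so the example runs on its own. A result that fits one of the four patterns shows the behaviour is consistent with that algorithm, not that the algorithm is actually used.

```python
import unicodedata

# Probe pairs from the question and the tests above.
PROBES = [
    ("gro\u00df", "GROSS"),           # needs full case folding
    ("Am\u00e9lie", "Ame\u0301lie"),  # needs canonical normalization
    ("\u0133", "ij"),                 # needs compatibility normalization
    ("A\u00adB", "AB"),               # needs Default Ignorables to be ignored
]

# Expected match pattern for each Unicode definition.
VERDICTS = {
    (True, False, False, False): "D144 default caseless matching",
    (True, True, False, False): "D145 canonical caseless matching",
    (True, True, True, False): "D146 compatibility caseless matching",
    (True, True, True, True): "D147 identifier caseless matching",
}


def classify(same_layer) -> str:
    """Run every probe through same_layer(a, b) and look up the pattern."""
    outcome = tuple(bool(same_layer(a, b)) for a, b in PROBES)
    return VERDICTS.get(outcome, "no match with D144-D147")


def canonical_stand_in(a: str, b: str) -> bool:
    # Stand-in matcher for the demo: case folding plus NFD normalization.
    def key(s: str) -> str:
        folded = unicodedata.normalize("NFD", s).casefold()
        return unicodedata.normalize("NFD", folded)

    return key(a) == key(b)


print(classify(canonical_stand_in))  # D145 canonical caseless matching
```

Any pattern outside the table (for example, a locale-dependent comparison) falls through to the "no match" verdict.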

It's of course possible that AutoCAD uses none of the Unicode algorithms, but the tests above should help to narrow it down. Please consider posting any additional findings to help other users.