CGGlyph from CTRunGetGlyphs doesn't exist, how to combine manually?


I'm using CTRunGetGlyphs on an NSAttributedString to get the glyphs for the following string. I want to be able to get each glyph by itself to pass to another function.

Note: the ff should be a ligature, and the string contains a Unicode flag emoji at the end.

However, in my font, ff does not exist as a ligature.

"ffifion 🇬🇧"

The glyphs are returned as:

1st CTRun
  - 0 : 398 (ff)
  - 1 : 257
  - 2 : 399 (fi)
  - 3 : 295
  - 4 : 287
  - 5 : 1

2nd CTRun
  - 0 : 428 

I'm looping over the glyphs:

extension NSMutableAttributedString {
    func readGlyphs(size: CGSize) {

        var output: [Glyph] = []
        var characterCounter: Int = 0

        let attributedString = self

        let string = attributedString.string

        let framesetter = CTFramesetterCreateWithAttributedString(attributedString)
        let bounds = CGRect(x: 0, y: 0, width: size.width, height: size.height)

        let frame = CTFramesetterCreateFrame(framesetter, CFRange(), CGPath(rect: bounds, transform: nil), nil)

        let lines = CTFrameGetLines(frame) as? [CTLine] ?? []

        let linesCount = lines.count

        let lineOrigins: [CGPoint] = [CGPoint](unsafeUninitializedCapacity: linesCount) { (bufferPointer, count) in
            if let baseAddress = bufferPointer.baseAddress {
                CTFrameGetLineOrigins(frame, CFRange(), baseAddress)
                count = linesCount
            }
        }

        for i in 0..<lineOrigins.count {
            let line = lines[i]
            guard let runs = CTLineGetGlyphRuns(line) as? [CTRun] else {
                continue
            }

            for run in runs {

                let runGlyphsCount = CTRunGetGlyphCount(run)
                let glyphPositions = [CGPoint](unsafeUninitializedCapacity: runGlyphsCount) { (bufferPointer, count) in
                    if let baseAddress = bufferPointer.baseAddress {
                        CTRunGetPositions(run, CFRange(), baseAddress)
                        count = runGlyphsCount
                    }
                }

                let glyphs = [CGGlyph](unsafeUninitializedCapacity: runGlyphsCount) { (bufferPointer, count) in
                    if let baseAddress = bufferPointer.baseAddress {
                        CTRunGetGlyphs(run, CFRange(), baseAddress)
                        count = runGlyphsCount
                    }
                }

                // Skip a run that has no readable attributes instead of aborting the whole method.
                guard let attributes = CTRunGetAttributes(run) as NSDictionary as? [String: Any] else { continue }

                // swiftlint:disable force_cast
                let font = attributes["NSFont"] as! CTFont
                // swiftlint:enable force_cast

                let map = createUnicodeFontMap(ctFont: font)

                var indices = Array(repeating: CFIndex(), count: runGlyphsCount)
                CTRunGetStringIndices(run, CFRange(), &indices)

                for k in 0..<glyphs.count {

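                    // NOTE: k is a glyph index within the run, not a character index, so this
                    // lookup only matches when glyphs and characters correspond one-to-one.
                    let char: Character = Array(string)[k]
                    let scalar: UnicodeScalar? = map[glyphs[k]]
                    // Offset (in UTF-16 code units) of the first character this glyph represents.
                    let stringIndex = indices[k]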

                    let charValue: String
                    if let scalar {
                        charValue = String(scalar)
                    } else {
                        charValue = String(char)
                    }


                    //processCharacter(char: ?????)

                    characterCounter += 1
                }
            }
        }

    }
}

Used as

NSMutableAttributedString(string: "ffifion 🇬🇧").readGlyphs(size: CGSize(width: 380, height: 10_000))

The Problem

The map has no scalar for the first CGGlyph (the lookup returns nil), so how do I go about combining the first character with the next to make "ff", while still allowing valid emoji and found glyphs to work? I feel like I should be looping over characters and combining them until I get a valid scalar, but that also feels wrong.

Has anyone else dealt with this? I need a solution to work with any string.

I'm using the following function to build a map from CGGlyph to UnicodeScalar:

func createUnicodeFontMap(ctFont: CTFont) -> [CGGlyph: UnicodeScalar] {

    let charset = CTFontCopyCharacterSet(ctFont) as CharacterSet

    var glyphToUnicode = [CGGlyph: UnicodeScalar]() // Start with empty map.

    // Enumerate all Unicode scalar values from the character set:
    for plane: UInt8 in 0...16 where charset.hasMember(inPlane: plane) {
        for unicode in UTF32Char(plane) << 16 ..< UTF32Char(plane + 1) << 16 {
            if let uniChar = UnicodeScalar(unicode), charset.contains(uniChar) {

                // Get glyph for this `uniChar` ...
                let utf16 = Array(uniChar.utf16)
                var glyphs = [CGGlyph](repeating: 0, count: utf16.count)
                if CTFontGetGlyphsForCharacters(ctFont, utf16, &glyphs, utf16.count) {
                    // ... and add it to the map.
                    glyphToUnicode[glyphs[0]] = uniChar
                }
            }
        }
    }

    return glyphToUnicode
}

I've tried playing with the loop to get the correct glyph; however, I feel like the approach may be wrong.

Update

This code seems to give me what I need:

var char: String = ""
let firstIndex = runIndices[k]
if runIndices.indices.contains(k + 1) {
    let slice = firstIndex..<runIndices[k + 1]
    char = String(Array(string)[slice])
} else {
    if Array(string).indices.contains(runIndices[k]) {
        char = String(Array(string)[runIndices[k]])
    }
}
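
One caveat with the snippet above, offered as a sketch rather than a definitive fix: the indices filled in by CTRunGetStringIndices are offsets into the string's UTF-16 code units, while Array(string) is indexed by Character. The two drift apart once the text contains characters outside the Basic Multilingual Plane, such as emoji. The minimal sketch below slices by UTF-16 offsets instead. The helper name substringsForGlyphs, its parameters, and the sample indices are hypothetical (inferred from the glyph listing in the question, not captured from Core Text); in real code the indices would come from CTRunGetStringIndices and the run range from CTRunGetStringRange. It assumes left-to-right text with ascending indices.

```swift
/// Hypothetical helper (not a Core Text API): returns, for each glyph in a run,
/// the piece of `string` that the glyph represents.
/// - glyphStringIndices: UTF-16 offsets, one per glyph, in ascending order.
/// - runRange: the UTF-16 range of the whole run.
func substringsForGlyphs(in string: String,
                         glyphStringIndices: [Int],
                         runRange: Range<Int>) -> [String] {
    let units = Array(string.utf16)
    var result: [String] = []
    for (k, start) in glyphStringIndices.enumerated() {
        // A glyph's text ends where the next glyph's text begins,
        // or at the end of the run for the last glyph.
        let next = k + 1 < glyphStringIndices.count
            ? glyphStringIndices[k + 1]
            : runRange.upperBound
        // Guard against a non-ascending pair; such a glyph yields an empty string.
        let end = max(start, next)
        result.append(String(decoding: units[start..<end], as: UTF16.self))
    }
    return result
}

// Assumed sample data: "ffifion " followed by the two regional indicators G and B.
let sample = "ffifion \u{1F1EC}\u{1F1E7}"
let firstRun = substringsForGlyphs(in: sample,
                                   glyphStringIndices: [0, 2, 3, 5, 6, 7],
                                   runRange: 0..<8)
let secondRun = substringsForGlyphs(in: sample,
                                    glyphStringIndices: [8],
                                    runRange: 8..<12)
print(firstRun)   // ["ff", "i", "fi", "o", "n", " "]
print(secondRun)  // the flag as a single string
```

When one character is drawn with several glyphs, those glyphs can share a string index, in which case all but the last of them get an empty string here; whether that is acceptable depends on what the downstream function expects.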

It's becoming very clear that fonts and Unicode are a very deep subject.

1 answer

Rob Napier

If I understand your question, I believe you're trying to take a sequence of glyphs and convert them into Unicode code points. That's not possible because many glyphs have no Unicode code point. Glyphs are purely a function of fonts. Fonts are free to create glyphs for anything they want. Most famously Zapfino has a glyph for the sequence "Zapfino". Open TextEdit, select Zapfino, and start typing Z-a-p-f-i-n, and when you type 'o' you'll see it pop into place.

[Image: Zapfino ligature]

That entire "Zapfino" is one glyph (it will be returned as a single CGGlyph). There is no such Unicode code point or character or anything else. It's specific to this font. Once in place, the cursor still can select individual characters within the glyph. It's purely about how it's drawn.

This is the same as the ff ligature. There does happen to be a Unicode character LATIN SMALL LIGATURE FF, but it's not involved here at all (and I think that's the source of your confusion). When you type ff and the font substitutes a glyph, that does not change the string. It's still two individual f's. The font just has a special way to draw them that has nothing to do with the special Unicode character. Not all fonts do this. Some may kern them closely together so they overlap a little. The font may be designed so ff looks nice without any special handling. Or ff might just look messy. It completely depends on the font.
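
To illustrate the distinction in plain Swift (a small sketch using only the standard library; the variable names are illustrative): the typed text stays two separate f's, while LATIN SMALL LIGATURE FF (U+FB00) is a different, single code point that is not equal to them.

```swift
let typed = "ff"               // two LATIN SMALL LETTER F, as typed
let ligatureChar = "\u{FB00}"  // LATIN SMALL LIGATURE FF, a separate code point

print(typed.unicodeScalars.count)         // 2
print(ligatureChar.unicodeScalars.count)  // 1
print(typed == ligatureChar)              // false
```

Whatever glyph the font chooses to draw for the two f's, the string itself keeps holding the first form.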

In many cases, multiple Unicode character sequences map to the same glyph. In most cases, the 2-code point sequence LATIN SMALL LETTER E, COMBINING ACUTE ACCENT will be rendered using the same glyph as LATIN SMALL LETTER E WITH ACUTE: é.

On the other hand, the two code point sequence LATIN SMALL LETTER X, COMBINING ACUTE ACCENT will generally be rendered as x́, but is two glyphs, one on top of the other. Swift will correctly treat this as one Character, even though it's two glyphs. So there can be more characters than glyphs or more glyphs than characters. It all depends on the font.
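
The string side of these two cases can be checked in plain Swift (a minimal sketch; the variable names are illustrative, and how many glyphs each sequence produces still depends on the font, which this code does not inspect):

```swift
let precomposed = "\u{E9}"   // LATIN SMALL LETTER E WITH ACUTE
let decomposed = "e\u{301}"  // LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
let xAcute = "x\u{301}"      // LATIN SMALL LETTER X + COMBINING ACUTE ACCENT

// Swift compares strings by canonical equivalence, so both spellings are equal.
print(precomposed == decomposed)        // true

// Each sequence is a single Character built from two code points.
print(decomposed.count)                 // 1
print(decomposed.unicodeScalars.count)  // 2
print(xAcute.count)                     // 1
print(xAcute.unicodeScalars.count)      // 2
```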

This becomes more complex when considering font cascade lists. Fonts generally support only a subset of characters. For example, 🇬🇧 is generally not going to be in any of your fonts. It's most likely coming from the Apple Color Emoji font, which is one of the many fallbacks the system uses when a font does not support a given character. Note that 🇬🇧 is not a Unicode character in any case. It's a sequence of REGIONAL INDICATOR SYMBOL LETTER G (🇬), REGIONAL INDICATOR SYMBOL LETTER B (🇧).
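
As a quick check of that last point (a standard-library-only sketch; the variable names are illustrative), the flag is one Character in Swift, made of two code points that occupy four UTF-16 code units:

```swift
// REGIONAL INDICATOR SYMBOL LETTER G followed by REGIONAL INDICATOR SYMBOL LETTER B
let flag = "\u{1F1EC}\u{1F1E7}"

print(flag.count)                 // 1 Character
print(flag.unicodeScalars.count)  // 2 code points
print(flag.utf16.count)           // 4 UTF-16 code units

// The individual code points, as hexadecimal values.
let codePoints = flag.unicodeScalars.map { String($0.value, radix: 16, uppercase: true) }
print(codePoints)                 // ["1F1EC", "1F1E7"]
```

The UTF-16 count is the one that matters when working with Core Text string indices, since those are offsets into UTF-16 code units.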