I was running some fuzzing on my code and it found a bug. I have reduced it down to the following code snippet and I cannot see what is wrong.
Given the string
s := string("\xc0")
len(s) returns 1. However, if you loop through the string, the first rune has a length of 3.
for _, r := range s {
	fmt.Println("len of rune:", utf8.RuneLen(r)) // Will print 3
}
My assumptions are:
- len(string) is returning the number of bytes in the string
- utf8.RuneLen(r) is returning the number of bytes in the rune
I assume I am misunderstanding something, but how can the length of a string be less than the length of one of its runes?
Playground here: https://go.dev/play/p/SH3ZI2IZyrL
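The explanation is simple: your input is not a valid UTF-8 encoded string. Checking it with utf8.ValidString(s) outputs false.
The for range over a string ranges over its runes, but if an invalid UTF-8 sequence is encountered, the Unicode replacement character 0xFFFD is set for r. Spec: For statements: if the iteration encounters an invalid UTF-8 sequence, the second value will be 0xFFFD, the Unicode replacement character, and the next iteration will advance a single byte in the string.
This applies to your case: you get 0xfffd for r, which takes 3 bytes in UTF-8 encoding. Both of your assumptions are correct; the rune you measure is the substituted replacement character, not the byte stored in the string.
If you go with a valid string holding a rune of \xc0, for example string(rune(0xc0)), then the output is:
len of rune: 2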
Try it on the Go Playground.