Strings and characters in Golang
I need a little help understanding how strings are managed in Go.
Consider the following Go code:
package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	var1, var2 := 'a', 'ă'
	fmt.Printf("For var1 - Char: %c, Type: %T, Value: %d\n", var1, var1, var1)
	fmt.Printf("For var2 - Char: %c, Type: %T, Value: %d\nSo far so good, rune is an alias for int32 and the value is the Unicode Decimal Value\n\n", var2, var2, var2)

	str := "aă"
	fmt.Printf("%v is %v bytes \nI understand this, the a takes up one byte the ă takes up two bytes\n\n", str, len(str))

	// Walk the string one rune at a time, advancing by each rune's encoded size.
	for i := 0; i < len(str); {
		r, size := utf8.DecodeRuneInString(str[i:])
		fmt.Printf("For character #%v in \"str\" (%q) Char: %c, Type: %T, Value: %d\n", i+1, str, r, r, r)
		i += size
	}
	fmt.Printf("Same as above, done differently - the loop loops through the characters in the \nstring \"str\" by determining how much to jump in the underlying slice for the string\n")
	fmt.Printf("the first iteration only goes over one position and then the next iteration \ngoes over two\n")

	// Indexing a string yields raw bytes, not runes.
	fmt.Println("\nNow lets go Byte by Byte ...")
	fmt.Println("Byte (not rune) at position 0: ", str[0])
	fmt.Println("Byte (not rune) at position 1: ", str[1])
	fmt.Println("Byte (not rune) at position 2: ", str[2])
	fmt.Println("Ok, I am a little confused. Position 0 holds the unicode decimal value of \"a\"")
	fmt.Printf("but what is %v and %v to \"ă\" ?\n", str[1], str[2])
}
It outputs the following:
For var1 - Char: a, Type: int32, Value: 97
For var2 - Char: ă, Type: int32, Value: 259
So far so good, rune is an alias for int32 and the value is the Unicode Decimal Value

aă is 3 bytes
I understand this, the a takes up one byte the ă takes up two bytes

For character #1 in "str" ("aă") Char: a, Type: int32, Value: 97
For character #2 in "str" ("aă") Char: ă, Type: int32, Value: 259
Same as above, done differently - the loop loops through the characters in the
string "str" by determining how much to jump in the underlying slice for the string
the first iteration only goes over one position and then the next iteration
goes over two

Now lets go Byte by Byte ...
Byte (not rune) at position 0: 97
Byte (not rune) at position 1: 196
Byte (not rune) at position 2: 131
Ok, I am a little confused. Position 0 holds the unicode decimal value of "a"
but what is 196 and 131 to "ă" ?
Check the UTF-8 encoding here:
https://en.wikipedia.org/wiki/UTF-8
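196: 110 00100 (the 110 prefix marks the first byte of a two-byte sequence)
131: 10 000011 (the 10 prefix marks a continuation byte)
So, dropping the prefixes and joining the remaining bits: 00100 000011 -> 259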