Strings and characters in Golang

I need a little help understanding how strings are managed in Go.

Consider the following Go code:

package main

import (
    "fmt"
    "unicode/utf8"
)

func main() {

    var1, var2 := 'a', 'ă'

    fmt.Printf("For var1 - Char: %c, Type: %T, Value: %d\n", var1, var1, var1)
    fmt.Printf("For var2 - Char: %c, Type: %T, Value: %d\nSo far so good, rune is an alias for int32 and the value is the Unicode Decimal Value\n\n", var2, var2, var2)

    str := "aă"

    fmt.Printf("%v is %v bytes \nI understand this, the a takes up one byte the ă takes up two bytes\n\n", str, len(str))

    for i := 0; i < len(str); {
        r, size := utf8.DecodeRuneInString(str[i:])
        fmt.Printf("For character #%v in \"str\" (%q) Char: %c, Type: %T, Value: %d\n", i+1, str, r, r, r)
        i += size
    }
    fmt.Printf("Same as above, done differently - the loop loops through the characters in the \nstring \"str\" by determining how much to jump in the underlying slice for the string\n")
    fmt.Printf("the first iteration only goes over one position and then the next iteration \ngoes over two\n")

    fmt.Println("\nNow lets go Byte by Byte ...")
    fmt.Println("Byte (not rune) at position 0: ", str[0])
    fmt.Println("Byte (not rune) at position 1: ", str[1])
    fmt.Println("Byte (not rune) at position 2: ", str[2])
    fmt.Println("Ok, I am a little confused. Position 0 holds the unicode decimal value of \"a\"")
    fmt.Printf("but what is %v and %v to  \"ă\" ?\n", str[1], str[2])

}


It prints the following:

For var1 - Char: a, Type: int32, Value: 97
For var2 - Char: ă, Type: int32, Value: 259
So far so good, rune is an alias for int32 and the value is the Unicode Decimal Value

aă is 3 bytes
I understand this, the a takes up one byte the ă takes up two bytes

For character #1 in "str" ("aă") Char: a, Type: int32, Value: 97
For character #2 in "str" ("aă") Char: ă, Type: int32, Value: 259
Same as above, done differently - the loop loops through the characters in the
string "str" by determining how much to jump in the underlying slice for the string
the first iteration only goes over one position and then the next iteration
goes over two

Now lets go Byte by Byte ...
Byte (not rune) at position 0: 97
Byte (not rune) at position 1: 196
Byte (not rune) at position 2: 131
Ok, I am a little confused. Position 0 holds the unicode decimal value of "a"
but what is 196 and 131 to  "ă" ?

Check how UTF-8 encoding works here:

https://en.wikipedia.org/wiki/UTF-8

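196 = 11000100 in binary: prefix 110 (lead byte of a two-byte sequence), payload 00100

131 = 10000011 in binary: prefix 10 (continuation byte), payload 000011

So concatenating the payload bits gives 00100 000011 -> 259 (U+0103), the code point of "ă".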
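The same bit arithmetic can be sketched in Go. This is only an illustration: decodeTwoByte is a hypothetical helper, not part of the standard library, and it assumes its input really is a two-byte UTF-8 sequence (it does no validation). The result is cross-checked against utf8.DecodeRuneInString from the question.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeTwoByte is a hypothetical helper (not in the standard library) that
// reassembles a code point from a two-byte UTF-8 sequence.
// It assumes lead has the form 110xxxxx and cont has the form 10xxxxxx,
// and performs no validation.
func decodeTwoByte(lead, cont byte) rune {
	high := rune(lead & 0x1F) // keep the low 5 payload bits of the lead byte
	low := rune(cont & 0x3F)  // keep the low 6 payload bits of the continuation byte
	return high<<6 | low      // concatenate: 5 high bits followed by 6 low bits
}

func main() {
	str := "a\u0103" // the string "aă" from the question

	// Print each byte in decimal and binary to make the UTF-8 prefixes visible.
	for i := 0; i < len(str); i++ {
		fmt.Printf("byte %d: %3d = %08b\n", i, str[i], str[i])
	}

	// Decode bytes 1 and 2 by hand.
	manual := decodeTwoByte(str[1], str[2])
	fmt.Printf("manual decode:  %d (%c)\n", manual, manual)

	// Cross-check against the standard library decoder.
	r, size := utf8.DecodeRuneInString(str[1:])
	fmt.Printf("library decode: %d (%c), %d bytes\n", r, r, size)
}
```

Both the manual and the library decode should report 259 for bytes 196 and 131. In real code, prefer the unicode/utf8 package or a `for _, r := range str` loop, which also handle one-, three-, and four-byte sequences and invalid input.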