How to Check If a Rune Is a Chinese Punctuation Character in Go

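How can I detect Chinese punctuation characters such as 〜 and 。 in Go?

I tried using a range table from the unicode package, as in the code below, but unicode.Han does not include those punctuation characters.

Can you tell me which range table I should use for this task? (Please avoid regex, since it performs poorly.)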

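// containsHan reports whether strToDetect contains a rune in the Han script table.
func containsHan(strToDetect string) bool {
    for _, r := range strToDetect {
        if unicode.Is(unicode.Han, r) {
            return true
        }
    }
    return false
}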

Punctuation characters are scattered across several different Unicode blocks.


The Unicode® Standard
Version 14.0 – Core Specification

Chapter 6
Writing Systems and Punctuation
https://www.unicode.org/versions/latest/ch06.pdf

Punctuation. The rest of this chapter deals with a special case: punctuation marks, which tend to be scattered about in different blocks and which may be used in common by many scripts. Punctuation characters occur in several widely separated places in the blocks, including Basic Latin, Latin-1 Supplement, General Punctuation, Supplemental Punctuation, and CJK Symbols and Punctuation. There are also occasional punctuation characters in blocks for specific scripts.


Here are your two examples:

〜 Wave dash U+301C

。 Ideographic full stop U+3002
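
As a quick check of these two examples, here is a minimal sketch that compares the Han script table with unicode.IsPunct. The helper name describe is made up for illustration, and the character 中 is added only as a contrast case.

```go
package main

import (
	"fmt"
	"unicode"
)

// describe is a hypothetical helper: it reports whether r belongs to the
// Han script table and whether r has a punctuation general category (P*).
func describe(r rune) (isHan, isPunct bool) {
	return unicode.Is(unicode.Han, r), unicode.IsPunct(r)
}

func main() {
	// U+301C WAVE DASH, U+3002 IDEOGRAPHIC FULL STOP, and a Han ideograph.
	for _, r := range []rune{'\u301C', '\u3002', '中'} {
		isHan, isPunct := describe(r)
		fmt.Printf("%U %c Han=%t Punct=%t\n", r, r, isHan, isPunct)
	}
}
```

Both punctuation examples come out as Han=false and Punct=true, while 中 is the reverse. That is why a test against unicode.Han alone never matches them.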


package main

import (
    "fmt"
    "unicode"
)

func main() {
    // CJK Symbols and Punctuation Unicode block
    for r := rune('\u3000'); r <= '\u303F'; r++ {
        if unicode.IsPunct(r) {
            fmt.Printf("%[1]U\t%[1]c\n", r)
        }
    }
}

https://go.dev/play/p/WoJjM6JKTYR

U+3001  、
U+3002  。
U+3003  〃
U+3008  〈
U+3009  〉
U+300A  《
U+300B  》
U+300C  「
U+300D  」
U+300E  『
U+300F  』
U+3010  【
U+3011  】
U+3014  〔
U+3015  〕
U+3016  〖
U+3017  〗
U+3018  〘
U+3019  〙
U+301A  〚
U+301B  〛
U+301C  〜
U+301D  〝
U+301E  〞
U+301F  〟
U+3030  〰
U+303D  〽
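
If you want a reusable check instead of a listing, one possible approach is to build your own unicode.RangeTable for the blocks you care about and combine it with unicode.IsPunct. The sketch below is only an illustration: the names cjkPunctBlocks, isCJKPunct and containsCJKPunct are hypothetical, and including the Halfwidth and Fullwidth Forms block (for characters such as ，) is my assumption about what counts as "Chinese punctuation". Extend the table if you also need characters from General Punctuation or other blocks.

```go
package main

import (
	"fmt"
	"unicode"
)

// cjkPunctBlocks is a hypothetical table of blocks to search. Ranges must
// be sorted and non-overlapping, and Stride must be set explicitly.
var cjkPunctBlocks = &unicode.RangeTable{
	R16: []unicode.Range16{
		{Lo: 0x3000, Hi: 0x303F, Stride: 1}, // CJK Symbols and Punctuation
		{Lo: 0xFF00, Hi: 0xFFEF, Stride: 1}, // Halfwidth and Fullwidth Forms (assumption)
	},
}

// isCJKPunct reports whether r lies in one of the blocks above and also
// has a punctuation general category.
func isCJKPunct(r rune) bool {
	return unicode.Is(cjkPunctBlocks, r) && unicode.IsPunct(r)
}

// containsCJKPunct reports whether any rune in s satisfies isCJKPunct.
func containsCJKPunct(s string) bool {
	for _, r := range s {
		if isCJKPunct(r) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(containsCJKPunct("你好。"))          // true
	fmt.Println(containsCJKPunct("你好"))           // false
	fmt.Println(containsCJKPunct("hello, world.")) // false: ASCII punctuation is outside the table
}
```

This avoids regex entirely. The block test restricts the match to the chosen ranges, and unicode.IsPunct filters out the non-punctuation characters (spaces, symbols, letters) that share those blocks.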