Kotlin four-byte unicode literals?

How do I declare a Char range in Kotlin that covers the four-byte range (code points above U+FFFF)?

private val CJK_IDEOGRAPHS_EXT_A = '\u3400' .. '\u4DBF'    // OK
private val CJK_IDEOGRAPHS_EXT_B = '\u20000' .. '\u2A6DF'  // doesn't compile

I tried the following hack, but I get the warning "this cast can never succeed":

private val CJK_IDEOGRAPHS_EXT_B: CharRange = 0x20000 as Char .. 0x2A6DF as Char

Basically, I want to implement something like this:

fun isCJK(c: Char): Boolean {
    return c in CJK_RADICALS ||
        c in CJK_SYMBOLS ||
        c in CJK_STROKES ||
        c in CJK_ENCLOSED ||
        c in CJK_IDEOGRAPHS ||
        c in CJK_COMPAT ||
        c in CJK_COMPAT_IDEOGRAPHS ||
        c in CJK_COMPAT_FORMS ||
        c in CJK_IDEOGRAPHS_EXT_A
        // EXT_B not working
        // EXT_C not working
        // EXT_D not working
        // EXT_E not working
        // EXT_F not working
}

I'm using Kotlin on Android.

On the JVM, a Char is a 16-bit UTF-16 code unit, so the largest value it can hold is 0xFFFF; the ranges you mention are represented by surrogate pairs, i.e. two Chars each. So your function should take a String instead and compare code points (Ints), e.g.

private val CJK_IDEOGRAPHS_EXT_B: IntRange = 0x20000 .. 0x2A6DF 
...

fun isCJK(s: String): Boolean {
    // Kotlin has no `new` keyword; also reject the empty string, for which codePointAt(0) would fail
    if (s.codePointCount(0, s.length) != 1)
        throw IllegalArgumentException("String \"$s\" must contain exactly 1 code point")
    val c = s.codePointAt(0)
    return c in CJK_RADICALS ||
        c in CJK_SYMBOLS ||
        c in CJK_STROKES ||
        c in CJK_ENCLOSED ||
        c in CJK_IDEOGRAPHS ||
        c in CJK_COMPAT ||
        c in CJK_COMPAT_IDEOGRAPHS ||
        c in CJK_COMPAT_FORMS ||
        c in CJK_IDEOGRAPHS_EXT_A ||
        c in CJK_IDEOGRAPHS_EXT_B || ...
}

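Java also has a more convenient IntStream codePoints() method (on CharSequence since Java 8, as far as I know), but it doesn't seem to be available on Android, at least not on older API levels.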
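If you need to check a whole string rather than a single code point, you can walk it manually without codePoints(). The following is only a minimal sketch: the helper names isCjkCodePoint and containsCjk are made up for illustration, and only the two ranges quoted in the question are included, so you would add the other blocks yourself.

```kotlin
// Ranges taken from the question; the other CJK blocks are omitted here.
private val CJK_IDEOGRAPHS_EXT_A: IntRange = 0x3400..0x4DBF
private val CJK_IDEOGRAPHS_EXT_B: IntRange = 0x20000..0x2A6DF

// Hypothetical helper: tests a single code point (an Int) rather than a Char.
fun isCjkCodePoint(cp: Int): Boolean =
    cp in CJK_IDEOGRAPHS_EXT_A || cp in CJK_IDEOGRAPHS_EXT_B

// Hypothetical helper: walks the string one code point at a time.
fun containsCjk(s: String): Boolean {
    var i = 0
    while (i < s.length) {
        val cp = s.codePointAt(i)
        if (isCjkCodePoint(cp)) return true
        // Advance by 2 Chars for a surrogate pair, otherwise by 1.
        i += Character.charCount(cp)
    }
    return false
}

fun main() {
    // U+20000 is the first code point of Extension B; it needs two Chars.
    val extB = String(Character.toChars(0x20000))
    println(extB.length)              // 2
    println(containsCjk("abc$extB"))  // true
    println(containsCjk("abc"))       // false
}
```

Advancing by Character.charCount(cp) is what keeps the loop from landing in the middle of a surrogate pair; everything used here is plain java.lang.Character and String API, so it should not depend on the newer stream methods.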