Kotlin four-byte Unicode literals?
How can I declare a Char range in Kotlin that covers the four-byte range?
private val CJK_IDEOGRAPHS_EXT_A = '\u3400' .. '\u4DBF' // OK
private val CJK_IDEOGRAPHS_EXT_B = '\u20000' .. '\u2A6DF' // doesn't compile
I tried the following hack, but got the warning "this cast can never succeed":
private val CJK_IDEOGRAPHS_EXT_B: CharRange = 0x20000 as Char .. 0x2A6DF as Char
Basically, I want to implement something like this:
fun isCJK(c: Char): Boolean {
    return c in CJK_RADICALS ||
        c in CJK_SYMBOLS ||
        c in CJK_STROKES ||
        c in CJK_ENCLOSED ||
        c in CJK_IDEOGRAPHS ||
        c in CJK_COMPAT ||
        c in CJK_COMPAT_IDEOGRAPHS ||
        c in CJK_COMPAT_FORMS ||
        c in CJK_IDEOGRAPHS_EXT_A
        // EXT_B not working
        // EXT_C not working
        // EXT_D not working
        // EXT_E not working
        // EXT_F not working
}
I am using Kotlin on Android.
On the JVM, a Char is a 16-bit code unit, so the largest code point it can represent is 0xFFFF; the ranges you mention are represented by surrogate pairs. Your function should therefore take a String instead, for example:
private val CJK_IDEOGRAPHS_EXT_B: IntRange = 0x20000 .. 0x2A6DF
...
fun isCJK(s: String): Boolean {
    // Require exactly one code point; an empty string would make codePointAt(0) throw
    if (s.codePointCount(0, s.length) != 1)
        throw IllegalArgumentException("String \"$s\" must contain exactly 1 code point")
    // codePointAt returns an Int, so every range below must be an IntRange
    val c = s.codePointAt(0)
    return c in CJK_RADICALS ||
        c in CJK_SYMBOLS ||
        c in CJK_STROKES ||
        c in CJK_ENCLOSED ||
        c in CJK_IDEOGRAPHS ||
        c in CJK_COMPAT ||
        c in CJK_COMPAT_IDEOGRAPHS ||
        c in CJK_COMPAT_FORMS ||
        c in CJK_IDEOGRAPHS_EXT_A ||
        c in CJK_IDEOGRAPHS_EXT_B || ...
}
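To make the surrogate-pair point concrete, here is a minimal self-contained sketch. It defines only two of the ranges, as Int code point ranges, and isCjkCodePoint is a hypothetical helper name that is not part of the code above; U+20000 is used purely as a sample character from Extension B.

```kotlin
// Only two ranges are defined here, as Int code point ranges (an assumption
// for illustration; the remaining blocks would be declared the same way).
val CJK_IDEOGRAPHS_EXT_A: IntRange = 0x3400..0x4DBF
val CJK_IDEOGRAPHS_EXT_B: IntRange = 0x20000..0x2A6DF

// Hypothetical helper: tests a single code point rather than a Char.
fun isCjkCodePoint(cp: Int): Boolean =
    cp in CJK_IDEOGRAPHS_EXT_A || cp in CJK_IDEOGRAPHS_EXT_B

fun main() {
    // U+20000 does not fit in one Char; toChars yields a surrogate pair.
    val s = String(Character.toChars(0x20000))
    println(s.length)                                         // 2 (16-bit code units)
    println(s.codePointCount(0, s.length))                    // 1 (code point)
    println(s[0].isHighSurrogate() && s[1].isLowSurrogate())  // true
    println(isCjkCodePoint(s.codePointAt(0)))                 // true
}
```

Working on Int code points keeps the range checks uniform: blocks below 0xFFFF and blocks above it are compared in the same way.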
Java 9 has a more convenient IntStream codePoints() method, but it does not seem to be available on Android.
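Where codePoints() is not available, the code points of a longer string can be walked by hand. The following is a rough sketch that relies only on codePointAt and Character.charCount; codePointsOf is a hypothetical helper name introduced for illustration.

```kotlin
// Hypothetical helper: collects the code points of a string without
// using codePoints(), stepping by one or two Chars at a time.
fun codePointsOf(s: String): List<Int> {
    val result = ArrayList<Int>()
    var i = 0
    while (i < s.length) {
        val cp = s.codePointAt(i)
        result.add(cp)
        i += Character.charCount(cp) // 1 for a BMP character, 2 for a surrogate pair
    }
    return result
}

fun main() {
    // "A" followed by U+20000, which occupies two Chars in the string.
    val text = "A" + String(Character.toChars(0x20000))
    println(codePointsOf(text).map { Integer.toHexString(it) }) // [41, 20000]
}
```

Each resulting Int can then be tested against IntRange blocks, so a whole string can be checked one code point at a time instead of requiring single-code-point input.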