What are the Chinese, Japanese, and Korean characters in Unicode?
From what I've gathered:
Hiragana is U+3040 to U+309F.
Katakana is U+30A0 to U+30FF.
U+4E00..U+9FFF is part of the complete [Chinese] set, but not all.
The exact ranges for Chinese characters (except the extensions) are [\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC\uF900-\uFAAD].
CJK (for Chinese, Japanese, Korean) encompasses all characters for the Chinese Hànzì, the Japanese Kanji and the Korean Hanja. (So they are all mixed).
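The linked answers don't fully explain where everything is. I'd like to know whether there is a definitive answer, so I don't have to go through each character one-by-one.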
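For reference, the character class quoted in the question can be tried out directly. This is only a minimal sketch in Python: the names `HAN_BMP` and `contains_han_bmp` are made up for illustration, and the ranges are copied from the question as-is, so they cover BMP code points only and leave out the supplementary-plane extensions.

```python
import re

# Character class copied verbatim from the question. It covers BMP ranges
# only; the supplementary-plane extensions (U+20000 and up) are not included.
HAN_BMP = re.compile(
    "[\u2E80-\u2FD5\u3190-\u319F\u3400-\u4DBF\u4E00-\u9FCC\uF900-\uFAAD]"
)


def contains_han_bmp(text: str) -> bool:
    """Return True if any character of `text` falls in the ranges above."""
    return HAN_BMP.search(text) is not None
```

Note that kana and Hangul are not matched by this class, which is part of why a fuller list of ranges is needed.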
Here is a sorted list of everything used in Chinese, Japanese, and Korean (and some Vietnamese):
- U+1100..U+11FF: Hangul Jamo
- U+2E80..U+2EFF: CJK Radicals Supplement
- U+2F00..U+2FDF: Kangxi Radicals
- U+3000..U+303F: CJK Symbols and Punctuation (may not count as characters, depending on what you want to do)
- U+3040..U+309F: Hiragana
- U+30A0..U+30FF: Katakana
- U+3100..U+312F: Bopomofo
- U+3130..U+318F: Hangul Compatibility Jamo
- U+3190..U+319F: Kanbun
- U+31A0..U+31BF: Bopomofo Extended
- U+31C0..U+31EF: CJK Strokes
- U+31F0..U+31FF: Katakana Phonetic Extensions
- U+3200..U+32FF: Enclosed CJK Letters and Months
- U+3300..U+33FF: CJK Compatibility
- U+3400..U+4DBF: CJK Unified Ideographs Extension A
- U+4E00..U+9FEF: CJK Unified Ideographs
- U+A960..U+A97F: Hangul Jamo Extended-A
- U+AC00..U+D7A3: Hangul Syllables
- U+D7B0..U+D7FF: Hangul Jamo Extended-B
- U+F900..U+FAFF: CJK Compatibility Ideographs
- U+FE30..U+FE4F: CJK Compatibility Forms
- U+FF00..U+FFEF: Halfwidth and Fullwidth Forms. This block also contains punctuation and Latin letters; the actual Katakana and Jamo characters run from U+FF66 to U+FFDD.
- U+1B000..U+1B0FF: Kana Supplement
- U+1B100..U+1B12F: Kana Extended-A
- U+1B130..U+1B16F: Small Kana Extension
- U+1F200..U+1F2FF: Enclosed Ideographic Supplement
- U+20000..U+2A6DF: CJK Unified Ideographs Extension B
- U+2A700..U+2B73F: CJK Unified Ideographs Extension C
- U+2B740..U+2B81F: CJK Unified Ideographs Extension D
- U+2B820..U+2CEAF: CJK Unified Ideographs Extension E
- U+2CEB0..U+2EBEF: CJK Unified Ideographs Extension F
- U+2F800..U+2FA1F: CJK Compatibility Ideographs Supplement
- U+30000..U+3134F: CJK Unified Ideographs Extension G
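A range list like this can be turned into a lookup so that no character has to be checked by hand. The following is a rough sketch, not a complete table: `CJK_BLOCKS` holds only a handful of the entries above (with the bounds exactly as listed), and the names `CJK_BLOCKS` and `cjk_block` are hypothetical.

```python
from bisect import bisect_right

# A subset of the list above as (first, last, name), sorted by first code point.
# Extend it with the remaining entries as needed.
CJK_BLOCKS = [
    (0x1100, 0x11FF, "Hangul Jamo"),
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"),
    (0x4E00, 0x9FEF, "CJK Unified Ideographs"),
    (0xAC00, 0xD7A3, "Hangul Syllables"),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
]
_STARTS = [first for first, _, _ in CJK_BLOCKS]


def cjk_block(ch: str):
    """Return the name of the listed range containing `ch`, or None."""
    cp = ord(ch)
    # Find the last range whose start is <= cp, then check its upper bound.
    i = bisect_right(_STARTS, cp) - 1
    if i >= 0:
        first, last, name = CJK_BLOCKS[i]
        if cp <= last:
            return name
    return None
```

Because the ranges are sorted and do not overlap, a binary search over the start points is enough; a linear scan would work just as well for a table this small.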
"so I don't have to go through each character one-by-one."
You should check the character properties instead. The lists below are for Unicode 12.1.
Script_Extensions: Han (89513 characters)
U+02E80…U+02E99
U+02E9B…U+02EF3
U+02F00…U+02FD5
U+03001…U+03003
U+03005…U+03011
U+03013…U+0301F
U+03021…U+0302D
U+03030
U+03037…U+0303F
U+030FB
U+03190…U+0319F
U+031C0…U+031E3
U+03220…U+03247
U+03280…U+032B0
U+032C0…U+032CB
U+032FF
U+03358…U+03370
U+0337B…U+0337F
U+033E0…U+033FE
U+03400…U+04DB5
U+04E00…U+09FEF
U+0F900…U+0FA6D
U+0FA70…U+0FAD9
U+0FE45…U+0FE46
U+0FF61…U+0FF65
U+1D360…U+1D371
U+1F250…U+1F251
U+20000…U+2A6D6
U+2A700…U+2B734
U+2B740…U+2B81D
U+2B820…U+2CEA1
U+2CEB0…U+2EBE0
U+2F800…U+2FA1D
Script_Extensions: Hangul (11775 characters)
U+01100…U+011FF
U+03001…U+03003
U+03008…U+03011
U+03013…U+0301F
U+0302E…U+03030
U+03037
U+030FB
U+03131…U+0318E
U+03200…U+0321E
U+03260…U+0327E
U+0A960…U+0A97C
U+0AC00…U+0D7A3
U+0D7B0…U+0D7C6
U+0D7CB…U+0D7FB
U+0FE45…U+0FE46
U+0FF61…U+0FF65
U+0FFA0…U+0FFBE
U+0FFC2…U+0FFC7
U+0FFCA…U+0FFCF
U+0FFD2…U+0FFD7
U+0FFDA…U+0FFDC
Script_Extensions: Hiragana (431 characters)
U+03001…U+03003
U+03008…U+03011
U+03013…U+0301F
U+03030…U+03035
U+03037
U+0303C…U+0303D
U+03041…U+03096
U+03099…U+030A0
U+030FB…U+030FC
U+0FE45…U+0FE46
U+0FF61…U+0FF65
U+0FF70
U+0FF9E…U+0FF9F
U+1B001…U+1B11E
U+1B150…U+1B152
U+1F200
Script_Extensions: Katakana (356 characters)
U+03001…U+03003
U+03008…U+03011
U+03013…U+0301F
U+03030…U+03035
U+03037
U+0303C…U+0303D
U+03099…U+0309C
U+030A0…U+030FF
U+031F0…U+031FF
U+032D0…U+032FE
U+03300…U+03357
U+0FE45…U+0FE46
U+0FF61…U+0FF9F
U+1B000
U+1B164…U+1B167
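One thing these lists make visible is that Script_Extensions ranges overlap: shared punctuation such as U+3001..U+3003 appears under all four scripts. A small sketch of how that could be modelled in Python follows; it uses only a few ranges copied from the lists above, and the names `SCRIPT_EXTENSIONS` and `scripts_of` are hypothetical, not part of any library.

```python
# A few ranges taken from the Script_Extensions lists above (Unicode 12.1).
# This is deliberately incomplete; fill in the remaining ranges as needed.
SCRIPT_EXTENSIONS = {
    "Han": [(0x3001, 0x3003), (0x3400, 0x4DB5), (0x4E00, 0x9FEF)],
    "Hangul": [(0x1100, 0x11FF), (0x3001, 0x3003), (0xAC00, 0xD7A3)],
    "Hiragana": [(0x3001, 0x3003), (0x3041, 0x3096)],
    "Katakana": [(0x3001, 0x3003), (0x30A0, 0x30FF)],
}


def scripts_of(ch: str) -> set:
    """Return every script whose listed ranges contain `ch`."""
    cp = ord(ch)
    return {
        script
        for script, ranges in SCRIPT_EXTENSIONS.items()
        if any(first <= cp <= last for first, last in ranges)
    }
```

Returning a set rather than a single name reflects the property itself: a character can belong to several scripts at once, so a check like "is this Japanese?" becomes a membership test on the result.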