Unicode letters with more than 1 alphabetic Latin character?
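I'm not quite sure how to phrase this, but I'm looking for Unicode letters that visually consist of more than one Latin letter.
So far I've found these in Word:
- DZ
- Dz
- dz
- NJ
- Lj
- LJ
- Nj
- nj
Are there any others?
Here are some of the characters I've found. I first did this manually by looking through some likely blocks, but I later wrote a Python script to do it automatically, which you can find at the end of this answer.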
Two Glyphs | Digraph | Unicode Code Point | HTML |
---|---|---|---|
DZ, Dz, dz | DZ, Dz, dz | U+01F1 U+01F2 U+01F3 | DZ Dz dz |
DŽ, Dž, dž | DŽ, Dž, dž | U+01C4 U+01C5 U+01C6 | DŽ Dž dž |
IJ, ij | IJ, ij | U+0132 U+0133 | IJ ij |
LJ, Lj, lj | LJ, Lj, lj | U+01C7 U+01C8 U+01C9 | LJ Lj lj |
NJ, Nj, nj | NJ, Nj, nj | U+01CA U+01CB U+01CC | NJ Nj nj |
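One way to double-check a table like this is to ask Python's standard `unicodedata` module for each character's name and compatibility decomposition. This is only a sketch; the helper name `describe` is made up for illustration, and the characters are written as escapes so that copy-and-paste cannot quietly turn them into plain ASCII pairs.

```python
import unicodedata

def describe(ch):
    """Return (code point, Unicode name, NFKD decomposition) for one character."""
    return ("U+%04X" % ord(ch),
            unicodedata.name(ch),
            unicodedata.normalize("NFKD", ch))

# Digraph code points from the table above, as escapes.
digraphs = ("\u01F1\u01F2\u01F3\u01C4\u01C5\u01C6\u0132\u0133"
            "\u01C7\u01C8\u01C9\u01CA\u01CB\u01CC")

for ch in digraphs:
    code, name, decomposed = describe(ch)
    # ascii() makes combining marks in the decomposition visible.
    print(code, name, ascii(decomposed))
```

Note that DŽ, Dž and dž decompose to a letter pair plus a combining caron (U+030C), so their decomposition is not pure ASCII.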
Non-ligature | Ligature | Unicode | HTML |
---|---|---|---|
AA, aa | Ꜳ, ꜳ | U+A732, U+A733 | Ꜳ ꜳ |
AE, ae | Æ, æ | U+00C6, U+00E6 | Æ æ |
AO, ao | Ꜵ, ꜵ | U+A734, U+A735 | Ꜵ ꜵ |
AU, au | Ꜷ, ꜷ | U+A736, U+A737 | Ꜷ ꜷ |
AV, av | Ꜹ, ꜹ | U+A738, U+A739 | Ꜹ ꜹ |
AV, av (with bar) | Ꜻ, ꜻ | U+A73A, U+A73B | Ꜻ ꜻ |
AY, ay | Ꜽ, ꜽ | U+A73C, U+A73D | Ꜽ ꜽ |
et | 🙰 | U+1F670 | 🙰 |
ff | ff | U+FB00 | ff |
ffi | ffi | U+FB03 | ffi |
ffl | ffl | U+FB04 | ffl |
fi | fi | U+FB01 | fi |
fl | fl | U+FB02 | fl |
OE, oe | Œ, œ | U+0152, U+0153 | Œ œ |
OO, oo | Ꝏ, ꝏ | U+A74E, U+A74F | Ꝏ ꝏ |
ſs, ſz | ẞ, ß | U+1E9E, U+00DF | ẞ ß |
st | st | U+FB06 | st |
ſt | ſt | U+FB05 | ſt |
TZ, tz | Ꜩ, ꜩ | U+A728, U+A729 | Ꜩ ꜩ |
ue | ᵫ | U+1D6B | ᵫ |
VY, vy | Ꝡ, ꝡ | U+A760, U+A761 | Ꝡ ꝡ |
There are also some other ligatures that are used for phonetic notation but look like Latin characters:
Non-ligature | Ligature | Unicode | HTML |
---|---|---|---|
db | ȸ | U+0238 | ȸ |
dz | ʣ | U+02A3 | ʣ |
IJ, ij | IJ, ij | U+0132, U+0133 | IJ ij |
ls | ʪ | U+02AA | ʪ |
lz | ʫ | U+02AB | ʫ |
qp | ȹ | U+0239 | ȹ |
ts | ʦ | U+02A6 | ʦ |
ui | ꭐ | U+AB50 | ꭐ |
turned ui | ꭑ | U+AB51 | ꭑ |
https://en.wikipedia.org/wiki/List_of_precomposed_Latin_characters_in_Unicode#Digraphs_and_ligatures
Edit:
Besides ℻ and ℡, there are more letterlike symbols, like the ones the OP found in the comments:
℀ ℁ ⅍ ℅ ℆ ℔ ℠ ™
The longer ones mostly come from the CJK Compatibility block:
U+XXXX | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
U+338x | ㎀ | ㎁ | ㎂ | ㎃ | ㎄ | ㎅ | ㎆ | ㎇ | ㎈ | ㎉ | ㎊ | ㎋ | ㎌ | ㎍ | ㎎ | ㎏ |
U+339x | ㎐ | ㎑ | ㎒ | ㎓ | ㎔ | ㎕ | ㎖ | ㎗ | ㎘ | ㎙ | ㎚ | ㎛ | ㎜ | ㎝ | ㎞ | ㎟ |
U+33Ax | ㎠ | ㎡ | ㎢ | ㎣ | ㎤ | ㎥ | ㎦ | ㎧ | ㎨ | ㎩ | ㎪ | ㎫ | ㎬ | ㎭ | ㎮ | ㎯ |
U+33Bx | ㎰ | ㎱ | ㎲ | ㎳ | ㎴ | ㎵ | ㎶ | ㎷ | ㎸ | ㎹ | ㎺ | ㎻ | ㎼ | ㎽ | ㎾ | ㎿ |
U+33Cx | ㏀ | ㏁ | ㏂ | ㏃ | ㏄ | ㏅ | ㏆ | ㏇ | ㏈ | ㏉ | ㏊ | ㏋ | ㏌ | ㏍ | ㏎ | ㏏ |
U+33Dx | ㏐ | ㏑ | ㏒ | ㏓ | ㏔ | ㏕ | ㏖ | ㏗ | ㏘ | ㏙ | ㏚ | ㏛ | ㏜ | ㏝ | ㏞ | ㏟ |
Among them, the 3-letter-like symbols include ㎈ ㎑ ㎒ ㎓ ㎔ ㏒ ㏕ ㏖ ㏙ ㎪ ㎫ ㎬ ㎭ ㏆ ㏿ ㍱ ..., and ㎉ and ㎯ are arguably 4-letter-like.
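To sort these by how many letters they show, one rough approach is to count the alphabetic characters in each symbol's NFKD form. That definition of "n-letter-like" is my assumption, not something Unicode defines, and the function names below are invented for this sketch.

```python
import unicodedata

def letter_count(ch):
    """Count alphabetic characters in the NFKD form of ch."""
    return sum(1 for x in unicodedata.normalize("NFKD", ch) if x.isalpha())

def group_by_letters(start, end):
    """Map letter count -> list of characters for code points in [start, end]."""
    groups = {}
    for cp in range(start, end + 1):
        ch = chr(cp)
        groups.setdefault(letter_count(ch), []).append(ch)
    return groups

# Rows U+338x through U+33Dx of the table above.
for n, chars in sorted(group_by_letters(0x3380, 0x33DF).items()):
    print(n, "".join(chars))
```

Digits and the division slash are not counted, so ㎯ (rad/s²) lands in the 4-letter group together with ㎉ (kcal). Greek letters such as μ do count as letters here, which may or may not be what you want.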
Unicode even has code points for Roman numerals. Another 4-letter-like character can be found here: Ⅷ
U+XXXX | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
U+215x | ⅐ | ⅑ | ⅒ | ⅓ | ⅔ | ⅕ | ⅖ | ⅗ | ⅘ | ⅙ | ⅚ | ⅛ | ⅜ | ⅝ | ⅞ | ⅟ |
U+216x | Ⅰ | Ⅱ | Ⅲ | Ⅳ | Ⅴ | Ⅵ | Ⅶ | Ⅷ | Ⅸ | Ⅹ | Ⅺ | Ⅻ | Ⅼ | Ⅽ | Ⅾ | Ⅿ |
U+217x | ⅰ | ⅱ | ⅲ | ⅳ | ⅴ | ⅵ | ⅶ | ⅷ | ⅸ | ⅹ | ⅺ | ⅻ | ⅼ | ⅽ | ⅾ | ⅿ |
U+218x | ↀ | ↁ | ↂ | Ↄ | ↄ | ↅ | ↆ | ↇ | ↈ | ↉ | ↊ | ↋ |
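The Roman numeral characters carry both a compatibility decomposition into plain letters and a numeric value, and both can be read with the standard `unicodedata` module. A small sketch (the helper name `roman_info` is hypothetical):

```python
import unicodedata

def roman_info(ch):
    """Return (ASCII spelling via NFKD, numeric value) for a Roman numeral character."""
    return unicodedata.normalize("NFKD", ch), unicodedata.numeric(ch)

# U+2160..U+216F are the capital Roman numerals in the table above.
for cp in range(0x2160, 0x2170):
    spelled, value = roman_info(chr(cp))
    print("U+%04X %-4s %g" % (cp, spelled, value))
```

Single-letter numerals such as Ⅰ, Ⅴ and Ⅹ decompose to one letter only, which is why a "more than one letter" filter skips them.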
If normal digits count, there are some other code points for multiple digits in Enclosed Alphanumerics, such as ⒆ ⒇ ⓳ ⓴
U+XXXX | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
U+246x | ① | ② | ③ | ④ | ⑤ | ⑥ | ⑦ | ⑧ | ⑨ | ⑩ | ⑪ | ⑫ | ⑬ | ⑭ | ⑮ | ⑯ |
U+247x | ⑰ | ⑱ | ⑲ | ⑳ | ⑴ | ⑵ | ⑶ | ⑷ | ⑸ | ⑹ | ⑺ | ⑻ | ⑼ | ⑽ | ⑾ | ⑿ |
U+248x | ⒀ | ⒁ | ⒂ | ⒃ | ⒄ | ⒅ | ⒆ | ⒇ | ⒈ | ⒉ | ⒊ | ⒋ | ⒌ | ⒍ | ⒎ | ⒏ |
U+249x | ⒐ | ⒑ | ⒒ | ⒓ | ⒔ | ⒕ | ⒖ | ⒗ | ⒘ | ⒙ | ⒚ | ⒛ | ⒜ | ⒝ | ⒞ | ⒟ |
U+24Ax | ⒠ | ⒡ | ⒢ | ⒣ | ⒤ | ⒥ | ⒦ | ⒧ | ⒨ | ⒩ | ⒪ | ⒫ | ⒬ | ⒭ | ⒮ | ⒯ |
U+24Bx | ⒰ | ⒱ | ⒲ | ⒳ | ⒴ | ⒵ | Ⓐ | Ⓑ | Ⓒ | Ⓓ | Ⓔ | Ⓕ | Ⓖ | Ⓗ | Ⓘ | Ⓙ |
U+24Cx | Ⓚ | Ⓛ | Ⓜ | Ⓝ | Ⓞ | Ⓟ | Ⓠ | Ⓡ | Ⓢ | Ⓣ | Ⓤ | Ⓥ | Ⓦ | Ⓧ | Ⓨ | Ⓩ |
U+24Dx | ⓐ | ⓑ | ⓒ | ⓓ | ⓔ | ⓕ | ⓖ | ⓗ | ⓘ | ⓙ | ⓚ | ⓛ | ⓜ | ⓝ | ⓞ | ⓟ |
U+24Ex | ⓠ | ⓡ | ⓢ | ⓣ | ⓤ | ⓥ | ⓦ | ⓧ | ⓨ | ⓩ | ⓪ | ⓫ | ⓬ | ⓭ | ⓮ | ⓯ |
U+24Fx | ⓰ | ⓱ | ⓲ | ⓳ | ⓴ | ⓵ | ⓶ | ⓷ | ⓸ | ⓹ | ⓺ | ⓻ | ⓼ | ⓽ | ⓾ | ⓿ |
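For the multi-digit entries, a decomposition-based check is not enough: as far as I know the negative circled numbers such as ⓳ and ⓴ have no compatibility decomposition. They do have a numeric value, so a sketch like the following looks at that instead. The function name and the threshold of 10 (meaning "at least two digits") are my own choices.

```python
import unicodedata

def multi_digit_enclosed(start=0x2460, end=0x24FF):
    """Return characters in [start, end] whose numeric value is 10 or more."""
    found = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        value = unicodedata.numeric(ch, None)  # None for letters like Ⓐ
        if value is not None and value >= 10:
            found.append(ch)
    return found

print("".join(multi_digit_enclosed()))
```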
There are also some in Enclosed Alphanumeric Supplement, for example:
🄭 🄮 🅊 🅋 🅌 🅍 🅎 🅏 🅪 🅫 🆐
A few more:
₧ ₨ ₶ ₯ ₠ ₢ ₷
⎂ ⏨
Control Pictures (you may need to zoom to make them out)
U+XXXX | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
U+240x | ␀ | ␁ | ␂ | ␃ | ␄ | ␅ | ␆ | ␇ | ␈ | ␉ | ␊ | ␋ | ␌ | ␍ | ␎ | ␏ |
U+241x | ␐ | ␑ | ␒ | ␓ | ␔ | ␕ | ␖ | ␗ | ␘ | ␙ | ␚ | ␛ | ␜ | ␝ | ␞ | ␟ |
U+242x | ␠ | ␡ | ␢ | ␣ | ␤ | ␥ | ␦ |
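There's also the emoji ™
If vertical bars can be counted as an uppercase i or a lowercase L (as in your 〷 example, which is actually IDEOGRAPHIC TELEGRAPH LINE FEED SEPARATOR SYMBOL), then we also have
- VAI SYLLABLE SEE ꔖ 0xa516
- LARGE TRIPLE VERTICAL BAR OPERATOR ⫼ 0x2afc
- COUNTING ROD TENS DIGIT THREE 𝍫 0x1d36b
- Suzhou numerals 〢 〣
- The Chinese character for river, 川
- ║ BOX DRAWINGS DOUBLE VERTICAL ...
Here's the automated script for finding multi-character letters: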
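    import unicodedata

    # Scan every code point; keep those whose NFKD form is 2+ ASCII letters.
    # str.isascii() needs Python 3.7 or later.
    for c in range(0, 0x10FFFF + 1):
        d = unicodedata.normalize('NFKD', chr(c))
        if len(d) > 1 and d.isascii() and d.isalpha():
            print("U+%04X (%s): %s" % (c, chr(c), d))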
It won't find many ligatures like æ or œ, because Unicode treats them as letters in their own right rather than as typographic ligatures, so they have no decomposition. Here's the result in Unicode 11.0.0 (checked with unicodedata.unidata_version):
U+0132 (IJ): IJ
U+0133 (ij): ij
U+01C7 (LJ): LJ
U+01C8 (Lj): Lj
U+01C9 (lj): lj
U+01CA (NJ): NJ
U+01CB (Nj): Nj
U+01CC (nj): nj
U+01F1 (DZ): DZ
U+01F2 (Dz): Dz
U+01F3 (dz): dz
U+20A8 (₨): Rs
U+2116 (№): No
U+2120 (℠): SM
U+2121 (℡): TEL
U+2122 (™): TM
U+213B (℻): FAX
U+2161 (Ⅱ): II
U+2162 (Ⅲ): III
U+2163 (Ⅳ): IV
U+2165 (Ⅵ): VI
U+2166 (Ⅶ): VII
U+2167 (Ⅷ): VIII
U+2168 (Ⅸ): IX
U+216A (Ⅺ): XI
U+216B (Ⅻ): XII
U+2171 (ⅱ): ii
U+2172 (ⅲ): iii
U+2173 (ⅳ): iv
U+2175 (ⅵ): vi
U+2176 (ⅶ): vii
U+2177 (ⅷ): viii
U+2178 (ⅸ): ix
U+217A (ⅺ): xi
U+217B (ⅻ): xii
U+3250 (㉐): PTE
U+32CC (㋌): Hg
U+32CD (㋍): erg
U+32CE (㋎): eV
U+32CF (㋏): LTD
U+3371 (㍱): hPa
U+3372 (㍲): da
U+3373 (㍳): AU
U+3374 (㍴): bar
U+3375 (㍵): oV
U+3376 (㍶): pc
U+3377 (㍷): dm
U+337A (㍺): IU
U+3380 (㎀): pA
U+3381 (㎁): nA
U+3383 (㎃): mA
U+3384 (㎄): kA
U+3385 (㎅): KB
U+3386 (㎆): MB
U+3387 (㎇): GB
U+3388 (㎈): cal
U+3389 (㎉): kcal
U+338A (㎊): pF
U+338B (㎋): nF
U+338E (㎎): mg
U+338F (㎏): kg
U+3390 (㎐): Hz
U+3391 (㎑): kHz
U+3392 (㎒): MHz
U+3393 (㎓): GHz
U+3394 (㎔): THz
U+3396 (㎖): ml
U+3397 (㎗): dl
U+3398 (㎘): kl
U+3399 (㎙): fm
U+339A (㎚): nm
U+339C (㎜): mm
U+339D (㎝): cm
U+339E (㎞): km
U+33A9 (㎩): Pa
U+33AA (㎪): kPa
U+33AB (㎫): MPa
U+33AC (㎬): GPa
U+33AD (㎭): rad
U+33B0 (㎰): ps
U+33B1 (㎱): ns
U+33B3 (㎳): ms
U+33B4 (㎴): pV
U+33B5 (㎵): nV
U+33B7 (㎷): mV
U+33B8 (㎸): kV
U+33B9 (㎹): MV
U+33BA (㎺): pW
U+33BB (㎻): nW
U+33BD (㎽): mW
U+33BE (㎾): kW
U+33BF (㎿): MW
U+33C3 (㏃): Bq
U+33C4 (㏄): cc
U+33C5 (㏅): cd
U+33C8 (㏈): dB
U+33C9 (㏉): Gy
U+33CA (㏊): ha
U+33CB (㏋): HP
U+33CC (㏌): in
U+33CD (㏍): KK
U+33CE (㏎): KM
U+33CF (㏏): kt
U+33D0 (㏐): lm
U+33D1 (㏑): ln
U+33D2 (㏒): log
U+33D3 (㏓): lx
U+33D4 (㏔): mb
U+33D5 (㏕): mil
U+33D6 (㏖): mol
U+33D7 (㏗): PH
U+33D9 (㏙): PPM
U+33DA (㏚): PR
U+33DB (㏛): sr
U+33DC (㏜): Sv
U+33DD (㏝): Wb
U+33FF (㏿): gal
U+FB00 (ff): ff
U+FB01 (fi): fi
U+FB02 (fl): fl
U+FB03 (ffi): ffi
U+FB04 (ffl): ffl
U+FB05 (ſt): st
U+FB06 (st): st
U+1F12D (🄭): CD
U+1F12E (🄮): WZ
U+1F14A (🅊): HV
U+1F14B (🅋): MV
U+1F14C (🅌): SD
U+1F14D (🅍): SS
U+1F14E (🅎): PPV
U+1F14F (🅏): WC
U+1F16A (🅪): MC
U+1F16B (🅫): MD
U+1F190 (🆐): DJ
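Since the decomposition-based script misses characters that have no decomposition, a complementary idea is to search the Unicode character names instead. The following is a rough sketch under my own assumptions: the function name `latin_by_name` and the keyword list are invented here, and matching on names is a heuristic, not an official property.

```python
import unicodedata

def latin_by_name(keywords=("LIGATURE", "DIGRAPH")):
    """Return (code point, character, name) for LATIN characters whose
    Unicode name contains one of the keywords."""
    hits = []
    for cp in range(0x110000):
        ch = chr(cp)
        name = unicodedata.name(ch, "")  # "" for unnamed code points
        if name.startswith("LATIN") and any(k in name for k in keywords):
            hits.append((cp, ch, name))
    return hits

for cp, ch, name in latin_by_name():
    print("U+%04X (%s): %s" % (cp, ch, name))
```

This picks up œ (LATIN SMALL LIGATURE OE) and ȸ (LATIN SMALL LETTER DB DIGRAPH), but still not æ, whose name is simply LATIN SMALL LETTER AE, so some manual searching remains necessary.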