Unicode letters with more than 1 alphabetic latin character?

I'm not really sure how to phrase this, but I'm looking for Unicode characters that look like more than one Latin letter.

So far I found this one in Word:

Are there any others?

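Here are some of the characters I found. I first did this manually by looking through a few likely blocks, but later I wrote a Python script to do it automatically, which you can find at the end of this answer.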

Digraphs

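| Two glyphs | Digraph | Unicode code point | HTML |
|---|---|---|---|
| DZ, Dz, dz | DZ, Dz, dz | U+01F1, U+01F2, U+01F3 | `&#x01F1;` `&#x01F2;` `&#x01F3;` |
| DŽ, Dž, dž | DŽ, Dž, dž | U+01C4, U+01C5, U+01C6 | `&#x01C4;` `&#x01C5;` `&#x01C6;` |
| IJ, ij | IJ, ij | U+0132, U+0133 | `&#x0132;` `&#x0133;` |
| LJ, Lj, lj | LJ, Lj, lj | U+01C7, U+01C8, U+01C9 | `&#x01C7;` `&#x01C8;` `&#x01C9;` |
| NJ, Nj, nj | NJ, Nj, nj | U+01CA, U+01CB, U+01CC | `&#x01CA;` `&#x01CB;` `&#x01CC;` |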
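These digraph code points are more than glyph variants: each has a compatibility decomposition, and Unicode case mapping moves between the upper, title and lower forms. A minimal sketch using only Python's standard `unicodedata` module (the helper `describe` is a hypothetical name for this example):

```python
import unicodedata

# The three case forms of the DZ digraph: U+01F1, U+01F2, U+01F3
DZ_UPPER, DZ_TITLE, DZ_LOWER = "\u01f1", "\u01f2", "\u01f3"


def describe(ch: str) -> str:
    """Return code point, character name and NFKD form of one character."""
    nfkd = unicodedata.normalize("NFKD", ch)
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {nfkd!r}"


# U+01C6 is the small DZ WITH CARON digraph
for ch in (DZ_UPPER, DZ_TITLE, DZ_LOWER, "\u01c6"):
    print(describe(ch))

# Case mapping converts between the three digraph forms
print(DZ_LOWER.upper() == DZ_UPPER)  # uppercase of dz is DZ
print(DZ_LOWER.title() == DZ_TITLE)  # titlecase of dz is Dz
```

Note that U+01C6 decomposes to d, z and a combining caron (U+030C), so its NFKD form is not pure ASCII. That is why the script at the end of this answer lists DZ/Dz/dz but not DŽ/Dž/dž.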

Ligatures

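| Non-ligature | Ligature | Unicode | HTML |
|---|---|---|---|
| AA, aa | Ꜳ, ꜳ | U+A732, U+A733 | `&#xA732;` `&#xA733;` |
| AE, ae | Æ, æ | U+00C6, U+00E6 | `&#x00C6;` `&#x00E6;` |
| AO, ao | Ꜵ, ꜵ | U+A734, U+A735 | `&#xA734;` `&#xA735;` |
| AU, au | Ꜷ, ꜷ | U+A736, U+A737 | `&#xA736;` `&#xA737;` |
| AV, av | Ꜹ, ꜹ | U+A738, U+A739 | `&#xA738;` `&#xA739;` |
| AV, av (with bar) | Ꜻ, ꜻ | U+A73A, U+A73B | `&#xA73A;` `&#xA73B;` |
| AY, ay | Ꜽ, ꜽ | U+A73C, U+A73D | `&#xA73C;` `&#xA73D;` |
| et | 🙰 | U+1F670 | `&#x1F670;` |
| f‌f | ff | U+FB00 | `&#xFB00;` |
| f‌f‌i | ffi | U+FB03 | `&#xFB03;` |
| f‌f‌l | ffl | U+FB04 | `&#xFB04;` |
| f‌i | fi | U+FB01 | `&#xFB01;` |
| f‌l | fl | U+FB02 | `&#xFB02;` |
| OE, oe | Œ, œ | U+0152, U+0153 | `&#x0152;` `&#x0153;` |
| OO, oo | Ꝏ, ꝏ | U+A74E, U+A74F | `&#xA74E;` `&#xA74F;` |
| ſs, ſz | ẞ, ß | U+1E9E, U+00DF | `&#x1E9E;` `&#x00DF;` |
| st | st | U+FB06 | `&#xFB06;` |
| ſt | ſt | U+FB05 | `&#xFB05;` |
| TZ, tz | Ꜩ, ꜩ | U+A728, U+A729 | `&#xA728;` `&#xA729;` |
| ue | ᵫ | U+1D6B | `&#x1D6B;` |
| VY, vy | Ꝡ, ꝡ | U+A760, U+A761 | `&#xA760;` `&#xA761;` |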
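The table mixes two kinds of ligature. The typographic ones (ff, fi, fl, ffi, ffl, ſt, st) have compatibility decompositions and multi-letter uppercase mappings, whereas orthographic ones such as Æ, Œ, Ꜳ and ᵫ are encoded as letters with no decomposition at all. A small sketch with Python's `unicodedata` and built-in `str` methods shows the difference:

```python
import unicodedata

# ffi, ſt (typographic) versus æ, œ, ꜳ, ᵫ, ß (no decomposition)
samples = ["\ufb03", "\ufb05", "\u00e6", "\u0153", "\ua733", "\u1d6b", "\u00df"]

for ch in samples:
    nfkd = unicodedata.normalize("NFKD", ch)
    # A typographic ligature expands to several letters under NFKD;
    # an orthographic ligature comes back unchanged.
    kind = "decomposes" if nfkd != ch else "no decomposition"
    print(f"U+{ord(ch):04X} {ch}  NFKD={nfkd!r:8} upper={ch.upper()!r:8} {kind}")
```

ß is an edge case: it has no decomposition, but its full uppercase mapping is `SS`, and `str.casefold()` turns it into `ss` for case-insensitive comparisons.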

There are also some other ligatures, used in phonetic transcription, that look like Latin characters:

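| Non-ligature | Ligature | Unicode | HTML |
|---|---|---|---|
| db | ȸ | U+0238 | `&#x0238;` |
| dz | ʣ | U+02A3 | `&#x02A3;` |
| IJ, ij | IJ, ij | U+0132, U+0133 | `&#x0132;` `&#x0133;` |
| ls | ʪ | U+02AA | `&#x02AA;` |
| lz | ʫ | U+02AB | `&#x02AB;` |
| qp | ȹ | U+0239 | `&#x0239;` |
| ts | ʦ | U+02A6 | `&#x02A6;` |
| ui | ꭐ | U+AB50 | `&#xAB50;` |
| turned ui | ꭑ | U+AB51 | `&#xAB51;` |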
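The phonetic digraphs behave like the orthographic ligatures: each is a single lowercase letter (general category `Ll`) with no decomposition mapping, so a search based on normalization cannot find them. A quick check with `unicodedata.category()` and `unicodedata.decomposition()`:

```python
import unicodedata

# ȸ ʣ ʪ ʫ ȹ ʦ ꭐ ꭑ from the table above
phonetic = "\u0238\u02a3\u02aa\u02ab\u0239\u02a6\uab50\uab51"

for ch in phonetic:
    print(
        f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}",
        f"category={unicodedata.category(ch)}",
        # An empty string means the character has no decomposition mapping
        f"decomposition={unicodedata.decomposition(ch)!r}",
    )
```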

https://en.wikipedia.org/wiki/List_of_precomposed_Latin_characters_in_Unicode#Digraphs_and_ligatures


Edit:

Besides ℻ and ℡, which the OP found in the comments, there are more Letterlike Symbols:

℀ ℁ ⅍ ℅ ℆ ℔ ℠ ™
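Several of these letterlike symbols do decompose under NFKD, but into strings that contain a slash, so the script at the end of this answer (which requires every character of the decomposition to be a letter) skips them. For example:

```python
import unicodedata

# ℀ ℁ ℅ ℆ contain "/" after decomposition; ℡ and ℻ are letters only
for ch in "\u2100\u2101\u2105\u2106\u2121\u213b":
    nfkd = unicodedata.normalize("NFKD", ch)
    # str.isalpha() is False as soon as one character (here "/") is not a letter
    print(f"U+{ord(ch):04X} {ch} -> {nfkd!r} isalpha={nfkd.isalpha()}")
```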

Most of the longer letter-like characters come from the CJK Compatibility block:

[Code chart: CJK Compatibility, U+3380 to U+33DF]

The 3-letter-like symbols in it are ㎈㎑㎒㎓㎔㏒㏕㏖㏙㎪㎫㎬㎭㏆㏿㍱..., plus, more or less, ㎉ and ㎯.
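To rank these squared abbreviations by how many Latin letters they contain, one option is to count the ASCII letters in each NFKD decomposition, ignoring slashes and digits. A sketch (the helper `latin_letter_count` is hypothetical, written only for this example):

```python
import unicodedata


def latin_letter_count(ch: str) -> int:
    """Count ASCII letters in the compatibility decomposition of ch."""
    nfkd = unicodedata.normalize("NFKD", ch)
    return sum(1 for c in nfkd if c.isascii() and c.isalpha())


# Scan the part of CJK Compatibility that holds the squared Latin abbreviations
for cp in range(0x3370, 0x3400):
    ch = chr(cp)
    if latin_letter_count(ch) >= 3:
        print(f"U+{cp:04X} {ch} -> {unicodedata.normalize('NFKD', ch)}")
```

Counting letters rather than requiring the whole decomposition to be alphabetic also picks up symbols like ㏆ and ㎯, whose decompositions contain a division slash.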

Unicode even has code points for Roman numerals. Another 4-letter-like character can be found there: Ⅷ

[Code chart: Number Forms, U+2150 to U+218F]
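The Roman numerals in Number Forms decompose to plain Latin letters. That is why the script output at the end of this answer lists Ⅱ, Ⅲ, Ⅳ and so on but not Ⅰ, Ⅴ or Ⅹ: those decompose to a single letter and fail its `len(d) > 1` test. A quick check:

```python
import unicodedata

# ROMAN NUMERAL ONE (U+2160) .. ROMAN NUMERAL ONE THOUSAND (U+216F)
for cp in range(0x2160, 0x2170):
    ch = chr(cp)
    d = unicodedata.normalize("NFKD", ch)
    print(f"U+{cp:04X} {ch} -> {d:5} multi-letter={len(d) > 1}")
```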

If ordinary digits count too, there are some other code points for multiple digits in Enclosed Alphanumerics, such as ⒆⒇⓳⓴:
[Code chart: Enclosed Alphanumerics, U+2460 to U+24FF]
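If multi-digit characters count, the same NFKD approach works with a digit test instead of a letter test. A minimal variant of the script, restricted here to the Enclosed Alphanumerics block (U+2460 to U+24FF) to keep it fast; the function name `multi_digit_chars` is made up for this example:

```python
import unicodedata


def multi_digit_chars(start: int = 0x2460, stop: int = 0x2500) -> list:
    """Return (code point, NFKD) pairs whose decomposition is ASCII with 2+ digits."""
    found = []
    for cp in range(start, stop):
        d = unicodedata.normalize("NFKD", chr(cp))
        # Brackets and full stops are allowed, as long as everything is ASCII
        if d.isascii() and sum(c.isdigit() for c in d) >= 2:
            found.append((cp, d))
    return found


for cp, d in multi_digit_chars():
    print(f"U+{cp:04X} {chr(cp)} -> {d}")
```

Characters without a decomposition stay non-ASCII after normalization, so this filter will not report them even if their glyph shows two digits.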

And in Enclosed Alphanumeric Supplement:

, , , , , , , , , , , ,

A few more:

Currency symbol group

₧ ₨ ₶ ₯ ₠ ₢ ₷

Miscellaneous technical group

⎂ ⏨

Control Pictures (you may need to zoom to see them)

[Code chart: Control Pictures, U+2400 to U+242F]
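Control pictures have no decomposition, so NFKD does not reveal the letters drawn in the glyph; the character name (for example SYMBOL FOR NULL for the NUL glyph) is the closest machine-readable hint. A sketch that lists them:

```python
import unicodedata

# SYMBOL FOR NULL (U+2400) .. SYMBOL FOR SUBSTITUTE FORM TWO (U+2426)
for cp in range(0x2400, 0x2427):
    ch = chr(cp)
    # decomposition() returns "" when there is no decomposition mapping
    print(
        f"U+{cp:04X} {ch} {unicodedata.name(ch)!r}",
        f"decomposition={unicodedata.decomposition(ch)!r}",
    )
```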

Alchemical Symbols

Musical Symbols

And there are emoji too: ™

If vertical bars can count as an uppercase i or a lowercase L (like your 〷 example, which is actually TELEGRAPH LINE FEED SEPARATOR SYMBOL), we have:

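  • Vai syllable see ꔖ 0xa516
  • Large triple vertical bar operator ⫼ 0x2afc
  • Counting rod tens digit three 𝍫 0x1d36b
  • Suzhou numerals 〢〣
  • The Chinese character 川 (river)
  • ║ Box drawings double vertical...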
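When a character only looks like a run of Latin letters, its Unicode name is the quickest way to see what it really is. A short check of the characters above with `unicodedata.name()`:

```python
import unicodedata

# 〷 ꔖ ⫼ 𝍫 〢 〣 川 ║: characters that look like runs of I or l
lookalikes = "\u3037\ua516\u2afc\U0001d36b\u3022\u3023\u5ddd\u2551"

for ch in lookalikes:
    print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}")
```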

Here is the script that finds multi-character letters automatically:

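import unicodedata

# Print every code point whose compatibility decomposition (NFKD) is two or
# more characters, all of them ASCII letters. str.isascii() needs Python 3.7+.
for c in range(0, 0x10FFFF + 1):
    d = unicodedata.normalize('NFKD', chr(c))
    if len(d) > 1 and d.isascii() and d.isalpha():
        print("U+%04X (%s): %s" % (c, chr(c), d))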

It can't find many ligatures such as æ or œ, because those are orthographic ligatures, which Unicode encodes as letters in their own right with no decomposition. Here's the result in Unicode 11.0.0 (checked with unicodedata.unidata_version):

U+0132 (IJ): IJ
U+0133 (ij): ij
U+01C7 (LJ): LJ
U+01C8 (Lj): Lj
U+01C9 (lj): lj
U+01CA (NJ): NJ
U+01CB (Nj): Nj
U+01CC (nj): nj
U+01F1 (DZ): DZ
U+01F2 (Dz): Dz
U+01F3 (dz): dz
U+20A8 (₨): Rs
U+2116 (№): No
U+2120 (℠): SM
U+2121 (℡): TEL
U+2122 (™): TM
U+213B (℻): FAX
U+2161 (Ⅱ): II
U+2162 (Ⅲ): III
U+2163 (Ⅳ): IV
U+2165 (Ⅵ): VI
U+2166 (Ⅶ): VII
U+2167 (Ⅷ): VIII
U+2168 (Ⅸ): IX
U+216A (Ⅺ): XI
U+216B (Ⅻ): XII
U+2171 (ⅱ): ii
U+2172 (ⅲ): iii
U+2173 (ⅳ): iv
U+2175 (ⅵ): vi
U+2176 (ⅶ): vii
U+2177 (ⅷ): viii
U+2178 (ⅸ): ix
U+217A (ⅺ): xi
U+217B (ⅻ): xii
U+3250 (㉐): PTE
U+32CC (㋌): Hg
U+32CD (㋍): erg
U+32CE (㋎): eV
U+32CF (㋏): LTD
U+3371 (㍱): hPa
U+3372 (㍲): da
U+3373 (㍳): AU
U+3374 (㍴): bar
U+3375 (㍵): oV
U+3376 (㍶): pc
U+3377 (㍷): dm
U+337A (㍺): IU
U+3380 (㎀): pA
U+3381 (㎁): nA
U+3383 (㎃): mA
U+3384 (㎄): kA
U+3385 (㎅): KB
U+3386 (㎆): MB
U+3387 (㎇): GB
U+3388 (㎈): cal
U+3389 (㎉): kcal
U+338A (㎊): pF
U+338B (㎋): nF
U+338E (㎎): mg
U+338F (㎏): kg
U+3390 (㎐): Hz
U+3391 (㎑): kHz
U+3392 (㎒): MHz
U+3393 (㎓): GHz
U+3394 (㎔): THz
U+3396 (㎖): ml
U+3397 (㎗): dl
U+3398 (㎘): kl
U+3399 (㎙): fm
U+339A (㎚): nm
U+339C (㎜): mm
U+339D (㎝): cm
U+339E (㎞): km
U+33A9 (㎩): Pa
U+33AA (㎪): kPa
U+33AB (㎫): MPa
U+33AC (㎬): GPa
U+33AD (㎭): rad
U+33B0 (㎰): ps
U+33B1 (㎱): ns
U+33B3 (㎳): ms
U+33B4 (㎴): pV
U+33B5 (㎵): nV
U+33B7 (㎷): mV
U+33B8 (㎸): kV
U+33B9 (㎹): MV
U+33BA (㎺): pW
U+33BB (㎻): nW
U+33BD (㎽): mW
U+33BE (㎾): kW
U+33BF (㎿): MW
U+33C3 (㏃): Bq
U+33C4 (㏄): cc
U+33C5 (㏅): cd
U+33C8 (㏈): dB
U+33C9 (㏉): Gy
U+33CA (㏊): ha
U+33CB (㏋): HP
U+33CC (㏌): in
U+33CD (㏍): KK
U+33CE (㏎): KM
U+33CF (㏏): kt
U+33D0 (㏐): lm
U+33D1 (㏑): ln
U+33D2 (㏒): log
U+33D3 (㏓): lx
U+33D4 (㏔): mb
U+33D5 (㏕): mil
U+33D6 (㏖): mol
U+33D7 (㏗): PH
U+33D9 (㏙): PPM
U+33DA (㏚): PR
U+33DB (㏛): sr
U+33DC (㏜): Sv
U+33DD (㏝): Wb
U+33FF (㏿): gal
U+FB00 (ff): ff
U+FB01 (fi): fi
U+FB02 (fl): fl
U+FB03 (ffi): ffi
U+FB04 (ffl): ffl
U+FB05 (ſt): st
U+FB06 (st): st
U+1F12D (🄭): CD
U+1F12E (🄮): WZ
U+1F14A (🅊): HV
U+1F14B (🅋): MV
U+1F14C (🅌): SD
U+1F14D (🅍): SS
U+1F14E (🅎): PPV
U+1F14F (🅏): WC
U+1F16A (🅪): MC
U+1F16B (🅫): MD
U+1F190 (🆐): DJ
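Because the NFKD script misses orthographic ligatures and phonetic digraphs, a complementary heuristic is to search character names. A sketch that collects Latin characters whose names contain LIGATURE or DIGRAPH (the generator name `named_latin_ligatures` is hypothetical). It still misses letters such as æ and ꜳ, whose names are simply LATIN SMALL LETTER AE and LATIN SMALL LETTER AA:

```python
import sys
import unicodedata


def named_latin_ligatures():
    """Yield (code point, name) for Latin characters named as ligatures or digraphs."""
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")  # "" for unassigned code points
        if name.startswith("LATIN") and ("LIGATURE" in name or "DIGRAPH" in name):
            yield cp, name


for cp, name in named_latin_ligatures():
    print(f"U+{cp:04X} {chr(cp)} {name}")
```

Combining this name search with the NFKD script covers both the decomposable typographic ligatures and the non-decomposable ones such as œ and ʣ.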