XML allowable characters

At a high level, what do the following character codes add support for in XML?

[#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | 
[#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | 
[#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]

Reference: https://www.w3.org/TR/xml/#NT-NameStartChar

I can look up individual characters, for example:

À   latin capital letter a with grave   octal 0300   decimal 192   hex 0xC0   À

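But I would like to know whether someone can explain, at a higher level, what this allows and what it disallows, since there are gaps between the ranges (for example, 0xF7).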

The rationale behind the naming rules is summarized on the same linked page:

The first character of a Name must be a NameStartChar, and any other characters must be NameChars; this mechanism is used to prevent names from beginning with European (ASCII) digits or with basic combining characters.

Almost all characters are permitted in names, except those which either are or reasonably could be used as delimiters.

The ASCII symbols and punctuation marks, along with a fairly large group of Unicode symbol characters, are excluded from names because they are more useful as delimiters in contexts where XML names are used outside XML documents.
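To see what the gaps asked about in the question actually leave out, a small Python sketch can walk the quoted ranges and name every uncovered code point with the standard `unicodedata` module. The names `RANGES`, `gaps` and `describe` are illustrative only, and only the non-ASCII ranges quoted in the question are included (the ASCII part of the production is omitted).

```python
import unicodedata

# Non-ASCII NameStartChar ranges quoted in the question (inclusive bounds).
RANGES = [
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF), (0x370, 0x37D),
    (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F), (0x2C00, 0x2FEF),
    (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
]

def gaps(ranges, start, end):
    """Yield inclusive (lo, hi) spans in [start, end] not covered by ranges."""
    cursor = start
    for lo, hi in sorted(ranges):
        if lo > end:
            break
        if lo > cursor:
            yield cursor, min(lo - 1, end)
        cursor = max(cursor, hi + 1)
    if cursor <= end:
        yield cursor, end

def describe(cp):
    """Format a code point with its Unicode character name."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp), '<unnamed>')}"

# Print the excluded code points between U+00C0 and U+03FF.
for lo, hi in gaps(RANGES, 0xC0, 0x3FF):
    if lo == hi:
        print(describe(lo))
    else:
        print(describe(lo), "...", describe(hi))
```

For U+00C0 to U+03FF this reports U+00D7 MULTIPLICATION SIGN, U+00F7 DIVISION SIGN, the combining marks U+0300 to U+036F, and U+037E GREEK QUESTION MARK: operators, combining marks and punctuation rather than letters, which matches the delimiter rationale above.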

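For example, checking the list of Unicode blocks shows that #x300-#x36F are Combining Diacritical Marks and #x2190-#x21FF are Arrows, which explains why both ranges are excluded from the quoted list. (The combining marks are still allowed after the first character, because the NameChar production adds #x0300-#x036F; the arrows are not allowed anywhere in a name.)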
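The difference between the first character and the rest of a name can be checked directly. The sketch below encodes the full NameStartChar and NameChar productions from section 2.3 of the XML 1.0 Fifth Edition as code-point ranges; `is_xml_name` is a hypothetical helper, and it ignores Namespaces in XML, where ":" has extra meaning.

```python
# NameStartChar production (XML 1.0 Fifth Edition), inclusive code-point ranges.
NAME_START = [
    (ord(":"), ord(":")), (ord("A"), ord("Z")), (ord("_"), ord("_")),
    (ord("a"), ord("z")), (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]
# Extra ranges that NameChar adds for positions after the first character.
NAME_EXTRA = [
    (ord("-"), ord("-")), (ord("."), ord(".")), (ord("0"), ord("9")),
    (0xB7, 0xB7), (0x300, 0x36F), (0x203F, 0x2040),
]

def in_ranges(ch, ranges):
    """True if the code point of ch falls inside any inclusive range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)

def is_xml_name(s):
    """True if s matches Name ::= NameStartChar (NameChar)*."""
    if not s or not in_ranges(s[0], NAME_START):
        return False
    return all(in_ranges(c, NAME_START) or in_ranges(c, NAME_EXTRA) for c in s[1:])

print(is_xml_name("e\u0301t\u00e9"))  # True: combining acute accent after the first char
print(is_xml_name("\u0301e"))         # False: a combining mark cannot start a name
print(is_xml_name("a\u2192b"))        # False: U+2192 RIGHTWARDS ARROW is excluded everywhere
print(is_xml_name("1abc"))            # False: an ASCII digit cannot start a name
```

A linear scan over the ranges keeps the sketch short; a parser validating many names would more likely use a precompiled regular expression or a bisect over the sorted range bounds.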

More specifically, the section on Character Classes describes the name rules in terms of Unicode categories (with some exceptions and notes given separately, not reproduced below):

  • Name start characters must have one of the categories Ll, Lu, Lo, Lt, Nl.

    • Ll  -  Letter, lowercase
    • Lu  -  Letter, uppercase
    • Lo  -  Letter, other (an ideograph or a letter in a unicase alphabet)
    • Lt  -  Letter, titlecase (ligatures containing uppercase followed by lowercase)
    • Nl  -  Number, letter (numerals composed of letters or letterlike symbols)
  • Name characters other than Name-start characters must have one of the categories Mc, Me, Mn, Lm, or Nd.

    • Mc  -  Mark, spacing combining
    • Me  -  Mark, enclosing
    • Mn  -  Mark, nonspacing
    • Lm  -  Letter, modifier (incl. diacritics)
    • Nd  -  Number, decimal digit
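The category rules listed above translate almost directly into a check with `unicodedata.category`. This is a rough sketch under two stated assumptions: the exceptions and notes that were not reproduced here are not applied, and Python's `unicodedata` reflects a much newer Unicode version than the one the Character Classes appendix was written against, so results can differ for recently added characters. `is_name_by_category` is a hypothetical helper.

```python
import unicodedata

# Categories allowed for the first character, per the list above.
START_CATEGORIES = {"Ll", "Lu", "Lo", "Lt", "Nl"}
# Additional categories allowed for the remaining characters.
OTHER_CATEGORIES = {"Mc", "Me", "Mn", "Lm", "Nd"}

def is_name_by_category(s):
    """Approximate name check using only Unicode general categories.

    The separately noted exceptions (e.g. for ':' and '_') are NOT applied.
    """
    if not s or unicodedata.category(s[0]) not in START_CATEGORIES:
        return False
    allowed = START_CATEGORIES | OTHER_CATEGORIES
    return all(unicodedata.category(c) in allowed for c in s[1:])

for name in ["caf\u00e9", "x1", "1x", "a\u00f7b", "\u0301a"]:
    print(repr(name), is_name_by_category(name))
```

Here "1x" fails because Nd is not a start category, "a÷b" fails because U+00F7 DIVISION SIGN is in category Sm (math symbol), and a leading U+0301 fails because Mn is only allowed after the first character.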