XML schema validation of literal versus escaped new line?

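I have a working XSD that rejects XML instances containing invalid whitespace (see below for details, but this includes carriage return (#xD), line feed (#xA), or tab (#x9) characters, a leading or trailing space (#x20) character, or a sequence of two or more adjacent space characters).

Sample XSD: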

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
elementFormDefault="qualified" 
targetNamespace="http://www.example.com"
xmlns:test="http://www.example.com">

<xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="http://www.w3.org/2001/xml.xsd"/>

<xs:element name="test-token" type="test:Tokenized500Type"></xs:element>

<xs:simpleType name="Tokenized500Type">
    <xs:annotation>
        <xs:documentation>An element of this type has minimum length of one character, a max of 500, and may not
            contain any of: the carriage return (#xD), line feed (#xA) nor tab (#x9) characters, shall
            not begin or end with a space (#x20) character, or a sequence of two or more adjacent space
            characters.</xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:string">
        <xs:maxLength value="500"/>
        <xs:minLength value="1"/>
        <xs:pattern value="\S+( \S+)*"/>

    </xs:restriction>
</xs:simpleType>

</xs:schema>

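I have tested this with the literal whitespace characters listed above.

What if the XML instance contains escaped whitespace in the content of the element in question? Will that also cause a validation error?

Here is a sample instance with the escaped version: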

<?xml version="1.0" encoding="UTF-8"?>
<test-token xmlns="http://www.example.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.example.com">&#13;</test-token>


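The regular expression should operate on the expanded (unescaped) string, so there should be no difference between a literal new line and a character reference such as &#10;.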
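In XSD regular expressions, \S matches any character that is not whitespace; it is short for [^\s], where \s is [#x20\t\n\r]. A carriage return, line feed, or tab therefore fails the pattern whether it appears literally or as a character reference such as &#13;.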
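To illustrate the point, here is a minimal Python sketch that parses an instance and then checks the expanded text against the pattern. It is not a real XSD validator: the helper name `is_valid_token` and the sample strings are made up for this example, and XSD's `\S` is spelled out as `[^ \t\n\r]` because Python's own `\S` has a different definition.

```python
import re
import xml.etree.ElementTree as ET

# XSD's \S means [^ \t\n\r]; spelled out because Python's \S is broader.
XSD_NON_SPACE = r"[^ \t\n\r]"
# Equivalent of the schema's pattern \S+( \S+)*
TOKEN_PATTERN = re.compile(rf"{XSD_NON_SPACE}+( {XSD_NON_SPACE}+)*")


def is_valid_token(xml_text: str) -> bool:
    """Hypothetical helper: approximate the Tokenized500Type facets."""
    # The parser expands character references before we ever see the text.
    value = ET.fromstring(xml_text).text or ""
    # XSD patterns are implicitly anchored, hence fullmatch.
    return 1 <= len(value) <= 500 and TOKEN_PATTERN.fullmatch(value) is not None


escaped = '<test-token xmlns="http://www.example.com">&#13;</test-token>'
literal = '<test-token xmlns="http://www.example.com">\n</test-token>'
valid = '<test-token xmlns="http://www.example.com">one two</test-token>'

for sample in (escaped, literal, valid):
    print(repr(ET.fromstring(sample).text), is_valid_token(sample))
```

The escaped and literal samples are both rejected, because by the time the pattern is applied the parser has already turned `&#13;` into an actual carriage return character.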

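Also note that the regular expressions used in XSD are Unicode regular expressions, which differ from the more familiar POSIX regular expressions. To make matters worse, some parsers simply use whatever regex engine happens to be lying around: XSD validation in .NET, for example, uses its internal regex engine, which is not a 'Unicode Regular Expression' implementation.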

Note: The ·regular expression· language defined here does not attempt to provide a general solution to "regular expressions" over UCS character sequences. In particular, it does not easily provide for matching sequences of base characters and combining marks. The language is targeted at support of "Level 1" features as defined in [Unicode Regular Expression Guidelines]. It is hoped that future versions of this specification will provide support for "Level 2" features.
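The caveat about validators borrowing a general-purpose regex engine can be seen with a small Python sketch. It assumes a hypothetical value containing a no-break space (U+00A0), which XSD's `\s` does not include but Python's `\s` does, so the same pattern text gives different answers depending on the engine.

```python
import re

# Hypothetical element content: "a", NO-BREAK SPACE (U+00A0), "b".
value = "a\u00a0b"

# XSD semantics: \S is [^ \t\n\r], so U+00A0 counts as a non-space character.
xsd_style = re.compile(r"[^ \t\n\r]+( [^ \t\n\r]+)*")

# Host-engine semantics: Python's \S excludes all Unicode whitespace,
# including U+00A0.
host_style = re.compile(r"\S+( \S+)*")

xsd_result = xsd_style.fullmatch(value) is not None
host_result = host_style.fullmatch(value) is not None
print(xsd_result, host_result)
```

Under the XSD definition the value matches, while under Python's definition it does not, which is the kind of discrepancy to watch for when a validator does not implement the XSD regex flavor itself.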