Why must < be escaped in an XML attribute?

I'm wondering why < must be escaped in an XML attribute, for example:

<foo bar="3 < 4" />

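From the context (inside a tag, inside an attribute value), it should be obvious to the parser that this cannot be the start of a new tag.

What is the reason the XML specification forbids it?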

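I'm not certain, but in many cases like this the explanation is SGML compatibility. XML was designed as a subset of SGML, so it does not allow anything that SGML does not allow.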

The less-than character (<) must indeed be escaped within attribute values:

Well-Formedness Constraint: No < in Attribute Values

The replacement text of any entity referred to directly or indirectly in an attribute value (other than "&lt;") must not contain a <.
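To see the constraint in practice, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module (an assumption of this example; the question itself names no parser). The helper `try_parse` is a hypothetical name introduced only for illustration. A conforming parser should reject the raw `<` and accept the `&lt;` form, decoding it back to `<` in the attribute value:

```python
import xml.etree.ElementTree as ET


def try_parse(xml_text):
    """Return (root, None) if the text is well-formed, else (None, error)."""
    try:
        return ET.fromstring(xml_text), None
    except ET.ParseError as err:
        return None, err


# The example from the question: a raw < inside an attribute value.
raw_root, raw_error = try_parse('<foo bar="3 < 4" />')

# The same element with the character escaped as &lt;.
escaped_root, escaped_error = try_parse('<foo bar="3 &lt; 4" />')

print("raw input rejected:", raw_error is not None)
print("escaped attribute value:", escaped_root.get("bar"))
```

The escaped version round-trips to the intended value, so nothing is lost by the rule; only the serialized form changes.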

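Why?

As you have observed, an attribute value containing < could be parsed unambiguously. The motivation, however, was to keep XML's parsing rules as simple as possible...

Tim Bray, one of the editors of the XML 1.0 W3C Recommendation, wrote The Annotated XML Specification, which captures some of the rationale behind XML's design decisions: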

Banishing the <

This rule might seem a bit unnecessary, on the face of it. Since you can't have tags in attribute values, having an < can hardly be confusing, so why ban it?

This is another attempt to make life easy for the DPH. The rule in XML is simple: when you're reading text, and you hit a <, then that's a markup delimiter. Not just sometimes, always. When you want one in the data, you have to use &lt;. Not just sometimes, always. In attribute values too.

This rule has another unintended beneficial side-effect; it makes the catching of certain errors much easier. Suppose you have a chunk of XML as follows:

<a href="notes.html> <img src='notes.gif'></a>

Notice that the notes.html is missing its closing quote. Without the no-&lt; rule, it would be really hard to detect this problem and issue a reasonable error message. Since attribute values can contain almost anything, no error would be detected until the processor finds the next quotation mark. Instead, you get an error message the first time you hit a <, which in the example above, as in many cases, is almost immediately.
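The error-detection benefit described above can be sketched with the same standard-library parser (again an assumption; any conforming XML parser should behave similarly). The helper name `first_error` is hypothetical. Because of the missing closing quote, the attribute value runs on until the parser meets the `<` of the `img` tag, and it should stop there rather than scanning ahead for the next quotation mark:

```python
import xml.etree.ElementTree as ET

# Bray's example: the href value is missing its closing double quote.
broken = "<a href=\"notes.html> <img src='notes.gif'></a>"


def first_error(xml_text):
    """Return the ParseError raised for xml_text, or None if it parses."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return err
    return None


err = first_error(broken)

# ParseError.position is a (line, column) tuple for the reported error.
line, column = err.position
print(f"parser stopped at line {line}, column {column}: {err}")
```

The exact message and column depend on the underlying parser, so treat the printed text as informational rather than as a stable format.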
