Are there some valid HTML entities without the semicolon?
Looking at this official entities.json file, some entities are defined without a closing semicolon.
For example:
"&Acirc": { "codepoints": [194], "characters": "\u00C2" },
"&Acirc;": { "codepoints": [194], "characters": "\u00C2" },
Where is this documented in HTML5? Or is it a browser thing¹?
¹ "Thing" as in an extension, kept for backward compatibility.
The list of HTML named characters is defined at https://html.spec.whatwg.org/multipage/named-characters.html and, yes, some of them have no trailing ;. For example, both &not and &not; are listed.
I wrote a program in Python to get some numbers, and found that out of 2231 entities in total, 106 (4.75%) are valid without a trailing semicolon.
Here are all of them:
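&AElig, &AMP, &Aacute, &Acirc, &Agrave, &Aring, &Atilde, &Auml, &COPY, &Ccedil, &ETH, &Eacute, &Ecirc, &Egrave, &Euml, &GT, &Iacute, &Icirc, &Igrave, &Iuml, &LT, &Ntilde, &Oacute, &Ocirc, &Ograve, &Oslash, &Otilde, &Ouml, &QUOT, &REG, &THORN, &Uacute, &Ucirc, &Ugrave, &Uuml, &Yacute, &aacute, &acirc, &acute, &aelig, &agrave, &amp, &aring, &atilde, &auml, &brvbar, &ccedil, &cedil, &cent, &copy, &curren, &deg, &divide, &eacute, &ecirc, &egrave, &eth, &euml, &frac12, &frac14, &frac34, &gt, &iacute, &icirc, &iexcl, &igrave, &iquest, &iuml, &laquo, &lt, &macr, &micro, &middot, &nbsp, &not, &ntilde, &oacute, &ocirc, &ograve, &ordf, &ordm, &oslash, &otilde, &ouml, &para, &plusmn, &pound, &quot, &raquo, &reg, &sect, &shy, &sup1, &sup2, &sup3, &szlig, &thorn, &times, &uacute, &ucirc, &ugrave, &uml, &uuml, &yacute, &yen, &yuml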
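The counting program itself isn't shown, so here is a minimal sketch of how such numbers could be obtained. It assumes the spec's table is published as JSON at https://html.spec.whatwg.org/entities.json with keys like "&Acirc;" and "&Acirc"; the function names are illustrative, not from the original program.

```python
import json
import urllib.request

# Assumption: the spec publishes the named-character table as JSON here.
ENTITIES_URL = "https://html.spec.whatwg.org/entities.json"


def fetch_entities(url=ENTITIES_URL):
    """Download and parse entities.json (keys look like '&Acirc;' or '&Acirc')."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def semicolonless(entities):
    """Return the sorted entity names that are listed without a trailing ';'."""
    return sorted(name for name in entities if not name.endswith(";"))


def summarize(entities):
    """Return (count without ';', total count, percentage without ';')."""
    names = semicolonless(entities)
    total = len(entities)
    return len(names), total, 100 * len(names) / total
```

Calling summarize(fetch_entities()) against the live file should give the counts quoted above. For offline experiments, Python's html.entities.html5 dict appears to be built from the same table (its keys omit the leading &), and semicolonless() works on it unchanged.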
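According to the HTML spec, named HTML entities without a semicolon are invalid, but browsers are nonetheless required to support some of them. (This pattern, where you as an HTML author are doing something officially illegal, yet there is still a clearly specified behaviour that browsers must implement, is used heavily throughout the HTML spec.)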
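Python's standard-library html.unescape happens to implement the same rule for text content, which makes it a convenient way to see the behaviour without a browser. A small sketch, assuming Python 3.4+ (where html.unescape exists):

```python
import html

# A legacy name such as "Acirc" decodes the same with or without ';'.
assert html.unescape("&Acirc;") == html.unescape("&Acirc") == "\u00c2"

# A name that is only listed with a semicolon is left alone without it.
assert html.unescape("&hellip;") == "\u2026"
assert html.unescape("&hellip") == "&hellip"

# Longest-prefix matching: there is no entity "notit;", but the legacy
# "not" prefix still matches, and the remaining characters stay as text.
assert html.unescape("&notit;") == "\u00acit;"
```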
There are a few relevant parts of the spec:
-
Relevant quote:
Named character references
The ampersand must be followed by one of the names given in the named character references section, using the same case. The name must be one that is terminated by a U+003B SEMICOLON character (;).
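§13.2 Parsing HTML documents, especially §13.2.5.73 Named character reference state (if you really want to pick through the gory, hard-to-read implementation details of the parsing algorithm).
The non-normative §1.11.2 Syntax errors, which contains some explanation of why the spec makes named character references without a semicolon an error (though personally I don't find it very compelling):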
Errors involving fragile syntax constructs
There are syntax constructs that, for historical reasons, are relatively fragile. To help reduce the number of users who accidentally run into such problems, they are made non-conforming.
Example
For example, the parsing of certain named character references in attributes happens even with the closing semicolon being omitted. It is safe to include an ampersand followed by letters that do not form a named character reference, but if the letters are changed to a string that does form a named character reference, they will be interpreted as that character instead.
In this fragment, the attribute's value is "?bill&ted":
<a href="?bill&ted">Bill and Ted</a>
In the following fragment, however, the attribute's value is actually "?art©", not the intended "?art&copy", because even without the final semicolon, "&copy" is handled the same as "&copy;" and thus gets interpreted as "©":
<a href="?art&copy">Art and Copy</a>
To avoid this problem, all named character references are required to end with a semicolon, and uses of named character references without a semicolon are flagged as errors.
Thus, the correct way to express the above cases is as follows:
<a href="?bill&ted">Bill and Ted</a> <!-- &ted is ok, since it's not a named character reference -->
<a href="?art&amp;copy">Art and Copy</a> <!-- the & has to be escaped, since &copy is a named character reference -->
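The spec's example values can be reproduced with Python's html.unescape, which applies the same semicolon-optional matching. One caveat, stated as an assumption about scope: html.unescape has no attribute mode, so it does not implement the parser's attribute-value exception (an unterminated reference followed by "=" or an alphanumeric character is left as literal text), and it would decode "?art&copyx" where a browser would not.

```python
import html

# Source text -> value the spec says the attribute ends up with.
examples = {
    "?bill&ted": "?bill&ted",       # "ted" is not a named reference
    "?art&copy": "?art\u00a9",      # "&copy" is decoded even without ';'
    "?art&amp;copy": "?art&copy",   # escaping the '&' keeps the literal text
}

for source, expected in examples.items():
    assert html.unescape(source) == expected, source
```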
As a final bit of confirmation that entities like &Acirc are invalid but still work, we can use this test document:
<!DOCTYPE html>
<html lang="en">
<title>Test page</title>
<div>&Acirc</div>
</html>
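Open it in Chrome and it works, showing us an A with a circumflex:
But paste it into the Nu Html Checker (endorsed by the WHATWG) and we get an error message saying "Named character reference was not terminated by a semicolon.":
So it works, but it's invalid.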
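To illustrate the kind of check behind that error message, here is a rough sketch. The function find_unterminated_refs is hypothetical and is not how the Nu Html Checker is implemented: it flags named references that the parser would still decode despite a missing semicolon, using Python's html.entities.html5 table, and it ignores numeric references and the attribute-value special case.

```python
import re
from html.entities import html5

# Names the parser accepts without a trailing semicolon (the legacy set).
LEGACY = {name for name in html5 if not name.endswith(";")}

# An '&', a run of alphanumerics starting with a letter, and an optional ';'.
_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*)(;?)")


def find_unterminated_refs(text):
    """Return (offset, name) for each reference decoded without a ';'."""
    problems = []
    for m in _REF.finditer(text):
        name, semi = m.group(1), m.group(2)
        if semi and name + ";" in html5:
            continue  # properly terminated reference, nothing to report
        # Longest legacy name that is a prefix of the text after the '&'.
        for end in range(len(name), 1, -1):
            if name[:end] in LEGACY:
                problems.append((m.start(), name[:end]))
                break
    return problems
```

Run on the test document's body, find_unterminated_refs("<div>&Acirc</div>") reports the &Acirc reference at offset 5, while the properly terminated &Acirc; produces no report.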