Why does decodeURI decode more characters than it should?

I was just reading about decodeURI (MDN, ES6 spec), and something caught my eye:

Escape sequences that could not have been introduced by encodeURI are not replaced.

So it should only decode characters that encodeURI encodes.

// None of these should be escaped by `encodeURI`.
const unescaped = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.!~*'();/?:@&=+$,#";

const data = [...unescaped].map(char => ({
  "char": char,
  "encodeURI(char)": encodeURI(char),
  "encodePercent(char)": encodePercent(char),
  "decodeURI(encodePercent(char))": decodeURI(encodePercent(char))
}));

console.table( data );
console.log( "Check the browser's console." );

function encodePercent(string) {
  return string.replace(/./g, char => "%" + char.charCodeAt(0).toString(16));
}

So why is this only true for ; / ? : @ & = + $ , # ?

The spec states the following step for encodeURI:

  1. Let unescapedURISet be a String containing one instance of each code unit valid in uriReserved and uriUnescaped plus "#"

Let's take a look at uriReserved, and voilà:

uriReserved ::: one of

; / ? : @ & = + $ ,

The step after that is:

  1. Return Encode(uriString, unescapedURISet).

What Encode does is encode everything in the string except the characters in unescapedURISet, which includes ; / ? : @ & = + $ ,

This means encodeURI can never introduce escape sequences for anything in uriReserved or uriUnescaped.
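A quick way to check that claim, as a minimal sketch: the two strings below are my own transcription of uriReserved and uriUnescaped (letters, digits, and the marks - _ . ! ~ * ' ( )), so treat them as an assumption rather than spec text.

```javascript
// Assumed transcription of the spec's character classes.
const uriReserved = ";/?:@&=+$,";
const uriUnescaped =
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.!~*'()";
const unescapedURISet = uriReserved + uriUnescaped + "#";

// Every character in unescapedURISet should pass through encodeURI unchanged.
const untouched = [...unescapedURISet].every(ch => encodeURI(ch) === ch);
console.log(untouched); // true

// Characters outside the set do get escaped.
console.log(encodeURI(" ")); // %20
console.log(encodeURI("é")); // %C3%A9
```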

Interestingly enough, decodeURI is defined like this:

  1. Let reservedURISet be a String containing one instance of each code unit valid in uriReserved plus "#".

  2. Return Decode(uriString, reservedURISet).

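Decode works much like Encode: it decodes every escape sequence except those for characters in reservedURISet. So only the characters of uriReserved (plus "#") are excluded from decoding, and those happen to be ; / ? : @ & = + $ ,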
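The asymmetry can be observed directly. This is only a small sketch; the reservedURISet string is my own transcription of uriReserved plus "#".

```javascript
// Assumed transcription of reservedURISet (uriReserved plus "#").
const reservedURISet = ";/?:@&=+$,#";

// Escapes for reserved characters survive decodeURI untouched.
const keptEscaped = [...reservedURISet].every(ch => {
  const escaped = "%" + ch.charCodeAt(0).toString(16).toUpperCase();
  return decodeURI(escaped) === escaped;
});
console.log(keptEscaped); // true

// Escapes for uriUnescaped characters are decoded,
// even though encodeURI could never have produced them.
console.log(decodeURI("%61")); // a
console.log(decodeURI("%7E")); // ~
```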

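The question remains why the standard specifies it this way. If reservedURISet had also included uriUnescaped, the behavior would be exactly what the sentence quoted at the top describes. Possibly a mistake?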
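For illustration, here is roughly what that stricter behavior might look like. strictDecodeURI is a hypothetical helper of my own, not anything from the spec or MDN. It shields escapes of uriUnescaped characters by turning their "%" into "%25" before handing the string to decodeURI, which then restores the original escape instead of decoding it.

```javascript
// Assumed transcription of uriUnescaped.
const uriUnescaped =
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.!~*'()";

// Hypothetical helper: behaves like decodeURI, but also leaves escape
// sequences for uriUnescaped characters in place.
function strictDecodeURI(encoded) {
  const shielded = encoded.replace(/%[0-9A-Fa-f]{2}/g, escape => {
    const char = String.fromCharCode(parseInt(escape.slice(1), 16));
    // "%61" becomes "%2561"; decodeURI turns "%25" back into "%".
    return uriUnescaped.includes(char) ? "%25" + escape.slice(1) : escape;
  });
  return decodeURI(shielded);
}

console.log(decodeURI("%61%20%3B"));       // a %3B
console.log(strictDecodeURI("%61%20%3B")); // %61 %3B
```

Anything encodeURI actually produces still round-trips, for example strictDecodeURI(encodeURI("a b é;")) gives back "a b é;".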