Why does decodeURI decode more characters than it should?

Question

I was just reading about decodeURI (MDN, ES6 spec), and something caught my attention:

Escape sequences that could not have been introduced by encodeURI are not replaced.

So it should only decode characters that encodeURI encodes.

// None of these should be escaped by `encodeURI`.
const unescaped = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.!~*'();/?:@&=+$,#";

const data = [...unescaped].map(char => ({
  "char": char,
  "encodeURI(char)": encodeURI(char),
  "encodePercent(char)": encodePercent(char),
  "decodeURI(encodePercent(char))": decodeURI(encodePercent(char))
}));

console.table( data );
console.log( "Check the browser's console." );

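// Percent-encode every character, whether or not encodeURI would (ASCII input assumed).
function encodePercent(string) {
  return string.replace(/./g, char => "%" + char.charCodeAt(0).toString(16).padStart(2, "0"));
}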

Why does this only hold for ; / ? : @ & = + $ , #?

Answer 1

For encodeURI, the spec states the following step:

Let unescapedURISet be a String containing one instance of each code unit valid in uriReserved and uriUnescaped plus "#"

Let's look at uriReserved, and voilà:

uriReserved ::: one of
; / ? : @ & = + $ ,

The next step is:

Return Encode(uriString, unescapedURISet).

Encode percent-encodes everything in the string except the characters in unescapedURISet, which includes ; / ? : @ & = + $ ,.

This means encodeURI can never introduce an escape sequence for anything in uriReserved or uriUnescaped.
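The claim about encodeURI can be checked directly. The following is a minimal sketch, assuming the character sets are exactly those in the grammar quoted above; the variable names are only illustrative.

```javascript
// Character sets as given in the grammar quoted above.
const uriReserved = ";/?:@&=+$,";
const uriUnescaped =
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.!~*'()";
const unescapedURISet = uriReserved + uriUnescaped + "#";

// Every member of the set passes through encodeURI unchanged.
const untouched = [...unescapedURISet].every(ch => encodeURI(ch) === ch);
console.log(untouched); // true

// Characters outside the set are percent-encoded.
console.log(encodeURI(" ")); // "%20"
console.log(encodeURI("%")); // "%25"
console.log(encodeURI("é")); // "%C3%A9"
```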

Interestingly, decodeURI is defined like this:

Let reservedURISet be a String containing one instance of each code unit valid in uriReserved plus "#".

Return Decode(uriString, reservedURISet).

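Decode works like Encode in reverse: it decodes every escape sequence except those that stand for characters in reservedURISet. Evidently, only the characters in uriReserved (plus "#") are excluded from decoding. And these happen to be ; / ? : @ & = + $ ,!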
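The asymmetry is easy to observe. This is a rough sketch, assuming ASCII input; `toPercent` is a hypothetical helper written for this illustration, not part of any API.

```javascript
// Hypothetical helper: percent-encode every character (ASCII input assumed).
const toPercent = str =>
  [...str]
    .map(ch => "%" + ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, "0"))
    .join("");

const reservedURISet = ";/?:@&=+$,#";   // uriReserved plus "#"
const sampleUnescaped = "aZ9-_.!~*'()"; // a few members of uriUnescaped

// Escape sequences for reserved characters are left as they are...
const keptEscaped = [...reservedURISet].every(
  ch => decodeURI(toPercent(ch)) === toPercent(ch)
);
// ...while escape sequences for uriUnescaped characters are decoded,
// even though encodeURI could never have produced them.
const decodedUnescaped = decodeURI(toPercent(sampleUnescaped)) === sampleUnescaped;

console.log(keptEscaped);             // true
console.log(decodedUnescaped);        // true
console.log(decodeURI("%3B%41%2D"));  // "%3BA-"
```

Only the first escape sequence survives in the last line, because ";" is the only one of the three characters in reservedURISet.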

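The question remains why the standard specifies it this way. Had reservedURISet also included uriUnescaped, the behavior would match the introductory description exactly. Possibly a mistake?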
