Why does decodeURI decode more characters than it should?
I was just reading up on decodeURI (MDN, ES6 spec) and something caught my eye:
Escape sequences that could not have been introduced by encodeURI are not replaced.
So it should only decode the characters that encodeURI encodes.
// None of these should be escaped by `encodeURI`.
const unescaped = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.!~*'();/?:@&=+$,#";

// For each character, compare what encodeURI produces with what decodeURI
// does to a forced percent-escape of that same character.
const data = [...unescaped].map(char => ({
  "char": char,
  "encodeURI(char)": encodeURI(char),
  "encodePercent(char)": encodePercent(char),
  "decodeURI(encodePercent(char))": decodeURI(encodePercent(char))
}));

console.table( data );
console.log( "Check the browser's console." );

// Percent-encode every character as "%" plus its two-digit hex code (ASCII input only).
function encodePercent(string) {
  return string.replace(/./g, char => "%" + char.charCodeAt(0).toString(16).padStart(2, "0"));
}
So why does this only hold for ; / ? : @ & = + $ , #?
For encodeURI, the specification states the following step:
- Let unescapedURISet be a String containing one instance of each code unit valid in uriReserved and uriUnescaped plus "#"
Let's take a look at uriReserved, and voilà:
uriReserved ::: one of
; / ? : @ & = + $ ,
The next step is:
- Return Encode(uriString, unescapedURISet).
Encode encodes everything in the string except the characters in unescapedURISet, which include ; / ? : @ & = + $ ,.
This means that encodeURI can never introduce escape sequences for anything in uriReserved or uriUnescaped.
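As a quick check of that claim, the following sketch rebuilds unescapedURISet from the grammar quoted above and confirms that encodeURI leaves every member untouched. The variable names are only illustrative, and the uriUnescaped string is written out by hand from the spec's definition (letters, digits and the marks - _ . ! ~ * ' ( )).

```javascript
// Character classes written out by hand from the spec grammar.
const uriReserved = ";/?:@&=+$,";
const uriUnescaped =
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.!~*'()";
const unescapedURISet = uriReserved + uriUnescaped + "#";

// Collect every member of the set that encodeURI would alter.
const changed = [...unescapedURISet].filter(char => encodeURI(char) !== char);
console.log(changed.length); // 0

// Characters outside the set do get escaped.
console.log(encodeURI(" ")); // "%20"
console.log(encodeURI("%")); // "%25"
```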
Interestingly, decodeURI is defined like this:
Let reservedURISet be a String containing one instance of each code unit valid in uriReserved plus "#".
Return Decode(uriString, reservedURISet).
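Decode works much like Encode: it decodes everything except the characters in reservedURISet. Evidently, only the uriReserved characters (plus "#") are excluded from decoding, and those happen to be ; / ? : @ & = + $ ,!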
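The asymmetry can be observed directly. This is a minimal sketch, assuming an ES2017+ engine (for padStart); toEscape is a hypothetical helper that forces a percent-escape for a single ASCII character.

```javascript
const reservedURISet = ";/?:@&=+$,#";

// Hypothetical helper: force a percent-escape, e.g. "/" -> "%2F".
const toEscape = char =>
  "%" + char.charCodeAt(0).toString(16).toUpperCase().padStart(2, "0");

// Escapes of reserved characters come back from decodeURI unchanged.
const reservedKept = [...reservedURISet].every(
  char => decodeURI(toEscape(char)) === toEscape(char)
);

// Escapes of uriUnescaped characters are decoded, even though
// encodeURI could never have produced them.
const others = "abcXYZ019-_.!~*'()";
const othersDecoded = [...others].every(
  char => decodeURI(toEscape(char)) === char
);

console.log(reservedKept, othersDecoded); // true true
```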
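The question remains why the standard specifies it this way. Had they included uriUnescaped in reservedURISet, the behavior would be exactly what the introduction states. Possibly a mistake?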
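To illustrate what that alternative would look like, here is a rough sketch of a decoder that also preserves escapes for uriUnescaped characters. The name strictDecodeURI is hypothetical and not part of any standard; the sketch simply protects those escapes by rewriting their "%" as "%25" before delegating to the built-in decodeURI.

```javascript
// uriUnescaped written out by hand from the spec grammar.
const uriUnescapedChars =
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.!~*'()";

// Hypothetical decoder: behaves like decodeURI, but leaves escapes of
// uriUnescaped characters in place as well.
function strictDecodeURI(uri) {
  const shielded = uri.replace(/%[0-9A-Fa-f]{2}/g, escape => {
    const char = String.fromCharCode(parseInt(escape.slice(1), 16));
    // "%41" becomes "%2541", which decodeURI turns back into "%41".
    return uriUnescapedChars.includes(char) ? "%25" + escape.slice(1) : escape;
  });
  return decodeURI(shielded);
}

console.log(decodeURI("%41%20%2F"));       // "A %2F"
console.log(strictDecodeURI("%41%20%2F")); // "%41 %2F"
console.log(strictDecodeURI("caf%C3%A9")); // "café"
```

With this variant, only sequences that encodeURI could actually have produced are replaced, which matches the sentence quoted from the documentation.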