Strange URL encoding issue

I have a strange problem with how the plus sign + is URL-encoded in a query parameter of a request against an API. The API's documentation states:

The date has to be in the W3C format, e.g. '2016-10-24T13:33:23+02:00'.

So far so good, so I use this (minimized) code to generate the URL, using Spring's UriComponentsBuilder:

DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ssX");
ZonedDateTime dateTime = ZonedDateTime.now().minusDays(1);
String formated = dateTime.format(formatter);

UriComponentsBuilder uriComponentsBuilder = UriComponentsBuilder.fromUriString(baseUrl);
uriComponentsBuilder.queryParam("update", formated);
uriComponentsBuilder.build();
String url = uriComponentsBuilder.toUriString();

The unencoded query looks like this:

https://example.com?update=2017-01-05T12:40:44+01
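A side note on the format itself: the documented example ends in +02:00, while the output above ends in +01. As far as I know, a single X in a DateTimeFormatter pattern prints only the offset hour (unless the minutes are non-zero), and XXX prints hour and minute with a colon. Here is a minimal sketch with a fixed timestamp; the class name and the helper are made up for illustration, and whether the API actually rejects the short offset is not something the post tells us.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

class OffsetPatternSketch {

    // Formats a date-time with the given DateTimeFormatter pattern.
    static String format(ZonedDateTime dateTime, String pattern) {
        return dateTime.format(DateTimeFormatter.ofPattern(pattern));
    }

    public static void main(String[] args) {
        // Fixed instant with a +01:00 offset so the output is reproducible.
        ZonedDateTime fixed = ZonedDateTime.of(2017, 1, 5, 12, 40, 44, 0, ZoneOffset.ofHours(1));

        // Single X: offset hour only.
        System.out.println(format(fixed, "yyyy-MM-dd'T'HH:mm:ssX"));   // 2017-01-05T12:40:44+01
        // XXX: offset hour and minute, separated by a colon.
        System.out.println(format(fixed, "yyyy-MM-dd'T'HH:mm:ssXXX")); // 2017-01-05T12:40:44+01:00
    }
}
```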

The encoded string comes out as:

https://example.com?update=2017-01-05T12:40:44%2B01

This is (IMHO) a correctly encoded query string. Note how %2B replaces the + in +01 at the end of the query string.

However, when I now send the request against the API using the encoded URL, I get an error saying that the request could not be processed.

If, however, I replace the %2B with a + before sending the request, it works:

// Strings are immutable, so the result has to be assigned back.
url = url.replace("%2B", "+");

As I understand it, the + sign is a substitute for whitespace. So the URL the server actually sees after decoding must be

https://example.com?update=2017-01-05T12:40:44 01
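To make that assumption concrete, here is a small sketch of what a form-style (application/x-www-form-urlencoded) decoder does with both variants, using java.net.URLDecoder from the JDK. It assumes Java 10 or later for the Charset overload; the class and method names are made up, and nothing here says the API in question actually decodes this way.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class PlusDecodingSketch {

    // Decodes a raw query value the way HTML form decoding does:
    // "+" becomes a space and %XX sequences become the encoded bytes.
    static String formDecode(String rawValue) {
        return URLDecoder.decode(rawValue, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A literal "+" in the raw value is read as a space.
        System.out.println(formDecode("2017-01-05T12:40:44+01"));   // 2017-01-05T12:40:44 01
        // "%2B" is read as a literal plus sign.
        System.out.println(formDecode("2017-01-05T12:40:44%2B01")); // 2017-01-05T12:40:44+01
    }
}
```

If the server really decoded like this, the %2B variant would be the one that arrives with its plus sign intact, which suggests the API may be reading the raw query string some other way.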

Update:

According to the specification RFC 3986 (section 3.4), the + sign in a query parameter does not need to be encoded.

3.4. Query

The query component contains non-hierarchical data that, along with data in the path component (Section 3.3), serves to identify a
resource within the scope of the URI's scheme and naming authority
(if any). The query component is indicated by the first question
mark ("?") character and terminated by a number sign ("#") character
or by the end of the URI.

  query       = *( pchar / "/" / "?" )

The characters slash ("/") and question mark ("?") may represent data within the query component. Beware that some older, erroneous implementations may not handle such data correctly when it is used as the base URI for relative references (Section 5.1), apparently
because they fail to distinguish query data from path data when
looking for hierarchical separators. However, as query components
are often used to carry identifying information in the form of
"key=value" pairs and one frequently used value is a reference to
another URI, it is sometimes better for usability to avoid percent-
encoding those characters.

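According to this answer on Stack Overflow, Spring's UriComponentsBuilder follows this specification, but in practice it does not. So a new question is: how can I make UriComponentsBuilder follow the specification?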

Encoding 2017-01-05T12:40:44+01

gives you 2017-01-05T12%3A40%3A44%2B01

and not 2017-01-05T12:40:44%2B01 as you suggested.

Maybe that is why the server cannot handle your request: it is a half-encoded date.
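For reference, the fully encoded form above is what form-style encoding produces. A minimal sketch with java.net.URLEncoder from the JDK (Java 10+ for the Charset overload; the class and method names are just for illustration):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class FormEncodingSketch {

    // Encodes a value as application/x-www-form-urlencoded:
    // letters, digits and ".", "-", "*", "_" stay as-is, everything else is escaped.
    static String formEncode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Both ":" and "+" are percent-encoded here.
        System.out.println(formEncode("2017-01-05T12:40:44+01")); // 2017-01-05T12%3A40%3A44%2B01
    }
}
```

Note that this is form encoding, which is stricter than what RFC 3986 requires for a query component, so a URL with an unescaped ":" in the query is not necessarily malformed.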

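So it seems that Spring's UriComponentsBuilder encodes the whole URL. Setting the encoding flag to false in the build() method has no effect, because toUriString() always encodes the URL: it explicitly calls encode() after build():
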
/**
 * Build a URI String. This is a shortcut method which combines calls
 * to {@link #build()}, then {@link UriComponents#encode()} and finally
 * {@link UriComponents#toUriString()}.
 * @since 4.1
 * @see UriComponents#toUriString()
 */
public String toUriString() {
    return build(false).encode().toUriString();
}

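My solution (for now) is to manually encode only what really needs encoding. Another option (which may also require some encoding) is to get the URI and process it further: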

String url = uriComponentsBuilder.build().toUri().toString(); // returns the unencoded url as a string
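As far as I can tell, toUri() on components that were not encoded ends up going through one of the multi-argument java.net.URI constructors, which only quote characters that are illegal in a URI. The sketch below uses the JDK class directly to show that behavior; it is an assumption that Spring's result matches it exactly, and the class and method names are made up.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriConstructorSketch {

    // Builds a URL for example.com with the given (unencoded) query string.
    // The multi-argument constructor quotes illegal characters only.
    static String buildUrl(String query) throws URISyntaxException {
        return new URI("https", "example.com", "", query, null).toString();
    }

    public static void main(String[] args) throws URISyntaxException {
        // "+" and ":" are legal in a query, so they are left alone.
        System.out.println(buildUrl("update=2017-01-05T12:40:44+01"));
        // prints https://example.com?update=2017-01-05T12:40:44+01

        // A space is illegal, so it is quoted as %20.
        System.out.println(buildUrl("q=a b"));
        // prints https://example.com?q=a%20b
    }
}
```

So "unencoded" here really means "minimally encoded": the plus sign survives, but anything that cannot legally appear in a URI is still escaped.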

In org/springframework/web/util/HierarchicalUriComponents.java:

QUERY_PARAM {
        @Override
        public boolean isAllowed(int c) {
            if ('=' == c || '+' == c || '&' == c) {
                return false;
            }
            else {
                return isPchar(c) || '/' == c || '?' == c;
            }
        }
    }

The character '+' is not allowed, so it gets encoded.
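To see why this rule yields 12:40:44%2B01 (colons kept, plus escaped), here is a standalone sketch that re-implements the check quoted above with plain JDK code. It is not Spring's actual encoder: isPchar is reconstructed from the RFC 3986 grammar (unreserved / sub-delims / ":" / "@"), and all names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

class QueryParamEncodingSketch {

    // RFC 3986 "unreserved" characters.
    static boolean isUnreserved(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~';
    }

    // RFC 3986 "sub-delims" characters.
    static boolean isSubDelimiter(int c) {
        return "!$&'()*+,;=".indexOf(c) >= 0;
    }

    // RFC 3986 "pchar" without the percent-encoded alternative.
    static boolean isPchar(int c) {
        return isUnreserved(c) || isSubDelimiter(c) || c == ':' || c == '@';
    }

    // Same rule as the QUERY_PARAM snippet: "=", "+" and "&" are never allowed.
    static boolean isAllowedInQueryParam(int c) {
        if (c == '=' || c == '+' || c == '&') {
            return false;
        }
        return isPchar(c) || c == '/' || c == '?';
    }

    // Percent-encodes every UTF-8 byte that the rule above does not allow.
    static String encodeQueryParam(String value) {
        StringBuilder out = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            if (isAllowedInQueryParam(c)) {
                out.append((char) c);
            } else {
                out.append(String.format("%%%02X", c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // ":" passes as a pchar, "+" is explicitly excluded.
        System.out.println(encodeQueryParam("2017-01-05T12:40:44+01")); // 2017-01-05T12:40:44%2B01
    }
}
```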

You can use builder.build().toUriString()

This worked for me.

Thanks