Is there a way to convert non-ASCII characters to Unicode and leave ASCII characters as they are?
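I just found out that Apache HttpClient returns an incorrectly decoded Location header if the header contains percent-encoded letters.
The same request in a browser returns the correct string.
I wrote a method that restores the URI. Did I write it correctly? Is there an easier way?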
import java.net.URLDecoder;

public class Test {

    public static void main(String[] args) throws Exception {
        String uri = "/search-zero?searchterm=\u00D1\u008C";
        String converted = convert(uri);
        System.out.println(converted); // /search-zero?searchterm=%D1%8C
        System.out.println(URLDecoder.decode(converted, "utf-8")); // /search-zero?searchterm=ь
    }

    private static String convert(String uri) {
        char[] chars = uri.toCharArray();
        int i = 0;
        StringBuilder result = new StringBuilder();
        while (i < chars.length) {
            int n = (int) chars[i];
            if (n > 127) {
                result.append('%');
                result.append(String.format("%02X", n));
            } else {
                result.append(chars[i]);
            }
            i++;
        }
        return result.toString();
    }
}
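On the "easier way" part: the string above looks like UTF-8 bytes that were turned into chars one byte at a time, so a minimal sketch of a shortcut is to map the chars back to bytes and decode them as UTF-8 directly. The class and method names here (`LocationFix`, `redecode`) are made up for illustration. The sketch assumes every char in the input is at most U+00FF; anything above that is replaced with `?` by the ISO-8859-1 encoder, so it is not a general-purpose fix.

```java
import java.nio.charset.StandardCharsets;

public class LocationFix {

    // Hypothetical helper: undoes a byte-per-char decode of data that was really UTF-8.
    // Assumes every char is <= U+00FF; higher code points would be lost.
    public static String redecode(String misdecoded) {
        // Each char maps back to the single byte it came from.
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        // Interpret those bytes with the encoding the server actually used.
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String uri = "/search-zero?searchterm=\u00D1\u008C";
        // The bytes D1 8C are the UTF-8 encoding of U+044C (Cyrillic soft sign).
        System.out.println(redecode(uri));
    }
}
```

Plain ASCII passes through unchanged, because ASCII bytes mean the same thing in both encodings. This only repairs the string after the fact; it does not change how the client decodes headers.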
Update
My current HttpClient configuration:
@Bean
public CloseableHttpClient getHttpClient() {
    ConnectionConfig connectionConfig = ConnectionConfig.custom().setCharset(Consts.UTF_8).build();
    PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
    cm.setMaxTotal(200);
    cm.setDefaultMaxPerRoute(20);
    return HttpClients.custom()
            .setDefaultConnectionConfig(connectionConfig)
            .setConnectionManager(cm)
            .setRedirectStrategy(new CustomRedirectStrategy())
            .build();
}

public class CustomRedirectStrategy extends DefaultRedirectStrategy {
    @Override
    public URI getLocationURI(HttpRequest request, HttpResponse response, HttpContext context) throws ProtocolException {
        System.out.println(response.getFirstHeader("location"));
        URI uri = super.getLocationURI(request, response, context);
        return uri;
    }
}
Working code (the custom connection manager has to be configured correctly or removed). Thanks, Oleg!
@Bean
public CloseableHttpClient getHttpClient() {
    ConnectionConfig connectionConfig = ConnectionConfig.custom().setCharset(Consts.UTF_8).build();
    // PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
    // cm.setMaxTotal(200);
    // cm.setDefaultMaxPerRoute(20);
    return HttpClients.custom()
            .setDefaultConnectionConfig(connectionConfig)
            // .setConnectionManager(cm)
            .setRedirectStrategy(new CustomRedirectStrategy())
            .build();
}
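For anyone who wants to keep the pool instead of removing it, here is a sketch of the "configure it correctly" variant. It assumes HttpClient 4.3 or later, where the connection config passed to `HttpClients.custom().setDefaultConnectionConfig(...)` is overridden once a connection manager is supplied with `setConnectionManager(...)`, so the charset is set on the manager itself. The class name `PooledClientFactory` is made up for the example, and the redirect strategy is left out to keep it short.

```java
import org.apache.http.Consts;
import org.apache.http.config.ConnectionConfig;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

public class PooledClientFactory {

    // Hypothetical factory: same pool limits as in the question,
    // but the charset is applied to the connection manager.
    public static CloseableHttpClient create() {
        ConnectionConfig connectionConfig = ConnectionConfig.custom()
                .setCharset(Consts.UTF_8)
                .build();

        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        cm.setMaxTotal(200);
        cm.setDefaultMaxPerRoute(20);
        // The builder-level connection config is not used once a custom
        // manager is set, so the config goes here instead.
        cm.setDefaultConnectionConfig(connectionConfig);

        return HttpClients.custom()
                .setConnectionManager(cm)
                .build();
    }
}
```

I have not run this against the server from the question, so treat it as a starting point rather than a verified fix.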
You can force HttpClient to use a non-standard charset for protocol elements, which should improve interoperability with broken web servers that include unescaped non-ASCII characters in 'Location' headers:
ConnectionConfig connectionConfig = ConnectionConfig.custom()
        .setCharset(Consts.UTF_8) // decode header bytes as UTF-8, as in the working code above
        .build();
CloseableHttpClient client = HttpClients.custom()
        .setDefaultConnectionConfig(connectionConfig)
        .build();