Is there a way to convert non-ascii chars to unicode and leave ascii as they are?

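I just found that Apache HttpClient returns the Location header decoded incorrectly if it contains percent-encoded letters.

The same request in a browser returns the correct string.

I wrote a method to restore the URI. Did I write it correctly? Is there a simpler way?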

import java.net.URLDecoder;

public class Test {
    public static void main(String[] args) throws Exception {
        String uri = "/search-zero?searchterm=\u00D1\u008C";
        String converted = convert(uri);
        System.out.println(converted); // /search-zero?searchterm=%D1%8C
        System.out.println(URLDecoder.decode(converted, "utf-8")); // /search-zero?searchterm=ь
    }

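    // Percent-encodes every char above 127 and leaves ASCII chars untouched.
    // Note: this only yields valid %XX escapes while each non-ASCII char is at most 0xFF.
    private static String convert(String uri) {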
        char[] chars = uri.toCharArray();
        int i = 0;
        StringBuilder result = new StringBuilder();
        while (i < chars.length) {
            int n = (int) chars[i];
            if (n > 127) {
                result.append('%');
                result.append(String.format("%02X", n));
            } else {
                result.append(chars[i]);
            }
            i++;
        }
        return result.toString();
    }
}

Update

My current HttpClient configuration:

@Bean
public CloseableHttpClient getHttpClient() {
    ConnectionConfig connectionConfig = ConnectionConfig.custom().setCharset(Consts.UTF_8).build();

    PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
    cm.setMaxTotal(200);
    cm.setDefaultMaxPerRoute(20);

    return HttpClients.custom()
            .setDefaultConnectionConfig(connectionConfig)
            .setConnectionManager(cm)
            .setRedirectStrategy(new CustomRedirectStrategy())
            .build();
}

public class CustomRedirectStrategy extends DefaultRedirectStrategy {

    @Override
    public URI getLocationURI(HttpRequest request, HttpResponse response, HttpContext context) throws ProtocolException {
        System.out.println(response.getFirstHeader("location"));
        URI uri = super.getLocationURI(request, response, context);
        return uri;
    }
}

Working code (the custom connection manager has to be configured correctly, or removed). Thanks, Oleg!

    @Bean
    public CloseableHttpClient getHttpClient() {
        ConnectionConfig connectionConfig = ConnectionConfig.custom().setCharset(Consts.UTF_8).build();

//        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
//        cm.setMaxTotal(200);
//        cm.setDefaultMaxPerRoute(20);

        return HttpClients.custom()
                .setDefaultConnectionConfig(connectionConfig)
//                .setConnectionManager(cm)
                .setRedirectStrategy(new CustomRedirectStrategy())
                .build();
    }
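
For anyone who wants to keep the pooling connection manager instead of removing it, here is a minimal sketch of the "configure it correctly" option. It assumes HttpClient 4.3 or later, where the builder's `setDefaultConnectionConfig` only applies to the connection manager the builder creates itself, so a manager you pass in needs the config set on it directly. The class name `PooledClientFactory` is made up for illustration, and the redirect strategy from my code above is left out for brevity.

```java
import org.apache.http.Consts;
import org.apache.http.config.ConnectionConfig;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

// Hypothetical factory class, not part of the original code.
public class PooledClientFactory {

    public static CloseableHttpClient create() {
        // Decode protocol elements (status line, headers) as UTF-8.
        ConnectionConfig connectionConfig = ConnectionConfig.custom()
                .setCharset(Consts.UTF_8)
                .build();

        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        cm.setMaxTotal(200);
        cm.setDefaultMaxPerRoute(20);
        // Put the connection config on the manager itself, because the
        // builder does not push its own default onto a supplied manager.
        cm.setDefaultConnectionConfig(connectionConfig);

        return HttpClients.custom()
                .setConnectionManager(cm)
                .build();
    }
}
```

I have only sketched this, so treat it as a starting point and check it against the HttpClient version you are using.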

HttpClient can be forced to use a non-standard charset for protocol elements, which should improve interoperability with broken web servers that include unescaped non-ASCII characters in 'Location' headers:

ConnectionConfig connectionConfig = ConnectionConfig.custom()
        .setCharset(Consts.UTF_8)
        .build();
CloseableHttpClient client = HttpClients.custom()
        .setDefaultConnectionConfig(connectionConfig)
        .build();
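
To illustrate why the charset matters, here is a small sketch that uses only the JDK. The string `\u00D1\u008C` from the question is consistent with the two UTF-8 bytes `0xD1 0x8C` being read one byte per character, while reading the same bytes as UTF-8 gives `ь`. The class `HeaderCharsetDemo` and its `repair` helper are hypothetical names for this example, and `repair` assumes the string really was produced by a one-byte-per-char decoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of HttpClient.
public class HeaderCharsetDemo {

    // Decodes raw header bytes with the given charset.
    public static String decode(byte[] rawHeaderBytes, Charset charset) {
        return new String(rawHeaderBytes, charset);
    }

    // Undoes a one-byte-per-char decoding: turns the chars back into the
    // original bytes, then reads those bytes as UTF-8.
    public static String repair(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The UTF-8 encoding of 'ь' (U+044C), as a server might send it unescaped.
        byte[] raw = {(byte) 0xD1, (byte) 0x8C};

        String latin1 = decode(raw, StandardCharsets.ISO_8859_1);
        String utf8 = decode(raw, StandardCharsets.UTF_8);

        System.out.println(latin1.equals("\u00D1\u008C")); // true
        System.out.println(utf8.equals("\u044C"));         // true
        System.out.println(repair(latin1).equals(utf8));   // true
    }
}
```

If the client configuration cannot be changed, something like `repair` could stand in for the hand-written `convert` plus `URLDecoder` round trip, but fixing the charset on the connection is the cleaner option.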