HttpGet sends request with the wrong encoding

I am trying to fetch a text response from the following URL:

http://translate.google.cn/translate_a/single?client=t&sl=zh-CN&tl=en&dt=t&tk=265632.142896&q=%E4%BD%A0%E5%A5%BD

The response is:

[[["Hello there","你好",,,1]],,"zh-CN"]

(You can verify this response by entering the address in a browser.)

Here is a simplified version of the code I am using to download this text:

import org.apache.http.client.HttpClient;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.BasicResponseHandler;
import org.apache.http.impl.client.DefaultHttpClient;

public class Test {
    public static String downloadString() {
        String url = "http://translate.google.cn/translate_a/single?client=t&sl=zh-CN&tl=en&dt=t&tk=265632.142896&q=%E4%BD%A0%E5%A5%BD";
        HttpClient client = new DefaultHttpClient();
        HttpGet request = new HttpGet(url);
        ResponseHandler<String> handler = new BasicResponseHandler();
        try {
            return client.execute(request, handler);
        } catch (Exception e) {
            return "GET request failed.";
        }
    }
}

When I call Test.downloadString(), I get the following (incorrect) response:

[[["Huan Chai Sunsolt","浣犲ソ",,,0]],,"zh-CN"]

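My guess is that an encoding problem occurs somewhere behind the scenes during the request (six bytes that should be interpreted as two Chinese characters are being interpreted as three Japanese characters), but I can't pin down the exact cause. What is wrong with my code?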
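The six-byte observation can be checked locally. The following is a minimal sketch, not a diagnosis of what the server actually does: it assumes the garbling comes from the UTF-8 bytes of the query being decoded as GBK, which is a guess based on the shape of the output. The class name MojibakeDemo and the helper misdecode are hypothetical, and the sketch assumes the JDK in use ships the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encode `text` as UTF-8, then decode those bytes with a different charset,
    // the way a peer that guesses the wrong encoding would.
    static String misdecode(String text, Charset wrong) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, wrong);
    }

    public static void main(String[] args) {
        // Assumption: GBK is the charset being wrongly applied.
        Charset gbk = Charset.forName("GBK");
        String garbled = misdecode("你好", gbk);
        // Six UTF-8 bytes (E4 BD A0 E5 A5 BD) become three two-byte GBK characters.
        System.out.println(garbled.equals("浣犲ソ"));
    }
}
```

If the comparison holds, the request bytes themselves are fine and the other side is choosing the wrong charset when it reads them.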

Android 6.0 removed support for the Apache HTTP client. If your app uses this client and targets Android 2.3 (API level 9) or higher, use the HttpURLConnection class instead.

Source: http://developer.android.com/about/versions/marshmallow/android-6.0-changes.html#behavior-apache-http-client
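For reference, a rough sketch of the same download written with HttpURLConnection. The class name UrlConnectionDownload and the helper readUtf8 are hypothetical, and the sketch assumes the response body is UTF-8, which the question's expected output suggests but which I have not verified against the server. The User-Agent is a parameter because the request headers appear to matter for this endpoint.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class UrlConnectionDownload {
    // Read the whole stream and decode it as UTF-8 explicitly
    // (assumption: the server replies in UTF-8).
    static String readUtf8(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // Issue a GET request with the given User-Agent and return the body as text.
    static String downloadString(String url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try {
            conn.setRequestMethod("GET");
            conn.setRequestProperty("User-Agent", userAgent);
            try (InputStream in = conn.getInputStream()) {
                return readUtf8(in);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

On Android this would have to run off the main thread. Unlike the original, it lets IOException propagate instead of returning a placeholder string, so the caller can tell a failed request from a real response.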

It's strange, but adding a User-Agent header solved the problem:

request.addHeader("User-Agent", "Mozilla/5.0 (X11; Linux x86_64; rv:33.0) Gecko/20100101 Firefox/33.0");