Receiving encoded response to HttpsURLConnection GET request
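I am developing an Android application that uses the Java class HttpsURLConnection to connect to a web page and JSoup to parse the HTML response. The problem is that the HTML response from the website appears to be encoded. Any ideas on how to get the actual HTML?
Here is my code for contacting the website: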
    private String GetPageContent(String url) throws Exception {
        URL obj = new URL(url);
        conn = (HttpsURLConnection) obj.openConnection();

        // default is GET
        conn.setRequestMethod("GET");
        conn.setUseCaches(false);

        // act like a browser
        conn.setRequestProperty("User-Agent", USER_AGENT);
        conn.setRequestProperty("Accept",
                "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
        conn.setRequestProperty("Accept-Language", "en-US,en;q=0.8,en-GB;q=0.6");
        conn.setRequestProperty("Accept-Encoding", "gzip, deflate, sdch");
        conn.setRequestProperty("Connection", "keep-alive");
        if (cookies != null) {
            for (String cookie : this.cookies) {
                // Send only the leading name=value pair. A limit of 2 is needed here:
                // split(";", 1) never splits and returns the whole string.
                conn.addRequestProperty("Cookie", cookie.split(";", 2)[0]);
            }
        }

        int responseCode = conn.getResponseCode();
        Log.v(TAG, "\nSending 'GET' request to URL : " + url);
        Log.v(TAG, "Response Code : " + responseCode);

        // Read the body line by line (line terminators are dropped by readLine)
        BufferedReader in = new BufferedReader(new InputStreamReader(
                conn.getInputStream()));
        String inputLine;
        StringBuffer response = new StringBuffer();
        while ((inputLine = in.readLine()) != null) {
            response.append(inputLine);
        }
        in.close();

        // Get the response cookies
        setCookies(conn.getHeaderFields().get("Set-Cookie"));
        return response.toString();
    }
And a snippet of the response:
��������������]�r�6��۞�w@ՙ�NDQ�ﱥ|�siv�Kkw�m&�HH�M, Z��ff_c_o�d�@���9�l�6����� �_=w|����/A{��!W� LZ��������f]�=wc߽�2,˨�|�8x��~�}�x1�$Ib�Uq�7�j�X|;��K
Edit: The HTML is GZIP-encoded, as shown by the headers returned for the request.
The solution to this problem is to use the GZIPInputStream class, as follows:
    BufferedReader in = new BufferedReader(new InputStreamReader(
            new GZIPInputStream(conn.getInputStream())));
I don't know which URL you are trying to access, but have you tried setting the character set?
    BufferedReader in = new BufferedReader(new InputStreamReader(
            conn.getInputStream(), "UTF-8"));
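If the character set is to be set explicitly, it may be better to take it from the response's Content-Type header than to hard-code it. Note that a charset only controls how bytes become characters, so it applies after any decompression and does not by itself undo gzip. The following is a minimal sketch under those assumptions; the class name ContentTypeCharset and the method fromContentType are hypothetical and not part of the original code, and the UTF-8 fallback is an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class ContentTypeCharset {

    // Extracts the charset parameter from a Content-Type header value such as
    // "text/html; charset=ISO-8859-1". Falls back to UTF-8 (an assumption) when
    // the header is missing, has no charset parameter, or names a charset that
    // this JVM does not support.
    static Charset fromContentType(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                // Case-insensitive check for the "charset=" prefix (8 characters).
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // malformed or unsupported name: use the fallback
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }
}
```

With the connection from the question, this could be called as `ContentTypeCharset.fromContentType(conn.getContentType())` and the result passed as the second argument of the InputStreamReader constructor.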
Based on the headers returned for the request, we can conclude that the content is gzip-encoded. Fortunately, there is a simple way to decode a gzip-encoded stream: the GZIPInputStream class.
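Because the request advertises several encodings, the server is not obliged to answer with gzip every time, so wrapping the stream unconditionally could fail on an uncompressed response. Here is a minimal sketch that checks the Content-Encoding value first; the class ResponseBodyReader and its methods are hypothetical names introduced for illustration, and the gzip helper exists only to simulate a compressed response in memory.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of the original question's code.
final class ResponseBodyReader {

    // Wraps the raw stream in a GZIPInputStream only when the
    // Content-Encoding header value says the body is gzip-compressed.
    static InputStream maybeGunzip(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    // Reads the whole (possibly compressed) body into a String,
    // preserving line terminators.
    static String readBody(InputStream raw, String contentEncoding, Charset charset) throws IOException {
        try (Reader reader = new InputStreamReader(maybeGunzip(raw, contentEncoding), charset)) {
            StringBuilder body = new StringBuilder();
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                body.append(buffer, 0, n);
            }
            return body.toString();
        }
    }

    // Demo-only utility: gzip a string in memory to stand in for a server response.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }
}
```

In the question's method, this might be used as `ResponseBodyReader.readBody(conn.getInputStream(), conn.getContentEncoding(), StandardCharsets.UTF_8)`, assuming the page is UTF-8. The sketch handles only gzip; a deflate or sdch response would be returned undecoded.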