Apache HttpClient is not showing the Content-Length and Content-Encoding headers of the response
I installed Apache httpcomponents-client-5.0.x, and when inspecting the HTTP response headers I was surprised to find that it does not show the Content-Length and Content-Encoding headers. This is the code I used to test:
import java.net.URI;
import org.apache.hc.client5.http.classic.methods.HttpGet;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.CloseableHttpResponse;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.Header;
CloseableHttpClient httpclient = HttpClients.createDefault();
HttpGet request = new HttpGet(new URI("https://www.example.com"));
CloseableHttpResponse response = httpclient.execute(request);
Header[] responseHeaders = response.getHeaders();
for (Header header : responseHeaders) {
    System.out.println(header.getName());
}
// this prints all the headers except
// status code header
// Content-Length
// Content-Encoding
No matter what I try, I get the same result, for example with this:
Iterator<Header> headersItr = response.headerIterator();
while (headersItr.hasNext()) {
    Header header = headersItr.next();
    System.out.println(header.getName());
}
Or this:
HttpEntity entity = response.getEntity();
System.out.println(entity.getContentEncoding()); // NULL
System.out.println(entity.getContentLength()); // -1
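According to this question, asked 6 years ago, this seems to be a long-standing problem that also affected older versions of Apache HttpClient.
Of course, the server does return those headers, as confirmed by Wireshark and by Apache HttpClient's own logs: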
2020-04-03 07:59:09,106 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << HTTP/1.1 200 OK
2020-04-03 07:59:09,106 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Content-Encoding: gzip
2020-04-03 07:59:09,106 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Accept-Ranges: bytes
2020-04-03 07:59:09,107 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Age: 451956
2020-04-03 07:59:09,107 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Cache-Control: max-age=604800
2020-04-03 07:59:09,107 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Content-Type: text/html; charset=UTF-8
2020-04-03 07:59:09,107 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Date: Fri, 03 Apr 2020 05:59:09 GMT
2020-04-03 07:59:09,108 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Etag: "3147526947+gzip"
2020-04-03 07:59:09,108 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Expires: Fri, 10 Apr 2020 05:59:09 GMT
2020-04-03 07:59:09,108 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
2020-04-03 07:59:09,108 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Server: ECS (dcb/7EEB)
2020-04-03 07:59:09,108 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Vary: Accept-Encoding
2020-04-03 07:59:09,109 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << X-Cache: HIT
2020-04-03 07:59:09,109 DEBUG [org.apache.hc.client5.http.headers] http-outgoing-0 << Content-Length: 648
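By the way, the java.net.http library, known as the JDK HttpClient, works fine and shows all the headers.
Am I doing something wrong, or should I report a bug that has been around for years?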
In this case, the Content-Length might be ignored. If I set the Accept-Encoding request header to identity:
HttpGet request = new HttpGet(new URI("https://www.example.com"));
request.setHeader("Accept-Encoding", "identity");
CloseableHttpResponse response = httpclient.execute(request);
I can see the following:
HttpEntity entity = response.getEntity();
System.out.println(entity.getContentLength());
System.out.println(entity.getContentEncoding());
Output:
...
2020-04-03 03:04:17.760 DEBUG 34196 --- [ main] org.apache.hc.client5.http.headers : http-outgoing-0 << Content-Length: 1256
...
1256
null
I would like to draw your attention to the header being sent:
http-outgoing-0 >> Accept-Encoding: gzip, x-gzip, deflate
This tells the server that the client can accept gzip, x-gzip, and deflate content in the response. The response indicates that it is gzip-encoded:
http-outgoing-0 << Content-Encoding: gzip
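I believe HttpClient is handling this transparently internally and making the decoded content available.
As stated in the other post you referenced, one of the answers suggests using EntityUtils.toByteArray(httpResponse.getEntity()).length to get the content length.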
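To illustrate why the Content-Length seen on the wire (648 in the log above) cannot describe the content once it has been decompressed, here is a minimal sketch that uses only java.util.zip. It does not touch HttpClient at all; the class name ContentLengthDemo, the helper names, and the sample payload are made up for this example. Counting the bytes after reading the decoded stream to the end is the same idea as the EntityUtils.toByteArray(...).length suggestion.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ContentLengthDemo {

    // Compress a payload the way a server might before sending it
    // with "Content-Encoding: gzip" (hypothetical helper).
    static byte[] gzip(byte[] plain) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Read the decompressed stream to the end and count the bytes.
    // The decoded length is only known after everything has been read.
    static int decodedLength(byte[] compressed) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes().length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] plain = "<p>hello</p>".repeat(100).getBytes(StandardCharsets.UTF_8);
        byte[] wire = gzip(plain);
        // The two numbers differ, so the original header would be wrong
        // for the decompressed entity.
        System.out.println("Length on the wire:         " + wire.length);
        System.out.println("Length after decompression: " + decodedLength(wire));
    }
}
```

Note that reading the whole body into memory just to learn its length is only reasonable for small responses.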
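HttpComponents committer here...
You did not read carefully what Dave G said. By default, HttpClientBuilder enables transparent decompression, and the reason you no longer see some of the headers is here: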
if (decoderFactory != null) {
    response.setEntity(new DecompressingEntity(response.getEntity(), decoderFactory));
    response.removeHeaders(HttpHeaders.CONTENT_LENGTH);
    response.removeHeaders(HttpHeaders.CONTENT_ENCODING);
    response.removeHeaders(HttpHeaders.CONTENT_MD5);
} ...
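As for the JDK HttpClient, it does not perform any transparent decompression, so what you see is the length of the compressed stream, and you have to decompress the content yourself.
curl committer here...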
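For the "decompress it yourself" part, a rough sketch of what that could look like is below. It is not taken from any library: the class ManualGzipDecoding and its helpers are hypothetical, and it handles only gzip, not x-gzip or deflate. With java.net.http you would presumably feed it the stream from HttpResponse.BodyHandlers.ofInputStream() and the value of response.headers().firstValue("Content-Encoding"); the example uses in-memory bytes instead so that it runs without a network.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ManualGzipDecoding {

    // Wrap the raw body in a GZIPInputStream only when the
    // Content-Encoding header value says "gzip"; otherwise pass it through.
    static InputStream decode(InputStream rawBody, Optional<String> contentEncoding) throws IOException {
        boolean gzip = contentEncoding
                .map(value -> value.trim().equalsIgnoreCase("gzip"))
                .orElse(false);
        return gzip ? new GZIPInputStream(rawBody) : rawBody;
    }

    // Decode the body as described by the header and return it as UTF-8 text.
    static String readBody(byte[] rawBody, Optional<String> contentEncoding) {
        try (InputStream in = decode(new ByteArrayInputStream(rawBody), contentEncoding)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in for a server that compresses its response (test data only).
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Compressed body with the header present: decoded before printing.
        System.out.println(readBody(gzip("Example Domain"), Optional.of("gzip")));
        // Uncompressed body without the header: returned unchanged.
        System.out.println(readBody("plain".getBytes(StandardCharsets.UTF_8), Optional.empty()));
    }
}
```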