PHP file_get_contents Returning Inconsistent Partial Data for HTTP Requests
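I'm trying to use PHP's SoapClient to make requests to a third-party application. When I create the SoapClient object, I get an error about a premature end of data in the WSDL. While diagnosing the error, I found that file_get_contents() on the WSDL URI does not return the entire XML. In fact, it often returns varying amounts of the WSDL. Here is my test program: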
$xml = file_get_contents('https://webservices3.autotask.net/atservices/1.6/atws.wsdl');
echo $xml . "\n";
echo strlen($xml). "\n";
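I get about 57k bytes each time (195628 is the correct value), sometimes more, and only rarely the entire XML. I believe this is a PHP problem, because a shell loop that calls curl or wget against this URI 100 times returns the whole file 100% of the time. I'm on PHP 5.4.16, which I know is old (2013), but this process worked for about a month and then stopped completely.
I've tried changing timeouts, the HTTP protocol version, and PHP memory settings, but I can't figure out why file_get_contents behaves this way. Any suggestions are appreciated.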
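One way to make the truncation visible in a test program like the one above is to compare the number of bytes received with the Content-Length the server advertised. This is only a minimal sketch: the helper names advertised_length and is_truncated are hypothetical, and the comparison only makes sense for an uncompressed response (with gzip, Content-Length describes the compressed body).

```php
<?php
// Hypothetical helper: returns the Content-Length value found in a list of raw
// response header lines (the format of $http_response_header), or null if no
// well-formed Content-Length header is present.
function advertised_length(array $headers)
{
    $length = null;
    foreach ($headers as $line) {
        // Header names are case-insensitive; keep the last match so that the
        // final response wins if redirects were followed.
        if (preg_match('/^Content-Length:\s*(\d+)\s*$/i', $line, $m)) {
            $length = (int) $m[1];
        }
    }
    return $length;
}

// Hypothetical helper: true when fewer bytes arrived than the server promised.
function is_truncated($body, array $headers)
{
    $expected = advertised_length($headers);
    return $expected !== null && strlen($body) < $expected;
}

// Usage (needs network access):
// $xml = file_get_contents('https://webservices3.autotask.net/atservices/1.6/atws.wsdl');
// var_dump(is_truncated($xml, $http_response_header));
```

A header whose name has been scrambled will not match the pattern, so advertised_length returns null for it and the check reports nothing rather than guessing.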
curl test:
for a in $( seq 1 100 ); do curl -o wsdl.$a https://webservices3.autotask.net/atservices/1.6/atws.wsdl; done
wget test:
for a in $( seq 1 100 ); do wget -O wsdl.$a https://webservices3.autotask.net/atservices/1.6/atws.wsdl; done
Update 1:
Setting maxlen to some silly large number doesn't affect the behavior:
$xml = file_get_contents('https://webservices3.autotask.net/atservices/1.6/atws.wsdl', false, null, 0, 999999);
echo $xml . "\n";
echo strlen($xml). "\n";
Update 2:
$ curl -s -D /dev/stderr -- https://webservices3.autotask.net/atservices/1.6/atws.wsdl > /dev/null
HTTP/1.1 200 OK
Content-Type: text/xml
Last-Modified: Wed, 29 Apr 2020 14:38:25 GMT
Accept-Ranges: bytes
ETag: "39163cd7331ed61:0"
Server: Microsoft-IIS/8.5
X-Powered-By: ASP.NET
Content-Security-Policy: default-src 'self' https: *;script-src 'self' 'unsafe-inline' 'unsafe-eval' https: *;style-src 'self' 'unsafe-inline';img-src 'self' https://walkme.psa.datto.com/Images/ data: https://www.datto.com/img/
Date: Fri, 08 May 2020 15:22:28 GMT
Content-Length: 195628
Here are the response headers as reported by PHP:
$xml = file_get_contents('https://webservices3.autotask.net/atservices/1.6/atws.wsdl');
echo $xml . "\n";
echo strlen($xml). "\n";
var_dump($http_response_header);
array(11) {
[0]=> string(15) "HTTP/1.1 200 OK"
[1]=> string(22) "Content-Type: text/xml"
[2]=> string(44) "Last-Modified: Wed, 29 Apr 2020 14:38:25 GMT"
[3]=> string(20) "Accept-Ranges: bytes"
[4]=> string(25) "ETag: "39163cd7331ed61:0""
[5]=> string(25) "Server: Microsoft-IIS/8.5"
[6]=> string(21) "X-Powered-By: ASP.NET"
[7]=> string(228) "Content-Security-Policy: default-src 'self' https: *;script-src 'self' 'unsafe-inline' 'unsafe-eval' https: *;style-src 'self' 'unsafe-inline';img-src 'self' https://walkme.psa.datto.com/Images/ data: https://www.datto.com/img/ "
[8]=> string(35) "Date: Fri, 08 May 2020 15:26:54 GMT"
[9]=> string(22) "Connection: keep-alive"
[10]=> string(22) "Content-Length: 195628"
}
Update 3:
With gzip requested, the Content-Length header comes back corrupted in PHP:
$ctx = stream_context_create(array(
'http' => array(
'header' => "Accept-Encoding: gzip\r\n"
)
));
$xml = file_get_contents('https://webservices3.autotask.net/atservices/1.6/atws.wsdl', false, $ctx);
var_dump($http_response_header);
array(12) {
[0]=> string(15) "HTTP/1.1 200 OK"
[1]=> string(22) "Content-Type: text/xml"
[2]=> string(44) "Last-Modified: Wed, 29 Apr 2020 14:35:51 GMT"
[3]=> string(20) "Accept-Ranges: bytes"
[4]=> string(24) "ETag: "b376e7b331ed61:0""
[5]=> string(25) "Server: Microsoft-IIS/8.5"
[6]=> string(21) "X-Powered-By: ASP.NET"
[7]=> string(228) "Content-Security-Policy: default-src 'self' https: *;script-src 'self' 'unsafe-inline' 'unsafe-eval' https: *;style-src 'self' 'unsafe-inline';img-src 'self' https://walkme.psa.datto.com/Images/ data: https://www.datto.com/img/ "
[8]=> string(35) "Date: Fri, 08 May 2020 15:44:12 GMT"
[9]=> string(22) "Connection: keep-alive"
[10]=> string(22) "ntCoent-Length: 195628"
[11]=> string(22) "Content-Encoding: gzip"
}
Update 4:
Headers from curl with gzip (note that they look correct):
$ curl --compressed -s -D /dev/stderr -- https://webservices3.autotask.net/atservices/1.6/atws.wsdl > /dev/null
HTTP/1.1 200 OK
Content-Type: text/xml
Content-Encoding: gzip
Last-Modified: Wed, 29 Apr 2020 14:35:51 GMT
Accept-Ranges: bytes
ETag: "807d37b331ed61:0"
Vary: Accept-Encoding
Server: Microsoft-IIS/8.5
X-Powered-By: ASP.NET
Content-Security-Policy: default-src 'self' https: *;script-src 'self' 'unsafe-inline' 'unsafe-eval' https: *;style-src 'self' 'unsafe-inline';img-src 'self' https://walkme.psa.datto.com/Images/ data: https://www.datto.com/img/
Date: Fri, 08 May 2020 16:12:13 GMT
Content-Length: 13192
I was able to force SoapClient not to use gzip, which does fix the problem, albeit inefficiently. We still don't have a root cause for why PHP mangles the header.
// Autotask Client options
$auth_opts = array(
'login' => $username,
'password' => $password,
'trace' => 1,
'http' => array(
'header' => array(
'Accept-Encoding' => 'identity' // here be dragons
)
)
);
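For comparison, SoapClient also documents a stream_context option, which is another way to send the same request header. The following is a minimal sketch and not the configuration from the original post: the helper name build_soap_options is hypothetical, and whether your PHP build applies the context to the WSDL download as well is an assumption worth verifying.

```php
<?php
// Hypothetical helper: builds SoapClient options that ask the server for an
// uncompressed response by sending "Accept-Encoding: identity".
function build_soap_options($username, $password)
{
    // The http wrapper accepts extra request headers as a CRLF-terminated string.
    $context = stream_context_create(array(
        'http' => array(
            'header' => "Accept-Encoding: identity\r\n",
        ),
    ));

    return array(
        'login'          => $username,
        'password'       => $password,
        'trace'          => 1,
        'stream_context' => $context,
    );
}

// Usage (needs the soap extension and network access):
// $client = new SoapClient($wsdl_url, build_soap_options($username, $password));
```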
Update 5:
We confirmed this is still reproducible in PHP 7.2. I've opened a bug with the PHP team.
webservices3.autotask.net responds with a bad header:
HTTP/1.1 200 OK
Content-Type: text/xml
Accept-Ranges: bytes
Server: Microsoft-IIS/8.5
X-Powered-By: ASP.NET
Cteonnt-Length: 195628
Cache-Control: private
Content-Encoding: gzip
Transfer-Encoding: chunked
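Note: Cteonnt-Length: 195628 should be Content-Length: 195628.
This is why file_get_contents fails to handle the request correctly.
So, either fix the response or set maxlen.
Update: the header is scrambled. This should work: https://stackoverflow.com/a/8582042/3849743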
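Since the curl command line retrieves the whole file reliably in the question, one possible workaround is to download the WSDL with PHP's cURL extension instead of the http stream wrapper and hand SoapClient a local copy. This is a minimal sketch that assumes the cURL extension is installed. The function name fetch_with_curl and the temp-file handling are illustrative and not taken from the linked answer.

```php
<?php
// Hypothetical helper: downloads $url with the cURL extension and returns the
// decoded body, or throws if the transfer fails.
function fetch_with_curl($url)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_ENCODING       => '',   // accept every encoding libcurl supports and decode it
        CURLOPT_FAILONERROR    => true, // treat HTTP status codes >= 400 as failures
    ));
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Download failed: ' . curl_error($ch));
    }
    return $body;
}

// Usage (needs network access and the soap extension):
// $local = tempnam(sys_get_temp_dir(), 'wsdl');
// file_put_contents($local, fetch_with_curl('https://webservices3.autotask.net/atservices/1.6/atws.wsdl'));
// $client = new SoapClient($local, $auth_opts);
```

If the WSDL pulls in other documents through relative imports, a local copy may not resolve them, so treat this as a diagnostic aid rather than a drop-in fix.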
This appears to be a bug in PHP. There is a link to the bug report at the end of the question.