Specific string in Python requests.get gives ValueError

I am trying to fetch this particular page...

request = requests.get('http://market.yandex.ru/catalog/90555/list')

...and it gives me a strange error:

ValueError                                Traceback (most recent call last)
    C:\Python34\lib\site-packages\requests\packages\urllib3\response.py in read_chunked(self, amt)
    406                 try:
--> 407                     self.chunk_left = int(line, 16)
    408                 except ValueError:

ValueError: invalid literal for int() with base 16: ''

I found that a certain part of the string is to blame. I experimented with it, and the results are even stranger:

# No error
http://market.ru/catalog/90555/list
http://market.yandex.ru/catalo

# Error
http://market.yandex.ru/catalog

P.S. By the way, this problem only appeared today. Until recently I had no trouble fetching this page (using the same method).

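You are being rate limited, but the server does this in a way that violates the HTTP specification. Its response headers promise a chunked transfer encoding, and then it never sends a chunked body.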

If you request the URL with curl in verbose mode, you get the following output:

$ curl -v https://market.yandex.ru/catalog/90555/list
* Hostname was NOT found in DNS cache
*   Trying 213.180.204.22...
* Connected to market.yandex.ru (213.180.204.22) port 443 (#0)
* TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
* Server certificate: market.yandex.ru
* Server certificate: Certum Level IV CA
* Server certificate: Certum CA
> GET /catalog/90555/list HTTP/1.1
> User-Agent: curl/7.37.1
> Host: market.yandex.ru
> Accept: */*
> 
< HTTP/1.1 302 Found
* Server nginx is not blacklisted
< Server: nginx
< Date: Mon, 18 May 2015 18:53:15 GMT
< Content-Type: text/html; charset=UTF-8
< Transfer-Encoding: chunked
< Connection: keep-alive
< Keep-Alive: timeout=120
< X-Forwardtouser-Y: 1
< Set-Cookie: spravka=dD0xNDAwNDM5MTk1O2k9ODQuOTIuOTguMTcwO3U9MTQwMDQzOTE5NTUxNjUwOTExMjtoPWNkMzVlMzBlMjgxMTg4YWM0YjYyZDg3OTg4ZjUyNWFj; domain=.yandex.ru; path=/; expires=Wed, 17-Jun-2015 18:53:15 GMT
< Location: http://market.yandex.ru/showcaptcha?cc=1&retpath=http%3A//market.yandex.ru/catalog/90555/list%3F_bfd13d35fbf1551a835f050d3775fc4b&t=0/1431975195/029660aeb063916c78e30ebd9444fd4b&s=4dd645e7048b399008278208fa776ba9
< Set-Cookie: uid=CniLolVaNRthdR2JDtV0Ag==; path=/
< 
* transfer closed with outstanding read data remaining
* Closing connection 0
curl: (18) transfer closed with outstanding read data remaining

They are sending you a redirect, but the Transfer-Encoding: chunked header in the response means the client has to read chunks, and those chunks are not there.
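
To make the link to the ValueError concrete, here is a minimal sketch of how a chunked body is decoded. It is a simplified illustration, not the urllib3 implementation; the function name read_chunked_body and the sample byte strings are made up for this example. Each chunk starts with a hexadecimal size line, so when the server closes the connection without sending anything, the decoder ends up parsing an empty string as a hex number.

```python
import io


def read_chunked_body(stream):
    """Decode an HTTP/1.1 chunked body from a binary file-like object.

    Simplified sketch for illustration only (hypothetical helper, no trailer
    handling, no partial-read handling).
    """
    body = b""
    while True:
        # Each chunk begins with its size in hex, e.g. b"5\r\n".
        line = stream.readline()
        size_field = line.split(b";", 1)[0].strip().decode("ascii")
        # An empty size line makes int() raise ValueError.
        chunk_size = int(size_field, 16)
        if chunk_size == 0:
            break  # a zero-size chunk terminates the body
        body += stream.read(chunk_size)
        stream.readline()  # consume the CRLF that follows the chunk data
    return body


# A well-formed chunked body: one 5-byte chunk, then the terminating chunk.
decoded = read_chunked_body(io.BytesIO(b"5\r\nhello\r\n0\r\n\r\n"))

# What this server effectively sends: headers promising chunks, then nothing.
try:
    read_chunked_body(io.BytesIO(b""))
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

The first call decodes normally, while the second fails on the very first size line, which is the same failure mode as the int(line, 16) call in the traceback.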

The redirect leads to a captcha page:

http://market.yandex.ru/showcaptcha?cc=1&retpath=http%3A//market.yandex.ru/catalog/90555/list%3F_bfd13d35fbf1551a835f050d3775fc4b&t=0/1431975195/029660aeb063916c78e30ebd9444fd4b&s=4dd645e7048b399008278208fa776ba9
#                       ^^^^^^^^^^^
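
If you want to detect this situation in code, one option is to look at the Location header of the redirect instead of reading the body. The sketch below only parses the URL; the helper names is_captcha_redirect and original_url are hypothetical, and the assumption that a /showcaptcha path with a retpath parameter signals rate limiting is based solely on the single response shown above.

```python
from urllib.parse import parse_qs, urlsplit


def is_captcha_redirect(location):
    """Return True if a Location header value points at the captcha page.

    Assumption: the captcha page is identified by a path ending in
    /showcaptcha, as in the redirect captured above.
    """
    path = urlsplit(location).path.rstrip("/")
    return path.endswith("/showcaptcha")


def original_url(location):
    """Return the decoded retpath parameter, or None if it is absent."""
    values = parse_qs(urlsplit(location).query).get("retpath")
    return values[0] if values else None


location = (
    "http://market.yandex.ru/showcaptcha?cc=1"
    "&retpath=http%3A//market.yandex.ru/catalog/90555/list"
    "%3F_bfd13d35fbf1551a835f050d3775fc4b"
    "&t=0/1431975195/029660aeb063916c78e30ebd9444fd4b"
    "&s=4dd645e7048b399008278208fa776ba9"
)

captcha = is_captcha_redirect(location)
target = original_url(location)
```

With requests, passing allow_redirects=False together with stream=True should let you read response.headers.get('Location') without the library trying to consume the missing chunked body, though that has not been verified against this server. Either way, the practical fix is to slow down your requests so the captcha redirect is not triggered.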