Strange blocking behaviour with gevent/grequests over HTTPS
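The following code sends a request every 200 ms and should handle the responses asynchronously as they arrive.
Over HTTP it works as expected: a request is sent every 200 ms, and the response callback is called independently whenever a response arrives. Over HTTPS, however, the requests are significantly delayed whenever a response arrives (even though my response handler does no real work). The response callback also seems to be called twice per request, once with a zero-length response (edit: this is because of a redirect and seems unrelated to the blocking issue, thanks Padraic).
What could be causing this blocking behaviour over HTTPS? (www.bbc.co.uk is just an example that is geographically far from me, but it happens with every server I have tested.)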
grequests_test.py
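import time
import sys
import grequests
import gevent

def cb(res, **kwargs):
    # Response hook: log the arrival time and the body length.
    print("**** Response", time.time(), len(res.text))

for i in range(10):
    unsent = grequests.get(sys.argv[1], hooks={'response': cb})
    print("Request", time.time())
    # Send without waiting for the response, then yield for 200 ms.
    grequests.send(unsent, grequests.Pool(1))
    gevent.sleep(0.2)

# Give outstanding responses time to arrive.
gevent.sleep(5)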
$ ipython2 grequests_test.py 'http://www.bbc.co.uk'
(expected result)
('Request', 1459050191.499266)
('Request', 1459050191.701466)
('Request', 1459050191.903223)
('Request', 1459050192.10403)
('Request', 1459050192.305626)
('**** Response', 1459050192.099185, 179643)
('Request', 1459050192.506476)
('**** Response', 1459050192.307869, 179643)
('Request', 1459050192.707745)
('**** Response', 1459050192.484711, 179643)
('Request', 1459050192.909376)
('**** Response', 1459050192.696583, 179643)
('Request', 1459050193.110528)
('**** Response', 1459050192.870476, 179643)
('Request', 1459050193.311601)
('**** Response', 1459050193.071679, 179639)
('**** Response', 1459050193.313615, 179680)
('**** Response', 1459050193.4959, 179643)
('**** Response', 1459050193.687054, 179680)
('**** Response', 1459050193.902827, 179639)
$ ipython2 grequests_test.py 'https://www.bbc.co.uk'
(requests are sent late)
('Request', 1459050203.24336)
('Request', 1459050203.44473)
('**** Response', 1459050204.423302, 0)
('Request', 1459050204.424748) <------------- THIS REQUEST TIME IS LATE
('**** Response', 1459050205.294426, 0)
('Request', 1459050205.296722)
('Request', 1459050205.497924)
('**** Response', 1459050206.456572, 0)
('Request', 1459050206.457875)
('**** Response', 1459050207.363188, 0)
('**** Response', 1459050208.247189, 0)
('Request', 1459050208.249579)
('**** Response', 1459050208.250645, 179643)
('**** Response', 1459050208.253638, 179643)
('Request', 1459050208.451083)
('**** Response', 1459050209.426556, 0)
('Request', 1459050209.428032)
('**** Response', 1459050209.428929, 179643)
('**** Response', 1459050210.331425, 0)
('**** Response', 1459050211.247793, 0)
('Request', 1459050211.251574)
('**** Response', 1459050211.252321, 179643)
('**** Response', 1459050211.25519, 179680)
('**** Response', 1459050212.397186, 0)
('**** Response', 1459050213.299109, 0)
('**** Response', 1459050213.588854, 179588)
('**** Response', 1459050213.590434, 179643)
('**** Response', 1459050213.593731, 179643)
('**** Response', 1459050213.90507, 179643)
('**** Response', 1459050213.909386, 179643)
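Note that the first response arrives long after the next request should have been sent, yet that request has not gone out by then. Why doesn't the sleep return, and the next request get sent, before the first response arrives?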
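To make the lateness easier to see, the Request timestamps from the HTTPS run above can be compared against the intended 200 ms spacing. This is only a rough sketch: the helper name late_requests and the 50 ms tolerance are assumptions made for illustration, and the timestamps are copied from the output above.

```python
def late_requests(timestamps, interval=0.2, tolerance=0.05):
    """Return (index, gap) pairs for requests sent noticeably late.

    A request counts as late when the gap to the previous request exceeds
    interval + tolerance. The tolerance value is an arbitrary assumption.
    """
    late = []
    for i in range(1, len(timestamps)):
        gap = timestamps[i] - timestamps[i - 1]
        if gap > interval + tolerance:
            late.append((i, round(gap, 3)))
    return late


# 'Request' timestamps copied from the HTTPS run above.
https_request_times = [
    1459050203.24336, 1459050203.44473, 1459050204.424748,
    1459050205.296722, 1459050205.497924, 1459050206.457875,
    1459050208.249579, 1459050208.451083, 1459050209.428032,
    1459050211.251574,
]

for index, gap in late_requests(https_request_times):
    print("request %d was sent %.3f s after the previous one" % (index, gap))
```

Applied to the HTTP run, the same helper should report nothing, since every gap there is close to 0.2 s.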
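The extra responses and the zero-length responses are easy to explain. If you add a print(res.status_code) you will see a lot of 301s: in the case of https://www.bbc.co.uk you are redirected to http://www.bbc.co.uk. That is why you see the extra responses and why len(res.text) returns 0, as the output below shows: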
In [11]: def cb(res, **kwargs):
   ....:     print(res.status_code)
   ....:     print("**** Response", time.time(), len(res.text))
   ....:

In [12]: for i in range(10):
   ....:     unsent = grequests.get("https://www.bbc.co.uk", hooks={'response': cb})
   ....:     print("Request", time.time())
   ....:     grequests.send(unsent, grequests.Pool(1))
   ....:     gevent.sleep(0.2)
   ....: gevent.sleep(5)
   ....:
('Request', 1459368704.32843)
301
('**** Response', 1459368704.616453, 0)
('Request', 1459368704.618786)
301
('**** Response', 1459368704.937159, 0)
('Request', 1459368704.941069)
200
('**** Response', 1459368704.943034, 141486)
301
('**** Response', 1459368705.496423, 0)
('Request', 1459368705.498991)
200
('**** Response', 1459368705.50162, 141448)
301
('**** Response', 1459368705.784145, 0)
('Request', 1459368705.785769)
200
('**** Response', 1459368705.786772, 141486)
301
('**** Response', 1459368706.110865, 0)
('Request', 1459368706.114921)
200
('**** Response', 1459368706.116124, 141448)
301
('**** Response', 1459368706.396807, 0)
('Request', 1459368706.400795)
200
301
('**** Response', 1459368706.756861, 0)
('Request', 1459368706.76069)
200
('**** Response', 1459368706.763268, 141448)
('**** Response', 1459368706.488708, 141448)
301
('**** Response', 1459368707.065011, 0)
('Request', 1459368707.069128)
200
('**** Response', 1459368707.071981, 141448)
301
('**** Response', 1459368707.366737, 0)
('Request', 1459368707.370713)
200
('**** Response', 1459368707.373597, 141448)
301
('**** Response', 1459368707.73689, 0)
200
('**** Response', 1459368707.743815, 141448)
200
('**** Response', 1459368707.902499, 141448)
If we run the same code against a site that is actually served over https, https://www.google.ie/ in this case:
In [14]: for i in range(10):
   ....:     unsent = grequests.get("https://www.google.ie/", hooks={'response': cb})
   ....:     print("Request", time.time())
   ....:     grequests.send(unsent, grequests.Pool(1))
   ....:     gevent.sleep(0.2)
   ....: gevent.sleep(5)
   ....:
('Request', 1459368895.525717)
200
('**** Response', 1459368895.838453, 19682)
('Request', 1459368895.884151)
200
('**** Response', 1459368896.168789, 19650)
('Request', 1459368896.22553)
200
('**** Response', 1459368896.491304, 19632)
('Request', 1459368896.542206)
200
('**** Response', 1459368896.808875, 19650)
('Request', 1459368896.850575)
200
('**** Response', 1459368897.144725, 19705)
('Request', 1459368897.173744)
200
('**** Response', 1459368897.45713, 19649)
('Request', 1459368897.491821)
200
('**** Response', 1459368897.761675, 19657)
('Request', 1459368897.792373)
200
('**** Response', 1459368898.331791, 19683)
('Request', 1459368898.350483)
200
('**** Response', 1459368898.836108, 19713)
('Request', 1459368898.855729)
200
('**** Response', 1459368899.148171, 19666)
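You will see that the behaviour is different: we get 10 responses and none of them is zero-length. You should check the status_code in your function to verify that you got what you wanted. The example above explains what you are seeing with the bbc site, and most likely what is happening with the other sites too.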
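As a rough illustration of checking status_code in the hook, the callback below skips intermediate redirects and only reports final responses. The REDIRECT_CODES set and the is_redirect helper are names invented for this sketch, and the SimpleNamespace objects are stand-ins for real responses so the snippet runs without any network access.

```python
import time
from types import SimpleNamespace

# Status codes treated as redirects in this sketch (an assumption, not a full list).
REDIRECT_CODES = {301, 302, 303, 307, 308}


def is_redirect(res):
    """True when the response is an intermediate redirect with no useful body."""
    return res.status_code in REDIRECT_CODES


def cb(res, **kwargs):
    """Response hook that ignores redirects and logs only final responses."""
    if is_redirect(res):
        return  # e.g. the 301 from https://www.bbc.co.uk
    print("**** Response %s %s %d" % (time.time(), res.status_code, len(res.text)))


# Stand-in objects exposing only the attributes the hook reads.
fake_redirect = SimpleNamespace(status_code=301, text="")
fake_ok = SimpleNamespace(status_code=200, text="x" * 141448)

cb(fake_redirect)  # prints nothing
cb(fake_ok)        # prints one Response line
```

With a hook like this the zero-length entries would disappear from the log, but the late requests would not, because the redirect is unrelated to the blocking.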
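The current iteration of grequests contains the following:
from gevent import monkey as curious_george
curious_george.patch_all(thread=False, select=False)
The problematic part is select=False. Removing it, or calling monkey.patch_select() manually, fixes the problem. I'm not sure whether this has other side effects.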
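One way to apply the manual patch without editing grequests might look like the sketch below. The helper name ensure_select_patched is made up for this example, and it relies on monkey.is_module_patched, which as far as I know is available in gevent 1.1 and later. The import is done lazily so the helper simply returns False where gevent is not installed.

```python
def ensure_select_patched():
    """Make the select module cooperative under gevent.

    Returns True if select is patched afterwards, False if gevent is missing.
    """
    try:
        from gevent import monkey
    except ImportError:
        return False
    if not monkey.is_module_patched("select"):
        # grequests calls patch_all(select=False), so select is left unpatched.
        monkey.patch_select()
    return monkey.is_module_patched("select")


if __name__ == "__main__":
    print("select patched: %s" % ensure_select_patched())
```

In the test script it would be called once right after import grequests, before any request is created. As noted above, whether patching select has other side effects is untested.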