How would I track progress on a large batch of grequests?
I sometimes send a large number of requests through Python's grequests.map function. Currently my code looks like this:
# Passes these two into the function. The list of parameters can sometimes be thousands long.
# (Made this example up.)
import grequests  # imported before requests so gevent can patch the standard library first
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

local_path = 'https://www.google.com/search?q={}'
# One tuple per URL; the trailing commas keep each entry a tuple rather than a bare string.
parameter_list = [('the+answer+to+life+the+universe+and+everything',), ('askew',), ('fun+facts',)]

s = requests.Session()
retries = Retry(total=5, backoff_factor=0.2, status_forcelist=[500, 502, 503, 504], raise_on_redirect=True, raise_on_status=True)
s.mount('http://', HTTPAdapter(max_retries=retries))
s.mount('https://', HTTPAdapter(max_retries=retries))

async_list = []
for parameters in parameter_list:
    URL = local_path.format(*parameters)
    async_list.append(grequests.get(URL, session=s))
results = grequests.map(async_list)
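I'm a fan of the tqdm library and would love to know how many requests have completed and how many are still pending, but I'm not sure whether I can poll for that, or attach a hook through grequests.get or the Session that would do it. I did try grequests.get(URL, hooks={'response': test}, session=s), but that seems to hand the response itself to the test function, and results then contains only None.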
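Edit: Shortly after posting this question I experimented with return values in the test hook function, but no matter what I try, it seems that when a hook is present the map function does not block until there are responses. The result is None responses, and my hook has no visible effect either.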
How do I track the progress of a large batch of requests?
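Using the hooks parameter is the correct solution. I found that the test callback I had set up was raising an exception (curse those tiny scoping errors), and because I had not set an exception handler for my requests, the error was swallowed silently and produced the None responses.
Here is the setup I ended up with.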
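import grequests  # imported before requests so gevent can patch the standard library first
import requests
from requests.adapters import HTTPAdapter
from tqdm import tqdm
from urllib3.util.retry import Retry

track_requests = None

def request_fulfilled(r, *args, **kwargs):
    # Response hook: advance the progress bar each time a response arrives.
    track_requests.update()

# The wrapper name run_batch is illustrative; the original only says the two values are passed into a function.
def run_batch(local_path, parameter_list):
    global track_requests  # missing this line was the cause of my issue...
    s = requests.Session()
    s.hooks['response'].append(request_fulfilled)  # assign hook here
    retries = Retry(total=5, backoff_factor=0.2, status_forcelist=[500, 502, 503, 504], raise_on_redirect=True, raise_on_status=True)
    s.mount('http://', HTTPAdapter(max_retries=retries))
    s.mount('https://', HTTPAdapter(max_retries=retries))
    async_list = []
    for parameters in parameter_list:
        URL = local_path.format(*parameters)
        async_list.append(grequests.get(URL, session=s))
    track_requests = tqdm(total=len(async_list))
    results = grequests.map(async_list)
    track_requests.close()
    track_requests = None
    return results

local_path = 'https://www.google.com/search?q={}'
parameter_list = [('the+answer+to+life+the+universe+and+everything',), ('askew',), ('fun+facts',)]
results = run_batch(local_path, parameter_list)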
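
For anyone wondering why the missing global turned into silent None results, here is a minimal sketch of the failure mode using only the standard library. Nothing in it is grequests itself: fake_map, Counter and on_response are hypothetical stand-ins, and fake_map only imitates the behaviour described above (an exception raised while a request and its hook run is swallowed, and None is recorded unless an exception handler is supplied).

```python
# Hypothetical stand-in for a batch runner that swallows per-request errors.
def fake_map(jobs, hook, exception_handler=None):
    results = []
    for job in jobs:
        try:
            hook(job)            # the response hook runs as part of the request
            results.append(job)  # only reached when the hook succeeds
        except Exception as exc:
            # Without a handler the error vanishes and None is recorded.
            results.append(exception_handler(job, exc) if exception_handler else None)
    return results


class Counter:
    """Tiny stand-in for a tqdm progress bar."""

    def __init__(self):
        self.n = 0

    def update(self):
        self.n += 1


progress = None  # module-level tracker that the hook reads


def on_response(job):
    progress.update()  # looks up the module-level name


def run_without_global(jobs):
    progress = Counter()  # creates a local; the module-level name stays None
    return fake_map(jobs, on_response)


def run_with_global(jobs):
    global progress  # rebinding now affects the name the hook reads
    progress = Counter()
    results = fake_map(jobs, on_response)
    completed = progress.n
    progress = None
    return results, completed


broken = run_without_global(["a", "b"])          # every hook call fails silently
fixed, completed = run_with_global(["a", "b"])   # hook runs, counter advances

# Supplying a handler makes the hidden error visible.
errors = []
fake_map(["a"], on_response, lambda job, exc: errors.append(type(exc).__name__))
```

In the real code, grequests.map takes an exception_handler argument that is called with the request and the exception, so passing one there should surface this kind of error instead of leaving a bare None in the results.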