Python Multiple requests
I have a situation where I need to issue multiple requests inside a scheduler job to check the live status of 1000 users at a time. However, the server limits each API request to at most 50 users, so the following for-loop approach takes roughly 66 seconds for 1000 users (i.e. 20 API calls).
from apscheduler.schedulers.blocking import BlockingScheduler
import requests

sched = BlockingScheduler()

def scheduler_job():
    """Check live status for today's users in batches of 50."""
    uidlist = todays_userslist()  # Get around 1000 users from table
    # -- DIVIDE LIST BY GIVEN SIZE (here 50)
    split_list = lambda lst, sz: [lst[i:i+sz] for i in range(0, len(lst), sz)]
    idlists = split_list(uidlist, 50)  # SERVER MAX LIMIT - 50 ids/request
    for idlist in idlists:
        apiurl = some_server_url + "&ids=" + str(idlist)
        resp = requests.get(apiurl)
        save_status(resp.json())  # -- Save status to db

if __name__ == "__main__":
    sched.add_job(scheduler_job, 'interval', minutes=10)
    sched.start()
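As a quick sanity check (independent of the scheduler and the API), the split_list helper above chunks a list into consecutive slices of at most the given size:

```python
# Mirrors the split_list lambda from the job: consecutive slices of size sz.
def split_list(lst, sz):
    return [lst[i:i + sz] for i in range(0, len(lst), sz)]

chunks = split_list(list(range(1000)), 50)
print(len(chunks))     # 20 batches
print(len(chunks[0]))  # 50 ids in each full batch
```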
So,
- Is there any workaround to optimize the time taken to hit the APIs?
- Does Python-APScheduler provide any multiprocessing option to process such API requests within a single job?
If the server allows concurrent requests, you could try applying a Python thread pool from the concurrent.futures module. That way you parallelize the processing, not the scheduling itself. There are some good examples in the documentation here. (If you're using Python 2, there is a sort of equivalent module, the futures backport.)
For example,
import concurrent.futures
import multiprocessing
import requests
import time
import json

cpu_start_time = time.process_time()
clock_start_time = time.time()
queue = multiprocessing.Queue()
uri = "http://localhost:5000/data.json"
users = [str(user) for user in range(1, 50)]

with concurrent.futures.ThreadPoolExecutor(multiprocessing.cpu_count()) as executor:
    for user_id, result in zip(
        users,
        executor.map(lambda x: requests.get(uri, params={"id": x}).content, users)
    ):
        queue.put((user_id, result))

while not queue.empty():
    user_id, rs = queue.get()
    print("User ", user_id, json.loads(rs.decode()))

cpu_end_time = time.process_time()
clock_end_time = time.time()
print("Took {0:.03}s [{1:.03}s]".format(cpu_end_time - cpu_start_time,
                                        clock_end_time - clock_start_time))
If you want to use a process pool instead, make sure you do not rely on shared resources such as the queue, and have each worker write its data independently.
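To illustrate that point, here is a minimal process-pool sketch; fetch_status is a hypothetical stand-in for the real per-batch HTTP call, and each worker returns its result instead of writing to a shared queue:

```python
import concurrent.futures

def fetch_status(id_batch):
    # Hypothetical stand-in for requests.get(...) on one batch of ids;
    # a real implementation would return the parsed JSON response.
    return {uid: "online" for uid in id_batch}

def check_all(batches):
    results = []
    # Each worker process handles one batch and returns its own result,
    # so no shared Queue (or other shared resource) is needed.
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for statuses in executor.map(fetch_status, batches):
            results.append(statuses)  # save each batch independently
    return results

if __name__ == "__main__":
    print(check_all([[1, 2], [3, 4]]))
```

Note that fetch_status must be a top-level function so it can be pickled and sent to the worker processes.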