Fastest way to make 800+ GET requests to the same URL, passing a different ID each time
What is the fastest way to load device IDs from an Excel sheet containing 800+ of them and pass each ID in an HTTP GET request?
I'm reading the device IDs from the Excel sheet, making an HTTP GET request per ID to fetch the related data, dumping the results into a list, and then saving them to an Excel file using:
import json

import openpyxl
import pandas as pd
import requests

if __name__ == '__main__':
    # Raw strings so the backslashes in Windows paths are not treated as escapes.
    excel_file = openpyxl.load_workbook(r"D:\mypath\Book1.xlsx")
    active_sheet = excel_file["Sheet4"]  # get_sheet_by_name() is deprecated

    def iter_rows(sheet):
        for row in sheet.iter_rows():
            yield [cell.value for cell in row]

    res = iter_rows(active_sheet)
    keys = next(res)  # first row holds the column headers
    final_data_to_dump = []
    failed_data_dump = []
    header_events = {'Authorization': 'Basic authkey_here'}
    for new in res:
        inventory_data = dict(zip(keys, new))
        if None in inventory_data.values():
            continue  # skip incomplete rows
        url_get_event = 'https://some_url&source={}'.format(inventory_data['DeviceID'])
        print(inventory_data['DeviceID'])
        try:
            r3 = requests.get(url_get_event, headers=header_events)
            r3_json = r3.json()
            if r3_json['events']:
                for event in r3_json['events']:  # don't shadow the built-in `object`
                    dict_excel_data = {
                        "DeviceID": event['source']['id'],
                        "Device Name": event['source']['name'],
                        "Start 1": event['Start1'],
                        "Start 2": event['Start2'],
                        "Watering Mode": event['WateringMode'],
                        "Duration": event['ActuationDetails']['Duration'],
                        "Type": event['type'],
                        "Creation Time": event['creationTime'],
                    }
                    final_data_to_dump.append(dict_excel_data)
            else:
                no_dict_excel_data = {
                    "DeviceID": inventory_data["DeviceID"],
                    "Device Name": inventory_data["DeviceName"],
                    "Start 1": "",
                    "Start 2": "",
                    "Watering Mode": "",
                    "Duration": "",
                    "Type": "",
                    "Creation Time": "",
                }
                final_data_to_dump.append(no_dict_excel_data)
        except requests.ConnectionError:
            failed_dict_excel_data = {
                "DeviceID": inventory_data['DeviceID'],
                "Device Name": inventory_data["DeviceName"],
                "Status": "Connection Error",
            }
            failed_data_dump.append(failed_dict_excel_data)

    df = pd.DataFrame(final_data_to_dump)
    df2 = pd.DataFrame(failed_data_dump)
    df.to_excel(r'D:\mypath\ReportReceived_10Apr.xlsx', sheet_name='Sheet1', index=False)
    df2.to_excel(r'D:\mypath\Failed_ReportReceived_10Apr.xlsx', sheet_name='Sheet1', index=False)
But this can take 10-15 minutes or more, since there are 800+ devices in the Book1 sheet and that number may grow. How can I make this process faster?
You could use an async library, but the simplest solution is to do something like:
from concurrent.futures import ThreadPoolExecutor

def get(device_id):
    url_get_event = 'https://some_url&source={}'.format(device_id)
    return requests.get(url_get_event)

with ThreadPoolExecutor() as exc:
    responses = exc.map(get, device_ids)
If the rest of the per-item work is small, you may instead want to submit the functions to the executor and use as_completed to process the results in the main thread while the remaining requests are still running.
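A minimal sketch of that submit/as_completed pattern, combined with a shared `requests.Session` so TCP connections are reused across the 800+ requests. The names `fetch_event` and `fetch_all` are illustrative, and the URL is the placeholder from the question, not a real endpoint:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

import requests


def fetch_event(session, device_id):
    # Placeholder URL from the question; substitute the real endpoint and headers.
    url = 'https://some_url&source={}'.format(device_id)
    return session.get(url, timeout=10)


def fetch_all(device_ids, fetch, max_workers=20):
    """Submit fetch(device_id) for every ID; collect results as they complete."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as exc:
        # Map each future back to the device ID that produced it.
        futures = {exc.submit(fetch, did): did for did in device_ids}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()  # re-raises if the request failed
    return results


# Usage sketch (makes real network calls, so not run here):
#   with requests.Session() as session:  # one Session reuses connections
#       responses = fetch_all(device_ids, lambda d: fetch_event(session, d))
```

Because `as_completed` yields futures in finish order, you can build each row of `final_data_to_dump` in the main thread as soon as its response arrives, instead of waiting for the whole batch.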