Add a column while fetching data with ThreadPoolExecutor in Python

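I want to use ThreadPoolExecutor to read several pages from the link below, each with a different number, and save the corresponding number as a new column in a DataFrame.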

https://booking.snav.it/api/v1/rates/1030/2019-02-25/1042/2019-02-25?lang=1

The number changes as follows:

from concurrent.futures import ThreadPoolExecutor, as_completed
from time import sleep

from pandas import json_normalize
import pandas as pd
import requests


def download_file(url):
    # Fetch the URL and decode the JSON body
    url_info = requests.get(url, stream=True)
    jdata = url_info.json()
    return jdata


nums = [1030, 1031, 1040, 1050, 1020, 1021, 1010, 1023]
urls = [f"https://booking.snav.it/api/v1/rates/{i}/2019-02-25/1042/2019-02-25?lang=1" for i in nums]

processes = []
df = pd.DataFrame()
with ThreadPoolExecutor(max_workers=14) as executor:
    for url in urls:
        sleep(0.1)
        processes.append(executor.submit(download_file, url))

for index, task in enumerate(as_completed(processes)):
    jdata = task.result()
    tmp = json_normalize(jdata)
    # Problem: as_completed yields in completion order, so index does not match nums
    tmp["num"] = nums[index]
    df = pd.concat([df, tmp])
print(df.head())

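In the code above, I try to read the data with multiple threads and add the number associated with each JSON response as a new column of the df DataFrame. But this code does not work: because of multithreading, the order of the numbers in nums differs from the order in which the JSON responses are retrieved. What should I do?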

Try this. executor.map returns results in the same order as urls, so each index lines up with nums:

from concurrent.futures import ThreadPoolExecutor

...

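with ThreadPoolExecutor(max_workers=14) as executor:
    # map yields results in the order of urls, not in completion order
    rv = executor.map(download_file, urls)

frames = []
for index, jdata in enumerate(rv):
    tmp = json_normalize(jdata)
    tmp["num"] = nums[index]
    frames.append(tmp)

# Build the DataFrame once; DataFrame.append was removed in pandas 2.0
df = pd.concat(frames, ignore_index=True)
print(df.head())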
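
If you would rather keep as_completed (for example, to process each response as soon as it arrives), another option is to remember which number each future belongs to and look it up by future instead of by position. Below is a minimal sketch of that idea. It makes no network calls: fake_download is a hypothetical stand-in for download_file, and the "rate" field is made up for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import random
import time


def fake_download(num):
    """Hypothetical stand-in for download_file: sleeps, then returns a dict."""
    time.sleep(random.uniform(0.01, 0.05))  # simulate variable network latency
    return {"rate": num * 2}  # made-up payload, not the real API response


nums = [1030, 1031, 1040, 1050, 1020, 1021, 1010, 1023]

rows = []
with ThreadPoolExecutor(max_workers=4) as executor:
    # Map each future to the num it was submitted with.
    future_to_num = {executor.submit(fake_download, n): n for n in nums}
    for future in as_completed(future_to_num):
        row = future.result()
        row["num"] = future_to_num[future]  # look up by future, not by index
        rows.append(row)

# Each row carries the num it was requested with, whatever order it finished in.
assert all(row["rate"] == row["num"] * 2 for row in rows)
```

With the real download_file, the same pattern should apply: submit each URL, store the future as a key with its number as the value, then set tmp["num"] = future_to_num[task] inside the as_completed loop before collecting tmp into a list for pd.concat.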