Python for loop takes hours. How can I reduce the execution time?

Question

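I am using a for loop to fetch specific financial data from yfinance (about 800 sets of data). The run takes more than an hour, and this is only a small part of my whole project. How can I reduce the execution time?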

[Image: the for loop code]

==========================================

Answer 1

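How many columns do you have? Instead of creating a DataFrame on every iteration, I would store the column values in a dictionary and append that dictionary to a list. After the loop, create a single DataFrame from the list of dictionaries.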

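import pandas as pd
from tqdm import tqdm

d = []

# symbol is your own DataFrame with a 'Symbol' column
for i in tqdm(symbol['Symbol']):
    dict_store = {}
    try:
        # value1..value3 are placeholders for whatever you fetch for ticker i
        dict_store['col_1'] = value1
        dict_store['col_2'] = value2
        dict_store['col_3'] = value3
    except Exception:
        # Leave the fields blank if the lookup for this ticker fails
        dict_store['col_1'] = ''
        dict_store['col_2'] = ''
        dict_store['col_3'] = ''

    # Appending a dict is cheap; no DataFrame is built inside the loop
    d.append(dict_store)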

Once the loop has finished and the list d holds all the dictionaries, build the DataFrame in one step:

df = pd.DataFrame(d)
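Building the DataFrame once at the end avoids creating (and copying) a new DataFrame on every iteration. Below is a minimal self-contained sketch of the same pattern; `fetch_fields` is a hypothetical stand-in for the real per-symbol lookup (it makes no network calls), and the symbol "BAD" simulates a failed lookup:

```python
import pandas as pd


def fetch_fields(symbol):
    """Hypothetical stand-in for the real per-symbol lookup (e.g. a yfinance call)."""
    if symbol == "BAD":
        raise ValueError("no data for " + symbol)
    return len(symbol), symbol.lower(), symbol[::-1]


def collect(symbols):
    rows = []  # one dict per symbol; appending is cheap
    for sym in symbols:
        row = {"Symbol": sym}
        try:
            row["col_1"], row["col_2"], row["col_3"] = fetch_fields(sym)
        except Exception:
            # Keep the row but blank out the fields if the lookup fails
            row["col_1"] = row["col_2"] = row["col_3"] = ""
        rows.append(row)
    # Build the DataFrame a single time, after the loop
    return pd.DataFrame(rows)


df = collect(["AAPL", "MSFT", "BAD"])
print(df)
```

Each dict becomes one row, and its keys become the column names.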

Answer 2

yfinance does not seem to have a native way to download institutional holders in parallel, so we can use pandarallel.

Parallelized version:

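import yfinance as yf
import pandas as pd
from pandarallel import pandarallel

# Start 16 workers and show a progress bar
pandarallel.initialize(nb_workers=16, progress_bar=True)

# symbol is your own DataFrame with a 'Symbol' column
t = yf.Tickers(list(symbol['Symbol']))

def get_holders(x):
    # x is a yf.Ticker; return its top 5 institutional holders
    try:
        return x.institutional_holders.head()
    except Exception:
        # Skip tickers whose holder data cannot be fetched
        return None


# t.tickers maps each symbol to its Ticker object
call = pd.Series(t.tickers).parallel_apply(get_holders)
# Stack the per-ticker frames; pd.concat drops the None results
df = pd.concat(call.to_dict())
df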

(The text output is the same as below.)
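Since fetching data for hundreds of tickers is mostly spent waiting on network requests, a thread pool from the standard library's `concurrent.futures` is another option that avoids the extra dependency. This is a swapped-in technique, not part of the original answer. In the sketch below, `fetch_holders` is a hypothetical stand-in that sleeps instead of calling yfinance; in real use you would replace its body with something like `yf.Ticker(symbol).institutional_holders.head()`, and the `SYM…` names are made up:

```python
import time
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def fetch_holders(symbol):
    """Hypothetical stand-in for a network call; sleeps, then returns a small frame."""
    time.sleep(0.1)  # simulate network latency
    if symbol == "BAD":
        return None  # simulate a ticker with no holder data
    return pd.DataFrame({"Holder": ["Holder A", "Holder B"], "Shares": [2, 1]})


def fetch_all(symbols, max_workers=16):
    # Threads suit I/O-bound work: while one request waits, the others proceed
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input symbols
        results = dict(zip(symbols, pool.map(fetch_holders, symbols)))
    # Drop failed lookups, then stack with the symbol as the outer index level
    frames = {sym: frame for sym, frame in results.items() if frame is not None}
    return pd.concat(frames)


symbols = [f"SYM{i}" for i in range(16)] + ["BAD"]
start = time.perf_counter()
df = fetch_all(symbols)
elapsed = time.perf_counter() - start
print(df)
print(f"{elapsed:.2f} s")
```

Run one at a time, the 17 simulated calls would take about 1.7 s; with 16 threads they overlap. Keep `max_workers` modest against a real service, since too many concurrent requests may get throttled.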

Here, the entire S&P 500 downloads in under 4 minutes:

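import yfinance as yf
import pandas as pd

# symbols = list(symbol['Symbol'])  # your full list of tickers

symbols = ['AAPL', 'MSFT']  # Testing only; swap in your own symbols

# One Tickers object for all symbols (space-separated string)
tickers = yf.Tickers(' '.join(symbols))

# Top 5 institutional holders per symbol, keyed by symbol
institutional_holders = {x: tickers.tickers[x].institutional_holders.head() for x in symbols}

# Concatenate into one frame with the symbol as the outer index level
df = pd.concat(institutional_holders)

print(df)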

Output:

                               Holder      Shares Date Reported   % Out         Value
AAPL 0     Vanguard Group, Inc. (The)  1261261357    2021-12-30  0.0779  223962179162
     1                 Blackrock Inc.  1019810291    2021-12-30  0.0630  181087713372
     2        Berkshire Hathaway, Inc   887135554    2021-12-30  0.0548  157528660323
     3       State Street Corporation   633115246    2021-12-30  0.0391  112422274232
     4                       FMR, LLC   352204129    2021-12-30  0.0218   62540887186
MSFT 0     Vanguard Group, Inc. (The)   615950062    2021-12-30  0.0824  207156324851
     1                 Blackrock Inc.   519035634    2021-12-30  0.0694  174562064426
     2       State Street Corporation   302541869    2021-12-30  0.0405  101750881382
     3                       FMR, LLC   215377233    2021-12-30  0.0288   72435671002
     4  Price (T.Rowe) Associates Inc   204196901    2021-12-30  0.0273   68675501744
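The dictionary comprehension above has no error handling: if `institutional_holders` ever comes back as `None` for a ticker, `.head()` raises and the whole run stops. A minimal sketch of a more tolerant version, assuming failures show up as an exception, `None`, or an empty frame; `safe_head` and `fake_fetch` are hypothetical helpers, and `fake_fetch` stands in for the real lookup so the example runs offline:

```python
import pandas as pd


def safe_head(fetch, symbol, n=5):
    """Return the first n rows of fetch(symbol), or None if the lookup fails or is empty.
    `fetch` is a hypothetical callable, e.g. lambda s: tickers.tickers[s].institutional_holders."""
    try:
        frame = fetch(symbol)
    except Exception:
        return None
    if frame is None or frame.empty:
        return None
    return frame.head(n)


def fake_fetch(symbol):
    # Offline stand-in: "NONE" mimics a ticker with no holder data
    if symbol == "NONE":
        return None
    return pd.DataFrame({"Holder": [f"{symbol} holder {i}" for i in range(8)]})


symbols = ["AAPL", "NONE", "MSFT"]
holders = {s: safe_head(fake_fetch, s) for s in symbols}
# Filter out failed lookups explicitly before concatenating
df = pd.concat({s: h for s, h in holders.items() if h is not None})
print(df)
```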
