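Python for loop takes over an hour, how to reduce execution time?
I'm using a for loop to fetch specific financial data from yfinance (about 800 sets).
But a single run takes more than an hour!
And this is only a small part of my whole project.
How can I reduce the execution time?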
for loop code
==========================================
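How many columns do you have? I wouldn't create a DataFrame on every iteration. Instead, store the column values in a dictionary and append that dictionary to a list. After the loop, create one DataFrame from the list of dictionaries.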
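from tqdm import tqdm

d = []
for i in tqdm(symbol['Symbol']):
    dict_store = {}
    try:
        # value1..value3 are placeholders for whatever you fetch for ticker i
        dict_store['col_1'] = value1
        dict_store['col_2'] = value2
        dict_store['col_3'] = value3
    except Exception:
        # keep the row, with empty values, if the lookup fails
        dict_store['col_1'] = ''
        dict_store['col_2'] = ''
        dict_store['col_3'] = ''
    d.append(dict_store)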
If d is the list holding those dictionaries, then:
df = pd.DataFrame(d)
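To make the pattern above concrete, here is a minimal runnable sketch. `fetch_values` is a hypothetical stand-in for the real per-symbol lookup (it is not a yfinance call), and the column names are placeholders, so treat this as an illustration of the shape of the loop rather than the asker's actual code.

```python
import pandas as pd


def fetch_values(ticker):
    """Hypothetical stand-in for the real per-symbol lookup."""
    if ticker == "BAD":
        raise ValueError("no data for " + ticker)
    return {"col_1": len(ticker), "col_2": ticker.lower(), "col_3": ticker[0]}


def build_frame(tickers):
    rows = []  # one plain dict per ticker; cheap to append to
    for ticker in tickers:
        row = {"symbol": ticker}
        try:
            row.update(fetch_values(ticker))
        except ValueError:
            # keep the row so failed tickers stay visible in the result
            row.update({"col_1": None, "col_2": None, "col_3": None})
        rows.append(row)
    # build the DataFrame once, after the loop
    return pd.DataFrame(rows)


df = build_frame(["AAPL", "BAD", "MSFT"])
print(df)
```

Building the frame once avoids copying a growing DataFrame on every iteration. It does not make the network requests themselves any faster.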
yfinance does not seem to have a native way to download institutional holders in parallel, so we can use pandarallel.
Parallelized version:
import yfinance as yf
import pandas as pd
from pandarallel import pandarallel
from tqdm import tqdm

pandarallel.initialize(nb_workers=16, progress_bar=True)

# symbol is the asker's DataFrame with a 'Symbol' column
t = yf.Tickers([x for x in tqdm(symbol['Symbol'])])

def get_holders(x):
    # Return the top institutional holders, or None if the lookup fails
    try:
        return x.institutional_holders.head()
    except Exception:
        return None

call = pd.Series(t.tickers).parallel_apply(get_holders)
df = pd.concat(call.to_dict())
df
(Text output is the same as below.)
The version below downloads the entire S&P 500 in 4 minutes:
import pandas as pd
import yfinance as yf

# symbols = [x for x in tqdm(symbol['Symbol'])]
symbols = ['AAPL', 'MSFT']  # Testing only, swap for yours
tickers = yf.Tickers(' '.join(symbols))
# Map each symbol to the first rows of its institutional holders table
institutional_holders = {x: tickers.tickers[x].institutional_holders.head() for x in symbols}
df = pd.concat(institutional_holders)
print(df)
Output:
Holder Shares Date Reported % Out Value
AAPL 0 Vanguard Group, Inc. (The) 1261261357 2021-12-30 0.0779 223962179162
1 Blackrock Inc. 1019810291 2021-12-30 0.0630 181087713372
2 Berkshire Hathaway, Inc 887135554 2021-12-30 0.0548 157528660323
3 State Street Corporation 633115246 2021-12-30 0.0391 112422274232
4 FMR, LLC 352204129 2021-12-30 0.0218 62540887186
MSFT 0 Vanguard Group, Inc. (The) 615950062 2021-12-30 0.0824 207156324851
1 Blackrock Inc. 519035634 2021-12-30 0.0694 174562064426
2 State Street Corporation 302541869 2021-12-30 0.0405 101750881382
3 FMR, LLC 215377233 2021-12-30 0.0288 72435671002
4 Price (T.Rowe) Associates Inc 204196901 2021-12-30 0.0273 68675501744
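Since the work here is mostly waiting on network responses, a thread pool from the standard library is another option worth trying. The following is only a sketch under assumptions: `fetch_all` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real lookup such as reading `institutional_holders` from a `yf.Ticker`, so the example runs offline. It has not been benchmarked against the versions above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all(symbols, fetch_one, max_workers=8):
    """Run fetch_one(symbol) for each symbol on a pool of threads.

    Returns (results, errors): two dicts keyed by symbol.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map each future back to the symbol it was submitted for
        futures = {pool.submit(fetch_one, s): s for s in symbols}
        for future in as_completed(futures):
            symbol = futures[future]
            try:
                results[symbol] = future.result()
            except Exception as exc:
                # record the failure instead of silently dropping it
                errors[symbol] = exc
    return results, errors


def fake_fetch(symbol):
    """Hypothetical stand-in for a real network lookup."""
    if symbol == "BAD":
        raise LookupError(symbol)
    return [symbol + "-holder-" + str(i) for i in range(2)]


results, errors = fetch_all(["AAPL", "BAD", "MSFT"], fake_fetch, max_workers=4)
print(sorted(results))
print(sorted(errors))
```

If `fetch_one` returns a DataFrame per symbol, the `results` dict can be passed to `pd.concat` in the same way as `institutional_holders` above. Keeping `errors` separate makes it easy to retry only the symbols that failed.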