Download S&P 500 firms' stock data AND their GICS identifier in Python
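I am trying to use yfinance to download financial data for the S&P 500 companies. However, I also want to include the GICS sector code for each company, so that I can split the data into smaller datasets based on the GICS codes. Here is my attempt: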
import pandas as pd
import yfinance as yf
import datetime
payload=pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
first_table = payload[0]
df = first_table
df.head()
tickers_symbols=df['Symbol'].values.tolist()
GICS_sectors = df['GICS Sector'].values.tolist()
GICS=pd.DataFrame(GICS_sectors)
data = yf.download(tickers_symbols, period='1mo')
data['GICS']=GICS
print(data.head)
data.to_csv('stock_prices.csv')
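However, I noticed that when I save the output to a CSV file, besides giving me an empty list, it also produces a column vector that is incompatible with the stock DataFrame, which has days as rows and companies as columns. Any ideas on how to solve this? Solutions that involve other packages are also welcome.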
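It is not as fast as fetching all the stocks in one call, but downloading each stock individually, adding its ticker and sector to the rows, and collecting everything into one DataFrame gives a long (vertical) format that is easy to work with in later analysis.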
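import pandas as pd
import yfinance as yf

# Read the S&P 500 constituents table from Wikipedia
payload = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
df = payload[0]
tickers_symbols = df['Symbol'].values.tolist()
GICS_sectors = df['GICS Sector'].values.tolist()

frames = []
for t, s in zip(tickers_symbols, GICS_sectors):
    # Download one ticker at a time so its sector can be attached to every row
    # (newer yfinance releases may return MultiIndex columns; flatten them first if so)
    tmp = yf.download(t, period='1mo', progress=False)
    tmp.reset_index(inplace=True)
    tmp['Ticker'] = t
    tmp['GICS'] = s
    frames.append(tmp)

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
data = pd.concat(frames, ignore_index=True)
data.to_csv('stock_prices.csv', sep=',')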
Date Open High Low Close Adj Close Volume Ticker GICS
0 2021-08-09 197.649994 198.149994 196.779999 197.429993 195.934311 1193300.0 MMM Industrials
1 2021-08-10 198.240005 199.490005 197.699997 199.250000 197.740524 1598400.0 MMM Industrials
2 2021-08-11 200.000000 201.770004 199.309998 201.570007 200.042969 2217400.0 MMM Industrials
3 2021-08-12 201.479996 202.369995 200.360001 201.429993 199.904007 1231800.0 MMM Industrials
4 2021-08-13 201.229996 201.710007 200.289993 200.580002 199.060455 1910700.0 MMM Industrials
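Since the goal in the question was to divide the data into smaller datasets by GICS sector, here is a minimal sketch of how that split might look on a long-format frame like the one above. The toy `data` frame, its made-up tickers and prices, and the helper name `split_by_sector` are assumptions for illustration only; in practice you would pass in the frame built from the real downloads.

```python
import pandas as pd

# Toy stand-in for the long-format frame (tickers and values are made up).
data = pd.DataFrame({
    "Date": pd.to_datetime(["2021-08-09", "2021-08-10", "2021-08-09", "2021-08-10"]),
    "Close": [1.0, 2.0, 3.0, 4.0],
    "Ticker": ["AAA", "AAA", "BBB", "BBB"],
    "GICS": ["Industrials", "Industrials", "Energy", "Energy"],
})


def split_by_sector(frame, sector_col="GICS"):
    """Return a dict mapping each sector name to its own sub-DataFrame."""
    return {
        sector: group.reset_index(drop=True)
        for sector, group in frame.groupby(sector_col)
    }


by_sector = split_by_sector(data)
for sector, group in by_sector.items():
    # Each group could be saved separately, e.g. group.to_csv(f"{sector}.csv")
    print(sector, len(group))
```

Keeping the sector as an ordinary column is what makes this work: `groupby` needs one sector value per row, which the wide layout (days as rows, companies as columns) does not provide.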