Pandas 将 groupby 参数转换为单独的行
Pandas convert groupby parameter to separate rows
我在 pandas 数据框中有一些股票数据,这些数据按代码分组。我想修改 table 格式,以便每个日期和每个代码都存在一行。
import yfinance as yf
def get_all_data(tickers, start_date="2009-01-01", end_date="2020-09-25"):
df_orig = yf.download(tickers, start=start_date, end=end_date, group_by='ticker')
df_orig = df_orig.fillna(method='ffill')
return df_orig
sec_list = ['ULVR.L', 'MSFT', 'ABF.L']
df_orig = get_all_data(sec_list, start_date="2020-09-21", end_date="2020-09-25")
display(df_orig)
我目前得到这个 ,但我想要以下格式的数据(即 7 columns/12rows,而不是 15 列/4 行)
我应该怎么做?
| Date | Ticker | Open |
| ---------- | ------ | ------ |
| 2020-09-21 | MSFT | 197.19 |
| 2020-09-21 | ABF.L | 1903.5 |
| 2020-09-21 | ULVR.L | 4706 |
| 2020-09-22 | MSFT | 205.06 |
| 2020-09-22 | ABF.L | 1855 |
| 2020-09-22 | ULVR.L | 4671 |
| 2020-09-23 | MSFT | 207.9 |
| 2020-09-23 | ABF.L | 1870.5 |
| 2020-09-23 | ULVR.L | 4766 |
| 2020-09-24 | MSFT | 199.85 |
| 2020-09-24 | ABF.L | 1847 |
| 2020-09-24 | ULVR.L | 4743 |
您可以为此使用堆栈,命名 column/index maxes 也更容易:
In [26]: df
Out[26]:
a b
open close open close
2020-10-10 1 2 3 4
2020-10-10 5 6 7 8
2020-10-10 1 2 3 4
2020-10-10 6 7 8 9
In [27]: df.columns.names = ["ticker", "metric"]
In [28]: df.index.name = "date"
In [29]: df.stack("ticker")
Out[29]:
metric close open
date ticker
2020-10-10 a 2 1
b 4 3
a 6 5
b 8 7
a 2 1
b 4 3
a 7 6
b 9 8
或者,如果您不关心命名,只需使用带有 int 的堆栈即可:
In [46]: df
Out[46]:
a b
open close open close
2020-10-10 1 2 3 4
2020-10-10 5 6 7 8
2020-10-10 1 2 3 4
2020-10-10 6 7 8 9
In [47]: df.stack(0)
Out[47]:
close open
2020-10-10 a 2 1
b 4 3
a 6 5
b 8 7
a 2 1
b 4 3
a 7 6
b 9 8
# to set index names:
In [56]: gf = df.stack(0)
In [57]: gf.index = gf.index.set_names(["date", "ticker"])
In [58]: gf
Out[58]:
close open
date ticker
2020-10-10 a 2 1
b 4 3
a 6 5
b 8 7
a 2 1
b 4 3
a 7 6
b 9 8
我在 pandas 数据框中有一些股票数据,这些数据按代码分组。我想修改 table 格式,以便每个日期和每个代码都存在一行。
import yfinance as yf
def get_all_data(tickers, start_date="2009-01-01", end_date="2020-09-25"):
df_orig = yf.download(tickers, start=start_date, end=end_date, group_by='ticker')
df_orig = df_orig.fillna(method='ffill')
return df_orig
sec_list = ['ULVR.L', 'MSFT', 'ABF.L']
df_orig = get_all_data(sec_list, start_date="2020-09-21", end_date="2020-09-25")
display(df_orig)
我目前得到这个
我应该怎么做?
| Date | Ticker | Open |
| ---------- | ------ | ------ |
| 2020-09-21 | MSFT | 197.19 |
| 2020-09-21 | ABF.L | 1903.5 |
| 2020-09-21 | ULVR.L | 4706 |
| 2020-09-22 | MSFT | 205.06 |
| 2020-09-22 | ABF.L | 1855 |
| 2020-09-22 | ULVR.L | 4671 |
| 2020-09-23 | MSFT | 207.9 |
| 2020-09-23 | ABF.L | 1870.5 |
| 2020-09-23 | ULVR.L | 4766 |
| 2020-09-24 | MSFT | 199.85 |
| 2020-09-24 | ABF.L | 1847 |
| 2020-09-24 | ULVR.L | 4743 |
您可以为此使用堆栈,命名 column/index maxes 也更容易:
In [26]: df
Out[26]:
a b
open close open close
2020-10-10 1 2 3 4
2020-10-10 5 6 7 8
2020-10-10 1 2 3 4
2020-10-10 6 7 8 9
In [27]: df.columns.names = ["ticker", "metric"]
In [28]: df.index.name = "date"
In [29]: df.stack("ticker")
Out[29]:
metric close open
date ticker
2020-10-10 a 2 1
b 4 3
a 6 5
b 8 7
a 2 1
b 4 3
a 7 6
b 9 8
或者,如果您不关心命名,只需使用带有 int 的堆栈即可:
In [46]: df
Out[46]:
a b
open close open close
2020-10-10 1 2 3 4
2020-10-10 5 6 7 8
2020-10-10 1 2 3 4
2020-10-10 6 7 8 9
In [47]: df.stack(0)
Out[47]:
close open
2020-10-10 a 2 1
b 4 3
a 6 5
b 8 7
a 2 1
b 4 3
a 7 6
b 9 8
# to set index names:
In [56]: gf = df.stack(0)
In [57]: gf.index = gf.index.set_names(["date", "ticker"])
In [58]: gf
Out[58]:
close open
date ticker
2020-10-10 a 2 1
b 4 3
a 6 5
b 8 7
a 2 1
b 4 3
a 7 6
b 9 8