How to build a DataFrame index from column-name strings? Bonus question: how to then get a DataFrame of spreads
I have a DataFrame with 1300 rows and 400 columns that looks like this:
AOC.2017Jan AOC.2017Feb ... ZTP.2021Oct ZTP.2021Nov
VALUE_TIME ...
2016-07-07 NaN NaN ... NaN NaN
... ... ... ... ...
2021-10-14 NaN NaN ... NaN 101.1000
2021-10-15 NaN NaN ... NaN 88.6250
2021-10-18 NaN NaN ... NaN 90.1375
2021-10-19 NaN NaN ... NaN 91.1125
2021-10-20 NaN NaN ... NaN 93.5500
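I want to pull the 3-letter code and the date out of the column names and into the index (long format). What is the most Pythonic/pandas way to do this?
This is roughly the expected result; the rows between the first 2 dates are not shown: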
VALUE_TIME Group Date Value
2016-07-07 AOC 2017Jan NaN
AOC 2017Feb NaN
ZTP 2021Oct NaN
ZTP 2021Nov NaN
2021-10-14 ZTP 2021Nov 101.1000
2021-10-15 ZTP 2021Nov 88.6250
2021-10-18 ZTP 2021Nov 90.1375
2021-10-19 ZTP 2021Nov 91.1125
2021-10-20 ZTP 2021Nov 93.5500
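By the way, my end goal is a DataFrame showing all possible spreads between any pair of products (AOC, ZTP, etc.) that share the same VALUE_TIME and Date.
The end result should look like this: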
VALUE_TIME Spread Date Value
2016-07-07 AOC-BBC 2017Jan xxx
AOC-BBC 2017Feb xxx
AOC-ZTP 2017Jan xxx
AOC-ZTP 2017Feb NaN
BBC-ZTP 2017Feb NaN
BBC-ZTP 2017Feb NaN
2016-07-08 AOC-BBC 2017Jan xxx
AOC-BBC 2017Feb xxx
AOC-ZTP 2017Jan xxx
AOC-ZTP 2017Feb NaN
BBC-ZTP 2017Feb NaN
BBC-ZTP 2017Feb NaN
To answer your first question:
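import pandas as pd
# Split "Group.Date" column names into two column levels, then stack both levels into the row index (NaN rows kept)
df.columns = pd.MultiIndex.from_tuples([x.split('.') for x in df], names=['Group', 'Date'])
df.stack(level=[0, 1], dropna=False).to_frame(name='Value')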
You will get something like this:
Value
VALUE_TIME Group Date
2016-07-07 AOC 2017Feb NaN
2017Jan NaN
2021Nov NaN
2021Oct NaN
ZTP 2017Feb NaN
2017Jan NaN
2021Nov NaN
2021Oct NaN
2021-10-14 AOC 2017Feb NaN
2017Jan NaN
2021Nov NaN
2021Oct NaN
ZTP 2017Feb NaN
2017Jan NaN
2021Nov 101.1
2021Oct NaN
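For anyone who wants to try the reshape end to end, here is a minimal self-contained sketch. The four-column frame and the helper name `to_long` are made up for illustration; they are not your data. It is a variant of the approach above that uses `unstack()` on the frame rather than `stack(dropna=False)`, so it does not rely on the `dropna` argument; the resulting layout should be the same.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the wide frame: columns are "<Group>.<Date>" strings.
df = pd.DataFrame(
    {
        "AOC.2017Jan": [1.0, np.nan],
        "AOC.2017Feb": [2.0, 3.0],
        "ZTP.2017Jan": [np.nan, 5.0],
        "ZTP.2017Feb": [4.0, 6.0],
    },
    index=pd.DatetimeIndex(["2016-07-07", "2016-07-08"], name="VALUE_TIME"),
)


def to_long(wide):
    """Return a one-column frame indexed by (VALUE_TIME, Group, Date)."""
    out = wide.copy()
    # Split each column name once on "." into a (Group, Date) pair.
    out.columns = pd.MultiIndex.from_tuples(
        [tuple(name.split(".", 1)) for name in out.columns],
        names=["Group", "Date"],
    )
    # Unstacking a frame with a single-level row index gives a Series whose
    # index levels are (Group, Date, VALUE_TIME); NaN entries are kept.
    series = out.unstack()
    series = series.reorder_levels(["VALUE_TIME", "Group", "Date"]).sort_index()
    return series.to_frame(name="Value")


long_df = to_long(df)
print(long_df)
```

Note that `Date` is still a plain string here, so it sorts alphabetically (2017Feb before 2017Jan), just like in the output above.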
Your other question, however, is not very clear.
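If "spread" simply means the difference between two products' values at the same VALUE_TIME and Date, one possible reading could be sketched as below. That interpretation is my assumption, as are the sample numbers, the BBC column and the helper name `pairwise_spreads`. It expects the columns to already be a (Group, Date) MultiIndex, as produced in the first step.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Hypothetical wide frame whose columns are already a (Group, Date) MultiIndex.
columns = pd.MultiIndex.from_product(
    [["AOC", "BBC", "ZTP"], ["2017Jan", "2017Feb"]], names=["Group", "Date"]
)
wide = pd.DataFrame(
    [
        [10.0, 20.0, 7.0, 15.0, 1.0, np.nan],
        [11.0, np.nan, 8.0, 16.0, 2.0, 4.0],
    ],
    index=pd.DatetimeIndex(["2016-07-07", "2016-07-08"], name="VALUE_TIME"),
    columns=columns,
)


def pairwise_spreads(wide):
    """Assumed definition: spread "A-B" = value of A minus value of B."""
    groups = sorted(wide.columns.get_level_values("Group").unique())
    # wide[a] is a frame with one column per Date, so the subtraction
    # aligns on both VALUE_TIME (rows) and Date (columns).
    pieces = {f"{a}-{b}": wide[a] - wide[b] for a, b in combinations(groups, 2)}
    spread_wide = pd.concat(pieces, axis=1)
    spread_wide.columns = spread_wide.columns.set_names(["Spread", "Date"])
    # Same reshape as before: move both column levels into the row index.
    series = spread_wide.unstack()
    series = series.reorder_levels(["VALUE_TIME", "Spread", "Date"]).sort_index()
    return series.to_frame(name="Value")


spreads = pairwise_spreads(wide)
print(spreads)
```

A spread is NaN whenever either leg is NaN. The number of pairs grows quadratically with the number of products, so the result can get large on the real data.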