Sorting a Pandas Pivot Containing Strings
I have a pandas.DataFrame that contains numeric, date, and text values, like this:
Strike StrikeCell Expiration ExpirationCell CellContents
0 60.0 \n <div class="cell row-header strike itm" ... 2016-07-15 \n <div class="cell col-header expiration">... \n <div class="cell option" strike="60.0" m...
1 60.0 \n <div class="cell row-header strike itm" ... 2017-01-20 \n <div class="cell col-header expiration">... \n <div class="cell option" strike="60.0" m...
2 60.0 \n <div class="cell row-header strike itm" ... 2018-01-19 \n <div class="cell col-header expiration">... \n <div class="cell option" strike="60.0"
13 70.0 \n <div class="cell row-header strike itm" ... 2017-01-20 \n <div class="cell col-header expiration">... \n <div class="cell option" strike="70.0" m...
15 70.0 \n <div class="cell row-header strike itm" ... 2018-01-19 \n <div class="cell col-header expiration">... \n <div class="cell option" strike="70.0" m...
17 70.0 \n <div class="cell row-header strike itm" ... 2016-10-21 \n <div class="cell col-header expiration">... \n <div class="cell option" strike="70.0" m...
...
562 260.0 \n <div class="cell row-header strike otm" ... 2017-01-20 \n <div class="cell col-header expiration">... \n <div class="cell option" strike="260.0" ...
564 270.0 \n <div class="cell row-header strike otm" ... 2017-01-20 \n <div class="cell col-header expiration">... \n <div class="cell option" strike="270.0" ...
565 280.0 \n <div class="cell row-header strike otm" ... 2017-01-20 \n <div class="cell col-header expiration">... \n <div class="cell option" strike="280.0" ...
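My intention is to have StrikeCell down the first column (in ascending order), ExpirationCell across the columns (in ascending order), and CellContents as the values inside the table. Basically, I'm building a large pivot table with HTML-formatted content.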
I can do the following, and it works fine:
df.pivot(index='Strike', columns='Expiration', values='CellContents')
Strike is sorted correctly, and so is Expiration.
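For reference, a minimal sketch on hypothetical toy data (all values are made up) showing that pivot returns its row and column labels sorted, which is why numeric Strike and datetime Expiration keys come out in ascending order:

```python
import pandas as pd

# Hypothetical toy data: numeric strikes and real datetime expirations,
# deliberately listed out of order.
df = pd.DataFrame({
    "Strike": [70.0, 60.0, 70.0, 60.0],
    "Expiration": pd.to_datetime(["2017-01-20", "2017-01-20", "2016-07-15", "2016-07-15"]),
    "CellContents": ["a", "b", "c", "d"],
})

pivoted = df.pivot(index="Strike", columns="Expiration", values="CellContents")

# pivot sorts the row and column labels, so numeric/date keys come out ascending.
print(pivoted.index.tolist())  # [60.0, 70.0]
print(pivoted.columns.tolist())
```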
However, if I try to use the string columns StrikeCell and ExpirationCell instead, as follows:
df.pivot(index='StrikeCell', columns='ExpirationCell', values='CellContents')
the sort order is lost.
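Strictly speaking, pivot still sorts the labels here; it just sorts strings as text, character by character, so "100.0" lands before "60.0". A small sketch with hypothetical, simplified HTML labels (not the real markup from the question):

```python
import pandas as pd

# Hypothetical, simplified stand-ins for the HTML strings in StrikeCell.
df = pd.DataFrame({
    "Strike": [60.0, 100.0, 270.0],
    "StrikeCell": ['<div class="strike">60.0</div>',
                   '<div class="strike">100.0</div>',
                   '<div class="strike">270.0</div>'],
    "ExpirationCell": ["<div>2017-01-20</div>"] * 3,
    "CellContents": ["x", "y", "z"],
})

by_string = df.pivot(index="StrikeCell", columns="ExpirationCell", values="CellContents")

# Rows are in text order: after the shared prefix, "1" < "2" < "6",
# so the strikes appear as 100.0, 270.0, 60.0.
print(by_string["<div>2017-01-20</div>"].tolist())  # ['y', 'z', 'x']
```

In the real data the class attribute (itm vs otm) near the start of each string also takes part in the comparison, so the resulting order likely looks shuffled rather than simply reversed.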
So the question is: how do I keep the ascending order when using StrikeCell as the index and ExpirationCell as the columns?
I'm using pandas 0.18.1.
I believe this will work for you.
First, determine the desired order of StrikeCell and ExpirationCell from the numeric Strike and date Expiration columns:
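# drop_duplicates(): each label occurs on many rows of df; keep only its first (sorted) position
StrikeCell_ordered = df[['Strike', 'StrikeCell']].sort_values(by='Strike')['StrikeCell'].drop_duplicates()
ExpirationCell_ordered = df[['Expiration', 'ExpirationCell']].sort_values(by='Expiration')['ExpirationCell'].drop_duplicates()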
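Because each strike is listed once per expiration (and vice versa), sorting by Strike alone yields every StrikeCell label several times. If those repeated labels were passed to reindex, the matching rows or columns would be repeated in the result, so the labels need deduplicating first. A sketch on hypothetical toy data, using Series.drop_duplicates(), which keeps the first occurrence and therefore preserves the sorted order:

```python
import pandas as pd

# Hypothetical toy data: each strike appears once per expiration, as in the question.
df = pd.DataFrame({
    "Strike": [70.0, 60.0, 70.0, 60.0],
    "StrikeCell": ["s70", "s60", "s70", "s60"],
})

ordered = df[["Strike", "StrikeCell"]].sort_values(by="Strike")["StrikeCell"]
print(ordered.tolist())  # ['s60', 's60', 's70', 's70'] (labels repeat)

# Keep only the first occurrence of each label; the sorted order is preserved.
ordered_unique = ordered.drop_duplicates()
print(ordered_unique.tolist())  # ['s60', 's70']
```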
Then pivot and apply reindex:
pivoted_df = df.pivot(index='StrikeCell', columns='ExpirationCell', values='CellContents')
result = pivoted_df.reindex(index=StrikeCell_ordered, columns=ExpirationCell_ordered)
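Putting the steps together, here is a self-contained sketch on a hypothetical toy frame (the short `<s>`/`<e>` tags stand in for the real HTML). The string labels are chosen so that their text order differs from the numeric/date order, and reindex restores the intended one:

```python
import pandas as pd

# Hypothetical toy frame mimicking the question: numeric/date keys plus
# string "cell" labels whose text order differs from the numeric/date order.
df = pd.DataFrame({
    "Strike": [60.0, 100.0, 60.0, 100.0],
    "StrikeCell": ["<s>60.0</s>", "<s>100.0</s>", "<s>60.0</s>", "<s>100.0</s>"],
    "Expiration": pd.to_datetime(["2017-01-20", "2017-01-20", "2016-07-15", "2016-07-15"]),
    "ExpirationCell": ["<e>Jan 2017</e>", "<e>Jan 2017</e>", "<e>Jul 2016</e>", "<e>Jul 2016</e>"],
    "CellContents": ["a", "b", "c", "d"],
})

# Desired label order, taken from the numeric/date columns (duplicates dropped).
strike_order = df.sort_values("Strike")["StrikeCell"].drop_duplicates()
expiration_order = df.sort_values("Expiration")["ExpirationCell"].drop_duplicates()

# pivot sorts the string labels as text: "<s>100.0</s>" first, "<e>Jan 2017</e>" first.
pivoted = df.pivot(index="StrikeCell", columns="ExpirationCell", values="CellContents")

# reindex puts rows and columns back into numeric/date order.
result = pivoted.reindex(index=strike_order, columns=expiration_order)
print(result)
```

Since every label in the ordering Series already exists in the pivot, reindex only reorders rows and columns here and introduces no NaN cells.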