Sorting a Pandas Pivot Containing Strings

I have a pandas.DataFrame containing numeric, date, and text values. It looks like this:

    Strike  StrikeCell                                      Expiration  ExpirationCell                                  CellContents
0   60.0    \n <div class="cell row-header strike itm" ...  2016-07-15  \n <div class="cell col-header expiration">...  \n <div class="cell option" strike="60.0" m...
1   60.0    \n <div class="cell row-header strike itm" ...  2017-01-20  \n <div class="cell col-header expiration">...  \n <div class="cell option" strike="60.0" m...
2   60.0    \n <div class="cell row-header strike itm" ...  2018-01-19  \n <div class="cell col-header expiration">...  \n <div class="cell option" strike="60.0" 
13  70.0    \n <div class="cell row-header strike itm" ...  2017-01-20  \n <div class="cell col-header expiration">...  \n <div class="cell option" strike="70.0" m...
15  70.0    \n <div class="cell row-header strike itm" ...  2018-01-19  \n <div class="cell col-header expiration">...  \n <div class="cell option" strike="70.0" m...
17  70.0    \n <div class="cell row-header strike itm" ...  2016-10-21  \n <div class="cell col-header expiration">...  \n <div class="cell option" strike="70.0" m...
...
562 260.0   \n <div class="cell row-header strike otm" ...  2017-01-20  \n <div class="cell col-header expiration">...  \n <div class="cell option" strike="260.0" ...
564 270.0   \n <div class="cell row-header strike otm" ...  2017-01-20  \n <div class="cell col-header expiration">...  \n <div class="cell option" strike="270.0" ...
565 280.0   \n <div class="cell row-header strike otm" ...  2017-01-20  \n <div class="cell col-header expiration">...  \n <div class="cell option" strike="280.0" ...

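My intent is to have StrikeCell down the first column (in ascending order), ExpirationCell across the columns (in ascending order), and CellContents as the values inside the table. Basically, I am building a large pivot table with HTML-formatted content.

I can do the following, which works well: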

df.pivot(index='Strike', columns='Expiration', values='CellContents')

Strike is sorted correctly, and Expiration is sorted correctly.

However, if I try to use the string columns StrikeCell and ExpirationCell instead, as follows:

df.pivot(index='StrikeCell', columns='ExpirationCell', values='CellContents')

the sort order is lost.
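A likely explanation, sketched here with made-up data: pivot returns its row and column labels in sorted order, and when the labels are strings that means lexicographic order of the HTML text rather than the numeric or date order of the values inside it. The tiny frame and its cell strings below are hypothetical stand-ins for the real data.

```python
import pandas as pd

# Hypothetical miniature of the real frame: the *Cell columns hold HTML strings.
df = pd.DataFrame({
    "Strike": [60.0, 60.0, 260.0, 260.0],
    "StrikeCell": ['<div strike="60.0">', '<div strike="60.0">',
                   '<div strike="260.0">', '<div strike="260.0">'],
    "Expiration": ["2016-07-15", "2017-01-20", "2016-07-15", "2017-01-20"],
    "ExpirationCell": ["<div>Jul 15</div>", "<div>Jan 20</div>",
                       "<div>Jul 15</div>", "<div>Jan 20</div>"],
    "CellContents": ["a", "b", "c", "d"],
})

by_number = df.pivot(index="Strike", columns="Expiration", values="CellContents")
by_string = df.pivot(index="StrikeCell", columns="ExpirationCell", values="CellContents")

# Numeric labels come back in numeric order.
numeric_order = list(by_number.index)
# String labels come back in lexicographic order, so "260.0" lands before "60.0".
string_order = list(by_string.index)
string_columns = list(by_string.columns)
```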

So the question is: how do I keep the sort order when using StrikeCell as the index and ExpirationCell as the columns?

I am using pandas 0.18.1.

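I believe this will work for you.

First, let's determine the order of StrikeCell and ExpirationCell by sorting on their underlying Strike and Expiration values.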

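# drop_duplicates keeps one label per strike/expiration, so reindex does not repeat rows or columns
StrikeCell_ordered = df[['Strike', 'StrikeCell']].sort_values(by='Strike')['StrikeCell'].drop_duplicates()
ExpirationCell_ordered = df[['Expiration', 'ExpirationCell']].sort_values(by='Expiration')['ExpirationCell'].drop_duplicates()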

Then pivot and apply reindex:

pivoted_df = df.pivot(index='StrikeCell', columns='ExpirationCell', values='CellContents')
result = pivoted_df.reindex(index=StrikeCell_ordered, columns=ExpirationCell_ordered)
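To check the idea end to end, here is a minimal self-contained sketch on made-up data. The four-row frame and its HTML strings are hypothetical; it also assumes each Strike maps to exactly one StrikeCell string and each Expiration to exactly one ExpirationCell string, which the pivot needs anyway.

```python
import pandas as pd

# Hypothetical miniature of the real frame, rows deliberately out of order.
df = pd.DataFrame({
    "Strike": [260.0, 60.0, 260.0, 60.0],
    "StrikeCell": ['<div strike="260.0">', '<div strike="60.0">',
                   '<div strike="260.0">', '<div strike="60.0">'],
    "Expiration": ["2017-01-20", "2016-07-15", "2016-07-15", "2017-01-20"],
    "ExpirationCell": ["<div>Jan 20</div>", "<div>Jul 15</div>",
                       "<div>Jul 15</div>", "<div>Jan 20</div>"],
    "CellContents": ["d", "a", "c", "b"],
})

# Order the string labels by the sortable column they belong to,
# keeping a single copy of each label.
strike_order = df.sort_values(by="Strike")["StrikeCell"].drop_duplicates()
expiration_order = df.sort_values(by="Expiration")["ExpirationCell"].drop_duplicates()

# Pivot on the string columns, then restore the intended order.
pivoted = df.pivot(index="StrikeCell", columns="ExpirationCell", values="CellContents")
result = pivoted.reindex(index=strike_order, columns=expiration_order)

print(result)
```

The reindex step only rearranges existing rows and columns, so the cell contents themselves are untouched.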