How to sort columns except index column in a data frame in python after pivot

So I have a data frame:

import pandas as pd
import numpy as np

testdf = pd.DataFrame({"loc": ["ab12", "bc12", "cd12", "ab12", "bc13", "cd12"],
                       "months": ["Jun21", "Jun21", "July21", "July21", "Aug21", "Aug21"],
                       "dept": ["dep1", "dep2", "dep3", "dep2", "dep1", "dep3"],
                       "count": [15, 16, 15, 92, 90, 2]})

It looks like this:

When I pivot it:

df =  pd.pivot_table(testdf, values = ['count'], index = ['loc','dept'], columns = ['months'], aggfunc=np.sum).reset_index()
df.columns = df.columns.droplevel(0)
df

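It looks like this:

I'm looking for a sort function that sorts only the month columns into chronological order and leaves the first two columns, i.e. loc and dept, where they are.

When I try this: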

df.sort_values(by = ['Jun21'],ascending = False, inplace = True, axis = 1, ignore_index=True)[2:]

It gives me an error.

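I want the columns in the order Jun21, Jul21, Aug21.

I'm looking for something dynamic, so that I don't have to change the order by hand whenever the months change.

Any hints would be much appreciated.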
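For context, a minimal sketch of why that sort_values call fails, assuming a recent pandas (the toy frame below is made up for illustration). With axis=1, sort_values reorders the columns according to the values stored in the row named by `by`, so `by` has to be a row (index) label. After reset_index() the row labels are 0, 1, 2 and so on, so 'Jun21' raises a KeyError. Even if it ran, inplace=True makes the method return None, so the trailing [2:] would fail too.

```python
import pandas as pd

# Toy frame (illustrative data, not from the question)
toy = pd.DataFrame({"b": [2, 20], "a": [1, 30], "c": [3, 10]})

# axis=1 sorts the *columns* by the values found in row 0 (b=2, a=1, c=3)
by_row0 = toy.sort_values(by=0, axis=1)
print(list(by_row0.columns))  # ['a', 'b', 'c']

# A column label is not a valid `by` when axis=1: it is looked up among the row labels
try:
    toy.sort_values(by="a", axis=1)
except KeyError as err:
    print("KeyError:", err)
```

Reordering columns by their *labels* is therefore a job for reindex, sort_index(axis=1) or plain column selection rather than sort_values.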

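We can convert the months column to datetime like this. Note that %b only matches abbreviated month names such as Jul, so a label like July21 would silently become NaT with errors='coerce'; taking the first three letters plus the two-digit year avoids that:

>>> testdf.months = pd.to_datetime(testdf.months.str[:3] + testdf.months.str[-2:], format="%b%y")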
>>> testdf
    loc     months      dept    count
0   ab12    2021-06-01  dep1    15
1   bc12    2021-06-01  dep2    16
2   cd12    2021-07-01  dep3    15
3   ab12    2021-07-01  dep2    92
4   bc13    2021-08-01  dep1    90
5   cd12    2021-08-01  dep3    2

Then we apply your code to get the pivot:

>>> df =  pd.pivot_table(testdf, values = ['count'], index = ['loc','dept'], columns = ['months'], aggfunc=np.sum).reset_index()
>>> df.columns = df.columns.droplevel(0)
>>> df
months  NaT     NaT     2021-06-01  2021-07-01  2021-08-01
0       ab12    dep1    15.0        NaN         NaN
1       ab12    dep2    NaN         92.0        NaN
2       bc12    dep2    16.0        NaN         NaN
3       bc13    dep1    NaN         NaN         90.0
4       cd12    dep3    NaN         15.0        2.0

Finally, we can reformat the column names with strftime to get the expected result:

>>> df.columns = df.columns.map(lambda t: t.strftime('%b%y') if isinstance(t, pd.Timestamp) else '')
>>> df
months                  Jun21   Jul21   Aug21
0       ab12    dep1    15.0    NaN     NaN
1       ab12    dep2    NaN     92.0    NaN
2       bc12    dep2    16.0    NaN     NaN
3       bc13    dep1    NaN     NaN     90.0
4       cd12    dep3    NaN     15.0    2.0
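If you would rather keep the original month strings (and the loc/dept headers), one alternative is to leave loc and dept in the index while sorting, parse each month label only inside the sort key, and call reset_index() at the end. This is a sketch assuming pandas 1.1 or later (for the `key` argument of sort_index). The helper name month_key is hypothetical, and it assumes every label starts with a 3-letter month abbreviation and ends with a 2-digit year, so July21 is treated as Jul21.

```python
import pandas as pd

testdf = pd.DataFrame({
    "loc": ["ab12", "bc12", "cd12", "ab12", "bc13", "cd12"],
    "months": ["Jun21", "Jun21", "July21", "July21", "Aug21", "Aug21"],
    "dept": ["dep1", "dep2", "dep3", "dep2", "dep1", "dep3"],
    "count": [15, 16, 15, 92, 90, 2],
})

def month_key(labels: pd.Index) -> pd.Index:
    """Hypothetical helper: map labels like 'Jun21' or 'July21' to dates for sorting."""
    # First 3 letters = month abbreviation, last 2 characters = year
    return pd.to_datetime(labels.str[:3] + labels.str[-2:], format="%b%y")

# values="count" (a string, not a list) gives single-level month columns
df = testdf.pivot_table(values="count", index=["loc", "dept"],
                        columns="months", aggfunc="sum")

# Sort only the month columns chronologically; loc/dept are still in the index here
df = df.sort_index(axis=1, key=month_key).reset_index()
print(df)
```

Because the order is derived from the labels themselves, new months are placed correctly without editing any hard-coded list.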

With groupby it's straightforward:

df = testdf.groupby(['loc', 'dept', 'months']).sum().unstack(level=2)
df = df.reindex(['Jun21', 'July21', 'Aug21'], axis=1, level=1)

Output:

          count             
months    Jun21 July21 Aug21
loc  dept                   
ab12 dep1  15.0    NaN   NaN
     dep2   NaN   92.0   NaN
bc12 dep2  16.0    NaN   NaN
bc13 dep1   NaN    NaN  90.0
cd12 dep3   NaN   15.0   2.0
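The reindex call above still hard-codes the month list. A minimal sketch of a dynamic version, under the same assumption about the label format (3-letter month prefix, 2-digit year suffix), reads the labels from the second column level and sorts them by their parsed date before reindexing. The name month_order is illustrative only.

```python
import pandas as pd

testdf = pd.DataFrame({
    "loc": ["ab12", "bc12", "cd12", "ab12", "bc13", "cd12"],
    "months": ["Jun21", "Jun21", "July21", "July21", "Aug21", "Aug21"],
    "dept": ["dep1", "dep2", "dep3", "dep2", "dep1", "dep3"],
    "count": [15, 16, 15, 92, 90, 2],
})

df = testdf.groupby(["loc", "dept", "months"]).sum().unstack(level=2)

# Month labels from the second column level, e.g. 'Aug21', 'July21', 'Jun21'
labels = df.columns.get_level_values(1).unique()

# Sort chronologically by parsing month abbreviation + 2-digit year ('July21' -> 'Jul21')
month_order = sorted(labels, key=lambda m: pd.to_datetime(m[:3] + m[-2:], format="%b%y"))

df = df.reindex(month_order, axis=1, level=1)
print(df)
```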