Sorting a Data Frame by Month with Repeating Years, Based on a Unique 'Other' Column

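In pandas, I'm trying to sort the rows of a large data frame by month. Right now the months are out of order: they are sorted alphabetically, but I want them in chronological order. The tricky part is that each product covers a 21-month period. There are two year columns, one for the calendar year and one for the fiscal year, and they intentionally differ. Fiscal year 2021 runs from January 2021 to September 2021, and fiscal year 2022 runs from October 2021 to September 2022. There are hundreds of products; the section below is just an example with two of them.

As the table below shows, the months are in the wrong order, but everything else is ordered correctly.

Again, each product has 21 months, from January 2021 through September 2022. I want these to run in chronological order for each product.

I'm looking for code that sorts this data frame correctly.

What it looks like now (months not in chronological order):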

Item Calendar Year Fiscal Year Month Amount
Product 1 2021 2021 April 45
Product 1 2021 2021 August 85
Product 1 2021 2021 February 25
Product 1 2021 2021 January 15
Product 1 2021 2021 July 75
Product 1 2021 2021 June 65
Product 1 2021 2021 March 35
Product 1 2021 2021 May 55
Product 1 2021 2021 September 95
Product 1 2021 2022 December 125
Product 1 2021 2022 November 115
Product 1 2021 2022 October 105
Product 1 2022 2022 April 405
Product 1 2022 2022 August 805
Product 1 2022 2022 February 205
Product 1 2022 2022 January 1005
Product 1 2022 2022 July 705
Product 1 2022 2022 June 605
Product 1 2022 2022 March 305
Product 1 2022 2022 May 505
Product 1 2022 2022 September 905
Product 2 2021 2021 April 4000
Product 2 2021 2021 August 8000
Product 2 2021 2021 February 2000
Product 2 2021 2021 January 1000
Product 2 2021 2021 July 7000
Product 2 2021 2021 June 6000
Product 2 2021 2021 March 3000
Product 2 2021 2021 May 5000
Product 2 2021 2021 September 9000
Product 2 2021 2022 December 12000
Product 2 2021 2022 November 11000
Product 2 2021 2022 October 10000
Product 2 2022 2022 April 40000
Product 2 2022 2022 August 80000
Product 2 2022 2022 February 20000
Product 2 2022 2022 January 10000
Product 2 2022 2022 July 70000
Product 2 2022 2022 June 60000
Product 2 2022 2022 March 30000
Product 2 2022 2022 May 50000
Product 2 2022 2022 September 90000

What it should look like (months in chronological order):

Item Calendar Year Fiscal Year Month Amount
Product 1 2021 2021 January 15
Product 1 2021 2021 February 25
Product 1 2021 2021 March 35
Product 1 2021 2021 April 45
Product 1 2021 2021 May 55
Product 1 2021 2021 June 65
Product 1 2021 2021 July 75
Product 1 2021 2021 August 85
Product 1 2021 2021 September 95
Product 1 2021 2022 October 105
Product 1 2021 2022 November 115
Product 1 2021 2022 December 125
Product 1 2022 2022 January 1005
Product 1 2022 2022 February 205
Product 1 2022 2022 March 305
Product 1 2022 2022 April 405
Product 1 2022 2022 May 505
Product 1 2022 2022 June 605
Product 1 2022 2022 July 705
Product 1 2022 2022 August 805
Product 1 2022 2022 September 905
Product 2 2021 2021 January 1000
Product 2 2021 2021 February 2000
Product 2 2021 2021 March 3000
Product 2 2021 2021 April 4000
Product 2 2021 2021 May 5000
Product 2 2021 2021 June 6000
Product 2 2021 2021 July 7000
Product 2 2021 2021 August 8000
Product 2 2021 2021 September 9000
Product 2 2021 2022 October 10000
Product 2 2021 2022 November 11000
Product 2 2021 2022 December 12000
Product 2 2022 2022 January 10000
Product 2 2022 2022 February 20000
Product 2 2022 2022 March 30000
Product 2 2022 2022 April 40000
Product 2 2022 2022 May 50000
Product 2 2022 2022 June 60000
Product 2 2022 2022 July 70000
Product 2 2022 2022 August 80000
Product 2 2022 2022 September 90000

First convert the Month values to an ordered categorical, so that DataFrame.sort_values can sort by multiple columns in calendar order:

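import pandas as pd

# Calendar order of the months; the categorical sorts in this order
cat = ['January','February','March','April','May','June',
       'July','August','September','October','November','December']
df['Month'] = pd.Categorical(df['Month'], ordered=True, categories=cat)
# Sort within each product by calendar year, then by month
df = df.sort_values(['Item','Calendar Year','Month'])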
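
As a quick check of the categorical approach, here is a minimal sketch on a small made-up frame. The sample rows and the names `sample`, `months`, and `result` are assumptions for illustration, not the asker's real data. It takes a few out-of-order months for one product across two calendar years and sorts them chronologically.

```python
import pandas as pd

# Hypothetical sample: a few months for one product, deliberately out of order
sample = pd.DataFrame({
    'Item': ['Product 1'] * 6,
    'Calendar Year': [2021, 2021, 2021, 2022, 2022, 2021],
    'Fiscal Year': [2021, 2022, 2021, 2022, 2022, 2022],
    'Month': ['April', 'October', 'January', 'February', 'January', 'December'],
    'Amount': [45, 105, 15, 205, 1005, 125],
})

months = ['January', 'February', 'March', 'April', 'May', 'June',
          'July', 'August', 'September', 'October', 'November', 'December']

# Ordered categorical: sort order follows the list above, not the alphabet
sample['Month'] = pd.Categorical(sample['Month'], categories=months, ordered=True)

result = (sample.sort_values(['Item', 'Calendar Year', 'Month'])
                .reset_index(drop=True))
print(result)
```

Sorting by 'Calendar Year' before 'Month' is what keeps January 2022 after December 2021, so the fiscal year column does not need to take part in the sort.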

Or create a DatetimeIndex, so you can sort by Item and the datetime:

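# Cast to str first: concatenating a numeric 'Calendar Year' with the month name would raise an error
df.index = pd.to_datetime(df['Calendar Year'].astype(str) + df['Month'].astype(str), format='%Y%B')
df = df.rename_axis('dt').sort_values(['Item','dt']).reset_index(drop=True)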
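
A variant of the datetime idea, sketched under a few assumptions: it uses a temporary helper column (here called `dt`) instead of the index, the sample rows are made up, and the month names are assumed to be English so that `%B` can parse them under the default locale.

```python
import pandas as pd

# Hypothetical sample: two products, rows deliberately out of order
sample = pd.DataFrame({
    'Item': ['Product 2', 'Product 1', 'Product 1', 'Product 2', 'Product 1'],
    'Calendar Year': [2022, 2021, 2021, 2021, 2022],
    'Month': ['January', 'October', 'March', 'December', 'January'],
    'Amount': [10000, 105, 35, 12000, 1005],
})

# Cast the year to str so it can be concatenated with the month name,
# then parse e.g. '2021October' into a Timestamp for the first of that month
sample['dt'] = pd.to_datetime(
    sample['Calendar Year'].astype(str) + sample['Month'], format='%Y%B'
)

# Sort by product, then chronologically; drop the helper column afterwards
result = (sample.sort_values(['Item', 'dt'])
                .drop(columns='dt')
                .reset_index(drop=True))
print(result)
```

Because the helper column holds real timestamps, the year and month are compared together, so no separate sort key for 'Calendar Year' is needed.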