Sorting a Data Frame by Month with Repeating Years, based on Unique 'Other' Column
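In pandas, I'm trying to sort the rows of a large data frame by month. Currently the months are out of order: they are sorted alphabetically, but I want them sorted chronologically.
The tricky part is that each product covers a 21-month period. There are two year columns, one for the calendar year and one for the fiscal year, and they are intentionally different. FY 2021 runs from January 2021 to September 2021, and FY 2022 runs from October 2021 to September 2022. There are hundreds of products; the sample below shows just two of them.
As the table below shows, the months are out of order, but everything else is in the correct order.
Again, each product has 21 months, from January 2021 through September 2022. I want these to run in order for each product.
I'm looking for code that sorts this data frame correctly.
What it looks like now (months not in chronological order within each year):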
Item | Calendar Year | Fiscal Year | Month | Amount |
---|---|---|---|---|
Product 1 | 2021 | 2021 | April | 45 |
Product 1 | 2021 | 2021 | August | 85 |
Product 1 | 2021 | 2021 | February | 25 |
Product 1 | 2021 | 2021 | January | 15 |
Product 1 | 2021 | 2021 | July | 75 |
Product 1 | 2021 | 2021 | June | 65 |
Product 1 | 2021 | 2021 | March | 35 |
Product 1 | 2021 | 2021 | May | 55 |
Product 1 | 2021 | 2021 | September | 95 |
Product 1 | 2021 | 2022 | December | 125 |
Product 1 | 2021 | 2022 | November | 115 |
Product 1 | 2021 | 2022 | October | 105 |
Product 1 | 2022 | 2022 | April | 405 |
Product 1 | 2022 | 2022 | August | 805 |
Product 1 | 2022 | 2022 | February | 205 |
Product 1 | 2022 | 2022 | January | 1005 |
Product 1 | 2022 | 2022 | July | 705 |
Product 1 | 2022 | 2022 | June | 605 |
Product 1 | 2022 | 2022 | March | 305 |
Product 1 | 2022 | 2022 | May | 505 |
Product 1 | 2022 | 2022 | September | 905 |
Product 2 | 2021 | 2021 | April | 4000 |
Product 2 | 2021 | 2021 | August | 8000 |
Product 2 | 2021 | 2021 | February | 2000 |
Product 2 | 2021 | 2021 | January | 1000 |
Product 2 | 2021 | 2021 | July | 7000 |
Product 2 | 2021 | 2021 | June | 6000 |
Product 2 | 2021 | 2021 | March | 3000 |
Product 2 | 2021 | 2021 | May | 5000 |
Product 2 | 2021 | 2021 | September | 9000 |
Product 2 | 2021 | 2022 | December | 12000 |
Product 2 | 2021 | 2022 | November | 11000 |
Product 2 | 2021 | 2022 | October | 10000 |
Product 2 | 2022 | 2022 | April | 40000 |
Product 2 | 2022 | 2022 | August | 80000 |
Product 2 | 2022 | 2022 | February | 20000 |
Product 2 | 2022 | 2022 | January | 10000 |
Product 2 | 2022 | 2022 | July | 70000 |
Product 2 | 2022 | 2022 | June | 60000 |
Product 2 | 2022 | 2022 | March | 30000 |
Product 2 | 2022 | 2022 | May | 50000 |
Product 2 | 2022 | 2022 | September | 90000 |
What it should look like (months in chronological order):
Item | Calendar Year | Fiscal Year | Month | Amount |
---|---|---|---|---|
Product 1 | 2021 | 2021 | January | 15 |
Product 1 | 2021 | 2021 | February | 25 |
Product 1 | 2021 | 2021 | March | 35 |
Product 1 | 2021 | 2021 | April | 45 |
Product 1 | 2021 | 2021 | May | 55 |
Product 1 | 2021 | 2021 | June | 65 |
Product 1 | 2021 | 2021 | July | 75 |
Product 1 | 2021 | 2021 | August | 85 |
Product 1 | 2021 | 2021 | September | 95 |
Product 1 | 2021 | 2022 | October | 105 |
Product 1 | 2021 | 2022 | November | 115 |
Product 1 | 2021 | 2022 | December | 125 |
Product 1 | 2022 | 2022 | January | 1005 |
Product 1 | 2022 | 2022 | February | 205 |
Product 1 | 2022 | 2022 | March | 305 |
Product 1 | 2022 | 2022 | April | 405 |
Product 1 | 2022 | 2022 | May | 505 |
Product 1 | 2022 | 2022 | June | 605 |
Product 1 | 2022 | 2022 | July | 705 |
Product 1 | 2022 | 2022 | August | 805 |
Product 1 | 2022 | 2022 | September | 905 |
Product 2 | 2021 | 2021 | January | 1000 |
Product 2 | 2021 | 2021 | February | 2000 |
Product 2 | 2021 | 2021 | March | 3000 |
Product 2 | 2021 | 2021 | April | 4000 |
Product 2 | 2021 | 2021 | May | 5000 |
Product 2 | 2021 | 2021 | June | 6000 |
Product 2 | 2021 | 2021 | July | 7000 |
Product 2 | 2021 | 2021 | August | 8000 |
Product 2 | 2021 | 2021 | September | 9000 |
Product 2 | 2021 | 2022 | October | 10000 |
Product 2 | 2021 | 2022 | November | 11000 |
Product 2 | 2021 | 2022 | December | 12000 |
Product 2 | 2022 | 2022 | January | 10000 |
Product 2 | 2022 | 2022 | February | 20000 |
Product 2 | 2022 | 2022 | March | 30000 |
Product 2 | 2022 | 2022 | April | 40000 |
Product 2 | 2022 | 2022 | May | 50000 |
Product 2 | 2022 | 2022 | June | 60000 |
Product 2 | 2022 | 2022 | July | 70000 |
Product 2 | 2022 | 2022 | August | 80000 |
Product 2 | 2022 | 2022 | September | 90000 |
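First convert the Month values to ordered categoricals, which makes it possible to sort by multiple columns with DataFrame.sort_values:

    import pandas as pd

    # Month names in calendar order; this list defines the sort order
    cat = ['January','February','March','April','May','June',
           'July','August','September','October','November','December']
    df['Month'] = pd.Categorical(df['Month'], ordered=True, categories=cat)
    # Sort within each product by calendar year, then by month
    df = df.sort_values(['Item','Calendar Year','Month'])
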
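To try the ordered-categorical approach without the full data set, here is a minimal sketch on a handful of rows copied from the question's table. The small sample frame and the names `months` and `result` are only illustrative assumptions; the real frame has hundreds of products.

```python
import pandas as pd

# Hypothetical sample: a few rows from the question, in the alphabetical month order
df = pd.DataFrame({
    "Item": ["Product 1"] * 6 + ["Product 2"] * 2,
    "Calendar Year": [2021, 2021, 2021, 2021, 2022, 2022, 2021, 2021],
    "Fiscal Year": [2021, 2021, 2022, 2022, 2022, 2022, 2021, 2021],
    "Month": ["April", "January", "December", "October",
              "February", "January", "February", "January"],
    "Amount": [45, 15, 125, 105, 205, 1005, 2000, 1000],
})

# Calendar order of the month names; the categorical sorts by this order
months = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]
df["Month"] = pd.Categorical(df["Month"], categories=months, ordered=True)

# Sort by product, then calendar year, then month position in `months`
result = df.sort_values(["Item", "Calendar Year", "Month"]).reset_index(drop=True)
print(result)
```

Because the sort uses Calendar Year rather than Fiscal Year, October to December 2021 land before January 2022 even though they belong to fiscal year 2022.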
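Or create a DatetimeIndex, so you can sort by Item and the datetimes:

    # Build strings such as '2021January'; astype(str) also handles numeric years and categorical months
    df.index = pd.to_datetime(df['Calendar Year'].astype(str) + df['Month'].astype(str), format='%Y%B')
    df = df.rename_axis('dt').sort_values(['Item','dt']).reset_index(drop=True)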
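A variation on the datetime idea, as a sketch only: instead of replacing the index, the parsed dates can live in a temporary column that is dropped after sorting. The column name `dt` and the sample rows are arbitrary choices for illustration, and the `%B` format assumes English month names in a locale where they parse.

```python
import pandas as pd

# Hypothetical sample: a few rows from the question, in the alphabetical month order
df = pd.DataFrame({
    "Item": ["Product 1"] * 6 + ["Product 2"] * 2,
    "Calendar Year": [2021, 2021, 2021, 2021, 2022, 2022, 2021, 2021],
    "Fiscal Year": [2021, 2021, 2022, 2022, 2022, 2022, 2021, 2021],
    "Month": ["April", "January", "December", "October",
              "February", "January", "February", "January"],
    "Amount": [45, 15, 125, 105, 205, 1005, 2000, 1000],
})

# Combine year and month name into strings like "2021April" and parse them;
# each row becomes the first day of its month
dt = pd.to_datetime(df["Calendar Year"].astype(str) + df["Month"], format="%Y%B")

# Sort by product and date, then discard the helper column
result = (
    df.assign(dt=dt)
      .sort_values(["Item", "dt"])
      .drop(columns="dt")
      .reset_index(drop=True)
)
print(result)
```

This leaves the Month column as plain strings, which may be convenient if later steps expect text rather than a categorical.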