How to perform an operation similar to group by on the first index of a multi-indexed DataFrame
The code to generate the sample DataFrame is as follows:
import pandas as pd
fruits=pd.DataFrame()
fruits['month']=['jan','feb','feb','march','jan','april','april','june','march','march','june','april']
fruits['fruit']=['apple','orange','pear','orange','apple','pear','cherry','pear','orange','cherry','apple','cherry']
fruits['price']=[30,20,40,25,30 ,45,60,45,25,55,37,60]
ind=(fruits.index)
fruits_grp = fruits.set_index(['month', ind],drop=False)
The output DataFrame should look like this:
fruits_new1=pd.DataFrame()
fruits_new1['month']=['jan','jan','feb','feb','march','march','march','apr','apr','apr','jun','jun']
fruits_new1['fruit']=['apple','apple','orange','pear','orange','orange','cherry','pear','cherry','cherry','pear','apple']
fruits_new1['price']=[30,30,20,40,25,25,55,45,60,60,45,37]
ind1=fruits_new1.index
fruits_grp1 = fruits_new1.set_index(['month', ind1],drop=False)
fruits_grp1
Thanks
Use:
d={'Jan': 0, 'Feb': 1, 'Mar': 2, 'Apr': 3, 'May': 4, 'Jun': 5, 'Jul': 6, 'Aug': 7, 'Sep': 8, 'Oct': 9, 'Nov': 10, 'Dec': 11}
idx=fruits_grp['month'].str.title().str[:3].map(d).sort_values().index
fruits_grp=fruits_grp.reindex(idx)
fruits_grp['s']=list(range(len(fruits_grp)))
fruits_grp=fruits_grp.set_index('s',append=True).droplevel(1).rename_axis(index=['month',None])
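As a variation on the mapping above, here is a minimal sketch that builds the month lookup from the standard library instead of typing it by hand, and passes it to `sort_values` through the `key` argument. It assumes pandas 1.1 or later (for `key`) and an English locale (because `calendar.month_abbr` is locale-dependent). The names `month_number`, `month_key` and `by_calendar` are made up for illustration. It works on the plain `fruits` frame, before `month` is also an index level.

```python
import calendar
import pandas as pd

fruits = pd.DataFrame({
    'month': ['jan', 'feb', 'feb', 'march', 'jan', 'april', 'april',
              'june', 'march', 'march', 'june', 'april'],
    'fruit': ['apple', 'orange', 'pear', 'orange', 'apple', 'pear', 'cherry',
              'pear', 'orange', 'cherry', 'apple', 'cherry'],
    'price': [30, 20, 40, 25, 30, 45, 60, 45, 25, 55, 37, 60],
})

# Build {'Jan': 1, ..., 'Dec': 12}; calendar.month_abbr[0] is an empty string.
# Assumption: English locale, otherwise the abbreviations differ.
month_number = {abbr: i for i, abbr in enumerate(calendar.month_abbr) if abbr}

def month_key(col):
    # 'march' -> 'March' -> 'Mar' -> 3
    return col.str.title().str[:3].map(month_number)

# mergesort is stable, so rows keep their original order within each month.
by_calendar = fruits.sort_values('month', key=month_key, kind='mergesort')

# Rebuild the two-level index: month plus a fresh running counter.
by_calendar = by_calendar.set_index(
    ['month', pd.RangeIndex(len(by_calendar))], drop=False
)
print(by_calendar)
```

The stable sort matters here: without it, the relative order of rows inside one month is not guaranteed.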
Update:
Sample DataFrame:
fruits=pd.DataFrame()
fruits['month']=[1,2,2,3,1,4,4,6,3,3,6,4]
fruits['fruit']=['apple','orange','pear','orange','apple','pear','cherry','pear','orange','cherry','apple','cherry']
fruits['price']=[30,20,40,25,30 ,45,60,45,25,55,37,60]
ind=(fruits.index)
fruits_grp = fruits.set_index(['month', ind],drop=False)
Then simply use:
idx=fruits_grp['month'].sort_values().index
fruits_grp=fruits_grp.reindex(idx)
fruits_grp['s']=list(range(len(fruits_grp)))
fruits_grp=fruits_grp.set_index('s',append=True).droplevel(1).rename_axis(index=['month',None])
fruits = pd.DataFrame()
fruits['month'] = ['jan','feb','feb','mar','jan','apr','apr','jun','mar','mar','jun','apr']
fruits['fruit'] = ['apple','orange','pear','orange','apple','pear','cherry','pear','orange',
                   'cherry','apple','cherry']
fruits['price'] = [30,20,40,25,30,45,60,45,25,55,37,60]
# Convert the abbreviated month names to month numbers so they sort in calendar order
fruits["month"] = fruits["month"].str.capitalize()
fruits["month"] = pd.to_datetime(fruits.month, format='%b', errors='coerce').dt.month
fruits = fruits.sort_values(by="month")
# Convert the month numbers back to abbreviated names
fruits["month"] = pd.to_datetime(fruits['month'], format='%m').dt.strftime('%b')
ind1 = fruits.index
fruits_grp1 = fruits.set_index(['month', ind1], drop=False)
print(fruits_grp1)
Thank you very much for all the answers. I found that sort_values() can do this.
The reproducible code is as follows:
fruit_grp_srt=fruits_grp.sort_values(by='month')
However, this sorts the rows alphabetically rather than in the original order of the first index.
Still looking for a better solution, thanks.
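One way to keep the months in the order they first appear, rather than alphabetical order, is to number them with `pd.factorize` and then apply a stable sort on those numbers. The following is only a sketch under that assumption. The variable names (`codes`, `order`, `grouped`) are illustrative, and it starts from the plain `fruits` frame so that `month` is not both a column and an index level during the sort.

```python
import numpy as np
import pandas as pd

fruits = pd.DataFrame({
    'month': ['jan', 'feb', 'feb', 'march', 'jan', 'april', 'april',
              'june', 'march', 'march', 'june', 'april'],
    'fruit': ['apple', 'orange', 'pear', 'orange', 'apple', 'pear', 'cherry',
              'pear', 'orange', 'cherry', 'apple', 'cherry'],
    'price': [30, 20, 40, 25, 30, 45, 60, 45, 25, 55, 37, 60],
})

# factorize numbers each month by order of first appearance:
# jan=0, feb=1, march=2, april=3, june=4
codes, uniques = pd.factorize(fruits['month'])

# A stable argsort keeps the original row order inside each month.
order = np.argsort(codes, kind='stable')
grouped = fruits.iloc[order]

# Rebuild the two-level index: month plus a fresh running counter.
grouped = grouped.set_index(['month', pd.RangeIndex(len(grouped))], drop=False)
print(grouped)
```

Because the order comes from the data itself, this needs no month lookup table, but it only gives calendar order if the months happen to first appear in calendar order, as they do in this sample.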
To me, this looks like a simple sort by month. First, you need to delete the month column (because there is already a month level in the index), then call reset_index:
del fruits_grp['month']
df = fruits_grp.reset_index()
Then, the important step is to convert month to a categorical dtype and define a custom category order.
df.month = df.month.astype('category')
df.month = df.month.cat.reorder_categories(['jan', 'feb', 'march', 'april', 'june'])
Now simply sort by month:
df.sort_values(by='month')
Output:
    month  level_1   fruit  price
0     jan        0   apple     30
4     jan        4   apple     30
1     feb        1  orange     20
2     feb        2    pear     40
3   march        3  orange     25
8   march        8  orange     25
9   march        9  cherry     55
5   april        5    pear     45
6   april        6  cherry     60
11  april       11  cherry     60
7    june        7    pear     45
10   june       10   apple     37
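The categorical approach above can also be written with an explicitly ordered dtype and then turned back into the two-level index the question asks for. This is a sketch rather than the answerer's exact code: `month_order`, `month_dtype` and `result` are illustrative names, and the category list is assumed to contain only the months present in the sample data.

```python
import pandas as pd

fruits = pd.DataFrame({
    'month': ['jan', 'feb', 'feb', 'march', 'jan', 'april', 'april',
              'june', 'march', 'march', 'june', 'april'],
    'fruit': ['apple', 'orange', 'pear', 'orange', 'apple', 'pear', 'cherry',
              'pear', 'orange', 'cherry', 'apple', 'cherry'],
    'price': [30, 20, 40, 25, 30, 45, 60, 45, 25, 55, 37, 60],
})

# Assumed custom order; a value missing from this list would become NaN.
month_order = ['jan', 'feb', 'march', 'april', 'june']
month_dtype = pd.CategoricalDtype(categories=month_order, ordered=True)

df = fruits.copy()
df['month'] = df['month'].astype(month_dtype)

# Sorting a categorical column follows the category order, not the alphabet.
# mergesort is stable, so rows keep their original order within each month.
df = df.sort_values('month', kind='mergesort')

# Restore the two-level index: month first, then a fresh running counter.
result = df.set_index(['month', pd.RangeIndex(len(df))], drop=False)
print(result)
```

Declaring the dtype once keeps the custom order in a single place, which helps if the same ordering is needed for several frames.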