How to perform an operation similar to group by on the first index of a multi-indexed DataFrame

Question

The code to generate the sample dataframe is as follows:

import pandas as pd

fruits=pd.DataFrame()
fruits['month']=['jan','feb','feb','march','jan','april','april','june','march','march','june','april']
fruits['fruit']=['apple','orange','pear','orange','apple','pear','cherry','pear','orange','cherry','apple','cherry']
fruits['price']=[30,20,40,25,30 ,45,60,45,25,55,37,60]

ind=(fruits.index)
fruits_grp = fruits.set_index(['month', ind],drop=False)

The output dataframe should look like this:

fruits_new1=pd.DataFrame()
fruits_new1['month']=['jan','jan','feb','feb','march','march','march','april','april','april','june','june']
fruits_new1['fruit']=['apple','apple','orange','pear','orange','orange','cherry','pear','cherry','cherry','pear','apple']
fruits_new1['price']=[30,30,20,40,25,25,55,45,60,60,45,37]
ind1=fruits_new1.index
fruits_grp1 = fruits_new1.set_index(['month', ind1],drop=False)
fruits_grp1

Thanks.

Answer 1

Use:

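# Map three-letter month abbreviations to their calendar position.
d={'Jan': 0, 'Feb': 1, 'Mar': 2, 'Apr': 3, 'May': 4, 'Jun': 5, 'Jul': 6, 'Aug': 7, 'Sep': 8, 'Oct': 9, 'Nov': 10, 'Dec': 11}

# Normalise e.g. 'march' -> 'Mar', map to a number, and take the index in sorted order.
idx=fruits_grp['month'].str.title().str[:3].map(d).sort_values().index
fruits_grp=fruits_grp.reindex(idx)
# Replace the old integer index level with a fresh 0..n-1 counter.
fruits_grp['s']=list(range(len(fruits_grp)))
fruits_grp=fruits_grp.set_index('s',append=True).droplevel(1).rename_axis(index=['month',None])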

Update:

Sample dataframe (with numeric months):

fruits=pd.DataFrame()
fruits['month']=[1,2,2,3,1,4,4,6,3,3,6,4]
fruits['fruit']=['apple','orange','pear','orange','apple','pear','cherry','pear','orange','cherry','apple','cherry']
fruits['price']=[30,20,40,25,30 ,45,60,45,25,55,37,60]

ind=(fruits.index)
fruits_grp = fruits.set_index(['month', ind],drop=False)

Then simply use:

idx=fruits_grp['month'].sort_values().index
fruits_grp=fruits_grp.reindex(idx)
fruits_grp['s']=list(range(len(fruits_grp)))
fruits_grp=fruits_grp.set_index('s',append=True).droplevel(1).rename_axis(index=['month',None])
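
For anyone who wants to run the month-name variant of this answer end to end, here is a self-contained sketch built on the question's data. The names month_pos, key, order, sorted_fruits and fruits_sorted_grp are only illustrative. One assumption goes beyond the answer: the sort uses kind='mergesort', because the default quicksort is not documented as stable and the desired output keeps rows of the same month in their original order.

```python
import pandas as pd

fruits = pd.DataFrame({
    'month': ['jan', 'feb', 'feb', 'march', 'jan', 'april',
              'april', 'june', 'march', 'march', 'june', 'april'],
    'fruit': ['apple', 'orange', 'pear', 'orange', 'apple', 'pear',
              'cherry', 'pear', 'orange', 'cherry', 'apple', 'cherry'],
    'price': [30, 20, 40, 25, 30, 45, 60, 45, 25, 55, 37, 60],
})

# Calendar position of each three-letter month abbreviation.
month_pos = {m: i for i, m in enumerate(
    ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
     'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])}

# Normalise 'march' -> 'Mar', look up the position, and sort with a stable
# algorithm (assumption) so rows within one month keep their original order.
key = fruits['month'].str.title().str[:3].map(month_pos)
order = key.sort_values(kind='mergesort').index

# Reorder the rows, renumber them 0..n-1, and rebuild the two-level index.
sorted_fruits = fruits.loc[order].reset_index(drop=True)
fruits_sorted_grp = sorted_fruits.set_index(
    ['month', sorted_fruits.index], drop=False)
print(fruits_sorted_grp)
```

Sorting the plain frame first and building the MultiIndex afterwards avoids the temporary 's' column, at the cost of not reusing the existing fruits_grp.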

Answer 2

import pandas as pd

fruits=pd.DataFrame()
fruits['month']=['jan','feb','feb','mar','jan','apr','apr','jun','mar','mar','jun','apr']
fruits['fruit']=['apple','orange','pear','orange','apple','pear','cherry','pear','orange','cherry','apple','cherry']
fruits['price']=[30,20,40,25,30,45,60,45,25,55,37,60]

fruits["month"] = fruits["month"].str.capitalize()
fruits["month"] = pd.to_datetime(fruits.month, format='%b', errors='coerce').dt.month
fruits = fruits.sort_values(by="month")

fruits["month"] = pd.to_datetime(fruits['month'], format='%m').dt.strftime('%b')
ind1 = fruits.index
fruits_grp1 = fruits.set_index(['month', ind1],drop=False)
fruits_grp1
print(fruits_grp1)

Answer 3

Thanks a lot for all your answers. I found that sort_values() can do this.

The reproducible code is as follows:

fruit_grp_srt=fruits_grp.sort_values(by='month')

However, this sorts the rows alphabetically rather than in the original order of the first index.

Still looking for a better solution, thanks.
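
If "original order" means the order in which each month first appears in the data, one possible approach is an ordered categorical whose categories come from unique(). This is only a sketch under that assumption; appearance_order, month_cat and by_appearance are made-up names, and a stable sort kind is assumed so that rows with the same month keep their row order.

```python
import pandas as pd

fruits = pd.DataFrame({
    'month': ['jan', 'feb', 'feb', 'march', 'jan', 'april',
              'april', 'june', 'march', 'march', 'june', 'april'],
    'fruit': ['apple', 'orange', 'pear', 'orange', 'apple', 'pear',
              'cherry', 'pear', 'orange', 'cherry', 'apple', 'cherry'],
    'price': [30, 20, 40, 25, 30, 45, 60, 45, 25, 55, 37, 60],
})

# unique() returns values in order of first appearance:
# jan, feb, march, april, june for this data.
appearance_order = fruits['month'].unique()
month_cat = pd.Categorical(
    fruits['month'], categories=appearance_order, ordered=True)

# Sorting a categorical column follows the category order, not the alphabet.
by_appearance = (
    fruits.assign(month=month_cat)
          .sort_values('month', kind='mergesort')
          .reset_index(drop=True)
)
print(by_appearance)
```

This only gives calendar order when the months happen to first appear in calendar order, as they do in the sample data.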

Answer 4

To me this looks like a simple sort by month. First, delete the month column (month is already in the index), then call reset_index:

del fruits_grp['month']
df = fruits_grp.reset_index()

Then, the important step is to convert month to a categorical dtype and define a custom category order.

df.month = df.month.astype('category')
df.month = df.month.cat.reorder_categories(['jan', 'feb', 'march', 'april', 'june'])

Now simply sort by month:

df.sort_values(by='month')

Output:

    month  level_1   fruit  price
0     jan        0   apple     30
4     jan        4   apple     30
1     feb        1  orange     20
2     feb        2    pear     40
3   march        3  orange     25
8   march        8  orange     25
9   march        9  cherry     55
5   april        5    pear     45
6   april        6  cherry     60
11  april       11  cherry     60
7    june        7    pear     45
10   june       10   apple     37
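
Putting the steps of this answer together and then rebuilding the two-level index might look like the sketch below. The calendar_order list, the ordered=True flag and the final set_index step are assumptions added for illustration (the answer stops at the sorted flat frame), and kind='mergesort' is used so that rows within a month keep their original order.

```python
import pandas as pd

fruits = pd.DataFrame({
    'month': ['jan', 'feb', 'feb', 'march', 'jan', 'april',
              'april', 'june', 'march', 'march', 'june', 'april'],
    'fruit': ['apple', 'orange', 'pear', 'orange', 'apple', 'pear',
              'cherry', 'pear', 'orange', 'cherry', 'apple', 'cherry'],
    'price': [30, 20, 40, 25, 30, 45, 60, 45, 25, 55, 37, 60],
})
fruits_grp = fruits.set_index(['month', fruits.index], drop=False)

# Custom order for the month labels that occur in the data (assumption).
calendar_order = ['jan', 'feb', 'march', 'april', 'june']

# Drop the duplicate month column, then move both index levels into columns;
# the unnamed second level becomes 'level_1'.
df = fruits_grp.drop(columns='month').reset_index()
df['month'] = (df['month'].astype('category')
                          .cat.reorder_categories(calendar_order, ordered=True))

# Stable sort by category order, discard the old row numbers, renumber.
df = (df.sort_values('month', kind='mergesort')
        .drop(columns='level_1')
        .reset_index(drop=True))

# Rebuild the (month, row number) MultiIndex, keeping month as a column too.
result = df.set_index(['month', df.index], drop=False)
print(result)
```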

multi-index

dataframe

pandas