pandas: select top N values for each multi-index group
I have a DataFrame:
import pandas as pd
data = {'fruit': ['pear','pear','pear','banana', 'banana', 'banana', 'cherry', 'pear','cherry','pear','banana', 'banana', 'banana','banana', 'cherry', 'cherry','banana', 'cherry', 'cherry', 'cherry', 'cherry'],
'country': ['france','france', 'france', 'albania', 'albania', 'albania','france', 'france','france','france', 'albania', 'albania','france','france', 'france', 'france','france', 'france', 'france', 'france', 'armenia'],
'id': ['01','01','01','01','01','01','02','02','03','03','011', '011', '011','011', '6', '6','6', '5', '5', '5','5'],
'month1': ['january','november','january','january','january','january','january', 'november','march','march', 'november', 'march', 'january','january', 'march', 'january','november', 'march', 'march', 'november','july'],
'month': ['january','november','january','january','january','january','january', 'november','march','march', 'november', 'march', 'january','january', 'march', 'january','november', 'march', 'march', 'november','july']
}
df = pd.DataFrame(data, columns = ['fruit','country', 'id','month1', 'month'])
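I built a pivot table with df.pivot_table(values='month', index=['fruit','country'], columns='month1', aggfunc='count').reset_index(),
which gives me the per-month counts for each multi-index group (fruit and country).
I need the top 3 values for each group, but more generally it could be the top N.
Can anyone see how to get that as the output DataFrame?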
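To make the pivot step concrete, here is a minimal sketch of what that call returns for the data above. It assumes the same 21 rows, written as tuples for brevity, with `id` left out and `month1` copied from `month` (the two columns are identical in the question). The names `rows`, `pivot`, and `long_form` are illustrative only, and the `melt` step is an added suggestion rather than part of the question: it turns the wide table back into one row per (fruit, country, month), which tends to be an easier shape for picking the top N per group.

```python
import pandas as pd

# Same (fruit, country, month) rows as in the question, in the same order.
rows = [
    ("pear", "france", "january"), ("pear", "france", "november"),
    ("pear", "france", "january"), ("banana", "albania", "january"),
    ("banana", "albania", "january"), ("banana", "albania", "january"),
    ("cherry", "france", "january"), ("pear", "france", "november"),
    ("cherry", "france", "march"), ("pear", "france", "march"),
    ("banana", "albania", "november"), ("banana", "albania", "march"),
    ("banana", "france", "january"), ("banana", "france", "january"),
    ("cherry", "france", "march"), ("cherry", "france", "january"),
    ("banana", "france", "november"), ("cherry", "france", "march"),
    ("cherry", "france", "march"), ("cherry", "france", "november"),
    ("cherry", "armenia", "july"),
]
df = pd.DataFrame(rows, columns=["fruit", "country", "month"])
df["month1"] = df["month"]  # assumption: month1 mirrors month, as in the question

# Wide result: one row per (fruit, country), one column per month,
# NaN where a group has no rows for that month.
pivot = df.pivot_table(
    values="month", index=["fruit", "country"], columns="month1", aggfunc="count"
).reset_index()
print(pivot)

# Back to long form: one row per (fruit, country, month) that actually occurs.
long_form = pivot.melt(
    id_vars=["fruit", "country"], var_name="month", value_name="count"
).dropna(subset=["count"])
print(long_form)
```

Because each month is a separate column in `pivot`, "top N per group" becomes a per-row ranking across columns, which is likely why it is awkward to express directly on the pivot.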
Please check whether this format works for you:
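N = 3  # number of largest counts to keep per group
# Count the rows for each (fruit, country, month) combination
counts = df.groupby(["fruit", "country", "month"])["month1"].count().rename("count")
# Keep the N largest counts within each (fruit, country) group
top = counts.groupby(["fruit", "country"]).nlargest(N)
# nlargest prepends the group keys to the index, so drop the two duplicated levels
top.index = top.index.droplevel([0, 1])
top = top.reset_index()
top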
>>
     fruit  country     month  count
0   banana  albania   january      3
1   banana  albania     march      1
2   banana  albania  november      1
3   banana   france   january      2
4   banana   france  november      1
5   cherry  armenia      july      1
6   cherry   france     march      4
7   cherry   france   january      2
8   cherry   france  november      1
9     pear   france   january      2
10    pear   france  november      2
11    pear   france     march      1
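An alternative to the `nlargest` plus `droplevel` combination above is to sort the counts and take the first N rows of each group. The sketch below is one possible version, not the answer's own method. `top_n_per_group` is a hypothetical helper name, the rows are the question's data written as tuples (with `id` and `month1` left out because the sketch does not use them), and breaking ties alphabetically by month is a choice made here for determinism.

```python
import pandas as pd

# Same (fruit, country, month) rows as in the question, in the same order.
rows = [
    ("pear", "france", "january"), ("pear", "france", "november"),
    ("pear", "france", "january"), ("banana", "albania", "january"),
    ("banana", "albania", "january"), ("banana", "albania", "january"),
    ("cherry", "france", "january"), ("pear", "france", "november"),
    ("cherry", "france", "march"), ("pear", "france", "march"),
    ("banana", "albania", "november"), ("banana", "albania", "march"),
    ("banana", "france", "january"), ("banana", "france", "january"),
    ("cherry", "france", "march"), ("cherry", "france", "january"),
    ("banana", "france", "november"), ("cherry", "france", "march"),
    ("cherry", "france", "march"), ("cherry", "france", "november"),
    ("cherry", "armenia", "july"),
]
df = pd.DataFrame(rows, columns=["fruit", "country", "month"])


def top_n_per_group(frame, group_cols, value_col, n):
    """Hypothetical helper: the n most frequent value_col entries per group."""
    # One row per (group, value) combination with its number of occurrences
    counts = (
        frame.groupby(group_cols + [value_col])
        .size()
        .reset_index(name="count")
    )
    # Within each group: largest count first, ties broken by value_col
    ordered = counts.sort_values(
        group_cols + ["count", value_col],
        ascending=[True] * len(group_cols) + [False, True],
    )
    # head(n) keeps the first n rows of every group in the sorted order
    return ordered.groupby(group_cols).head(n).reset_index(drop=True)


top = top_n_per_group(df, ["fruit", "country"], "month", n=3)
print(top)
```

With n=3 this should give the same twelve rows as the table above. Since the result is built from ordinary columns rather than a multi-level index, no `droplevel` step is needed, and changing `n` is the only edit required for a different N.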