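Best way to assign value on condition in Pandas
What is the right way to identify the maximum value within each group and assign a value to every row depending on whether it holds that maximum?
Here is an example df: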
group subgroup
A 1
B 1
A 2
A 3
A 4
B 2
C 2
C 1
The rule is:
if subgroup = max (within its group) then result = 1
else result = 2
The result would be:
group subgroup result
A 1 2
B 1 2
A 2 2
A 3 2
A 4 1
B 2 1
C 2 1
C 1 2
This is how I'm doing it at the moment:
df['subgroup_max'] = df.groupby(['group'])['subgroup'].nunique()
df3['result'] = 2
df3.loc[df3['result'] == df3['subgroup_max'],'result'] = 1
It doesn't seem very efficient. Is there a better way?
You can use DataFrameGroupBy.idxmax to get the index of the max value in each group:
df['result'] = 2
idx = df.groupby(['group'])['subgroup'].idxmax()
df.loc[idx, 'result'] = 1
print (df)
group subgroup result
0 A 1 2
1 B 1 2
2 A 2 2
3 A 3 2
4 A 4 1
5 B 2 1
6 C 2 1
7 C 1 2
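For anyone who wants to run the idxmax approach end to end, here is a minimal self-contained sketch. The frame is rebuilt from the question's example data; the variable names are just illustrative. One thing worth keeping in mind is that idxmax returns only the first index label per group when the max is tied.

```python
import pandas as pd

# Example data taken from the question.
df = pd.DataFrame({
    "group": ["A", "B", "A", "A", "A", "B", "C", "C"],
    "subgroup": [1, 1, 2, 3, 4, 2, 2, 1],
})

# Index label of the first row holding the max subgroup within each group.
idx = df.groupby("group")["subgroup"].idxmax()

df["result"] = 2           # default for rows that are not the group max
df.loc[idx, "result"] = 1  # overwrite the per-group max rows

print(df)
```

Because the assignment goes through df.loc with index labels, this assumes the index of df is unique.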
Another solution with numpy.where and Index.isin:
idx = df.groupby(['group'])['subgroup'].idxmax()
df['result'] = np.where(df.index.isin(idx), 1, 2)
print (df)
group subgroup result
0 A 1 2
1 B 1 2
2 A 2 2
3 A 3 2
4 A 4 1
5 B 2 1
6 C 2 1
7 C 1 2
Or invert the boolean mask, cast it to int and add 1:
idx = df.groupby(['group'])['subgroup'].idxmax()
df['result'] = (~df.index.isin(idx)).astype(int) + 1
print (df)
group subgroup result
0 A 1 2
1 B 1 2
2 A 2 2
3 A 3 2
4 A 4 1
5 B 2 1
6 C 2 1
7 C 1 2
But if a group can contain several max values and all of them need the value assigned, use apply:
print (df)
group subgroup
0 A 4
1 B 1
2 A 2
3 A 3
4 A 4
5 B 2
6 C 2
7 C 1
mask = df.groupby(['group'])['subgroup'].apply(lambda x: x == x.max())
df['result'] = np.where(mask, 1, 2)
print (df)
group subgroup result
0 A 4 1
1 B 1 2
2 A 2 2
3 A 3 2
4 A 4 1
5 B 2 1
6 C 2 1
7 C 1 2
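A possible alternative for the tied-max case, not part of the answer above, is GroupBy.transform('max'). It broadcasts each group's max back onto the original rows, so the mask keeps the original index and row order. That can matter because, depending on the pandas version, groupby(...).apply may return a result indexed by the group keys, in which case np.where would no longer line up row by row. A sketch on the same data:

```python
import numpy as np
import pandas as pd

# Same data as above: group A has its max (4) in two rows.
df = pd.DataFrame({
    "group": ["A", "B", "A", "A", "A", "B", "C", "C"],
    "subgroup": [4, 1, 2, 3, 4, 2, 2, 1],
})

# Per-group max, aligned to the original index (one value per row).
group_max = df.groupby("group")["subgroup"].transform("max")

# True for every row that equals its group's max, including ties.
mask = df["subgroup"].eq(group_max)

df["result"] = np.where(mask, 1, 2)
print(df)
```

This also avoids calling a Python lambda once per group, which tends to help on frames with many groups.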
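You can also use a lambda function, which lets you specify more conditions. Note that the version below compares each row against the max of the whole subgroup column, not the max within its group, so only the row holding the overall max gets 1.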
import pandas as pd
df = pd.DataFrame({'group': ['A', 'B', 'A', 'A', 'A', 'B', 'C', 'C'], 'subgroup': [1, 1, 2, 3, 4, 2, 2, 1]})
print(df)
group subgroup
0 A 1
1 B 1
2 A 2
3 A 3
4 A 4
5 B 2
6 C 2
7 C 1
df['results'] = df['subgroup'].apply(lambda x: 1 if df['subgroup'].max() == x else 2)
print(df)
group subgroup results
0 A 1 2
1 B 1 2
2 A 2 2
3 A 3 2
4 A 4 1
5 B 2 2
6 C 2 2
7 C 1 2
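If the lambda style is preferred but the comparison should be per group, as the question asks, one option is to run the lambda through GroupBy.transform so that it sees one group's values at a time. This is a sketch under that assumption rather than the answerer's own code; the column name result is chosen to match the question.

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["A", "B", "A", "A", "A", "B", "C", "C"],
    "subgroup": [1, 1, 2, 3, 4, 2, 2, 1],
})

# The lambda receives the subgroup values of a single group as a Series
# and returns 1 where a value equals that group's max, else 2.
df["result"] = df.groupby("group")["subgroup"].transform(
    lambda s: s.eq(s.max()).map({True: 1, False: 2})
)

print(df)
```

Further conditions can be added inside the lambda, at the cost of one Python call per group.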