Pandas DataFrame: how to add a column with the number of occurrences in another column
I have the following df:
Col1 Col2
test Something
test2 Something
test3 Something
test Something
test2 Something
test5 Something
I would like to get:
Col1 Col2 Occur
test Something 2
test2 Something 2
test3 Something 1
test Something 2
test2 Something 2
test5 Something 1
I tried using:
df["Occur"] = df["Col1"].value_counts()
But that did not help: my Occur column is full of NaN.
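A likely explanation for the NaN values: `value_counts()` returns a Series indexed by the distinct values of Col1, while the DataFrame is indexed 0..5, so the assignment aligns on index labels and finds no matches. The sketch below rebuilds the example frame (the construction is an assumption, since the original only shows the printed table) and uses `Series.map` to look each row's value up in the counts instead.

```python
import pandas as pd

# Rebuild the example frame from the question (construction assumed).
df = pd.DataFrame({
    "Col1": ["test", "test2", "test3", "test", "test2", "test5"],
    "Col2": ["Something"] * 6,
})

# value_counts() is indexed by the values of Col1, not by row position.
counts = df["Col1"].value_counts()

# Aligning the counts to the row index 0..5 finds no matching labels,
# which is what produces the all-NaN column.
misaligned = counts.reindex(df.index)

# Looking up each row's Col1 value in the counts gives the wanted column.
df["Occur"] = df["Col1"].map(counts)
print(df)
```

Here `map` treats `counts` as a lookup table keyed by the Col1 value, so the result keeps the row index of `df`.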
groupby on 'Col1' and then apply transform on Col2 to return a Series whose index is aligned with the original df, so you can add it as a column:
In [3]:
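# 'count' counts the non-null Col2 values per group; pd.Series.value_counts only fits here when Col2 is constant within each group
df['Occur'] = df.groupby('Col1')['Col2'].transform('count')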
df
Out[3]:
Col1 Col2 Occur
0 test Something 2
1 test2 Something 2
2 test3 Something 1
3 test Something 2
4 test2 Something 2
5 test5 Something 1
You can also use GroupBy + transform with size:
df['Occur'] = df.groupby('Col1')['Col1'].transform('size')
print(df)
Col1 Col2 Occur
0 test Something 2
1 test2 Something 2
2 test3 Something 1
3 test Something 2
4 test2 Something 2
5 test5 Something 1
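One difference worth knowing when choosing the aggregation: 'size' counts every row in the group, while 'count' skips missing values in the selected column. The sketch below uses a small made-up frame with one NaN in Col2 (the data is an assumption for illustration, not taken from the question).

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second "test" row has a missing Col2 value.
df = pd.DataFrame({
    "Col1": ["test", "test", "test2"],
    "Col2": ["Something", np.nan, "Something"],
})

# 'size' counts all rows per group, including rows where Col2 is NaN.
by_size = df.groupby("Col1")["Col2"].transform("size")

# 'count' counts only the non-null Col2 values per group.
by_count = df.groupby("Col1")["Col2"].transform("count")

print(pd.DataFrame({"size": by_size, "count": by_count}))
```

Grouping and selecting the same column, as in `df.groupby('Col1')['Col1']`, sidesteps the question because by default the group keys themselves are never NaN within a group.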
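I could not get the other answers to work when I wanted to keep more columns than just Col1 and Col2. The following worked well for me and keeps any number of other columns.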
df['Occur'] = df['Col1'].apply(lambda x: (df['Col1'] == x).sum())
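Note that this compares every row against the whole column, so the work grows roughly with the square of the number of rows. As a rough check, the sketch below adds a made-up third column (`Extra` is hypothetical) and compares the result with the groupby + transform approach, which also only assigns a new column and leaves the others in place.

```python
import pandas as pd

# Example frame from the question plus a hypothetical extra column.
df = pd.DataFrame({
    "Col1": ["test", "test2", "test3", "test", "test2", "test5"],
    "Col2": ["Something"] * 6,
    "Extra": list(range(6)),  # assumed column, for illustration only
})

# Row-by-row comparison: one full scan of Col1 per row.
via_apply = df["Col1"].apply(lambda x: (df["Col1"] == x).sum())

# Single grouped pass: group sizes broadcast back to the original rows.
via_transform = df.groupby("Col1")["Col1"].transform("size")

# Assigning the result adds one column and keeps every existing column.
df["Occur"] = via_transform
print(df)
```

On a frame this small either form is fine; the grouped version is mainly worth considering for larger data.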