How to save the value_counts in a DataFrame and pull out the related data from the original DataFrame
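I want to find the frequently repeated elements in one column, save the result as a DataFrame, and then pull out the related information for those elements from the original DataFrame.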
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.random.randint(1000, 1005, 10),
                   'B': pd.Categorical(['company0', 'company1', 'company1', 'company2', 'company5',
                                        'company5', 'company0', 'company5', 'company2', 'company2']),
                   'C': 'foo',
                   'D': pd.Categorical(["test", "train", "train", "cup", "bib",
                                        "bib", "test", 'bib', "cup", "cup"]),
                   })
# generate the 'company' DF: one row per company with its number of occurrences
company = df['B'].value_counts().reset_index()
company.columns = ['B', 'count']
print(company)
# merge 'df' & 'company' (every row of df is kept, so all 10 rows come back)
merged = pd.merge(df, company, on='B')
print(merged)
The code above gives me:
      A         B    C      D  count
0  1003  company0  foo   test      2
1  1002  company0  foo   test      2
2  1004  company1  foo  train      2
3  1004  company1  foo  train      2
4  1001  company2  foo    cup      3
5  1000  company2  foo    cup      3
6  1003  company2  foo    cup      3
7  1000  company5  foo    bib      3
8  1004  company5  foo    bib      3
9  1001  company5  foo    bib      3
But what I want is:
          B  count      D
0  company5      3    bib
1  company2      3    cup
2  company1      2  train
3  company0      2   test
How can I get the result I want? Thanks.
From the looks of it, each B has exactly one D. If so, you can do:
(df.groupby(['B','D'], observed=True).size()
.reset_index(name='count')
)
Output:
          B      D  count
0  company0   test      2
1  company1  train      2
2  company2    cup      3
3  company5    bib      3
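The groupby result above has the right rows, but its column order and row order differ from the desired output in the question. A minimal sketch that reorders it, assuming ties in `count` should be broken by `B` in descending order (that tie-break is an assumption, chosen only because it reproduces the row order shown in the question):

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'A': np.random.randint(1000, 1005, 10),
    'B': pd.Categorical(['company0', 'company1', 'company1', 'company2', 'company5',
                         'company5', 'company0', 'company5', 'company2', 'company2']),
    'C': 'foo',
    'D': pd.Categorical(["test", "train", "train", "cup", "bib",
                         "bib", "test", 'bib', "cup", "cup"]),
})

# Count rows per (B, D) pair; observed=True skips category combinations that never occur
result = (df.groupby(['B', 'D'], observed=True).size()
            .reset_index(name='count')
            # Most frequent first; ties broken by B descending (assumed tie-break)
            .sort_values(['count', 'B'], ascending=[False, False])
            # Column order from the desired output
            [['B', 'count', 'D']]
            .reset_index(drop=True))
print(result)
```

Because `B` is categorical, sorting on it follows the category order (company0 < company1 < company2 < company5), which here coincides with alphabetical order.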
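Alternatively, staying closer to the value_counts-then-merge idea in the question: the merge returned ten rows because every row of `df` matched its company. Deduplicating the (B, D) pairs before merging leaves one row per company, and the columns come out as B, count, D. A sketch, again assuming each B has exactly one D (otherwise a company appears once per distinct D); note that `value_counts` sorts by count descending, but the order among tied counts is not something to rely on:

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'A': np.random.randint(1000, 1005, 10),
    'B': pd.Categorical(['company0', 'company1', 'company1', 'company2', 'company5',
                         'company5', 'company0', 'company5', 'company2', 'company2']),
    'C': 'foo',
    'D': pd.Categorical(["test", "train", "train", "cup", "bib",
                         "bib", "test", 'bib', "cup", "cup"]),
})

# Counts per company, most frequent first; name the index 'B' so the
# column is called 'B' after reset_index, and call the counts 'count'
company = df['B'].value_counts().rename_axis('B').reset_index(name='count')

# One row per distinct (B, D) pair, so the merge cannot repeat companies
pairs = df[['B', 'D']].drop_duplicates()

# Inner merge with unique left keys keeps the value_counts row order
result = company.merge(pairs, on='B')
print(result)
```

`rename_axis('B')` keeps the column names the same across pandas versions that name the `value_counts` result differently.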