Count of unique values that occur more than 100 times in a data frame

Count of unique values that occur more than 100 in a data frame

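I have a data frame with a column named "drug_name", and I want to get a list of all the unique values in that column along with the number of times each one occurs. To do this, I use either

print(df['drug_name'].value_counts())

or

pd.value_counts(df.drug_name)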

Both of these work fine, but the output is very long because there are many values that occur only once. Is there a parameter that lets me show only the values that occur more than 100 times, so that the output is shorter and contains only the relevant values?

You can filter the counts after computing them:

s = df['drug_name'].value_counts()
s[s.gt(100)]  # values that occur more than 100 times
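As a self-contained sketch, assuming a small made-up data frame and a `threshold` of 2 standing in for the 100 in the question, the same filter can be chained directly onto `value_counts` by passing a callable to `.loc`. Swapping the strict comparison for `Series.ge` shows the difference between "more than" and "at least":

```python
import pandas as pd

# Made-up sample data: "a" occurs 3 times, "b" twice, "c" once
df = pd.DataFrame({"drug_name": ["a", "a", "a", "b", "b", "c"]})

threshold = 2  # stands in for 100 in the question

# Count each value, then keep only counts strictly greater than the threshold.
# The callable receives the counts Series, so no intermediate variable is needed.
more_than = df["drug_name"].value_counts().loc[lambda s: s > threshold]

# Series.ge also keeps counts equal to the threshold ("at least" rather than "more than")
at_least = df["drug_name"].value_counts().loc[lambda s: s.ge(threshold)]

print(more_than)
print(at_least)
```

Here `more_than` contains only `a`, while `at_least` also contains `b`, whose count equals the threshold.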

Alternatively, because value_counts sorts the counts in descending order, you can just look at the first few entries:

df['drug_name'].value_counts().head(20)  # top 20 values
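Because the counts come back sorted from highest to lowest, the values above any threshold always form a prefix of the result. A minimal sketch, again assuming made-up data and a small `threshold` in place of 100, shows that `head` reproduces the threshold filter once you know how many values pass it:

```python
import pandas as pd

# Made-up sample data: "a" occurs 3 times, "b" twice, "c" once
df = pd.DataFrame({"drug_name": ["a", "a", "a", "b", "b", "c"]})
threshold = 1  # stands in for 100 in the question

s = df["drug_name"].value_counts()  # sorted by count, highest first

n_frequent = int((s > threshold).sum())  # how many values pass the threshold
top = s.head(n_frequent)

# With a descending sort, the first n_frequent rows are exactly the ones above the threshold
assert top.equals(s[s > threshold])
print(top)
```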

This will solve the problem:


import pandas as pd

# sample dict with repeated items
d = {'drug_name':['hello', 'hello', 'hello', 'hello', 'bye', 'bye']}
df = pd.DataFrame(d)
print(df)
print()

# this gets the unique values with their respective frequency
df_counted = df['drug_name'].value_counts()
print(df_counted)
print()


# keep only the values that occur more than 2 times
df_filtered = df_counted[df_counted>2]
print(df_filtered)


This is the sample data frame:

  drug_name
0     hello
1     hello
2     hello
3     hello
4       bye
5       bye

These are the unique values with their counts:

hello    4
bye      2

These are the unique values that occur more than n times (here n = 2):

hello    4