Count of unique values that occur more than 100 times in a data frame

Question

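I have a data frame with a column named "drug_name". I want a list of all unique values in that column along with how many times each one occurs. For this I use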

print(df['drug_name'].value_counts())

and

pd.value_counts(df.drug_name)

Both of these work fine, but the output is very long because many values occur only once. Is there a parameter that lets me restrict the output to values that occur more than 100 times, so that I see only the relevant ones?

Answer 1

You can select the values after counting:

s = df['drug_name'].value_counts()
s[s.gt(100)]  # strictly more than 100; s.ge(100) would also keep values occurring exactly 100 times
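The same filter can also be written as one chain, because `.loc` accepts a callable that receives the Series produced by `value_counts()`. This is a minimal sketch on made-up sample data (the values "a", "b", "c" and their counts are hypothetical). Note that `> 100` drops a value that occurs exactly 100 times, whereas `ge(100)` would keep it.

```python
import pandas as pd

# Hypothetical sample data: "a" occurs 150 times, "b" exactly 100 times, "c" once
df = pd.DataFrame({"drug_name": ["a"] * 150 + ["b"] * 100 + ["c"]})

# Count each value, then keep only counts strictly greater than 100.
# The lambda receives the counts Series and returns a boolean mask.
frequent = df["drug_name"].value_counts().loc[lambda s: s > 100]
print(frequent)
```

Only "a" survives here; "b" is excluded because its count is not strictly greater than 100.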

Alternatively, since value_counts sorts by count in descending order, you can simply look at the top entries:

df['drug_name'].value_counts().head(20) # 20 top items

Answer 2

This solves the problem:


import pandas as pd

# sample dict with repeated items
d = {'drug_name':['hello', 'hello', 'hello', 'hello', 'bye', 'bye']}
df = pd.DataFrame(d)
print(df)
print()

# this gets the unique values with their respective frequency
df_counted = df['drug_name'].value_counts()
print(df_counted)
print()


# keep only values that occur more than 2 times
df_filtered = df_counted[df_counted>2]
print(df_filtered)

This is the sample data frame:

  drug_name
0     hello
1     hello
2     hello
3     hello
4       bye
5       bye

These are the unique values with their counts:

hello    4
bye      2

These are the unique values that occur more than n times (n = 2 here):

hello    4
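If you need the matching rows of the original data frame rather than just the counts, one option is to map each row's value to its count and filter on that. This is a minimal sketch reusing the sample data above; the names `row_counts`, `n` and `frequent_rows` are illustrative only.

```python
import pandas as pd

# Same sample data as above: "hello" occurs 4 times, "bye" 2 times
df = pd.DataFrame({"drug_name": ["hello"] * 4 + ["bye"] * 2})

# For every row, look up how often its drug_name occurs in the whole column
# (Series.map with a Series matches on that Series' index, i.e. the values)
row_counts = df["drug_name"].map(df["drug_name"].value_counts())

# Keep only rows whose value occurs more than n times
n = 2
frequent_rows = df[row_counts > n]
print(frequent_rows)
```

`df.groupby("drug_name")["drug_name"].transform("size")` would produce the same per-row counts.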

python

unique

count

dataframe