Count of unique values that occur more than 100 times in a data frame
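I have a data frame with a column named "drug_name", and I want to get a list of all unique values in that column along with the number of times each one occurs.
To do this, I use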
print(df['drug_name'].value_counts())
and
pd.value_counts(df.drug_name)
Both of these work fine, but the output is very long because many values occur only once. Is there a parameter that lets me show only the values that occur more than 100 times, so the output is shorter and contains only the relevant values?
As far as I know, value_counts has no minimum-count parameter, but you can filter the values after counting:
s = df['drug_name'].value_counts()
s[s.gt(100)]  # strictly more than 100; use s.ge(100) for "100 or more"
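A minimal self-contained sketch of this filter, assuming hypothetical sample data in which "a" occurs 150 times, "b" 101 times, "c" exactly 100 times and "d" once. It also shows the same filter written as one chain by passing a callable to .loc, which avoids the temporary variable s.

```python
import pandas as pd

# Hypothetical sample data: "a" x150, "b" x101, "c" x100, "d" x1
df = pd.DataFrame({"drug_name": ["a"] * 150 + ["b"] * 101 + ["c"] * 100 + ["d"]})

counts = df["drug_name"].value_counts()

# Strictly more than 100 occurrences: "c" (exactly 100) is excluded
more_than_100 = counts[counts.gt(100)]

# The same filter as a single chain, using a callable with .loc
chained = df["drug_name"].value_counts().loc[lambda s: s > 100]

print(more_than_100)
```

With gt the boundary value "c" is dropped; switching to ge(100) would keep it.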
Alternatively, since value_counts sorts the result by count in descending order, you can just look at the top entries:
df['drug_name'].value_counts().head(20) # 20 top items
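A small sketch of the head approach on hypothetical data (x occurs 5 times, y 3 times, z once). Note that head(n) returns a fixed number of the most frequent values rather than applying a count threshold, so it answers "top n" rather than "more than 100".

```python
import pandas as pd

# Hypothetical sample data: x x5, y x3, z x1
df = pd.DataFrame({"drug_name": ["x"] * 5 + ["y"] * 3 + ["z"]})

# value_counts sorts by count, descending, by default
counts = df["drug_name"].value_counts()

# The two most frequent values, whatever their counts are
top_2 = counts.head(2)
print(top_2)
```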
This will solve the problem:
import pandas as pd
# sample dict with repeated items
d = {'drug_name':['hello', 'hello', 'hello', 'hello', 'bye', 'bye']}
df = pd.DataFrame(d)
print(df)
print()
# this gets the unique values with their respective frequency
df_counted = df['drug_name'].value_counts()
print(df_counted)
print()
# keep only values that occur more than 2 times
df_filtered = df_counted[df_counted > 2]
print(df_filtered)
This is the sample data frame:
  drug_name
0     hello
1     hello
2     hello
3     hello
4       bye
5       bye
These are the unique values with their counts:
hello 4
bye 2
These are the unique values with a count greater than n (here n = 2):
hello 4
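If you need this filter in several places, one option is to wrap it in a small helper. The function name counts_above is hypothetical, not part of pandas; it is just a sketch of the same value_counts-then-filter pattern with the threshold as a parameter.

```python
import pandas as pd

def counts_above(series: pd.Series, n: int) -> pd.Series:
    """Hypothetical helper: value counts of `series`, keeping only values seen more than `n` times."""
    counts = series.value_counts()
    return counts[counts > n]

d = {"drug_name": ["hello", "hello", "hello", "hello", "bye", "bye"]}
df = pd.DataFrame(d)

print(counts_above(df["drug_name"], 2))
```

For the original question you would call counts_above(df["drug_name"], 100).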