How to count the occurrences of a value

Question

How can I count the occurrences of each value in a DataFrame column, for use in a histogram?

import pandas as pd

d = {'color': ["blue", "green", "yellow", "red, blue", "green, yellow", "yellow, red, blue"]}
df = pd.DataFrame(data=d)

How do I get from

color
blue
green
yellow
red, blue
green, yellow
yellow, red, blue

to

color	occurrence
blue	3
green	2
yellow	3

Answer 1

Let's split on the regex ,\s* (a comma followed by zero or more whitespace characters), then explode the resulting lists into rows and use value_counts to count the values:

s = (
    df['color'].str.split(r',\s*')
        .explode()
        .value_counts()
        .rename_axis('color')
        .reset_index(name='occurance')
)

Alternatively, split with expand=True and then stack:

s = (
    df['color'].str.split(r',\s*', expand=True)
        .stack()
        .value_counts()
        .rename_axis('color')
        .reset_index(name='occurance')
)

s:

    color  occurance
0    blue          3
1  yellow          3
2   green          2
3     red          2
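
As a quick sanity check, the two variants above can be run side by side. This is only a minimal sketch on the sample data from the question; the names via_explode and via_stack are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "color": ["blue", "green", "yellow", "red, blue",
              "green, yellow", "yellow, red, blue"]
})

# Variant 1: split into lists, then explode to one value per row.
via_explode = df["color"].str.split(r",\s*").explode().value_counts()

# Variant 2: split into columns, then stack the columns into one Series.
# Missing cells from shorter rows are not counted by value_counts.
via_stack = df["color"].str.split(r",\s*", expand=True).stack().value_counts()

# Compare as plain dicts so index names and ordering do not matter.
counts_explode = via_explode.to_dict()
counts_stack = via_stack.to_dict()
print(counts_explode == counts_stack)
```

Comparing the dicts rather than the Series avoids depending on how ties (blue and yellow here) happen to be ordered.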

Answer 2

Here is another way, using .str.get_dummies():

df['color'].str.get_dummies(sep=', ').sum()
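
If the same two-column layout as in Answer 1 is wanted, the summed indicators can be reshaped. The following is a sketch under the assumption that every separator is exactly a comma followed by one space, since sep is matched literally and not as a regex. Note also that get_dummies records only whether a row contains a value, so a value repeated within one row would be counted once.

```python
import pandas as pd

df = pd.DataFrame({
    "color": ["blue", "green", "yellow", "red, blue",
              "green, yellow", "yellow, red, blue"]
})

# One indicator column per distinct value: 1 if the row contains it, else 0.
dummies = df["color"].str.get_dummies(sep=", ")

# Column sums give the number of rows containing each value.
result = (
    dummies.sum()
    .rename_axis("color")
    .reset_index(name="occurance")  # same column name as in Answer 1
    .sort_values("occurance", ascending=False, ignore_index=True)
)
print(result)
```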

histogram

dataframe

pandas