Count how many times a column contains a certain value in Pandas

Question

Suppose my data frame looks like this:

   column_name
1  book
2  fish
3  icecream|book
4  fish
5  campfire|book

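Now, if I use df['column_name'].value_counts(), it tells me that fish is the most common value.

However, I want book to be returned, because rows 1, 3, and 5 contain the word 'book'.

I understand that .value_counts() treats icecream|book as a single value. Is there a way to determine the most common value by counting how many cells in the column contain each individual value, so that 'book' comes out as the most frequent?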

Answer 1

Use split with stack on the Series:

a = df['column_name'].str.split('|', expand=True).stack().value_counts()
print(a)
book        3
fish        2
icecream    1
campfire    1
dtype: int64

Or use Counter with a list comprehension that flattens the split values:

from collections import Counter

a = pd.Series(Counter([y for x in df['column_name'] for y in x.split('|')]))
print(a)
book        3
fish        2
icecream    1
campfire    1
dtype: int64
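
As a variant of the split-based approaches above, the sketch below uses Series.explode (available in pandas 0.25 and later) instead of expand=True plus stack, and then reads off the single most common value with idxmax. The DataFrame is rebuilt from the question's sample data; the names counts and most_common are only illustrative.

```python
import pandas as pd

# Sample data from the question (index 1..5).
df = pd.DataFrame(
    {"column_name": ["book", "fish", "icecream|book", "fish", "campfire|book"]},
    index=range(1, 6),
)

# Split each cell into a list, give every list element its own row, then count.
counts = df["column_name"].str.split("|").explode().value_counts()

# idxmax returns the label with the highest count.
most_common = counts.idxmax()
print(most_common)  # book
```

This answers the original question directly: the most frequent individual value is available as a plain string rather than as the first row of a printed Series.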

Answer 2

`pd.value_counts`

You can also pass a list to the value_counts function. Note that I join the column with | and then split on | again.

pd.value_counts('|'.join(df.column_name).split('|'))

book        3
fish        2
icecream    1
campfire    1
dtype: int64
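
In recent pandas releases the top-level pd.value_counts function has, to my knowledge, been deprecated in favor of the Series method. A minimal sketch of the same join-then-split idea using pd.Series(...).value_counts(), with the sample data from the question (tokens is an illustrative name):

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {"column_name": ["book", "fish", "icecream|book", "fish", "campfire|book"]}
)

# Join all cells into one '|'-separated string, then split back into tokens.
tokens = "|".join(df.column_name).split("|")

# Wrap the list in a Series and use the method form of value_counts.
counts = pd.Series(tokens).value_counts()
print(counts)
```

Note that '|'.join(...) expects every cell to be a string, so a column with missing values would need something like dropna() first.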

`get_dummies`

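This works because your data uses | as the separator, which is the default for get_dummies. If you have a different separator, pass it to the get_dummies call, for example df.column_name.str.get_dummies(sep='|').sum()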

df.column_name.str.get_dummies().sum()

book        3
campfire    1
fish        2
icecream    1
dtype: int64

If you want the result sorted:

df.column_name.str.get_dummies().sum().sort_values(ascending=False)

book        3
fish        2
icecream    1
campfire    1
dtype: int64
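
To illustrate the separator argument and one behavior worth keeping in mind, here is a small sketch on made-up comma-separated data (the Series s is hypothetical, not from the question). Because get_dummies produces 0/1 indicator columns, a value repeated inside one cell is counted once, which matches the question's goal of counting the cells that contain a value.

```python
import pandas as pd

# Hypothetical data using ',' as the separator; the last cell repeats 'book'.
s = pd.Series(["book", "fish", "icecream,book", "book,book"])

# One indicator column per distinct token.
dummies = s.str.get_dummies(sep=",")

# Summing the indicators counts the cells that contain each token.
cell_counts = dummies.sum()

# Splitting and counting tokens instead counts every occurrence.
token_counts = s.str.split(",").explode().value_counts()

print(cell_counts["book"])   # 3
print(token_counts["book"])  # 4
```

Which of the two numbers is wanted depends on whether duplicates within a cell should count; for the question's data the two approaches agree.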

`pd.factorize` and `np.bincount`

Note that I join the whole column and split it again.

f, u = pd.factorize('|'.join(df.column_name).split('|'))
pd.Series(np.bincount(f), u)

book        3
fish        2
icecream    1
campfire    1
dtype: int64

To sort, we can use sort_values as above. Alternatively:

f, u = pd.factorize('|'.join(df.column_name).split('|'))
counts = np.bincount(f)
a = counts.argsort()[::-1]
pd.Series(counts[a], u[a])

book        3
fish        2
campfire    1
icecream    1
dtype: int64
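
To make the mechanics above more concrete, the following sketch prints the intermediate arrays for the question's sample data. pd.factorize maps each token to an integer code in order of first appearance, and np.bincount counts how often each code occurs. The tokens are wrapped in a NumPy object array here purely as a precaution, on the assumption that array input behaves most consistently across pandas versions.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {"column_name": ["book", "fish", "icecream|book", "fish", "campfire|book"]}
)

# Flatten the column into a 1-D array of individual tokens.
tokens = np.array("|".join(df.column_name).split("|"), dtype=object)

# codes[i] is the position of tokens[i] within uniques.
codes, uniques = pd.factorize(tokens)
print(codes.tolist())    # [0, 1, 2, 0, 1, 3, 0]
print(uniques.tolist())  # ['book', 'fish', 'icecream', 'campfire']

# bincount(codes)[k] is the number of times code k appears.
counts = np.bincount(codes)
print(counts.tolist())   # [3, 2, 1, 1]

# Pair each count with its label.
result = pd.Series(counts, index=uniques)
```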

Answer 3

Use collections.Counter + itertools.chain:

from collections import Counter
from itertools import chain

c = Counter(chain.from_iterable(df['column_name'].str.split('|')))

res = pd.Series(c)

print(res)

book        3
campfire    1
fish        2
icecream    1
dtype: int64
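
Since the question asks for the single most common value, it may be worth noting that a Counter can report this directly through its most_common method, without converting to a Series. A short sketch using the question's sample data, assuming the column contains no missing values (top_value and top_count are illustrative names):

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {"column_name": ["book", "fish", "icecream|book", "fish", "campfire|book"]}
)

# Split every cell into a list and count the tokens across all lists.
c = Counter(chain.from_iterable(df["column_name"].str.split("|")))

# most_common(1) returns a list holding the top (value, count) pair.
top_value, top_count = c.most_common(1)[0]
print(top_value, top_count)  # book 3
```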
