Add new column with number of occurrences (.isin) in other column

Question

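How can I count how often each value in the column substring occurs in the column string, and append the result as a new column, given this example dataframe: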

import pandas as pd

df = pd.DataFrame({
'substring':['a', 'b', 'c', 'a', 'b', 'c', 'd', 'd', 'a'],
'string':['a a b', '', 'a a', 'b', 'a b', 'a c a', '', 'b c a', 'd d']})

  substring string
0         a  a a b
1         b      
2         c    a a
3         a      b
4         b    a b
5         c  a c a
6         d       
7         d  b c a
8         a    d d

This is what I would like the output to look like:

  substring string count
0         a  a a b    5
1         b           4
2         c    a a    2
3         a      b    5
4         b    a b    4
5         c  a c a    2
6         d           1
7         d  b c a    1
8         a    d d    5

Answer 1

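Your question is not entirely clear, but I assume you want to count how many of the strings each character (or word?) appears in, counting it at most once per string even if it is repeated there.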

You can convert each string to a set and count the results with collections.Counter:

from itertools import chain
from collections import Counter

# count unique elements (here words)
c = Counter(chain.from_iterable(set(x.split()) for x in df['string']))

## alternative for characters
# c = Counter(chain.from_iterable(set(x) for x in df['string']))

# map counts
df['count'] = df['substring'].map(c)

Output:

  substring string  count
0         a  a a b      5
1         b             4
2         c    a a      2
3         a      b      5
4         b    a b      4
5         c  a c a      2
6         d             1
7         d  b c a      1
8         a    d d      5
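
To make the role of set() in the counter concrete, here is a small self-contained sketch that rebuilds the example data and compares counting with and without the per-string deduplication. The names per_row and total are illustrative only and do not come from the answer.

```python
from collections import Counter
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    'substring': ['a', 'b', 'c', 'a', 'b', 'c', 'd', 'd', 'a'],
    'string': ['a a b', '', 'a a', 'b', 'a b', 'a c a', '', 'b c a', 'd d'],
})

# With set(): each word is counted at most once per string
per_row = Counter(chain.from_iterable(set(s.split()) for s in df['string']))

# Without set(): every occurrence is counted, repeats included
total = Counter(chain.from_iterable(s.split() for s in df['string']))

print(per_row['a'], total['a'])  # 5 8
print(per_row['d'], total['d'])  # 1 2

# Only the deduplicated counter reproduces the desired column
df['count'] = df['substring'].map(per_row)
```

The desired output in the question (a=5, d=1) matches per_row, which is why the answer deduplicates with set() before counting.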

A pure pandas variant of the counter (considerably slower):

c = df['string'].str.split().apply(set).explode().value_counts()
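
As a sketch of how this pandas counter could be plugged into the same mapping step, the snippet below rebuilds the example data and maps the counts back onto substring. The fillna(0) step is an assumption added here for substrings that appear in no string at all; it is not part of the one-liner above and has no effect on this particular data.

```python
import pandas as pd

df = pd.DataFrame({
    'substring': ['a', 'b', 'c', 'a', 'b', 'c', 'd', 'd', 'a'],
    'string': ['a a b', '', 'a a', 'b', 'a b', 'a c a', '', 'b c a', 'd d'],
})

# Split into words, deduplicate per row, then flatten to one word per row.
# Empty strings become empty sets, which explode() turns into NaN;
# value_counts() drops NaN by default, so they do not affect the counts.
c = df['string'].str.split().apply(set).explode().value_counts()

# Map the counts back by word; unmatched substrings would map to NaN,
# so (as an assumption) they are filled with 0 before casting to int.
df['count'] = df['substring'].map(c).fillna(0).astype(int)

print(df)
```

Mapping with a Series looks each substring up in the index of c, so the result lines up with the Counter-based version for this data.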
