Add new column with number of occurrences (.isin) in other column
How can I count how often each value in column substring occurs in column string, and append the result as a new column, given this example dataframe:
import pandas as pd

df = pd.DataFrame({
    'substring': ['a', 'b', 'c', 'a', 'b', 'c', 'd', 'd', 'a'],
    'string': ['a a b', '', 'a a', 'b', 'a b', 'a c a', '', 'b c a', 'd d']})
  substring string
0         a  a a b
1         b
2         c    a a
3         a      b
4         b    a b
5         c  a c a
6         d
7         d  b c a
8         a    d d
Here is what I would like the output to look like:
  substring string  count
0         a  a a b      5
1         b             4
2         c    a a      2
3         a      b      5
4         b    a b      4
5         c  a c a      2
6         d             1
7         d  b c a      1
8         a    d d      5
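Your question is not fully clear, but I guess you want to count in how many strings of the whole column each character (or word?) appears, without counting duplicates within a single string.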
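To make that interpretation concrete, here is a naive sketch in plain Python. It assumes "occurrence" means "the word appears at least once in a row"; the helper name rows_containing is hypothetical and only used for illustration.

```python
# Sample strings from the question
strings = ['a a b', '', 'a a', 'b', 'a b', 'a c a', '', 'b c a', 'd d']

def rows_containing(word, rows):
    """Count the rows in which `word` appears at least once (hypothetical helper)."""
    # `word in row.split()` is True once per row, however often the word repeats
    return sum(word in row.split() for row in rows)

counts = {w: rows_containing(w, strings) for w in ['a', 'b', 'c', 'd']}
print(counts)  # {'a': 5, 'b': 4, 'c': 2, 'd': 1}
```

This rescans every row for each word, so it is only meant to show what is being counted, not to be efficient.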
You can use a conversion to set together with collections.Counter:
from itertools import chain
from collections import Counter
# count unique elements (here words)
c = Counter(chain.from_iterable(set(x.split()) for x in df['string']))
## alternative for characters
# c = Counter(chain.from_iterable(set(x) for x in df['string']))
# map counts
df['count'] = df['substring'].map(c)
Output:
  substring string  count
0         a  a a b      5
1         b             4
2         c    a a      2
3         a      b      5
4         b    a b      4
5         c  a c a      2
6         d             1
7         d  b c a      1
8         a    d d      5
Pure pandas variant for the counter (quite slow):
c = df['string'].str.split().apply(set).explode().value_counts()
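As a self-contained sketch, the two counters can be built side by side on the sample data to check that they agree. The names c_counter, c_pandas, count_counter and count_pandas are introduced here for illustration, and the fillna(0).astype(int) step is an addition that is not part of the answer above.

```python
from collections import Counter
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    'substring': ['a', 'b', 'c', 'a', 'b', 'c', 'd', 'd', 'a'],
    'string': ['a a b', '', 'a a', 'b', 'a b', 'a c a', '', 'b c a', 'd d'],
})

# Variant 1: Counter over the unique words of each row
c_counter = Counter(chain.from_iterable(set(x.split()) for x in df['string']))

# Variant 2: pure pandas; empty strings explode to NaN, which value_counts drops
c_pandas = df['string'].str.split().apply(set).explode().value_counts()

count_counter = df['substring'].map(c_counter)
# Mapping with a Series gives NaN for unseen values, so fill with 0 (added step)
count_pandas = df['substring'].map(c_pandas).fillna(0).astype(int)

df['count'] = count_counter
print(df)
```

One difference worth knowing: as far as I can tell, mapping with a Counter yields 0 for a substring that never appears (Counter defines a default for missing keys), whereas mapping with the value_counts Series yields NaN, hence the fillna above.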