Python: Exact word match using a list and data frame

Question

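Hi everyone :) Hope you're all doing well. I'm new to Python and I'm having trouble getting exact word matches. I have a list of words, key_list, and I need to use it to go through a DataFrame column of strings, df['response'], and count how many times the words in key_list appear in each row of df['response'].

Currently, this is the code I'm using: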

df['count_response']=df['response'].str.count('|'.join(key_list))

This is the output I get:

key_list:  ['honestli', 'know', 'realli', 'feel', 'wast', 'time', 'school', 'good', 'reason', 'go', 'colleg', 
'howev', 'wonder', 'whether', 'continu', 'cant', 'see', 'frankli', 'care', 'less', 'understand']
              response  count_response
0          parent said             0
1     want make differ             0
2            dont know             1
3                 rich             0
4       go career want             2
5              actuari             0
6          social life             0
7       expect societi             0
8                                  0
9           help peopl             0
10   realli love learn             1
11               money             0
12       passion field             0
13  happi learn econom             0
14   want uplift peopl             0

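Unfortunately, this is not the correct output. In row 4, count_response gets the value 2; however, only the word "go" is in key_list. I suspect Python is counting the word "care" (which is in key_list) because it appears inside the word "career", but it should not be counted, because I need exact word matches.

Thank you for your time, I really appreciate any replies!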

Answer 1

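I think you need \b word boundaries around each keyword, so that a keyword only matches when it stands alone as a whole word: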

df['count_response']=df['response'].str.count('|'.join(r"\b{}\b".format(x) for x in key_list))
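To see why the word boundaries matter, here is a minimal sketch that compares the two patterns on a few rows taken from the question. It uses only the standard re module on a plain list, with a shortened key_list, so the names responses, pattern, and naive_pattern are illustrative and not part of the original code. It also passes each keyword through re.escape, which is an extra precaution (an assumption on my part, not something the question requires) in case a keyword ever contains regex metacharacters.

```python
import re

# Shortened keyword list and sample rows, taken from the question for illustration.
key_list = ['go', 'care', 'know', 'realli']
responses = ['dont know', 'go career want', 'realli love learn', 'care less']

# Escape each keyword, then wrap the whole alternation in one pair of word boundaries.
pattern = r"\b(?:" + "|".join(re.escape(word) for word in key_list) + r")\b"

# The substring pattern from the question, kept for comparison.
naive_pattern = "|".join(key_list)

# Count non-overlapping matches per row for each pattern.
exact_counts = [len(re.findall(pattern, text)) for text in responses]
naive_counts = [len(re.findall(naive_pattern, text)) for text in responses]

print(exact_counts)  # [1, 1, 1, 1]
print(naive_counts)  # [1, 2, 1, 1] -> "care" is found inside "career"
```

The same pattern string should work as the argument to df['response'].str.count(pattern), since str.count treats its argument as a regular expression.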


python

string

exact-match

pandas