How to extract exact words from a string using Python and re?

Question

The sample data is:

import re
import pandas as pd

a = pd.DataFrame({'Strings': ['i xxx iwantto iii i xxx i',
                              'and you xxx and x you xxxxxx and you and you']})
b = ['i', 'and you']

b contains two terms (phrases), and I want to find them in a. I want exact-word matches, not substrings, so the result I expect is:

['i', 'i', 'i']
['and you', 'and you', 'and you']

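What I actually need is how many times each term occurs in a string, so I don't really need the lists above; I included them only to show that I want to match exact words. Here is my attempt: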

s='r\'^'+b[0]+' | '+b[0]+' | '+b[0]+'$\''
len(re.findall(s,a.loc[0,'Strings']))

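I hoped s would match the word at the beginning, in the middle, and at the end of the string. My real a and b are large, so I can't hard-code the actual strings. But the result is: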

len(re.findall(s,a.loc[0,'Strings']))
Out[110]: 1
re.findall(s,a.loc[0,'Strings'])
Out[111]: [' i ']

Only the occurrence in the middle seems to be matched, and I'm not sure what went wrong.
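The pattern string itself is the likely culprit. In Python source, r'...' is a raw-string prefix, but written inside an ordinary string it is just three more characters. The first alternative then needs a literal r' before ^ (and ^ only matches at the very start of the string without re.MULTILINE), while the last needs a ' after $, so neither can ever match and only ' i ' survives. A sketch reproducing this on the first sample string:

```python
import re

b = ['i', 'and you']
text = 'i xxx iwantto iii i xxx i'

# The asker's construction: the r and the quote characters are ordinary
# characters inside the string, so they become part of the pattern itself.
s = 'r\'^' + b[0] + ' | ' + b[0] + ' | ' + b[0] + '$\''
print(repr(s))              # "r'^i | i | i$'"
print(re.findall(s, text))  # [' i ']  (only the middle alternative can match)

# Without the stray characters the alternation finds all three here.
fixed = '^' + b[0] + ' | ' + b[0] + ' | ' + b[0] + '$'
print(re.findall(fixed, text))  # ['i ', ' i ', ' i']

# But each match consumes its surrounding spaces, so neighbouring
# occurrences are under-counted: three i's, only two matches.
print(len(re.findall(fixed, 'i i i')))  # 2
```

The term is also inserted unescaped; re.escape(b[0]) would be needed if a term could contain regex metacharacters such as . or +.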

Answer 1

import pandas as pd

a = pd.DataFrame({'Strings': ['i xxx iwantto iii i xxx i',
                              'and you xxx and x you xxxxxx and you and you']})
print(a.Strings.str.findall('i |and you'))

Output:

0                   [i , i , i ]
1    [and you, and you, and you]
Name: Strings, dtype: object
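This output only looks right for the first row. The pattern 'i ' has no left boundary and requires a trailing space, so one of its three matches is the last letter of 'iii', and the final standalone 'i' (which has no trailing space) is missed. A quick check of the match start positions with re.finditer on the first sample string, compared against a whitespace-delimited pattern (the lookaround version is an addition, not part of the answer):

```python
import re

text = 'i xxx iwantto iii i xxx i'

# Pattern from the answer: 'i' followed by a space, with nothing on the left.
answer_starts = [m.start() for m in re.finditer(r'i |and you', text)]

# Whitespace-delimited: 'i' must not be preceded or followed by a non-space.
exact_starts = [m.start() for m in re.finditer(r'(?<!\S)i(?!\S)', text)]

print(answer_starts)  # [0, 16, 18]  (16 is the last 'i' of 'iii')
print(exact_starts)   # [0, 18, 24]  (24 is the final standalone 'i')
```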

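The same pattern built from b instead of hard-coded (this format string uses only the first two entries of b):

print(a.Strings.str.findall('{} |{}'.format(*b)))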
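Since the goal is a count of exact, whitespace-delimited occurrences for every entry of b, one option is to wrap each escaped term in lookarounds and use Series.str.count. This is a sketch, not the original answer's method; the helper name exact_pattern is hypothetical, and it assumes words are separated by whitespace.

```python
import re
import pandas as pd

a = pd.DataFrame({'Strings': ['i xxx iwantto iii i xxx i',
                              'and you xxx and x you xxxxxx and you and you']})
b = ['i', 'and you']

def exact_pattern(term):
    # Hypothetical helper. (?<!\S) and (?!\S) require whitespace or a string
    # end on both sides, so 'i' does not match inside 'iii' or 'iwantto'.
    # Lookarounds consume nothing, so adjacent occurrences are all counted.
    return r'(?<!\S)' + re.escape(term) + r'(?!\S)'

# One column of counts per term in b, one row per string in a.
counts = pd.DataFrame({term: a['Strings'].str.count(exact_pattern(term))
                       for term in b})
print(counts)
```

For the sample data this gives 3 and 0 for 'i', and 0 and 3 for 'and you'. A \b word boundary (r'\b' + re.escape(term) + r'\b') would also work on these samples, but it treats punctuation as a boundary, so it would count the i in 'i,' whereas the whitespace version would not.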
