How to identify strings that contain multiple words
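A dataframe column of string dtype contains sentences, and I want to extract the rows that contain certain words, regardless of where in the sentence they appear.
For example: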
Column
Cat and mouse are the born enemies
Cat is a furry pet
df = df[df['cleantext'].str.contains('cat' & 'mouse')].reset_index()
df.shape
The code above throws an error.
I know that for an OR condition we can write:
df = df[df['cleantext'].str.contains('cat | mouse')].reset_index()
But I want to extract the rows where both cat and mouse are present.
Expected output:
Column
Cat and mouse are the born enemies
Here is one approach, which also works for more than two words:
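import pandas as pd
words = ['cat', 'mouse']
# Build one boolean column per word (case-insensitive), then keep rows where every word matches
m = pd.concat([df.Column.str.lower().str.contains(w) for w in words], axis=1).all(axis=1)
df.loc[m, :]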
Column
0 Cat and mouse are the born enemies
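For reference, the same AND filter can be sketched as a self-contained script. The sample dataframe is rebuilt from the two sentences in the question, and `contains_all` is a hypothetical helper name introduced only for this illustration, not part of pandas. It assumes plain substring matching is acceptable, so 'cat' would also match a word such as 'category'.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Column": [
        "Cat and mouse are the born enemies",
        "Cat is a furry pet",
    ]
})

def contains_all(series, words):
    """Return a boolean mask that is True where every word occurs in the string.

    Hypothetical helper for illustration; matching is case-insensitive
    and substring-based (regex=False treats each word literally).
    """
    mask = pd.Series(True, index=series.index)
    for w in words:
        # na=False makes missing values count as "no match"
        mask = mask & series.str.contains(w, case=False, regex=False, na=False)
    return mask

result = df[contains_all(df["Column"], ["cat", "mouse"])]
print(result)
```

Passing `case=False` replaces the explicit `.str.lower()` call, and combining the masks with `&` in a loop avoids building an intermediate dataframe; either style should give the same rows on this data.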