DataFrame: select rows that contain exactly one occurrence of a value from a list

Question

Code:

import pandas as pd 

df = pd.DataFrame({'data': ['hey how r u', 'hello', 'hey abc d e f hey f', 'g h i i j k', 'hello how r u hello']})
vals = ['hey', 'hello']

I want to get all rows that contain exactly one occurrence of a word from the list vals. In this case, those would be 'hey how r u' and 'hello'.

What I tried:

def exactly_one(text):
    for v in vals:
        if text.count(v) > 1:
            return False
    return True


df = df[df['data'].contains('|'.join(vals)) & (exactly_one(df['data'].str))]

This fails with an error.

Answer 1

You can use str.count with a regular expression built from the list:

df[df['data'].str.count('|'.join(vals)).eq(1)]

Output:

          data
0  hey how r u
1        hello

Intermediate result (the number of matches per row):

df['data'].str.count('|'.join(vals))

0    1
1    1
2    2
3    0
4    2
Name: data, dtype: int64
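
One caveat with the plain '|'.join(vals) pattern is that it matches substrings, so 'hey' would also be counted inside a word such as 'they', and any regex metacharacters in vals would be interpreted as regex syntax. The sketch below is one possible way to guard against both, assuming whole-word matching is what is wanted. The extra row 'they said hi' and the names pattern, counts, and result are hypothetical and added only for illustration.

```python
import re
import pandas as pd

# Same data as in the question, plus one hypothetical row ('they said hi')
# that contains 'hey' only as part of a longer word.
df = pd.DataFrame({'data': ['hey how r u', 'hello', 'hey abc d e f hey f',
                            'g h i i j k', 'hello how r u hello',
                            'they said hi']})
vals = ['hey', 'hello']

# Escape each value so it is matched literally, then wrap the alternation
# in word boundaries so that only whole words are counted.
pattern = r'\b(?:' + '|'.join(map(re.escape, vals)) + r')\b'

# Total number of whole-word matches per row, across all values in vals.
counts = df['data'].str.count(pattern)

# Keep only the rows with exactly one match.
result = df[counts.eq(1)]
print(result)
```

Note that the count is the total over all words in vals, so a row containing one 'hey' and one 'hello' has a count of 2 and is excluded, just as with the original pattern.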

Tags: python, string, dataframe, pandas