
Python: show rows that contain a keyword from a list, and show which keyword was detected

I'm trying to build a data frame of spam messages so that I can analyze them. This is what the original CSV file looks like:



```python
### import the original CSV (a simplified sample with only two columns: sender, text)
import pandas as pd
df = pd.read_csv("spam.csv")

### if any of these keywords is in the text column, I'll put that row in the new data frame
keyword = ["prize", "", "shorturl"]

### put the rows that contain a keyword into a new data frame
spam_list = df[df['text'].str.contains('|'.join(keyword))]

### create a new column 'detected word' to show which keyword was detected
spam_list['detected word'] = keyword
```

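But the "detected word" column just follows the order of the list. I know this is because I assigned the list itself to the new column, but I can't think of (or find) a better way to do it. Should I use a "for" loop instead? Or am I approaching this in completely the wrong way?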


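Write a function that returns the first keyword (in list order) found in a row's text:

```python
def detect_keyword(row):
    # check each keyword in list order and return the first one found;
    # if none is found, the loop ends and the function returns None
    for key in keyword:
        if key in row['text']:
            return key
```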
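To check what the function returns before applying it to every row, here is a small self-contained sketch. The sample texts are hypothetical, and the empty string from the question's keyword list is left out, because `""` is a substring of every string and would make every row match.

```python
# keyword list from the question, minus the empty string ("" matches every text)
keyword = ["prize", "shorturl"]


def detect_keyword(row):
    """Return the first keyword (in list order) found in row['text'], or None."""
    for key in keyword:
        if key in row['text']:
            return key
    return None


# a plain dict supports row['text'] the same way a DataFrame row (Series) does;
# these sample texts are hypothetical
print(detect_keyword({'text': 'You won a prize!'}))                      # prize
print(detect_keyword({'text': 'Open this shorturl to get your prize'}))  # prize
print(detect_keyword({'text': 'Lunch tomorrow?'}))                       # None
```

Because the loop walks the list in order, a text containing several keywords gets whichever one comes first in `keyword`, not whichever appears first in the text.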

Then use `DataFrame.apply()` with `axis=1` to go through all the rows and save the result as a new column:

```python
df['detected_word'] = df.apply(detect_keyword, axis=1)
```
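A vectorized alternative, offered as a sketch rather than as part of the answer above, is to let a regular expression do the matching with `Series.str.extract`. It assumes the question's `sender`/`text` columns, uses made-up rows in place of `spam.csv`, drops the empty string from the keyword list, and escapes each keyword with `re.escape` in case one contains regex metacharacters. Unlike `detect_keyword`, `str.extract` reports the keyword that appears first in the text rather than first in the list, and `str.findall` collects every keyword in a row.

```python
import re

import pandas as pd

# hypothetical stand-in for spam.csv with the question's two columns
df = pd.DataFrame({
    'sender': ['alice', 'bob', 'carol'],
    'text': ['You won a prize!', 'Lunch tomorrow?', 'Open this shorturl to get your prize'],
})

keyword = ["prize", "shorturl"]

# one capture group of escaped alternatives, here "(prize|shorturl)"
pattern = '(' + '|'.join(map(re.escape, keyword)) + ')'

# first keyword by position in the text; NaN where no keyword occurs
df['detected_word'] = df['text'].str.extract(pattern, expand=False)

# every keyword occurrence as a list per row ([] where none occurs)
df['all_keywords'] = df['text'].str.findall(pattern)

# keep only rows where a keyword was detected; .copy() avoids
# SettingWithCopyWarning if spam_list is modified later
spam_list = df[df['detected_word'].notna()].copy()
print(spam_list[['sender', 'detected_word', 'all_keywords']])
```

This filters and labels the rows in one step, so there is no need to assign a list to the new column by position.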

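You can solve the problem you described with the code shown in the image below. I couldn't paste the code here because Stack Overflow doesn't allow posting short links; the code is available at the link.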
