str.contains not working when there is no space between the word and a special character

Question

I have a dataframe containing movie titles and TV series titles.

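I want to classify each row as either Movie or Series based on specific keywords. However, when there is no space between a bracket and the keyword, str.contains() does not pick the keyword up, so I need a workaround.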

This is my dataframe:

import pandas as pd
import numpy as np

watched_df = pd.DataFrame([['Love Death Robots (Episode 1)'], 
                   ['James Bond'],
                   ['How I met your Mother (Avnsitt 3)'], 
                   ['random name'],
                   ['Random movie 3 Episode 8383893']], 
                  columns=['Title'])
watched_df.head()

To add a column that classifies each title as a series or a movie, I have the following code.

watched_df["temporary_brackets_removed_title"] = watched_df['Title'].str.replace('(', '', regex=False)
watched_df["Film_Type"] = np.where(watched_df.temporary_brackets_removed_title.astype(str).str.contains(pat = 'Episode | Avnsitt', case = False), 'Series', 'Movie')
watched_df = watched_df.drop(columns='temporary_brackets_removed_title')
watched_df.head()

Is there a simpler way to solve this without adding and dropping a column?

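Maybe there is a function similar to str.contains that does not require an exact match and only checks whether the string contains the given word? Something like the LIKE operator in SQL?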

Answer 1

You can use str.contains and then map the result:

watched_df['Film_Type'] = watched_df['Title'].str.contains(r'(?:Episode|Avnsitt)').map({True: 'Series', False: 'Movie'})

Output:

>>> watched_df
                               Title Film_Type
0      Love Death Robots (Episode 1)    Series
1                         James Bond     Movie
2  How I met your Mother (Avnsitt 3)    Series
3                        random name     Movie
4     Random movie 3 Episode 8383893    Series
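
A likely reason the pattern in the question missed the bracketed title is the whitespace around the `|`: in a regular expression those spaces are literal, so 'Episode | Avnsitt' looks for "Episode " (with a trailing space) or " Avnsitt" (with a leading space). "(Avnsitt 3)" has a bracket rather than a space before the keyword, so it does not match. The sketch below rebuilds the question's dataframe and compares the two patterns; the variable names `with_spaces` and `without_spaces` are only illustrative, and `case=False` is carried over from the question rather than from the answer above.

```python
import pandas as pd

# Same sample data as in the question.
watched_df = pd.DataFrame(
    {"Title": [
        "Love Death Robots (Episode 1)",
        "James Bond",
        "How I met your Mother (Avnsitt 3)",
        "random name",
        "Random movie 3 Episode 8383893",
    ]}
)

# Pattern from the question: the spaces around "|" are part of each
# alternative, so this matches "Episode " or " Avnsitt" only.
with_spaces = watched_df["Title"].str.contains("Episode | Avnsitt", case=False)

# Same alternation without the spaces: matches the bare keywords,
# regardless of what character comes before or after them.
without_spaces = watched_df["Title"].str.contains("Episode|Avnsitt", case=False)

# Map the boolean result to labels, as in the answer above.
watched_df["Film_Type"] = without_spaces.map({True: "Series", False: "Movie"})

print(pd.DataFrame({"with_spaces": with_spaces, "without_spaces": without_spaces}))
print(watched_df)
```

With this data, only the "(Avnsitt 3)" row differs between the two patterns, which matches the behaviour described in the question. No temporary column is needed once the spaces are removed from the pattern.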
