Creating a new column using string match and based on if-else condition

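I have a DataFrame whose column 'url_text' contains text output from OCR. I am trying to create a new column 'blocked' whose value is 1 for rows that meet the condition and 0 otherwise.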

df[df['url_text'].str.contains('blocked you')] # detect all rows in 'url_text' column 
# that contain 'blocked you'. Code works.  

I tried inserting the code above into the function below. However, when I apply the function to my DataFrame, I get the following error:

def f(row):
    if row['url_text'] == df[df['url_text'].str.contains('blocked you')]:
        val = 1
    else:
        val = 0
    return val
df['blocked'] = df.apply(f)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Users/marcoliedecke/Desktop/Who_Blocks_Who?/Code/venv/lib/python3.9/site-packages/pandas/core/frame.py", line 8740, in apply
    return op.apply()
  File "/Users/marcoliedecke/Desktop/Who_Blocks_Who?/Code/venv/lib/python3.9/site-packages/pandas/core/apply.py", line 688, in apply
    return self.apply_standard()
  File "/Users/marcoliedecke/Desktop/Who_Blocks_Who?/Code/venv/lib/python3.9/site-packages/pandas/core/apply.py", line 812, in apply_standard
    results, res_index = self.apply_series_generator()
  File "/Users/marcoliedecke/Desktop/Who_Blocks_Who?/Code/venv/lib/python3.9/site-packages/pandas/core/apply.py", line 828, in apply_series_generator
    results[i] = self.f(v)
  File "<input>", line 3, in f
  File "/Users/marcoliedecke/Desktop/Who_Blocks_Who?/Code/venv/lib/python3.9/site-packages/pandas/core/series.py", line 942, in __getitem__
    return self._get_value(key)
  File "/Users/marcoliedecke/Desktop/Who_Blocks_Who?/Code/venv/lib/python3.9/site-packages/pandas/core/series.py", line 1051, in _get_value
    loc = self.index.get_loc(label)
  File "/Users/marcoliedecke/Desktop/Who_Blocks_Who?/Code/venv/lib/python3.9/site-packages/pandas/core/indexes/range.py", line 388, in get_loc
    raise KeyError(key)
KeyError: 'url_text'

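The fundamental problem here is that your code compares a single string (row['url_text']) with a DataFrame (df[df...]). The KeyError itself arises because df.apply(f) defaults to axis=0 and passes each column to f, so row['url_text'] is looked up as a label in that column's index; passing axis=1 hands f one row at a time.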
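To see why the traceback ends in a KeyError, it may help to check what apply actually passes to the function. This is a minimal sketch on a made-up two-row frame (the sample strings are assumptions, not your data):

```python
import pandas as pd

# Hypothetical sample data standing in for the OCR output.
df = pd.DataFrame({"url_text": ["someone blocked you", "nothing to see here"]})

# Default axis=0: the function receives each column as a Series,
# so .name is the column label.
default_names = df.apply(lambda s: s.name).tolist()

# axis=1: the function receives each row as a Series,
# so .name is the row's index label.
row_names = df.apply(lambda s: s.name, axis=1).tolist()

# Indexing a column Series with 'url_text' looks for a row labelled
# 'url_text', which does not exist, hence the KeyError.
try:
    df.apply(lambda s: s["url_text"])
    raised = False
except KeyError:
    raised = True
```

With the default axis the function sees the whole 'url_text' column, not a row, which is why the lookup fails before the comparison is even evaluated.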

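Instead of referencing df inside the function, just use the methods defined on the row itself. You can also implement it as a lambda function, which is closer to the canonical examples.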

df['blocked'] = df.apply(
    lambda row: 1 if 'blocked you' in row['url_text'] else 0,
    axis=1
)
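If you do not need row-wise logic, you can probably skip apply altogether and convert the boolean mask from str.contains directly. This is a sketch on hypothetical sample data; regex=False treats the pattern as a literal substring, and na=False is my assumption about how missing OCR text should be handled (counted as not blocked):

```python
import pandas as pd

# Hypothetical sample data, including one missing OCR result.
df = pd.DataFrame({"url_text": ["someone blocked you", "nothing here", None]})

# str.contains returns a boolean mask; astype(int) maps True/False to 1/0.
df["blocked"] = (
    df["url_text"]
    .str.contains("blocked you", regex=False, na=False)
    .astype(int)
)
```

This stays vectorized, so it should generally be faster than apply with axis=1 on a large frame.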