Creating a new column using string match and based on if-else condition
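I have a dataframe whose column 'url_text' contains the text output of OCR. I'm trying to create a new column 'blocked' that equals 1 for a row if a condition is met (its 'url_text' contains 'blocked you') and 0 otherwise.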
df[df['url_text'].str.contains('blocked you')] # detect all rows in 'url_text' column
# that contain 'blocked you'. Code works.
I tried putting the code above inside the function below. However, when I apply the function to my dataframe, I get the following error:
def f(row):
    if row['url_text'] == df[df['url_text'].str.contains('blocked you')]:
        val = 1
    else:
        val = 0
    return val
df['blocked'] = df.apply(f)
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/Users/marcoliedecke/Desktop/Who_Blocks_Who?/Code/venv/lib/python3.9/site-packages/pandas/core/frame.py", line 8740, in apply
return op.apply()
File "/Users/marcoliedecke/Desktop/Who_Blocks_Who?/Code/venv/lib/python3.9/site-packages/pandas/core/apply.py", line 688, in apply
return self.apply_standard()
File "/Users/marcoliedecke/Desktop/Who_Blocks_Who?/Code/venv/lib/python3.9/site-packages/pandas/core/apply.py", line 812, in apply_standard
results, res_index = self.apply_series_generator()
File "/Users/marcoliedecke/Desktop/Who_Blocks_Who?/Code/venv/lib/python3.9/site-packages/pandas/core/apply.py", line 828, in apply_series_generator
results[i] = self.f(v)
File "<input>", line 3, in f
File "/Users/marcoliedecke/Desktop/Who_Blocks_Who?/Code/venv/lib/python3.9/site-packages/pandas/core/series.py", line 942, in __getitem__
return self._get_value(key)
File "/Users/marcoliedecke/Desktop/Who_Blocks_Who?/Code/venv/lib/python3.9/site-packages/pandas/core/series.py", line 1051, in _get_value
loc = self.index.get_loc(label)
File "/Users/marcoliedecke/Desktop/Who_Blocks_Who?/Code/venv/lib/python3.9/site-packages/pandas/core/indexes/range.py", line 388, in get_loc
raise KeyError(key)
KeyError: 'url_text'
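The underlying problem here is that your code compares a single string (row['url_text']) with a whole DataFrame (df[df...]), which is not a "does this text contain the phrase" test. In addition, the KeyError itself comes from calling df.apply(f) without axis=1: by default apply passes each column, not each row, to f, and a column is a Series indexed by row labels, so row['url_text'] fails with KeyError: 'url_text'.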
Don't reference df inside the function; work only with the value in the row itself. You can also write it as a lambda function, which is closer to the canonical examples.
df['blocked'] = df.apply(
lambda row: 1 if 'blocked you' in row['url_text'] else 0,
axis=1
)
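A self-contained version of the answer is shown below with hypothetical sample rows. The isinstance guard is an added assumption: OCR can leave missing values, and "blocked you" in None would raise TypeError. The second column shows a vectorized alternative that is not part of the original answer. Series.str.contains returns a boolean Series; na=False treats missing text as "not blocked", and regex=False matches the phrase literally.

```python
import pandas as pd

# Hypothetical sample data; in the question this column holds OCR output
df = pd.DataFrame({
    "url_text": [
        "Example has blocked you",
        "You follow this account",
        None,  # assumed case: OCR produced no text for this row
    ]
})

# Row-wise lambda from the answer, with an added guard so that non-string
# (missing) values map to 0 instead of raising TypeError on `in`.
df["blocked"] = df.apply(
    lambda row: 1 if isinstance(row["url_text"], str) and "blocked you" in row["url_text"] else 0,
    axis=1,
)

# Vectorized alternative (not from the original answer): one call over the
# whole column, then True/False converted to 1/0 with astype(int).
df["blocked_vec"] = (
    df["url_text"].str.contains("blocked you", na=False, regex=False).astype(int)
)

print(df)
```

The vectorized form avoids calling a Python function per row, which is usually faster on large frames. The apply version is easier to extend if the condition later needs several columns.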