pandas: exact match does not work in an if AND condition
I have two dataframes as follows:
import pandas as pd

data = {'First': [['First', 'value'], ['second', 'value'], ['third', 'value', 'is'], ['fourth', 'value', 'is']],
        'Second': ['noun', 'not noun', 'noun', 'not noun']}
df = pd.DataFrame(data, columns=['First', 'Second'])
and
data2 = {'example': ['First value is important', 'second value is important too', 'it us good to know',
                     'Firstap is also good', 'aplsecond is very good']}
df2 = pd.DataFrame(data2, columns=['example'])
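I wrote the following code, which should keep a sentence only if its first word has a match in df and the Second column of that row is 'noun'. So basically there are two conditions.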
def checker():
    result = []
    for l in df2.example:
        df['first_unlist'] = [','.join(map(str, l)) for l in df.First]
        if df.first_unlist.str.match(pat=l.split(' ', 1)[0]).any() and df.Second.str.match('noun').any():
            result.append(l)
    return result
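However, when I run the function I get ['First value is important', 'second value is important too'] as output, which shows that the second condition, the 'noun'-only filter, is not working. The output I want is ['First value is important'].
I also tried .str.contains() and .eq(), but I still get the same output.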
I suggest filtering df before trying to match:
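def checker():
    result = []
    # First word of each 'First' list, taken only from rows tagged 'noun'.
    first_unlist = [x[0] for x in df.loc[df.Second == 'noun', 'First']]
    for l in df2.example:
        # Exact comparison against the first word of the sentence.
        if l.split(' ')[0] in first_unlist:
            result.append(l)
    return result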
checker()
['First value is important']
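For what it's worth, here is a small sketch of why the original `if` let 'second value is important too' through, plus a loop-free variant of the same filtering idea. It rebuilds the two frames from the question so it runs on its own; the names `second_test_alone`, `noun_first_words` and `first_words` are only illustrative and do not appear in the original post. The likely cause is that the two `.any()` calls are evaluated independently over the whole frame, so the 'noun' test is true for every sentence as long as any row is tagged 'noun'.

```python
import pandas as pd

df = pd.DataFrame({
    'First': [['First', 'value'], ['second', 'value'],
              ['third', 'value', 'is'], ['fourth', 'value', 'is']],
    'Second': ['noun', 'not noun', 'noun', 'not noun'],
})
df2 = pd.DataFrame({'example': [
    'First value is important', 'second value is important too',
    'it us good to know', 'Firstap is also good', 'aplsecond is very good',
]})

# The second half of the original condition never looks at the sentence:
# it is True as soon as any row of df has Second starting with 'noun'.
second_test_alone = df.Second.str.match('noun').any()
print(second_test_alone)  # True

# Tie both conditions to the same row: collect the first word of 'First'
# only from rows whose Second is exactly 'noun'.
noun_first_words = {words[0] for words in df.loc[df.Second == 'noun', 'First']}

# First word of every sentence, compared by exact equality via isin().
first_words = df2.example.str.split().str[0]
result = df2.loc[first_words.isin(noun_first_words), 'example'].tolist()
print(result)  # ['First value is important']
```

Using `isin()` on the first word gives an exact match, so 'Firstap' and 'aplsecond' are rejected without any regex anchoring.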