Pandas search a dataframe for substrings using the values from a separate dataframe

Question

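I've been working on this problem but can't seem to figure it out. Basically, I have an Excel spreadsheet with multiple sheets. For this Python program I only care about two of the sheets, and more specifically one column in each sheet.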

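I want to take every value from one dataframe/column (A) and check whether any value in a second dataframe/column (B) occurs as a substring of that value. Ultimately I want a CSV output containing the rows from column A whose value has no matching substring in column B.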

Here is what I have so far.

Read the Excel file and create two dataframes, each with only the column I'm interested in:

import pandas as pd

# Load only the column of interest from each sheet
df_A = pd.read_excel('test.xlsx',
        sheet_name='Sheet_1',
        usecols=['Column_A'])

df_B = pd.read_excel('test.xlsx',
        sheet_name='Sheet_2',
        usecols=['Column_B'])

Here are the contents of the dataframes:

Column_A
20220201_ABC_TEST-00012345_987654
20220201_ABC_TEST-00012346_987654
20220201_ABC_TEST-00012347_987654
20220201_ABC_TEST-00012351_987654
20220201_ABC_TEST-00012352_987654
20220201_ABC_TEST-00012353_987654

Column_B
TEST-00012345
TEST-00012346
TEST-00012347
TEST-00012348
TEST-00012349
TEST-00012350

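This is the part I don't know how to do correctly: take all the values from df_A and compare them against all the values in df_B to find which values from df_A have no substring match in df_B.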

substring_matches = df_A.str.contains(df_B)
print(substring_matches)

This gives an error:

AttributeError: 'DataFrame' object has no attribute 'str'

So I'm clearly not doing something right here.

Answer 1

Try this:

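# True for each Column_A value that contains at least one Column_B value
substring_matches = df_A['Column_A'].apply(lambda s1: df_B['Column_B'].apply(lambda s2: s2 in s1).any())
# Keep only the rows without a match and write them out
df_A[~substring_matches].to_csv('unmatched.csv')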
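
As for the error in the question: .str is an accessor on a Series (a single column), not on a DataFrame, so the column has to be selected first. If the nested apply above turns out to be slow on larger sheets, one possible alternative is to join the Column_B values into a single regular expression and pass it to Series.str.contains. The following is only a sketch under some assumptions: the sample frames are built in memory from the data shown in the question instead of being read from test.xlsx, the names pattern, has_match and unmatched are made up for illustration, and Column_B is assumed to be non-empty (an empty pattern would match every row).

```python
import re

import pandas as pd

# Sample data copied from the question; in the real program these frames
# would come from pd.read_excel instead.
df_A = pd.DataFrame({"Column_A": [
    "20220201_ABC_TEST-00012345_987654",
    "20220201_ABC_TEST-00012346_987654",
    "20220201_ABC_TEST-00012347_987654",
    "20220201_ABC_TEST-00012351_987654",
    "20220201_ABC_TEST-00012352_987654",
    "20220201_ABC_TEST-00012353_987654",
]})
df_B = pd.DataFrame({"Column_B": [
    "TEST-00012345",
    "TEST-00012346",
    "TEST-00012347",
    "TEST-00012348",
    "TEST-00012349",
    "TEST-00012350",
]})

# Build one alternation pattern from Column_B. re.escape makes sure that
# characters with a special meaning in regular expressions match literally.
pattern = "|".join(re.escape(str(value)) for value in df_B["Column_B"].dropna())

# .str lives on the Series, so select the column before calling it.
# na=False treats missing values in Column_A as "no match".
has_match = df_A["Column_A"].str.contains(pattern, regex=True, na=False)

# Rows of df_A whose value contains none of the Column_B values.
unmatched = df_A[~has_match]
print(unmatched)

# To write the result without the index column:
# unmatched.to_csv("unmatched.csv", index=False)
```

With the sample data this should leave the three rows ending in 00012351, 00012352 and 00012353. Both approaches check whether a Column_B value occurs inside a Column_A value; the regex version just lets pandas do the scan in one pass per row instead of one Python-level comparison per pair.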

python

dataframe

python-3.x

pandas