Combine two tables based on certain criteria using Python

I have two tables (table1 and table2), as follows:

table1:

ID Filename
12345 12345.txt
12346 12346.txt
12347 12347.txt
12348 12348.txt
12349 12349.txt
12350 12350.txt

table2: contains the paths of the files listed in table1

Path
/table/text3/12349.txt
/table/text1/12345.txt
/table/text2/12346.txt
/table/text1/12350.txt
/table/text3/12347.txt
/table/text1/12348.txt

How can I combine these two tables so that each path is matched to its filename? Here is what I have tried so far:

pd.concat([table1, table2])

I also tried pd.merge, but it does not match on the filename. How can I solve this?

Expected output:

ID Filename Path
12345 12345.txt /table/text1/12345.txt
12346 12346.txt /table/text2/12346.txt
12347 12347.txt /table/text3/12347.txt
12348 12348.txt /table/text1/12348.txt
12349 12349.txt /table/text3/12349.txt
12350 12350.txt /table/text1/12350.txt

You can extract the Filename from the Path column and then merge on it:

merged_df = df2.assign(Filename = df2['Path'].str.rsplit('/', n=1).str[-1]).merge(df1, on = 'Filename')

An option using pathlib:

from pathlib import Path
merged_df = df2.assign(Filename = df2['Path'].apply(lambda x: Path(x).name)).merge(df1, on = 'Filename')

Output:

                     Path   Filename     ID
0  /table/text3/12349.txt  12349.txt  12349
1  /table/text1/12345.txt  12345.txt  12345
2  /table/text2/12346.txt  12346.txt  12346
3  /table/text1/12350.txt  12350.txt  12350
4  /table/text3/12347.txt  12347.txt  12347
5  /table/text1/12348.txt  12348.txt  12348
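The merged rows above follow the order of df2, and the columns are in a different order than in the expected output. A minimal sketch of reordering them follows; it rebuilds the sample data from the question, assumes the ID column holds integers, and uses a hypothetical name, ordered_df, for the result:

```python
import pandas as pd

# Sample data copied from the question (ID assumed to be an integer column).
df1 = pd.DataFrame({
    "ID": [12345, 12346, 12347, 12348, 12349, 12350],
    "Filename": ["12345.txt", "12346.txt", "12347.txt",
                 "12348.txt", "12349.txt", "12350.txt"],
})
df2 = pd.DataFrame({
    "Path": [
        "/table/text3/12349.txt",
        "/table/text1/12345.txt",
        "/table/text2/12346.txt",
        "/table/text1/12350.txt",
        "/table/text3/12347.txt",
        "/table/text1/12348.txt",
    ]
})

# Take the last path component as the merge key, then merge on it.
merged_df = df2.assign(
    Filename=df2["Path"].str.rsplit("/", n=1).str[-1]
).merge(df1, on="Filename")

# Sort rows by ID and put the columns in the order of the expected output.
ordered_df = (
    merged_df.sort_values("ID")
    .reset_index(drop=True)[["ID", "Filename", "Path"]]
)
print(ordered_df)
```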

You can create a temporary Filename column on df2 with .assign(), then merge the resulting copy of df2 (with the newly added column) with df1 using .merge() on the common column Filename, as follows:

df_merge = df1.merge(df2.assign(Filename=df2['Path'].str.split('/').str[-1]), on='Filename')

Result:

print(df_merge)

      ID   Filename                    Path
0  12345  12345.txt  /table/text1/12345.txt
1  12346  12346.txt  /table/text2/12346.txt
2  12347  12347.txt  /table/text3/12347.txt
3  12348  12348.txt  /table/text1/12348.txt
4  12349  12349.txt  /table/text3/12349.txt
5  12350  12350.txt  /table/text1/12350.txt
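Matching on the bare file name assumes that each name occurs only once across all directories. As a hedged sketch, the validate parameter of merge can guard that assumption. The shortened sample data, the second frame df2_dup with its /table/text9 path, and the name duplicate_detected are hypothetical and only illustrate the failure case:

```python
import pandas as pd

# Shortened sample data in the shape used by the question.
df1 = pd.DataFrame({
    "ID": [12345, 12346],
    "Filename": ["12345.txt", "12346.txt"],
})
df2 = pd.DataFrame({
    "Path": ["/table/text1/12345.txt", "/table/text2/12346.txt"],
})

# validate="one_to_one" makes merge raise MergeError if a Filename
# is repeated on either side of the merge.
df_merge = df1.merge(
    df2.assign(Filename=df2["Path"].str.split("/").str[-1]),
    on="Filename",
    validate="one_to_one",
)
print(df_merge)

# Hypothetical failure case: the same file name in two directories.
df2_dup = pd.DataFrame({
    "Path": ["/table/text1/12345.txt", "/table/text9/12345.txt"],
})
try:
    df1.merge(
        df2_dup.assign(Filename=df2_dup["Path"].str.split("/").str[-1]),
        on="Filename",
        validate="one_to_one",
    )
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True
print(duplicate_detected)
```

Without validate, the duplicate name would silently produce two rows for the same ID.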

Edit

If you still want to show the entries of df1 that have no matching entry in df2, you can use a left merge with the parameter how='left', as follows:

df_merge = df1.merge(df2.assign(Filename=df2['Path'].str.split('/').str[-1]), on='Filename', how='left')

Result:

If the path /table/text3/12347.txt is missing from df2:

      ID   Filename                    Path
0  12345  12345.txt  /table/text1/12345.txt
1  12346  12346.txt  /table/text2/12346.txt
2  12347  12347.txt                     NaN
3  12348  12348.txt  /table/text1/12348.txt
4  12349  12349.txt  /table/text3/12349.txt
5  12350  12350.txt  /table/text1/12350.txt
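To list the files that have no path, rather than scanning for NaN by eye, one possible approach is to add indicator=True to the left merge. The sketch below rebuilds the sample data with the path for 12347.txt left out, assumes integer IDs, and uses a hypothetical name, missing, for the filtered result:

```python
import pandas as pd

# Sample data from the question, with the path for 12347.txt left out.
df1 = pd.DataFrame({
    "ID": [12345, 12346, 12347, 12348, 12349, 12350],
    "Filename": ["12345.txt", "12346.txt", "12347.txt",
                 "12348.txt", "12349.txt", "12350.txt"],
})
df2 = pd.DataFrame({
    "Path": [
        "/table/text3/12349.txt",
        "/table/text1/12345.txt",
        "/table/text2/12346.txt",
        "/table/text1/12350.txt",
        "/table/text1/12348.txt",
    ]
})

# indicator=True adds a "_merge" column that records whether each row
# was found in both frames or only in the left one (df1).
df_merge = df1.merge(
    df2.assign(Filename=df2["Path"].str.split("/").str[-1]),
    on="Filename",
    how="left",
    indicator=True,
)

# Rows of df1 that have no matching path in df2.
missing = df_merge.loc[df_merge["_merge"] == "left_only", ["ID", "Filename"]]
print(missing)
```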