Combine two tables based on certain criteria using Python
I have two tables (table1, table2) as follows:
table1:
ID | Filename |
---|---|
12345 | 12345.txt |
12346 | 12346.txt |
12347 | 12347.txt |
12348 | 12348.txt |
12349 | 12349.txt |
12350 | 12350.txt |
table2: contains the paths of the files listed in table1
Path |
---|
/table/text3/12349.txt |
/table/text1/12345.txt |
/table/text2/12346.txt |
/table/text1/12350.txt |
/table/text3/12347.txt |
/table/text1/12348.txt |
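How can I merge these two tables so that each path is matched to its filename? What I have tried so far:
pd.concat([table1, table2])
I also tried pd.merge, but it does not match on the filename. How can I solve this?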
Expected output:
ID | Filename | Path |
---|---|---|
12345 | 12345.txt | /table/text1/12345.txt |
12346 | 12346.txt | /table/text2/12346.txt |
12347 | 12347.txt | /table/text3/12347.txt |
12348 | 12348.txt | /table/text1/12348.txt |
12349 | 12349.txt | /table/text3/12349.txt |
12350 | 12350.txt | /table/text1/12350.txt |
You can extract Filename from Path and then merge on it:
merged_df = df2.assign(Filename = df2['Path'].str.rsplit('/', n=1).str[-1]).merge(df1, on = 'Filename')
Another option, using pathlib:
from pathlib import Path
merged_df = df2.assign(Filename = df2['Path'].apply(lambda x: Path(x).name)).merge(df1, on = 'Filename')
Output:
Path Filename ID
0 /table/text3/12349.txt 12349.txt 12349
1 /table/text1/12345.txt 12345.txt 12345
2 /table/text2/12346.txt 12346.txt 12346
3 /table/text1/12350.txt 12350.txt 12350
4 /table/text3/12347.txt 12347.txt 12347
5 /table/text1/12348.txt 12348.txt 12348
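For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the two frames from the question and applies the rsplit approach. The names df1 and df2 follow the answer; how the tables are actually loaded is not shown in the question, so the constructors below are an assumption. The last step (reordering columns and sorting by ID) is optional and only there to mirror the layout of the expected output.

```python
import pandas as pd

# Rebuild the two tables from the question (values copied from the post).
df1 = pd.DataFrame({
    "ID": [12345, 12346, 12347, 12348, 12349, 12350],
    "Filename": ["12345.txt", "12346.txt", "12347.txt",
                 "12348.txt", "12349.txt", "12350.txt"],
})
df2 = pd.DataFrame({
    "Path": [
        "/table/text3/12349.txt",
        "/table/text1/12345.txt",
        "/table/text2/12346.txt",
        "/table/text1/12350.txt",
        "/table/text3/12347.txt",
        "/table/text1/12348.txt",
    ]
})

# Derive the join key: everything after the last '/' in each path.
filenames = df2["Path"].str.rsplit("/", n=1).str[-1]

# assign() returns a copy, so df2 itself is left unchanged.
merged_df = df2.assign(Filename=filenames).merge(df1, on="Filename")

# Optional: reorder columns and sort rows to match the expected output.
tidy = merged_df[["ID", "Filename", "Path"]].sort_values("ID", ignore_index=True)
print(tidy)
```

Since the default merge is an inner join, any path whose filename is not in df1 is silently dropped.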
You can create a temporary column Filename on df2 with .assign(), then use the resulting copy of df2 (with the newly added column) to merge with df1 on the common column Filename using .merge(), as follows:
df_merge = df1.merge(df2.assign(Filename=df2['Path'].str.split('/').str[-1]), on='Filename')
Result:
print(df_merge)
ID Filename Path
0 12345 12345.txt /table/text1/12345.txt
1 12346 12346.txt /table/text2/12346.txt
2 12347 12347.txt /table/text3/12347.txt
3 12348 12348.txt /table/text1/12348.txt
4 12349 12349.txt /table/text3/12349.txt
5 12350 12350.txt /table/text1/12350.txt
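One case the merge above does not guard against is the same filename appearing under two different directories, which would give one output row per matching path. The sketch below assumes you would rather be told about that than get silently duplicated rows. It uses the validate parameter of merge(); the duplicated path in df2 is hypothetical and not part of the question's data.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [12345, 12346],
                    "Filename": ["12345.txt", "12346.txt"]})
# Hypothetical data: 12345.txt appears under two directories.
df2 = pd.DataFrame({"Path": ["/table/text1/12345.txt",
                             "/table/text2/12345.txt",
                             "/table/text2/12346.txt"]})

with_key = df2.assign(Filename=df2["Path"].str.split("/").str[-1])

try:
    # validate="one_to_one" makes merge() raise if a key repeats on either side.
    df_merge = df1.merge(with_key, on="Filename", validate="one_to_one")
    duplicate_found = False
except pd.errors.MergeError:
    duplicate_found = True

print(duplicate_found)
```

With the question's original data every filename is unique, so the same call would go through without raising.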
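Edit
If you still want to show the entries of df1 even when there is no matching entry in df2, you can use a left merge with the parameter how='left', as follows: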
df_merge = df1.merge(df2.assign(Filename=df2['Path'].str.split('/').str[-1]), on='Filename', how='left')
Result:
If the path /table/text3/12347.txt is missing:
ID Filename Path
0 12345 12345.txt /table/text1/12345.txt
1 12346 12346.txt /table/text2/12346.txt
2 12347 12347.txt NaN
3 12348 12348.txt /table/text1/12348.txt
4 12349 12349.txt /table/text3/12349.txt
5 12350 12350.txt /table/text1/12350.txt
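If you also need to list which files have no path, rather than looking for NaN by eye, a left merge can be combined with indicator=True. This is a small sketch on a shortened version of the data, in which the path for 12347.txt is left out on purpose; with_key and missing are just illustrative variable names.

```python
import pandas as pd

df1 = pd.DataFrame({
    "ID": [12345, 12346, 12347],
    "Filename": ["12345.txt", "12346.txt", "12347.txt"],
})
# The path for 12347.txt is deliberately left out.
df2 = pd.DataFrame({"Path": ["/table/text1/12345.txt",
                             "/table/text2/12346.txt"]})

with_key = df2.assign(Filename=df2["Path"].str.split("/").str[-1])

# indicator=True adds a "_merge" column; for a left merge each row is
# marked either "both" or "left_only".
df_merge = df1.merge(with_key, on="Filename", how="left", indicator=True)

# Rows of df1 for which no path was found.
missing = df_merge.loc[df_merge["_merge"] == "left_only", ["ID", "Filename"]]
print(missing)
```

Checking df_merge['Path'].isna() gives the same rows here; the indicator column is mainly useful when Path could legitimately be empty for other reasons.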