›如何在不覆盖现有列数据的情况下连接 Python 中的表
›How to join tables in Python without overwriting existing column data
我需要加入多个 table,但我无法让 Python 中的加入按预期运行。我需要 left join table 2 到 table 1,而不覆盖 table 1 的“几何”列中的现有数据。我想要实现的有点像Excel 中的 VLOOKUP。我想从其他 tables (~10) 中提取匹配值到 table 1 中,而不覆盖已经存在的值。有没有更好的办法?以下是我的尝试:
TABLE 1
| ID | BLOCKCODE | GEOMETRY |
| -- | --------- | -------- |
| 1 | 123 | ABC |
| 2 | 456 | DEF |
| 3 | 789 | |
TABLE 2
| ID | GEOID | GEOMETRY |
| -- | ----- | -------- |
| 1 | 123 | |
| 2 | 456 | |
| 3 | 789 | GHI |
TABLE 3 (What I want)
| ID | BLOCKCODE | GEOID | GEOMETRY |
| -- | --------- |----- | -------- |
| 1 | 123 | 123 | ABC |
| 2 | 456 | 456 | DEF |
| 3 | | 789 | GHI |
What I'm getting
| ID | GEOID | GEOMETRY_X | GEOMETRY_Y |
| -- | ----- | -------- | --------- |
| 1 | 123 | ABC | |
| 2 | 456 | DEF | |
| 3 | 789 | | GHI |
join = pd.merge(table1, table2, how="left", left_on="BLOCKCODE", right_on="GEOID"
当我尝试这样做时:
join = pd.merge(table1, table2, how="left", left_on=["BLOCKCODE", "GEOMETRY"], right_on=["GEOID", "GEOMETRY"]
我明白了:
TABLE 1
| ID | BLOCKCODE | GEOMETRY |
| -- | --------- | -------- |
| 1 | 123 | ABC |
| 2 | 456 | DEF |
| 3 | 789 | |
你可以试试:
# rename the Blockcode column in table1 to have the same column ID as table2.
# This is necessary for the next step to work.
table1 = table1.rename(columns={"Blockcode": "GeoID",})
# Overwrites all NaN values in table1 with the value from table2.
table1.update(table2)
我需要加入多个 table,但我无法让 Python 中的加入按预期运行。我需要 left join table 2 到 table 1,而不覆盖 table 1 的“几何”列中的现有数据。我想要实现的有点像Excel 中的 VLOOKUP。我想从其他 tables (~10) 中提取匹配值到 table 1 中,而不覆盖已经存在的值。有没有更好的办法?以下是我的尝试:
TABLE 1
| ID | BLOCKCODE | GEOMETRY |
| -- | --------- | -------- |
| 1 | 123 | ABC |
| 2 | 456 | DEF |
| 3 | 789 | |
TABLE 2
| ID | GEOID | GEOMETRY |
| -- | ----- | -------- |
| 1 | 123 | |
| 2 | 456 | |
| 3 | 789 | GHI |
TABLE 3 (What I want)
| ID | BLOCKCODE | GEOID | GEOMETRY |
| -- | --------- |----- | -------- |
| 1 | 123 | 123 | ABC |
| 2 | 456 | 456 | DEF |
| 3 | | 789 | GHI |
What I'm getting
| ID | GEOID | GEOMETRY_X | GEOMETRY_Y |
| -- | ----- | -------- | --------- |
| 1 | 123 | ABC | |
| 2 | 456 | DEF | |
| 3 | 789 | | GHI |
join = pd.merge(table1, table2, how="left", left_on="BLOCKCODE", right_on="GEOID"
当我尝试这样做时:
join = pd.merge(table1, table2, how="left", left_on=["BLOCKCODE", "GEOMETRY"], right_on=["GEOID", "GEOMETRY"]
我明白了:
TABLE 1
| ID | BLOCKCODE | GEOMETRY |
| -- | --------- | -------- |
| 1 | 123 | ABC |
| 2 | 456 | DEF |
| 3 | 789 | |
你可以试试:
# rename the Blockcode column in table1 to have the same column ID as table2.
# This is necessary for the next step to work.
table1 = table1.rename(columns={"Blockcode": "GeoID",})
# Overwrites all NaN values in table1 with the value from table2.
table1.update(table2)