Need to merge two pandas DataFrames using two columns, latitude and longitude
This is my DataFrame #1: city names with latitude and longitude
df1 = pd.DataFrame({"city":['delhi','new york','london','paris','chennai'],"lat":[12.23,22.444,23.233,45.32,34.22],"long":[11.22,22.332,34.23,55.23,24.22]})
This is DataFrame #2: country names with latitude and longitude
df2 = pd.DataFrame({"country":['India','US','UK','France','India'],"lat":[12.13,22.54,22.33,45.32,34.22],"long":[11.12,22.132,34.23,54.23,24.22]})
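I need to merge these two tables by matching on the latitude and longitude columns. The problem is that the latitudes and longitudes do not match exactly; the values differ by plus or minus 0.1 or 0.2. (If they matched exactly, I could use pd.merge.)
The latitudes and longitudes here are not real; they are just an example.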
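To make the problem concrete, here is a minimal sketch of what a plain pd.merge on both columns does with the sample data above. It only pairs rows whose coordinates are identical, so most cities are lost. The variable name exact is just an illustrative choice.

```python
import pandas as pd

df1 = pd.DataFrame({
    "city": ["delhi", "new york", "london", "paris", "chennai"],
    "lat": [12.23, 22.444, 23.233, 45.32, 34.22],
    "long": [11.22, 22.332, 34.23, 55.23, 24.22],
})
df2 = pd.DataFrame({
    "country": ["India", "US", "UK", "France", "India"],
    "lat": [12.13, 22.54, 22.33, 45.32, 34.22],
    "long": [11.12, 22.132, 34.23, 54.23, 24.22],
})

# An ordinary inner merge requires lat AND long to be exactly equal,
# so only rows with identical coordinates in both frames survive.
exact = pd.merge(df1, df2, on=["lat", "long"])
print(exact)
```

With this data only chennai has identical coordinates in both frames, which is why some tolerance-based matching is needed.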
Expected result:
result = pd.DataFrame({"city":['delhi','new york','london','paris','chennai'],"country":['India','US','UK','France','India'],"lat":[12.13,22.54,22.33,45.32,34.22],"long":[11.12,22.132,34.23,54.23,24.22]})
What is the best way to merge these tables?
Cross merge example:
(df1.assign(dummy=1)
.merge(df2.assign(dummy=1),on='dummy')
.query('abs(lat_x-lat_y)<=0.1 and abs(long_x-long_y)<=0.2')
.drop('dummy', axis=1)
)
Output:
        city   lat_x  long_x country  lat_y  long_y
0      delhi  12.230  11.220   India  12.13  11.120
6   new york  22.444  22.332      US  22.54  22.132
24   chennai  34.220  24.220   India  34.22  24.220
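As a variation on the cross merge above, the following sketch uses how="cross" (available in pandas 1.2 and later) instead of a dummy key, and keeps only the nearest country for each city so that a city cannot match several countries. The cutoff MAX_DIST, the suffixes, and the straight-line distance in degrees are assumptions made for illustration; they are not part of the original answer, and a real application would likely want a geodesic distance.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "city": ["delhi", "new york", "london", "paris", "chennai"],
    "lat": [12.23, 22.444, 23.233, 45.32, 34.22],
    "long": [11.22, 22.332, 34.23, 55.23, 24.22],
})
df2 = pd.DataFrame({
    "country": ["India", "US", "UK", "France", "India"],
    "lat": [12.13, 22.54, 22.33, 45.32, 34.22],
    "long": [11.12, 22.132, 34.23, 54.23, 24.22],
})

# Build every city/country pair; overlapping lat/long columns get suffixes.
pairs = df1.merge(df2, how="cross", suffixes=("_city", "_country"))

# Straight-line distance in degrees: a rough proxy, not a geodesic distance.
pairs["dist"] = np.hypot(
    pairs["lat_city"] - pairs["lat_country"],
    pairs["long_city"] - pairs["long_country"],
)

# Keep the closest country per city, then drop pairs beyond the cutoff.
MAX_DIST = 0.25  # hypothetical tolerance, tune for your data
nearest = pairs.loc[pairs.groupby("city")["dist"].idxmin()]
result = nearest.loc[
    nearest["dist"] <= MAX_DIST,
    ["city", "country", "lat_country", "long_country", "dist"],
]
print(result)
```

Note that the cross merge creates len(df1) * len(df2) rows, so this approach is only practical for small tables. As in the output above, london and paris find no country within the tolerance in this sample data.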
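Geopandas can be used here.
If you have the country borders as polygons, you can use spatial joins.
In your question, you reduce each country to a single point, which may not be the best representation.
Example from the documentation:
In a spatial join, two geometry objects are merged based on their spatial relationship to one another.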
# One GeoDataFrame of countries, one of Cities.
# Want to merge so we can get each city's country.
In [11]: countries.head()
Out[11]:
                                            geometry                   country
0  MULTIPOLYGON (((180.000000000 -16.067132664, 1...                      Fiji
1  POLYGON ((33.903711197 -0.950000000, 34.072620...                  Tanzania
2  POLYGON ((-8.665589565 27.656425890, -8.665124...                 W. Sahara
3  MULTIPOLYGON (((-122.840000000 49.000000000, -...                    Canada
4  MULTIPOLYGON (((-122.840000000 49.000000000, -...  United States of America
In [12]: cities.head()
Out[12]:
           name                           geometry
0  Vatican City  POINT (12.453386545 41.903282180)
1    San Marino  POINT (12.441770158 43.936095835)
2         Vaduz   POINT (9.516669473 47.133723774)
3    Luxembourg   POINT (6.130002806 49.611660379)
4       Palikir  POINT (158.149974324 6.916643696)
# Execute spatial join
In [13]: cities_with_country = geopandas.sjoin(cities, countries, how="inner", predicate='intersects')  # older GeoPandas releases used op= instead of predicate=
In [14]: cities_with_country.head()
Out[14]:
             name                           geometry  index_right  country
0    Vatican City  POINT (12.453386545 41.903282180)          141    Italy
1      San Marino  POINT (12.441770158 43.936095835)          141    Italy
192          Rome  POINT (12.481312563 41.897901485)          141    Italy
2           Vaduz   POINT (9.516669473 47.133723774)          114  Austria
184        Vienna  POINT (16.364693097 48.201961137)          114  Austria
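If you don't have polygons representing the countries, you need to expand the point representing each country into an area. You can do this with the buffer method in Shapely, which expands a point into a region of a given radius:
Point(0, 0).buffer(10.0)
This assumes a point at coordinates [0, 0] and a distance of 10.0.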
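The buffer idea can be sketched on the sample data from the question as follows. Each country point is expanded into a roughly circular area, and every city is tested against those areas with plain Shapely. RADIUS is a hypothetical value chosen for this toy data, coordinates are treated as planar (x = longitude, y = latitude), and the explicit loop stands in for a proper spatial join, which would be the better choice for large tables.

```python
import pandas as pd
from shapely.geometry import Point

df1 = pd.DataFrame({
    "city": ["delhi", "new york", "london", "paris", "chennai"],
    "lat": [12.23, 22.444, 23.233, 45.32, 34.22],
    "long": [11.22, 22.332, 34.23, 55.23, 24.22],
})
df2 = pd.DataFrame({
    "country": ["India", "US", "UK", "France", "India"],
    "lat": [12.13, 22.54, 22.33, 45.32, 34.22],
    "long": [11.12, 22.132, 34.23, 54.23, 24.22],
})

RADIUS = 0.25  # hypothetical buffer distance, in degrees

# Expand every country point into a polygon approximating a circle.
areas = [Point(lon, lat).buffer(RADIUS) for lon, lat in zip(df2["long"], df2["lat"])]

# Record each city/country pair where the city falls inside the buffered area.
rows = []
for city, lat, lon in zip(df1["city"], df1["lat"], df1["long"]):
    city_point = Point(lon, lat)
    for country, area in zip(df2["country"], areas):
        if area.contains(city_point):
            rows.append({"city": city, "country": country})

matched = pd.DataFrame(rows)
print(matched)
```

Because buffer approximates the circle with a polygon, points lying almost exactly at the radius may fall just outside; a slightly larger radius avoids surprises at the boundary.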