Geopandas:不同的 .sjoin() 结果与不同的投影系统
Geopandas: different .sjoin() results with different projections systems
我试图 运行 资产列表和流域数据集之间的空间连接,您可以在下面的 link 中找到
https://datasets.wri.org/dataset/aqueduct-global-flood-risk-maps?msclkid=630fc948b63611ec9931936b22cf4990
第一种方法是加入 ESPG 4326 投影设置,效果很好。
rfd = r"C:\Users\~\aqueduct_global_flood_risk_data_by_river_basin_20150304.shp"
wri_rfr = gpd.read_file(rfd, crs='epsg:4326')
test = ['Unit 1', 'Unit 2' ]
test_lat = ['0.176095', '-24.193790']
test_lon = ['117.495523', '150.370650']
df = pd.DataFrame()
df['Name'] = test
df['Latitude'] = test_lat
df['Longitude'] = test_lon
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df['Longitude'], df['Latitude']))
gdf = gdf.set_crs('epsg:4326')
joined = gpd.sjoin(gdf, wri_rfr, how='inner')
len(joined )
这两个资产都有一个连接。
在第二种方法中,我尝试使用基于仪表的投影系统 (3006) 在我的资产周围创建一个 500 公吨的缓冲区,然后继续合并它们...但是 returns 没有结果?
rfd = r"C:\Users\~\aqueduct_global_flood_risk_data_by_river_basin_20150304.shp"
wri_rfr = gpd.read_file(rfd, crs='epsg:4326')
test = ['Unit 1', 'Unit 2' ]
test_lat = ['0.176095', '-24.193790']
test_lon = ['117.495523', '150.370650']
df = pd.DataFrame()
df['Name'] = test
df['Latitude'] = test_lat
df['Longitude'] = test_lon
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df['Longitude'], df['Latitude']))
gdf = gdf.set_crs('epsg:4326')
gdf = gdf.to_crs({'init': 'epsg:3006'})
gdf.geometry = gdf.geometry.buffer(500)
gdf = gdf.loc[gdf.is_valid]
wri_rfr_3006 = wri_rfr.to_crs({'init': 'epsg:3006'})
wri_rfr_3006 = wri_rfr_3006.loc[wri_rfr_3006.is_valid]
joined = gpd.sjoin(gdf, wri_rfr_3006 , how='inner')
len(joined )
它returns没有连接。
我在这里错过了什么?为什么结果会不同?
- 已编写形状文件的数据来源
- 查看文档 https://epsg.io/3006 这是针对瑞典的。因此,在瑞典以米表示时,婆罗洲和澳大利亚的位置将开始出现舍入误差
- 已采取计算每个点的 UTM CRS 的方法,对其进行缓冲,然后转换回 epsg:4386
- 缓冲点几何现在可以进行空间连接,因为尚未使用不适合全局几何的 CRS
test = ["Unit 1", "Unit 2"]
test_lat = ["0.176095", "-24.193790"]
test_lon = ["117.495523", "150.370650"]
df = pd.DataFrame()
df["Name"] = test
df["Latitude"] = test_lat
df["Longitude"] = test_lon
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df["Longitude"], df["Latitude"]))
gdf = gdf.set_crs("epsg:4326")
# work out UTM CRS for each point, then buffer it and return back as original CRS
def buffer_meter(g, crs="epsg:6666", buffer=50):
t = gpd.GeoDataFrame(geometry=[g], crs=crs)
return t.to_crs(t.estimate_utm_crs()).buffer(buffer).to_crs(crs).values[0]
# buffer the points
gdf["geometry"] = gdf["geometry"].apply(buffer_meter, crs=gdf.crs, buffer=500)
# now join
gpd.sjoin(gdf, wri_rfr, how='inner')
数据来源
import requests
from pathlib import Path
from zipfile import ZipFile
import urllib
import geopandas as gpd
import pandas as pd
# download data sets
urls = [
"http://datasets.wri.org/dataset/c19396d9-45c8-4e92-bf05-d1411c9cc2ca/resource/498319f7-992a-4447-94b4-c62d8f1daa38/download/aqueductglobalfloodriskdatabycountry20150304.zip",
"http://datasets.wri.org/dataset/c19396d9-45c8-4e92-bf05-d1411c9cc2ca/resource/471ef133-939c-4ca6-9b1c-5f81b5251c2b/download/aqueductglobalfloodriskdatabyriverbasin20150304.zip",
"http://datasets.wri.org/dataset/c19396d9-45c8-4e92-bf05-d1411c9cc2ca/resource/dd90c26a-edf2-46e4-be22-4273ab2344d0/download/aqueductglobalfloodriskdatabystate20150304.zip",
]
dfs = {}
for url in urls:
f = Path.cwd().joinpath(urllib.parse.urlparse(url).path.split("/")[-1])
if not f.exists():
r = requests.get(url, stream=True, headers={"User-Agent": "XY"})
with open(f, "wb") as fd:
for chunk in r.iter_content(chunk_size=128):
fd.write(chunk)
zfile = ZipFile(f)
zfile.extractall(f.stem)
dfs[f.stem] = gpd.read_file(list(f.parent.joinpath(f.stem).glob("*.shp"))[0])
wri_rfr = dfs["aqueductglobalfloodriskdatabyriverbasin20150304"]
我试图 运行 资产列表和流域数据集之间的空间连接,您可以在下面的 link 中找到 https://datasets.wri.org/dataset/aqueduct-global-flood-risk-maps?msclkid=630fc948b63611ec9931936b22cf4990
第一种方法是加入 ESPG 4326 投影设置,效果很好。
rfd = r"C:\Users\~\aqueduct_global_flood_risk_data_by_river_basin_20150304.shp"
wri_rfr = gpd.read_file(rfd, crs='epsg:4326')
test = ['Unit 1', 'Unit 2' ]
test_lat = ['0.176095', '-24.193790']
test_lon = ['117.495523', '150.370650']
df = pd.DataFrame()
df['Name'] = test
df['Latitude'] = test_lat
df['Longitude'] = test_lon
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df['Longitude'], df['Latitude']))
gdf = gdf.set_crs('epsg:4326')
joined = gpd.sjoin(gdf, wri_rfr, how='inner')
len(joined )
这两个资产都有一个连接。
在第二种方法中,我尝试使用基于仪表的投影系统 (3006) 在我的资产周围创建一个 500 公吨的缓冲区,然后继续合并它们...但是 returns 没有结果?
rfd = r"C:\Users\~\aqueduct_global_flood_risk_data_by_river_basin_20150304.shp"
wri_rfr = gpd.read_file(rfd, crs='epsg:4326')
test = ['Unit 1', 'Unit 2' ]
test_lat = ['0.176095', '-24.193790']
test_lon = ['117.495523', '150.370650']
df = pd.DataFrame()
df['Name'] = test
df['Latitude'] = test_lat
df['Longitude'] = test_lon
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df['Longitude'], df['Latitude']))
gdf = gdf.set_crs('epsg:4326')
gdf = gdf.to_crs({'init': 'epsg:3006'})
gdf.geometry = gdf.geometry.buffer(500)
gdf = gdf.loc[gdf.is_valid]
wri_rfr_3006 = wri_rfr.to_crs({'init': 'epsg:3006'})
wri_rfr_3006 = wri_rfr_3006.loc[wri_rfr_3006.is_valid]
joined = gpd.sjoin(gdf, wri_rfr_3006 , how='inner')
len(joined )
它returns没有连接。
我在这里错过了什么?为什么结果会不同?
- 已编写形状文件的数据来源
- 查看文档 https://epsg.io/3006 这是针对瑞典的。因此,在瑞典以米表示时,婆罗洲和澳大利亚的位置将开始出现舍入误差
- 已采取计算每个点的 UTM CRS 的方法,对其进行缓冲,然后转换回 epsg:4386
- 缓冲点几何现在可以进行空间连接,因为尚未使用不适合全局几何的 CRS
test = ["Unit 1", "Unit 2"]
test_lat = ["0.176095", "-24.193790"]
test_lon = ["117.495523", "150.370650"]
df = pd.DataFrame()
df["Name"] = test
df["Latitude"] = test_lat
df["Longitude"] = test_lon
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df["Longitude"], df["Latitude"]))
gdf = gdf.set_crs("epsg:4326")
# work out UTM CRS for each point, then buffer it and return back as original CRS
def buffer_meter(g, crs="epsg:6666", buffer=50):
t = gpd.GeoDataFrame(geometry=[g], crs=crs)
return t.to_crs(t.estimate_utm_crs()).buffer(buffer).to_crs(crs).values[0]
# buffer the points
gdf["geometry"] = gdf["geometry"].apply(buffer_meter, crs=gdf.crs, buffer=500)
# now join
gpd.sjoin(gdf, wri_rfr, how='inner')
数据来源
import requests
from pathlib import Path
from zipfile import ZipFile
import urllib
import geopandas as gpd
import pandas as pd
# download data sets
urls = [
"http://datasets.wri.org/dataset/c19396d9-45c8-4e92-bf05-d1411c9cc2ca/resource/498319f7-992a-4447-94b4-c62d8f1daa38/download/aqueductglobalfloodriskdatabycountry20150304.zip",
"http://datasets.wri.org/dataset/c19396d9-45c8-4e92-bf05-d1411c9cc2ca/resource/471ef133-939c-4ca6-9b1c-5f81b5251c2b/download/aqueductglobalfloodriskdatabyriverbasin20150304.zip",
"http://datasets.wri.org/dataset/c19396d9-45c8-4e92-bf05-d1411c9cc2ca/resource/dd90c26a-edf2-46e4-be22-4273ab2344d0/download/aqueductglobalfloodriskdatabystate20150304.zip",
]
dfs = {}
for url in urls:
f = Path.cwd().joinpath(urllib.parse.urlparse(url).path.split("/")[-1])
if not f.exists():
r = requests.get(url, stream=True, headers={"User-Agent": "XY"})
with open(f, "wb") as fd:
for chunk in r.iter_content(chunk_size=128):
fd.write(chunk)
zfile = ZipFile(f)
zfile.extractall(f.stem)
dfs[f.stem] = gpd.read_file(list(f.parent.joinpath(f.stem).glob("*.shp"))[0])
wri_rfr = dfs["aqueductglobalfloodriskdatabyriverbasin20150304"]