How do I correctly reproject a GeoDataFrame with multiple geometry columns?
The geopandas documentation says:
"A GeoDataFrame may also contain other columns with geometrical (shapely) objects, but only one column can be the active geometry at a time. To change which column is the active geometry column, use the set_geometry method."
I am wondering how to use such a GeoDataFrame if the goal is to flexibly reproject the geometries in these different columns to one or more other coordinate reference systems. Here is what I tried.
First attempt
import geopandas as gpd
from shapely.geometry import Point
crs_lonlat = 'epsg:4326' #geometries entered in this crs (lon, lat in degrees)
crs_new = 'epsg:3395' #geometries needed in (among others) this crs
gdf = gpd.GeoDataFrame(crs=crs_lonlat)
gdf['geom1'] = [Point(9,53), Point(9,54)]
gdf['geom2'] = [Point(8,63), Point(8,64)]
#Working: setting geometry and reprojecting for first time.
gdf = gdf.set_geometry('geom1')
gdf = gdf.to_crs(crs_new) #geom1 is reprojected to crs_new, geom2 still in crs_lonlat
gdf
Out:
geom1 geom2
0 POINT (1001875.417 6948849.385) POINT (8 63)
1 POINT (1001875.417 7135562.568) POINT (8 64)
gdf.crs
Out: 'epsg:3395'
So far, so good. Things go off the rails when I set geom2 as the geometry column and try to reproject that column as well:
#Not working: setting geometry and reprojecting for second time.
gdf = gdf.set_geometry('geom2') #still in crs_lonlat...
gdf.crs #...but this still says crs_new...
Out: 'epsg:3395'
gdf = gdf.to_crs(crs_new) #...so this doesn't do anything! (geom2 unchanged)
gdf
Out:
geom1 geom2
0 POINT (1001875.417 6948849.385) POINT (8.00000 63.00000)
1 POINT (1001875.417 7135562.568) POINT (8.00000 64.00000)
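OK, so apparently the .crs attribute of the gdf is not reset to its original value when the column used as the geometry is changed; it seems the crs is not stored for the individual columns. If that is the case, the only way I see to use reprojection with this dataframe is to backtrack: start --> select column as geometry --> reproject gdf to crs_new --> use/visualize/... --> reproject gdf back to crs_lonlat --> go to start. That is not usable if I want to show both columns in one figure.
Second attempt
My second attempt was to store the crs with each column separately, by changing the corresponding lines in the script above to: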
gdf = gpd.GeoDataFrame()
gdf['geom1'] = gpd.GeoSeries([Point(9,53), Point(9,54)], crs=crs_lonlat)
gdf['geom2'] = gpd.GeoSeries([Point(8,63), Point(8,64)], crs=crs_lonlat)
However, it soon became clear that, although they were initialized as GeoSeries, these columns are normal pandas Series and do not have a .crs attribute the way a GeoSeries does:
gdf['geom1'].crs
AttributeError: 'Series' object has no attribute 'crs'
s = gpd.GeoSeries([Point(9,53), Point(9,54)], crs=crs_lonlat)
s.crs
Out: 'epsg:4326'
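Is there something I'm missing here?
Is the only solution to decide on the 'final' crs beforehand, and do all the reprojecting before adding the columns? Like so...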
gdf = gpd.GeoDataFrame(crs=crs_new)
gdf['geom1'] = gpd.GeoSeries([Point(9,53), Point(9,54)], crs=crs_lonlat).to_crs(crs_new)
gdf['geom2'] = gpd.GeoSeries([Point(8,63), Point(8,64)], crs=crs_lonlat).to_crs(crs_new)
#no more reprojecting done/necessary/possible! :/
...and then, when another crs is needed, rebuild the entire gdf from scratch? That can't be the way this was intended to be used.
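Unfortunately, the desired behaviour is currently not possible. Due to limitations in the package, geopandas does not currently support this use case, as can be seen in this issue in the github repo.
My workaround is to not use a GeoDataFrame at all, but to combine a normal pandas DataFrame with several separate geopandas GeoSeries for the shapely geometry data. The GeoSeries each have their own crs and can be correctly reprojected whenever necessary.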
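The workaround described above could look roughly like the sketch below. It is only an illustration under a few assumptions: the "label" column, the geoms dictionary, and the reproject_all helper are made-up names, not part of geopandas, and the points and CRS codes are the ones from the question.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

crs_lonlat = "EPSG:4326"  # CRS the points are entered in (lon, lat in degrees)
crs_new = "EPSG:3395"     # one of the CRSs the points are needed in

# Non-geometry attributes live in a plain DataFrame ("label" is a made-up column).
df = pd.DataFrame({"label": ["a", "b"]})

# Each geometry "column" is its own GeoSeries that shares the DataFrame's
# index and carries its own CRS.
geoms = {
    "geom1": gpd.GeoSeries([Point(9, 53), Point(9, 54)], index=df.index, crs=crs_lonlat),
    "geom2": gpd.GeoSeries([Point(8, 63), Point(8, 64)], index=df.index, crs=crs_lonlat),
}


def reproject_all(series_by_name, crs):
    """Hypothetical helper: return new GeoSeries in `crs`, leaving the originals untouched."""
    return {name: s.to_crs(crs) for name, s in series_by_name.items()}


geoms_new = reproject_all(geoms, crs_new)
```

Because every GeoSeries keeps the same index as df, rows can still be matched up by index, and the originals in crs_lonlat remain available for reprojecting to yet another crs later. To show both in one figure, the reprojected series can be drawn on shared axes, for example by passing the axes returned by one .plot() call as ax= to the next.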