When do I need to use a GeoSeries when creating a GeoDataFrame, and when is a list enough?

Question

import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Polygon, Point
import numpy as np

I defined a polygon:

polygon = Polygon([(0,0),(0,1),(1,1),(1,0)])

and created a list of random points:

np.random.seed(42)
points = [Point([np.random.uniform(low=-1,high=1),
                 np.random.uniform(low=-1,high=1)]) for _ in range(1000)]

I want to know which points lie inside the polygon. I do this by first converting the points list to a GeoSeries and creating a GeoDataFrame with a column named points:

gdf = gpd.GeoDataFrame(dict(points=gpd.GeoSeries(points)))

and then simply doing:

gdf.points.within(polygon)

which returns a pandas.core.series.Series of booleans indicating which points are inside the polygon.
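For completeness, here is a minimal sketch of how that boolean Series can be used to actually pull out the points inside the polygon. It reuses the setup from the question; the names `mask` and `inside_points` are illustrative only.

```python
import numpy as np
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Same setup as in the question: a unit square and 1000 random points in [-1, 1] x [-1, 1].
polygon = Polygon([(0, 0), (0, 1), (1, 1), (1, 0)])
np.random.seed(42)
points = [Point([np.random.uniform(low=-1, high=1),
                 np.random.uniform(low=-1, high=1)]) for _ in range(1000)]

gdf = gpd.GeoDataFrame(dict(points=gpd.GeoSeries(points)))

# within() returns a boolean Series aligned with the GeoDataFrame's index.
mask = gdf.points.within(polygon)

# The mask can be used directly to keep only the points inside the polygon.
inside_points = gdf.points[mask]
print(f"{len(inside_points)} of {len(gdf)} points lie inside the polygon")
```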

However, if I create the GeoDataFrame from the list rather than from a GeoSeries object:

gdf = gpd.GeoDataFrame(dict(points=points))

and then do:

gdf.points.within(polygon)

I get:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-171-831eddc859a1> in <module>()
----> 1 gdf.points.within(polygon)

/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py in __getattr__(self, name)
   5485         ):
   5486             return self[name]
-> 5487         return object.__getattribute__(self, name)
   5488 
   5489     def __setattr__(self, name: str, value) -> None:

AttributeError: 'Series' object has no attribute 'within'

In the example given on the geopandas.GeoDataFrame page, the GeoDataFrame is created from a list of shapely.geometry.Point objects, not from a GeoSeries:

import geopandas
from shapely.geometry import Point
d = {'col1': ['name1', 'name2'], 'geometry': [Point(1, 2), Point(2, 1)]}
gdf = geopandas.GeoDataFrame(d, crs="EPSG:4326")

When do I need to convert my lists to a GeoSeries first, and when can I keep them as lists when creating GeoDataFrames?

Answer 1

In the docs for geopandas.GeoDataFrame, right under the example you quoted, there is a small note:

Notice that the inferred dtype of ‘geometry’ columns is geometry.

You can see it here, and you can also observe it yourself:

>>> import geopandas as gpd
>>> from shapely.geometry import Point

>>> gpd.GeoDataFrame({'geometry': [Point(0,0)]}).dtypes
geometry    geometry
dtype: object

>>> gpd.GeoDataFrame({'geometryXXX': [Point(0,0)]}).dtypes
geometryXXX    object
dtype: object
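The same check can be done programmatically. This is a small sketch of the REPL session above: without a geometry= argument, only the column literally named geometry gets the geometry dtype, while an identically filled column under another name stays object. The variable names `named` and `other` are illustrative.

```python
import geopandas as gpd
from shapely.geometry import Point

# Same content, different column names.
named = gpd.GeoDataFrame({'geometry': [Point(0, 0)]})
other = gpd.GeoDataFrame({'geometryXXX': [Point(0, 0)]})

print(named.dtypes['geometry'])     # geometry
print(other.dtypes['geometryXXX'])  # object
```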

From the docs for geopandas.GeoSeries:

A Series object designed to store shapely geometry objects.

...so it makes sense that it tries to convert the objects it is created with to the geometry dtype. In fact, when you try to create a GeoSeries from non-shapely objects, you get a warning:

>>> gpd.GeoSeries([1,2,3])
<ipython-input-53-ca5248fcdaf8>:1: FutureWarning:     You are passing non-geometry data to the GeoSeries constructor. Currently,
    it falls back to returning a pandas Series. But in the future, we will start
    to raise a TypeError instead.
  gpd.GeoSeries([1,2,3])

...and, as the warning says, this will become an error in the future.

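Since you are not creating a GeoSeries object (you are using a list), and since the column is not named geometry, the GeoDataFrame gives it the most general dtype that can hold the objects: object. Because the column has dtype object rather than geometry, you cannot call geometry-specific methods such as within.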
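To make this concrete, here is a minimal sketch showing that the individual values are still shapely Points, but the column itself comes back as a plain pandas Series of dtype object, so the vectorized within method is simply not there. The two sample points are arbitrary.

```python
import geopandas as gpd
from shapely.geometry import Point

# Column built from a plain list and not named "geometry".
gdf = gpd.GeoDataFrame({'points': [Point(0.5, 0.5), Point(2, 2)]})
col = gdf['points']

print(type(col.iloc[0]).__name__)      # Point (each element is still a shapely object)
print(col.dtype)                       # object
print(isinstance(col, gpd.GeoSeries))  # False
print(hasattr(col, 'within'))          # False
```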

If you need to use lists, you have two simple options.

Method 1. Pass the geometry= keyword argument to GeoDataFrame():

>>> gdf = gpd.GeoDataFrame({'points': [Point(0,0), Point(0,1)]}, geometry='points')
>>> gdf['points'].dtypes
<geopandas.array.GeometryDtype at 0x12882a1c0>
>>> gdf['points'].within
<bound method GeoPandasBase.within of 0    POINT (0.00000 0.00000)
1    POINT (0.00000 1.00000)
Name: points, dtype: geometry>
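Applied to the data from the question, Method 1 might look like the sketch below. Because points becomes the active geometry column, the GeoDataFrame's own within method (which operates on the active geometry) gives the same result as calling it on the column.

```python
import numpy as np
import geopandas as gpd
from shapely.geometry import Point, Polygon

polygon = Polygon([(0, 0), (0, 1), (1, 1), (1, 0)])
np.random.seed(42)
points = [Point([np.random.uniform(low=-1, high=1),
                 np.random.uniform(low=-1, high=1)]) for _ in range(1000)]

# Plain list, but the column is declared as the active geometry column.
gdf = gpd.GeoDataFrame({'points': points}, geometry='points')

mask = gdf['points'].within(polygon)
# Same result via the active geometry of the whole frame.
print(mask.equals(gdf.within(polygon)))  # True
```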

Method 2. Use astype, just as you would with a regular DataFrame:

>>> gdf = gpd.GeoDataFrame({'points': [Point(0,0), Point(0,1)]})
>>> gdf['points'].dtype
dtype('O')
>>> gdf['points'].within
...
AttributeError: 'Series' object has no attribute 'within'

>>> gdf['points'] = gdf['points'].astype('geometry')
>>> gdf['points'].dtype
<geopandas.array.GeometryDtype at 0x122189e20>
>>> gdf['points'].within
<bound method GeoPandasBase.within of 0    POINT (0.00000 0.00000)
1    POINT (0.00000 1.00000)
Name: points, dtype: geometry>
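As a sanity check, this sketch compares Method 2 against the approach from the question (wrapping the list in a GeoSeries up front) on the same random points; both should produce identical boolean masks. The names `via_geoseries` and `via_astype` are illustrative.

```python
import numpy as np
import geopandas as gpd
from shapely.geometry import Point, Polygon

polygon = Polygon([(0, 0), (0, 1), (1, 1), (1, 0)])
np.random.seed(42)
points = [Point([np.random.uniform(low=-1, high=1),
                 np.random.uniform(low=-1, high=1)]) for _ in range(1000)]

# Approach from the question: convert the list to a GeoSeries first.
via_geoseries = gpd.GeoDataFrame(dict(points=gpd.GeoSeries(points)))

# Method 2: start from the plain list, then convert the column's dtype.
via_astype = gpd.GeoDataFrame(dict(points=points))
via_astype['points'] = via_astype['points'].astype('geometry')

mask_a = via_geoseries['points'].within(polygon)
mask_b = via_astype['points'].within(polygon)
print(mask_a.equals(mask_b))  # True
```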


Tags: python, dataframe, pandas, shapely, geopandas