Kernel Density estimation in projected coordinate system

Question

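I am using kernel density estimation from the Python module sklearn. My data is in a GeoPandas GeoDataFrame. Currently I do this in geographic coordinates (EPSG:4326). However, I want to do it with projected coordinates in UTM (EPSG:25833). When I leave the data in 4326 the KDE works, but when I reproject the GeoDataFrame to 25833 the KDE gives an empty output.

The example is taken from here: https://pygis.io/docs/e_summarize_vector.html#method-2-display-and-export-with-scikit-learn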

import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.neighbors import KernelDensity

# County boundaries
# Source: https://opendata.mtc.ca.gov/datasets/san-francisco-bay-region-counties-clipped?geometry=-125.590%2C37.123%2C-119.152%2C38.640
counties = gpd.read_file("../_static/e_vector_shapefiles/sf_bay_counties/sf_bay_counties.shp")


# Well locations
# Source: https://gis.data.ca.gov/datasets/3a3e681b894644a9a95f9815aeeeb57f_0?geometry=-123.143%2C36.405%2C-119.230%2C37.175
# Modified by author so that only the well locations within the counties and the surrounding 50 km were kept
wells = gpd.read_file("../_static/e_vector_shapefiles/sf_bay_wells_50km/sf_bay_wells_50km.shp")

# Set projection to WGS 84 and reproject data
proj_wgs = 4326
counties_wgs = counties.to_crs(proj_wgs)
wells_wgs = wells.to_crs(proj_wgs)

# Get X and Y coordinates of well points
x_sk = wells_wgs["geometry"].x
y_sk = wells_wgs["geometry"].y

# Get minimum and maximum coordinate values of well points
min_x_sk, min_y_sk, max_x_sk, max_y_sk = wells_wgs.total_bounds

# Create a cell mesh grid
# Horizontal and vertical cell counts should be the same
XX_sk, YY_sk = np.mgrid[min_x_sk:max_x_sk:100j, min_y_sk:max_y_sk:100j]

# Create 2-D array of the coordinates (paired) of each cell in the mesh grid
positions_sk = np.vstack([XX_sk.ravel(), YY_sk.ravel()]).T

# Create 2-D array of the coordinate values of the well points
Xtrain_sk = np.vstack([x_sk, y_sk]).T

# Get kernel density estimator (can change parameters as desired)
kde_sk = KernelDensity(bandwidth = 0.04, metric = 'euclidean', kernel = 'gaussian', algorithm = 'auto')

# Fit kernel density estimator to wells coordinates
kde_sk.fit(Xtrain_sk)

# Evaluate the estimator on coordinate pairs
Z_sk = np.exp(kde_sk.score_samples(positions_sk))

# Reshape the data to fit mesh grid
Z_sk = Z_sk.reshape(XX_sk.shape)

fig, ax = plt.subplots(1, 1, figsize = (10, 10))
ax.imshow(np.rot90(Z_sk), cmap = "RdPu", extent = [min_x_sk, max_x_sk, min_y_sk, max_y_sk])
ax.plot(x_sk, y_sk, 'k.', markersize = 2, alpha = 0.1)
counties_wgs.plot(ax = ax, color = 'none', edgecolor = 'dimgray')
ax.set_title('San Francisco Bay Area - SciKit-Learn Kernel Density Estimation for Wells', fontdict = {'fontsize': '15', 'fontweight' : '3'})
plt.show()

This works. However, when I set proj_wgs = 25833, the result is empty.

How can I perform KDE with the sklearn module in projected coordinates?

Answer 1

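I cross-posted this on the scikit-learn GitHub page, because I assumed it was too specific for Stack Overflow. I got the following response from cmarm:

When you pass from geographic to projected coordinates, the scale of variation of the coordinates changes as well. In your example, the geographic coordinates span roughly tenths of a degree, while the projected coordinates span hundreds of thousands of meters. The bandwidth of the kernel estimate is particularly affected by the variation of the coordinates, and it is related to the resolution of your density estimate (see, for example, the 1D histogram example in the documentation). To obtain similar results in the two cases, the bandwidth should be increased, taking into account the relation between geographic (angular) and projected (linear) coordinates. On Earth, one degree corresponds to roughly 10^5 meters, so your bandwidth should be increased by a factor of ~10^5.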

# Get kernel density estimator (can change parameters as desired)
kde_sk = KernelDensity(bandwidth = 4000, metric = 'euclidean', kernel = 'gaussian', algorithm = 'auto')
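The value 4000 can be sanity-checked with a quick conversion. With the original bandwidth of 0.04 applied to coordinates in meters, each kernel is only about 4 cm wide, so the density is effectively zero at every grid cell, which is why the plot looks empty. The sketch below is only an approximation: it assumes a spherical Earth with a mean radius of 6,371 km, ignores the fact that a degree of longitude gets shorter away from the equator, and `degrees_to_meters` is a hypothetical helper name, not part of any library.

```python
import math

# Assumed mean Earth radius in meters (spherical approximation)
EARTH_RADIUS_M = 6_371_000

def degrees_to_meters(angle_deg):
    """Approximate great-circle arc length, in meters, of an angle in degrees.

    Hypothetical helper for illustration only.
    """
    return math.radians(angle_deg) * EARTH_RADIUS_M

# Bandwidth used in the EPSG:4326 version of the example
bandwidth_deg = 0.04

# Rough equivalent for a CRS whose units are meters (e.g. a UTM zone)
bandwidth_m = degrees_to_meters(bandwidth_deg)
print(f"{bandwidth_deg} degrees is roughly {bandwidth_m:.0f} m")
```

The result comes out at a bit over 4 km, the same order of magnitude as the 4000 used above. The exact value is a tuning choice, not something the conversion fixes.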

That said, a better way to avoid this dependency is to use the right metric for a spherical problem: the Haversine metric is available in scikit-learn. This is how a similar problem is solved in the related example.

All credit goes to cmarm.
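For reference, the Haversine suggestion could look roughly like the sketch below. This is not cmarm's code: it uses synthetic points instead of the wells shapefile, and the variable names and the Earth radius are my own assumptions. With `metric="haversine"`, scikit-learn expects the columns in the order [latitude, longitude], both in radians, and the bandwidth is then an angle in radians as well. As far as I understand, the density normalization is only exact for the Euclidean metric, so the values are best read as relative densities.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Assumed mean Earth radius in meters (spherical approximation)
EARTH_RADIUS_M = 6_371_000

# Synthetic stand-in for the well locations, in degrees (EPSG:4326)
rng = np.random.default_rng(0)
lon_deg = rng.normal(-122.0, 0.1, 200)
lat_deg = rng.normal(37.5, 0.1, 200)

# Haversine expects [latitude, longitude] in radians
X_train = np.radians(np.column_stack([lat_deg, lon_deg]))

# Express a 4000 m bandwidth as an angle in radians
bandwidth_rad = 4000 / EARTH_RADIUS_M

# The ball tree supports the haversine metric
kde = KernelDensity(
    bandwidth=bandwidth_rad,
    metric="haversine",
    kernel="gaussian",
    algorithm="ball_tree",
)
kde.fit(X_train)

# Evaluate on a regular lon/lat grid, again converted to [lat, lon] radians
grid_lon, grid_lat = np.mgrid[-122.4:-121.6:50j, 37.1:37.9:50j]
positions = np.radians(np.column_stack([grid_lat.ravel(), grid_lon.ravel()]))
density = np.exp(kde.score_samples(positions)).reshape(grid_lon.shape)
```

With real data, the latitude and longitude columns would come from the GeoDataFrame in EPSG:4326 (`wells_wgs["geometry"].y` and `.x`), so no reprojection to UTM is needed for the KDE itself.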

python

geospatial

kernel-density

scikit-learn

geopandas