Kernel density estimation in a projected coordinate system

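I am using kernel density estimation (KDE) from the Python module sklearn. My data is in a GeoPandas GeoDataFrame. At the moment I run the KDE in geographic coordinates (EPSG:4326), but I would like to run it in projected UTM coordinates (EPSG:25833) instead. When I keep the data in 4326 the KDE works, but when I reproject the GeoDataFrame to 25833 the KDE gives empty output.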

The example is taken from here: https://pygis.io/docs/e_summarize_vector.html#method-2-display-and-export-with-scikit-learn

import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.neighbors import KernelDensity

# County boundaries
# Source: https://opendata.mtc.ca.gov/datasets/san-francisco-bay-region-counties-clipped?geometry=-125.590%2C37.123%2C-119.152%2C38.640
counties = gpd.read_file("../_static/e_vector_shapefiles/sf_bay_counties/sf_bay_counties.shp")


# Well locations
# Source: https://gis.data.ca.gov/datasets/3a3e681b894644a9a95f9815aeeeb57f_0?geometry=-123.143%2C36.405%2C-119.230%2C37.175
# Modified by author so that only the well locations within the counties and the surrounding 50 km were kept
wells = gpd.read_file("../_static/e_vector_shapefiles/sf_bay_wells_50km/sf_bay_wells_50km.shp")

# Set projection to WGS 84 and reproject data
proj_wgs = 4326
counties_wgs = counties.to_crs(proj_wgs)
wells_wgs = wells.to_crs(proj_wgs)

# Get X and Y coordinates of well points
x_sk = wells_wgs["geometry"].x
y_sk = wells_wgs["geometry"].y

# Get minimum and maximum coordinate values of well points
min_x_sk, min_y_sk, max_x_sk, max_y_sk = wells_wgs.total_bounds

# Create a cell mesh grid
# Horizontal and vertical cell counts should be the same
XX_sk, YY_sk = np.mgrid[min_x_sk:max_x_sk:100j, min_y_sk:max_y_sk:100j]

# Create 2-D array of the coordinates (paired) of each cell in the mesh grid
positions_sk = np.vstack([XX_sk.ravel(), YY_sk.ravel()]).T

# Create 2-D array of the coordinate values of the well points
Xtrain_sk = np.vstack([x_sk, y_sk]).T

# Get kernel density estimator (can change parameters as desired)
kde_sk = KernelDensity(bandwidth = 0.04, metric = 'euclidean', kernel = 'gaussian', algorithm = 'auto')

# Fit kernel density estimator to wells coordinates
kde_sk.fit(Xtrain_sk)

# Evaluate the estimator on coordinate pairs
Z_sk = np.exp(kde_sk.score_samples(positions_sk))

# Reshape the data to fit mesh grid
Z_sk = Z_sk.reshape(XX_sk.shape)

fig, ax = plt.subplots(1, 1, figsize = (10, 10))
ax.imshow(np.rot90(Z_sk), cmap = "RdPu", extent = [min_x_sk, max_x_sk, min_y_sk, max_y_sk])
ax.plot(x_sk, y_sk, 'k.', markersize = 2, alpha = 0.1)
counties_wgs.plot(ax = ax, color = 'none', edgecolor = 'dimgray')
ax.set_title('San Francisco Bay Area - SciKit-Learn Kernel Density Estimation for Wells', fontdict = {'fontsize': '15', 'fontweight' : '3'})
plt.show()

This works. However, when I set proj_wgs = 25833, the result is empty.

How can I perform KDE with the sklearn module in projected coordinates?

I cross-posted this on the scikit-learn GitHub page, because I assumed it was too specific for Stack Overflow. I got the following response from cmarm:

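When you convert from geographic to projected coordinates, the scale over which your coordinates vary changes as well. In your example the geographic coordinates span on the order of tenths of a degree, while the projected coordinates span hundreds of thousands of meters. The bandwidth of the kernel estimate is particularly affected by this change of coordinates, because it is tied to the resolution of your density estimate (see, for example, the 1D histogram example in the documentation). To get similar results in both cases, the bandwidth should be increased to account for the relationship between geographic and projected coordinates. On Earth, this means your bandwidth should increase by a factor of ~10^5.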
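To make the ~10^5 factor concrete, here is a rough back-of-the-envelope calculation. It assumes a spherical Earth with a mean radius of 6,371 km (my assumption, not part of cmarm's reply) and converts the 0.04 degree bandwidth from the question into meters.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; a spherical approximation

# Length of one degree of arc along a great circle (e.g. one degree of latitude)
meters_per_degree = 2 * math.pi * EARTH_RADIUS_M / 360

bandwidth_deg = 0.04  # bandwidth used with EPSG:4326 in the question
bandwidth_m = bandwidth_deg * meters_per_degree

print(f"meters per degree: {meters_per_degree:.0f}")
print(f"{bandwidth_deg} degrees is about {bandwidth_m:.0f} m")
```

One degree comes out at roughly 111 km, so 0.04 degrees is a bit over 4 km, the same order of magnitude as the 4000 m bandwidth. Note that a degree of longitude is shorter than this by a factor of cos(latitude), so the conversion is only approximate.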

# Get kernel density estimator (can change parameters as desired)
kde_sk = KernelDensity(bandwidth = 4000, metric = 'euclidean', kernel = 'gaussian', algorithm = 'auto')
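The effect can be reproduced without the shapefiles. The sketch below uses made-up random points in a 100 km x 100 km square as a stand-in for the wells in a metric CRS (the points, extent, and the helper grid_density are hypothetical, for illustration only) and compares the two bandwidths on the same 100 x 100 grid.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Hypothetical stand-in for the wells: 500 random points in a
# 100 km x 100 km square, with coordinates in meters (as in a UTM CRS).
rng = np.random.default_rng(0)
points_m = rng.uniform(0, 100_000, size=(500, 2))

# 100 x 100 evaluation grid over the same extent, as in the question
xx, yy = np.mgrid[0:100_000:100j, 0:100_000:100j]
grid_m = np.vstack([xx.ravel(), yy.ravel()]).T


def grid_density(bandwidth):
    """Fit a Gaussian KDE to points_m and evaluate it on grid_m."""
    kde = KernelDensity(bandwidth=bandwidth, kernel="gaussian", metric="euclidean")
    kde.fit(points_m)
    return np.exp(kde.score_samples(grid_m))


z_degree_bw = grid_density(0.04)   # bandwidth tuned for degrees: 4 cm in meters
z_meter_bw = grid_density(4000.0)  # bandwidth rescaled to meters

print("max density with bandwidth 0.04:", z_degree_bw.max())
print("max density with bandwidth 4000:", z_meter_bw.max())
```

With the 0.04 bandwidth each kernel is only centimeters wide, so practically no grid node falls inside one and the evaluated surface is essentially zero everywhere, which is what shows up as an empty plot. With 4000 the surface is non-zero. The values are still small in absolute terms because the density is now per square meter.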

That said, a better way to avoid this dependency is to use the correct metric for a spherical problem: the haversine metric is available in scikit-learn. This is how a similar problem is solved in the related example.
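For reference, a minimal sketch of what the haversine variant could look like. The point locations are made up, and the Earth radius and the 4 km bandwidth are my assumptions. The haversine metric in scikit-learn expects coordinates as [latitude, longitude] in radians, so the bandwidth is an angle in radians as well.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an approximation

# Hypothetical well locations in degrees (EPSG:4326)
rng = np.random.default_rng(0)
lon = rng.uniform(-123.0, -121.5, size=500)
lat = rng.uniform(37.0, 38.5, size=500)

# The haversine metric expects [latitude, longitude] in radians
train_rad = np.radians(np.column_stack([lat, lon]))

# Express a 4 km bandwidth as an angle on the sphere (radians)
bandwidth_rad = 4000.0 / EARTH_RADIUS_M

kde = KernelDensity(bandwidth=bandwidth_rad, metric="haversine",
                    kernel="gaussian", algorithm="ball_tree")
kde.fit(train_rad)

# Evaluation grid, also converted to [lat, lon] in radians
lon_grid, lat_grid = np.mgrid[-123.0:-121.5:100j, 37.0:38.5:100j]
grid_rad = np.radians(np.column_stack([lat_grid.ravel(), lon_grid.ravel()]))

# Density surface with the same shape as the mesh grid
z = np.exp(kde.score_samples(grid_rad)).reshape(lon_grid.shape)
```

This way the data can stay in EPSG:4326 and the bandwidth keeps a physical meaning (a distance divided by the Earth radius), regardless of latitude.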

All credit goes to cmarm.