Python: Converting KMeans Centroids to Shapefile for Pixel Classification in Land Cover Analysis

Question

I am trying to use KMeans centroids to label/clump pixels for a land cover analysis. I would like to do this using only sklearn and matplotlib. At the moment my code looks like this:

kmeans.fit(band_5)
centroids = kmeans.cluster_centers_
plt.scatter(centroids[:, 0], centroids[:, 1])

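The shape of band_5 is (713, 1163), but from the scatter plot I can see that the centroid coordinates have values far beyond that shape.

As I understand it, the centroids provided by KMeans need to be converted to the correct coordinates, then converted to a shapefile, and then used to label/clump pixels in a supervised process.

How do I convert these centroids to the correct coordinates and then export them to a shapefile? Also, do I even need to create a shapefile?

I tried to adapt some of the code from this post, but I could not get it to work. http://scikit-learn.org/stable/auto_examples/cluster/plot_color_quantization.html#sphx-glr-auto-examples-cluster-plot-color-quantization-py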

Answer 1

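A few points:

scikit-learn expects data in columns (think of a table in a spreadsheet), with one row per sample and one column per feature. Simply passing in an array that represents a raster band will therefore try to classify the data as if you had 713 samples with 1163 values (bands) each. Instead, you need to flatten the array into a single column, for example with band_5.reshape(-1, 1). The KMeans result you get back is then roughly equivalent to a quantile-style classification of the raster as you would see it in something like ArcGIS, with centroids that lie between the band minimum and the band maximum (not in cell coordinates).

Looking at the example you linked, they have a three-band JPEG that they reshape into three long columns: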

image_array = np.reshape(china, (w * h, d))
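Applied to a single band, the same idea might look like the sketch below. It uses a small synthetic array as a stand-in for band_5 (an assumption, since the real raster is not available here), and the cluster count of 4 is arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for band_5 (assumption: a 2-D array of pixel values)
rng = np.random.default_rng(0)
band_5 = rng.integers(0, 256, size=(60, 80)).astype(float)

# One row per pixel, one column per feature (here a single band)
samples = band_5.reshape(-1, 1)

# The number of clusters is an arbitrary choice for illustration
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(samples)

# The centroids are band values, not pixel coordinates
centroids = kmeans.cluster_centers_.ravel()

# Put the per-pixel labels back into the raster's shape
label_image = labels.reshape(band_5.shape)
```

Because each centroid is the mean of the pixel values assigned to it, every centroid falls between the band minimum and maximum, which is why plotting them as x/y coordinates is not meaningful.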

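If you need the pixels to be spatially constrained, you have two options: choose a connectivity-constrained clustering method such as Agglomerative Clustering (which accepts a connectivity matrix) or Affinity Propagation, or add the normalized cell coordinates to your sample set, e.g.: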

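import numpy as np

rows, cols = band_5.shape  # (713, 1163) in the question
xs, ys = np.meshgrid(
    np.linspace(0, 1, cols),  # x: normalized column position
    np.linspace(0, 1, rows),  # y: normalized row position
)
# One row per pixel: band value, normalized x, normalized y
data_with_coordinates = np.column_stack([
    band_5.flatten(),
    xs.flatten(),
    ys.flatten(),
])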

# And on with the clustering
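For the connectivity-constrained option, a minimal sketch could use scikit-learn's AgglomerativeClustering together with grid_to_graph, which builds a pixel adjacency matrix for a regular grid. The synthetic raster, the Ward linkage, and the cluster count of 3 are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.image import grid_to_graph

# Small synthetic raster (assumption: stand-in for a real band)
rng = np.random.default_rng(0)
band = rng.random((20, 30))

# Adjacency of neighbouring pixels, so only touching pixels can merge
connectivity = grid_to_graph(*band.shape)

# One row per pixel, one feature (the band value)
samples = band.reshape(-1, 1)

model = AgglomerativeClustering(
    n_clusters=3, linkage="ward", connectivity=connectivity
)
labels = model.fit_predict(samples)

# Back to raster shape for plotting or further analysis
label_image = labels.reshape(band.shape)
```

Agglomerative clustering scales poorly with the number of samples, so on a full 713 x 1163 band it may be worth trying this on a subset or a downsampled raster first.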

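Once you have done the clustering with scikit-learn, and assuming you used fit_predict, you get back one label per value indicating its cluster, and you can reshape the labels back to the band's original shape to plot the cluster result.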

labels = classifier.fit_predict(data_with_coordinates)
plt.imshow(labels.reshape(band_5.shape))
plt.show()
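Putting the coordinate-augmented approach together, a sketch with KMeans might look like this. The synthetic band, the cluster count, and the rescaling of band values to 0..1 (so they do not swamp the 0..1 coordinates) are all assumptions rather than part of the original answer.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for band_5 (assumption)
rng = np.random.default_rng(0)
band_5 = rng.random((40, 60)) * 255

rows, cols = band_5.shape
xs, ys = np.meshgrid(np.linspace(0, 1, cols), np.linspace(0, 1, rows))

# Rescale band values to 0..1 so they are comparable to the coordinates
band_scaled = (band_5 - band_5.min()) / (band_5.max() - band_5.min())

data = np.column_stack([band_scaled.ravel(), xs.ravel(), ys.ravel()])

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = kmeans.fit_predict(data).reshape(band_5.shape)

# Columns 1 and 2 of each centroid are normalized x and y,
# which map back to (fractional) column and row indices
centroid_cols = kmeans.cluster_centers_[:, 1] * (cols - 1)
centroid_rows = kmeans.cluster_centers_[:, 2] * (rows - 1)
```

With coordinates included as features, the centroids do carry a position, and that position stays inside the array bounds, unlike the centroids in the question.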

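If you already have labelled points, do you really need the cluster centroids? And do you need them in real-world spatial coordinates? If so, look into rasterio and its affine methods to transform from map coordinates to array coordinates and vice versa, and then look into fiona to write the points to a shapefile.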
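If real-world points are needed, one possible reading is to take the mean pixel position of each cluster and push it through the raster's affine transform. The sketch below does the affine arithmetic by hand using six coefficients (a, b, c, d, e, f), the same convention as the first six values of a rasterio dataset's transform. The helper names, the transform values, the output path, and the CRS are all hypothetical, and the fiona call is only defined, not run.

```python
import numpy as np

def cluster_pixel_centers(label_image):
    """Mean (row, col) of the pixels belonging to each cluster label."""
    rows, cols = np.indices(label_image.shape)
    centers = {}
    for label in np.unique(label_image):
        mask = label_image == label
        centers[int(label)] = (rows[mask].mean(), cols[mask].mean())
    return centers

def pixel_to_map(row, col, transform):
    """Map a pixel index to map coordinates at the pixel center.

    x = a*col + b*row + c and y = d*col + e*row + f; the +0.5 shifts
    from the pixel's upper-left corner to its center.
    """
    a, b, c, d, e, f = transform
    x = a * (col + 0.5) + b * (row + 0.5) + c
    y = d * (col + 0.5) + e * (row + 0.5) + f
    return float(x), float(y)

def write_points_shapefile(path, points, crs):
    """Write {label: (x, y)} to a point shapefile (requires fiona)."""
    import fiona  # imported here so the rest runs without fiona installed

    schema = {"geometry": "Point", "properties": {"cluster": "int"}}
    with fiona.open(path, "w", driver="ESRI Shapefile",
                    crs=crs, schema=schema) as dst:
        for label, (x, y) in points.items():
            dst.write({
                "geometry": {"type": "Point", "coordinates": (x, y)},
                "properties": {"cluster": int(label)},
            })

# Tiny label raster and a hypothetical north-up transform with 30 m pixels
label_image = np.array([[0, 0, 1],
                        [0, 0, 1]])
transform = (30.0, 0.0, 500000.0, 0.0, -30.0, 4200000.0)

points = {
    label: pixel_to_map(row, col, transform)
    for label, (row, col) in cluster_pixel_centers(label_image).items()
}

# Hypothetical path and CRS; uncomment once fiona is installed
# write_points_shapefile("cluster_points.shp", points, crs="EPSG:32633")
```

Note that a cluster's mean pixel position can fall outside the cluster itself when its pixels are scattered, so for labelling purposes the label raster is usually more useful than these points.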

python

matplotlib

geospatial

unsupervised-learning

scikit-learn