Find the cluster centroid closest to a predicted coordinate and return the cluster of the closest centroid

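I am predicting latitude and longitude coordinates. When I predict a latitude coordinate, for example, I want to compare that prediction with another variable that holds the centroids of the clusters I built from latitude and longitude. I want to return the cluster (stored in another variable) whose centroid is closest to the predicted latitude. Thanks to another post on Stack Overflow I believe I have the right setup, but I am not getting the correct cluster as the answer. Can anyone help me see what I am doing wrong?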

I want the pred_cluster_test variable to contain the cluster (ClusterEnd) belonging to the ClusterEnd_LatitudeCenter that is closest to the latitude prediction (predictions_test).

df <- dfTraining %>%
  group_by(TripID) %>%
  mutate(
    pred_cluster_test = case_when(
      ClusterEnd_LatitudeCenter == predictions_test ~ ClusterEnd[ClusterEnd_LatitudeCenter],
      TRUE ~ ClusterEnd[sapply(
        ClusterEnd_LatitudeCenter,
        function(x) which.min(x - predictions_test)
      )]
    )
  )

The data looks like this:

structure(list(EndLatitude = c(38.26, 38.218, 38.255, 38.258, 
38.213, 38.215), EndLongitude = c(-85.75, -85.754, -85.746, -85.751, 
-85.751, -85.757), ClusterEnd = c(1, 4, 1, 5, 4, 4), ClusterEnd_LatitudeCenter = c(38.25629, 
38.21723, 38.25629, 38.25322, 38.21723, 38.21723), ClusterEnd_LongitudeCenter = c(-85.74133, 
-85.75955, -85.74133, -85.75783, -85.75955, -85.75955), predictions_test = c(`1` = 38.2407296518939, 
`2` = 38.2326115950784, `3` = 38.2428487622735, `4` = 38.2449069816005, 
`5` = 38.234314694847, `6` = 38.2347388488934), pred_cluster_test = c(38.25629, 
38.21723, 38.25629, 38.25322, 38.21723, 38.21723)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

If I understand correctly, the following might work:

library(dplyr)

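foo <- function(x, cluster_coords) {
  # Pair the single prediction x with every centroid coordinate (one pair per row)
  mat <- cbind(x, cluster_coords)
  # Euclidean distance between the two values in each row, i.e. |x - centroid|
  distance <- apply(mat, MARGIN = 1, FUN = dist, method = "euclidean")
  # Row index of the closest centroid
  which.min(distance)
}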

df %>% 
  mutate(
    cluster_pred_test = ClusterEnd[
    sapply(
      predictions_test,
      function(x) foo(x, ClusterEnd_LatitudeCenter)
      )
    ]
  ) %>%
  pull(cluster_pred_test)
[1] 5 4 5 5 4 4
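
As a side note on why the attempt in the question picks the wrong cluster: it calls which.min on a signed difference, so the most negative value wins rather than the smallest distance, and it iterates over the centroids instead of over the predictions. The sketch below only illustrates the first point, using values copied from the sample data; the variable names centers, clusters and pred are made up for the illustration.

```r
# Values taken from the sample data in the question.
centers  <- c(38.25629, 38.21723, 38.25322)  # distinct ClusterEnd_LatitudeCenter values
clusters <- c(1, 4, 5)                       # matching ClusterEnd labels
pred     <- 38.2407296518939                 # first value of predictions_test

# Signed difference: the most negative value is the minimum,
# which is the centroid furthest below the prediction.
signed_idx <- which.min(centers - pred)

# Absolute difference: the minimum is the closest centroid.
abs_idx <- which.min(abs(centers - pred))

clusters[signed_idx]  # cluster 4
clusters[abs_idx]     # cluster 5
```

Wrapping the difference in abs() gives the same result as the dist-based foo for a single coordinate.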

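You may want to extend the foo approach to include both of your coordinates (the code above uses latitude only), and look at the dplyr::group_map and dplyr::group_modify functions, which can help you implement efficient grouped operations.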
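
One possible way to include both coordinates is sketched below. It is a rough illustration under assumptions, not a tested solution for the full data set: the sample data contains no longitude predictions, so the pred_lon column is hypothetical and simply reuses the EndLongitude values as stand-ins, and nearest_cluster is a made-up helper name. Distances are plain squared Euclidean distances in degrees, which only approximates geographic distance.

```r
library(dplyr)

# Centroid columns, labels and latitude predictions copied from the question.
df <- data.frame(
  ClusterEnd = c(1, 4, 1, 5, 4, 4),
  ClusterEnd_LatitudeCenter = c(38.25629, 38.21723, 38.25629,
                                38.25322, 38.21723, 38.21723),
  ClusterEnd_LongitudeCenter = c(-85.74133, -85.75955, -85.74133,
                                 -85.75783, -85.75955, -85.75955),
  predictions_test = c(38.2407296518939, 38.2326115950784, 38.2428487622735,
                       38.2449069816005, 38.234314694847, 38.2347388488934),
  # ASSUMPTION: hypothetical longitude predictions (EndLongitude reused as a stand-in).
  pred_lon = c(-85.75, -85.754, -85.746, -85.751, -85.751, -85.757)
)

# One row per distinct centroid, so each cluster is compared only once.
centroids <- df %>%
  distinct(ClusterEnd, ClusterEnd_LatitudeCenter, ClusterEnd_LongitudeCenter)

# Return the label of the centroid closest to one (lat, lon) prediction.
nearest_cluster <- function(lat, lon, centroids) {
  d2 <- (centroids$ClusterEnd_LatitudeCenter - lat)^2 +
    (centroids$ClusterEnd_LongitudeCenter - lon)^2
  centroids$ClusterEnd[which.min(d2)]
}

# Apply the helper to every row's pair of predictions.
result <- df %>%
  mutate(pred_cluster_test = mapply(
    nearest_cluster, predictions_test, pred_lon,
    MoreArgs = list(centroids = centroids)
  ))

result$pred_cluster_test
```

Building the lookup table of distinct centroids first means the search is over clusters rather than over rows, so a grouping such as group_by(TripID) does not restrict which centroids can be matched.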