Find the cluster centroid closest to a predicted coordinate and return the cluster of the closest centroid
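I am predicting latitude and longitude coordinates. When I predict a latitude coordinate, for example, I want to compare that prediction with another variable that holds the centroids of the clusters I built from latitude and longitude. I want to return the cluster (which I have in another variable) whose centroid is closest to the predicted latitude. Thanks to another post on Stack Overflow I have what I believe is the right setup, but I am not getting the correct cluster back. Can anyone help me see what I am doing wrong?
I want the pred_cluster_test variable to contain the cluster (ClusterEnd) whose ClusterEnd_LatitudeCenter is closest to the latitude prediction (predictions_test).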
df <- dfTraining %>%
  group_by(TripID) %>%
  mutate(pred_cluster_test = case_when(
    ClusterEnd_LatitudeCenter == predictions_test ~ ClusterEnd[ClusterEnd_LatitudeCenter],
    TRUE ~ ClusterEnd[sapply(ClusterEnd_LatitudeCenter,
                             function(x) which.min(x - predictions_test))]
  ))
The data looks like this:
structure(list(EndLatitude = c(38.26, 38.218, 38.255, 38.258,
38.213, 38.215), EndLongitude = c(-85.75, -85.754, -85.746, -85.751,
-85.751, -85.757), ClusterEnd = c(1, 4, 1, 5, 4, 4), ClusterEnd_LatitudeCenter = c(38.25629,
38.21723, 38.25629, 38.25322, 38.21723, 38.21723), ClusterEnd_LongitudeCenter = c(-85.74133,
-85.75955, -85.74133, -85.75783, -85.75955, -85.75955), predictions_test = c(`1` = 38.2407296518939,
`2` = 38.2326115950784, `3` = 38.2428487622735, `4` = 38.2449069816005,
`5` = 38.234314694847, `6` = 38.2347388488934), pred_cluster_test = c(38.25629,
38.21723, 38.25629, 38.25322, 38.21723, 38.21723)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
If I understand the question correctly, the following might work:
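library(dplyr)

# Return the index of the centroid coordinate closest to a single prediction x
foo <- function(x, cluster_coords) {
  # One row per centroid: column 1 is the prediction, column 2 the centroid
  mat <- cbind(x, cluster_coords)
  # Distance between the two values in each row
  distance <- apply(mat, MARGIN = 1, FUN = dist, method = "euclidean")
  which.min(distance)
}

df %>%
  mutate(
    # Look up the cluster label at the index of the nearest centroid
    cluster_pred_test = ClusterEnd[
      sapply(
        predictions_test,
        function(x) foo(x, ClusterEnd_LatitudeCenter)
      )
    ]
  ) %>%
  pull(cluster_pred_test)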
[1] 5 4 5 5 4 4
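For a single coordinate the same lookup can probably be written without a helper function. The sketch below uses base R only and copies the values from the sample data; the variable names centers, clusters and preds are made up for illustration. It also shows one likely reason the expression in the question misbehaves: it iterates over the centroids instead of the predictions and takes which.min() of a signed difference rather than an absolute one.

```r
# Centroid latitudes, cluster labels and predictions from the sample data
centers  <- c(38.25629, 38.21723, 38.25629, 38.25322, 38.21723, 38.21723)
clusters <- c(1, 4, 1, 5, 4, 4)
preds    <- c(38.2407296518939, 38.2326115950784, 38.2428487622735,
              38.2449069816005, 38.234314694847, 38.2347388488934)

# Expression from the question: for each centroid x, which.min(x - preds)
# returns the position of the largest prediction, so every row gets the
# same index
question_idx <- sapply(centers, function(x) which.min(x - preds))

# For each prediction p, index of the centroid with the smallest
# absolute difference
nearest_idx <- sapply(preds, function(p) which.min(abs(centers - p)))

clusters[question_idx]  # 5 5 5 5 5 5
clusters[nearest_idx]   # 5 4 5 5 4 4
```

The second result matches the output shown above, which suggests the absolute difference is what was intended.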
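You may want to edit this to include both of your coordinates, and have a look at the dplyr::group_map and dplyr::group_modify functions, which can help you write efficient grouped operations.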
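As a rough sketch of what using both coordinates could look like, the base R code below compares each predicted point with the three distinct centroids from the sample data. The helper nearest_cluster, the centroids data frame and the pred_lon values are all hypothetical: the sample data contains no longitude predictions, so those numbers exist only to make the example run. It uses plain Euclidean distance in degrees, which is only an approximation; a great-circle distance would be more accurate over larger areas.

```r
# Distinct cluster centroids taken from the sample data
centroids <- data.frame(
  cluster = c(1, 4, 5),
  lat     = c(38.25629, 38.21723, 38.25322),
  lon     = c(-85.74133, -85.75955, -85.75783)
)

# Hypothetical helper: label of the centroid nearest to one (lat, lon) point
nearest_cluster <- function(lat, lon, centroids) {
  # Squared Euclidean distance (in degrees) to every centroid
  d2 <- (centroids$lat - lat)^2 + (centroids$lon - lon)^2
  centroids$cluster[which.min(d2)]
}

pred_lat <- c(38.2407296518939, 38.2326115950784)  # from the sample data
pred_lon <- c(-85.750, -85.758)                    # HYPOTHETICAL values

# Apply the helper to each predicted point in turn
pred_cluster <- mapply(nearest_cluster, pred_lat, pred_lon,
                       MoreArgs = list(centroids = centroids))
pred_cluster
```

Inside a pipeline the same helper could be called from mutate() with the prediction columns as arguments.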