R - How to find closest neighbours from dissimilarity matrix?
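I have a dissimilarity matrix (gower.dist), and now I want to find the n nearest neighbours of a given data point (for example, row number 50). Can anyone help?
Sample data:
https://towardsdatascience.com/hierarchical-clustering-on-categorical-data-in-r-a27e578f2995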
#----- Dummy Data -----#
library(dplyr)
set.seed(40)
id.s <- c(1:200) %>%
  factor()
budget.s <- sample(c("small", "med", "large"), 200, replace = TRUE) %>%
  factor(levels = c("small", "med", "large"),
         ordered = TRUE)
origins.s <- sample(c("x", "y", "z"), 200, replace = TRUE,
                    prob = c(0.7, 0.15, 0.15))
area.s <- sample(c("area1", "area2", "area3", "area4"), 200,
                 replace = TRUE,
                 prob = c(0.3, 0.1, 0.5, 0.2))
source.s <- sample(c("facebook", "email", "link", "app"), 200,
                   replace = TRUE,
                   prob = c(0.1, 0.2, 0.3, 0.4))
dow.s <- sample(c("mon", "tue", "wed", "thu", "fri", "sat", "sun"), 200, replace = TRUE,
                prob = c(0.1, 0.1, 0.2, 0.2, 0.1, 0.1, 0.2)) %>%
  factor(levels = c("mon", "tue", "wed", "thu", "fri", "sat", "sun"),
         ordered = TRUE)
dish.s <- sample(c("delicious", "the one you don't like", "pizza"), 200, replace = TRUE)
synthetic.customers <- data.frame(id.s, budget.s, origins.s, area.s, source.s, dow.s, dish.s)
#----- Dissimilarity Matrix -----#
library(cluster)
# to perform different types of hierarchical clustering
# package functions used: daisy(), diana(), clusplot()
gower.dist <- daisy(synthetic.customers[ ,2:7], metric = c("gower"))
Suppose you want the 5 nearest neighbours of data point 50:
row_number <- 50
n <- 5
dists <- unname(as.matrix(gower.dist)[row_number,])
order(dists)[1:n]
# 50 83 112 60 75
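As expected, point 50 is closest to itself (its dissimilarity to itself is 0); the remaining four indices are its next-closest neighbours.
The trick is to convert the dissimilarity object to a matrix with as.matrix(), extract the row you are interested in, and use the base R order() function to find the indices of the smallest values in that row.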
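If you would rather not have the query point in its own neighbour list, one option is to set its self-distance to Inf before ordering. The sketch below wraps that in a helper; the name nearest_neighbours and its arguments are hypothetical and not part of the original answer, and the toy data is only there to make the behaviour easy to check by hand. Masking the self-distance may be safer than simply dropping the first element of order(), because with categorical data another row can be identical to the query (Gower dissimilarity 0) and could sort ahead of it.

```r
# Hypothetical helper (an illustration, not from the original answer):
# return the n nearest neighbours of row i, excluding row i itself.
nearest_neighbours <- function(d, i, n = 5) {
  m <- as.matrix(d)                 # accepts dist / dissimilarity objects or a plain matrix
  dists <- unname(m[i, ])           # dissimilarities from row i to every row
  dists[i] <- Inf                   # mask the point itself so it is never returned
  k <- min(n, length(dists) - 1)    # cannot return more neighbours than other rows
  idx <- head(order(dists), k)      # indices of the k smallest dissimilarities
  data.frame(neighbour = idx, dissimilarity = dists[idx])
}

# Toy check on five points on a line: 0, 1, 3, 7, 8
toy <- dist(c(0, 1, 3, 7, 8))
nearest_neighbours(toy, i = 3, n = 2)
```

With the data from the question, the same call would be nearest_neighbours(gower.dist, 50, 5), which should return five rows other than 50 together with their dissimilarities.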