Find nearest neighbour of points with the same value when comparing 2 different data sets in R
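I have 2 data frames (df1 and df2), each with three columns: x-coordinate, y-coordinate, and category (with 5 levels, A-E). So I basically have 2 sets of point data, and each point is assigned to a category.
For example:
X Y Cat
1 1.5 A
2 1.5 B
3.3 1.9 C
etc...
(although both of my data frames have 100 points)
For each point in my first data frame (df1), I would like to find the nearest neighbour of the same category in the second data frame (df2).
I've used nncross from the package spatstat to find, for each point in df1, its nearest neighbour in df2, and then list each of these distances as follows: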
library(spatstat)
# Convert the data frames to ppp objects, using the category as the mark
df1.ppp <- ppp(df1$X, df1$Y, c(0,10), c(0,10), marks = df1$Cat)
df2.ppp <- ppp(df2$X, df2$Y, c(0,10), c(0,10), marks = df2$Cat)
# Produce an output that lists the distance from each point in df1 to its nearest neighbour in df2
out <- nncross(X = df1.ppp, Y = df2.ppp, what = c("dist","which"))
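But I am struggling to work out how to use the category labels stored in the ppp objects (defined by the marks) to find the nearest neighbour from the same category. I am sure it should be fairly straightforward, but if anyone has any suggestions, or any alternative methods to achieve the same result, I would be most grateful.
First, some artificial data to work with: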
library(spatstat)
# Artificial data similar to the question
set.seed(42)
X1 <- rmpoint(100, win = square(10), types = factor(LETTERS[1:5]))
X2 <- rmpoint(100, win = square(10), types = factor(LETTERS[1:5]))
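Then a simple solution (but it loses the id information, because `which` indexes into the per-type sub-patterns rather than into X2):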
# Separate patterns for each type:
X1list <- split(X1)
X2list <- split(X2)
# For each point in X1 find nearest neighbour of same type in X2:
out <- list()
for(i in 1:5){
out[[i]] <- nncross(X1list[[i]], X2list[[i]], what=c("dist","which"))
}
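As a possible middle ground, the per-type loop can keep track of the original point indices itself. The following is only a sketch under the same artificial-data setup; the names `res`, `i1` and `i2` are introduced here for illustration and are not part of the original answer.

```r
library(spatstat)

# Artificial data as in the answer above
set.seed(42)
X1 <- rmpoint(100, win = square(10), types = factor(LETTERS[1:5]))
X2 <- rmpoint(100, win = square(10), types = factor(LETTERS[1:5]))

# One row per point of X1; `which` will hold an index into the full X2
res <- data.frame(dist  = rep(NA_real_, npoints(X1)),
                  which = rep(NA_integer_, npoints(X1)))

for (lev in levels(marks(X1))) {
  i1 <- which(marks(X1) == lev)   # original indices of this type in X1
  i2 <- which(marks(X2) == lev)   # original indices of this type in X2
  if (length(i1) == 0 || length(i2) == 0) next  # leave NA if no candidate
  nn <- nncross(X1[i1], X2[i2], what = c("dist", "which"))
  res$dist[i1]  <- nn$dist
  res$which[i1] <- i2[nn$which]   # map sub-pattern index back to X2
}
```

Here `res$which[k]` is meant to be the row of X2 that is the nearest same-type neighbour of point `k` in X1, so no relabelling of marks is needed.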
Finally, an ugly solution that recovers the neighbour ids:
# Make separate marks for pattern 1 and 2 and collect into one pattern
marks(X1) <- factor(paste0(marks(X1), "1"))
marks(X2) <- factor(paste0(marks(X2), "2"))
X <- superimpose(X1, X2)
# For each point get the nearest neighbour of each type from both X1 and X2
# (both dist and index)
nnd <- nndist(X, by = marks(X))
nnw <- nnwhich(X, by = marks(X))
# Type to look for. I.e. the mark with 1 and 2 swapped
# (with 0 as intermediate step)
type <- marks(X)
type <- gsub("1", "0", type)
type <- gsub("2", "1", type)
type <- gsub("0", "2", type)
# Result
rslt <- cbind(as.data.frame(X), dist = 0, which = 0)
for(i in 1:nrow(rslt)){
rslt$dist[i] <- nnd[i, type[i]]
rslt$which[i] <- nnw[i, type[i]]
}
# Separate results
rslt1 <- rslt[1:npoints(X1),]
rslt2 <- rslt[npoints(X1) + 1:npoints(X2),]
rslt1$which <- rslt1$which - npoints(X1)
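The row-by-row loop that fills `rslt$dist` and `rslt$which` picks one column per row by name. In base R the same selection can be done in one step by indexing a matrix with a two-column (row, column) index matrix. The toy matrix `m` and the vector `type` below are made up purely to illustrate the idea.

```r
# Toy stand-in for a nearest-neighbour distance matrix with one column per mark
m <- matrix(c(1, 2, 3,
              4, 5, 6),
            nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("A1", "B1", "A2")))

# Column to pick in each row (hypothetical values)
type <- c("A2", "B1")

# Two-column index matrix: (row number, column number)
idx <- cbind(seq_len(nrow(m)), match(type, colnames(m)))
picked <- m[idx]
```

Assuming `nnd` and `nnw` carry the mark levels as column names (which the loop above already relies on), the equivalent would be `nnd[cbind(seq_len(nrow(nnd)), match(type, colnames(nnd)))]`, and likewise for `nnw`.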
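I also worked on solving this, and found a very simple way to do it: create a distance matrix from my original data frames using the package geosphere.
# load geosphere library
library("geosphere")
# create a distance matrix between all points in the 2 data frames
# (note: distm treats the two columns as longitude/latitude and returns
# geodesic distances in metres, not planar Euclidean distances)
dist <- distm(df1[,c('X','Y')], df2[,c('X','Y')])
# find the distance to the nearest neighbour of each point
df1$nearestneighbor <- apply(dist, 1, min)
# flag pairs of points that belong to different categories
diffCat <- outer(df1$Cat, df2$Cat, "!=")
# set those distances to Inf so only same-category distances remain
dist2 <- dist + ifelse(diffCat, Inf, 0)
# find the distance to the nearest neighbour of the same category
df1$closestmatch <- apply(dist2, 1, min)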
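If the coordinates are planar rather than longitude/latitude, the same masking idea can be sketched in base R with Euclidean distances, and `which.min` gives the row of the match as well as its distance. The small data frames and the `matchindex` column below are invented for illustration only.

```r
# Hypothetical toy data with the same columns as in the question
df1 <- data.frame(X = c(1, 2, 3.3),
                  Y = c(1.5, 1.5, 1.9),
                  Cat = c("A", "B", "C"))
df2 <- data.frame(X = c(1.2, 5, 2.1, 3, 9),
                  Y = c(1.4, 5, 1.0, 2.5, 9),
                  Cat = c("A", "A", "B", "C", "C"))

# Euclidean distance matrix: rows are df1 points, columns are df2 points
dx <- outer(df1$X, df2$X, "-")
dy <- outer(df1$Y, df2$Y, "-")
d  <- sqrt(dx^2 + dy^2)

# Rule out pairs from different categories
d[outer(as.character(df1$Cat), as.character(df2$Cat), "!=")] <- Inf

# Distance to, and row index of, the nearest same-category point in df2
df1$closestmatch <- apply(d, 1, min)
df1$matchindex   <- apply(d, 1, which.min)

# A df1 point with no same-category point in df2 gets NA instead of an index
df1$matchindex[!is.finite(df1$closestmatch)] <- NA
```

The full matrix has nrow(df1) x nrow(df2) entries, which is fine for 100 points per set but may be worth reconsidering for much larger data.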