List of n first neighbors from a 3D array in R
Suppose we have a 3D array:
my.array <- array(1:27, dim=c(3,3,3))
I want to create a list of the first n neighbors.
Example: take my.array[2,2,2] = 14; the first neighbors of 14 are:
list[14] = [1 to 27] - 14
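I would also like to do the same for the second, third, ..., nth nearest neighbors, using R, C, or Matlab.
Thanks
I think something along these lines would do the trick: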
library(plyr)  # provides adply() and rbind.fill()

nClosest <- function(pts, pt, n)
{
  # Get the target value
  val <- pts[pt[1], pt[2], pt[3]]
  # Turn the array into a data frame (one row per cell)
  ptsDF <- adply(pts, 1:3)
  # Create Dist column for distance to val
  ptsDF$Dist <- abs(ptsDF$V1 - val)
  # Order by the distance to val
  ptsDF <- ptsDF[with(ptsDF, order(Dist)),]
  # Split into groups, one per distinct distance
  sp <- split(ptsDF, ptsDF$Dist)
  # Get max index (group 1 is the target itself, at distance 0)
  topInd <- min(n+1, length(sp))
  # Combine the selected groups into a single data frame
  rbind.fill(sp[2:topInd])
}
Output:
> nClosest(my.array, c(1,2,2), 3)
  X1 X2 X3 V1 Dist
1  3  1  2 12    1
2  2  2  2 14    1
3  2  1  2 11    2
4  3  2  2 15    2
5  1  1  2 10    3
6  1  3  2 16    3
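Note that nClosest measures closeness by the difference in stored values (the Dist column), not by position in the array. For readers without plyr, the same ranking can be sketched in base R. This is only an illustration: the name nClosestBase is made up here, and unlike the function above it drops every cell at distance 0 instead of assuming the first group is the target.

```r
# Hypothetical base-R variant of nClosest (illustrative name, not from the answer)
nClosestBase <- function(pts, pt, n) {
  # Value stored at the target cell (matrix indexing works for any rank)
  val <- pts[matrix(pt, nrow = 1)]
  # One row of array indices per cell, in linear-index order
  idx <- arrayInd(seq_along(pts), dim(pts))
  colnames(idx) <- paste0("X", seq_len(ncol(idx)))
  df <- data.frame(idx, V1 = as.vector(pts), Dist = abs(as.vector(pts) - val))
  # The n smallest non-zero distance levels
  lev <- sort(unique(df$Dist[df$Dist > 0]))
  lev <- lev[seq_len(min(n, length(lev)))]
  out <- df[df$Dist %in% lev, ]
  out <- out[order(out$Dist, out$V1), ]
  rownames(out) <- NULL
  out
}

my.array <- array(1:27, dim = c(3, 3, 3))
res <- nClosestBase(my.array, c(1, 2, 2), 3)
print(res)
# V1 column: 12 14 11 15 10 16, matching the output above
```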
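Based on the comments, I am assuming you define "first nearest neighbors" as all cells with a Euclidean distance of 1 or less (excluding the cell itself), "second nearest neighbors" as those with a distance of 2 or less, and so on. You assert in a comment that "for (1,1,1) the first level neighbors is 2,4,5,10,11,13"; I actually interpret that as including the immediate diagonals (distance 1.414) but not the farther diagonals (in your example, 14 would be a farther diagonal at distance 1.732).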
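To make those distances concrete, here is a small sketch that computes them for three cells relative to (1,1,1). The row labels (face, edge_diagonal, corner_diagonal) are names chosen here for illustration. The Chebyshev distance is included only as an assumed alternative reading of the question, under which every other cell of a 3x3x3 array would count as a first neighbor of the center.

```r
elem <- c(1, 1, 1)
cells <- rbind(face            = c(2, 1, 1),
               edge_diagonal   = c(2, 2, 1),
               corner_diagonal = c(2, 2, 2))

# Subtract elem from every row to get per-dimension offsets
offsets <- sweep(cells, 2, elem)

# Euclidean distance: square root of the summed squared offsets
euclid <- sqrt(rowSums(offsets^2))
# Chebyshev distance: largest absolute offset in any dimension
cheby <- apply(abs(offsets), 1, max)

print(round(euclid, 3))
# 1.000 1.414 1.732
print(cheby)
# 1 1 1
```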
This function accepts either a predefined array (ary) or the dimensions that make up the array (dims).
nearestNeighbors(dims = c(3,3,3), elem = c(1,1,1), dist = 1)
# dim1 dim2 dim3
# [1,] 2 1 1
# [2,] 1 2 1
# [3,] 1 1 2
nearestNeighbors(dims = c(3,3,3), elem = c(1,1,1), dist = 1,
return_indices = FALSE)
# [1] 2 4 10
nearestNeighbors(dims = c(3,3,3), elem = c(1,1,1), dist = 2,
return_indices = FALSE)
# [1] 2 3 4 5 7 10 11 13 14 19
nearestNeighbors(ary = array(27:1, dim = c(3,3,3)), elem = c(1,1,1), dist = 2)
# dim1 dim2 dim3
# [1,] 2 1 1
# [2,] 3 1 1
# [3,] 1 2 1
# [4,] 2 2 1
# [5,] 1 3 1
# [6,] 1 1 2
# [7,] 2 1 2
# [8,] 1 2 2
# [9,] 2 2 2
# [10,] 1 1 3
nearestNeighbors(ary = array(27:1, dim = c(3,3,3)), elem = c(1,1,1), dist = 2,
return_indices = FALSE)
# [1] 26 25 24 23 21 18 17 15 14 9
The function:
#' Find nearest neighbors.
#'
#' @param ary array
#' @param elem integer vector indicating the indices on array from
#'   which all nearest neighbors will be found; must be the same
#'   length as \code{dims} (or \code{dim(ary)}). Only one of
#'   \code{ary} and \code{dims} needs to be provided.
#' @param dist numeric, the max distance from \code{elem}, not
#'   including the 'self' point.
#' @param dims integer vector indicating the dimensions of the array.
#'   Only one of \code{ary} and \code{dims} needs to be provided.
#' @param return_indices logical, whether to return a matrix of
#'   indices (as many columns as dimensions) or the values from
#'   \code{ary} of the nearest neighbors
#' @return either matrix of indices (one column per dimension) if
#'   \code{return_indices == TRUE}, or the appropriate values in
#'   \code{ary} otherwise.
nearestNeighbors <- function(ary, elem, dist, dims, return_indices = TRUE) {
  if (missing(dims)) dims <- dim(ary)
  # default to an array holding its own linear indices
  if (missing(ary)) ary <- array(seq_len(prod(dims)), dim = dims)
  if (length(elem) != length(dims))
    stop("'elem' needs to have the same number of dimensions as 'ary'")
  # work on a subset of the whole array: the box around `elem`, clipped
  # to the array bounds (ceiling/floor keep the indices whole numbers
  # when `dist` is fractional)
  usedims <- mapply(function(el, d) {
    seq(max(1, ceiling(el - dist)), min(d, floor(el + dist)))
  }, elem, dims, SIMPLIFY = FALSE)
  df <- as.matrix(do.call('expand.grid', usedims))
  # label the columns as in the example output (dim1, dim2, ...)
  colnames(df) <- paste0("dim", seq_along(dims))
  # now, df is only as big as we need to possibly satisfy `dist`
  ndist <- sqrt(apply(df, 1, function(x) sum((x - elem)^2)))
  ret <- df[which(ndist > 0 & ndist <= dist), , drop = FALSE]
  if (return_indices) {
    return(ret)
  } else {
    return(ary[ret])
  }
}
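The key step in the function is restricting the search to a box around elem before any distances are computed. A rough sketch of how much that shrinks the work, assuming a hypothetical interior cell c(128, 128, 128) in a 256x256x256 array (the cell is chosen here purely for illustration):

```r
dims <- c(256, 256, 256)
elem <- c(128, 128, 128)  # hypothetical interior cell
dist <- 2

# Per-dimension index ranges of the box around elem, clipped to the array
usedims <- mapply(function(el, d) {
  seq(max(1, el - dist), min(d, el + dist))
}, elem, dims, SIMPLIFY = FALSE)

n_candidates <- prod(lengths(usedims))  # cells inside the box
n_total <- prod(dims)                   # cells in the whole array

print(c(candidates = n_candidates, total = n_total))
# candidates: 125, total: 16777216
```

Only the candidate cells are passed to the distance calculation, so the cost depends on dist rather than on the size of the array.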
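Edit: changed the code for a "slight" speed improvement: with a 256x256x256 array and a distance of 2, it previously took about 90 seconds on my machine. Now it takes less than 1 second. Even a distance of 5 (same array) takes less than a second. Not fully tested, please verify that it is correct.
Edit: removed an extra { on line 50 of the function.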
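If the apply() call ever becomes a bottleneck for larger distances, the row-wise distances can probably be computed with sweep() and rowSums() instead. This is an untimed suggestion rather than part of the answer; the sketch below only checks that both forms give the same numbers on a small grid.

```r
elem <- c(1, 1, 1)
grid <- as.matrix(expand.grid(1:3, 1:3, 1:3))

# Row-by-row version, as in the function body
d_apply <- sqrt(apply(grid, 1, function(x) sum((x - elem)^2)))

# Vectorized version: subtract elem column-wise, then sum squares per row
d_vec <- sqrt(rowSums(sweep(grid, 2, elem)^2))

same <- isTRUE(all.equal(unname(d_apply), unname(d_vec)))
print(same)
```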