Compare four numeric vectors with a tolerance interval and report common values
I have four large vectors of unequal length. Below is a toy dataset that resembles my original data:
a <- c(1021.923, 3491.31, 102.3, 12019.11, 879.2, 583.1)
b <- c(21,32,523,123.1,123.4545,12345,95.434, 879.25, 1021.9,11,12,662)
c <- c(52,21,1021.9288,12019.12, 879.1)
d <- c(432.432,23466.3,45435,3456,123,6688,1021.95)
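Is there a way to compare all of these vectors against one another with an allowed matching threshold of ±0.5? In other words, I want to report the numbers that are common to all four vectors while allowing a drift of 0.5.
For the toy dataset above, the final answer would be: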
Match1
a 1021.923
b 1021.900
c 1021.929
d 1021.950
I know this is feasible for two vectors, but how can I do it for four?
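For reference, the two-vector case mentioned above can be sketched in base R with `outer()`. This is only a minimal illustration: `match_within` is a hypothetical helper name, not a function from any package, and because `outer()` builds the full difference matrix it may be impractical for very large vectors.

```r
# Two-vector case: all pairs (x[i], y[j]) with |x[i] - y[j]| <= tol.
# `match_within` is a hypothetical helper, not part of base R or any package.
match_within <- function(x, y, tol = 0.5) {
  # outer() builds the length(x) by length(y) matrix of pairwise differences
  hits <- which(abs(outer(x, y, "-")) <= tol, arr.ind = TRUE)
  data.frame(x = x[hits[, "row"]], y = y[hits[, "col"]])
}

a <- c(1021.923, 3491.31, 102.3, 12019.11, 879.2, 583.1)
b <- c(21, 32, 523, 123.1, 123.4545, 12345, 95.434, 879.25, 1021.9, 11, 12, 662)

# pairs of values from a and b that lie within 0.5 of each other
ab <- match_within(a, b)
```

Chaining this pairwise approach across four vectors quickly gets awkward, which is what motivates the question.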
Here is a data.table solution.
It scales to n vectors, so feed it as many as you like. It also behaves well when multiple values have 'hits' in all vectors.
Sample data
a <- c(1021.923, 3491.31, 102.3, 12019.11, 879.2, 583.1)
b <- c(21,32,523,123.1,123.4545,12345,95.434, 879.25, 1021.9,11,12,662)
c <- c(52,21,1021.9288,12019.12, 879.1)
d <- c(432.432,23466.3,45435,3456,123,6688,1021.95)
Code
library(data.table)
#create list with vectors
l <- list( a,b,c,d )
names(l) <- letters[1:4]
#create data.table to work with
DT <- rbindlist( lapply(l, function(x) {data.table( value = x)} ), idcol = "group")
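#add margins to each value
#note: two +/-0.5 windows overlap whenever the values are up to 1.0 apart;
#use +/-0.25 instead if matched values must lie within 0.5 of each other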
DT[, `:=`( id = 1:.N, start = value - 0.5, end = value + 0.5 ) ]
#set keys for joining
setkey(DT, start, end)
#perform overlap-join
result <- foverlaps(DT,DT)
#cast, to check which 'hits' each value has in each group (a,b,c,d)
answer <- dcast( result,
                 group + value ~ i.group,
                 fun.aggregate = function(x){ x * 1 },
                 value.var = "i.value",
                 fill = NA )
#get your final answer
#set columns to look at (i.e. the names from the earlier created list)
cols <- names(l)
#keep the rows without NA (use rowSums, because TRUE = 1, FALSE = 0 )
#so if rowSums == 0, then the columns named in the vector 'cols' do not contain an NA
answer[ rowSums( is.na( answer[ , ..cols ] ) ) == 0, ]
Output
#    group    value        a      b        c       d
# 1:     a 1021.923 1021.923 1021.9 1021.929 1021.95
# 2:     b 1021.900 1021.923 1021.9 1021.929 1021.95
# 3:     c 1021.929 1021.923 1021.9 1021.929 1021.95
# 4:     d 1021.950 1021.923 1021.9 1021.929 1021.95
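As a cross-check on the result above, here is a minimal base R sketch of a different, simpler technique (not the overlap join used in the answer). It treats the first vector as the reference and, for each reference value, looks up the nearest value in every vector, keeping the rows where all of them fall within the tolerance. `common_within` is a hypothetical helper name. It assumes the tolerance is measured against the first vector only, and it reports just the single nearest value per vector.

```r
# `common_within` is a hypothetical helper, not part of data.table or base R.
# Assumption: tolerance is measured relative to the first vector in the list.
common_within <- function(vectors, tol = 0.5) {
  ref <- vectors[[1]]
  # for each reference value, find the nearest value in every vector
  nearest <- sapply(vectors, function(v) {
    sapply(ref, function(r) v[which.min(abs(v - r))])
  })
  # force a matrix even when the reference has a single element
  nearest <- matrix(nearest, nrow = length(ref),
                    dimnames = list(NULL, names(vectors)))
  # keep reference values whose nearest neighbour in every vector is within tol
  keep <- apply(abs(nearest - ref) <= tol, 1, all)
  nearest[keep, , drop = FALSE]
}

a <- c(1021.923, 3491.31, 102.3, 12019.11, 879.2, 583.1)
b <- c(21, 32, 523, 123.1, 123.4545, 12345, 95.434, 879.25, 1021.9, 11, 12, 662)
c <- c(52, 21, 1021.9288, 12019.12, 879.1)
d <- c(432.432, 23466.3, 45435, 3456, 123, 6688, 1021.95)

# one row per reference value that has a match in all four vectors
res <- common_within(list(a = a, b = b, c = c, d = d))
```

On the toy data this keeps a single row, built from the reference value 1021.923 in `a`. The nested `sapply` is quadratic in the vector lengths, so for genuinely large vectors the keyed overlap join above is likely the better fit.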