R vlookup based on the closest numeric variable

I want to do something similar to a vlookup in R, using a numeric variable as the lookup key.

Example lookup table:

> Value <- c(1,1.5,2,2.5,3,3.5,4,4.5,5)
> Code <- c("A","B","C","D","E","F","G","H","I")
> Lookup_Table <- data.frame(Value, Code)
> Lookup_Table
  Value Code
1   1.0    A
2   1.5    B
3   2.0    C
4   2.5    D
5   3.0    E
6   3.5    F
7   4.0    G
8   4.5    H
9   5.0    I

Sample data table:

> DataSample <- c(1.2,1,2.3,2.7,3.1,3,4.6,4.5,3.8)
> DataSample <- data.frame(DataSample)
> DataSample
  DataSample
1        1.2
2        1.0
3        2.3
4        2.7
5        3.1
6        3.0
7        4.6
8        4.5
9        3.8

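So for each DataSample value I want to look up the matching Code from the lookup table. For example, if my value is 1.2, I want to round it up to the closest lookup value at or above it, which is 1.5, so I would expect to get the Code corresponding to 1.5.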

My desired output is:

> DataSample
  DataSample Code
1        1.2    B
2        1.0    A
3        2.3    D
4        2.7    E
5        3.1    F
6        3.0    E
7        4.6    I
8        4.5    H
9        3.8    G

Here I use data.table to:

  1. create intervals in the lookup table
  2. apply the foverlaps function to do the merge

library(data.table)

value <- c(1,1.5,2,2.5,3,3.5,4,4.5,5)
code <- c("A","B","C","D","E","F","G","H","I")
Lookup_Table <- data.frame(value, code)
setDT(Lookup_Table)

# Build an interval around each lookup value, bounded by the midpoints to its
# neighbours; the first and last intervals stop at the value itself.
Lookup_Table <- Lookup_Table[order(value)]
Lookup_Table[, previous.value := shift(value)]
Lookup_Table[, next.value := shift(value, type = "lead")]
Lookup_Table[, start := (previous.value + value) / 2]
Lookup_Table[, end := (next.value + value) / 2]
Lookup_Table[is.na(start), start := value]
Lookup_Table[is.na(end), end := value]
Lookup_Table <- Lookup_Table[, .(start, end, value, code)]
setkey(Lookup_Table, start, end)

# Represent each sample as a zero-width interval [value, value].
DataSample <- data.frame(value = c(1.2,1,2.3,2.7,3.1,3,4.6,4.5,3.8))
setDT(DataSample)
DataSample[, start := value]
DataSample[, end := value]
DataSample <- DataSample[, .(start, end, value)]
setkey(DataSample, start, end)

# Overlap join: each sample is matched to the lookup interval that contains it.
res <- foverlaps(
  DataSample, 
  Lookup_Table, 
  by.x = c("start", "end"),
  by.y = c("start", "end")
)

# Keep the sample value (i.value) and the matched code.
res <- res[, .(value = i.value, code)]

> res
#   value code
# 1:   1.0    A
# 2:   1.2    A
# 3:   2.3    D
# 4:   2.7    D
# 5:   3.0    E
# 6:   3.1    E
# 7:   3.8    G
# 8:   4.5    H
# 9:   4.6    H

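The result is slightly different from the desired output: these intervals are centered on each lookup value, so every sample is matched to the nearest value rather than rounded up. You may want to check how the intervals are defined and applied.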
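
The difference between nearest-value matching and rounding up can also be seen with a rolling join. The following is a minimal sketch, not part of the approach above: the names `lookup` and `samples` are made up for illustration, and it assumes the `roll` argument of a data.table join behaves as documented (`"nearest"` picks the closest key, `-Inf` carries the next key backward).

```r
library(data.table)

# Hypothetical names; same values as the lookup table and sample data above.
lookup  <- data.table(value = seq(1, 5, by = 0.5), code = LETTERS[1:9])
samples <- data.table(value = c(1.2, 1, 2.3, 2.7, 3.1, 3, 4.6, 4.5, 3.8))

# roll = "nearest": each sample takes the code of the closest lookup value.
nearest <- lookup[samples, on = "value", roll = "nearest"]
nearest$code
# "A" "A" "D" "D" "E" "E" "H" "H" "G"

# roll = -Inf: each sample takes the code of the next lookup value at or
# above it, which is the rounding-up behaviour asked for in the question.
round_up <- lookup[samples, on = "value", roll = -Inf]
round_up$code
# "B" "A" "D" "E" "F" "E" "I" "H" "G"
```

Rows come back in the order of `samples`, so the second result lines up with the desired output row by row.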

A base R approach could use findInterval like this:

# findInterval(...) + 1 is the index of the first lookup Value at or above each sample.
DataSample$Code <- with(Lookup_Table, 
                        Code[findInterval(DataSample$DataSample, Value, left.open = TRUE) + 1]) 

Output

  DataSample Code
1        1.2    B
2        1.0    A
3        2.3    D
4        2.7    E
5        3.1    F
6        3.0    E
7        4.6    I
8        4.5    H
9        3.8    G
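
To see why adding 1 to the findInterval result rounds up, it can help to look at the intermediate index. This is a small sketch using standalone vectors (`x`, `idx`, and `codes` are illustrative names, not from the answer) and assumes the lookup values are sorted in increasing order, as findInterval requires.

```r
# Same values as the lookup table and sample data above.
Value <- seq(1, 5, by = 0.5)
Code  <- LETTERS[1:9]
x     <- c(1.2, 1, 2.3, 2.7, 3.1, 3, 4.6, 4.5, 3.8)

# With left.open = TRUE the intervals are (Value[i], Value[i + 1]], so idx
# is the number of lookup values strictly below each sample.
idx <- findInterval(x, Value, left.open = TRUE)
idx
# 1 0 3 4 5 4 8 7 6

# The next position is therefore the first lookup value at or above x.
codes <- Code[idx + 1]
codes
# "B" "A" "D" "E" "F" "E" "I" "H" "G"
```

A sample larger than the largest lookup value would get index 10 here, and `Code[10]` is `NA`, so out-of-range values show up as missing codes rather than errors.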

A data.table option using a non-equi join

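library(data.table)

# Sort the lookup table, join every sample to all lookup rows with
# Value >= DataSample, then keep the first (smallest) match per sample.
setorder(
  setDT(Lookup_Table),
  "Value"
)[setDT(DataSample),
  on = .(Value >= DataSample)
][
  ,
  .(Code = first(Code)), .(DataSample = Value)
]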

This gives

   DataSample Code
1:        1.2    B
2:        1.0    A
3:        2.3    D
4:        2.7    E
5:        3.1    F
6:        3.0    E
7:        4.6    I
8:        4.5    H
9:        3.8    G
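
The intermediate result of the non-equi join shows why the grouping step is needed. The sketch below uses two sample values only; `lookup`, `samples`, `joined`, and `picked` are illustrative names, and it assumes that the join column `Value` in the result carries the sample value from `i`, which is how data.table reports non-equi join columns.

```r
library(data.table)

# Hypothetical names; lookup values as above, two samples for brevity.
lookup  <- data.table(Value = seq(1, 5, by = 0.5), Code = LETTERS[1:9])
samples <- data.table(DataSample = c(4.6, 3.8))

# Each sample is paired with every lookup row whose Value is >= the sample:
# 4.6 matches only 5.0 (I); 3.8 matches 4.0, 4.5 and 5.0 (G, H, I).
joined <- lookup[samples, on = .(Value >= DataSample)]
nrow(joined)
# 4

# Keeping the first match per sample selects the smallest qualifying
# lookup value; this relies on lookup being sorted by Value.
picked <- joined[, .(Code = first(Code)), by = .(DataSample = Value)]
```

Because the last step groups by the sample value, repeated values in the sample data would collapse into a single output row.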