R vlookup based on the closest numeric variable
I want to do something similar to vlookup in R, using a numeric variable as the lookup key.
Example lookup table:
> Value <- c(1,1.5,2,2.5,3,3.5,4,4.5,5)
> Code <- c("A","B","C","D","E","F","G","H","I")
> Lookup_Table <- data.frame(Value, Code)
> Lookup_Table
Value Code
1 1.0 A
2 1.5 B
3 2.0 C
4 2.5 D
5 3.0 E
6 3.5 F
7 4.0 G
8 4.5 H
9 5.0 I
Sample data table:
> DataSample <- c(1.2,1,2.3,2.7,3.1,3,4.6,4.5,3.8)
> DataSample <- data.frame(DataSample)
> DataSample
DataSample
1 1.2
2 1.0
3 2.3
4 2.7
5 3.1
6 3.0
7 4.6
8 4.5
9 3.8
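So, for each DataSample value, I want to find the matching Code based on the values in the lookup table. For example, if my value is 1.2, I want to round it up to the next value in the lookup table, which is 1.5, and get the code that corresponds to 1.5.
My desired output is: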
> DataSample
DataSample Code
1 1.2 B
2 1.0 A
3 2.3 D
4 2.7 E
5 3.1 F
6 3.0 E
7 4.6 I
8 4.5 H
9 3.8 G
Here I use data.table to:
- create intervals in the lookup table
- apply the foverlaps function to do the merge
library(data.table)
value <- c(1,1.5,2,2.5,3,3.5,4,4.5,5)
code <- c("A","B","C","D","E","F","G","H","I")
Lookup_Table <- data.frame(value, code)
setDT(Lookup_Table)
Lookup_Table <- Lookup_Table[order(value)]
Lookup_Table[, previous.value := shift(value)]
Lookup_Table[, next.value := shift(value, type = "lead")]
Lookup_Table[, start := (previous.value + value) / 2]
Lookup_Table[, end := (next.value + value) / 2]
Lookup_Table[is.na(start), start := value]
Lookup_Table[is.na(end), end := value]
Lookup_Table <- Lookup_Table[, .(start, end, value, code)]
setkey(Lookup_Table, start, end)
DataSample <- data.frame(value = c(1.2,1,2.3,2.7,3.1,3,4.6,4.5,3.8))
setDT(DataSample)
DataSample[, start := value]
DataSample[, end := value]
DataSample <- DataSample[, .(start, end, value)]
setkey(DataSample, start, end)
res <- foverlaps(
  DataSample,
  Lookup_Table,
  by.x = c("start", "end"),
  by.y = c("start", "end")
)
res <- res[, .(value = i.value, code)]
> res
# value code
# 1: 1.0 A
# 2: 1.2 A
# 3: 2.3 D
# 4: 2.7 D
# 5: 3.0 E
# 6: 3.1 E
# 7: 3.8 G
# 8: 4.5 H
# 9: 4.6 H
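The result differs slightly from your desired output: the intervals here are centered on each lookup value, so every sample is matched to the nearest value instead of being rounded up. You may want to look at how the ranges are defined and applied.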
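If the round-up match from the question is what you need, one alternative is a data.table rolling join. This is a different technique from foverlaps, shown only as a minimal sketch. The names `lookup` and `samples` are made up for this example. `roll = -Inf` carries the next lookup value backward (round up), while `roll = "nearest"` should reproduce the nearest-value matching.

```r
library(data.table)

# Hypothetical names for this sketch; same data as in the question.
lookup  <- data.table(value = c(1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5),
                      code  = c("A", "B", "C", "D", "E", "F", "G", "H", "I"))
samples <- data.table(value = c(1.2, 1, 2.3, 2.7, 3.1, 3, 4.6, 4.5, 3.8))

# Round up: when there is no exact match, take the next larger lookup value.
round_up <- lookup[samples, on = "value", roll = -Inf]

# Nearest: take whichever lookup value is closest to the sample.
nearest <- lookup[samples, on = "value", roll = "nearest"]

print(round_up)
print(nearest)
```

Both results keep the row order of `samples`, so the `code` column can be assigned straight back to the sample data.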
A base R approach can use findInterval like this:
DataSample$Code <- with(Lookup_Table,
  Code[findInterval(DataSample$DataSample, Value, left.open = TRUE) + 1])
Output:
DataSample Code
1 1.2 B
2 1.0 A
3 2.3 D
4 2.7 E
5 3.1 F
6 3.0 E
7 4.6 I
8 4.5 H
9 3.8 G
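To see why the `+ 1` produces a round-up match, it may help to look at the raw indices. With `left.open = TRUE`, findInterval returns `i` such that `Value[i] < x <= Value[i + 1]`, so an exact match stays on its own row and anything in between moves to the next row. A small sketch follows; the vectors `value`, `code` and `x` are standalone copies made up for this example, and the last two entries of `x` are extra edge cases that are not in the question.

```r
value <- c(1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5)
code  <- c("A", "B", "C", "D", "E", "F", "G", "H", "I")

# 1.2 and 2.3 fall between lookup values, 1 and 5 are exact matches,
# 5.4 lies above the largest lookup value.
x <- c(1.2, 1, 2.3, 5, 5.4)

# Index of the last lookup value strictly below each x (0 if there is none).
idx <- findInterval(x, value, left.open = TRUE)

# Shifting by one selects the first lookup value that is >= x.
matched <- code[idx + 1]

print(idx)
print(matched)
```

Note that a value above the largest lookup value indexes past the end of `code` and comes back as NA, so you may want to handle that case explicitly.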
A data.table option using a non-equi join:
setorder(
setDT(Lookup_Table),
"Value"
)[setDT(DataSample),
on = .(Value >= DataSample)
][
,
.(Code = first(Code)), .(DataSample = Value)
]
This gives:
DataSample Code
1: 1.2 B
2: 1.0 A
3: 2.3 D
4: 2.7 E
5: 3.1 F
6: 3.0 E
7: 4.6 I
8: 4.5 H
9: 3.8 G
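The join condition pairs each sample with every lookup row whose Value is at least as large, and `first(Code)` then keeps the smallest of those. A rough base R sketch of the same logic, written out one sample at a time, is below. The helper name `first_code_at_or_above` is hypothetical, and this is meant to illustrate the rule, not to replace the join.

```r
value <- c(1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5)
code  <- c("A", "B", "C", "D", "E", "F", "G", "H", "I")
x     <- c(1.2, 1, 2.3, 2.7, 3.1, 3, 4.6, 4.5, 3.8)

# Hypothetical helper: code of the first lookup value that is >= v.
# Assumes `value` is sorted ascending; returns NA if no such value exists.
first_code_at_or_above <- function(v) code[which(value >= v)[1]]

result <- vapply(x, first_code_at_or_above, character(1))

print(data.frame(DataSample = x, Code = result))
```

Unlike the grouped join, this keeps one output row per input row even when the sample data contains repeated values.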