Join two data frames in R based on closest timestamp
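Hi, I have two tables (table1 and table2 below) that I would like to join on the closest timestamp to produce expected_output. A solution involving dplyr would be nice if possible, but not if it complicates things further.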
table1 =
structure(list(date = structure(c(1437051300, 1434773700, 1431457200
), class = c("POSIXct", "POSIXt"), tzone = ""), val1 = c(94L,
33L, 53L)), .Names = c("date", "val1"), row.names = c(NA, -3L
), class = "data.frame")
table2 =
structure(list(date = structure(c(1430248288, 1435690482, 1434050843
), class = c("POSIXct", "POSIXt"), tzone = ""), val2 = c(67L,
90L, 18L)), .Names = c("date", "val2"), row.names = c(NA, -3L
), class = "data.frame")
expected_output =
structure(list(date = structure(c(1437051300, 1434773700, 1431457200
), class = c("POSIXct", "POSIXt"), tzone = ""), val1 = c(94L,
33L, 53L), val2 = c(90L, 18L, 67L)), .Names = c("date", "val1",
"val2"), row.names = c(NA, -3L), class = "data.frame")
This might be slow, but...
d <- function(x,y) abs(x-y) # define the distance function
idx <- sapply( table1$date, function(x) which.min( d(x,table2$date) )) # find matches
cbind(table1,table2[idx,-1,drop=FALSE])
#                  date val1 val2
# 2 2015-07-16 08:55:00   94   90
# 3 2015-06-20 00:15:00   33   18
# 1 2015-05-12 15:00:00   53   67
Another way to construct idx is max.col(-outer(table1$date, table2$date, d)).
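If the row-by-row search above turns out to be too slow, one possible alternative is a sorted lookup with base R's findInterval. This is only a sketch and not part of the original answer: the helper name nearest_idx is made up for illustration, and the sample data are rebuilt with an explicit UTC time zone as an assumption.

```r
# Rebuild the sample data (tz = "UTC" is an assumption for reproducibility)
table1 <- data.frame(
  date = as.POSIXct(c(1437051300, 1434773700, 1431457200),
                    origin = "1970-01-01", tz = "UTC"),
  val1 = c(94L, 33L, 53L)
)
table2 <- data.frame(
  date = as.POSIXct(c(1430248288, 1435690482, 1434050843),
                    origin = "1970-01-01", tz = "UTC"),
  val2 = c(67L, 90L, 18L)
)

# Hypothetical helper: for each x, return the index of the nearest element of y
nearest_idx <- function(x, y) {
  x <- as.numeric(x)
  y <- as.numeric(y)
  ord <- order(y)                      # findInterval needs sorted breakpoints
  ys <- y[ord]
  lo <- findInterval(x, ys)            # position of the largest ys <= x (0 if none)
  lo_c <- pmax(lo, 1L)                 # clamp to a valid lower neighbour
  hi_c <- pmin(lo + 1L, length(ys))    # clamp to a valid upper neighbour
  # pick the upper neighbour only when it is strictly closer (ties go to the lower one)
  pick_hi <- abs(ys[hi_c] - x) < abs(x - ys[lo_c])
  ord[ifelse(pick_hi, hi_c, lo_c)]     # map back to positions in the original y
}

idx <- nearest_idx(table1$date, table2$date)
result <- cbind(table1, table2[idx, -1, drop = FALSE])
```

On the sample data this gives the same idx as the sapply version. The sort plus binary search avoids comparing every pair of timestamps, which should matter only when the tables are large.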
Using the rolling join feature of data.table with roll = "nearest":
require(data.table) # v1.9.6+
setDT(table1)[, val2 := setDT(table2)[table1, val2, on = "date", roll = "nearest"]]
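Here, the val2 column is created by joining on the date column with the roll = "nearest" option. For each row of table1$date, the closest matching row from table2$date is found, and the val2 of that row is extracted.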
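A rolling join reports the timestamps of the table in i (here table1), so the matched timestamp from table2 is not visible in the result. A minimal sketch of one way to keep it, assuming data.table is installed; the names t1, t2, date2 and joined are made up for illustration, and the UTC time zone is an assumption:

```r
library(data.table)

# Rebuild the sample data as data.tables (tz = "UTC" is an assumption)
t1 <- data.table(
  date = as.POSIXct(c(1437051300, 1434773700, 1431457200),
                    origin = "1970-01-01", tz = "UTC"),
  val1 = c(94L, 33L, 53L)
)
t2 <- data.table(
  date = as.POSIXct(c(1430248288, 1435690482, 1434050843),
                    origin = "1970-01-01", tz = "UTC"),
  val2 = c(67L, 90L, 18L)
)

# Copy the join key so the matched timestamp from t2 survives the join
t2[, date2 := date]

# For each row of t1, pick the row of t2 whose date is nearest
joined <- t2[t1, on = "date", roll = "nearest"]
```

In joined, date holds the timestamps from t1, date2 holds the nearest timestamp found in t2, and the rows follow the order of t1.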