How to match two time columns and print value if they match?
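I'm running my head against the wall here. Hope someone can help.
I have an aggregated data frame (d1) in R that contains a time column and a column with binary values. The time column does not have a uniform time step.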
d1:
Time Set
1: 2015-01-03 14:55:00 0
2: 2015-01-06 14:20:00 1
3: 2015-01-06 14:25:00 1
4: 2015-01-06 14:30:00 1
5: 2015-01-06 14:35:00 1
6: 2015-01-06 14:40:00 1
7: 2015-01-06 14:45:00 0
8: 2015-01-06 16:10:00 1
9: 2015-01-07 07:45:00 0
10: 2015-01-07 08:00:00 1
11: 2015-01-07 08:05:00 1
12: 2015-01-07 08:45:00 0
I also have a data frame (d2) with a single column that has a uniform time step, so d2 has more rows than d1.
d2:
Time_Ideal
1: 2015-01-09 14:05:00
2: 2015-01-09 14:10:00
3: 2015-01-09 14:15:00
4: 2015-01-09 14:20:00
5: 2015-01-09 14:25:00
6: 2015-01-09 14:30:00
7: 2015-01-09 14:35:00
8: 2015-01-09 14:40:00
9: 2015-01-09 14:45:00
10: 2015-01-09 14:50:00
What I want to do is print the Set value next to Time_Ideal wherever the time values in the two time columns of d1 and d2 match.
I tried
d1 <- data.table(d1, key = 'Time')
d2 <- data.table(d2, key = 'Time_Ideal')
d2[d1, nomatch=0]
d2[d1]
inspired by this SO post,
but I couldn't get it to work.
Probably not the best solution, but I think it works:
library(plyr)
d3 <- d2
colnames(d3) <- c("Time")
d4 <- join(d3, d1)
# carry the previous Set value forward into rows that did not match
for(i in 2:length(d4$Set)){
  if(is.na(d4$Set[i])){
    d4$Set[i] <- d4$Set[i - 1]
  }
}
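For what it's worth, the same lookup-then-fill idea can be sketched in base R without a loop. This is only an illustration on made-up toy data (`a`, `b` and `filled` are names I invented, not from the question), and it assumes both time columns are POSIXct in the same time zone:

```r
# Hypothetical toy data (not the OP's): `b` is a regular 5-minute grid, `a` is sparse.
a <- data.frame(
  Time = as.POSIXct(c("2015-01-09 14:10:00", "2015-01-09 14:20:00"), tz = "UTC"),
  Set  = c(1L, 0L)
)
b <- data.frame(
  Time = seq(as.POSIXct("2015-01-09 14:05:00", tz = "UTC"),
             by = "5 min", length.out = 5)
)

# Look up each grid time in `a`; times without a match get NA.
b$Set <- a$Set[match(b$Time, a$Time)]

# Carry the last non-NA value forward; leading NAs stay NA.
idx    <- cumsum(!is.na(b$Set))               # position of the most recent non-NA value
filled <- c(NA, b$Set[!is.na(b$Set)])[idx + 1]
filled
```

`filled` can then be assigned back to `b$Set` if the carried-forward values are what you want.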
Maybe with dplyr?
library(dplyr)
d2 %>%
left_join(d1, by = c("Time_Ideal" = "Time"))
To fill in missing values with the last value of Set, use:
library(dplyr)
library(zoo)
d2 %>%
  left_join(d1, by = c("Time_Ideal" = "Time")) %>%
  mutate(Set = na.locf(Set, na.rm = FALSE)) # fill the joined Set column; leading NAs are kept
Test:
Input data
There is no hint about the date-time type used. I use POSIXct below:
d1 <-
structure(list(Time = structure(c(1420293300, 1420550400, 1420550700,
1420551000, 1420551300, 1420551600, 1420551900, 1420557000, 1420613100,
1420614000, 1420614300, 1420616700), class = c("POSIXct", "POSIXt"
), tzone = ""),
Set = c(0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L,
1L, 0L)), row.names = c(NA, -12L), .Names = c("Time", "Set"),
class = "data.frame")
d2 <-
structure(list(Time_Ideal = structure(c(1420808700, 1420809000,
1420809300, 1420809600, 1420809900, 1420810200, 1420810500, 1420810800,
1420811100, 1420811400), class = c("POSIXct", "POSIXt"
), tzone = "")), row.names = c(NA, -10L), .Names = "Time_Ideal",
class = "data.frame")
Performing join #1
There is no date intersection (all d1 times are earlier than the d2 times):
d2 %>%
left_join(d1, by = c("Time_Ideal" = "Time"))
Time_Ideal Set
1 2015-01-09 14:05:00 NA
2 2015-01-09 14:10:00 NA
3 2015-01-09 14:15:00 NA
4 2015-01-09 14:20:00 NA
5 2015-01-09 14:25:00 NA
6 2015-01-09 14:30:00 NA
7 2015-01-09 14:35:00 NA
8 2015-01-09 14:40:00 NA
9 2015-01-09 14:45:00 NA
10 2015-01-09 14:50:00 NA
Performing join #2 (corrected input data)
Shift d1 three days into the future:
d1$Time <- d1$Time + 3600*24*3 # three days shift
Perform the join again:
d2 %>%
left_join(d1, by = c("Time_Ideal" = "Time"))
Time_Ideal Set
1 2015-01-09 14:05:00 NA
2 2015-01-09 14:10:00 NA
3 2015-01-09 14:15:00 NA
4 2015-01-09 14:20:00 1
5 2015-01-09 14:25:00 1
6 2015-01-09 14:30:00 1
7 2015-01-09 14:35:00 1
8 2015-01-09 14:40:00 1
9 2015-01-09 14:45:00 0
10 2015-01-09 14:50:00 NA
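A quick way to see why join #1 returned only NA is to count the shared timestamps before and after the shift. A small sketch using a few of the epoch values from the input data above (`d1_time`, `d2_time`, `before` and `after` are illustrative names, and displaying in UTC is my assumption):

```r
# A few timestamps copied from the input data, as seconds since 1970-01-01.
d1_time <- as.POSIXct(c(1420550400, 1420550700, 1420551900),
                      origin = "1970-01-01", tz = "UTC")
d2_time <- as.POSIXct(c(1420809300, 1420809600, 1420809900, 1420811100),
                      origin = "1970-01-01", tz = "UTC")

# Number of d2 times that also occur in d1, before and after the three-day shift.
before <- sum(d2_time %in% d1_time)
after  <- sum(d2_time %in% (d1_time + 3 * 24 * 3600))
c(before = before, after = after)
```

If `before` is zero, an exact-match join cannot return anything but NA, whichever package performs it.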
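Here is the data.table approach to solving this (since that was the actual question). Using the modified data provided by @bergant (since the OP's data sets don't match), simply do: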
setkey(setDT(d1), Time) # `d2` doesn't have to be a `data.table`
d1[d2] # you can set `, nomatch = 0L` if you want to remove non-matches
# Time Set
# 1: 2015-01-09 15:05:00 NA
# 2: 2015-01-09 15:10:00 NA
# 3: 2015-01-09 15:15:00 NA
# 4: 2015-01-09 15:20:00 1
# 5: 2015-01-09 15:25:00 1
# 6: 2015-01-09 15:30:00 1
# 7: 2015-01-09 15:35:00 1
# 8: 2015-01-09 15:40:00 1
# 9: 2015-01-09 15:45:00 0
# 10: 2015-01-09 15:50:00 NA
Another (better) approach is to modify d2 by reference. You first have to convert d2 to a data.table and key it:
setkey(setDT(d2), Time_Ideal)
d2[d1, Set := i.Set][] # `d2` was modified by reference.
# Time_Ideal Set
# 1: 2015-01-09 15:05:00 NA
# 2: 2015-01-09 15:10:00 NA
# 3: 2015-01-09 15:15:00 NA
# 4: 2015-01-09 15:20:00 1
# 5: 2015-01-09 15:25:00 1
# 6: 2015-01-09 15:30:00 1
# 7: 2015-01-09 15:35:00 1
# 8: 2015-01-09 15:40:00 1
# 9: 2015-01-09 15:45:00 0
# 10: 2015-01-09 15:50:00 NA
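If the goal is also to carry the last Set value forward (as in the other answers), a rolling join may be worth a look. A minimal sketch on made-up toy data, assuming POSIXct columns in the same time zone; `a`, `b` and `res` are hypothetical names, and this is a different technique from the exact-match joins above:

```r
library(data.table)

# Hypothetical toy data (not the OP's): `a` is sparse, `b` is a regular 5-minute grid.
a <- data.table(
  Time = as.POSIXct(c("2015-01-09 14:10:00", "2015-01-09 14:20:00"), tz = "UTC"),
  Set  = c(1L, 0L)
)
b <- data.table(
  Time_Ideal = seq(as.POSIXct("2015-01-09 14:05:00", tz = "UTC"),
                   by = "5 min", length.out = 5)
)

# roll = TRUE uses the most recent earlier row of `a` when there is no exact match;
# grid times before the first observation in `a` stay NA.
res <- a[b, on = .(Time = Time_Ideal), roll = TRUE]
res
```

Note that `roll = TRUE` rolls forward without limit; a positive number of seconds can be passed instead to cap how far a value is carried.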