data.table: increase IDateTime by hour to lag hourly data
How can I increase the idate and itime of a data.table by one hour?
I want to lag my data as described here by Christoph_J.
My data looks like this:
> dt
idate itime windgeschwindigkeit
1: 1958-02-01 00:00:00 -0.9049475
2: 1958-02-01 01:00:00 -0.9049475
3: 1958-02-01 02:00:00 -0.9049475
4: 1958-02-01 03:00:00 -1.0049475
5: 1958-02-01 04:00:00 -2.0049475
---
498020: 2014-11-24 19:00:00 -1.0852256
498021: 2014-11-24 20:00:00 -0.7852256
498022: 2014-11-24 21:00:00 -0.8852256
498023: 2014-11-24 22:00:00 -1.0852256
498024: 2014-11-24 23:00:00 -1.3852256
I tried to lag it with the code from the SO answer mentioned above, as follows:
setkeyv(dt, c("idate","itime"))
m_col = "windgeschwindigkeit"
pm_col = parse(text="windgeschwindigkeit")
lagg = 1
dt[, paste0(m_col,"_",lagg) :=
dt[list(idate,itime+lagg*3600), eval(pm_col), roll=-1]]
This produces the expected output: a new column that is lagged by one hour. But (see below)
> dt
idate itime windgeschwindigkeit windgeschwindigkeit_1
1: 1958-02-01 00:00:00 -0.9049475 -0.9049475
2: 1958-02-01 01:00:00 -0.9049475 -0.9049475
3: 1958-02-01 02:00:00 -0.9049475 -1.0049475
4: 1958-02-01 03:00:00 -1.0049475 -2.0049475
5: 1958-02-01 04:00:00 -2.0049475 -2.0049475
---
498020: 2014-11-24 19:00:00 -1.0852256 -0.7852256
498021: 2014-11-24 20:00:00 -0.7852256 -0.8852256
498022: 2014-11-24 21:00:00 -0.8852256 -1.0852256
498023: 2014-11-24 22:00:00 -1.0852256 -1.3852256
498024: 2014-11-24 23:00:00 -1.3852256 NA
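But because of the increment in list(idate,itime+lagg*3600), every row whose number is a multiple of 24 is now NA: the hours in itime go from 0:23 to 1:24, and data.table cannot match hour 24 of itime to any row.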
> dt[c(24,48)]
idate itime windgeschwindigkeit windgeschwindigkeit_1
1: 1958-02-01 23:00:00 0.5950525 NA
2: 1958-02-02 23:00:00 4.0939842 NA
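The overflow described above can be reproduced in isolation. This is only a minimal sketch: it relies on ITime storing seconds since midnight as an integer (as the dput output in this question shows), and the variable names are made up for illustration.

```r
library(data.table)

# ITime stores seconds since midnight as an integer
last_hour <- as.ITime("23:00:00")
secs <- as.integer(last_hour) + 3600L   # 82800 + 3600 = 86400, i.e. "24:00:00"

# The hourly itime values present for any single idate are 0, 3600, ..., 82800.
# 86400 is not among them and idate was not advanced, so the join has
# nothing to match within that date.
valid_secs <- 0:23 * 3600L
secs %in% valid_secs                    # FALSE
```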
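Is there any way to solve this, e.g. by increasing idate and itime together by 1 hour?
Any help is greatly appreciated.
I managed to do it with a "work-around" using as.POSIXct, but it is not very efficient: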
setkeyv(dt, c("idate","itime"))
m_col = "windgeschwindigkeit"
pm_col = parse(text="windgeschwindigkeit")
lagg = 1
new_time <- dt[,IDateTime(as.POSIXct(idate)+itime+lagg*3600)]
dt[, paste0(m_col,"_",lagg) :=
dt[new_time, eval(pm_col), roll=-1]]
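One possible way to avoid the POSIXct round trip is to do the carry by hand with integer arithmetic. The helper below, add_hours(), is hypothetical and not part of data.table; it is a sketch that assumes itime holds integer seconds since midnight and that whole days can simply be added to idate.

```r
library(data.table)

# Hypothetical helper (not a data.table function): shift an idate/itime pair
# by a number of hours, carrying overflow from itime into idate
add_hours <- function(idate, itime, hours = 1L) {
  secs <- as.integer(itime) + as.integer(hours) * 3600L
  list(
    idate = as.IDate(idate + (secs %/% 86400L)),        # carry whole days
    itime = structure(secs %% 86400L, class = "ITime")  # wrap into 0..86399
  )
}

x <- add_hours(as.IDate("1958-02-01"), as.ITime("23:00:00"))
x$idate  # 1958-02-02
x$itime  # 00:00:00
```

Under the same assumptions, new_time <- dt[, add_hours(idate, itime, lagg)] could stand in for the as.POSIXct line above. Because %/% and %% use floor division in R, negative values of hours also carry backwards across midnight.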
dput of the head of my data:
structure(list(idate = structure(c(-4352L, -4352L, -4352L, -4352L,
-4352L, -4352L, -4352L, -4352L, -4352L, -4352L, -4352L, -4352L,
-4352L, -4352L, -4352L, -4352L, -4352L, -4352L, -4352L, -4352L,
-4352L, -4352L, -4352L, -4352L, -4351L, -4351L, -4351L, -4351L,
-4351L, -4351L, -4351L, -4351L, -4351L, -4351L, -4351L, -4351L,
-4351L, -4351L, -4351L, -4351L, -4351L, -4351L, -4351L, -4351L,
-4351L, -4351L, -4351L, -4351L), class = c("IDate", "Date")),
itime = structure(c(0L, 3600L, 7200L, 10800L, 14400L, 18000L,
21600L, 25200L, 28800L, 32400L, 36000L, 39600L, 43200L, 46800L,
50400L, 54000L, 57600L, 61200L, 64800L, 68400L, 72000L, 75600L,
79200L, 82800L, 0L, 3600L, 7200L, 10800L, 14400L, 18000L,
21600L, 25200L, 28800L, 32400L, 36000L, 39600L, 43200L, 46800L,
50400L, 54000L, 57600L, 61200L, 64800L, 68400L, 72000L, 75600L,
79200L, 82800L), class = "ITime"), windgeschwindigkeit = c(-0.904947510665982,
-0.904947510665982, -0.904947510665982, -1.00494751066598,
-2.00494751066598, -2.00494751066598, -2.90494751066598,
-2.50494751066598, -2.50494751066598, -1.40494751066598,
-1.50494751066598, -1.30494751066598, -1.00494751066598,
-0.704947510665983, -0.504947510665983, -0.504947510665983,
-0.204947510665982, -0.104947510665983, 0.0950524893340177,
1.09505248933402, 0.195052489334017, -0.204947510665982,
0.0950524893340177, 0.595052489334018, 1.79398421777773,
2.99398421777773, 3.39398421777773, 3.29398421777773, 2.99398421777773,
2.89398421777773, 1.89398421777773, 0.593984217777727, 0.293984217777727,
-0.706015782222273, -0.706015782222273, -0.806015782222273,
-0.406015782222273, 0.893984217777727, -0.206015782222273,
-0.606015782222273, -0.00601578222227328, 0.693984217777727,
1.29398421777773, 2.49398421777773, 3.79398421777773, 4.29398421777773,
3.99398421777773, 4.09398421777773)), .Names = c("idate",
"itime", "windgeschwindigkeit"), row.names = c(NA, -48L), class = c("data.table",
"data.frame"), sorted = c("idate", "itime"))
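I just pushed the function shift(), which can generate lead/lag vectors for multiple periods. It always returns a list. See this issue. To use it, though, you'd need v1.9.5, which is the current development version; installation instructions are here.
With that, IIUC, what you'd like to do can be done as follows: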
require(data.table) ## v1.9.5+
dt[, lead_1 := shift(windgeschwindigkeit, 1L, type="lead"), by=.(idate)]
This assumes that the itime column is in the correct order for each idate. If it is not, you can do this:
dt[order(idate, itime), lead_1 := shift(windgeschwindigkeit, 1L, type="lead"), by=.(idate)]
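Note that grouping by idate makes the lead stop at the end of each day, so the 23:00 row of every date gets NA. If the lead is meant to run across midnight, dropping by may be an option. The toy data below is made up for illustration, and the ungrouped variant assumes the series is sorted and has no missing hours.

```r
library(data.table)  # shift() requires v1.9.5 or later

# Toy data: the last two hours of one day and the first two of the next
dt <- data.table(
  idate = as.IDate(c("1958-02-01", "1958-02-01", "1958-02-02", "1958-02-02")),
  itime = as.ITime(c("22:00:00", "23:00:00", "00:00:00", "01:00:00")),
  windgeschwindigkeit = c(0.1, 0.6, 1.8, 3.0)
)
setkey(dt, idate, itime)

# Grouped by idate: the 23:00 row has no later row in its group
dt[, lead_by_day := shift(windgeschwindigkeit, 1L, type = "lead"), by = .(idate)]

# Ungrouped: the lead runs across midnight (assumes no gaps in the hours)
dt[, lead_all := shift(windgeschwindigkeit, 1L, type = "lead")]

dt$lead_by_day  # 0.6  NA 3.0  NA
dt$lead_all     # 0.6 1.8 3.0  NA
```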