R: Using a data.table to aggregate data
I have just started using data.table and need some help aggregating some data.
Login OpenTime CloseTime OpenedValueUSD ClosedValueUSD Year Month TransferredValue Identifier
859 04/02/2014 07:55 05/02/2014 15:37 10000 10000 2014 2 0 1
859 07/02/2014 03:16 07/02/2014 03:51 8960.755 8960.755 2014 2 0 2
859 11/02/2014 12:41 13/02/2014 11:56 13635.178 13606.901 2014 2 0 3
859 11/02/2014 13:34 11/02/2014 15:34 13635.178 13635.178 2014 2 13635.178 4
859 12/02/2014 13:46 14/02/2014 09:59 13660.246 13649.278 2014 2 13635.178 5
859 13/02/2014 15:33 13/02/2014 15:42 13606.901 13606.901 2014 2 13660.246 6
859 25/03/2014 14:52 26/03/2014 12:58 10000 10000 2014 3 0 7
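For each row, I want to sum all trades that were opened before that trade and closed after that trade was opened. For example, the trade in the third row opened before the trade in the fourth row, but closed only after the fourth-row trade had opened. So I take the OpenedValueUSD of that trade (and of any other qualifying trades; there are none in this example) and put it in the TransferredValue column.
Here is my current code: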
tradeData[,TransferredValue:=sum(tradeData$OpenedValueUSD[OpenTime <
tradeData$OpenTime & CloseTime > tradeData$OpenTime & Login ==
tradeData$Login]), by="Identifier"]
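This should produce the expected result. The time columns are converted to POSIXct first, so that the comparisons are chronological rather than string-based: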
tradeData[,OpenTime:=as.POSIXct(OpenTime,format="%d/%m/%Y %H:%M")]
tradeData[,CloseTime:=as.POSIXct(CloseTime,format="%d/%m/%Y %H:%M")]
tradeData[,TransferredValue:=sum(tradeData$OpenedValueUSD[tradeData$OpenTime < OpenTime &
tradeData$CloseTime > OpenTime & tradeData$Login == Login]), by = 'Identifier']
tradeData
# Login OpenTime CloseTime OpenedValueUSD ClosedValueUSD Year Month
# 1: 859 2014-02-04 07:55:00 2014-02-05 15:37:00 10000.000 10000.000 2014 2
# 2: 859 2014-02-07 03:16:00 2014-02-07 03:51:00 8960.755 8960.755 2014 2
# 3: 859 2014-02-11 12:41:00 2014-02-13 11:56:00 13635.178 13606.901 2014 2
# 4: 859 2014-02-11 13:34:00 2014-02-11 15:34:00 13635.178 13635.178 2014 2
# 5: 859 2014-02-12 13:46:00 2014-02-14 09:59:00 13660.246 13649.278 2014 2
# 6: 859 2014-02-13 15:33:00 2014-02-13 15:42:00 13606.901 13606.901 2014 2
# 7: 859 2014-03-25 14:52:00 2014-03-26 12:58:00 10000.000 10000.000 2014 3
# Identifier TransferredValue
# 1: 1 0.00
# 2: 2 0.00
# 3: 3 0.00
# 4: 4 13635.18
# 5: 5 13635.18
# 6: 6 13660.25
# 7: 7 0.00
Data:
tradeData <- data.table(Login = c(859, 859, 859, 859, 859, 859, 859),
OpenTime = c("04/02/2014 07:55", "07/02/2014 03:16", "11/02/2014 12:41", "11/02/2014 13:34", "12/02/2014 13:46",
"13/02/2014 15:33", "25/03/2014 14:52"),
CloseTime = c("05/02/2014 15:37", "07/02/2014 03:51", "13/02/2014 11:56", "11/02/2014 15:34", "14/02/2014 09:59",
"13/02/2014 15:42", "26/03/2014 12:58"),
OpenedValueUSD = c(10000.000, 8960.755, 13635.178, 13635.178, 13660.246, 13606.901, 10000.000),
ClosedValueUSD = c(10000.000, 8960.755, 13606.901, 13635.178, 13649.278, 13606.901, 10000.000),
Year = c(2014, 2014, 2014, 2014, 2014, 2014, 2014),
Month = c(2, 2, 2, 2, 2, 2, 3),
Identifier = c(1, 2, 3, 4, 5, 6, 7))
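The per-row grouping above rescans the whole table once for every trade. As a hedged alternative, and not part of the original answer, the same condition can be sketched as a non-equi self-join, which assumes data.table 1.9.8 or later. The names trades, fmt, res and total are illustrative only, and the table is trimmed to the columns the join needs.

```r
library(data.table)

# Same sample trades as above, reduced to the relevant columns.
trades <- data.table(
  Login = 859,
  OpenTime = c("04/02/2014 07:55", "07/02/2014 03:16", "11/02/2014 12:41",
               "11/02/2014 13:34", "12/02/2014 13:46", "13/02/2014 15:33",
               "25/03/2014 14:52"),
  CloseTime = c("05/02/2014 15:37", "07/02/2014 03:51", "13/02/2014 11:56",
                "11/02/2014 15:34", "14/02/2014 09:59", "13/02/2014 15:42",
                "26/03/2014 12:58"),
  OpenedValueUSD = c(10000.000, 8960.755, 13635.178, 13635.178,
                     13660.246, 13606.901, 10000.000),
  Identifier = 1:7
)

# Parse the day/month/year strings; the UTC time zone is an assumption.
fmt <- "%d/%m/%Y %H:%M"
trades[, OpenTime := as.POSIXct(OpenTime, format = fmt, tz = "UTC")]
trades[, CloseTime := as.POSIXct(CloseTime, format = fmt, tz = "UTC")]

# Self-join: for each trade (i), match the trades (x) with the same Login
# that opened before it and closed after it opened. The strict "<" keeps a
# trade from matching itself. Unmatched rows yield NA, which sums to 0.
res <- trades[trades,
              on = .(Login, OpenTime < OpenTime, CloseTime > OpenTime),
              .(total = sum(x.OpenedValueUSD, na.rm = TRUE)),
              by = .EACHI]

# by = .EACHI returns one row per row of i, in the original row order.
trades[, TransferredValue := res$total]
```

Because the join already includes Login, trades belonging to different logins are never combined.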
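Here is another approach using foverlaps(), which does not require grouping by row. I'll call your data.table dt.
Convert OpenTime and CloseTime to POSIXct format, as shown by @alex23lemm.
Add a temporary column tmpTime that is equal to OpenTime. We will use it in foverlaps().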
dt[, tmpTime := OpenTime]
Call setkey() on the columns Login, OpenTime, CloseTime.
setkey(dt, Login, OpenTime, CloseTime)
Using foverlaps(), we can now find which of the intervals in Login, OpenTime, tmpTime fall completely within Login, OpenTime, CloseTime.
olaps = foverlaps(dt, dt, by.x=c("Login", "OpenTime", "tmpTime"),
which=TRUE, nomatch=0L, type="within")
by.y is automatically taken to be the key columns.
Remove self-overlaps, i.e., drop the rows where xid == yid.
olaps = olaps[xid != yid]
# xid yid
# 1: 4 3
# 2: 5 3
# 3: 6 5
Assign the values corresponding to yid to the xid rows, and remove tmpTime.
dt[olaps$xid, TransferredValue :=
dt$OpenedValueUSD[olaps$yid]][, tmpTime := NULL]
# Login OpenTime CloseTime OpenedValueUSD ClosedValueUSD Year Month TransferredValue Identifier
# 1: 859 2014-02-04 07:55:00 2014-02-05 15:37:00 10000.000 10000.000 2014 2 0.00 1
# 2: 859 2014-02-07 03:16:00 2014-02-07 03:51:00 8960.755 8960.755 2014 2 0.00 2
# 3: 859 2014-02-11 12:41:00 2014-02-13 11:56:00 13635.178 13606.901 2014 2 0.00 3
# 4: 859 2014-02-11 13:34:00 2014-02-11 15:34:00 13635.178 13635.178 2014 2 13635.18 4
# 5: 859 2014-02-12 13:46:00 2014-02-14 09:59:00 13660.246 13649.278 2014 2 13635.18 5
# 6: 859 2014-02-13 15:33:00 2014-02-13 15:42:00 13606.901 13606.901 2014 2 13660.25 6
# 7: 859 2014-03-25 14:52:00 2014-03-26 12:58:00 10000.000 10000.000 2014 3 0.00 7
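In the sample data every trade overlaps at most one earlier trade, so a direct assignment is enough. If a trade can fall inside several open trades, the yid values probably need to be summed per xid first. The following is a minimal sketch of that variation on made-up data: the four trades, their values, and the names totals and total are hypothetical, and TransferredValue is initialised to 0 here so that unmatched rows do not end up as NA.

```r
library(data.table)

# Hypothetical trades: the third one opens while the first two are still open.
dt <- data.table(
  Login = 1L,
  OpenTime  = as.POSIXct(c("2014-02-11 10:00", "2014-02-11 11:00",
                           "2014-02-11 12:00", "2014-02-11 16:00"), tz = "UTC"),
  CloseTime = as.POSIXct(c("2014-02-11 14:00", "2014-02-11 15:00",
                           "2014-02-11 12:30", "2014-02-11 17:00"), tz = "UTC"),
  OpenedValueUSD = c(100, 200, 50, 70)
)

# Zero-length interval [OpenTime, tmpTime] stands for the moment a trade opens.
dt[, tmpTime := OpenTime]
setkey(dt, Login, OpenTime, CloseTime)

# Which opening moments fall within which [OpenTime, CloseTime] intervals?
olaps <- foverlaps(dt, dt, by.x = c("Login", "OpenTime", "tmpTime"),
                   type = "within", which = TRUE, nomatch = 0L)
olaps <- olaps[xid != yid]  # drop each trade's match with itself

# Sum the opened values of all enclosing trades for each xid.
totals <- olaps[, .(total = sum(dt$OpenedValueUSD[yid])), by = xid]

dt[, TransferredValue := 0]                        # default for trades with no overlap
dt[totals$xid, TransferredValue := totals$total]   # fill in the summed values
dt[, tmpTime := NULL]
```

Note that type="within" treats interval endpoints as inclusive, so a trade that opens at exactly the moment another one opens or closes is counted, unlike the strict < and > comparisons in the question.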