Vectorizing a for loop with data.table when comparing across multiple rows
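In short, I am trying to vectorize my data.table code and remove 2 for loops. Specifically, I am comparing two different rows and cannot figure out how to vectorize my code. Details follow: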
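I am trying to count how many times a fish crosses a line, given the fish's coordinates. I only care about movement in one direction (e.g., north to south, but not south to north). The actual data are two-dimensional and have hundreds of thousands of observations. I have created a one-dimensional, reproducible example.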
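I have looked through the data.table FAQ and searched through SO using "vectorize data.table". If I am not "asking the right question" (i.e., searching with the correct terms), I would appreciate pointers on what I should be searching for to solve my problem.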
Here is my example and what I am currently doing:
library(data.table)
dt = data.table(
    fish = rep(c("a", "b"), each = 4),
    time = rep(c(1:4), 2),
    location = c(1, 1, 2, 2, 1, 1, 1, 1))
crossLine = 1.5 # Coordinates that I care about
dt[ , Cross := 0] ## did the fish cross the line during the previous time step?
fishes = dt[ , unique(fish)]
for(fishIndex in fishes){ # loop through each fish
    sampleTime = dt[ fishIndex == fish, time]
    nObs = length(sampleTime)
    ## In the real dataset, the no. of observations varies by fish
    for(timeIndex in 1:(nObs - 1)){ # loop through each time point
        if(dt[ fishIndex == fish & sampleTime[timeIndex] == time,
               location <= crossLine] &
           dt[ fishIndex == fish & sampleTime[timeIndex + 1] == time,
               location > crossLine]
           ){dt[ fishIndex == fish & time == sampleTime[timeIndex + 1],
                 Cross := 1] # record if the fish crossed the line
        }
    }
}
My ideal solution would look something like this:
moveCheck <- Vectorize(function(...))
dt[ , Cross := moveCheck(location, fish)]
fish is inside the function so that I do not accidentally record movement when switching from one fish to the next.
So, my question: what is the way to use data.table syntax to improve the performance of this code and eliminate the loops?
Does this work for you (it works for the OP's example, but I'm not sure how representative that is)?
dt[, cross := c(0, diff(location >= crossLine) > 0), by = fish]
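Two details of the one-liner above may matter on real data. It writes to a new column named cross, not the existing Cross, because column names are case-sensitive. It also tests location >= crossLine, so a fish sitting exactly on the line is treated differently from the loop, which uses <= for the previous row and > for the current one. Below is a minimal sketch of a variant that mirrors the loop's comparisons using data.table's shift(). It assumes rows can be put in time order within each fish; the variable name prev is only illustrative.

```r
library(data.table)

# Same toy data as in the question
dt <- data.table(
  fish     = rep(c("a", "b"), each = 4),
  time     = rep(1:4, 2),
  location = c(1, 1, 2, 2, 1, 1, 1, 1)
)
crossLine <- 1.5

# Assumption: a crossing is defined between consecutive time points,
# so rows are sorted by time within each fish first.
setorder(dt, fish, time)

dt[, Cross := {
  # Previous row's location within this fish; NA for the first observation
  prev <- shift(location)
  # 1 if the fish was at or below the line before and is above it now
  as.numeric(!is.na(prev) & prev <= crossLine & location > crossLine)
}, by = fish]

print(dt)
```

Because the comparison is done by = fish, the first row of each fish has no previous location and is never flagged, which is what keeps movement from being recorded when switching between fish. On this example the result should match the loop: only fish a at time 3 is marked.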