Calculating the duration of each run in a binary/boolean column
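I have two columns. One is a series of values recorded as True/False, and the dataset also has a timestamp column. I want to write code that detects when the boolean column turns true, measures the time from the timestamp column until the boolean turns back to false, repeats this for the whole series, and puts the durations into a data frame for a histogram. Apologies for the poor attempt below; I really don't know where to start. Note that the running column is stored as character; maybe I need to convert it to boolean for this to work?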
running <- c("t","t","f","f","t","f","t","t")
time <- c("2022-01-01 00:00:10", "2022-01-01 00:00:20","2022-01-01 00:00:30","2022-01-01 00:00:40","2022-01-01 00:00:50","2022-01-01 00:01:00","2022-01-01 00:01:10","2022-01-01 00:01:20")
dataset <- data.frame(time, running)
datafinal <- data.frame()
for (i in dataset){
if running == f,
result <- sum(i:n)
datafinal <- c(datafinal, result)
}
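Converting the running column to boolean and using a for-loop is one approach. Alternatively, you can work directly on the data frame. You already have one! Here is a solution using the tidyverse libraries, with the date handling done by the lubridate library. I encourage you to learn these libraries for this kind of problem.
The rleid() function from the data.table library assigns a run id to each row: the id increases by 1 every time the value in the target column running changes, so consecutive rows in the same state share the same id.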
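To make the run id concrete, here is a minimal base R sketch of the same idea. It uses rle() instead of data.table, so it only illustrates what the id looks like and is not data.table's implementation; the names rl and run_id are just for illustration.

```r
running <- c("t", "t", "f", "f", "t", "f", "t", "t")

# rle() compresses the vector into runs: values t f t f t, lengths 2 2 1 1 2
rl <- rle(running)

# number the runs 1..k and repeat each number once per row in that run
run_id <- rep(seq_along(rl$lengths), rl$lengths)

print(run_id)
# [1] 1 1 2 2 3 4 5 5
```

Grouping by this id is what lets each run of identical values be treated as a single unit.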
running <- c("t","t","f","f","t","f","t","t")
time <- c("2022-01-01 00:00:10", "2022-01-01 00:00:20","2022-01-01 00:00:30","2022-01-01 00:00:40","2022-01-01 00:00:50","2022-01-01 00:01:00","2022-01-01 00:01:10","2022-01-01 00:01:20")
dataset <- data.frame(time, running)
library(tidyverse)

# parse the character timestamps into date-time (POSIXct) objects
dataset$time <- lubridate::ymd_hms(dataset$time, tz = "UTC")

solution <- dataset %>%
  mutate(Grp = data.table::rleid(running)) %>%  # consecutive rows in the same state share an id
  group_by(Grp) %>%                             # group the rows of each run
  slice(1) %>%                                  # keep the first row of each run (its start time)
  ungroup() %>%                                 # grouping is no longer needed
  # time since the previous run started, i.e. how long the previous state lasted
  mutate(timeLength = difftime(time, lag(time), units = "secs"))
Output:
# A tibble: 5 x 4
  time                running   Grp timeLength
  <dttm>              <chr>   <int> <drtn>
1 2022-01-01 00:00:10 t           1 NA secs
2 2022-01-01 00:00:30 f           2 20 secs
3 2022-01-01 00:00:50 t           3 20 secs
4 2022-01-01 00:01:00 f           4 10 secs
5 2022-01-01 00:01:10 t           5 10 secs
If you want to drop the first row (the one with NA), add %>% filter(!is.na(timeLength)) to the end of the pipeline.
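Since the question asks only for the time spent in the running state, the durations can also be attached to the run they belong to. The following is a base R sketch under the same convention as the output above: a run is assumed to end at the timestamp where the next state begins, so the last run has no known end. The names rl, start_idx, runs and running_secs are illustrative.

```r
running <- c("t", "t", "f", "f", "t", "f", "t", "t")
time <- as.POSIXct(
  c("2022-01-01 00:00:10", "2022-01-01 00:00:20", "2022-01-01 00:00:30",
    "2022-01-01 00:00:40", "2022-01-01 00:00:50", "2022-01-01 00:01:00",
    "2022-01-01 00:01:10", "2022-01-01 00:01:20"),
  tz = "UTC"
)

# find the runs and the row index at which each run starts
rl <- rle(running)
start_idx <- cumsum(c(1, head(rl$lengths, -1)))

runs <- data.frame(state = rl$values, start = time[start_idx])

# duration of a run = start of the next run minus its own start;
# the final run is still open, so its duration is NA
runs$duration <- c(as.numeric(diff(runs$start), units = "secs"), NA)

# keep only the completed runs in the "t" (running) state
running_secs <- runs$duration[runs$state == "t" & !is.na(runs$duration)]
print(running_secs)
# [1] 20 10

hist(running_secs, xlab = "seconds running", main = "Duration of running periods")
```

With real data there would be many more completed runs, and the same running_secs vector feeds the histogram directly.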
Update: added an approach that uses a for-loop with nested if-else. Note, however, that the code is longer and harder to follow.
# parse the character timestamps into date-time objects
dataset$time <- lubridate::ymd_hms(dataset$time, tz = "UTC")
# vector of the running values seen so far, used to detect changes
current <- c()
# empty data frame that collects one row per state change
datafinal <- data.frame()
# iterate over the row indices
for (i in seq_len(nrow(dataset))) {
  # record the current value of running
  current <- c(current, dataset[i, ]$running)
  if (i > 1) {  # the first row has no previous row to compare with
    if (current[i] == current[i - 1]) {
      next  # same state as before: skip the rest of this iteration
    } else {  # the state changed: measure how long the previous state lasted
      result <- difftime(dataset[i, ]$time, previous_time, units = "secs")
    }
    # only reached when the state changed, because 'next' skips everything below it
    # build the output row for this change
    new_row <- cbind(dataset[i, ], result)
    # append it to the final data frame
    datafinal <- rbind(datafinal, new_row)
  }
  # remember the time at which the current state started
  # (runs on the first row and after every state change)
  previous_time <- dataset[i, ]$time
}