Cumsum with reset at a flagged column in R?
This is my first time asking a question, so please bear with me.
My dataset (df) looks like this:
animal azimuth south distance
pb1 187.561 1 1.992
pb1 147.219 1 8.567
pb1 71.032 0 5.754
pb1 119.502 1 10.451
pb2 101.702 1 9.227
pb2 85.715 0 8.821
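I want to create an additional column (df$cumdist) that accumulates distance, but only within each individual animal and only while df$south==1. I want the cumulative sum to reset whenever df$south==0.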
This is the result I am hoping for (worked out by hand):
animal azimuth south distance cumdist
pb1 187.561 1 1.992 1.992
pb1 147.219 1 8.567 10.559
pb1 71.032 0 5.754 0
pb1 119.502 1 10.451 10.451
pb2 101.702 1 9.227 9.227
pb2 85.715 0 8.821 0
This is the code I tried for the cumsum:
swim.az$cumdist <- cumsum(ifelse(swim.az$south==1, swim.az$distance, 0))
While it successfully stops adding when df$south==0, it does not reset. Also, I know I would need to embed it in a for loop to subset by animal.
Thank you very much!
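To make the symptom concrete, here is a minimal sketch that runs the attempted one-liner on the sample data from the question (rebuilt here as df, without the azimuth column, so the block runs on its own). The running total pauses on south == 0 rows but then carries on from the old value, and it also carries over from pb1 into pb2.

```r
# Sample data copied from the question (azimuth omitted for brevity)
df <- data.frame(
  animal   = c("pb1", "pb1", "pb1", "pb1", "pb2", "pb2"),
  south    = c(1L, 1L, 0L, 1L, 1L, 0L),
  distance = c(1.992, 8.567, 5.754, 10.451, 9.227, 8.821)
)

# The attempt: zero out the non-south rows, then take one global cumsum
attempt <- cumsum(ifelse(df$south == 1, df$distance, 0))
attempt
# [1]  1.992 10.559 10.559 21.010 30.237 30.237
```

Row 3 repeats 10.559 instead of dropping to 0, and row 5 continues from pb1's total, which is why both a reset key and a per-animal grouping are needed.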
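We multiply 'south' by 'distance' to create 'cumdist', which turns the 'distance' values on rows where 'south' is 0 into 0. We then group by 'animal' and by a group created from the cumulative sum of the logical vector (south == 0), take the cumsum of 'cumdist' within each group, ungroup, and remove the helper column that is no longer needed (grp).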
library(dplyr)
dfN %>%
  mutate(cumdist = south * distance) %>%
  group_by(animal, grp = cumsum(south == 0)) %>%
  mutate(cumdist = cumsum(cumdist)) %>%
  ungroup() %>%
  select(-grp)
# A tibble: 6 x 5
# animal azimuth south distance cumdist
# <chr> <dbl> <int> <dbl> <dbl>
#1 pb1 188. 1 1.99 1.99
#2 pb1 147. 1 8.57 10.6
#3 pb1 71.0 0 5.75 0
#4 pb1 120. 1 10.5 10.5
#5 pb2 102. 1 9.23 9.23
#6 pb2 85.7 0 8.82 0
Or a similar approach in base R:
with(dfN, ave(distance * south, animal, cumsum(!south), FUN = cumsum))
#[1] 1.992 10.559 0.000 10.451 9.227 0.000
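The grouping key is the part that does the resetting, so it may help to look at it on its own. The sketch below uses only base R and rebuilds the relevant columns as plain vectors; the names grp and contrib are illustrative and not part of the answer above.

```r
# Columns from the sample data, as plain vectors
animal   <- c("pb1", "pb1", "pb1", "pb1", "pb2", "pb2")
south    <- c(1L, 1L, 0L, 1L, 1L, 0L)
distance <- c(1.992, 8.567, 5.754, 10.451, 9.227, 8.821)

# Every south == 0 row bumps the counter, so it starts a new group
grp <- cumsum(south == 0)
grp
# [1] 0 0 1 1 1 2

# Rows with south == 0 contribute 0, so each new group starts from 0
contrib <- distance * south

# Cumulative sum within each (animal, grp) combination
cumdist <- ave(contrib, animal, grp, FUN = cumsum)
cumdist
# [1]  1.992 10.559  0.000 10.451  9.227  0.000
```

Because the south == 0 row is the first row of its group and contributes 0, the rows that follow it accumulate from 0 again. Adding animal as a second grouping variable stops a run from spilling over from one animal into the next.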
Data
dfN <- structure(list(animal = c("pb1", "pb1", "pb1", "pb1", "pb2",
"pb2"), azimuth = c(187.561, 147.219, 71.032, 119.502, 101.702,
85.715), south = c(1L, 1L, 0L, 1L, 1L, 0L), distance = c(1.992,
8.567, 5.754, 10.451, 9.227, 8.821)), class = "data.frame",
row.names = c(NA, -6L))
library(data.table)
setDT(df)
df[, cumdist := south*cumsum(distance), .(animal, rleid(south))]
# animal azimuth south distance cumdist
# 1: pb1 187.561 1 1.992 1.992
# 2: pb1 147.219 1 8.567 10.559
# 3: pb1 71.032 0 5.754 0.000
# 4: pb1 119.502 1 10.451 10.451
# 5: pb2 101.702 1 9.227 9.227
# 6: pb2 85.715 0 8.821 0.000
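For comparison, since the question mentions a for loop, the same reset logic can be written as an explicit loop in base R. This is only a sketch: reset_cumsum is a hypothetical helper name, and it assumes the rows are ordered so that each animal's rows are contiguous.

```r
# Hypothetical helper: running total that resets on south == 0
# and whenever the animal changes (assumes rows are sorted by animal)
reset_cumsum <- function(distance, south, animal) {
  out <- numeric(length(distance))
  running <- 0
  for (i in seq_along(distance)) {
    # A new animal starts a fresh running total
    if (i > 1 && animal[i] != animal[i - 1]) running <- 0
    if (south[i] == 0) {
      running <- 0                      # flagged row: reset
    } else {
      running <- running + distance[i]  # south row: accumulate
    }
    out[i] <- running
  }
  out
}

animal   <- c("pb1", "pb1", "pb1", "pb1", "pb2", "pb2")
south    <- c(1L, 1L, 0L, 1L, 1L, 0L)
distance <- c(1.992, 8.567, 5.754, 10.451, 9.227, 8.821)

result <- reset_cumsum(distance, south, animal)
result
# [1]  1.992 10.559  0.000 10.451  9.227  0.000
```

The grouped versions above are shorter and are usually preferable on large data; the loop is mainly useful for checking the logic step by step.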