Sequential increase in a column value based on a condition in R
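I have an R data frame with an ID column that holds multiple records per ID. Once the flag for an ID is set to 1, I want to create a new timepoint column that starts at 1 on the flagged row and then increases sequentially in steps of 6 (1, 6, 12, ...). How can I achieve this in R using dplyr?
Below is a sample data frame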
ID | Timepoint | Flag |
---|---|---|
A | 0 | 0 |
A | 6 | 0 |
A | 12 | 0 |
A | 18 | 1 |
A | 24 | 0 |
A | 30 | 0 |
A | 36 | 0 |
Expected data frame
ID | Timepoint | Flag | New_Timepoint |
---|---|---|---|
A | 0 | 0 | |
A | 6 | 0 | |
A | 12 | 0 | |
A | 18 | 1 | 1 |
A | 24 | 0 | 6 |
A | 30 | 0 | 12 |
A | 36 | 0 | 18 |
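One option is to group by 'ID', replace the 0 in 'Timepoint' with 1, and take the lag of the result, specifying n as the position of the row where 'Flag' is 1, minus 1.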
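library(dplyr)

df1 %>%
  group_by(ID) %>%
  # replace() turns the 0 timepoint into 1; lag() then shifts the series down
  # by the number of rows before the flagged row (assumes one flagged row per ID)
  mutate(New_Timepoint = dplyr::lag(replace(Timepoint, !Timepoint, 1),
                                    n = which(Flag == 1) - 1)) %>%
  ungroup()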
-output
# A tibble: 7 x 4
# ID Timepoint Flag New_Timepoint
# <chr> <int> <int> <dbl>
#1 A 0 0 NA
#2 A 6 0 NA
#3 A 12 0 NA
#4 A 18 1 1
#5 A 24 0 6
#6 A 30 0 12
#7 A 36 0 18
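To see why the lag lines up with the flagged row, the call can be unpacked on plain vectors. This is only a sketch: the vectors copy the sample data, and the names `shifted_source` and `offset` are made up for illustration.

```r
library(dplyr)

# Same values as the sample data, as plain vectors
timepoint <- c(0L, 6L, 12L, 18L, 24L, 30L, 36L)
flag      <- c(0L, 0L, 0L, 1L, 0L, 0L, 0L)

# Step 1: swap the leading 0 for 1 so the new series starts at 1
shifted_source <- replace(timepoint, timepoint == 0, 1)

# Step 2: count the rows that come before the flagged row
offset <- which(flag == 1) - 1

# Step 3: shift the series down by that many rows; lag() pads the top with NA
new_timepoint <- dplyr::lag(shifted_source, n = offset)
new_timepoint
```

Note that `which(flag == 1)` has to return exactly one position here, so this approach presumes a single flagged row per ID.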
Or use a double cumsum to create the index
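df1 %>%
  group_by(ID) %>%
  # cumsum(cumsum(Flag)) counts 1, 2, 3, ... from the flagged row; na_if() blanks the rows before it.
  # Indexing Timepoint directly would give 0 on the flagged row, so the 0 is swapped for 1 first.
  mutate(New_Timepoint = replace(Timepoint, !Timepoint, 1)[na_if(cumsum(cumsum(Flag)), 0)]) %>%
  ungroup()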
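The double cumsum is easier to follow when the intermediate vectors are printed. The sketch below uses base R only, with vectors copied from the sample data; `started` and `idx` are illustrative names, and the `idx[idx == 0] <- NA` line stands in for `na_if()`.

```r
timepoint <- c(0L, 6L, 12L, 18L, 24L, 30L, 36L)
flag      <- c(0L, 0L, 0L, 1L, 0L, 0L, 0L)

# First cumsum: 0 before the flagged row, 1 from the flagged row onward
started <- cumsum(flag)

# Second cumsum: a running position 1, 2, 3, ... starting at the flagged row
idx <- cumsum(started)

# Rows before the flag get NA so that indexing returns NA for them
idx[idx == 0] <- NA

# Indexing the raw timepoints restarts the series at its first value, which is 0
raw_result <- timepoint[idx]

# Swapping that 0 for 1 before indexing starts the series at 1 instead
adjusted_result <- replace(timepoint, timepoint == 0, 1)[idx]
```

Indexing `Timepoint` itself therefore yields 0 on the flagged row; the series only starts at 1, as in the expected data frame, if the 0 is replaced first.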
data
df1 <- structure(list(ID = c("A", "A", "A", "A", "A", "A", "A"),
Timepoint = c(0L,
6L, 12L, 18L, 24L, 30L, 36L),
Flag = c(0L, 0L, 0L, 1L, 0L, 0L,
0L)), class = "data.frame", row.names = c(NA, -7L))
Another dplyr option
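df1 %>%
  group_by(ID) %>%
  # subtract the flagged row's Timepoint from every row at or after it (NA before it),
  # then pmax() lifts the resulting 0 to 1
  mutate(New_Timepoint = pmax(1, Timepoint - c(NA, Timepoint[Flag == 1])[cumsum(Flag) + 1])) %>%
  ungroup()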
which gives
ID Timepoint Flag New_Timepoint
<chr> <int> <int> <dbl>
1 A 0 0 NA
2 A 6 0 NA
3 A 12 0 NA
4 A 18 1 1
5 A 24 0 6
6 A 30 0 12
7 A 36 0 18
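Since the question mentions multiple records per ID, it may help to check that the grouping restarts the series for each ID. The sketch below applies the `pmax()` expression to a made-up data frame `df2`; the second ID "B" and its values are hypothetical and not part of the original data.

```r
library(dplyr)

# Hypothetical two-ID data: "A" is flagged at Timepoint 18, "B" at Timepoint 6
df2 <- data.frame(
  ID        = c(rep("A", 7), rep("B", 4)),
  Timepoint = c(0L, 6L, 12L, 18L, 24L, 30L, 36L, 0L, 6L, 12L, 18L),
  Flag      = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L)
)

res <- df2 %>%
  group_by(ID) %>%
  # within each ID: the offset is NA before the flag and the flagged Timepoint afterwards
  mutate(New_Timepoint = pmax(1, Timepoint - c(NA, Timepoint[Flag == 1])[cumsum(Flag) + 1])) %>%
  ungroup()

res
```

Because the expression is evaluated per group, each ID gets its own offset, so "B" starts its new series on its second row while "A" starts on its fourth.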