How to create a group based on a pattern from another column?
I have a data frame like the following:
dt <- data.frame(id = c("a","b","c","d","e","f","g","h","i","j"),
value = c(1,2,1,2,1,1,1,2,1,2))
> dt
   id value
1   a     1
2   b     2
3   c     1
4   d     2
5   e     1
6   f     1
7   g     1
8   h     2
9   i     1
10  j     2
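I want to create a column based on the value column so that a new group number starts on the row after each 2 in value (in other words, each 2 closes the current group). The output should look like: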
dtgroup <- data.frame(id = c("a","b","c","d","e","f","g","h","i","j"),
value = c(1,2,1,2,1,1,1,2,1,2),
group = c(1,1,2,2,3,3,3,3,4,4))
> dtgroup
   id value group
1   a     1     1
2   b     2     1
3   c     1     2
4   d     2     2
5   e     1     3
6   f     1     3
7   g     1     3
8   h     2     3
9   i     1     4
10  j     2     4
Any ideas? Thanks!
With cumsum, provided value contains no NA:
dt$group <- head(c(0,cumsum(dt$value==2))+1,-1)
dt
   id value group
1   a     1     1
2   b     2     1
3   c     1     2
4   d     2     2
5   e     1     3
6   f     1     3
7   g     1     3
8   h     2     3
9   i     1     4
10  j     2     4
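The shift in the one-liner above can also be spelled out step by step. This is only a sketch: the helper name group_after_marker and its marker argument are my own invention, not from any package. It uses %in% rather than ==, on the assumption that an NA in value should simply count as "not a 2" instead of turning the rest of the cumulative sum into NA.

```r
# Hypothetical helper; the name and arguments are illustrative assumptions.
group_after_marker <- function(x, marker = 2) {
  is_marker <- x %in% marker   # FALSE for NA, unlike x == marker
  # cumsum() counts markers up to and including each row; subtracting
  # is_marker leaves the number of markers strictly before the row,
  # so a row holding the marker stays in the group it closes.
  cumsum(is_marker) - is_marker + 1L
}

dt <- data.frame(id = letters[1:10],
                 value = c(1, 2, 1, 2, 1, 1, 1, 2, 1, 2))
dt$group <- group_after_marker(dt$value)
```

On the question's data this gives the same group column as the one-liner; with an NA in value, the NA row is just treated as an ordinary non-2 row.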
We can use findInterval as follows:
> transform(dt, group = 1 + findInterval(seq_along(value), which(value == 2), left.open = TRUE))
   id value group
1   a     1     1
2   b     2     1
3   c     1     2
4   d     2     2
5   e     1     3
6   f     1     3
7   g     1     3
8   h     2     3
9   i     1     4
10  j     2     4
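To see why left.open = TRUE matters in the findInterval call, it may help to look at the intermediate pieces separately. A small sketch on the same data (the variable names ends, rows, closed_left and open_left are mine):

```r
dt <- data.frame(id = letters[1:10],
                 value = c(1, 2, 1, 2, 1, 1, 1, 2, 1, 2))

ends <- which(dt$value == 2)   # row numbers where a group closes
rows <- seq_along(dt$value)

# Default intervals are [end, next_end): a row holding a 2 is already
# counted as part of the following group, which is not what was asked.
closed_left <- 1 + findInterval(rows, ends)

# With left.open = TRUE the intervals become (end, next_end], so the row
# holding the 2 stays in the group it closes.
open_left <- 1 + findInterval(rows, ends, left.open = TRUE)
```

Here closed_left puts row 2 into group 2, while open_left keeps it in group 1 as in the desired output.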
Or with cut:
> transform(dt, group = as.integer(cut(seq_along(value), c(-Inf, which(value == 2)))))
   id value group
1   a     1     1
2   b     2     1
3   c     1     2
4   d     2     2
5   e     1     3
6   f     1     3
7   g     1     3
8   h     2     3
9   i     1     4
10  j     2     4
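One caveat with the cut version: it works on the question's data because the last row is a 2. If the data ended on a 1, the rows after the final 2 would fall outside every break and come back as NA. A possible guard, sketched here on a shortened, made-up vector, is to append Inf to the breaks:

```r
value <- c(1, 2, 1, 2, 1, 1)   # made-up data that ends on a 1
rows  <- seq_along(value)
ends  <- which(value == 2)     # positions of the 2s

# Breaks stop at the last 2, so rows 5 and 6 match no interval.
without_inf <- as.integer(cut(rows, c(-Inf, ends)))

# Adding Inf as a final break gives the trailing rows their own interval.
with_inf <- as.integer(cut(rows, c(-Inf, ends, Inf)))
```

When the last row is a 2, the extra break only adds an unused factor level, so the integer codes stay the same.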
Another possibility: add one whenever the value is 1 and the previous value (obtained with dplyr::lag) is not 1.
# default = TRUE treats the (missing) value before the first row as "not 1"
dt$group <- with(dt, cumsum(value == 1 & dplyr::lag(value != 1, default = TRUE)))
dt
   id value group
1   a     1     1
2   b     2     1
3   c     1     2
4   d     2     2
5   e     1     3
6   f     1     3
7   g     1     3
8   h     2     3
9   i     1     4
10  j     2     4
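If pulling in dplyr just for lag feels heavy, the same idea can be sketched in base R by shifting the logical vector by hand. The name prev_not_one is mine, and the leading TRUE plays the role of lag's default for the first row:

```r
dt <- data.frame(id = letters[1:10],
                 value = c(1, 2, 1, 2, 1, 1, 1, 2, 1, 2))

# "Previous value is not 1", shifted down one row; the first row has no
# predecessor, so it is treated as if it followed a non-1.
prev_not_one <- c(TRUE, head(dt$value != 1, -1))

# A new group starts at each 1 that follows a non-1.
dt$group <- cumsum(dt$value == 1 & prev_not_one)
```

Note that this rule is not identical to the cumsum one: with two adjacent 2s, for example c(1, 2, 2, 1), this variant gives 1 1 1 2, whereas counting the 2s gives 1 1 2 3. On the question's data both agree.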