R - Is there a way to generate row numbers that restart at 1 each time a new run of a group begins?
The title probably doesn't make my problem clear. Here is a short sample of my data:
|ID | group |
|---|-------|
| 1 | Banana|
| 2 | Apple |
| 3 | Apple |
| 4 | Apple |
| 5 | Banana|
| 6 | Banana|
| 7 | Apple |
| 8 | Apple |
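Now I want to create a variable that numbers the rows within each group. The problem is that my numbering does not start again from 1 when a new run of the same group appears, so basically it looks like this: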
|ID | group | row_number |
|---|-------|------------|
| 1 | Banana| 1 |
| 2 | Apple | 1 |
| 3 | Apple | 2 |
| 4 | Apple | 3 |
| 5 | Banana| 2 |
| 6 | Banana| 3 |
| 7 | Apple | 4 |
| 8 | Apple | 5 |
But it should look like this:
|ID | group | row_number |
|---|-------|------------|
| 1 | Banana| 1 |
| 2 | Apple | 1 |
| 3 | Apple | 2 |
| 4 | Apple | 3 |
| 5 | Banana| 1 |
| 6 | Banana| 2 |
| 7 | Apple | 1 |
| 8 | Apple | 2 |
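I should mention that I have many observations and far more groups than just Apple and Banana. So, unfortunately, code in which I have to name groups like "Apple" and "Banana" explicitly doesn't help. I tried to solve the problem like this: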
```r
df1 <- df1 %>%
  group_by(group) %>%
  mutate(numbering = row_number())
```
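But the mistake here is obvious: grouping by `group` alone keeps counting across every run of the same value. I have tried to work around this, but it's quite difficult. I would be grateful if anyone has a solution!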
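To see the mistake concretely, here is a minimal base R sketch of what numbering by the group value alone produces. It uses `ave()` in place of the `group_by(group)` attempt and hard-codes the sample `group` column from the table above, so it reproduces the unwanted numbering rather than the desired one:

```r
# The sample group column from the question
group <- c("Banana", "Apple", "Apple", "Apple", "Banana", "Banana", "Apple", "Apple")

# Numbering by the group value alone: all "Apple" rows share one counter and
# all "Banana" rows share another, so later runs keep counting upwards
numbering <- ave(seq_along(group), group, FUN = seq_along)
print(numbering)
# 1 1 2 3 2 3 4 5
```

Each run of consecutive rows needs its own id, which is what every answer below constructs in some way.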
Here are 3 ways to do this:

Base R:
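```r
# Build a run id with rle(), then number the rows within each run.
# Note: rle() needs group to be character, not factor.
df <- transform(df, row_number = ave(ID, with(rle(group),
                    rep(seq_along(values), lengths)), FUN = seq_along))
df
#   ID  group row_number
# 1  1 Banana          1
# 2  2  Apple          1
# 3  3  Apple          2
# 4  4  Apple          3
# 5  5 Banana          1
# 6  6 Banana          2
# 7  7  Apple          1
# 8  8  Apple          2
```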
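The key step in the base R version is the run id built from `rle()`. This sketch, which hard-codes the sample `group` values, prints the intermediate pieces so the `with(rle(group), rep(seq_along(values), lengths))` expression is easier to follow:

```r
group <- c("Banana", "Apple", "Apple", "Apple", "Banana", "Banana", "Apple", "Apple")

# rle() collapses consecutive equal values into runs
runs <- rle(group)
print(runs$values)   # "Banana" "Apple" "Banana" "Apple"
print(runs$lengths)  # 1 3 2 2

# Give each run its own id and repeat it once per row in that run
run_id <- rep(seq_along(runs$values), runs$lengths)
print(run_id)        # 1 2 2 2 3 3 4 4

# Numbering within each run id restarts at 1 for every new run
numbering <- ave(seq_along(group), run_id, FUN = seq_along)
print(numbering)     # 1 1 2 3 1 2 1 2
```

Because the two "Apple" runs get different ids (2 and 4), `ave()` treats them as separate groups.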
dplyr:

```r
library(dplyr)

df %>%
  # a new run starts wherever group differs from the previous row
  group_by(grp = cumsum(group != lag(group, default = first(group)))) %>%
  mutate(row_number = row_number()) %>%
  ungroup() %>%
  select(-grp)
```
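In the dplyr version, `cumsum(group != lag(...))` is what turns runs into ids. A dependency-free sketch of the same idea in base R, assuming `c(group[1], group[-length(group)])` as a stand-in for `lag(group, default = first(group))` and hard-coding the sample values:

```r
group <- c("Banana", "Apple", "Apple", "Apple", "Banana", "Banana", "Apple", "Apple")

# Previous row's value; the first row is compared with itself,
# mirroring lag(group, default = first(group))
previous <- c(group[1], group[-length(group)])

# TRUE wherever a new run starts; cumsum() turns the flags into run ids
grp <- cumsum(group != previous)
print(grp)        # 0 1 1 1 2 2 3 3

# Number the rows within each run id
numbering <- ave(seq_along(group), grp, FUN = seq_along)
print(numbering)  # 1 1 2 3 1 2 1 2
```

The ids start at 0 rather than 1 here, which does not matter because they are only used to split the rows.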
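data.table:

```r
library(data.table)

# rleid() gives each run of consecutive equal values its own id;
# seq_len(.N) then numbers the rows within each run
setDT(df)[, row_number := seq_len(.N), rleid(group)]
```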
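A more compact base R alternative, not part of the original answers, is to skip the run id entirely: since the rows of each run are contiguous, `sequence()` applied to the run lengths from `rle()` produces `1:length` for each run in row order. A minimal sketch on the hard-coded sample values:

```r
group <- c("Banana", "Apple", "Apple", "Apple", "Banana", "Banana", "Apple", "Apple")

# sequence(c(1, 3, 2, 2)) concatenates 1:1, 1:3, 1:2 and 1:2
numbering <- sequence(rle(group)$lengths)
print(numbering)  # 1 1 2 3 1 2 1 2
```

On the data frame this would be `df$row_number <- sequence(rle(df$group)$lengths)`, again assuming `group` is a character column.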
Data:

```r
df <- structure(list(ID = 1:8, group = c("Banana", "Apple", "Apple",
  "Apple", "Banana", "Banana", "Apple", "Apple")), row.names = c(NA,
  -8L), class = "data.frame")
```
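Another way:

```r
library(dplyr)

df %>%
  mutate(Temp = data.table::rleid(group)) %>%  # run id from data.table
  group_by(Temp) %>%
  mutate(row_number = row_number()) %>%
  ungroup() %>%  # without this, select() keeps the grouping column Temp
  select(-Temp)
```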