Replace subsequent values in a df based on condition of first value in group
I have this type of data in an ordered R data frame.
set.seed(25)
date <- sort(as.Date(
  sample(as.numeric(as.Date("2019-01-01")):as.numeric(as.Date("2021-03-31")),
         10, replace = TRUE),
  origin = "1970-01-01"
))
type <- c("Football", "Football", "Rugby", "Football", "Hockey", "Tennis", "Hockey", "Basketball", "Basketball", "Rugby")
id <- c("1","1","1","1","2","2","3","4","4","5")
df <- data.frame(date, id, type)
date id type
2019-04-09 1 Football
2019-04-13 1 Football
2019-04-20 1 Rugby
2019-04-21 1 Football
2019-05-31 2 Hockey
2020-02-09 2 Tennis
2020-03-08 3 Hockey
2020-03-24 4 Basketball
2020-08-18 4 Basketball
2020-11-01 5 Rugby
The result I am trying to get is this:
date id type type_2
2019-04-09 1 Football Football
2019-04-13 1 Football Football
2019-04-20 1 Rugby Multi
2019-04-21 1 Football Multi
2019-05-31 2 Hockey Hockey
2020-02-09 2 Tennis Multi
2020-03-08 3 Hockey Hockey
2020-03-24 4 Basketball Basketball
2020-08-18 4 Basketball Basketball
2020-11-01 5 Rugby Rugby
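Basically, type_2 keeps the first sport an id practices for as long as each following sport is the same, but as soon as the sport changes, all remaining values for that id become Multi.
I tried to do this with lag(), lead() and if_else() in dplyr, but the result never comes out the way I want.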
You can use rleid from data.table to generate a run-length ID for the type variable within each id. Everything after the first change becomes "Multi".
library(data.table)
setDT(df)[, type2 := replace(type, rleid(type) > 1, 'Multi'), id]
df
# date id type type2
# 1: 2019-02-18 1 Football Football
# 2: 2019-02-28 1 Football Football
# 3: 2019-03-13 1 Rugby Multi
# 4: 2019-09-29 1 Football Multi
# 5: 2019-10-09 2 Hockey Hockey
# 6: 2020-03-19 2 Tennis Multi
# 7: 2020-04-21 3 Hockey Hockey
# 8: 2020-06-19 4 Basketball Basketball
# 9: 2020-09-08 4 Basketball Basketball
#10: 2020-10-08 5 Rugby Rugby
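To see why the rleid(type) > 1 condition works, it can help to look at rleid on a single group. This is a minimal sketch using the type values of id 1 from the question; the variable names x and run are only for illustration.

```r
library(data.table)

# The type values for id "1", in date order
x <- c("Football", "Football", "Rugby", "Football")

# rleid() numbers each run of consecutive equal values: 1, 2, 3, ...
run <- rleid(x)
run
# [1] 1 1 2 3

# Every position past the first run (run > 1) is replaced with "Multi"
out <- replace(x, run > 1, "Multi")
out
```

Because the last Football belongs to run 3 rather than run 1, it is also replaced, which matches the desired output for id 1.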
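If you prefer to write this with dplyr (rleid still comes from data.table):
library(dplyr)
library(data.table)  # provides rleid()
df %>%
  group_by(id) %>%
  mutate(type2 = replace(type, rleid(type) > 1, 'Multi')) %>%
  ungroup()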
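Since the question mentions if_else(), here is a sketch of a dplyr-only variant that does not need rleid. It is a swapped-in approach, not the method above: it uses dplyr's cumany() (a cumulative "any") to flag every row from the first one whose type differs from the group's first type. It assumes type is a character column rather than a factor and that rows are already ordered by date within each id. The data frame is rebuilt from the question's id and type vectors so the block runs by itself, and res is a name chosen for illustration.

```r
library(dplyr)

# id and type as defined in the question (date omitted; rows are already ordered)
df <- data.frame(
  id   = c("1", "1", "1", "1", "2", "2", "3", "4", "4", "5"),
  type = c("Football", "Football", "Rugby", "Football", "Hockey", "Tennis",
           "Hockey", "Basketball", "Basketball", "Rugby"),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(id) %>%
  # cumany() stays TRUE from the first row whose type differs from the
  # group's first type, so that row and all later rows become "Multi"
  mutate(type2 = if_else(cumany(type != first(type)), "Multi", type)) %>%
  ungroup()

res
```

Within each group the first change of type is also the first row that differs from the first type, so this flags the same rows as rleid(type) > 1.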