R 使用 case_when 按组跟踪列中的变化
R Using case_when to track changes in a column by group
我有一个课程注册数据集,我试图在其中跟踪学生在整个学期中是否放弃、添加或保留了一门课程,并确定他们的注册情况 'path'。 IE。我想记录他们是否注册了 BIOL101 并放弃了它以参加 BIOL202。我的数据框如下所示:
YRTR TECH_ID COU_ID SUBJ COU_NBR GENDER RACE sub_cou status path
20173 108 217 MUSC 2231 Male White MUSC 2231 retained
20173 108 218 MUSC 2281 Male White MUSC 2281 retained
20173 8429 574 ECON 2201 Male White ECON 2201 retained
20173 8429 720 BUSN 2120 Male White BUSN 2120 retained
20173 9883 60 ECON 2202 Male White ECON 2202 added
20173 15515 95 PHIL 1102 Female White PHIL 1102 retained
20183 8207 478 ART 1102 Female White ART 1102 retained
20183 8207 1306 ART 1130 Female White ART 1130 added
20183 8207 403 ART 1125 Female White ART 1125 dropped
我正在尝试填写最右侧的“路径”栏。这个想法是,如果学生保留在第一行中的课程中,路径将显示为 2231->2231
。具体来说,我正在查看 WITHIN 科目的课程转学。因此,在数据集的末尾,ID 8207 将有一个看起来像 1102->1102
的路径和另一个看起来像 1125->1130
我最初尝试将数据帧拆分为两个数据帧(一个在下降期之前,一个在下降期之后),然后像这样重新加入它们:
data5 <- merge(x=post_drop, y=pre_drop, by=c("TECH_ID", "YRTR", "SUBJ"), all=TRUE)
然后使用case_when分配路径:
data5$status.x=="retained" ~ paste0(data5$COU_NBR.x, "->", data5$COU_NBR.x),
((data5$status.x=="added") & (data5$status.y=="dropped")) ~ paste0(data5$COU_NBR.y, "->", data5$COU_NBR.x),
((data5$status.x=="dropped") & (data5$status.y=="added")) ~ paste0(data5$COU_NBR.x, "->", data5$COU_NBR.y)
)
但这并没有让我到达我想要的位置 - 它在路径中留下了很多 NA,也没有告诉我学生是否放弃了一个科目中的课程并且没有注册另一个科目(即放弃BIOL101 而不是服用另一个 BIOL class) 在这种情况下我想要类似 101->NA
的东西或者当 class 被简单地添加时(即它们没有在 BIOL class 中注册] 最初但决定注册 BIOL101) 格式如下 NA->101
已编辑解决方案 9 月 27 日
再次问好@alexvc 这是一个开始。了解一点你的数据。您忘记了学生丢掉 1 并添加 2 的情况,在这种情况下,“路径”变得混乱。我已经为您提供了一个解决方案,可以清楚地显示 path
。
library(dplyr)
library(tidyr)
data5 %>%
group_by(YRTR, TECH_ID, SUBJ, status) %>%
mutate(numbadd =
case_when(
status == "added" ~ 1,
TRUE ~ 0
),
numbdrop =
case_when(
status == "dropped" ~ 1,
TRUE ~ 0
),
rightside =
case_when(
numbadd == 1 ~ paste(COU_NBR, collapse = " and ")
),
leftside =
case_when(
numbdrop == 1 ~ paste(COU_NBR, collapse = " and ")
)
) %>%
group_by(YRTR, TECH_ID, SUBJ) %>%
mutate(total_add_drop = ifelse(status == "retained",
0,
sum(numbadd) + sum(numbdrop))) %>%
tidyr::fill(leftside, rightside, .direction = "downup") %>%
group_by(YRTR, TECH_ID, SUBJ, status) %>%
mutate(PATH =
case_when(
status == "retained" ~ paste(COU_NBR,
COU_NBR,
sep = " -> "),
status == "added" & total_add_drop == 1 ~ paste("NA",
COU_NBR,
sep = " -> "),
status == "dropped" & total_add_drop == 1 ~ paste(COU_NBR,
"NA",
sep = " -> "),
total_add_drop >= 2 ~ paste(leftside,
rightside,
sep = " -> "),
TRUE ~ "Theres a problem"
)) %>%
arrange(YRTR, TECH_ID) %>%
select(-COU_ID, -GENDER, -RACE, -rightside, -leftside, -numbadd, -numbdrop, -total_add_drop)
#> # A tibble: 17 x 7
#> # Groups: YRTR, TECH_ID, SUBJ, status [13]
#> YRTR TECH_ID SUBJ COU_NBR sub_cou status PATH
#> <dbl> <dbl> <chr> <dbl> <chr> <chr> <chr>
#> 1 20173 108 MUSC 2231 MUSC 2231 retained 2231 -> 2231
#> 2 20173 108 MUSC 2281 MUSC 2281 retained 2281 -> 2281
#> 3 20173 3889 ECON 2202 ECON 2202 dropped 2202 -> NA
#> 4 20173 8429 ECON 2201 ECON 2201 retained 2201 -> 2201
#> 5 20173 8429 BUSN 2120 BUSN 2120 retained 2120 -> 2120
#> 6 20173 9883 ECON 2202 ECON 2202 added NA -> 2202
#> 7 20173 15515 PHIL 1102 PHIL 1102 retained 1102 -> 1102
#> 8 20183 8207 ART 1102 ART 1102 retained 1102 -> 1102
#> 9 20183 8207 ART 1130 ART 1130 added 1125 -> 1130 and 2345
#> 10 20183 8207 ART 2345 ART 2345 added 1125 -> 1130 and 2345
#> 11 20183 8207 ART 1125 ART 1125 dropped 1125 -> 1130 and 2345
#> 12 20183 8209 ART 2345 ART 2345 added 1125 -> 2345
#> 13 20183 8209 ART 1125 ART 1125 dropped 1125 -> 2345
#> 14 20183 8270 PSYC 1001 PSYC 1001 dropped 1001 and 1002 -> 1003 and 1004
#> 15 20183 8270 PSYC 1003 PSYC 1003 added 1001 and 1002 -> 1003 and 1004
#> 16 20183 8270 PSYC 1002 PSYC 1002 dropped 1001 and 1002 -> 1003 and 1004
#> 17 20183 8270 PSYC 1004 PSYC 1004 added 1001 and 1002 -> 1003 and 1004
你的数据和额外的测试用例
data5 <- readr::read_table(
" YRTR TECH_ID COU_ID SUBJ COU_NBR GENDER RACE sub_cou status
20173 108 217 MUSC 2231 Male White MUSC 2231 retained
20173 108 218 MUSC 2281 Male White MUSC 2281 retained
20173 8429 574 ECON 2201 Male White ECON 2201 retained
20173 8429 720 BUSN 2120 Male White BUSN 2120 retained
20173 9883 60 ECON 2202 Male White ECON 2202 added
20173 3889 60 ECON 2202 Male White ECON 2202 dropped
20173 15515 95 PHIL 1102 Female White PHIL 1102 retained
20183 8207 478 ART 1102 Female White ART 1102 retained
20183 8207 1306 ART 1130 Female White ART 1130 added
20183 8207 1307 ART 2345 Female White ART 2345 added
20183 8207 403 ART 1125 Female White ART 1125 dropped
20183 8270 1306 PSYC 1001 Female Black PSYC 1001 dropped
20183 8270 1307 PSYC 1003 Female Black PSYC 1003 added
20183 8270 403 PSYC 1002 Female Black PSYC 1002 dropped
20183 8209 1307 ART 2345 Female White ART 2345 added
20183 8270 1306 PSYC 1004 Female Black PSYC 1004 added
20183 8209 403 ART 1125 Female White ART 1125 dropped")