R 使用 case_when 按组跟踪列中的变化

R Using case_when to track changes in a column by group

我有一个课程注册数据集,我试图在其中跟踪学生在整个学期中是否放弃、添加或保留了一门课程,并确定他们的注册情况 'path'。 IE。我想记录他们是否注册了 BIOL101 并放弃了它以参加 BIOL202。我的数据框如下所示:

YRTR    TECH_ID COU_ID  SUBJ    COU_NBR GENDER  RACE    sub_cou     status     path
20173   108      217    MUSC    2231    Male    White   MUSC 2231   retained
20173   108      218    MUSC    2281    Male    White   MUSC 2281   retained
20173   8429     574    ECON    2201    Male    White   ECON 2201   retained
20173   8429     720    BUSN    2120    Male    White   BUSN 2120   retained
20173   9883     60     ECON    2202    Male    White   ECON 2202   added
20173   15515    95     PHIL    1102    Female  White   PHIL 1102   retained
20183   8207     478    ART     1102    Female  White   ART 1102    retained
20183   8207     1306   ART     1130    Female  White   ART 1130    added
20183   8207     403    ART     1125    Female  White   ART 1125    dropped


我正在尝试填写最右侧的“路径”栏。这个想法是,如果学生保留在第一行中的课程中,路径将显示为 2231->2231。具体来说,我正在查看 WITHIN 科目的课程转学。因此,在数据集的末尾,ID 8207 将有一个看起来像 1102->1102 的路径和另一个看起来像 1125->1130

的路径

我最初尝试将数据帧拆分为两个数据帧(一个在下降期之前,一个在下降期之后),然后像这样重新加入它们:

data5 <- merge(x=post_drop, y=pre_drop, by=c("TECH_ID", "YRTR", "SUBJ"), all=TRUE)

然后使用case_when分配路径:

data5$status.x=="retained" ~ paste0(data5$COU_NBR.x, "->", data5$COU_NBR.x),
((data5$status.x=="added") & (data5$status.y=="dropped")) ~ paste0(data5$COU_NBR.y, "->", data5$COU_NBR.x),
((data5$status.x=="dropped") & (data5$status.y=="added")) ~ paste0(data5$COU_NBR.x, "->", data5$COU_NBR.y)                
)

但这并没有让我到达我想要的位置 - 它在路径中留下了很多 NA,也没有告诉我学生是否放弃了一个科目中的课程并且没有注册另一个科目(即放弃BIOL101 而不是服用另一个 BIOL class) 在这种情况下我想要类似 101->NA 的东西或者当 class 被简单地添加时(即它们没有在 BIOL class 中注册] 最初但决定注册 BIOL101) 格式如下 NA->101

已编辑解决方案 9 月 27 日

再次问好@alexvc 这是一个开始。了解一点你的数据。您忘记了学生丢掉 1 并添加 2 的情况,在这种情况下,“路径”变得混乱。我已经为您提供了一个解决方案,可以清楚地显示 path

library(dplyr)
library(tidyr)

data5 %>%
   group_by(YRTR, TECH_ID, SUBJ, status) %>%
   mutate(numbadd =
             case_when(
                status == "added" ~ 1,
                TRUE ~ 0
             ),
          numbdrop =
             case_when(
                status == "dropped" ~ 1,
                       TRUE ~ 0
            ),
          rightside =
             case_when(
                numbadd == 1 ~ paste(COU_NBR, collapse = " and ")
                ),
          leftside =
             case_when(
                numbdrop == 1 ~ paste(COU_NBR, collapse = " and ")
             )
   ) %>%
   group_by(YRTR, TECH_ID, SUBJ) %>%
   mutate(total_add_drop = ifelse(status == "retained", 
                                  0, 
                                  sum(numbadd) + sum(numbdrop))) %>%
   tidyr::fill(leftside, rightside, .direction = "downup") %>%
   group_by(YRTR, TECH_ID, SUBJ, status) %>%
   mutate(PATH =
             case_when(
                status == "retained" ~ paste(COU_NBR, 
                                             COU_NBR, 
                                             sep = " -> "),
                status == "added" & total_add_drop == 1 ~ paste("NA", 
                                                                COU_NBR, 
                                                                sep = " -> "),
                status == "dropped" & total_add_drop == 1 ~ paste(COU_NBR, 
                                                                  "NA", 
                                                                  sep = " -> "),
                total_add_drop >= 2 ~ paste(leftside, 
                                            rightside, 
                                            sep = " -> "),
                TRUE ~ "Theres a problem"
             )) %>%
   arrange(YRTR, TECH_ID) %>%
   select(-COU_ID, -GENDER, -RACE, -rightside, -leftside, -numbadd, -numbdrop, -total_add_drop)

#> # A tibble: 17 x 7
#> # Groups:   YRTR, TECH_ID, SUBJ, status [13]
#>     YRTR TECH_ID SUBJ  COU_NBR sub_cou   status   PATH                          
#>    <dbl>   <dbl> <chr>   <dbl> <chr>     <chr>    <chr>                         
#>  1 20173     108 MUSC     2231 MUSC 2231 retained 2231 -> 2231                  
#>  2 20173     108 MUSC     2281 MUSC 2281 retained 2281 -> 2281                  
#>  3 20173    3889 ECON     2202 ECON 2202 dropped  2202 -> NA                    
#>  4 20173    8429 ECON     2201 ECON 2201 retained 2201 -> 2201                  
#>  5 20173    8429 BUSN     2120 BUSN 2120 retained 2120 -> 2120                  
#>  6 20173    9883 ECON     2202 ECON 2202 added    NA -> 2202                    
#>  7 20173   15515 PHIL     1102 PHIL 1102 retained 1102 -> 1102                  
#>  8 20183    8207 ART      1102 ART 1102  retained 1102 -> 1102                  
#>  9 20183    8207 ART      1130 ART 1130  added    1125 -> 1130 and 2345         
#> 10 20183    8207 ART      2345 ART 2345  added    1125 -> 1130 and 2345         
#> 11 20183    8207 ART      1125 ART 1125  dropped  1125 -> 1130 and 2345         
#> 12 20183    8209 ART      2345 ART 2345  added    1125 -> 2345                  
#> 13 20183    8209 ART      1125 ART 1125  dropped  1125 -> 2345                  
#> 14 20183    8270 PSYC     1001 PSYC 1001 dropped  1001 and 1002 -> 1003 and 1004
#> 15 20183    8270 PSYC     1003 PSYC 1003 added    1001 and 1002 -> 1003 and 1004
#> 16 20183    8270 PSYC     1002 PSYC 1002 dropped  1001 and 1002 -> 1003 and 1004
#> 17 20183    8270 PSYC     1004 PSYC 1004 added    1001 and 1002 -> 1003 and 1004

你的数据和额外的测试用例

data5 <- readr::read_table(
   "   YRTR    TECH_ID COU_ID  SUBJ    COU_NBR GENDER  RACE    sub_cou     status
   20173   108      217    MUSC    2231    Male    White   MUSC 2231   retained
   20173   108      218    MUSC    2281    Male    White   MUSC 2281   retained
   20173   8429     574    ECON    2201    Male    White   ECON 2201   retained
   20173   8429     720    BUSN    2120    Male    White   BUSN 2120   retained
   20173   9883     60     ECON    2202    Male    White   ECON 2202   added
   20173   3889     60     ECON    2202    Male    White   ECON 2202   dropped
   20173   15515    95     PHIL    1102    Female  White   PHIL 1102   retained
   20183   8207     478    ART     1102    Female  White   ART 1102    retained
   20183   8207     1306   ART     1130    Female  White   ART 1130    added
   20183   8207     1307   ART     2345    Female  White   ART 2345    added
   20183   8207     403    ART     1125    Female  White   ART 1125    dropped
   20183   8270     1306   PSYC    1001    Female  Black   PSYC 1001    dropped
   20183   8270     1307   PSYC    1003    Female  Black   PSYC 1003    added
   20183   8270     403    PSYC    1002    Female  Black   PSYC 1002    dropped
   20183   8209     1307   ART     2345    Female  White   ART 2345    added
   20183   8270     1306   PSYC    1004    Female  Black   PSYC 1004    added
   20183   8209     403    ART     1125    Female  White   ART 1125    dropped")