Create a column based on multiple patterns

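I have a column containing 692 degrees that I need to classify as Certificate, Associate, Bachelor, or Master or higher. The degree names are very inconsistent. For example, a BS degree might appear as BS, BS in Nursing, BSE, B.S. Accounting, Bachelor of Science, Bachelor of Science in Genetics, and so on. Each of these needs to be classified as "Bachelor".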

I tried using str_detect to catch as many of the strings as possible, but without much success. How can I detect these different types of degrees?

| What I have            | What I need      |
|------------------------|------------------|
| Bachelor of Science    | Bachelor         |
| BA                     | Bachelor         |
| BFA                    | Bachelor         |
| Certificate in Nursing | Certificate      |
| Associates in Art      | Associate        |
| AA                     | Associate        |
| MS                     | Master or higher |
| Masters of Art         | Master or higher |

Maybe something like this?

```r
library(tidyverse)

df <- 
tibble::tribble(
            ~What.I.have,       ~What.I.need,
   "Bachelor of Science",         "Bachelor",
                    "BA",         "Bachelor",
                   "BFA",         "Bachelor",
"Certificate in Nursing",      "Certificate",
     "Associates in Art",        "Associate",
                    "AA",        "Associate",
                    "MS", "Master or higher",
        "Masters of Art", "Master or higher"
)

# case_when() tests the conditions in order; rows that match none of them get NA.
# str_detect() does substring (regex) matching, so e.g. 'BA' would also match "MBA".
df %>% mutate(new = case_when(str_detect(What.I.have, 'Bachelor|BA|BFA') ~ 'Bachelor',
                              str_detect(What.I.have, 'Certificate') ~ 'Certificate',
                              str_detect(What.I.have, 'Associates|AA') ~ 'Associate',
                              str_detect(What.I.have, 'Masters|MS') ~ 'Master or higher'))
#> # A tibble: 8 × 3
#>   What.I.have            What.I.need      new             
#>   <chr>                  <chr>            <chr>           
#> 1 Bachelor of Science    Bachelor         Bachelor        
#> 2 BA                     Bachelor         Bachelor        
#> 3 BFA                    Bachelor         Bachelor        
#> 4 Certificate in Nursing Certificate      Certificate     
#> 5 Associates in Art      Associate        Associate       
#> 6 AA                     Associate        Associate       
#> 7 MS                     Master or higher Master or higher
#> 8 Masters of Art         Master or higher Master or higher
```

Created on 2022-01-04 by the reprex package (v2.0.1)
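With 692 values and variants such as "B.S. Accounting" or "BSE", a short list of substrings can become fragile. Below is a minimal base-R sketch, not a definitive solution: the helper name `classify_degree()` and the abbreviation lists are my own assumptions and will need extending for the real data. It upper-cases each name and strips periods (so "B.S." becomes "BS"), wraps the abbreviations in `\\b` word boundaries so "BA" cannot match inside "MBA", and checks the levels from highest to lowest so a name matching several rules gets the highest level.

```r
# Hypothetical helper: classify free-text degree names into four levels.
# Assumption: the abbreviation lists below are illustrative, not exhaustive.
classify_degree <- function(x) {
  # Normalise: upper-case and drop periods so "B.S." becomes "BS"
  key <- gsub(".", "", toupper(x), fixed = TRUE)

  # Rules are checked in this order; the first level that matches wins.
  # \\b word boundaries stop e.g. "BA" from matching inside "MBA".
  rules <- list(
    "Master or higher" = "\\b(MASTERS?|MS|MA|MBA|MFA|PHD|DOCTOR(ATE)?)\\b",
    "Bachelor"         = "\\b(BACHELORS?|BS|BSE|BA|BFA)\\b",
    "Associate"        = "\\b(ASSOCIATES?|AA|AS)\\b",
    "Certificate"      = "\\b(CERTIFICATE|CERT)\\b"
  )

  out <- rep(NA_character_, length(x))
  for (level in names(rules)) {
    # Only fill rows that are still unclassified
    hit <- is.na(out) & grepl(rules[[level]], key, perl = TRUE)
    out[hit] <- level
  }
  out  # unmatched names stay NA
}

degrees <- c("Bachelor of Science", "BA", "BFA", "Certificate in Nursing",
             "Associates in Art", "AA", "MS", "Masters of Art",
             "B.S. Accounting", "BS in Nursing", "BSE", "MBA")
print(data.frame(degree = degrees, level = classify_degree(degrees)))
```

With the tidyverse loaded, the helper drops into the pipeline as `df %>% mutate(new = classify_degree(What.I.have))`. Running `unique(x[is.na(classify_degree(x))])` on the full degree column `x` lists the spellings that still need a rule.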

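The suggestion @Onyambu made in the comments, which classifies each degree by its first letter only, also gives the correct result for this data: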

```r
library(tidyverse)

# Uses the `df` defined above. "^B" means "starts with B", so only the
# first character of each degree name decides the category.
df %>%
  transmute(new = case_when(
    str_detect(What.I.have, "^B") ~ "Bachelor",
    str_detect(What.I.have, "^C") ~ "Certificate",
    str_detect(What.I.have, "^A") ~ "Associate",
    str_detect(What.I.have, "^M") ~ "Master or higher"
  ))
#> # A tibble: 8 × 1
#>   new             
#>   <chr>           
#> 1 Bachelor        
#> 2 Bachelor        
#> 3 Bachelor        
#> 4 Certificate     
#> 5 Associate       
#> 6 Associate       
#> 7 Master or higher
#> 8 Master or higher
```

Created on 2022-01-04 by the reprex package (v2.0.1)
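The first-letter rule can also be written in base R with a named lookup vector, which makes its assumption explicit. This is a sketch; `classify_by_initial()` is a hypothetical helper name. Because only the initial is used, "PhD" (no P entry) comes back as `NA`, and "AB" (Artium Baccalaureus, a bachelor's degree) is labelled Associate. Unlike `"^B"`, it upper-cases the first character, so lower-case entries are handled too.

```r
# Map each initial to a level; this assumes every degree name starts
# with the initial of its category.
first_letter_level <- c(B = "Bachelor", C = "Certificate",
                        A = "Associate", M = "Master or higher")

classify_by_initial <- function(x) {
  initial <- toupper(substr(x, 1, 1))
  # Indexing a named vector with a name it lacks returns NA
  unname(first_letter_level[initial])
}

# PhD -> NA (no "P" entry); AB -> "Associate", although it is a bachelor's degree
print(classify_by_initial(c("BSE", "MBA", "PhD", "AB")))
```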