Create a column based on multiple patterns
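I have a column containing 692 degrees that I need to categorize as Certificate, Associate, Bachelor, or Master or higher. The degree names are very inconsistent. For example, a BS degree may appear as BS, BS in Nursing, BSE, B.S. Accounting, Bachelor of Science, Bachelor of Science in Genetics, and so on. Each of these needs to be categorized as "Bachelor".
I tried using str_detect to detect as many of the strings as I could, but it was not very successful. How can I detect these different types of degrees?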
What I have | What I need |
---|---|
Bachelor of Science | Bachelor |
BA | Bachelor |
BFA | Bachelor |
Certificate in Nursing | Certificate |
Associates in Art | Associate |
AA | Associate |
MS | Master or higher |
Masters of Art | Master or higher |
Maybe something like this?
library(tidyverse)

# Example data from the question
df <-
  tibble::tribble(
    ~What.I.have, ~What.I.need,
    "Bachelor of Science", "Bachelor",
    "BA", "Bachelor",
    "BFA", "Bachelor",
    "Certificate in Nursing", "Certificate",
    "Associates in Art", "Associate",
    "AA", "Associate",
    "MS", "Master or higher",
    "Masters of Art", "Master or higher"
  )

# case_when() returns the label of the first pattern that matches
df %>%
  mutate(new = case_when(
    str_detect(What.I.have, 'Bachelor|BA|BFA') ~ 'Bachelor',
    str_detect(What.I.have, 'Certificate') ~ 'Certificate',
    str_detect(What.I.have, 'Associates|AA') ~ 'Associate',
    str_detect(What.I.have, 'Masters|MS') ~ 'Master or higher'
  ))
#> # A tibble: 8 × 3
#>   What.I.have            What.I.need      new
#>   <chr>                  <chr>            <chr>
#> 1 Bachelor of Science    Bachelor         Bachelor
#> 2 BA                     Bachelor         Bachelor
#> 3 BFA                    Bachelor         Bachelor
#> 4 Certificate in Nursing Certificate      Certificate
#> 5 Associates in Art      Associate        Associate
#> 6 AA                     Associate        Associate
#> 7 MS                     Master or higher Master or higher
#> 8 Masters of Art         Master or higher Master or higher
Created on 2022-01-04 by the reprex package (v2.0.1)
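The patterns above are tuned to the eight sample rows. For the messier names mentioned in the question (B.S. Accounting, BSE, and so on), one possible extension is to strip periods first and then match abbreviations on word boundaries, so that a "BA" inside "MBA" is not read as a bachelor's degree. This is only a sketch: `classify_degree` is a hypothetical helper, and the abbreviation patterns are assumptions about how the real data is spelled, not rules taken from the question.

```r
library(dplyr)
library(stringr)

# Hypothetical helper: the patterns are guesses and will need tuning
# against the real 692 values.
classify_degree <- function(x) {
  # Drop periods so "B.S." and "BS" are treated the same
  x <- str_remove_all(x, fixed("."))
  case_when(
    str_detect(x, "\\b[Cc]ertificate") ~ "Certificate",
    # Full word, or a standalone abbreviation starting with B (BA, BS, BFA, BSE)
    str_detect(x, "\\b[Bb]achelor|\\bB[A-Z]{1,3}\\b") ~ "Bachelor",
    # Full word, or a standalone abbreviation starting with M (MS, MA, MBA)
    str_detect(x, "\\b[Mm]aster|\\bM[A-Z]{1,3}\\b") ~ "Master or higher",
    # Full word, or a standalone abbreviation starting with A (AA, AS, AAS)
    str_detect(x, "\\b[Aa]ssociate|\\bA[A-Z]{1,2}\\b") ~ "Associate"
  )
}

# Made-up test values based on the variants named in the question
degrees <- c("B.S. Accounting", "BSE", "Bachelor of Science in Genetics",
             "MBA", "AA", "Certificate in Nursing", "Masters of Art", "Diploma")
result <- classify_degree(degrees)
```

Names that match none of the patterns, such as "Diploma" here, come back as NA, which makes them easy to find and handle by hand.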
The suggestion @Onyambu made in the comments would also produce the correct result for this data.
library(tidyverse)

df %>%
  transmute(new = case_when(
    str_detect(What.I.have, "^B") ~ "Bachelor",
    str_detect(What.I.have, "^C") ~ "Certificate",
    str_detect(What.I.have, "^A") ~ "Associate",
    str_detect(What.I.have, "^M") ~ "Master or higher"
  ))
#> # A tibble: 8 × 1
#>   new
#>   <chr>
#> 1 Bachelor
#> 2 Bachelor
#> 3 Bachelor
#> 4 Certificate
#> 5 Associate
#> 6 Associate
#> 7 Master or higher
#> 8 Master or higher
Created on 2022-01-04 by the reprex package (v2.0.1)
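The first-letter rule only works if every name starts with a letter that identifies its category, so before applying it to all 692 values it may be worth listing what falls through and what each distinct name was mapped to. A minimal sketch, using made-up names ("Diploma" and "Art, BA" are hypothetical and not from the question):

```r
library(dplyr)
library(stringr)

# Made-up sample; the last two names are hypothetical problem cases
degrees <- tibble::tibble(
  What.I.have = c("BA", "MS", "AA", "Certificate in Nursing",
                  "Diploma", "Art, BA")
)

checked <- degrees %>%
  mutate(new = case_when(
    str_detect(What.I.have, "^B") ~ "Bachelor",
    str_detect(What.I.have, "^C") ~ "Certificate",
    str_detect(What.I.have, "^A") ~ "Associate",
    str_detect(What.I.have, "^M") ~ "Master or higher"
  ))

# Names that no rule matched (new is NA)
unmatched <- checked %>% filter(is.na(new))

# One row per category and distinct name, for reviewing the assignments
review <- checked %>% count(new, What.I.have)
```

Here "Diploma" ends up in `unmatched`, while "Art, BA" is labelled "Associate" because it starts with "A". Misassignments like that only show up when the `review` table is read through.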