Using str_detect and case_when with dplyr in R
Here is my df:
mydf <- structure(list(Action = c("Passes accurate", "Passes accurate",
"Passes accurate", "Passes accurate", "Lost balls", "Lost balls (in opp. half)",
"Passes (inaccurate)", "Interceptions (in opp. half)", "Interceptions",
"Positional attacks")), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
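I have this vector: passes <- c('Passes','passes','Assists','Crosses')
I am trying this: mydf %>% mutate(newcol = case_when(str_detect(Action, passes) ~ 'passes'))
But only the first row gets filled with passes. I expected the first 4 rows, and also row 7, to be filled with passes. How can I achieve this using the case_when function?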
You need to use paste(collapse = "|") to collapse the vector into a single string separated by "|", so that grepl() can look for element 1 or element 2 or element 3, and so on, in Action.
library(dplyr)
passes <- c('Passes','passes','Assists','Crosses')
mydf %>% mutate(newcol = case_when(grepl(paste(passes, collapse = "|"), Action) ~ "passes"))
# A tibble: 10 x 2
Action newcol
<chr> <chr>
1 Passes accurate passes
2 Passes accurate passes
3 Passes accurate passes
4 Passes accurate passes
5 Lost balls NA
6 Lost balls (in opp. half) NA
7 Passes (inaccurate) passes
8 Interceptions (in opp. half) NA
9 Interceptions NA
10 Positional attacks NA
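To see why the original attempt filled only the first row, here is a minimal base R sketch. It assumes the installed stringr recycled the 4 patterns against the 10 strings pairwise (newer releases may raise a recycling error instead), and it emulates that behaviour with rep_len() and mapply() rather than calling str_detect() itself. The names recycled, pairwise and combined are illustrative only.

```r
# Same Action values as mydf, as a plain character vector (base R only).
Action <- c("Passes accurate", "Passes accurate", "Passes accurate",
            "Passes accurate", "Lost balls", "Lost balls (in opp. half)",
            "Passes (inaccurate)", "Interceptions (in opp. half)",
            "Interceptions", "Positional attacks")
passes <- c("Passes", "passes", "Assists", "Crosses")

# Emulate pairwise matching: pattern i is compared only with string i,
# with the 4 patterns recycled to length 10.
recycled <- rep_len(passes, length(Action))
pairwise <- mapply(function(p, s) grepl(p, s, fixed = TRUE),
                   recycled, Action, USE.NAMES = FALSE)

# One combined pattern is tested against every string instead.
combined <- grepl(paste(passes, collapse = "|"), Action)

which(pairwise)  # rows matched pairwise
which(combined)  # rows matched by the combined pattern
```

Under pairwise matching only row 1 lines up with a pattern it contains, while the combined pattern matches rows 1 to 4 and row 7.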
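I used str_sub() for this. Note that it only checks whether Action starts with 'Passes', and the result has to be assigned back to mydf before printing.
library(dplyr)
library(stringr)
mydf <- mydf %>% mutate(newcol = case_when(str_sub(Action, 1, 6) == 'Passes' ~ "passes"))
print(mydf)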
Action newcol
<chr> <chr>
1 Passes accurate passes
2 Passes accurate passes
3 Passes accurate passes
4 Passes accurate passes
5 Lost balls NA
6 Lost balls (in opp. half) NA
7 Passes (inaccurate) passes
8 Interceptions (in opp. half) NA
9 Interceptions NA
10 Positional attacks NA
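The same prefix test can be sketched in base R with startsWith(), which avoids hard-coding the substring length. This is only an illustration: prefix_hit and newcol are plain vectors here rather than columns of mydf, and "Key passes" is a hypothetical label that does not appear in the original data.

```r
Action <- c("Passes accurate", "Passes accurate", "Passes accurate",
            "Passes accurate", "Lost balls", "Lost balls (in opp. half)",
            "Passes (inaccurate)", "Interceptions (in opp. half)",
            "Interceptions", "Positional attacks")

# TRUE where the string begins with "Passes" (case-sensitive).
prefix_hit <- startsWith(Action, "Passes")
newcol <- ifelse(prefix_hit, "passes", NA_character_)

# Hypothetical label: the keyword is not at the start, so a prefix test
# misses it while a pattern search anywhere in the string finds it.
prefix_miss <- startsWith("Key passes", "Passes")
anywhere    <- grepl("Passes|passes", "Key passes")
```

A prefix test is enough for this data, but it would miss labels where the keyword appears later in the string or in a different case.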
You can do this easily:
library(tidyverse)
mydf %>%
mutate(newcol = if_else(str_detect(Action, paste0(passes, collapse = '|')), 'passes', NA_character_))
# A tibble: 10 x 2
Action newcol
<chr> <chr>
1 Passes accurate passes
2 Passes accurate passes
3 Passes accurate passes
4 Passes accurate passes
5 Lost balls <NA>
6 Lost balls (in opp. half) <NA>
7 Passes (inaccurate) passes
8 Interceptions (in opp. half) <NA>
9 Interceptions <NA>
10 Positional attacks <NA>
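Collapsing with "|" treats every element as a regular expression, so keywords containing characters such as parentheses or a dot would be interpreted as regex syntax. A possible alternative, sketched here in base R, is to match each keyword literally and combine the results with a logical OR. The helper any_literal and the keyword "(in opp. half)" are hypothetical and used only for illustration.

```r
Action <- c("Passes accurate", "Passes accurate", "Passes accurate",
            "Passes accurate", "Lost balls", "Lost balls (in opp. half)",
            "Passes (inaccurate)", "Interceptions (in opp. half)",
            "Interceptions", "Positional attacks")
passes <- c("Passes", "passes", "Assists", "Crosses")

# Hypothetical helper: TRUE where x contains at least one keyword,
# each keyword matched as a literal string (fixed = TRUE), not a regex.
any_literal <- function(x, keywords) {
  Reduce(`|`, lapply(keywords, function(k) grepl(k, x, fixed = TRUE)))
}

hits <- any_literal(Action, passes)

# A keyword with regex metacharacters is still matched as written.
hits_special <- any_literal(Action, c("Passes", "(in opp. half)"))
```

Inside mutate(), the logical vector returned by such a helper could be used as the condition of case_when() or if_else() in place of the str_detect() call.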
Another option is to use fuzzyjoin:
library(fuzzyjoin)
library(dplyr)
regex_left_join(mydf, tibble(passes, newcol = "passes"),
by = c("Action" = "passes")) %>%
select(-passes)
Output:
# A tibble: 10 × 2
Action newcol
<chr> <chr>
1 Passes accurate passes
2 Passes accurate passes
3 Passes accurate passes
4 Passes accurate passes
5 Lost balls <NA>
6 Lost balls (in opp. half) <NA>
7 Passes (inaccurate) passes
8 Interceptions (in opp. half) <NA>
9 Interceptions <NA>
10 Positional attacks <NA>