Using str_detect and case_when in dplyr in R

In dplyr using str_detect and case_when in R

Here is my df:

mydf <- structure(list(Action = c("Passes accurate", "Passes accurate", 
"Passes accurate", "Passes accurate", "Lost balls", "Lost balls (in opp. half)", 
"Passes (inaccurate)", "Interceptions (in opp. half)", "Interceptions", 
"Positional attacks")), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

I have this vector: passes <- c('Passes','passes','Assists','Crosses')

I am trying this: mydf %>% mutate(newcol = case_when(str_detect(Action, passes) ~ 'passes'))

But only the first row gets filled with passes. I should get passes in the first 4 rows, and in row 7 as well. How can I achieve this with the case_when function?
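A likely explanation, sketched below: str_detect() is vectorised over both string and pattern and pairs them element by element, so passing the 4-element passes vector does not test every row against every pattern. Depending on the stringr version, mismatched lengths (10 rows vs 4 patterns) are either recycled or rejected with an error. The short actions vector is a made-up example, not the question's data.

```r
library(stringr)

# Made-up example vector (3 elements), for illustration only
actions <- c("Passes accurate", "Lost balls", "Passes (inaccurate)")

# Pairwise: element i of `actions` is tested only against element i of the pattern vector
pairwise <- str_detect(actions, c("Passes", "Lost", "Crosses"))
pairwise   # c(TRUE, TRUE, FALSE)

# Collapsed: the single regex "Passes|passes|Assists|Crosses" is tested against every element
passes <- c('Passes', 'passes', 'Assists', 'Crosses')
collapsed <- str_detect(actions, paste(passes, collapse = "|"))
collapsed  # c(TRUE, FALSE, TRUE)
```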

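You need to use paste(collapse = "|") to collapse the vector into a single string separated by "|", so that grepl() tests each value of Action against element 1 or element 2 or element 3, and so on.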

library(dplyr)

passes <- c('Passes','passes','Assists','Crosses')

mydf %>% mutate(newcol = case_when(grepl(paste(passes, collapse = "|"), Action) ~ "passes"))

# A tibble: 10 x 2
   Action                       newcol
   <chr>                        <chr> 
 1 Passes accurate              passes
 2 Passes accurate              passes
 3 Passes accurate              passes
 4 Passes accurate              passes
 5 Lost balls                   NA    
 6 Lost balls (in opp. half)    NA    
 7 Passes (inaccurate)          passes
 8 Interceptions (in opp. half) NA    
 9 Interceptions                NA    
10 Positional attacks           NA    
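Because the collapsed pattern is just an ordinary logical condition, the same idea should extend to several categories inside one case_when() call. A minimal sketch; the losses vector, the "losses"/"other" labels and the 4-row example_df are hypothetical, not from the question:

```r
library(dplyr)
library(stringr)

# Hypothetical 4-row example, not the question's full data
example_df <- tibble(Action = c("Passes accurate", "Lost balls",
                                "Interceptions", "Positional attacks"))
passes <- c('Passes', 'passes', 'Assists', 'Crosses')
losses <- c('Lost')  # hypothetical second category

res <- example_df %>%
  mutate(newcol = case_when(
    # conditions are checked in order; the first TRUE one wins
    str_detect(Action, paste(passes, collapse = "|")) ~ "passes",
    str_detect(Action, paste(losses, collapse = "|")) ~ "losses",
    TRUE ~ "other"  # fallback label instead of NA
  ))

res$newcol  # "passes" "losses" "other" "other"
```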

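I used str_sub() for this. Note that it only checks whether Action starts with "Passes", so the other entries of passes ('passes', 'Assists', 'Crosses') are not used:

library(dplyr)
library(stringr)

mydf %>% mutate(newcol = case_when(str_sub(Action, 1, 6) == 'Passes' ~ "passes"))

# A tibble: 10 x 2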
   Action                       newcol
   <chr>                        <chr> 
 1 Passes accurate              passes
 2 Passes accurate              passes
 3 Passes accurate              passes
 4 Passes accurate              passes
 5 Lost balls                   NA    
 6 Lost balls (in opp. half)    NA    
 7 Passes (inaccurate)          passes
 8 Interceptions (in opp. half) NA    
 9 Interceptions                NA    
10 Positional attacks           NA 
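To keep the "starts with" behaviour of the str_sub() approach while using the whole passes vector, one option is to anchor the collapsed pattern with ^. A sketch; the actions vector is made up for illustration:

```r
library(stringr)

passes <- c('Passes', 'passes', 'Assists', 'Crosses')
# Made-up example values
actions <- c("Passes accurate", "Crosses", "Lost balls", "Key passes")

# "^(Passes|passes|Assists|Crosses)": match only at the start of the string
prefix_pattern <- paste0("^(", paste(passes, collapse = "|"), ")")

starts   <- str_detect(actions, prefix_pattern)                  # TRUE TRUE FALSE FALSE
anywhere <- str_detect(actions, paste(passes, collapse = "|"))   # TRUE TRUE FALSE TRUE
```

The only difference is "Key passes": it contains 'passes' but does not start with any pattern.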

You can simply do:

library(tidyverse)
mydf %>%
  mutate(newcol = if_else(str_detect(Action, paste0(passes, collapse = '|')), 'passes', NA_character_))

# A tibble: 10 x 2
   Action                       newcol
   <chr>                        <chr> 
 1 Passes accurate              passes
 2 Passes accurate              passes
 3 Passes accurate              passes
 4 Passes accurate              passes
 5 Lost balls                   <NA>  
 6 Lost balls (in opp. half)    <NA>  
 7 Passes (inaccurate)          passes
 8 Interceptions (in opp. half) <NA>  
 9 Interceptions                <NA>  
10 Positional attacks           <NA>  
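Since 'Passes' and 'passes' differ only in case, another option is stringr's regex() modifier with ignore_case = TRUE, which lets the pattern vector drop the duplicate entry. A sketch with made-up values:

```r
library(dplyr)
library(stringr)

actions <- c("Passes accurate", "key passes", "Lost balls")  # made-up values

# One case-insensitive pattern instead of listing both 'Passes' and 'passes'
pattern <- regex(paste(c('Passes', 'Assists', 'Crosses'), collapse = "|"),
                 ignore_case = TRUE)

newcol <- if_else(str_detect(actions, pattern), "passes", NA_character_)
newcol  # c("passes", "passes", NA)
```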

Another option is to use fuzzyjoin:

library(fuzzyjoin)
library(dplyr)
regex_left_join(mydf, tibble(passes, newcol = "passes"),
    by = c("Action" = "passes")) %>%
   select(-passes)

-output

# A tibble: 10 × 2
   Action                       newcol
   <chr>                        <chr> 
 1 Passes accurate              passes
 2 Passes accurate              passes
 3 Passes accurate              passes
 4 Passes accurate              passes
 5 Lost balls                   <NA>  
 6 Lost balls (in opp. half)    <NA>  
 7 Passes (inaccurate)          passes
 8 Interceptions (in opp. half) <NA>  
 9 Interceptions                <NA>  
10 Positional attacks           <NA>