Combine mutate case_when() for columns that start_with() to replace certain characters

Question

I have a complex data frame that looks like df1:

library(tidyverse)

df <- tibble(position=c(100,200,300),
             correction=c("62M89S", 
                     "8M1D55M88S",
                     "1S25M1P36M89S"))

df1 <- df %>% 
  separate(correction, into = str_c("col", 1:5), 
           sep = "(?<=\\D)(?=\\d)", fill = "left", remove = FALSE)

df1
#> # A tibble: 3 × 7
#>   position correction    col1  col2  col3  col4  col5 
#>      <dbl> <chr>         <chr> <chr> <chr> <chr> <chr>
#> 1      100 62M89S        <NA>  <NA>  <NA>  62M   89S  
#> 2      200 8M1D55M88S    <NA>  8M    1D    55M   88S  
#> 3      300 1S25M1P36M89S 1S    25M   1P    36M   89S

Created on 2022-03-02 by the reprex package (v2.0.1)

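For each column selected by starts_with("col"), I want to strip the trailing letter from strings that end with S, M, or D (that is, replace the letter with "", the empty string) and replace all remaining strings with 0.

I want my data to look like this: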

df1
#> # A tibble: 3 × 7
#>   position correction    col1  col2  col3  col4  col5 
#>      <dbl> <chr>         <chr> <chr> <chr> <chr> <chr>
#> 1      100 62M89S        <NA>  <NA>  <NA>  62    89   
#> 2      200 8M1D55M88S    <NA>  8     1     55    88   
#> 3      300 1S25M1P36M89S 1     25    0     36    89

Note that the cell containing P has been converted to zero.

Here is my poor attempt, which I am rather embarrassed by:

df1 %>% mutate(across(starts_with("col")), ~case_when(grepl("*M") | grepl("*S") | grepl("*D") ~ "", TRUE ~ 0))

Answer 1

Here is one possibility using case_when and grepl:

df1 %>% 
  mutate(
    across(starts_with("col"),~case_when(
      is.na(.) ~ NA_real_,
      grepl("[SMD]$", .) ~ parse_number(.),
      TRUE ~ 0
    )
  ))

# A tibble: 3 x 7
  position correction     col1  col2  col3  col4  col5
     <dbl> <chr>         <dbl> <dbl> <dbl> <dbl> <dbl>
1      100 62M89S           NA    NA    NA    62    89
2      200 8M1D55M88S       NA     8     1    55    88
3      300 1S25M1P36M89S     1    25     0    36    89
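
To see what each branch of that case_when() does, it may help to apply it to a plain character vector. This is only a small sketch: the vector x is made up for illustration and is not part of the original data, and it assumes dplyr and readr (which provides parse_number()) are installed.

```r
library(dplyr)
library(readr)

# Hypothetical input: one value ending in M, one ending in P, one missing
x <- c("62M", "1P", NA)

out <- case_when(
  is.na(x)           ~ NA_real_,        # missing values stay missing
  grepl("[SMD]$", x) ~ parse_number(x), # keep the number, drop the trailing S, M or D
  TRUE               ~ 0                # everything else (here "1P") becomes 0
)

out
```

Because parse_number() returns doubles, the col columns come back as <dbl> rather than <chr>, as the output above shows.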

Answer 2

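df1 %>% 
  mutate_at(vars(starts_with('col')), 
            ~ case_when(
                # strip a trailing S, M or D, keeping the number as a string
                grepl('[SMD]$', .x) ~ sub('[SMD]$', '', .x),
                # values ending in P become "0"
                grepl('P$'    , .x) ~ '0',
                # anything else (including NA) is left unchanged
                TRUE                ~ .x)
  )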
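
In current dplyr, mutate_at() is superseded by across(), so the same logic can also be written that way. The following is a minimal sketch rather than part of the original answer: it assumes dplyr 1.0.0 or later, and the tibble cols is a hand-built stand-in for the col columns of df1, so that the example runs on its own.

```r
library(dplyr)

# Hand-built stand-in for the col3..col5 columns of df1 (illustrative only)
cols <- tibble(
  col3 = c(NA, "1D", "1P"),
  col4 = c("62M", "55M", "36M"),
  col5 = c("89S", "88S", "89S")
)

res <- cols %>%
  mutate(across(
    starts_with("col"),
    ~ case_when(
      grepl("[SMD]$", .x) ~ sub("[SMD]$", "", .x), # strip trailing S, M or D
      grepl("P$", .x)     ~ "0",                   # values ending in P become "0"
      TRUE                ~ .x                     # NA and anything else unchanged
    )
  ))

res
```

The columns stay character here, because every branch returns a string; wrap the result in as.numeric() if numbers are needed.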

Answer 3

Please find below another solution using the map_df() function from the purrr library and the str_replace() function from stringr:

Reprex

Code

library(tidyverse)

df1 %>% select(starts_with("col")) %>% 
  map_df(., str_replace, ".P", "0") %>% 
  map_df(., str_replace, "\\D$", "") %>% 
  bind_cols(df1 %>% select(-starts_with("col")),.)

Output

#> # A tibble: 3 x 7
#>   position correction    col1  col2  col3  col4  col5 
#>      <dbl> <chr>         <chr> <chr> <chr> <chr> <chr>
#> 1      100 62M89S        <NA>  <NA>  <NA>  62    89   
#> 2      200 8M1D55M88S    <NA>  8     1     55    88   
#> 3      300 1S25M1P36M89S 1     25    0     36    89

Created on 2022-03-02 by the reprex package (v2.0.1)

