How to "subtract" string appearing in one dataframe column from the end of another string in a different column

In a data frame with two character columns, how can I "subtract" the string in one column from the end of the string in the other column?

Example

Here is some toy data:

library(tibble)

my_df <-
  tribble(~full_string,    ~substring_to_remove,
          "dogz",          "z",
          "catap",         "ap",
          "tigera",        "a",
          "mouseppt",      "ppt",
          "kangaroolllyt", "lllyt",
          "qlionq",        "q",
          "zebra",         "z")

my_df
#> # A tibble: 7 x 2
#>   full_string   substring_to_remove
#>   <chr>         <chr>              
#> 1 dogz          z                  
#> 2 catap         ap                 
#> 3 tigera        a                  
#> 4 mouseppt      ppt                
#> 5 kangaroolllyt lllyt              
#> 6 qlionq        q                  
#> 7 zebra         z

Created on 2021-07-15 by the reprex package (v2.0.0)

Desired output

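I want to create another column, clean_string, that "subtracts" the string in substring_to_remove from the end of full_string. If full_string does not end with that substring (as in the zebra row), it should be left unchanged.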

## # A tibble: 7 x 3
##   full_string   substring_to_remove clean_string
##   <chr>         <chr>               <chr>       
## 1 dogz          z                   dog         
## 2 catap         ap                  cat         
## 3 tigera        a                   tiger       
## 4 mouseppt      ppt                 mouse       
## 5 kangaroolllyt lllyt               kangaroo    
## 6 qlionq        q                   qlion       
## 7 zebra         z                   zebra 

EDIT


If it's not asking too much, a data.table solution would be very helpful, because this operation takes a long time on large datasets.

This should do the trick:

library(tidyverse)
my_df <- my_df %>% 
  # "$" anchors the pattern so only a match at the end of full_string is removed
  mutate(clean_string = str_remove(full_string, paste0(substring_to_remove, "$")))

Output:

  full_string   substring_to_remove clean_string
  <chr>         <chr>               <chr>       
1 dogz          z                   dog         
2 catap         ap                  cat         
3 tigera        a                   tiger       
4 mouseppt      ppt                 mouse       
5 kangaroolllyt lllyt               kangaroo    
6 qlionq        q                   qlion       
7 zebra         z                   zebra  
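One caveat with the str_remove() answer: paste0(substring_to_remove, "$") builds a regular expression, so a suffix containing metacharacters such as "." or "+" is not matched literally (for example, ".txt" would also strip "xtxt" from "filextxt"). A minimal sketch, assuming stringr is installed, escapes each suffix before anchoring it; escape_regex() is a hypothetical helper written here, not part of stringr.

```r
library(stringr)

# Hypothetical helper (not part of stringr): backslash-escape regex
# metacharacters so each suffix is matched as a literal string.
escape_regex <- function(x) {
  gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", x, perl = TRUE)
}

full_string         <- c("file.txt", "filextxt", "a+b+")
substring_to_remove <- c(".txt",     ".txt",     "b+")

# Escape first, then anchor the literal suffix to the end with "$".
clean_string <- str_remove(full_string, paste0(escape_regex(substring_to_remove), "$"))
clean_string
```

Here "filextxt" is left alone, because the escaped pattern requires a real dot before "txt".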

也许 endsWithsubstr 一起使用。

# Cut nchar(substring_to_remove) characters, but only where full_string ends with it
my_df$clean_string <- substr(my_df$full_string, 1, nchar(my_df$full_string) -
         endsWith(my_df$full_string, my_df$substring_to_remove) *
            nchar(my_df$substring_to_remove))

my_df
## A tibble: 7 x 3
#  full_string   substring_to_remove clean_string
#  <chr>         <chr>               <chr>       
#1 dogz          z                   dog         
#2 catap         ap                  cat         
#3 tigera        a                   tiger       
#4 mouseppt      ppt                 mouse       
#5 kangaroolllyt lllyt               kangaroo    
#6 qlionq        q                   qlion       
#7 zebra         z                   zebra       
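The endsWith()/substr() idea can be wrapped in a small reusable function; remove_suffix() is a hypothetical name used for illustration, not an existing base R function. Because endsWith() compares strings literally, this variant needs no regex escaping, and elements that do not end with the suffix come back unchanged.

```r
# Hypothetical helper: drop `suffix` from the end of `x` only when `x`
# actually ends with it (literal comparison, no regular expressions).
remove_suffix <- function(x, suffix) {
  # endsWith() gives TRUE/FALSE per element; multiplying by nchar(suffix)
  # yields the number of characters to cut (0 when there is no match).
  n_cut <- endsWith(x, suffix) * nchar(suffix)
  substr(x, 1L, nchar(x) - n_cut)
}

res <- remove_suffix(c("dogz", "zebra", "file.txt", "filextxt"),
                     c("z",    "z",     ".txt",     ".txt"))
res
```

With the question's data this is simply my_df$clean_string <- remove_suffix(my_df$full_string, my_df$substring_to_remove).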

Or use sub:

# USE.NAMES = FALSE stops mapply() from naming the result after the patterns
my_df$clean_string <- mapply(sub, paste0(my_df$substring_to_remove, "$"), "",
                             my_df$full_string, USE.NAMES = FALSE)

data.table version

library(data.table)
setDT(my_df)[, clean_string := stringr::str_remove(full_string, paste0(substring_to_remove, "$"))]
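Since the edit is about speed on large data, a data.table variant can also skip regular expressions entirely and reuse the vectorised endsWith()/substr() logic, updating the column by reference with `:=`. This is a sketch assuming the data.table package is installed; no timings are claimed here, so it is worth benchmarking against the stringr version on your own data.

```r
library(data.table)

dt <- data.table(
  full_string         = c("dogz", "catap", "tigera", "mouseppt",
                          "kangaroolllyt", "qlionq", "zebra"),
  substring_to_remove = c("z", "ap", "a", "ppt", "lllyt", "q", "z")
)

# Add clean_string by reference: cut the suffix length only on rows where
# full_string literally ends with substring_to_remove.
dt[, clean_string := substr(
  full_string, 1L,
  nchar(full_string) - endsWith(full_string, substring_to_remove) * nchar(substring_to_remove)
)]
dt
```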