Subtracting strings in R

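Is there a simple way to subtract strings across columns in a tibble or data.frame?

For example, in the tibble below, is there an easy way to create column b from columns a and c, similar to how I created c from a and b? (That is, c = a + b, so b = c - a.)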

library(tidyverse)  # provides tibble(), str_c(), %>%, separate_rows(), etc.

ex1 <- tibble(a = rep(c("orange", "green", "grey"), 2),
              b = rep(c("ball", "hockey puck"), each = 3),
              c = str_c(a, " ", b))

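I would like the solution to work for any number of words in each string in columns a and b.

For example, I have been thinking along the lines of the code below (split the strings into words and compare them pairwise), but it does not quite work.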

ex1 %>% 
  separate_rows(c) %>% 
  filter(b != c) %>% 
  group_by(a, b) %>% 
  summarize(a2 = str_c(c, collapse = " "))

Any ideas?

You can write a function to do this:

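# Infix "minus" for strings: remove y (and any surrounding whitespace) from x.
# All values of y are collapsed into one alternation; sub() drops the first match.
`%-%` <- function(x, y) sub(paste0("\\s*", y, "\\s*", collapse = "|"), "", x)
ex1$c %-% ex1$a # to obtain b, i.e. c - a
[1] "ball"        "ball"        "ball"        "hockey puck" "hockey puck" "hockey puck"
ex1$c %-% ex1$b # to obtain a, i.e. c - b
[1] "orange" "green"  "grey"   "orange" "green"  "grey"  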
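
One caveat with the operator above: sub() interprets y as a regular expression, and collapse = "|" merges every value of y into a single pattern instead of pairing each x with its own y. A minimal base R sketch of a row-by-row, literal-match variant follows. The name str_subtract is a hypothetical helper introduced here for illustration; it is not part of the answer or of any package.

```r
# Hypothetical helper (an assumption, not from the original answer):
# subtract y from x element by element, treating y as a literal string.
str_subtract <- function(x, y) {
  out <- mapply(
    function(xi, yi) sub(yi, "", xi, fixed = TRUE),  # remove first literal match
    x, y,
    USE.NAMES = FALSE
  )
  # Collapse runs of whitespace left behind, then trim both ends.
  trimws(gsub("\\s+", " ", out))
}

a <- rep(c("orange", "green", "grey"), 2)
b <- rep(c("ball", "hockey puck"), each = 3)
combined <- paste(a, b)

str_subtract(combined, a)  # recovers b
str_subtract(combined, b)  # recovers a

# Regex metacharacters in y are matched literally:
str_subtract("price (USD) 10", "(USD)")  # "price 10"
```

Because the match is literal and paired by position, this only removes the first occurrence of each y inside its own x; whether that is the right behavior depends on the data.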

Either of these should work:

ex1 %>% 
  rowwise() %>% 
  mutate( b = sub(a, "", c) %>% str_trim() )

# # A tibble: 6 x 3
#        a            b                  c
#    <chr>        <chr>              <chr>
# 1 orange         ball        orange ball
# 2  green         ball         green ball
# 3   grey         ball          grey ball
# 4 orange  hockey puck orange hockey puck
# 5  green  hockey puck  green hockey puck
# 6   grey  hockey puck   grey hockey puck

ex1 %>% mutate( b = str_replace(c, a, "") %>% str_trim() )

# # A tibble: 6 x 3
#        a           b                  c
#    <chr>       <chr>              <chr>
# 1 orange        ball        orange ball
# 2  green        ball         green ball
# 3   grey        ball          grey ball
# 4 orange hockey puck orange hockey puck
# 5  green hockey puck  green hockey puck
# 6   grey hockey puck   grey hockey puck
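
Both versions above pass column a to a regex function, so a value such as "a.b" or "(x)" would be interpreted as a pattern rather than as plain text. A hedged sketch of a literal-match variant, assuming stringr 1.3.0 or later for str_remove() and str_squish(); the column name b2 is made up for illustration so it can be compared against the original b.

```r
library(dplyr)
library(stringr)

ex1 <- tibble(a = rep(c("orange", "green", "grey"), 2),
              b = rep(c("ball", "hockey puck"), each = 3),
              c = str_c(a, " ", b))

# fixed() makes str_remove() match each value of a literally, row by row;
# str_squish() trims the ends and collapses any doubled spaces left behind.
result <- ex1 %>%
  mutate(b2 = str_squish(str_remove(c, fixed(a))))

identical(result$b2, result$b)  # TRUE for this example
```

str_remove() is vectorised over both the string and the pattern, so no rowwise() call is needed here.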