Subtracting strings in R
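Is there an easy way to subtract strings across columns in a tibble or data.frame?
For example, in the tibble below, is there a way to easily create column b from columns a and c, similar to how I created c from a and b? (That is, c = a + b, so b = c - a.)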
library(tidyverse)

ex1 <- tibble(a = rep(c("orange", "green", "grey"), 2),
              b = rep(c("ball", "hockey puck"), each = 3),
              c = str_c(a, " ", b))
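I would like the solution to work for any number of words in each string in columns a and b.
For example, I was thinking of something along the lines of the code below (breaking the strings into words and doing a pairwise comparison), but it doesn't quite work.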
ex1 %>%
  separate_rows(c) %>%
  filter(b != c) %>%
  group_by(a, b) %>%
  summarize(a2 = str_c(c, collapse = " "))
Any ideas?
You can write a function to do this:
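# y is treated as a regular expression; collapse = "|" joins all values of y into one alternation
`%-%` = function(x, y) sub(paste0("\\s*", y, "\\s*", collapse = "|"), "", x)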
ex1$c%-%ex1$a # To obtain b ie c-a
[1] "ball" "ball" "ball" "hockey puck" "hockey puck" "hockey puck"
ex1$c%-%ex1$b # To obtain a ie c-b
[1] "orange" "green" "grey" "orange" "green" "grey"
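The operator above treats the values of y as regular expressions and matches any of them in any row. As a rough sketch of an alternative, assuming you want each x[i] reduced only by its own y[i] and want y matched literally, something like the following could work. The name `str_subtract` is hypothetical and not part of the original answer.

```r
# Hypothetical helper (not from the original answer): remove the first literal
# occurrence of y[i] from x[i], element by element.
str_subtract <- function(x, y) {
  out <- mapply(function(xi, yi) sub(yi, "", xi, fixed = TRUE),
                x, y, USE.NAMES = FALSE)
  # Collapse any doubled spaces left behind, then trim the ends
  trimws(gsub("\\s+", " ", out))
}

res1 <- str_subtract(c("orange ball", "grey hockey puck"), c("orange", "grey"))
res2 <- str_subtract("a+b c", "a+b")  # "+" is matched literally, not as a regex quantifier
```

Because `fixed = TRUE` is used, strings containing characters such as `+`, `.` or `(` are subtracted as written.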
These should both work:
ex1 %>%
  rowwise() %>%
  mutate(b = sub(a, "", c) %>% str_trim())
# # A tibble: 6 x 3
#   a      b           c
#   <chr>  <chr>       <chr>
# 1 orange ball        orange ball
# 2 green  ball        green ball
# 3 grey   ball        grey ball
# 4 orange hockey puck orange hockey puck
# 5 green  hockey puck green hockey puck
# 6 grey   hockey puck grey hockey puck
ex1 %>% mutate(b = str_replace(c, a, "") %>% str_trim())
# # A tibble: 6 x 3
#   a      b           c
#   <chr>  <chr>       <chr>
# 1 orange ball        orange ball
# 2 green  ball        green ball
# 3 grey   ball        grey ball
# 4 orange hockey puck orange hockey puck
# 5 green  hockey puck green hockey puck
# 6 grey   hockey puck grey hockey puck
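Both versions above pass column a as a regular expression. A minimal sketch of a literal-match variant, assuming stringr's `fixed()` modifier and `str_squish()` are acceptable here, and using a small made-up tibble `ex2` with a multi-word value in a to mirror the "any number of words" requirement:

```r
library(dplyr)
library(stringr)
library(tibble)

# ex2 is illustrative data, not from the original question
ex2 <- tibble(a = c("dark orange", "grey"),
              b = c("ball", "hockey puck"),
              c = str_c(a, " ", b))

# str_remove() is vectorised over both arguments, so no rowwise() is needed;
# fixed() makes each value of a match literally rather than as a regex
res <- ex2 %>%
  mutate(b_from_c = str_squish(str_remove(c, fixed(a))))

res$b_from_c
```

`str_squish()` is used instead of `str_trim()` so that a doubled space left in the middle of a string is also collapsed.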