String split by value from another column
Hello, I have this data frame (DF1):
DF1 <- structure(list(Value = list("Peter", "John", c("Patric", "Harry")), Text = c("Hello Peter How are you", "Is it John? Yes It is John, Harry", "Hello Patric, how are you. Well, Harry thank you.")), class = "data.frame", row.names = c(NA, -3L))
  Value            Text
1 Peter            Hello Peter How are you
2 John             Is it John? Yes It is John, Harry
3 c(Patric, Harry) Hello Patric, how are you. Well, Harry thank you.
I would like to split each sentence in Text at the names listed in Value, to get this:
  Value            Text                                             Split
1 Peter            Hello Peter How are you                          c("Hello", "Peter How are you")
2 John             Is it John? Yes It is John, Harry                c("Is it", "John? Yes It is John, Harry")
3 c(Patric, Harry) Hello Patric, how are you. Well, Harry thank you c("Hello", "Patric, how are you. Well,", "Harry thank you")
I tried
DF1 %>% mutate(Split = strsplit(as.character(Text), as.character(Value)))
but it does not work.
Data
Assuming this is the real structure:
df <- structure(list(Value = list("Peter", "John", c("Patric", "Harry")),
Text = c("Hello Peter How are you","Is it John? Yes It is John, Harry","Hello Patric, how are you. Well, Harry thank you.")),
class = "data.frame", row.names = c(NA, -3L))
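Before the solutions, it may help to see why the attempt in the question does not give the desired result. The sketch below uses base R only and plain vectors (`value`, `text`, `patterns`, and `res` are illustrative names, not part of the question). It suggests two causes: `as.character()` turns the two-name element into one literal string, and `strsplit()` consumes the name it splits on.

```r
# Same values as the Value and Text columns in the question.
value <- list("Peter", "John", c("Patric", "Harry"))
text <- c("Hello Peter How are you",
          "Is it John? Yes It is John, Harry",
          "Hello Patric, how are you. Well, Harry thank you.")

# as.character() on a list deparses the length-2 element into a single
# string that looks like R code, so it is useless as a split pattern.
patterns <- as.character(value)
patterns[3]

# strsplit() recycles the patterns along text, removes the matched name,
# and splits at every occurrence rather than only the first one.
res <- strsplit(text, patterns)
res[[2]]   # three pieces, with "John" gone from all of them
res[[3]]   # unchanged: the deparsed pattern never matches
```

Keeping the name in the output therefore needs a zero-width pattern, and handling several names per row needs an explicit iteration over the list element.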
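First solution: double for loop
You can solve your problem with a double for loop. It is probably the more readable solution and the easier one to debug.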
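library(stringr)
Split <- list()
for (i in seq_len(nrow(df))) {
  text <- df$Text[i]       # sentence for this row
  value <- df$Value[[i]]   # names to split at, in order
  for (j in seq_along(value)) {
    # Split the last piece just before the first occurrence of the name.
    # (?<=.) requires a preceding character, (?=name) keeps the name in the output.
    text2 <- str_split(text[length(text)], paste0("(?<=.)(?=", value[[j]], ")"), n = 2)[[1]]
    text <- c(text[-length(text)], text2)
  }
  Split[[i]] <- text
}
df$Split <- Split   # stored as a list column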
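The pattern built inside the loop is the part that is easiest to misread, so here is a minimal sketch of it on a single string. The variables `text`, `name`, `pattern`, and `parts` are illustrative only; the regex is the same one used in the loop above.

```r
library(stringr)

# One sentence and one name, taken from the first row of the data.
text <- "Hello Peter How are you"
name <- "Peter"

# (?<=.) is a lookbehind: at least one character must precede the split point.
# (?=Peter) is a lookahead: the name must start right after the split point.
# Both are zero-width, so no characters are removed from the result.
pattern <- paste0("(?<=.)(?=", name, ")")

# n = 2 stops after the first match, so later repeats of the name stay intact.
parts <- str_split(text, pattern, n = 2)[[1]]
parts
#> [1] "Hello "            "Peter How are you"
```

Note that the name is pasted into the regex as-is, so this assumes the names contain no regex metacharacters such as `.` or `(`.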
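If you print df, it looks as if each row of Split holds a single string, but that is not the case: each cell is a character vector.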
df$Split
#> [[1]]
#> [1] "Hello " "Peter How are you"
#>
#> [[2]]
#> [1] "Is it " "John? Yes It is John, Harry"
#>
#> [[3]]
#> [1] "Hello " "Patric, how are you. Well, " "Harry thank you."
#>
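Because Split is a list column, the usual list tools apply to it. The following is a small base R sketch on a hypothetical stand-in data frame (`demo` and its `id` column are made up so the snippet runs on its own); the same calls should work on `df$Split`.

```r
# Hypothetical stand-in for df, with a list column built the same way.
demo <- data.frame(id = 1:2)
demo$Split <- list(
  c("Hello ", "Peter How are you"),
  c("Hello ", "Patric, how are you. Well, ", "Harry thank you.")
)

# Number of pieces in each row.
n_pieces <- lengths(demo$Split)
n_pieces
#> [1] 2 3

# [[ ]] extracts the character vector of one row; [ ] then picks a piece.
third_piece <- demo$Split[[2]][3]
third_piece
#> [1] "Harry thank you."
```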
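Second solution: tidyverse and a recursive function
Since your initial attempt used dplyr functions, you can also write it this way with a recursive function. This solution does not use for loops.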
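library(stringr)
library(purrr)
library(dplyr)
str_split_recursive <- function(string, pattern) {
  # Split the last piece just before the first occurrence of the first name.
  string <- str_split(string[length(string)], paste0("(?<=.)(?=", pattern[1], ")"), n = 2)[[1]]
  pattern <- pattern[-1]
  # Keep the finished pieces and recurse on the remainder with the remaining names.
  if (length(pattern) > 0) string <- c(string[-length(string)], str_split_recursive(string, pattern))
  string
}
# map2() walks Text and the list column Value in parallel, one row at a time.
df <- df %>%
  mutate(Split = map2(Text, Value, str_split_recursive))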
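If recursion feels heavy for this, the same step-by-step splitting can probably be expressed as a fold. This is only a sketch of an alternative, not part of the answer above: `str_split_reduce` is a hypothetical helper name, and it relies on `purrr::reduce()` with an `.init` value.

```r
library(stringr)
library(purrr)

# Hypothetical helper: fold over the names, each time splitting the last piece.
str_split_reduce <- function(string, patterns) {
  reduce(patterns, function(pieces, p) {
    last <- pieces[length(pieces)]
    c(pieces[-length(pieces)],
      str_split(last, paste0("(?<=.)(?=", p, ")"), n = 2)[[1]])
  }, .init = string)
}

result <- str_split_reduce(
  "Hello Patric, how are you. Well, Harry thank you.",
  c("Patric", "Harry")
)
result
```

It takes the same two arguments as `str_split_recursive`, so it should be usable in the same `map2()` call. Like the other versions, it assumes the names appear in the text in the order they are listed.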