Split string with repeated delimiters

I have a string in R of the following form:

example <- c("namei1 namej1, surname1, name2, surnamei2 surnamej2, name3, surname3")

I would like to obtain two columns:

namei1 namej1   | surname1
name2           | surnamei2 surnamej2
name3           | surname3

I tried splitting the string:

library(stringr)
example <- c("namei1 namej1, surname1, name2, surnamei2 surnamej2, name3, surname3")
pattern <- "\\,+[[:space:]]"
str_split(example, pattern)

However, I am stuck from here...

We can split the string at , followed by zero or more spaces (\s*), then create a grouping variable based on where the string 'name' occurs, split the vector (v1) into a list of vectors, rbind the list elements, and convert the result to a data.frame:

# Backslashes must be doubled inside R string literals
v1 <- strsplit(example, ",\\s*")[[1]]
setNames(do.call(rbind.data.frame, split(v1, cumsum(grepl('\\bname',
       v1)))), paste0("V", 1:2))
#       V1                  V2
#1 namei1 namej1            surname1
#2         name2 surnamei2 surnamej2
#3         name3            surname3
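
To make the grouping step easier to follow, here is a minimal sketch that breaks the one-liner above into its intermediate values. The names starts, grp and pieces are introduced here purely for illustration and are not part of the original answer.

```r
example <- "namei1 namej1, surname1, name2, surnamei2 surnamej2, name3, surname3"

# Step 1: split on a comma followed by optional whitespace
v1 <- strsplit(example, ",\\s*")[[1]]

# Step 2: TRUE where an element contains "name" at a word boundary;
# "surname..." does not match because there is no boundary before "name"
starts <- grepl("\\bname", v1)
# TRUE FALSE TRUE FALSE TRUE FALSE

# Step 3: a running count of those matches gives one group id per record
grp <- cumsum(starts)
# 1 1 2 2 3 3

# Step 4: one character vector per record, ready to be row-bound
pieces <- split(v1, grp)
```

Note that this approach depends on every record starting with an element that matches \bname, which holds for the sample string but may not hold for real names.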

Another option is to use scan and convert the result to a two-column matrix:

as.data.frame( matrix(trimws(scan(text = example, sep=",",
      what = "", quiet = TRUE)), byrow = TRUE, ncol = 2))
#       V1                  V2
#1 namei1 namej1            surname1
#2         name2 surnamei2 surnamej2
#3         name3            surname3

Yet another option is gsub: we replace the , followed by a space and the string 'name' with \n and 'name', then pass the result to read.csv, which splits each line on the , delimiter:

read.csv(text = gsub(", name", "\nname", example), header= FALSE)
#         V1                   V2
#1 namei1 namej1             surname1
#2         name2  surnamei2 surnamej2
#3         name3             surname3
# Replace every second comma with a newline; the backreference needs a doubled backslash
read.csv(text = gsub("([^,]+,[^,]+),", "\\1\n", example),
         header = FALSE, stringsAsFactors = FALSE)
#              V1                   V2
# 1 namei1 namej1             surname1
# 2         name2  surnamei2 surnamej2
# 3         name3             surname3
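
As a rough illustration of what the regex replacement produces before read.csv parses it, the sketch below stores the intermediate text in a variable (txt, a name chosen here for illustration). It also passes strip.white = TRUE, which is an addition to the original call, to drop the leading spaces that are otherwise kept in the fields.

```r
example <- "namei1 namej1, surname1, name2, surnamei2 surnamej2, name3, surname3"

# Capture "field,field" and turn the comma that follows it into a newline
txt <- gsub("([^,]+,[^,]+),", "\\1\n", example)
# txt is now three lines, each holding one "name, surname" pair

# strip.white = TRUE trims the spaces left around each unquoted field
df <- read.csv(text = txt, header = FALSE,
               stringsAsFactors = FALSE, strip.white = TRUE)
```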
data.frame(split(unlist(strsplit(example, ", ")), c(0, 1)))
#             X0                  X1
#1 namei1 namej1            surname1
#2         name2 surnamei2 surnamej2
#3         name3            surname3
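
The last one-liner works because split recycles its grouping vector, so c(0, 1) alternately assigns elements to the two columns. A small sketch that spells out the recycling explicitly (parts, idx and cols are illustrative names, not from the original answer):

```r
example <- "namei1 namej1, surname1, name2, surnamei2 surnamej2, name3, surname3"

parts <- unlist(strsplit(example, ", "))

# Written out, the recycled grouping vector is 0 1 0 1 0 1
idx <- rep(c(0, 1), length.out = length(parts))

# Elements tagged 0 become the first column, elements tagged 1 the second
cols <- split(parts, idx)
data.frame(cols)
```

This assumes the names and surnames strictly alternate and that the number of elements is even.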