Split string with repeated delimiters
I have a string in R of the following form:
example <- c("namei1 namej1, surname1, name2, surnamei2 surnamej2, name3, surname3")
I would like to obtain two columns:
namei1 namej1 | surname1
name2 | surnamei2 surnamej2
name3 | surname3
I tried using string splitting:
library(stringr)
example <- c("namei1 namej1, surname1, name2, surnamei2 surnamej2, name3, surname3")
pattern <- "\\,+[[:space:]]"
str_split(example, pattern)
However, I am stuck from here...
We can split the string at each , followed by zero or more spaces (\s*), then create a grouping variable based on the occurrence of the 'name' string, split the vector (v1) into a list of vectors, rbind the list elements, and convert the result to a data.frame.
v1 <- strsplit(example, ",\\s*")[[1]]
setNames(do.call(rbind.data.frame, split(v1, cumsum(grepl('\\bname', v1)))),
         paste0("V", 1:2))
# V1 V2
#1 namei1 namej1 surname1
#2 name2 surnamei2 surnamej2
#3 name3 surname3
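To see why the grouping works, it can help to look at the intermediate values. The sketch below repeats the same steps one at a time; the names is_name, grp and pairs are only illustrative and do not appear in the answer itself.

```r
example <- "namei1 namej1, surname1, name2, surnamei2 surnamej2, name3, surname3"

# Split at each comma followed by optional whitespace
v1 <- strsplit(example, ",\\s*")[[1]]

# TRUE where an element contains a word beginning with "name";
# "surname..." does not match because \\b needs a word boundary before "name"
is_name <- grepl("\\bname", v1)

# The running count of TRUE values gives one group id per output row
grp <- cumsum(is_name)

# Each group holds one name field and the surname field that follows it
pairs <- split(v1, grp)
```

Note that this relies on the name fields literally containing a word that starts with "name", as in the example string. For data without such a marker, a purely positional approach (every two fields form one row) is probably the safer choice.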
Another option is scan, converting the result into a two-column matrix:
as.data.frame( matrix(trimws(scan(text = example, sep=",",
what = "", quiet = TRUE)), byrow = TRUE, ncol = 2))
# V1 V2
#1 namei1 namej1 surname1
#2 name2 surnamei2 surnamej2
#3 name3 surname3
Another option is gsub: we replace the , followed by a space and the 'name' string with \n and 'name', then pass the result to read.csv, which splits each line on the delimiter ,
read.csv(text = gsub(", name", "\nname", example), header = FALSE,
         strip.white = TRUE)
# V1 V2
#1 namei1 namej1 surname1
#2 name2 surnamei2 surnamej2
#3 name3 surname3
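As a small check on what read.csv actually receives, the intermediate text can be inspected before parsing. This is only a sketch; txt and recs are illustrative names.

```r
example <- "namei1 namej1, surname1, name2, surnamei2 surnamej2, name3, surname3"

# Replace each ", name" with a newline followed by "name";
# ", surname" is untouched because "name" must come right after ", "
txt <- gsub(", name", "\nname", example)

# One "name, surname" record per line
recs <- strsplit(txt, "\n", fixed = TRUE)[[1]]
```

Each element of recs is then a single comma-separated record, which is the shape read.csv expects for one row.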
read.csv(text = gsub("([^,]+,[^,]+),", "\\1\n", example),
         header = FALSE, stringsAsFactors = FALSE, strip.white = TRUE)
# V1 V2
# 1 namei1 namej1 surname1
# 2 name2 surnamei2 surnamej2
# 3 name3 surname3
data.frame(split(unlist(strsplit(example, ", ")), c(0, 1)))
# X0 X1
#1 namei1 namej1 surname1
#2 name2 surnamei2 surnamej2
#3 name3 surname3