Split column by multiple delimiters, keeping delimiters
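How can I split a character column into 3 columns using %, - and + as possible delimiters, while keeping the delimiters in the new columns?
Example data: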
library(data.table)
data <- data.table(x=c("92.1%+100-200","90.4%-1000+200", "92.8%-200+100", "99.2%-500-200","90.1%+500-200"))
Desired output:
data.desired <- data.table(x1=c("92.1%", "90.4%", "92.8%","99.2%","90.1%")
, x2=c("+100","-1000","-200","-500","+500")
, x3=c("-200","+200","+100","-200","-200"))
Happy to award points for a good answer and some help!
We can use separate from tidyr to split, with a positive lookahead so that the delimiters are kept:
library(tidyr)
data %>% separate(x, c("x1", "x2", "x3"), sep = "(?=\\+|-)")
#       x1    x2   x3
# 1: 92.1%  +100 -200
# 2: 90.4% -1000 +200
# 3: 92.8%  -200 +100
# 4: 99.2%  -500 -200
# 5: 90.1%  +500 -200
That said, note that simply splitting on \\+|- drops the signs:
data %>% separate(x, c("x1", "x2", "x3"), sep = "\\+|-")
#       x1   x2  x3
# 1: 92.1%  100 200
# 2: 90.4% 1000 200
# 3: 92.8%  200 100
# 4: 99.2%  500 200
# 5: 90.1%  500 200
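Using (?=\\+|-) instead splits at "nothing", that is, at the empty position immediately before a + or a -. The lookahead only checks that the sign follows; the sign itself is not part of the match, so it stays in the next piece.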
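The difference between a consuming pattern and a zero-width lookahead can be seen without any packages. The following is a minimal base R sketch, not part of the original answer; the "|" marker and the variable names are chosen only for illustration.

```r
x <- c("92.1%+100-200", "90.4%-1000+200")

# Consuming pattern: each sign is matched and replaced, so it is lost.
consumed <- gsub("[+-]", "|", x)

# Zero-width lookahead: the match is the empty position before each sign,
# so the marker is inserted there and the sign itself is left in place.
marked <- gsub("(?=[+-])", "|", x, perl = TRUE)

print(consumed)
print(marked)
```

In the second result every sign is still present right after its marker, which is why splitting at those positions keeps the delimiters.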
The equivalent in data.table is tstrsplit:
data[, c("x1","x2","x3") := tstrsplit(x, "(?<=.)(?=[+-])", perl=TRUE) ]
data
#                 x    x1    x2   x3
#1:   92.1%+100-200 92.1%  +100 -200
#2:  90.4%-1000+200 90.4% -1000 +200
#3:   92.8%-200+100 92.8%  -200 +100
#4:   99.2%-500-200 99.2%  -500 -200
#5:   90.1%+500-200 90.1%  +500 -200
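Since tstrsplit is documented as a wrapper around strsplit, the same pattern can be tried in base R. This is a rough sketch under the assumption that every string yields exactly three pieces; the names pieces and parts are illustrative. The (?<=.) lookbehind appears to be what keeps strsplit from breaking the pieces up further, though the answer itself does not say so.

```r
x <- c("92.1%+100-200", "90.4%-1000+200", "92.8%-200+100")

# Split at the empty position that has some character before it
# and a + or - right after it.
pieces <- strsplit(x, "(?<=.)(?=[+-])", perl = TRUE)

# Assumes three pieces per string; bind them row-wise into a matrix.
parts <- do.call(rbind, pieces)
colnames(parts) <- c("x1", "x2", "x3")
print(parts)
```

The result is a character matrix rather than a data.table, so it would still need converting if further data.table operations are planned.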
Here is an option using base R. Note that read.csv parses x2 and x3 as numbers, so a leading + is not kept:
cbind(data, read.csv(text = gsub("(?=[+-])", ",", data$x, perl = TRUE),
header = FALSE, stringsAsFactors = FALSE, col.names = c('x1', 'x2', 'x3')))
#                 x    x1    x2   x3
#1:   92.1%+100-200 92.1%   100 -200
#2:  90.4%-1000+200 90.4% -1000  200
#3:   92.8%-200+100 92.8%  -200  100
#4:   99.2%-500-200 99.2%  -500 -200
#5:   90.1%+500-200 90.1%   500 -200
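One way to keep the signs as text with this approach is to ask read.csv not to convert the columns. The sketch below is a variation on the answer above, not the original code; it assumes all three columns should stay character, and csv_text and out are illustrative names.

```r
x <- c("92.1%+100-200", "90.4%-1000+200", "92.8%-200+100")

# Insert a comma before each sign, then parse the result as CSV text.
csv_text <- gsub("(?=[+-])", ",", x, perl = TRUE)

# colClasses = "character" stops read.csv from turning "+100" into 100.
out <- read.csv(text = csv_text, header = FALSE,
                colClasses = "character",
                col.names = c("x1", "x2", "x3"))
print(out)
```

The trade-off is that x2 and x3 are then strings, so they would need as.numeric() before any arithmetic.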