Splitting a vector by regular expression into a data frame
I have a vector that looks like this:
head(val)
[1] "PD2323 [403-407]" "P05230 [455-459]"
I want to split it into a data frame with 3 columns and many rows. The output should look like this:
head(output)
[,1] [,2] [,3]
[1,] "P20700" 403 407
[2,] "P05787" 455 459
[3,] "O14641" 168 178
However, when I try to build it, I end up with a matrix that has more than 3 columns. Splitting on whitespace looks like this:
head(strsplit(val, "\\s+"))
[[1]]
[1] "PD2323" "[403-407]"
[[2]]
[1] "P05230" "[455-459]"
[[3]]
[1] "AS14641" "[168-178]"
[[4]]
[1] "SS7Z3Z4" "[424-428]"
[[5]]
[1] "QQN4C6-2" "[671-679]"
[[6]]
[1] "DD9Y3B2" "[7-13]"
At first this looks promising:
do.call(rbind, head(strsplit(val, "\\s+")))
[,1] [,2]
[1,] "PD2323" "[403-407]"
[2,] "P05230" "[455-459]"
[3,] "AS14641" "[168-178]"
[4,] "SS7Z3Z4" "[424-428]"
[5,] "QQN4C6-2" "[671-679]"
[6,] "DD9Y3B2" "[7-13]"
If I now remove the head() call, for some reason I end up with something that has 90 columns:
dim(do.call(rbind, strsplit(val, "\\s+")))
[1] 23369 90
Warning message:
In .Method(..., deparse.level = deparse.level) :
number of columns of result is not a multiple of vector length (arg 314)
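The warning hints at the cause: rbind() gives the result as many columns as its longest argument and recycles the shorter ones, so if even a few elements of val split into more than two pieces, the whole matrix widens. The sketch below is a minimal diagnostic on a hypothetical toy vector (the third element is invented for illustration, since the real offending entries are not shown). It uses lengths() to find the elements that do not split into exactly two pieces.

```r
# Hypothetical toy input: the third element is invented to stand in for
# whatever irregular entries the real data may contain.
val <- c("PD2323 [403-407]", "P05230 [455-459]", "X1 [1-2] [5-9] [11-14]")

pieces <- strsplit(val, "\\s+")

# Number of pieces each element was split into.
n <- lengths(pieces)
print(table(n))

# Elements that did not split into exactly two pieces.
bad <- which(n != 2)
print(val[bad])

# rbind() sizes the matrix by the longest vector and recycles shorter rows,
# so row 1 is repeated to fill the extra columns.
m <- do.call(rbind, pieces)
print(dim(m))
print(m[1, ])
```

Inspecting val[bad] on the real data should show what those entries look like, and that tells you whether to clean them or to extract the fields with a stricter pattern.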
We can use gsub to replace the square brackets, along with the -, by spaces, and then read the result into a data.frame with read.table:
d1 <- read.table(text=gsub("[][]|-", " ", val), header=FALSE, stringsAsFactors=FALSE)
d1
# V1 V2 V3
#1 PD2323 403 407
#2 P05230 455 459
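If you want named columns like the desired output, read.table() also accepts col.names. The sketch below is the same call with that argument added. The names "id", "start" and "end" are illustrative choices, not taken from the question. With str() you can confirm that the two numeric fields are read as integers.

```r
val <- c("PD2323 [403-407]", "P05230 [455-459]")

# Same gsub()/read.table() call as above, plus illustrative column names.
d1 <- read.table(text = gsub("[][]|-", " ", val),
                 header = FALSE, stringsAsFactors = FALSE,
                 col.names = c("id", "start", "end"))
str(d1)
```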
Data
val <- c( "PD2323 [403-407]", "P05230 [455-459]")
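One caveat with the gsub() approach: the question's data also contains IDs with a hyphen, such as "QQN4C6-2", and the -, alternative in the pattern turns that hyphen into a space too, so that line yields four fields instead of three. One alternative, offered here as a sketch rather than the answer's method, is to match the whole element with capture groups and convert them with utils::strcapture() (available since R 3.4.0). This assumes every element has the shape "ID [start-end]". Elements that do not match come back as a row of NA, which also makes malformed entries easy to spot. The example values are taken from the question's listing.

```r
val <- c("PD2323 [403-407]", "P05230 [455-459]",
         "QQN4C6-2 [671-679]", "DD9Y3B2 [7-13]")

# The gsub() approach splits the hyphenated ID into two fields.
g <- gsub("[][]|-", " ", val[3])
print(g)

# Assumed shape "ID [start-end]": group 1 is any non-space run (hyphens
# allowed), groups 2 and 3 are the two numbers inside the brackets.
pattern <- "^(\\S+)\\s+\\[([0-9]+)-([0-9]+)\\]$"

# The prototype fixes the column names and the types of the captured groups.
proto <- data.frame(id = character(), start = integer(), end = integer(),
                    stringsAsFactors = FALSE)
out <- utils::strcapture(pattern, val, proto, perl = TRUE)
print(out)
```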