Separating into columns in R - delimiter changes
I want to split the values contained in a single column into new columns.
I have some data in a file that looks like this:
> df
V1
1 00006303657102064942660780914135165036 12867 15476 15473 15474 15397 14050
2 00006319625527159782351492300309533775 12867 15473 13678 13497 15397
3 00006327933867965144524703512179615086 12867 14245 15397 15473 15474
I want each value separated into its own column: V1, V2, V3, V4, V5 and V6
I tried:
df2 <- data.frame(do.call('rbind', strsplit(as.character(df$V1), ' ', fixed = FALSE)))
I end up with output like this:
X1 X2 X3 X4 X5 X6
1 00006303657102064942660780914135165036 12867 15476 15473 15474 15397
2 00006319625527159782351492300309533775 12867 15473 13678 13497 15397
3 00006327933867965144524703512179615086 12867 14245 15397 15473 15474
X7 X8
1 14050 00006303657102064942660780914135165036
2 00006319625527159782351492300309533775 12867
3 00006327933867965144524703512179615086 12867
Some of the V1 values end up in other columns. This might be because there is no space at the end of the lines. How can I do this correctly?
Thanks
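A likely cause of the wrapped values, rather than a missing trailing space, is recycling: rbind() reuses the elements of shorter vectors until they reach the length of the longest one, so a row with fewer fields starts over from its first value. A minimal base R sketch, assuming the rows are held in a character vector (called x here purely for illustration), pads every row with NA before binding:

```r
# The three rows from the question, as plain strings (x is an illustrative name)
x <- c(
  "00006303657102064942660780914135165036 12867 15476 15473 15474 15397 14050",
  "00006319625527159782351492300309533775 12867 15473 13678 13497 15397",
  "00006327933867965144524703512179615086 12867 14245 15397 15473 15474"
)

# Split on a literal space; the rows do not all yield the same number of fields
parts <- strsplit(x, " ", fixed = TRUE)
n <- max(lengths(parts))

# Extend every row to n fields; `length<-` fills the new slots with NA
padded <- lapply(parts, function(p) {
  length(p) <- n
  p
})

# Now every row has the same length, so rbind() has nothing to recycle
df2 <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(df2) <- paste0("V", seq_len(n))
df2
```

Keeping the fields as character also preserves the leading zeros of the long identifier in the first column.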
library(tidyr)
library(dplyr)  # for %>%

# Read each line as a single column, V1
df <- read.table(
  header = FALSE,
  text = "
00006303657102064942660780914135165036 12867 15476 15473 15474 15397 14050
00006319625527159782351492300309533775 12867 15473 13678 13497 15397
00006327933867965144524703512179615086 12867 14245 15397 15473 15474
",
  sep = "\n"
)

df %>%
  separate(
    V1,
    into = paste0("V", 1:7),
    sep = " ",
    # 'extra' drops any pieces beyond the 7th;
    # 'fill' pads rows with fewer pieces with NA on the right
    extra = "drop",
    fill = "right"
  )
V1 V2 V3 V4 V5 V6 V7
1 00006303657102064942660780914135165036 12867 15476 15473 15474 15397 14050
2 00006319625527159782351492300309533775 12867 15473 13678 13497 15397 <NA>
3 00006327933867965144524703512179615086 12867 14245 15397 15473 15474 <NA>
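In newer tidyr releases, separate() is marked as superseded in favour of the separate_wider_*() family. A hedged sketch of the same split with separate_wider_delim(), assuming tidyr 1.3.0 or later is installed; the vector x and the column name raw are illustrative names, not part of the original data:

```r
library(tidyr)

# Same three rows as above; `x` and `raw` are illustrative names
x <- c(
  "00006303657102064942660780914135165036 12867 15476 15473 15474 15397 14050",
  "00006319625527159782351492300309533775 12867 15473 13678 13497 15397",
  "00006327933867965144524703512179615086 12867 14245 15397 15473 15474"
)
df <- data.frame(raw = x, stringsAsFactors = FALSE)

# too_few = "align_start" keeps the existing pieces on the left
# and fills the missing trailing columns with NA
out <- separate_wider_delim(
  df,
  raw,
  delim = " ",
  names = paste0("V", 1:7),
  too_few = "align_start"
)
out
```

Unlike separate(), this function stops with an error on rows with too few or too many pieces unless too_few or too_many is set, which makes ragged input harder to miss.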
Good old plyr also works:
txt <- readLines(n = 3)
1 00006303657102064942660780914135165036 12867 15476 15473 15474 15397 14050
2 00006319625527159782351492300309533775 12867 15473 13678 13497 15397
3 00006327933867965144524703512179615086 12867 14245 15397 15473 15474
library(plyr)
rbind.fill(
  lapply(
    strsplit(txt, " "),
    function(y) {
      as.data.frame(t(y), stringsAsFactors = FALSE) # via @Arun
    }
  )
)
# V1 V2 V3 V4 V5 V6 V7 V8
# 1 1 00006303657102064942660780914135165036 12867 15476 15473 15474 15397 14050
# 2 2 00006319625527159782351492300309533775 12867 15473 13678 13497 15397 <NA>
# 3 3 00006327933867965144524703512179615086 12867 14245 15397 15473 15474 <NA>