Separating into columns in R - delimiter changes

I want to split the values contained in a single column into new columns.

I have some data in a file that looks like this:

> df
                                                                          V1
1 00006303657102064942660780914135165036 12867 15476 15473 15474 15397 14050
2       00006319625527159782351492300309533775 12867 15473 13678 13497 15397
3       00006327933867965144524703512179615086 12867 14245 15397 15473 15474

I want to separate each value into a new column: V1, V2, V3, V4, V5 and V6

I tried:
df2 <- data.frame(do.call('rbind', strsplit(as.character(df$V1), ' ', fixed = FALSE)))

I end up with output like this:

                                      X1    X2    X3    X4    X5    X6
1 00006303657102064942660780914135165036 12867 15476 15473 15474 15397
2 00006319625527159782351492300309533775 12867 15473 13678 13497 15397
3 00006327933867965144524703512179615086 12867 14245 15397 15473 15474
                                      X7                                     X8
1                                  14050 00006303657102064942660780914135165036
2 00006319625527159782351492300309533775                                  12867
3 00006327933867965144524703512179615086                                  12867

Some of the V1 values end up in other columns. This may be because there is no space at the end of the line. How can I do this correctly?

Thanks

library(tidyr)
library(dplyr)

df <- read.table(
  header = FALSE, 
  text = "
00006303657102064942660780914135165036 12867 15476 15473 15474 15397 14050
00006319625527159782351492300309533775 12867 15473 13678 13497 15397
00006327933867965144524703512179615086 12867 14245 15397 15473 15474
",
  sep = "\n"
  )

df %>%
  separate(
    V1, 
    into = paste0("V", 1:7),
    # 'fill' pads rows that have fewer than 7 values with NA on the right
    fill = "right",
    # 'extra' silently drops any values beyond the 7th
    extra = "drop"
    )

                                      V1    V2    V3    V4    V5    V6    V7
1 00006303657102064942660780914135165036 12867 15476 15473 15474 15397 14050
2 00006319625527159782351492300309533775 12867 15473 13678 13497 15397  <NA>
3 00006327933867965144524703512179615086 12867 14245 15397 15473 15474  <NA>
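For what it's worth, the shifted values in the question come from rbind recycling the shorter rows to the length of the longest one, rather than from a missing trailing space. Here is a minimal base R sketch, assuming the same three lines as above, that pads every split row to a common length before binding. The names x, parts, padded and df2 are only illustrative.

```r
# The three input lines from the question, as a character vector (assumed input)
x <- c(
  "00006303657102064942660780914135165036 12867 15476 15473 15474 15397 14050",
  "00006319625527159782351492300309533775 12867 15473 13678 13497 15397",
  "00006327933867965144524703512179615086 12867 14245 15397 15473 15474"
)

# Split each line on a literal single space
parts <- strsplit(x, " ", fixed = TRUE)

# Length of the longest row
n <- max(lengths(parts))

# Extending a vector with `length<-` pads it with NA
padded <- lapply(parts, function(p) {
  length(p) <- n
  p
})

# All rows now have the same length, so rbind no longer recycles anything
df2 <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(df2) <- paste0("V", seq_len(n))
df2
```

Everything stays character here, so the leading zeros in the first column are kept.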

The older plyr package also works:

txt <- readLines(n = 3)
1 00006303657102064942660780914135165036 12867 15476 15473 15474 15397 14050 
2 00006319625527159782351492300309533775 12867 15473 13678 13497 15397 
3 00006327933867965144524703512179615086 12867 14245 15397 15473 15474

library(plyr)
rbind.fill(
  lapply(
    strsplit(txt, " "), 
    function(y) {
      as.data.frame(t(y),stringsAsFactors=FALSE) # via @Arun 
    }
  )
)
#   V1                                     V2    V3    V4    V5    V6    V7    V8
# 1  1 00006303657102064942660780914135165036 12867 15476 15473 15474 15397 14050
# 2  2 00006319625527159782351492300309533775 12867 15473 13678 13497 15397  <NA>
# 3  3 00006327933867965144524703512179615086 12867 14245 15397 15473 15474  <NA>
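If the data can be re-read from the file, another option may be to let read.table do the splitting directly. This is only a sketch under that assumption: the text argument stands in for the real file, and df3 is an illustrative name. fill = TRUE implicitly adds blank fields to rows that are too short, and colClasses = "character" keeps the long IDs as strings so the leading zeros are not lost.

```r
# 'text' stands in for the actual file here; with a real file,
# pass its path as the first argument instead
df3 <- read.table(
  text = "
00006303657102064942660780914135165036 12867 15476 15473 15474 15397 14050
00006319625527159782351492300309533775 12867 15473 13678 13497 15397
00006327933867965144524703512179615086 12867 14245 15397 15473 15474
",
  header = FALSE,
  fill = TRUE,                # pad short rows with blank fields
  colClasses = "character"    # keep leading zeros in the ID column
)
df3
```

Note that read.table decides the number of columns from the first few lines of input, so this works here because the longest row comes first.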