Read specific columns from a text file: R

Question

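I am trying to read text files and build a data frame (called dataset) from a few specific columns (about 12), each located at fixed character positions, as follows: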

  library(data.table)  # provides fread
  library(stringi)     # provides stri_sub

  x <- fread("file1.txt",colClasses = "character", sep = "\n", header = FALSE, verbose = FALSE,strip.white = FALSE)
  y <- fread("file2.txt",colClasses = "character", sep = "\n", header = FALSE, verbose = FALSE,strip.white = FALSE)
  # combine them
  x <- rbind(x, y)


  # We basically read the whole file as a string and then read substrings 
  # corresponding to each variable start and finish lengths.
  Var1= sapply(as.list(x$V1), stri_sub, from = 80, to = 82)
  Var1= as.data.frame(Var1)

  Var2= sapply(as.list(x$V1), stri_sub, 83, 89)
  Var2= as.data.frame(Var2)

  dataset = cbind(Var1,Var2)

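Running this on the two text files, which have 200K and 300K rows respectively, takes about 1 minute. Each row has 1,800 characters. Is there a faster way to do this? I will be reading about 200 such files.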

Answer 1

I think you can simplify your code as follows:

x <- Reduce(rbind, lapply(1:2, function(k) fread(paste0("file",k,".txt"),
                                                 colClasses = "character", 
                                                 sep = "\n", 
                                                 header = FALSE, 
                                                 verbose = FALSE,
                                                 strip.white = FALSE)))

dataset <- data.frame(Var1= substr(x$V1, 80, 82), Var2 = substr(x$V1,83,89))

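The second statement may save even more time, because substr is applied to the whole column at once instead of being called row by row through sapply.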
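To extend this to all 12 or so variables, one option is to keep the start and end positions in a small lookup table and loop over the variables rather than over the rows. The sketch below uses base R only so that it runs without extra packages. The `spec` table and the `extract_fields` helper are hypothetical names introduced for illustration, and only the positions for Var1 and Var2 come from the question. The demo lines are made-up stand-ins for rows of the real files.

```r
# Hypothetical layout table: one row per variable with its character positions.
# Only Var1 (80-82) and Var2 (83-89) are taken from the question.
spec <- data.frame(
  name  = c("Var1", "Var2"),
  start = c(80, 83),
  end   = c(82, 89),
  stringsAsFactors = FALSE
)

# Extract every field listed in `spec` from a character vector of lines.
extract_fields <- function(lines, spec) {
  cols <- lapply(seq_len(nrow(spec)), function(i) {
    # substr() is vectorized over `lines`, so no per-row sapply() is needed
    substr(lines, spec$start[i], spec$end[i])
  })
  names(cols) <- spec$name
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Small demo: two 100-character lines standing in for rows of the real files.
pad <- strrep("-", 79)  # filler for character positions 1 to 79
demo_lines <- c(
  paste0(pad, "ABC", "1234567", strrep("-", 11)),
  paste0(pad, "XYZ", "7654321", strrep("-", 11))
)

dataset <- extract_fields(demo_lines, spec)
print(dataset)
```

With the real data, `x$V1` from the fread call would take the place of `demo_lines`. For the roughly 200 files, it is probably cheaper to extract the fields per file and combine the results once at the end (for example with `data.table::rbindlist` on a list of results) than to rbind the full 1,800-character lines first.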

performance

r

fread