How to turn a text file into a data frame by importing the text file?

Question

I have a huge text file containing one long string that I am trying to import into R as a data frame.

The text file with the data is available at:

https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/new.data

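Essentially, the text file is one long string with values separated by spaces. Is it possible to use R to convert all the spaces to commas so that I can use read_csv?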

I tried importing it as a TSV, but that didn't work. read_delim didn't work either, because of the way the text file is formatted.

Does anyone know how to import this text file into a simple data frame? The first two entries in the text file look like this, one per line:

1 15943882 63 1 -9 -9 -9 -27 1 145 1 233 -9 50 20 1 0 1 2 2 3 1981 0 0 0 0 0 1 10.5 6 13 150 60 190 90 145 85 0 0 2.3 3 -9 -9 0 -9 -9 -9 -9 -9 -9 6 -9 -9 -9 2 16 1981 0 1 1 1 -9 1 -9 1 -9 1 1 1 1 1 1 1 -9 -9 0 -9 -9 -9 -9 -9 -9 -9 -9 -9 0 0 0 0 name
2 15964847 67 1 -9 -9 -9 -27 4 160 1 286 -9 40 40 0 0 1 2 3 5 1981 0 1 0 0 0 1 9.5 6 13 108 64 160 90 160 90 1 0 1.5 2 -9 -9 3 -9 -9 -9 -9 -9 -9 3 -9 -9 -9 2 5 1981 2 1 2 2 -9 2 -9 1 -9 1 1 1 1 1 1 1 -9 -9 0 -9 -9 -9 -9 -9 -9 -9 -9 -9 0 0 0 0 name

Thanks!

Answer 1

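There may be a more efficient way, but this seems to do the trick. You will have to name the columns yourself (the data doesn't appear to include column names).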

library(dplyr)
library(tibble)
library(readr)

# Determined by looking at the file.  Not sure if 
# there's a way to determine this automatically
line_per_chunk <- 12L

# Read the whole file into a character vector, one element per line
# (assumes new.data has been downloaded into the working directory)
data <- read_lines('new.data')

# Combine every group of 12 lines into a single string
# (using a space as a delimiter to match the rest of the file)
joined_data <- data %>%
  # Make the character vector a data frame, with a row number column
  enframe(name = 'row', value = 'raw_data') %>%
  # Lines 1-12 get chunk 0, lines 13-24 get chunk 1, and so on
  group_by(chunk = (row - 1) %/% line_per_chunk) %>%
  summarise(joined = paste(raw_data, collapse = ' '))

# Parse each joined string as one row. The default sep = "" splits on any
# run of whitespace, so doubled spaces where lines were joined don't
# produce empty columns (sep = ' ' would)
results <- read.table(text = joined_data[["joined"]])

results
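If you'd rather not hard-code 12 lines per record, a variation is to ignore line breaks entirely and use the trailing name token as the end-of-record marker. This is a minimal sketch, assuming every record ends with the literal token "name" (as both sample records in the question do) and that "name" appears nowhere else. The helper read_name_terminated and the small two-record sample are hypothetical, made up for illustration. It reads every whitespace-separated token with base R's scan() and reshapes them into one row per record.

```r
# Sketch: use the "name" token as the end-of-record marker instead of a
# fixed number of physical lines per record.
# Assumption: every record ends with the literal token "name", and "name"
# appears nowhere else in the file.
read_name_terminated <- function(file) {
  # scan() splits on any whitespace (spaces and newlines alike), so the
  # line breaks inside a record no longer matter
  tokens <- scan(file, what = character(), quiet = TRUE)
  ends <- which(tokens == "name")
  if (length(ends) == 0) stop("no 'name' end-of-record marker found")

  # Fields per record, taken from the first record
  n_fields <- ends[1]
  expected_ends <- seq(n_fields, length(tokens), by = n_fields)

  # Reshaping is only valid if every record has the same number of fields
  if (length(tokens) %% n_fields != 0 ||
      length(ends) != length(expected_ends) ||
      any(ends != expected_ends)) {
    stop("records do not all have the same number of fields")
  }

  # One row per record; columns are named V1, V2, ...
  m <- matrix(tokens, ncol = n_fields, byrow = TRUE)
  df <- as.data.frame(m, stringsAsFactors = FALSE)
  # Convert numeric-looking columns ("-9", "10.5", ...) to numbers;
  # the "name" column stays character
  df[] <- lapply(df, type.convert, as.is = TRUE)
  df
}

# Hypothetical two-record sample, each record wrapped over two lines
tmp <- tempfile(fileext = ".data")
writeLines(c("1 100 63 -9", "10.5 name",
             "2 200 67 -9", "1.5 name"), tmp)
df <- read_name_terminated(tmp)
print(df)
```

For the real file you would call read_name_terminated("new.data") after downloading it (scan() should also accept the URL directly). Unlike fixed 12-line chunks, this doesn't depend on how the records are wrapped, but it stops with an error if any record has a different number of fields. You still have to name the columns yourself.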


import

r

dataframe