Skipping rows starting with specific values while importing a CSV file into R using fread

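I am trying to import a CSV file from a URL in R. The file contains rows that begin with one of these strings: '<<<<<<< HEAD', '=======' or '>>>>>>> master'. These rows appear at random positions in the file. I want to skip them and import the rest of the document. Is there a way to do this? I would prefer to use fread to import the data. Thanks for any input.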

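By default the data does not load. fread stops at the first instance of one of the strings above (line 347 of the CSV). The URL I am trying to download the data from is "https://raw.githubusercontent.com/RamiKrispin/coronavirus/master/csv/coronavirus.csv", and the message it gives is as follows: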

[0%] Downloaded 0 bytes...
Warning message:
In data.table::fread("https://raw.githubusercontent.com/RamiKrispin/coronavirus/master/csv/coronavirus.csv",  :
  Stopped early on line 347. Expected 7 fields but found 1. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<<<<<<<< HEAD>>

The statement I am using to download the data is:

covid_ds <- data.table::fread('https://raw.githubusercontent.com/RamiKrispin/coronavirus/master/csv/coronavirus.csv')

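You can read the data using read.csv with fill = TRUE, keep only those rows whose date column holds date-formatted values so that lines such as '<<<<<<< HEAD' and '=======' are removed, and then use type_convert to change the columns to their respective types.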

data <- read.csv('https://raw.githubusercontent.com/RamiKrispin/coronavirus/master/csv/coronavirus.csv', fill = TRUE)
data <- data[grepl('\\d+-\\d+-\\d+', data$date), ]
data <- readr::type_convert(data)
data

#    date       province country       lat  long type      cases
#   <date>     <chr>    <chr>       <dbl> <dbl> <chr>     <int>
# 1 2020-01-22 NA       Afghanistan  33.9  67.7 confirmed     0
# 2 2020-01-23 NA       Afghanistan  33.9  67.7 confirmed     0
# 3 2020-01-24 NA       Afghanistan  33.9  67.7 confirmed     0
# 4 2020-01-25 NA       Afghanistan  33.9  67.7 confirmed     0
# 5 2020-01-26 NA       Afghanistan  33.9  67.7 confirmed     0
# 6 2020-01-27 NA       Afghanistan  33.9  67.7 confirmed     0
# 7 2020-01-28 NA       Afghanistan  33.9  67.7 confirmed     0
# 8 2020-01-29 NA       Afghanistan  33.9  67.7 confirmed     0
# 9 2020-01-30 NA       Afghanistan  33.9  67.7 confirmed     0
#10 2020-01-31 NA       Afghanistan  33.9  67.7 confirmed     0
# … with 287,772 more rows
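
Since the question asks for fread specifically, the same keep-only-dated-rows idea can be sketched with data.table. This is a minimal sketch, not something verified against the full file: sample_lines is a made-up stand-in for the downloaded CSV, and fill = TRUE is the option that fread's own warning suggests.

```r
library(data.table)

# Made-up sample standing in for the real CSV (an assumption for illustration)
sample_lines <- c(
  "date,province,country,lat,long,type,cases",
  "2020-01-22,,Afghanistan,33.9,67.7,confirmed,0",
  "<<<<<<< HEAD",
  "2020-01-23,,Afghanistan,33.9,67.7,confirmed,0",
  "=======",
  "2020-01-24,,Afghanistan,33.9,67.7,confirmed,0",
  ">>>>>>> master"
)

# fill = TRUE pads the one-field marker lines with NA instead of stopping early
dt <- fread(text = sample_lines, fill = TRUE)

# Keep only rows whose date column looks like YYYY-MM-DD
dt <- dt[grepl("^\\d{4}-\\d{2}-\\d{2}$", date)]

# The marker lines forced date to be read as text, so convert it afterwards
dt[, date := as.IDate(date)]
```

For the real file, the URL from the question would take the place of text = sample_lines.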

With data.table::fread you can use blank.lines.skip=TRUE.

data <- data.table::fread('https://raw.githubusercontent.com/RamiKrispin/coronavirus/master/csv/coronavirus.csv', blank.lines.skip=TRUE)
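
Another option that stays with fread is to drop the marker lines before parsing. The sketch below assumes the markers always sit at the very start of a line, as described in the question. read_csv_skip_markers is a hypothetical helper name, and the sample lines are made up so the example runs without a network connection.

```r
library(data.table)

# Hypothetical helper: remove Git conflict-marker lines, then parse with fread
read_csv_skip_markers <- function(lines) {
  # TRUE for lines that start with one of the three conflict markers
  is_marker <- grepl("^(<<<<<<<|=======|>>>>>>>)", lines)
  fread(text = lines[!is_marker])
}

# Made-up sample standing in for the real CSV (an assumption for illustration)
sample_lines <- c(
  "date,province,country,lat,long,type,cases",
  "2020-01-22,,Afghanistan,33.9,67.7,confirmed,0",
  "<<<<<<< HEAD",
  "2020-01-23,,Afghanistan,33.9,67.7,confirmed,0",
  "=======",
  "2020-01-24,,Afghanistan,33.9,67.7,confirmed,0",
  ">>>>>>> master"
)

covid_ds <- read_csv_skip_markers(sample_lines)
```

For the real file, passing readLines() of the URL from the question to the helper should work under the same assumptions. Note that removing only the marker lines keeps the rows from both sides of each conflict, so duplicated rows may remain and a call to unique() may be needed afterwards.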