Read CSV in R and filter columns by name

Question

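Suppose I have a CSV with dozens or hundreds of columns, and I only want to pull in about 2 or 3 of them. I know about the colClasses solution described here, but the code becomes very hard to read.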

I want something like usecols from pandas' read_csv.

Loading everything and selecting afterwards is not a solution (the file is too big to fit in memory).

Answer 1

I would use the package data.table and call fread(), specifying the columns to keep or drop through the select or drop argument. From ?fread:

select Vector of column names or numbers to keep, drop the rest.

drop Vector of column names or numbers to drop, keep the rest.
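As a minimal sketch of how these two arguments might be used, the snippet below writes the built-in iris dataset to a temporary file and reads back only some of its columns. The file path and the variable names (csv_path, sub_iris, no_petal) are illustrative assumptions, not part of the original question.

```r
library(data.table)

# Write a small example file so the sketch is reproducible
# (csv_path is a hypothetical temporary location)
csv_path <- tempfile(fileext = ".csv")
write.csv(iris, csv_path, row.names = FALSE)

# Keep only two columns, chosen by name
sub_iris <- fread(csv_path, select = c("Sepal.Length", "Species"))

# Alternatively, name the columns to discard and keep the rest
no_petal <- fread(csv_path, drop = c("Petal.Length", "Petal.Width"))

names(sub_iris)
names(no_petal)

# Remove the temporary file
unlink(csv_path)
```

Both select and drop also accept column numbers, as the help text above says, but names tend to be easier to read when the file has hundreds of columns.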

Best!

Answer 2

One way is to use the package sqldf. If you know SQL, you can read a large file while filtering for only the parts you want.

I will use the built-in dataset iris to make the example reproducible, saving it to disk first.

write.csv(iris, "iris.csv", row.names = FALSE)

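Now for the problem itself.
The argument row.names works the same way as in the write.csv call.
Note the backticks around Sepal.Length; they are needed because of the dot character in the column name.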

library(sqldf)

sql <- "select `Sepal.Length`, Species from file"
sub_iris <- read.csv.sql("iris.csv", sql = sql, row.names = FALSE)

head(sub_iris)
#  Sepal.Length  Species
#1          5.1 "setosa"
#2          4.9 "setosa"
#3          4.7 "setosa"
#4          4.6 "setosa"
#5          5.0 "setosa"
#6          5.4 "setosa"

Finally, clean up.

unlink("iris.csv")

Tags: csv, r, readr