drop in fread: misses repetitions of col name (data.table R)

Question

I have a file with a bunch of filler columns (named filler, of course) that I'm trying to read with fread.

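I'm using the drop argument, but it only drops the first instance it encounters (presumably left to right, but that's beside the point); I'd like it to get rid of all of them.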

Quick example:

Header of the .csv:

id,first_name,last_name,filler,birth_year,filler,position,filler,wage

names(dt) after using drop in fread:

id,first_name,last_name,birth_year,filler,position,filler,wage

Also, if I try:

DT <- fread("file.csv", drop = rep("filler", 5L))

I get an error:

Error in fread(paste0(substr(tt, 3, 4), "staff.csv"), drop = rep("filler", : Duplicates detected in drop

Any pointers?

Answer 1

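You can use scan() to read the first line of the file, then use that data as the drop index in fread(). Dropping by column position rather than by name avoids the duplicate-name problem.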

library(data.table)
## example text for fread()
x <- "id,first_name,last_name,filler,birth_year,filler,position,filler,wage
1,2,3,4,5,6,7,8,9"
## read the first line and find the filler
f <- scan(text = x, what = "", sep = ",", nlines = 1) == "filler"
## pass to fread()
fread(x, drop = which(f))
#    id first_name last_name birth_year position wage
# 1:  1          2         3          5        7    9
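
The example above works on an inline string. For a file on disk, the same idea might look like the following sketch. The temporary file, the variable names header and filler_idx, and the sample contents are assumptions made for illustration; substitute the path of your own .csv.

```r
library(data.table)

# Hypothetical demo file, written to a temporary path so the sketch is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,first_name,last_name,filler,birth_year,filler,position,filler,wage",
  "1,2,3,4,5,6,7,8,9"
), path)

# Read only the header line from disk and locate every "filler" column
header <- scan(path, what = "", sep = ",", nlines = 1, quiet = TRUE)
filler_idx <- which(header == "filler")

# Drop by position, so repeated column names are not an issue
DT <- fread(path, drop = filler_idx)
names(DT)
```

Because drop receives integer positions here, every column whose header is "filler" is removed, no matter how many times the name repeats.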

r

data.table