Drop columns whose names have been filled in

Question

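I am loading a large amount of data from CSV files into R. The CSVs contain columns with empty headers, and I want to drop those columns from my dataset. I have done some searching but have not found a good way to do this. R fills in the names of those columns with X2, X3, and so on. I don't want to drop columns just because their names start with X, since that might delete some columns I need. I have about 200 CSVs to load.

Any help would be appreciated.

Here is the text of a sample CSV I have been using to work on this. In my actual data, the blank-header columns do not always fall on every other column.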

Name,,Ages,,Color,,Year
Michael,some data,2,some other data,Blue,a third data,2001
Tiffany,some data,3,some other data,Red,a third data,2002
Bryan,some data,4,some other data,Green,a third data,2003
Sarah,some data,5,some other data,Orange,a third data,2004

Here is what I would like to get back.

Name,Ages,Color,Year
Michael,2,Blue,2001
Tiffany,3,Red,2002
Bryan,4,Green,2003
Sarah,5,Orange,2004

Answer 1

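We can build a regular expression that matches column names starting (^) with X, optionally followed by one or more digits or dots, up to the end ($) of the string. Passing it to grep with invert = TRUE returns the positions of the columns that do not match, i.e. the ones to keep.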

i1 <- grep('^X([0-9.]+)?$', names(df1), invert = TRUE)
df1[i1]

-output

#      Name Ages  Color Year
#1 Michael    2   Blue 2001
#2 Tiffany    3    Red 2002
#3   Bryan    4  Green 2003
#4   Sarah    5 Orange 2004
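To see which names the pattern treats as auto-filled, here is a small sketch. The column names in `nms` are made up purely to exercise the regex and are not from the question's data.

```r
# Hypothetical column names, chosen only to exercise the pattern
nms <- c("Name", "X", "X.1", "X2", "Xray", "Index", "X_total")
pattern <- "^X([0-9.]+)?$"

# TRUE where the whole name is "X" optionally followed by digits/dots
auto_filled <- grepl(pattern, nms)
setNames(auto_filled, nms)

# Names that would be kept
kept <- nms[!auto_filled]
kept
```

Names such as Xray or X_total survive because the pattern is anchored at both ends. One caveat: a genuine header that is literally X or X1 would still be dropped, since after reading it looks the same as a filled-in name.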

Or with dplyr

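library(dplyr)
# matches() ignores case by default, so anchor the pattern with ^ and
# make it case-sensitive; otherwise a name like "Index" would also be dropped
df1 %>% 
      select(!matches('^X([0-9.]+)?$', ignore.case = FALSE))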

If there are many files to read from the working directory, read them into a list and apply the same step to each one

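# '\\.' is needed: a single backslash before '.' is an invalid escape in an R string
files <- list.files(pattern = '\\.csv$', full.names = TRUE)
lst1 <- lapply(files, function(x) {
      # read one file, then keep only the columns whose names were not auto-filled
      x1 <- read.csv(x)
      i1 <- grep('^X([0-9.]+)?$', names(x1), invert = TRUE)
      x1[i1]
})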
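If the worry is that a real column might be named something like X or X1, a different approach (not the regex method above) is to stop R from renaming the headers in the first place and drop the columns whose header is actually empty. The sketch below assumes that read.csv with check.names = FALSE leaves blank headers as empty strings; `read_named_columns` is a hypothetical helper name, and the temporary file just reproduces part of the sample CSV from the question.

```r
# Sketch only: read_named_columns is a hypothetical helper, not part of the answer above
read_named_columns <- function(path) {
  # check.names = FALSE keeps headers as written, so blank headers stay ""
  x1 <- read.csv(path, check.names = FALSE)
  # keep columns whose header is neither missing nor blank/whitespace
  has_name <- !is.na(names(x1)) & nzchar(trimws(names(x1)))
  x1[has_name]
}

# Demo on a temporary copy of the sample CSV from the question
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Name,,Ages,,Color,,Year",
  "Michael,some data,2,some other data,Blue,a third data,2001",
  "Tiffany,some data,3,some other data,Red,a third data,2002"
), tmp)

res <- read_named_columns(tmp)
names(res)
```

The same helper could be passed to lapply over the vector of file paths. Note that with check.names = FALSE any non-syntactic header (for example one containing spaces) is also left as written, so those columns need backticks or [[ ]] to access.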

data

df1 <- structure(list(Name = c("Michael", "Tiffany", "Bryan", "Sarah"
), X = c("some data", "some data", "some data", "some data"), 
    Ages = 2:5, X.1 = c("some other data", "some other data", 
    "some other data", "some other data"), Color = c("Blue", 
    "Red", "Green", "Orange"), X.2 = c("a third data", "a third data", 
    "a third data", "a third data"), Year = 2001:2004),
    class = "data.frame", row.names = c(NA, 
-4L))
