Remove the first column name in a data frame from fread() in R

Question

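I am trying to remove the first name from the column names produced by fread(). The first column name only serves as a header for the row names. Later in the workflow this "title" messes up my data because it is treated as one of the rows, so I need to either ignore it or make it non-existent.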

A subset of my DGE_file looks like this:

            GENE ATGGCGAACCTACATCCC ATGGCGAGGACTCAAAGT
1: 0610009B22Rik                  1                  0
2: 0610009E02Rik                  0                  0

I tried removing the first column name like this:

library(Matrix)
library("data.table")

# Read in the dge file
DGE_file<- fread(file="DGE.txt", stringsAsFactors = TRUE)

colnames(DGE_file)<-colnames(DGE_file)[-1]
DGE_file<- as.matrix(DGE_file)

Understandably, this produces an error:

> colnames(DGE_file)<-colnames(DGE_file)[-1]
Error in setnames(x, value) : 
  Can't assign 10000 names to a 10001 column data.table
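The error is a length mismatch: `colnames(DGE_file)[-1]` has one element fewer than the table has columns. The base R sketch that follows uses a hypothetical two-row data frame built from the sample above (not the real 10,001-column file) to show that, where data.table stops with an error, a plain data.frame accepts the short vector and silently shifts every name one column to the left.

```r
# Hypothetical stand-in for DGE_file, built from the two sample rows above
dge <- data.frame(
  GENE = c("0610009B22Rik", "0610009E02Rik"),
  ATGGCGAACCTACATCCC = c(1L, 0L),
  ATGGCGAGGACTCAAAGT = c(0L, 0L)
)

# Dropping the first element leaves one name fewer than there are columns
new_names <- colnames(dge)[-1]
length(new_names)  # 2
ncol(dge)          # 3

# A plain data.frame accepts the short vector without complaint: every
# name moves one column to the left and the last name is padded with NA
bad <- dge
colnames(bad) <- new_names
colnames(bad)      # "ATGGCGAACCTACATCCC" "ATGGCGAGGACTCAAAGT" NA
```

So the gene column would end up labelled with the first barcode; data.table's error protects against exactly that mislabelling.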

I have already tried replacing it with NA, but that causes an error in downstream processing that I have not been able to resolve.

How can I remove the header "GENE", or make it "invisible", for downstream processing?

Answer 1

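You can read the file without its header (skipping the first line) and then set the column names yourself. In my opinion, though, using a blank column name or NA as a name may cause problems later.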

require(magrittr) # for piping
require(data.table) #For reading with fread

# Read in the dge file
#Without header and skipping the first line
DGE_file <- fread(file="DGE.txt",
                  skip = 1,
                  header=FALSE,
                  stringsAsFactors = TRUE)

#Set the column names (for "invisible" name)
DGE_file <- DGE_file %>% 
  purrr::set_names(c("", "ATGGCGAACCTACATCCC",
                     "ATGGCGAGGACTCAAAGT"))

Or:

#Set the column names (for NA as the first name)
DGE_file <- DGE_file %>% 
  purrr::set_names(c(NA, "ATGGCGAACCTACATCCC",
                     "ATGGCGAGGACTCAAAGT"))

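The snippets above type the barcode names by hand, which works for the three-column sample but not for the real file with 10,001 columns. A minimal sketch of the same idea that takes the names from the file's own header line instead: it assumes `DGE.txt` is something `fread()` can auto-detect, uses a hypothetical tab-separated inline copy of the sample (`dge_lines`) in place of the file, and swaps purrr for data.table's `setnames()`, which renames in place.

```r
library(data.table)

# Hypothetical inline copy of the sample from the question (tab-separated);
# for the real data use file = "DGE.txt" instead of text = dge_lines
dge_lines <- c(
  "GENE\tATGGCGAACCTACATCCC\tATGGCGAGGACTCAAAGT",
  "0610009B22Rik\t1\t0",
  "0610009E02Rik\t0\t0"
)

# nrows = 0 reads only the header, giving every column name without the data
header <- names(fread(text = dge_lines, nrows = 0))

# Read the data rows without the header line; columns are named V1, V2, ...
dge <- fread(text = dge_lines, skip = 1, header = FALSE)

# Blank the first name and reuse the barcodes from the header, by reference
setnames(dge, c("", header[-1]))
names(dge)
```

The NA variant is left out here on purpose; a blank string is the less surprising of the two placeholder names.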
A base R solution for setting the name could look like this:

#Read the file with header 
DGE_file <- fread(file="DGE.txt",
                  header=TRUE,
                  stringsAsFactors = TRUE)

#Set an "invisible" name
names(DGE_file)[1] <- ""

#Or set an NA as a name
names(DGE_file)[1] <- NA
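Renaming only changes the label: the gene column itself stays in the table. The base R sketch that follows uses a hypothetical two-row data frame built from the question's sample to show what then happens in the question's next step, `as.matrix()`. Because one column holds character gene IDs, the whole matrix is coerced to character, which is likely why the IDs still get in the way downstream.

```r
# Hypothetical two-row stand-in for DGE_file, built from the sample above
dge <- data.frame(
  GENE = c("0610009B22Rik", "0610009E02Rik"),
  ATGGCGAACCTACATCCC = c(1L, 0L),
  ATGGCGAGGACTCAAAGT = c(0L, 0L)
)

# Blank the first column name, as in the answer above
names(dge)[1] <- ""

# The gene column is still present, so as.matrix() needs one common type
# for all columns and falls back to character
m <- as.matrix(dge)
typeof(m)  # "character"
dim(m)     # 2 3
```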

Answer 2

The following should work:

library(Matrix)
library("data.table")

# Read in the dge file
DGE_file<- fread(file="DGE.txt", stringsAsFactors = TRUE)
# Set the first column name to the empty string.
names(DGE_file)[1] <- ""
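Both answers keep the gene IDs as a data column. If the goal is for "GENE" to disappear downstream, a different approach (not taken by either answer) is to move the IDs into the matrix row names. A minimal sketch, assuming data.table's `as.matrix()` method with its `rownames` argument; the inline `dge_lines` is a hypothetical copy of the sample that stands in for `DGE.txt`.

```r
library(data.table)

# Hypothetical inline copy of the sample from the question;
# for the real data use fread(file = "DGE.txt")
dge_lines <- c(
  "GENE\tATGGCGAACCTACATCCC\tATGGCGAGGACTCAAAGT",
  "0610009B22Rik\t1\t0",
  "0610009E02Rik\t0\t0"
)
DGE_file <- fread(text = dge_lines)

# Use the GENE column as row names; it is dropped from the data,
# so the remaining count columns give a numeric matrix
m <- as.matrix(DGE_file, rownames = "GENE")
rownames(m)
```

If a sparse matrix is needed later, the Matrix package already loaded in the question can convert `m`, for example with `Matrix::Matrix(m, sparse = TRUE)`.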

Tags: r, fread, dataframe, rowname