Remove the first column name in a data frame from fread() in R
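I am trying to remove the first name from the column names produced by fread(). The first column name only serves as a title for the row names. Later in the workflow this "title" really messes up my data, because it is treated as one of the rows, so somehow I need it to be ignored or non-existent.
A subset of my DGE_file looks like this: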
            GENE ATGGCGAACCTACATCCC ATGGCGAGGACTCAAAGT
1: 0610009B22Rik                  1                  0
2: 0610009E02Rik                  0                  0
I tried removing the first column name like this:
library(Matrix)
library("data.table")
# Read in the dge file
DGE_file<- fread(file="DGE.txt", stringsAsFactors = TRUE)
colnames(DGE_file)<-colnames(DGE_file)[-1]
DGE_file<- as.matrix(DGE_file)
Understandably, this produces an error:
> colnames(DGE_file)<-colnames(DGE_file)[-1]
Error in setnames(x, value) :
Can't assign 10000 names to a 10001 column data.table
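I have already tried replacing it with NA, but that caused an error in downstream processing that I could not resolve.
How can I remove the title "gene", or make it "invisible", for downstream processing?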
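You can read the file without the header, skipping the first line, and then set the column names yourself. However, in my opinion, a column with an empty name or with NA as its name could be problematic.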
require(magrittr)   # for the %>% pipe
require(data.table) # for reading with fread
# Read in the dge file
# without the header, skipping the first line
DGE_file <- fread(file = "DGE.txt",
                  skip = 1,
                  header = FALSE,
                  stringsAsFactors = TRUE)
# Set the column names (empty string as the "invisible" first name)
DGE_file <- DGE_file %>%
  purrr::set_names(c("", "ATGGCGAACCTACATCCC",
                     "ATGGCGAGGACTCAAAGT"))
or
# Set the column names (NA as the first name)
DGE_file <- DGE_file %>%
  purrr::set_names(c(NA, "ATGGCGAACCTACATCCC",
                     "ATGGCGAGGACTCAAAGT"))
A base R solution for setting the names could look like this:
# Read the file with its header
DGE_file <- fread(file = "DGE.txt",
                  header = TRUE,
                  stringsAsFactors = TRUE)
# Set an "invisible" (empty) name
names(DGE_file)[1] <- ""
# Or set NA as the name
names(DGE_file)[1] <- NA
The following should work:
library(Matrix)
library("data.table")
# Read in the dge file
DGE_file<- fread(file="DGE.txt", stringsAsFactors = TRUE)
# Set the first column name to the empty string.
names(DGE_file)[1] <- ""
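Since the question goes on to call as.matrix() and describes the first column as row names, another option may be to move that column into the row names during the conversion instead of blanking its header. This is only a sketch under assumptions: the inline text is a made-up, tab-separated stand-in for DGE.txt, and DGE_dt and DGE_mat are names chosen here for illustration. It relies on the rownames argument of data.table's as.matrix() method.

```r
library(data.table)

# Hypothetical two-gene stand-in for DGE.txt (tab-separated), mirroring the excerpt in the question.
dge_text <- "GENE\tATGGCGAACCTACATCCC\tATGGCGAGGACTCAAAGT
0610009B22Rik\t1\t0
0610009E02Rik\t0\t0
"

# Read with the header kept, so the first column is simply called GENE.
DGE_dt <- fread(text = dge_text, sep = "\t", header = TRUE)

# Convert to a matrix, using the GENE column as row names.
# The GENE column is dropped from the matrix body, so only the count columns remain.
DGE_mat <- as.matrix(DGE_dt, rownames = "GENE")

print(dim(DGE_mat))
print(rownames(DGE_mat))
```

With this approach the data.table never needs an empty or NA column name, and the resulting matrix is numeric rather than character. If a sparse matrix is needed downstream (the question loads Matrix), something like Matrix::Matrix(DGE_mat, sparse = TRUE) might be a reasonable next step, though that has not been checked against the original pipeline.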