Change column in data frames in list

Question

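I have a list of 78 data frames (list_of_df) that all share the same first column, containing all annotated Ensembl transcript IDs. However, the IDs carry a version extension such as ".1" (e.g. "ENST00000448914.1" and so on), which I want to remove so that I can match them against plain ENST IDs.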

I have tried using lapply and sapply like this:

lapply(list_of_df, function(x)  
                 cbind(x,sapply(x$target_id, function(y) unlist(strsplit(y,split=".",fixed=T))[1])) )

But this takes a very long time. Does anyone know a faster way to do this?

Answer 1

We loop over the list of data.frames and use sub to remove the . followed by digits (the version number) in the first column.

lapply(list_of_df, function(x) {
  # "\\." is a literal dot and "\\d+" one or more digits (the version suffix)
  x[, 1] <- sub("\\.\\d+", "", x[, 1])
  x
})

#[[1]]
#   target_id value
#1 ENST000049    39
#2 ENST010393    42

#[[2]]
#   target_id value
#1 ENST123434   423
#2  ENST00838    23

Note: this should work even if the OP's first column is a factor, because sub coerces its input to character.
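To illustrate the note above, here is a minimal sketch with a hypothetical one-off data frame whose first column is a factor (as you would get with stringsAsFactors = TRUE). It shows that sub works on the factor labels and that the column comes back as character.

```r
# Hypothetical example: first column stored as a factor
df <- data.frame(target_id = factor(c("ENST000049.1", "ENST010393.14")),
                 value = c(39, 42))

# sub() coerces its input with as.character(), so the factor labels
# (not the integer codes) are processed; the result is a character vector
df[, 1] <- sub("\\.\\d+", "", df[, 1])

print(class(df$target_id))  # "character"
print(df$target_id)         # "ENST000049" "ENST010393"
```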

Data

list_of_df <- list(data.frame(target_id= c("ENST000049.1", 
   "ENST010393.14"), value= c(39, 42), stringsAsFactors=FALSE), 
  data.frame(target_id=c("ENST123434.42", "ENST00838.22"), 
   value= c(423, 23), stringsAsFactors=FALSE))
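The question's end goal is to match the stripped IDs against plain ENST IDs. A minimal sketch of that step, assuming a hypothetical lookup vector `plain_ids` and a hypothetical new column `in_lookup`; it anchors the pattern with `$` so that only a trailing version suffix is removed, and uses `%in%` for the match.

```r
# Data from above, repeated so this block runs on its own
list_of_df <- list(
  data.frame(target_id = c("ENST000049.1", "ENST010393.14"),
             value = c(39, 42), stringsAsFactors = FALSE),
  data.frame(target_id = c("ENST123434.42", "ENST00838.22"),
             value = c(423, 23), stringsAsFactors = FALSE)
)

# Hypothetical lookup of unversioned IDs
plain_ids <- c("ENST000049", "ENST00838")

# Strip the trailing version, then flag rows whose ID appears in plain_ids
stripped <- lapply(list_of_df, function(x) {
  x$target_id <- sub("\\.\\d+$", "", x$target_id)
  x$in_lookup <- x$target_id %in% plain_ids
  x
})

print(stripped[[1]]$in_lookup)  # TRUE FALSE
print(stripped[[2]]$in_lookup)  # FALSE TRUE
```

If you need the matching rows only, `subset(x, in_lookup)` or `merge()` against a lookup data frame are common alternatives.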

Answer 2

You can simplify your code to:

# take the first token of each split ID, then return the modified data frame
lapply(list_of_df, function(x) { x[, 1] <- sapply(strsplit(x[, 1], split = ".", fixed = TRUE), `[`, 1); x })

If your column has class factor, you can wrap x[,1] in as.character:

lapply(list_of_df, function(x) { x[, 1] <- sapply(strsplit(as.character(x[, 1]), split = ".", fixed = TRUE), `[`, 1); x })
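Note that `unlist(strsplit(...))[1]` returns only the first token of the whole flattened column, so every row would receive the same ID. To take the first token of each ID, index each list element separately. A small sketch on a hypothetical two-element vector `ids`, using `vapply` so the result is guaranteed to be character:

```r
ids <- c("ENST000049.1", "ENST010393.14")

# strsplit() returns a list with one character vector per ID
pieces <- strsplit(ids, split = ".", fixed = TRUE)

# unlist() flattens all pieces into one vector, so [1] is a single string
print(unlist(pieces)[1])  # "ENST000049"

# vapply() takes the first piece of *each* ID and checks each result is one string
first <- vapply(pieces, `[`, character(1), 1)
print(first)              # "ENST000049" "ENST010393"
```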

You can also use the stringi package:

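library(stringi)
# n = 1 with tokens_only = TRUE keeps only the first token; simplify = TRUE returns a matrix
lapply(list_of_df, function(x) { x[, 1] <- stri_split_fixed(x[, 1], ".", n = 1, tokens_only = TRUE, simplify = TRUE)[, 1]; x })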
