Combining lapply and dplyr to process nested data frames

Question

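I have a folder of data frames stored as text files that I want to process for analysis. I first read them in inside an lapply call, and then I want to manipulate their columns and sort their rows by group. So most of the time I need to combine dplyr with lapply to process my data faster. I have searched the web and looked through some books, but most examples are simple and do not cover combining these two functions.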

Here is the sample code I am working with:

files <- mixedsort(dir(pattern = "*.txt", full.names = FALSE)) # to read data

data <- lapply(files, function(x) {
  tmp <- read.table(file = x, fill = TRUE, sep = "\t", dec = ".", header = FALSE, stringsAsFactors = FALSE)
  df <- tmp[!grepl("AC", tmp$V1), ]
  new.df <- select(df, V1:V26)
  new.df <- apply(new.df, function(x) { x[11:26] <- x[11:26] / 10000; x })

I get the following error:

Error in match.fun(FUN) : argument "FUN" is missing, with no default

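Here is a reproducible example that looks like my data. Suppose I want to process the 2nd and 3rd columns of dat and group by the let column. When I try to put the fun command below inside the data code above, I get the error. Any guidance would be appreciated.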

dat <- lapply(1:3, function(x)data.frame(let=sample(letters,4),a=sort(runif(20,0,10000),decreasing=TRUE), b=sort(runif(20,0,10000),decreasing=TRUE), c=rnorm(20),d=rnorm(20)))

fun <- lapply(dat, function(x){x[2:3] <-x[2:3] /10000; x})

Answer 1

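As noted in the comments on your question, the apply call is what causes the error. Its signature is apply(X, MARGIN, FUN), so when MARGIN is omitted your function is matched to MARGIN and FUN is left missing. However, I don't think apply is what you want anyway: it coerces the data frame to a matrix and returns a matrix or vector rather than a data frame.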
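To make the cause of the error concrete, here is a minimal base R sketch. The data frame `df_demo` and its columns are made up for illustration and are not the asker's data. It reproduces the missing-FUN error and shows that plain column arithmetic keeps the data frame intact.

```r
# Hypothetical toy data frame, for illustration only
df_demo <- data.frame(
  id = c("x", "y", "z"),
  a  = c(10000, 20000, 30000),
  b  = c(5000, 15000, 25000)
)

# apply() is defined as apply(X, MARGIN, FUN, ...). With only two positional
# arguments, the function lands in MARGIN and FUN is left missing.
err <- tryCatch(
  apply(df_demo, function(x) x),
  error = function(e) conditionMessage(e)
)
print(err)

# Column-wise arithmetic needs no apply() at all, and the result is
# still a data frame.
df_demo[2:3] <- df_demo[2:3] / 10000
print(df_demo)
```

This is the same pattern as the `fun` line in the question, which works because it assigns into a column subset of the data frame directly.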

Using only dplyr syntax, your problem can be solved like this:

library(dplyr)

tmp %>%
  filter(!grepl("AC", V1)) %>%          # drop rows whose V1 contains "AC"
  select(V1:V26) %>%
  mutate_each(funs(./10000), V11:V26)   # divide by 10000, as in the question
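The same idea can be applied to every data frame in a list by wrapping the pipeline in a function and passing it to lapply. The sketch below uses the reproducible `dat` from the question. The helper name `process_one` is hypothetical, and sorting by `let` and then by descending `a` is only one assumed reading of "sort rows by group". Note that mutate_each() and funs() are deprecated in current dplyr releases, so the sketch uses across(), which requires dplyr 1.0.0 or later.

```r
library(dplyr)

set.seed(1)  # only to make the sketch repeatable

# Reproducible list of data frames, as in the question
dat <- lapply(1:3, function(i) {
  data.frame(
    let = sample(letters, 4),
    a = sort(runif(20, 0, 10000), decreasing = TRUE),
    b = sort(runif(20, 0, 10000), decreasing = TRUE),
    c = rnorm(20),
    d = rnorm(20)
  )
})

# Hypothetical helper: rescale columns a and b, then order the rows
# by group (let) and, within each group, by descending a.
process_one <- function(df) {
  df %>%
    mutate(across(c(a, b), ~ .x / 10000)) %>%
    arrange(let, desc(a))
}

# lapply() applies the dplyr pipeline to each data frame in the list
result <- lapply(dat, process_one)
str(result[[1]])
```

For the files in the question, the read.table() call could go at the top of such a helper, so that each file is read and processed in a single lapply pass.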


r

lapply

dplyr