How to keep data frame format when applying sapply in R?

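I wrote a function that splits my data frame into consecutive groups of 3 columns (each group representing the replicates of one sample) and applies another function to each group. That second function replaces all values in a row of a group with "NA" unless at least two of the three replicates in that row are above a given threshold, in this case 16.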
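To make the rule concrete, here is a minimal sketch of it applied to a single group of three replicate columns: a row is kept only if at least two of its values exceed 16, otherwise the whole row becomes NA. The helper name `mask_replicates` and its `threshold`/`min_hits` parameters are hypothetical, introduced only for illustration.

```r
# Hypothetical helper: applies the thresholding rule to ONE group of replicate columns.
# A row is kept only if at least `min_hits` of its values exceed `threshold`;
# otherwise every value in that row is set to NA.
mask_replicates <- function(block, threshold = 16, min_hits = 2) {
  hits <- rowSums(block > threshold)   # number of values above the threshold per row
  block[hits < min_hits, ] <- NA       # blank out rows that fail the rule
  block
}

# First replicate group from the example data (sample1 to sample3)
block <- data.frame(sample1 = c(2, 18, 3),
                    sample2 = c(4, 17, 16),
                    sample3 = c(3, 11, 2))
mask_replicates(block)
#   sample1 sample2 sample3
# 1      NA      NA      NA
# 2      18      17      11
# 3      NA      NA      NA
```

Note that 16 itself does not count as "above" the threshold, because the comparison is strict (`>`).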

Here is the example code:

# Install and load packages
if (!require(plyr)) install.packages('plyr')
library(plyr)
if (!require(dplyr)) install.packages('dplyr')
library(dplyr)

# Create example data frame
df <- data.frame (ID  = c('data1', 'data2', 'data3'), 
    sample1 = c(2, 18, 3),
    sample2 = c(4, 17, 16),
    sample3 = c(3, 11, 2),
    sample4 = c(22, 11, 35),
    sample5 = c(10, 8, 22),
    sample6 = c(17, 9, 11))

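# Function for threshold settings
setThreshold <- function(df) {
    # Split the sample columns (all but ID) into groups of 3 replicates
    # and process each group separately
    thresholded_replicates <- data.frame(
        sapply(split.default(df[2:ncol(df)],
                             rep(seq_along(df),
                                 each = 3,
                                 length.out = ncol(df) - 1)),
               function(df) {
                   df <- df %>%
                       # count how many replicates in each row exceed 16
                       mutate(rowsum = apply(df, 1, function(x) sum(x > 16))) %>%
                       # set the whole row to NA if fewer than two exceed 16
                       mutate_at(1:ncol(df), funs(ifelse(rowsum < 2, NA, .))) %>%
                       select(-rowsum)
                   return(df)
               }
        ))
    return(thresholded_replicates)
}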

df_th <- setThreshold(df)

The input data frame looks like this:

> df
        ID sample1 sample2 sample3 sample4 sample5 sample6
1 data1       2       4       3      22      10      17
2 data2      18      17      11      11       8       9
3 data3       3      16       2      35      22      11

The data frame after applying the function:

> df_th
                X1         X2
sample1 NA, 18, NA 22, NA, 35
sample2 NA, 17, NA 10, NA, 22
sample3 NA, 11, NA 17, NA, 11
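The layout above is a side effect of `sapply`'s simplification step: each group comes back as a data frame (a list of three equal-length columns), so `sapply` binds the results into a 3 x 2 matrix whose cells are whole column vectors, and `data.frame()` then turns that matrix into columns of lists (`X1`, `X2`). A quick way to check this is sketched below; `keep_if_two` is a hypothetical, simplified stand-in for the per-group function.

```r
df <- data.frame(ID = c("data1", "data2", "data3"),
                 sample1 = c(2, 18, 3),  sample2 = c(4, 17, 16),
                 sample3 = c(3, 11, 2),  sample4 = c(22, 11, 35),
                 sample5 = c(10, 8, 22), sample6 = c(17, 9, 11))

# Groups of three replicate columns (ID excluded), same grouping as in the question
groups <- split.default(df[-1], rep(seq_len(ncol(df) - 1), each = 3, length.out = ncol(df) - 1))

# Hypothetical stand-in for the per-group function: returns a data frame
keep_if_two <- function(x) { x[rowSums(x > 16) < 2, ] <- NA; x }

simplified <- sapply(groups, keep_if_two)
is.matrix(simplified)   # TRUE: the two data frames were bound into a matrix
typeof(simplified)      # "list": each cell holds one whole column vector
dim(simplified)         # 3 2 (3 columns per group, 2 groups)

kept <- lapply(groups, keep_if_two)
sapply(kept, is.data.frame)   # both TRUE: lapply leaves each result a data frame
```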

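The function works well: it replaces all values in a replicate row with "NA" when that row does not contain at least two values greater than 16. However, the data frame format gets scrambled. The resulting data frame should look like this: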

     sample1 sample2 sample3 sample4 sample5 sample6
1      NA      NA      NA      22      10      17
2      18      17      11      NA      NA      NA
3      NA      NA      NA      35      22      11

How can I achieve this?

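Here is a complete base R version. It uses lapply instead of sapply, so each group of three columns stays a data frame (sapply simplifies the list of data frames into a matrix of lists, which is what scrambles the output), and rowSums to set a row to NA when fewer than two of its values exceed 16. do.call(cbind, ...) then binds the groups back together: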

# Split samples into groups of 3, NA out rows with fewer than two values > 16, rebind
do.call(cbind, lapply(split.default(df[2:ncol(df)], rep(seq_along(df), each = 3, 
       length.out = ncol(df)-1)), function(x) {x[rowSums(x > 16) < 2, ] <- NA;x}))

#  1.sample1 1.sample2 1.sample3 2.sample4 2.sample5 2.sample6
#1        NA        NA        NA        22        10        17
#2        18        17        11        NA        NA        NA
#3        NA        NA        NA        35        22        11
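The column names come out as `1.sample1`, `2.sample4`, and so on because lapply returns a list named by group ("1", "2"), and cbind() prefixes each data frame's columns with its list name. If plain names as in the desired output are wanted, one option (a sketch, not the only approach) is to drop the list names with unname() before binding; the ID column can also be put back with cbind().

```r
df <- data.frame(ID = c("data1", "data2", "data3"),
                 sample1 = c(2, 18, 3),  sample2 = c(4, 17, 16),
                 sample3 = c(3, 11, 2),  sample4 = c(22, 11, 35),
                 sample5 = c(10, 8, 22), sample6 = c(17, 9, 11))

# Group the sample columns (all except ID) in blocks of three replicates
groups <- split.default(df[-1], rep(seq_len(ncol(df) - 1), each = 3, length.out = ncol(df) - 1))

# Same rule as in the answer: NA out rows with fewer than two values > 16
masked <- lapply(groups, function(x) { x[rowSums(x > 16) < 2, ] <- NA; x })

# unname() drops the "1", "2" list names, so cbind() keeps the original column names
df_th <- do.call(cbind, unname(masked))
df_th
#   sample1 sample2 sample3 sample4 sample5 sample6
# 1      NA      NA      NA      22      10      17
# 2      18      17      11      NA      NA      NA
# 3      NA      NA      NA      35      22      11

# Optionally put the ID column back in front
df_th_id <- cbind(df["ID"], df_th)
```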