Apply a custom function to each subset of a data frame and return a data frame

Question

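This has probably been asked here many times, but because my function returns a data frame, I could not relate any of the existing answers to my case.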

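I have a custom function that fits a model and outputs a data frame with the slope (coeff2) in one column, the intercept (coeff1) in another, the number of input records in a third, and so on. Ideally I build this data frame inside the function and return it. Now I want to subset my input data frame by a column and apply my function to each subset.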

Example:

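f.get_reg <- function(df) {
  # Fit DM against FW, looking the columns up in the subset passed in
  linear.model <- lm(DM ~ FW, data = df)
  N <- length(df$DM)                  # number of input records
  slope <- coef(linear.model)[2]
  intercept <- coef(linear.model)[1]
  S <- summary(linear.model)$sigma    # residual standard error
  df.out <- data.frame(N, slope, intercept, S)
  return(df.out)
}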



sample_id     FW       DM  StdDev_DM Median_DM Count  X90 X60 crit Z.scores
     6724 116.39 16.20690    0.9560414   16.0293    60 3.35 3.2  3.2        1
     6724 116.39 16.20690    0.9560414   16.0293    60 3.35 3.2  3.2        1
     6724 110.24 16.73077    0.9560414   16.0293    60 3.35 3.2  3.2        1
     6728 110.24 16.73077    0.9560414   16.0293    60 3.35 3.2  3.2        1
     6728 112.81 16.15542    0.9560414   16.0293    60 3.35 3.2  3.2        1
     6728 112.81 16.15542    0.9560414   16.0293    60 3.35 3.2  3.2        1
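For anyone who wants to reproduce this, the sample above can be entered roughly as follows. This is only a sketch: it keeps the three columns the function actually touches (sample_id, FW, DM) and leaves the others out for brevity.

```r
# Sample rows copied from the table above; the remaining columns are
# omitted here because f.get_reg only reads FW and DM.
df <- data.frame(
  sample_id = c(6724, 6724, 6724, 6728, 6728, 6728),
  FW        = c(116.39, 116.39, 110.24, 110.24, 112.81, 112.81),
  DM        = c(16.20690, 16.20690, 16.73077, 16.73077, 16.15542, 16.15542)
)

# Quick check of the structure: 6 rows, 2 distinct sample_id values
str(df)
table(df$sample_id)
```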

Now I want to apply my function to the subset for each unique sample_id and get back a single data frame with one row per subset.

Answer 1

dplyr

You can use do in dplyr:

library(dplyr)
df %>%
    group_by(sample_id) %>%
    do(f.get_reg(.))

which gives:

  sample_id     N       slope intercept            S
      (int) (int)       (dbl)     (dbl)        (dbl)
1      6724     3 -0.08518211  26.12125 7.716050e-15
2      6728     3 -0.22387160  41.41037 5.551115e-17
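In newer dplyr releases do() is marked as superseded, so a variant with group_modify() may be worth having at hand. The following is a minimal sketch rather than a definitive version: it assumes a dplyr version that provides group_modify(), rebuilds a three-column copy of the sample data, and wraps the coefficients in unname() (my addition) so the returned data frame has plain row names.

```r
library(dplyr)

# Regression helper modelled on the one in the question.
# unname() is an addition here, to drop the coefficient names.
f.get_reg <- function(df) {
  fit <- lm(DM ~ FW, data = df)
  data.frame(
    N         = nrow(df),
    slope     = unname(coef(fit)[2]),
    intercept = unname(coef(fit)[1]),
    S         = summary(fit)$sigma
  )
}

# Three-column copy of the sample data from the question
df <- data.frame(
  sample_id = c(6724, 6724, 6724, 6728, 6728, 6728),
  FW        = c(116.39, 116.39, 110.24, 110.24, 112.81, 112.81),
  DM        = c(16.20690, 16.20690, 16.73077, 16.73077, 16.15542, 16.15542)
)

# group_modify() calls the function once per group; .x is the group's
# rows without the grouping column, and the returned data frames are
# stacked with sample_id added back as the first column.
res <- df %>%
  group_by(sample_id) %>%
  group_modify(~ f.get_reg(.x)) %>%
  ungroup()

res
```

With only three points per group the fit is essentially perfect, so summary() may emit a warning about an unreliable summary; that comes from the sample data, not from the grouping.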

data.table

In data.table, use .SD:

library(data.table)

df <- data.table(df)
df[, f.get_reg(.SD), by = sample_id]

with the same result:

   sample_id N       slope intercept            S
1:      6724 3 -0.08518211  26.12125 7.716050e-15
2:      6728 3 -0.22387160  41.41037 5.551115e-17

Base R

Using by:

resultList <- by(df,df$sample_id,f.get_reg)
sample_id <- names(resultList)
result <- do.call(rbind,resultList)
result$sample_id <- sample_id
rownames(result) <- NULL

which gives:

  N       slope intercept            S sample_id
1 3 -0.08518211  26.12125 7.716050e-15      6724
2 3 -0.22387160  41.41037 5.551115e-17      6728
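The same split, apply and combine pattern can also be spelled out with split() and lapply(), which makes each step visible. This is a sketch under a few assumptions: it uses a three-column copy of the sample data, a version of the helper with unname() added around the coefficients, and it places sample_id first instead of last.

```r
# Regression helper modelled on the one in the question.
# unname() is an addition here, to drop the coefficient names.
f.get_reg <- function(df) {
  fit <- lm(DM ~ FW, data = df)
  data.frame(
    N         = nrow(df),
    slope     = unname(coef(fit)[2]),
    intercept = unname(coef(fit)[1]),
    S         = summary(fit)$sigma
  )
}

# Three-column copy of the sample data from the question
df <- data.frame(
  sample_id = c(6724, 6724, 6724, 6728, 6728, 6728),
  FW        = c(116.39, 116.39, 110.24, 110.24, 112.81, 112.81),
  DM        = c(16.20690, 16.20690, 16.73077, 16.73077, 16.15542, 16.15542)
)

# 1. Split into a named list with one data frame per sample_id
groups <- split(df, df$sample_id)

# 2. Apply the function to each piece and attach the group label.
#    The label comes from the list names, so it is a character value,
#    just as with names(resultList) in the by() version.
pieces <- lapply(names(groups), function(id) {
  data.frame(sample_id = id, f.get_reg(groups[[id]]))
})

# 3. Combine the one-row results into a single data frame
result <- do.call(rbind, pieces)
rownames(result) <- NULL

result
```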
