组合多个汇总调用 dplyr

combinining multiple summarize calls dplyr

给定 df

ww <- data.frame(
    GM = c("A", "A", "A", "A", "A", "A",
            "B", "B", "B", "B", "B", "B",
            "C", "C", "C", "C", "C", "C"),
    stanza = rep(c("Past", "Mid", "End"), 6),   
    change = c(1, 1.1, 1.4, 1, 1.3, 1.5, 1, 1.2, 1.4,
               1.1, 1.2, 1.3, .9, 1.2, 1.3, .9, 1.3, 1.5))

我想计算每个 GM 的过去的平均值,并将 'change' 中的每个值除以 GM 的特定平均值。我可以用两个 dplyr 调用和一个连接函数来做到这一点,如下所示:

past <- ww %>%
    group_by(GM) %>%
    filter(stanza == "Past") %>%
    summarize(past.mean = mean(change))

ww <- left_join(ww, past, by = "GM")

ww %>%
    group_by(GM, stanza) %>%
    summarize(pr.change = change/past.mean)

但必须有一种方法可以在一个 dplyr 调用中完成此操作。

无需加入,直接在一个管链中计算即可:

ww %>% 
  group_by(GM) %>% 
  mutate(pr.change = change / mean(change[stanza == "Past"])) %>%
  ungroup()

输出

   GM    stanza change pr.change
   <chr> <chr>   <dbl>     <dbl>
 1 A     Past      1       1    
 2 A     Mid       1.1     1.1  
 3 A     End       1.4     1.4  
 4 A     Past      1       1    
 5 A     Mid       1.3     1.3  
 6 A     End       1.5     1.5  
 7 B     Past      1       0.952
 8 B     Mid       1.2     1.14 
 9 B     End       1.4     1.33 
10 B     Past      1.1     1.05 
11 B     Mid       1.2     1.14 
12 B     End       1.3     1.24 
13 C     Past      0.9     1    
14 C     Mid       1.2     1.33 
15 C     End       1.3     1.44 
16 C     Past      0.9     1    
17 C     Mid       1.3     1.44 
18 C     End       1.5     1.67 

使用base R

transform(ww, pr.change = change/ave(replace(change,
    stanza != 'Past', NA), GM, FUN = function(x) mean(x, na.rm = TRUE)))

-输出

 GM stanza change pr.change
1   A   Past    1.0  1.000000
2   A    Mid    1.1  1.100000
3   A    End    1.4  1.400000
4   A   Past    1.0  1.000000
5   A    Mid    1.3  1.300000
6   A    End    1.5  1.500000
7   B   Past    1.0  0.952381
8   B    Mid    1.2  1.142857
9   B    End    1.4  1.333333
10  B   Past    1.1  1.047619
11  B    Mid    1.2  1.142857
12  B    End    1.3  1.238095
13  C   Past    0.9  1.000000
14  C    Mid    1.2  1.333333
15  C    End    1.3  1.444444
16  C   Past    0.9  1.000000
17  C    Mid    1.3  1.444444
18  C    End    1.5  1.666667

一个data.table解决方案:

library(data.table)
setDT(ww)
ww[, pr.change := change / mean(change[stanza == "Past"]), GM]
    GM stanza change pr.change
 1:  A   Past    1.0  1.000000
 2:  A    Mid    1.1  1.100000
 3:  A    End    1.4  1.400000
 4:  A   Past    1.0  1.000000
 5:  A    Mid    1.3  1.300000
 6:  A    End    1.5  1.500000
 7:  B   Past    1.0  0.952381
 8:  B    Mid    1.2  1.142857
 9:  B    End    1.4  1.333333
10:  B   Past    1.1  1.047619
11:  B    Mid    1.2  1.142857
12:  B    End    1.3  1.238095
13:  C   Past    0.9  1.000000
14:  C    Mid    1.2  1.333333
15:  C    End    1.3  1.444444
16:  C   Past    0.9  1.000000
17:  C    Mid    1.3  1.444444
18:  C    End    1.5  1.666667