dplyr divide column by a mean of some of its elements specified in another column

Question

I have a data frame:

df1 <- data.frame(site = c(rep("a", 6), rep("b", 6), rep("c", 6))
                  ,intensity = c(25, 26, 27, 28, 29, 20, 21, 22, 23, 22, 21, 19, 24, 31, 32, 33, 33, 35)
                  ,category = rep(c("up", "down", "nochange"), times = 6)
                  )

It looks like this:

     site intensity category  
[1]     a        25       up
[2]     a        26     down
[3]     a        27 nochange
[4]     a        28       up 
[5]     a        29     down 
[6]     a        20 nochange 
[7]     b        21       up 
[8]     b        22     down 
[9]     b        23 nochange 
[10]    b        22       up
[11]    b        21     down 
[12]    b        19 nochange 
[13]    c        24       up 
[14]    c        31     down 
[15]    c        32 nochange 
[16]    c        33       up 
[17]    c        33     down 
[18]    c        35 nochange

For each site, I want to calculate mean(intensity), but only for one category, nochange. Then I want to subtract this mean from all intensity values of that site. So, step by step, it would be:

1. group_by(site)
2. for category == "nochange" only, calculate mean(intensity)
3. subtract the mean(intensity) from point 2 from intensity (in all categories)

So, for my example df1, I would have 3 means: site a, mean = 23.5; site b, mean = 21; site c, mean = 33.5
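As a quick sanity check, the three per-site means quoted above can be reproduced in base R. This is only a minimal sketch: the helper names `nochange` and `site_means` are made up for illustration and are not part of the original question.

```r
# Rebuild the example data from the question
df1 <- data.frame(
  site = c(rep("a", 6), rep("b", 6), rep("c", 6)),
  intensity = c(25, 26, 27, 28, 29, 20, 21, 22, 23, 22, 21, 19, 24, 31, 32, 33, 33, 35),
  category = rep(c("up", "down", "nochange"), times = 6)
)

# Keep only the "nochange" rows (hypothetical helper name)
nochange <- df1[df1$category == "nochange", ]

# Average intensity within each site (hypothetical helper name)
site_means <- tapply(nochange$intensity, nochange$site, mean)
site_means
```

Each site has exactly two nochange rows here (27 and 20 for a, 23 and 19 for b, 32 and 35 for c), which is where 23.5, 21 and 33.5 come from.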

My output df_out would look like this:

     site   intensity category  
[1]     a         1.5       up
[2]     a         2.5     down
[3]     a         3.5 nochange
[4]     a         4.5       up 
[5]     a         5.5     down 
[6]     a        -3.5 nochange 
[7]     b         0.0       up 
[8]     b         1.0     down 
[9]     b         2.0 nochange 
[10]    b         1.0       up
[11]    b         0.0     down 
[12]    b        -2.0 nochange 
[13]    c        -9.5       up 
[14]    c        -2.5     down 
[15]    c        -1.5 nochange 
[16]    c        -0.5       up 
[17]    c        -0.5     down 
[18]    c         1.5 nochange

Any help is appreciated.

Answer 1

After grouping by 'site', subset intensity with a logical expression on 'category', take the mean, and subtract that value from the original 'intensity'

library(dplyr)
df1 %>%
   group_by(site) %>%
   # subtract each site's mean "nochange" intensity from all of its rows
   mutate(intensity = intensity - mean(intensity[category == "nochange"])) %>%
   ungroup()

-output

# A tibble: 18 × 3
   site  intensity category
   <chr>     <dbl> <chr>   
 1 a           1.5 up      
 2 a           2.5 down    
 3 a           3.5 nochange
 4 a           4.5 up      
 5 a           5.5 down    
 6 a          -3.5 nochange
 7 b           0   up      
 8 b           1   down    
 9 b           2   nochange
10 b           1   up      
11 b           0   down    
12 b          -2   nochange
13 c          -9.5 up      
14 c          -2.5 down    
15 c          -1.5 nochange
16 c          -0.5 up      
17 c          -0.5 down    
18 c           1.5 nochange
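For readers who prefer to see the intermediate per-site means, the same result can probably also be reached in separate steps with a summarise and a join. This is a sketch of an alternative, not the approach used in the answer above; the names `ref_means` and `ref_mean` are hypothetical and chosen only for illustration.

```r
library(dplyr)

# Example data from the question
df1 <- data.frame(
  site = c(rep("a", 6), rep("b", 6), rep("c", 6)),
  intensity = c(25, 26, 27, 28, 29, 20, 21, 22, 23, 22, 21, 19, 24, 31, 32, 33, 33, 35),
  category = rep(c("up", "down", "nochange"), times = 6)
)

# One row per site holding the mean of its "nochange" intensities
# (ref_means / ref_mean are illustrative names, not from the original post)
ref_means <- df1 %>%
  filter(category == "nochange") %>%
  group_by(site) %>%
  summarise(ref_mean = mean(intensity))

# Attach each site's reference mean to every row, subtract it, drop the helper column
df_out <- df1 %>%
  left_join(ref_means, by = "site") %>%
  mutate(intensity = intensity - ref_mean) %>%
  select(-ref_mean)

df_out
```

The two versions differ slightly when a site has no nochange rows: the grouped mutate yields NaN for that site (the mean of an empty vector), while the join yields NA. If division rather than subtraction is actually wanted, as the title suggests, replacing `-` with `/` in the mutate call would be the analogous change.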

r

dplyr