dplyr divide column by a mean of some of its elements specified in another column
I have a data frame:
df1 <- data.frame(
  site = c(rep("a", 6), rep("b", 6), rep("c", 6)),
  intensity = c(25, 26, 27, 28, 29, 20, 21, 22, 23, 22, 21, 19, 24, 31, 32, 33, 33, 35),
  category = rep(c("up", "down", "nochange"), times = 6)
)
It looks like this:
site intensity category
[1] a 25 up
[2] a 26 down
[3] a 27 nochange
[4] a 28 up
[5] a 29 down
[6] a 20 nochange
[7] b 21 up
[8] b 22 down
[9] b 23 nochange
[10] b 22 up
[11] b 21 down
[12] b 19 nochange
[13] c 24 up
[14] c 31 down
[15] c 32 nochange
[16] c 33 up
[17] c 33 down
[18] c 35 nochange
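For each site, I want to compute mean(intensity), but only for the category nochange, and then subtract that mean from all intensity values of the site. So, step by step, it would be:
1. group_by(site)
2. compute mean(intensity) only for category == "nochange"
3. subtract the mean(intensity) computed in step 2 from intensity (across all categories)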
So for my example df1 I would have 3 means: site a, mean = 23.5; site b, mean = 21; site c, mean = 33.5.
My output df_out would look like this:
site intensity category
[1] a 1.5 up
[2] a 2.5 down
[3] a 3.5 nochange
[4] a 4.5 up
[5] a 5.5 down
[6] a -3.5 nochange
[7] b 0.0 up
[8] b 1.0 down
[9] b 2.0 nochange
[10] b 1.0 up
[11] b 0.0 down
[12] b -2.0 nochange
[13] c -9.5 up
[14] c -2.5 down
[15] c -1.5 nochange
[16] c -0.5 up
[17] c -0.5 down
[18] c 1.5 nochange
Any help is appreciated.
After grouping by 'site', subset intensity with a logical expression on 'category', take the mean of that subset, and subtract it from the original 'intensity':
library(dplyr)

df1 %>%
  group_by(site) %>%
  # subtract each site's mean "nochange" intensity from every row of that site
  mutate(intensity = intensity - mean(intensity[category == "nochange"])) %>%
  ungroup()
-output
# A tibble: 18 × 3
site intensity category
<chr> <dbl> <chr>
1 a 1.5 up
2 a 2.5 down
3 a 3.5 nochange
4 a 4.5 up
5 a 5.5 down
6 a -3.5 nochange
7 b 0 up
8 b 1 down
9 b 2 nochange
10 b 1 up
11 b 0 down
12 b -2 nochange
13 c -9.5 up
14 c -2.5 down
15 c -1.5 nochange
16 c -0.5 up
17 c -0.5 down
18 c 1.5 nochange
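To make the per-site baseline visible, the same calculation can be split into two steps: summarise the "nochange" means first, then join them back. This is only a sketch of an equivalent approach, not part of the original answer. The names baseline, nochange_mean, centered, ratio and df_check are made up for illustration, and the ratio column is included only because the question title mentions dividing.

```r
library(dplyr)

df1 <- data.frame(
  site = c(rep("a", 6), rep("b", 6), rep("c", 6)),
  intensity = c(25, 26, 27, 28, 29, 20, 21, 22, 23, 22, 21, 19, 24, 31, 32, 33, 33, 35),
  category = rep(c("up", "down", "nochange"), times = 6)
)

# Per-site baseline: mean intensity of the "nochange" rows only.
# (baseline and nochange_mean are hypothetical names.)
baseline <- df1 %>%
  filter(category == "nochange") %>%
  group_by(site) %>%
  summarise(nochange_mean = mean(intensity), .groups = "drop")

# Join the baseline back to every row and derive both variants:
# centered = intensity minus the baseline (what the expected output shows),
# ratio    = intensity divided by the baseline (what the title describes).
df_check <- df1 %>%
  left_join(baseline, by = "site") %>%
  mutate(
    centered = intensity - nochange_mean,
    ratio = intensity / nochange_mean
  )

baseline$nochange_mean
# 23.5 21.0 33.5
```

With this layout, a site that has no "nochange" rows would get NA from the join, whereas the one-step mutate() would give NaN, because mean() of an empty vector is NaN. Either way it is probably worth checking for such sites before relying on the result.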