Standard deviation of average events per ID in R
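Background
I have this dataset, d:
d <- data.frame(ID = c("a","a","a","a","a","a","b","b"),
                event = c("G12","R2","O99","B4","B4","A24","L5","J15"),
                stringsAsFactors=FALSE)
It contains 2 people (IDs), and each of them has some events.
Problem
I'm trying to get the mean number of events (count) per person, along with the standard deviation of that mean, all in one result (it can be a data frame or not, it doesn't matter).
In other words, I'm looking for something like this: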
| Mean | SD |
|------|------|
| 4.00 | 2.83 |
What I've tried
I think I'm close, except that I have 2 separate pieces of code to do these calculations. Here's the mean:
d %>%
  group_by(ID) %>%
  summarise(event = length(event)) %>%
  summarise(ratio = mean(event))
# A tibble: 1 x 1
ratio
<dbl>
1 4
And here's the SD:
d %>%
  group_by(ID) %>%
  summarise(event = length(event)) %>%
  summarise(sd = sd(event))
# A tibble: 1 x 1
sd
<dbl>
1 2.83
But when I try to pipe them together like this...
d %>%
  group_by(ID) %>%
  summarise(event = length(event)) %>%
  summarise(ratio = mean(event)) %>%
  summarise(sd = sd(event))
...I get an error:
Error in `h()`:
! Problem with `summarise()` column `sd`.
i `sd = sd(event)`.
x object 'event' not found
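Any insights?
You have to put the last two calls to summarise() in the same call. After summarise(), the only columns left are the ones you named and the grouping columns, so after your second summarise() the event column no longer exists.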
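To see why the column disappears, it can help to split the pipeline into steps and look at the column names after each one. This is only an illustrative sketch: the intermediate names per_id and overall are made up here and do not appear in the original code.

```r
library(dplyr)

d <- data.frame(ID = c("a","a","a","a","a","a","b","b"),
                event = c("G12","R2","O99","B4","B4","A24","L5","J15"),
                stringsAsFactors = FALSE)

# Step 1: grouped summarise keeps the grouping column plus the column it creates
per_id <- d %>%
  group_by(ID) %>%
  summarise(event = length(event))
names(per_id)
# "ID" "event"

# Step 2: ungrouped summarise keeps only the column it creates,
# so `event` is gone and a later sd(event) has nothing to refer to
overall <- per_id %>%
  summarise(ratio = mean(event))
names(overall)
# "ratio"
```

Computing both statistics inside one summarise() call, as in the code that follows, avoids the problem because event is still available when sd() is evaluated.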
library(dplyr)
d <- data.frame(ID = c("a","a","a","a","a","a","b","b"),
                event = c("G12","R2","O99","B4","B4","A24","L5","J15"),
                stringsAsFactors=FALSE)
d %>%
  group_by(ID) %>%
  # the next summarise will be within ID
  summarise(event = length(event)) %>%
  # this summarise is overall
  summarise(sd = sd(event),
            ratio = mean(event))
#> # A tibble: 1 × 2
#> sd ratio
#> <dbl> <dbl>
#> 1 2.83 4
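The code is a bit confusing because you're renaming the event variable and doing the first summarise() within groups and the second without grouping. This code is easier to read and gives the same result:
d %>%
  count(ID) %>%
  summarise(sd = sd(n),
            ratio = mean(n))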
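If dplyr is not available, or if you want the Mean and SD column names from the question, the same numbers can be obtained in base R. This is a minimal sketch rather than part of the original answer; the names counts and result are chosen here for illustration.

```r
d <- data.frame(ID = c("a","a","a","a","a","a","b","b"),
                event = c("G12","R2","O99","B4","B4","A24","L5","J15"),
                stringsAsFactors = FALSE)

# Number of events per ID: "a" has 6 rows, "b" has 2
counts <- as.vector(table(d$ID))

# One-row data frame holding both statistics, named as in the question
result <- data.frame(Mean = mean(counts), SD = sd(counts))
result
```

With counts of 6 and 2, the mean is (6 + 2) / 2 = 4 and the sample standard deviation is sqrt(((6 - 4)^2 + (2 - 4)^2) / (2 - 1)) = sqrt(8), which rounds to 2.83, matching the table in the question.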
Created on 2022-05-25 by the reprex package (v2.0.1)