Standard deviation of average events per ID in R

Background

I have this dataset d:

d <- data.frame(ID = c("a","a","a","a","a","a","b","b"),
                event = c("G12","R2","O99","B4","B4","A24","L5","J15"),
                stringsAsFactors=FALSE)

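It contains 2 people (IDs), each of whom has a number of events.

Problem

I'm trying to get the average number of events (count) per person, along with the standard deviation of those per-person counts, all in a single result (it can be a data frame or not, it doesn't matter).

In other words, I'm looking for something like this: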

| Mean |  SD  |
|------|------|
| 4.00 | 2.83 |

What I've tried

I think I'm not far off, except that I have 2 separate pieces of code to do these calculations. Here is the mean:

library(dplyr)

d %>%
  group_by(ID) %>%
  summarise(event = length(event)) %>%
  summarise(ratio = mean(event))

# A tibble: 1 x 1
  ratio
  <dbl>
1     4

And here is the SD:

d %>%
  group_by(ID) %>%
  summarise(event = length(event)) %>%  
  summarise(sd = sd(event))

# A tibble: 1 x 1
     sd
  <dbl>
1  2.83

But when I try to pipe them together like this...

d %>%
  group_by(ID) %>%
  summarise(event = length(event)) %>%
  summarise(ratio = mean(event)) %>%
  summarise(sd = sd(event))

...I get an error:

Error in `h()`:
! Problem with `summarise()` column `sd`.
i `sd = sd(event)`.
x object 'event' not found

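Any insights?

You have to put the last two calls to summarise() into a single call. After summarise(), the only columns left are the ones you named plus any remaining grouping columns, so once summarise(ratio = mean(event)) has run, the event column no longer exists and sd(event) cannot find it.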

library(dplyr)

d <- data.frame(ID = c("a","a","a","a","a","a","b","b"),
                event = c("G12","R2","O99","B4","B4","A24","L5","J15"),
                stringsAsFactors=FALSE)

d %>%
  group_by(ID) %>%
  # the next summarise will be within ID
  summarise(event = length(event)) %>% 
  # this summarise is overall
  summarise(sd = sd(event),
            ratio = mean(event))

#> # A tibble: 1 × 2
#>      sd ratio
#>   <dbl> <dbl>
#> 1  2.83     4
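To see why the original chain fails, it can help to look at the column names after each step. The following is a small sketch of that check; the intermediate names per_id and overall are introduced here for illustration only and are not part of the original code.

```r
library(dplyr)

d <- data.frame(ID = c("a","a","a","a","a","a","b","b"),
                event = c("G12","R2","O99","B4","B4","A24","L5","J15"),
                stringsAsFactors = FALSE)

# Step 1: one row per ID; `event` now holds the number of events for that ID
per_id <- d %>%
  group_by(ID) %>%
  summarise(event = length(event))
names(per_id)
#> [1] "ID"    "event"

# Step 2: collapses to a single row; only `ratio` is left,
# so a further summarise(sd = sd(event)) has no `event` to work with
overall <- per_id %>%
  summarise(ratio = mean(event))
names(overall)
#> [1] "ratio"
```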

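The code is a little confusing because you reuse the name event for the per-ID count, and because the first summarise() runs within groups while the second runs on the ungrouped result. This version is easier to read and gives the same result: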

d %>%
  count(ID) %>% 
  summarise(sd = sd(n),
            ratio = mean(n))

Created on 2022-05-25 by the reprex package (v2.0.1)
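As a cross-check of where 4 and 2.83 come from, the same numbers can be reproduced in base R without dplyr. This is only a sketch; the variable name counts is an assumption made for the example. With 6 events for "a" and 2 for "b", the mean is (6 + 2) / 2 = 4 and the sample standard deviation is sqrt(((6 - 4)^2 + (2 - 4)^2) / (2 - 1)) = sqrt(8).

```r
d <- data.frame(ID = c("a","a","a","a","a","a","b","b"),
                event = c("G12","R2","O99","B4","B4","A24","L5","J15"),
                stringsAsFactors = FALSE)

# Number of events per ID: 6 for "a", 2 for "b"
counts <- as.vector(table(d$ID))

mean(counts)
#> [1] 4

# sd() uses the sample formula (divides by n - 1)
sd(counts)
#> [1] 2.828427

# The same value computed by hand
sqrt(sum((counts - mean(counts))^2) / (length(counts) - 1))
#> [1] 2.828427
```

Note that with only two IDs the standard deviation rests on a single degree of freedom, so it is best read as a description of this toy dataset.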