In R, what's the average number of distinct events per ID in a dataframe?

Background

Here is an R data frame d:

d <- data.frame(ID = c("a","a","a","a","a","a","b","b"),
                event = c("G12","R2","O99","B4","B4","A24","L5","J15"),
                stringsAsFactors=FALSE)

It looks like this:

  ID event
1  a   G12
2  a    R2
3  a   O99
4  a    B4
5  a    B4
6  a   A24
7  b    L5
8  b   J15

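As you can see, the IDs represent 2 people, each with more than one event. ID = a has 6 events but only 5 distinct ones, while ID = b has 2 events, both distinct.

Question

I want to compute the average number of distinct events per person in d. In this case the arithmetic is:

(5 distinct events + 2 distinct events) / 2 distinct IDs = 3.5 distinct events per person, which is the answer I'm looking for.

What I've tried

So far I've tried something like this: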

d %>%
  group_by(ID) %>%
  summarise(mean = mean(tally(unique(event))))

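But this throws an error.

n_distinct() gives you the count of distinct events. You can compute it for each ID and then take the mean of those counts.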

library(dplyr)

d %>%
  group_by(ID) %>%
  summarise(distinct_event = n_distinct(event)) %>%
  summarise(ratio = mean(distinct_event))

#  ratio
#  <dbl>
#1   3.5
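
As a possible alternative, the same figure can be obtained in a single summarise() call by counting distinct (ID, event) pairs and dividing by the number of distinct IDs. This is only a sketch: it assumes n_distinct() is passed both columns so that it counts unique combinations, and the names res and ratio are just illustrative.

```r
library(dplyr)

d <- data.frame(ID = c("a","a","a","a","a","a","b","b"),
                event = c("G12","R2","O99","B4","B4","A24","L5","J15"),
                stringsAsFactors = FALSE)

# Distinct (ID, event) pairs divided by distinct IDs.
# This equals the mean of the per-ID distinct counts, because the
# per-ID counts sum to the number of distinct (ID, event) pairs.
res <- d %>%
  summarise(ratio = n_distinct(ID, event) / n_distinct(ID))

res$ratio
```

The two-step version above is easier to read and keeps the per-ID counts available for inspection. This variant only saves the group_by() step.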

data.table

library(data.table)
library(magrittr)
df <- data.frame(ID = c("a","a","a","a","a","a","b","b"),
                event = c("G12","R2","O99","B4","B4","A24","L5","J15"),
                stringsAsFactors=FALSE)

setDT(df)[, list(uniqueN(event)), by = ID] %>% 
  .[, list(ratio = mean(V1))]
#>    ratio
#> 1:   3.5

Created on 2021-10-01 by the reprex package (v2.0.1)

We can do this in base R:

mean(aggregate(event ~ ID, d, FUN = function(x) length(unique(x)))$event)

Output:

[1] 3.5
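
If the per-ID counts are also of interest, a tapply() version might be a little easier to inspect than the aggregate() one-liner. This is a minimal sketch using the same d as in the question. The names per_id and ratio are illustrative and not part of the original answer.

```r
d <- data.frame(ID = c("a","a","a","a","a","a","b","b"),
                event = c("G12","R2","O99","B4","B4","A24","L5","J15"),
                stringsAsFactors = FALSE)

# Number of distinct events for each ID, returned as a named vector
per_id <- tapply(d$event, d$ID, function(x) length(unique(x)))

# Average of the per-ID distinct counts
ratio <- mean(per_id)
ratio
```

Here per_id holds one count per ID (5 for a, 2 for b), so the intermediate step can be checked before averaging.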