In R, what's the average number of distinct events per ID in a dataframe?
Background
Here is an R data frame d:
d <- data.frame(ID = c("a","a","a","a","a","a","b","b"),
event = c("G12","R2","O99","B4","B4","A24","L5","J15"),
stringsAsFactors=FALSE)
It looks like this:
  ID event
1  a   G12
2  a    R2
3  a   O99
4  a    B4
5  a    B4
6  a   A24
7  b    L5
8  b   J15
As you can see, ID represents 2 individuals, each with more than one event. ID = a has 6 events but only 5 distinct ones, while ID = b has 2 events, both distinct.
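Problem
I want to compute the average number of distinct event values per individual in d. In this case the arithmetic is:
(5 distinct events + 2 distinct events) / 2 distinct IDs = 3.5 distinct events per individual, which is the answer I'm looking for.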
What I've tried
So far I have tried something like this:
d %>%
group_by(ID) %>%
summarise(mean = mean(tally(unique(event))))
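But this throws an error.
n_distinct() gives you the count of distinct events. Compute it for each ID, then average those counts (the column named ratio below).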
library(dplyr)
d %>%
group_by(ID) %>%
summarise(distinct_event = n_distinct(event)) %>%
summarise(ratio = mean(distinct_event))
# ratio
# <dbl>
#1 3.5
With data.table:
library(data.table)
library(magrittr)
df <- data.frame(ID = c("a","a","a","a","a","a","b","b"),
event = c("G12","R2","O99","B4","B4","A24","L5","J15"),
stringsAsFactors=FALSE)
setDT(df)[, list(uniqueN(event)), by = ID] %>%
.[, list(ratio = mean(V1))]
#> ratio
#> 1: 3.5
Created on 2021-10-01 by the reprex package (v2.0.1)
We can do this in base R:
mean(aggregate(event ~ ID, d, FUN = function(x) length(unique(x)))$event)
Output:
[1] 3.5
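For comparison, here is a minimal base R sketch of the same arithmetic from the question, (5 + 2) / 2, done two ways. The names per_id, avg_tapply and avg_pairs are illustrative and do not come from the answers above. The second form assumes that counting distinct (ID, event) pairs and dividing by the number of distinct IDs is what you want, which matches the per-ID mean because every ID appears in d at least once.

```r
# Same toy data as in the question.
d <- data.frame(ID = c("a","a","a","a","a","a","b","b"),
                event = c("G12","R2","O99","B4","B4","A24","L5","J15"),
                stringsAsFactors = FALSE)

# Distinct events per ID: a named vector with one entry per ID.
per_id <- tapply(d$event, d$ID, function(x) length(unique(x)))

# Average of the per-ID distinct counts.
avg_tapply <- mean(per_id)

# Equivalent shortcut: distinct (ID, event) pairs divided by distinct IDs.
avg_pairs <- nrow(unique(d[, c("ID", "event")])) / length(unique(d$ID))
```

Printing per_id is a quick way to check the intermediate counts for each ID before averaging them.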