Counting occurrences and occurrences which do not appear

Question

I have a data frame that looks like this:

head(df)

id    id_child
1       1
1       2
1       3
2       1
4       1
4       2
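
For anyone who wants to reproduce this, the sample above could be rebuilt as follows. This is only a sketch based on the six rows shown; the real data has ids running up to 10628.

```r
# Assumed reconstruction of the six sample rows shown by head(df)
df <- data.frame(
  id       = c(1, 1, 1, 2, 4, 4),
  id_child = c(1, 2, 3, 1, 1, 2)
)

# id 3 never appears, which is the case the question asks about
print(df)
```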

I would like to create a variable that counts the number of children for each parent. So I would like something like this:

head(nb_children)

id    id_child      
1       3
2       1
3       0
4       2

If possible, I would like person 3 to be shown with 0 children, even though she does not appear in the first data frame.

Note: the ids are consecutive; in the real data they run from 1 to 10628.

Any suggestions? I think I have to use the split() function, but I don't really know how to use it.
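
As a rough sketch of how split() could be applied here (not necessarily the best approach): split the child ids by a factor version of id whose levels cover every id in the range, then take the length of each piece. The data frame below is an assumed reconstruction of the sample rows, and the names id_f and counts are introduced only for illustration.

```r
# Assumed reconstruction of the sample rows from the question
df <- data.frame(id = c(1, 1, 1, 2, 4, 4), id_child = c(1, 2, 3, 1, 1, 2))

# A factor with one level per id in the range keeps absent ids (here 3)
id_f <- factor(df$id, levels = min(df$id):max(df$id))

# split() returns one (possibly empty) vector of children per level;
# lengths() then counts the elements of each vector
counts <- lengths(split(df$id_child, id_f))

data.frame(id = as.integer(names(counts)), id_child = as.vector(counts))
```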

Answer 1

One dplyr option could be:

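library(dplyr)
# Make id a factor covering every value from min to max, and keep empty
# groups (.drop = FALSE) so that ids with no rows get a count of 0
df %>%
 group_by(id = factor(id, levels = min(id):max(id)), .drop = FALSE) %>%
 summarise(id_child = n_distinct(id_child))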

  id    id_child
  <fct>    <int>
1 1            3
2 2            1
3 3            0
4 4            2
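
Note that n_distinct(id_child) counts distinct children, whereas n() counts rows; the two differ only when a parent/child pair is repeated. A small sketch of the difference, using made-up data with one duplicated row (the data and the column names n_rows and n_children are assumptions for illustration, not part of the question):

```r
library(dplyr)

# Hypothetical data: the pair (id = 4, id_child = 2) appears twice
df_dup <- data.frame(
  id       = c(1, 1, 1, 2, 4, 4, 4),
  id_child = c(1, 2, 3, 1, 1, 2, 2)
)

res <- df_dup %>%
  group_by(id = factor(id, levels = min(id):max(id)), .drop = FALSE) %>%
  summarise(
    n_rows     = n(),                  # counts every row in the group
    n_children = n_distinct(id_child)  # counts each child once
  )

print(res)
```

If duplicated rows cannot occur in the data, either function should give the same counts.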

Answer 2

Convert id to a factor, with levels running from the minimum id to the maximum.

df$id <- factor(df$id, levels = min(df$id):max(df$id))

Then you can use table in base R:

stack(table(df$id))[2:1]

or count in dplyr:

library(dplyr)
df %>% count(id, .drop = FALSE)

#  id n
#1  1 3
#2  2 1
#3  3 0
#4  4 2
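
To see why the factor conversion in this answer matters, here is a minimal base R sketch contrasting table() on the raw ids with table() on the factor. The vector x is an assumed copy of the sample id column.

```r
# Assumed copy of the id column from the sample data
x <- c(1, 1, 1, 2, 4, 4)

# Without explicit levels, table() only knows the values it has seen,
# so id 3 is missing from the result
plain <- table(x)

# With levels covering the whole range, id 3 is kept with a count of 0
with_levels <- table(factor(x, levels = min(x):max(x)))

print(plain)
print(with_levels)
```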

Answer 3

Here is a solution using table:

table(factor(df[[1]], levels = Reduce(':', range(df[[1]]))))
#1 2 3 4 
#3 1 0 2

In data.frame format:

tbl <- table(id = factor(df[[1]], levels = Reduce(':', range(df[[1]]))))
as.data.frame(tbl)
#  id Freq
#1  1    3
#2  2    1
#3  3    0
#4  4    2

Answer 4

Here is a base R solution:

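 # Preallocate one count per id, from 1 to the largest id
 n_ids <- max(df$id)
 id1 <- integer(n_ids)

 # Count the rows belonging to each id; absent ids stay at 0
 for (i in seq_len(n_ids)) {
   id1[i] <- sum(df$id == i)
 }

 df1 <- data.frame(id = seq_len(n_ids), nchild = id1)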
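
The same counts can also be obtained without an explicit loop. A minimal sketch using base R's tabulate(), assuming (as the question states) that the ids are positive integers starting at 1; the data frame is an assumed reconstruction of the sample rows, and df2 is a name introduced here for illustration:

```r
# Assumed reconstruction of the sample rows from the question
df <- data.frame(id = c(1, 1, 1, 2, 4, 4), id_child = c(1, 2, 3, 1, 1, 2))

# tabulate() counts how often each integer 1..nbins occurs;
# integers that never occur (here 3) get a count of 0
n_ids <- max(df$id)
df2 <- data.frame(id = seq_len(n_ids), nchild = tabulate(df$id, nbins = n_ids))

print(df2)
```

On data with around ten thousand ids this avoids scanning the whole id column once per id, which the loop does.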


r

counting