Evaluation Error : Need at least one column for 'n_distinct()'

Question

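I am using the R programming language. I have a data frame (my_file) with two columns: my_date (e.g. 2000-01-15, stored as a factor) and blood_type (also a factor). I am trying to use the dplyr library to produce distinct counts by group (by month).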

I figured out how to do a non-distinct count:

library(dplyr)

new_file <- my_file %>%
  mutate(date = as.Date(my_date)) %>%
  group_by(blood_type, month = format(date, "%Y-%m")) %>%
  summarise(count = n())

But this does not work for distinct counts:

new_file <- my_file %>%
  mutate(date = as.Date(my_date)) %>%
  group_by(blood_type, month = format(date, "%Y-%m")) %>%
  summarise(count = n_distinct())

Evaluation Error : Need at least one column for 'n_distinct()'

I tried referencing the column explicitly, but this produces an empty file:

new_file <- my_file %>%
  mutate(date = as.Date(my_date)) %>%
  group_by(blood_type, month = format(date, "%Y-%m")) %>%
  summarise(count = n_distinct(my_file$blood_type))

Can someone tell me what I am doing wrong?

Thanks

Answer 1

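If you want to count the distinct blood_type values in each month, do not include blood_type in group_by: within a blood_type group there is only one distinct blood type, so the count would always be 1. Also pass the bare column name to n_distinct() rather than my_file$blood_type, which refers to the whole column and ignores the grouping. Try: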

library(dplyr)

new_file <- my_file %>%
  mutate(date = as.Date(my_date)) %>%
  group_by(month = format(date, "%Y-%m")) %>%
  summarise(count = n_distinct(blood_type))
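
To see the difference on a small case, here is a minimal sketch using made-up data (the values below are hypothetical and not taken from the question). It assumes dplyr is installed and that both columns are factors, as described in the question.

```r
library(dplyr)

# Hypothetical sample data; both columns are factors, as in the question
my_file <- data.frame(
  my_date    = factor(c("2000-01-15", "2000-01-20", "2000-01-25",
                        "2000-02-03", "2000-02-10")),
  blood_type = factor(c("A", "A", "B", "O", "O"))
)

# Distinct blood types per month: group by month only
by_month <- my_file %>%
  mutate(date = as.Date(my_date)) %>%
  group_by(month = format(date, "%Y-%m")) %>%
  summarise(count = n_distinct(blood_type))

# Grouping by blood_type as well: each group contains a single blood type
by_both <- my_file %>%
  mutate(date = as.Date(my_date)) %>%
  group_by(blood_type, month = format(date, "%Y-%m")) %>%
  summarise(count = n_distinct(blood_type))

print(by_month)
print(by_both)
```

With this sample data, by_month reports 2 distinct blood types for 2000-01 and 1 for 2000-02, while every row of by_both has a count of 1, which is why grouping by blood_type defeats the purpose.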

Answer 2

Using data.table:

library(data.table)
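# setDT() converts my_file to a data.table by reference;
# uniqueN() counts the distinct blood types within each month
setDT(my_file)[, .(count = uniqueN(blood_type)),
        by = .(month = format(as.IDate(my_date), '%Y-%m'))]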
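
As a rough, self-contained illustration of the data.table approach, the sketch below runs it on made-up data (the sample values and the names dt and result are hypothetical). It assumes data.table is installed; the factor dates are converted with as.character() first, which is a defensive choice on my part rather than something the answer requires.

```r
library(data.table)

# Hypothetical sample data; both columns are factors, as in the question
my_file <- data.frame(
  my_date    = factor(c("2000-01-15", "2000-01-20", "2000-01-25",
                        "2000-02-03", "2000-02-10")),
  blood_type = factor(c("A", "A", "B", "O", "O"))
)

# as.data.table() makes a copy, leaving my_file itself unchanged
dt <- as.data.table(my_file)

# Count distinct blood types per year-month
result <- dt[, .(count = uniqueN(blood_type)),
             by = .(month = format(as.IDate(as.character(my_date)), "%Y-%m"))]

print(result)
```

For this sample data the result has one row per month, with a count of 2 for 2000-01 and 1 for 2000-02.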

Tags: group-by, r, count, distinct, dplyr