Group by one column and count the occurrences of another column in R
I have a dataset similar to the sample below:
| Name | Response_days | state |
|------|---------------|-------|
| John | 0 | NY |
| John | 6 | NY |
| John | 9 | NY |
| Mike | 3 | CA |
| Mike | 7 | CA |
The same data as R code:
Name = c("John","John", "John", "Mike", "Mike")
Response_days = c(0,6,9,3,7)
state= c("NY","NY","NY", "CA","CA")
df= data.frame(Name, Response_days, state, stringsAsFactors = TRUE)
df$Response_days= as.integer(df$Response_days)
I want to subset the data to keep only rows where Response_days > 5, then group by 'Name' and count the occurrences of 'Response_days'. I tried the code below, but it throws an error.
df1=subset(df, df$Response_days>5) %>% group_by(Name) %>%
summarise(count= count(Response_days))
The error I get is:
Problem with `summarise()` input `count`.
x no applicable method for 'count' applied to an object of class "c('double', 'numeric')"
i Input `count` is `count(Response_days)`.
i The error occurred in group 1: Name = "John".
Can someone explain where I went wrong? My final output should look like this:
| Name | Response_days |
|------|---------------|
| John | 2 |
| Mike | 1 |
There are a few ways to do this in dplyr:
library(dplyr)
#1.
df %>% filter(Response_days>5) %>% count(Name, name = 'Count')
#2.
df %>% group_by(Name) %>% summarise(count = sum(Response_days > 5))
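As for the original error: `count()` is a verb that expects a data frame, so calling it on a bare column vector inside `summarise()` finds no applicable method. Inside `summarise()`, the per-group row counter is `n()`. A minimal sketch of the question's pipeline with only that change, rebuilt on the same sample data so it runs on its own:

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  Name = c("John", "John", "John", "Mike", "Mike"),
  Response_days = as.integer(c(0, 6, 9, 3, 7)),
  state = c("NY", "NY", "NY", "CA", "CA"),
  stringsAsFactors = TRUE
)

# n() counts the rows in the current group;
# count() would need the whole data frame, not a column
df1 <- df %>%
  filter(Response_days > 5) %>%
  group_by(Name) %>%
  summarise(count = n())

df1
```

This should give a count of 2 for John and 1 for Mike, matching the desired output.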
In base R:
#1.
aggregate(Response_days~Name, subset(df, Response_days>5), length)
#2.
aggregate(Response_days~Name, df, function(x) sum(x > 5))
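The two variants in each pair are not quite equivalent. Filtering first drops any name with no rows above the threshold, while summing the logical condition keeps that name with a count of 0. A small sketch of the difference, assuming a hypothetical extra person `Ann` (not in the original data) whose only response is under 5 days:

```r
library(dplyr)

# Hypothetical data: Ann is added purely for illustration
df2 <- data.frame(
  Name = c("John", "John", "John", "Mike", "Mike", "Ann"),
  Response_days = c(0L, 6L, 9L, 3L, 7L, 2L),
  stringsAsFactors = FALSE
)

# Filter first: Ann has no rows left, so she is absent from the result
filtered <- df2 %>%
  filter(Response_days > 5) %>%
  count(Name, name = "Count")

# Sum a logical condition: every name is kept, and Ann gets 0
summed <- df2 %>%
  group_by(Name) %>%
  summarise(count = sum(Response_days > 5))

filtered
summed
```

Which behaviour is preferable depends on whether zero-count names should appear in the report.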
We can use data.table:
library(data.table)
setDT(df)[Response_days > 5, .(count = .N), Name]
Or using base R:
table(subset(df, Response_days > 5)$Name)