How to combine count() and group_by() to count responses with a certain value, grouped by respondent?
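I have a dataset in which the responses to a series of repeated questions are the outcome of interest. I'd like to count the number of "I don't know" responses, group those counts by respondent ID, and append them as a new column. Basically, my data look like this:
ID | response |
---|---|
1 | Yes |
1 | I don't know |
2 | No |
2 | I don't know |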
And I'd like them to look like this:
ID | response | idkcount |
---|---|---|
1 | Yes | 1 |
1 | I don't know | 1 |
2 | No | 1 |
2 | I don't know | 1 |
This is the code I wrote most recently:
df$idkcount <- group_by(as_tibble(df$ID)) %>% count(df$response == "I don't know")
But whatever I try with these two commands, I seem to get an error message. What am I missing?
Using group_by() and mutate() you could do:
Note: I slightly modified your example data to cover a more general case.
df <- data.frame(
ID = c(1L, 1L, 1L, 1L, 2L, 2L),
response = c("Yes", "I don't know", "I don't know", "I don't know", "No", "I don't know")
)
library(dplyr)
df %>%
group_by(ID) %>%
mutate(idkcount = sum(response == "I don't know", na.rm = TRUE)) %>%
ungroup()
#> # A tibble: 6 × 3
#> ID response idkcount
#> <int> <chr> <int>
#> 1 1 Yes 3
#> 2 1 I don't know 3
#> 3 1 I don't know 3
#> 4 1 I don't know 3
#> 5 2 No 1
#> 6 2 I don't know 1
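Since the question asks about count() specifically, dplyr's add_count() may also be worth a look: it behaves like count() but appends the tally as a new column instead of collapsing the rows, and its wt argument sums a per-row weight within each group. A minimal sketch on the same example data, assuming a dplyr version whose wt argument accepts an expression (the logical comparison is summed, so each TRUE counts as 1):

```r
library(dplyr)

df <- data.frame(
  ID = c(1L, 1L, 1L, 1L, 2L, 2L),
  response = c("Yes", "I don't know", "I don't know", "I don't know", "No", "I don't know")
)

# add_count() groups by ID, sums the logical weight within each group
# (TRUE = 1, FALSE = 0), and adds the result as a new column,
# keeping every original row and the original row order
res <- df %>%
  add_count(ID, wt = response == "I don't know", name = "idkcount")

res
```

This gives the same idkcount column as the group_by() + mutate() version, without an explicit ungroup() step.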
library(dplyr)

my_df <- data.frame("id" = c(1, 1, 2, 2, 3),
"response" = c("I don't know", "I don't know", "no", "I don't know", "maybe"),
stringsAsFactors = FALSE)
# which() returns the positions of "I don't know" within each id; length() counts them
my_df <- my_df %>% group_by(id) %>% mutate(count = length(which(response == "I don't know")))
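For comparison, the same per-respondent count can be computed without dplyr using base R's ave(), which applies a function within each group and returns a vector aligned with the original rows. This is a swapped-in base-R technique, not part of the answer above; it reuses the my_df example data.

```r
my_df <- data.frame("id" = c(1, 1, 2, 2, 3),
                    "response" = c("I don't know", "I don't know", "no", "I don't know", "maybe"),
                    stringsAsFactors = FALSE)

# Convert the comparison to 0/1, then sum it within each id.
# ave() returns one value per original row, so it can be assigned directly as a column.
my_df$count <- ave(as.integer(my_df$response == "I don't know"), my_df$id, FUN = sum)

my_df
```

Because ave() keeps the original row order, no grouping or ungrouping step is needed afterwards.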
Possible solution (I'm using @stefan's dataset):
library(tidyverse)
df <- data.frame(
ID = c(1L, 1L, 1L, 1L, 2L, 2L),
response = c("Yes", "I don't know", "I don't know", "I don't know", "No", "I don't know")
)
df %>%
count(ID, response, name = "idkcount")
#> ID response idkcount
#> 1 1 I don't know 3
#> 2 1 Yes 1
#> 3 2 I don't know 1
#> 4 2 No 1
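Note that count(ID, response) returns one row per ID-response combination rather than appending a column to the original rows. One way to still use count() and get the requested idkcount column is to filter to the "I don't know" rows, count them per ID, and join the counts back onto the original data. A minimal sketch; the extra ID 3 row is hypothetical, added only to show that a respondent with no "I don't know" answers gets 0 rather than NA:

```r
library(dplyr)

df <- data.frame(
  ID = c(1L, 1L, 1L, 1L, 2L, 2L, 3L),
  response = c("Yes", "I don't know", "I don't know", "I don't know", "No", "I don't know", "Yes")
)

# One row per ID that has at least one "I don't know", with its count
idk_counts <- df %>%
  filter(response == "I don't know") %>%
  count(ID, name = "idkcount")

# Attach the counts to every original row;
# IDs absent from idk_counts get NA from the join, replaced here by 0
res <- df %>%
  left_join(idk_counts, by = "ID") %>%
  mutate(idkcount = coalesce(idkcount, 0L))

res
```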