Summarize multiple fields in R and suppress values less than x

Question

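I'm working with a data frame that holds thousands of responses to questions about interest in a set of resources. I want to summarize how many participants are interested in each resource by counting the positive responses (coded as "1").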

As a final step, I want to suppress any count based on fewer than 5 participants.

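I've written code that works, but it gets clumsy once I'm dealing with dozens of fields. So I'm looking for suggestions for a more streamlined approach, perhaps using pipes or dplyr?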

Sample input

ID	Resource1	Resource2	Resource3	Resource4
1	1	0	1	1
2	0	0	0	1
3	1	0	0	0
4	0	0	0	0
5	1	1	1	1

Desired output

	Interested	Not Interested
Resource1	3	2
Resource2	1	4
Resource3	2	3
Resource4	3	2

My (ugly) code

library(dplyr)
library(tidyr)

###Select and summarise relevant columns
resource1 <- df %>% drop_na(resource1) %>% group_by(resource1) %>% summarise(n=n()) %>% rename(resp=resource1, r1 =n)
resource2 <- df %>% drop_na(resource2) %>% group_by(resource2) %>% summarise(n=n()) %>% rename(resp=resource2, r2 =n)
resource3 <- df %>% drop_na(resource3) %>% group_by(resource3) %>% summarise(n=n()) %>% rename(resp=resource3, r3 =n)
resource4 <- df %>% drop_na(resource4) %>% group_by(resource4) %>% summarise(n=n()) %>% rename(resp=resource4, r4 =n)

###Merge summarised data (join_all() comes from the plyr package)
resource_sum <- plyr::join_all(list(resource1,resource2,resource3,resource4), by=c("resp"))

###Replace all count values less than 5 with NA per suppression rules.
###apply() needs a MARGIN argument; indexing the count columns directly keeps them numeric and leaves resp alone
resource_sum[-1][resource_sum[-1] < 5] <- NA

Answer 1

We can reshape to 'long' format with pivot_longer, then group by the column name and summarise to get the counts of 1s and 0s

library(dplyr)
library(tidyr)
library(tibble)
df %>% 
   pivot_longer(cols = -ID) %>%
   group_by(name) %>%
   summarise(Interested = sum(value), NotInterested = n() - Interested) %>%
   column_to_rownames('name')

-output

            Interested NotInterested
Resource1          3             2
Resource2          1             4
Resource3          2             3
Resource4          3             2
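The pipeline above stops before the suppression step from the question. A minimal sketch of one way to add it, assuming answers are coded 0/1 with `NA` for missing answers: it counts `value == 1` and `value == 0` explicitly (so dropped `NA`s are not counted as "not interested") and then uses `mutate(across())` with `replace()` to blank out small counts. The `threshold` variable is an illustrative name; it is set to 3 here only because, with five sample rows, the question's cut-off of 5 would turn every cell into `NA`.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Sample data from the question
df <- data.frame(ID = 1:5,
                 Resource1 = c(1L, 0L, 1L, 0L, 1L),
                 Resource2 = c(0L, 0L, 0L, 0L, 1L),
                 Resource3 = c(1L, 0L, 0L, 0L, 1L),
                 Resource4 = c(1L, 1L, 0L, 0L, 1L))

# Suppression cut-off; the question uses 5, lowered to 3 here so the
# five-row sample is not entirely suppressed
threshold <- 3

res <- df %>%
  pivot_longer(cols = -ID) %>%
  drop_na(value) %>%                       # skip missing answers, as the question's code does
  group_by(name) %>%
  summarise(Interested = sum(value == 1),
            NotInterested = sum(value == 0)) %>%
  # Replace every count below the threshold with NA
  mutate(across(c(Interested, NotInterested), ~ replace(.x, .x < threshold, NA))) %>%
  column_to_rownames("name")

res
```

With this sample and `threshold <- 3`, Resource1 keeps Interested = 3, Resource2 keeps NotInterested = 4, Resource3 keeps NotInterested = 3, Resource4 keeps Interested = 3, and all other cells become `NA`.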

Or using base R

v1 <- colSums(df[-1])
cbind(Interested = v1, NotInterested = nrow(df) - v1)

-output

          Interested NotInterested
Resource1          3             2
Resource2          1             4
Resource3          2             3
Resource4          3             2
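The base R version can also take the cut-off as a parameter, matching the "less than x" in the title. A minimal sketch, wrapped in a hypothetical helper `summarise_interest()` (not part of any package); it assumes the first column is `ID`, that answers are coded 0/1, and that missing answers are `NA`, which `na.rm = TRUE` leaves out of both counts.

```r
# Hypothetical helper: count 1s (interested) and 0s (not interested) per
# resource column, then replace any count below `threshold` with NA
summarise_interest <- function(data, threshold = 5) {
  resp <- data[-1]                                   # drop the ID column
  interested <- colSums(resp == 1, na.rm = TRUE)
  not_interested <- colSums(resp == 0, na.rm = TRUE)
  out <- cbind(Interested = interested, NotInterested = not_interested)
  out[out < threshold] <- NA                         # suppression rule
  out
}

# Sample data from the question
df <- data.frame(ID = 1:5,
                 Resource1 = c(1L, 0L, 1L, 0L, 1L),
                 Resource2 = c(0L, 0L, 0L, 0L, 1L),
                 Resource3 = c(1L, 0L, 0L, 0L, 1L),
                 Resource4 = c(1L, 1L, 0L, 0L, 1L))

# threshold = 3 only for the demo; with 5 rows the default would suppress everything
res <- summarise_interest(df, threshold = 3)
res
```

Because the cut-off is an argument, the same call works unchanged for dozens of resource columns or a different suppression rule.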

data

df <- structure(list(ID = 1:5,
    Resource1 = c(1L, 0L, 1L, 0L, 1L),
    Resource2 = c(0L, 0L, 0L, 0L, 1L),
    Resource3 = c(1L, 0L, 0L, 0L, 1L),
    Resource4 = c(1L, 1L, 0L, 0L, 1L)),
    class = "data.frame", row.names = c(NA, -5L))

Answer 2

You can use table to get the counts of 0 and 1 values. To apply a function (here table) to multiple columns, you can use sapply -

t(sapply(df[-1], table))

#          0 1
#Resource1 2 3
#Resource2 4 1
#Resource3 3 2
#Resource4 2 3
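`t(sapply(df[-1], table))` returns a matrix only when every column contains both 0 and 1. If some resource got no "1" answers (or no "0" answers), `table()` returns a shorter result for that column and `sapply()` falls back to a list. One way around this, sketched below with a hypothetical all-zero `Resource5` column added to the sample data, is to turn each column into a factor with fixed levels before tabulating, then apply the suppression rule to the resulting matrix.

```r
# Sample data from the question, plus a hypothetical Resource5 that nobody marked as 1
df <- data.frame(ID = 1:5,
                 Resource1 = c(1L, 0L, 1L, 0L, 1L),
                 Resource2 = c(0L, 0L, 0L, 0L, 1L),
                 Resource3 = c(1L, 0L, 0L, 0L, 1L),
                 Resource4 = c(1L, 1L, 0L, 0L, 1L),
                 Resource5 = c(0L, 0L, 0L, 0L, 0L))

# Fixed factor levels make table() return both counts for every column,
# so sapply() can always simplify to a matrix
counts <- t(sapply(df[-1], function(x) table(factor(x, levels = c(0, 1)))))
colnames(counts) <- c("NotInterested", "Interested")

# Suppress counts below 5
counts[counts < 5] <- NA
counts
```

In this sample only Resource5's NotInterested count (5) survives; every other cell is below 5 and becomes `NA`.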
