How to use group_by() and summarize() to count the occurrences of data points?

Question

p <- data.frame(x = c("A", "B", "C", "A", "B"), 
                y = c("A", "B", "D", "A", "B"), 
                z = c("B", "C", "B", "D", "E"))
p

d <- p %>%  
  group_by(x) %>% 
  summarize(occurance1 = count(x),
            occurance2 = count(y),
            occurance3 = count(z),
            total = occurance1 + occurance2 + occurance3)
d

Output:

# A tibble: 3 x 5
  x     occurance1 occurance2 occurance3 total
  <chr>      <int>      <int>      <int> <int>
1 A              2          2          1     5
2 B              2          2          1     5
3 C              1          1          1     3

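I have a dataset similar to the one above, and I'm trying to get the counts of the different factor levels in each column. The first count works perfectly, probably because the data is grouped by x, but I run into various problems with the other two lines. As you can see, the result doesn't count "D" in y at all and instead counts it as "C", and although there is no "A" in z, A gets a count of 1. Help?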

Answer 1

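count() requires a data.frame/tibble as input, not a vector. To make this work, we can reshape the data to 'long' format with pivot_longer, apply count to the resulting name and value columns, spread the values back out with pivot_wider, and then use adorn_totals to add the Total column.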

library(dplyr)
library(tidyr)
library(janitor)
p %>% 
    pivot_longer(cols = everything()) %>% 
    count(name, value) %>% 
    pivot_wider(names_from = value, values_from = n, values_fill = 0) %>% 
    janitor::adorn_totals('col')

-output

 name A B C D E Total
    x 2 2 1 0 0     5
    y 2 2 0 1 0     5
    z 0 2 1 1 1     5
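
To illustrate the first point (count() works on a data frame, not on a vector inside summarize()), here is a minimal sketch. It assumes dplyr is attached and that no other package, such as plyr, masks count(). The names by_count and by_n are only illustrative and do not come from the question.

```r
library(dplyr)

p <- data.frame(x = c("A", "B", "C", "A", "B"),
                y = c("A", "B", "D", "A", "B"),
                z = c("B", "C", "B", "D", "E"))

# count() takes the data frame first, then the column(s) to tally
by_count <- p %>% count(x)

# inside summarize(), n() gives the number of rows in the current group
by_n <- p %>%
  group_by(x) %>%
  summarize(n = n())

by_count
by_n
```

Both give the per-level counts of x only. Counting the levels of y or z within groups of x answers a different question, which is why the reshaping approach above is used to count every column independently.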

Answer 2

In addition to akrun's solution, here is one that does not need janitor and uses select_if instead:

p %>% 
  pivot_longer(
    cols = everything(),
    names_to = "name",
    values_to = "values"
  ) %>% 
  count(name,values) %>% 
  pivot_wider(names_from = values, values_from = n, values_fill = 0) %>% 
  ungroup() %>% 
  mutate(Total = rowSums(select_if(., is.integer), na.rm = TRUE))

  name      A     B     C     D     E Total
  <chr> <int> <int> <int> <int> <int> <dbl>
1 x         2     2     1     0     0     5
2 y         2     2     0     1     0     5
3 z         0     2     1     1     1     5
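
select_if() is superseded in current dplyr releases, so the same row total can probably be written with across() and where(). The following is a sketch under the assumption of dplyr 1.0.0 or later. It selects numeric columns rather than strictly integer ones, and the name counts is only illustrative.

```r
library(dplyr)
library(tidyr)

p <- data.frame(x = c("A", "B", "C", "A", "B"),
                y = c("A", "B", "D", "A", "B"),
                z = c("B", "C", "B", "D", "E"))

counts <- p %>%
  # stack all columns into name/values pairs
  pivot_longer(cols = everything(),
               names_to = "name",
               values_to = "values") %>%
  # tally each value per original column
  count(name, values) %>%
  # one column per distinct value, 0 where a value never occurs
  pivot_wider(names_from = values, values_from = n, values_fill = 0) %>%
  # sum the count columns row by row; "name" is character, so it is skipped
  mutate(Total = rowSums(across(where(is.numeric))))

counts
```

The ungroup() step is left out here because count() on an ungrouped data frame returns an ungrouped result.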
