How to use group_by() and summarize() to count the occurrences of data points?
p <- data.frame(x = c("A", "B", "C", "A", "B"),
                y = c("A", "B", "D", "A", "B"),
                z = c("B", "C", "B", "D", "E"))
p
d <- p %>%
  group_by(x) %>%
  summarize(occurance1 = count(x),
            occurance2 = count(y),
            occurance3 = count(z),
            total = occurance1 + occurance2 + occurance3)
d
Output:
# A tibble: 3 x 5
  x     occurance1 occurance2 occurance3 total
  <chr>      <int>      <int>      <int> <int>
1 A              2          2          1     5
2 B              2          2          1     5
3 C              1          1          1     3
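I have a dataset similar to the one above, and I am trying to get the count of each distinct value in every column. The first column works perfectly, probably because the data is grouped by (x), but I am running into various problems with the other two. As you can see, it does not count "D" in y at all and instead reports it under "C", and there is no "A" in z, yet A shows a count of 1. Help?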
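count needs a data.frame/tibble as input rather than a vector. To make this work, we may need to reshape the data to 'long' format with pivot_longer, apply count on the resulting columns, and then use adorn_totals to get the total column.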
library(dplyr)
library(tidyr)
library(janitor)
p %>%
  pivot_longer(cols = everything()) %>%
  count(name, value) %>%
  pivot_wider(names_from = value, values_from = n, values_fill = 0) %>%
  janitor::adorn_totals('col')
-output
 name A B C D E Total
    x 2 2 1 0 0     5
    y 2 2 0 1 0     5
    z 0 2 1 1 1     5
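As a cross-check that needs no extra packages, the same table can be sketched in base R. This is only an illustration under the assumption that every column of p holds character values; the name base_counts is hypothetical and not part of the original answer.

```r
# Same sample data as in the question
p <- data.frame(x = c("A", "B", "C", "A", "B"),
                y = c("A", "B", "D", "A", "B"),
                z = c("B", "C", "B", "D", "E"),
                stringsAsFactors = FALSE)

# Cross-tabulate column name against value.
# unlist() walks the data frame column by column, so each column name
# is repeated nrow(p) times to line up with its own values.
tab <- table(name  = rep(names(p), each = nrow(p)),
             value = unlist(p, use.names = FALSE))

# Append a row-wise total column (base_counts is a hypothetical name)
base_counts <- cbind(tab, Total = rowSums(tab))
base_counts
```

Unlike count(), table() accepts plain vectors, which is why no reshaping step is needed here.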
In addition to akrun's solution, here is one without janitor, using select_if:
p %>%
  pivot_longer(
    cols = everything(),
    names_to = "name",
    values_to = "values"
  ) %>%
  count(name, values) %>%
  pivot_wider(names_from = values, values_from = n, values_fill = 0) %>%
  ungroup() %>%
  mutate(Total = rowSums(select_if(., is.integer), na.rm = TRUE))
  name      A     B     C     D     E Total
  <chr> <int> <int> <int> <int> <int> <dbl>
1 x         2     2     1     0     0     5
2 y         2     2     0     1     0     5
3 z         0     2     1     1     1     5
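Since select_if is superseded in current dplyr, the same idea can also be written with across() and where(). The following is a minimal sketch, assuming dplyr 1.0.0 or later and tidyr 1.1.0 or later are installed; wide_counts is a hypothetical name, and where(is.numeric) is used instead of is.integer so the total still works if the count columns end up as doubles.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question
p <- data.frame(x = c("A", "B", "C", "A", "B"),
                y = c("A", "B", "D", "A", "B"),
                z = c("B", "C", "B", "D", "E"))

wide_counts <- p %>%
  # Stack all columns into (name, values) pairs
  pivot_longer(cols = everything(),
               names_to = "name",
               values_to = "values") %>%
  # Count each value within each original column
  count(name, values) %>%
  # One column per distinct value, 0 where a value never occurs
  pivot_wider(names_from = values, values_from = n, values_fill = 0) %>%
  # Row-wise total over the numeric count columns only
  mutate(Total = rowSums(across(where(is.numeric))))

wide_counts
```

Because count() returns an ungrouped result when its input is ungrouped, the ungroup() step is left out of this sketch.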