How to count number of values in columns based on a category in R?

Suppose we have a data frame df:

book_id book_category book_word_hi book_word_bye book_word_yes
1       drama         3            0             4
2       action        1            4             5
3       drama         5            3             2

I want to add up the values in the book_word columns and summarise the totals for each book_category in a table.

So the output here should look like this:

drama: 17 
action: 10

Does anyone know how to do this?

Use the summarise_at function:

df %>%
  summarise_at(c("book_word_hi","book_word_bye","book_word_yes"), sum, na.rm = FALSE)

It can also be combined with group_by.
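A minimal sketch of that combination, assuming df holds the three rows from the question (the data frame is rebuilt here so the snippet runs on its own; the names per_word, result, and total are illustrative only). Grouping first gives one sum per word column per category, and adding those columns together yields the single total per category that the question asks for:

```r
library(dplyr)

# Rebuild the example data from the question (assumed layout)
df <- data.frame(
  book_id = c(1, 2, 3),
  book_category = c("drama", "action", "drama"),
  book_word_hi = c(3, 1, 5),
  book_word_bye = c(0, 4, 3),
  book_word_yes = c(4, 5, 2)
)

# Step 1: one sum per word column within each category
per_word <- df %>%
  group_by(book_category) %>%
  summarise_at(c("book_word_hi", "book_word_bye", "book_word_yes"), sum)

# Step 2: collapse the per-column sums into a single total per category
result <- per_word %>%
  ungroup() %>%
  mutate(total = book_word_hi + book_word_bye + book_word_yes) %>%
  select(book_category, total)

print(result)
```

The column names are listed explicitly here, so a new column such as book_word_foo would have to be added by hand in both steps.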

With this approach, any additional columns such as book_word_foo are included in the total as well:

library(tidyverse)

data <- tribble(
  ~book_id, ~book_category, ~book_word_hi, ~book_word_bye, ~book_word_yes,
  1, "drama", 3, 0, 4,
  2, "action", 1, 4, 5,
  3, "drama", 5, 3, 2,
)

data %>%
  pivot_longer(-c(book_id, book_category)) %>%
  group_by(book_category) %>%
  summarise(n = sum(value))
#> # A tibble: 2 × 2
#>   book_category     n
#>   <chr>         <dbl>
#> 1 action           10
#> 2 drama            17

Created on 2022-05-05 by the reprex package (v2.0.0)

Here is a short base R one-liner that does not need any additional packages.

tapply(rowSums(df[3:5]), df[2], sum)
#> book_category
#> action  drama 
#>     10     17 
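The one-liner picks the word columns by position (3:5), which would silently break if the column order changed. A possible variation, sketched under the assumption that every column to be summed has a name starting with "book_word", selects them by name instead. The data frame is rebuilt so the snippet is self-contained, and word_cols and totals are illustrative names:

```r
# Rebuild the example data from the question (assumed layout)
df <- data.frame(
  book_id = c(1, 2, 3),
  book_category = c("drama", "action", "drama"),
  book_word_hi = c(3, 1, 5),
  book_word_bye = c(0, 4, 3),
  book_word_yes = c(4, 5, 2)
)

# Logical vector marking the columns whose names start with "book_word"
word_cols <- startsWith(names(df), "book_word")

# Row totals over those columns, then summed within each category
totals <- tapply(rowSums(df[word_cols]), df$book_category, sum)

print(totals)
```

This stays in base R and would also pick up a later column such as book_word_foo without further changes.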

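First compute a row sum over all columns whose names start with the string "book_word" (selected with starts_with). Then group_by book_category and sum those row totals within each category.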

library(dplyr)

df %>% 
  mutate(book_sum = rowSums(across(starts_with("book_word")))) %>% 
  group_by(book_category) %>% 
  summarize(sum = sum(book_sum))

# A tibble: 2 × 2
  book_category   sum
  <chr>         <int>
1 action           10
2 drama            17

Using aggregate from base R:

aggregate(book_sum ~ book_category, transform(data, book_sum = rowSums(data[3:5])), sum)
  book_category book_sum
1        action       10
2         drama       17