Create a Pivot Table of Two Categorical and Numerical Variables

I have the following hypothetical data frame:

Region <- c("District A", "District B","District A","District A","District B")
Gender <- c("Male","Male","Female", "Male","Female")
Age <- c(20, 21, 23, 34, 22)
AmountSold <- c(50, 10, 20, 4, 12)
RegionSales <- data.frame(Region, Gender, Age, AmountSold)

I want to create a pivot table that shows the mean amount sold and the mean age for each combination of gender and region. How can I do this in R?

This is how I would do it with the dplyr package:

library(dplyr)

RegionSales %>%
  group_by(Region, Gender) %>%
  summarize(mean_age = mean(Age), mean_amount = mean(AmountSold))

Output:

# A tibble: 4 x 4
# Groups:   Region [2]
  Region     Gender mean_age mean_amount
  <chr>      <chr>     <dbl>       <dbl>
1 District A Female       23          20
2 District A Male         27          27
3 District B Female       22          12
4 District B Male         21          10

An option that ignores NA values:

RegionSales %>%
  group_by(Region, Gender) %>%
  summarize(mean_age = mean(Age, na.rm = TRUE),
            mean_amount = mean(AmountSold, na.rm = TRUE))
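
To see why na.rm matters, here is a small base R check on a copy of the data with one value set to missing. The copy name RegionSalesNA and the choice of which value to blank out are assumptions made for illustration only; they are not part of the original question.

```r
# Rebuild the example data from the question.
Region <- c("District A", "District B", "District A", "District A", "District B")
Gender <- c("Male", "Male", "Female", "Male", "Female")
Age <- c(20, 21, 23, 34, 22)
AmountSold <- c(50, 10, 20, 4, 12)

# Hypothetical copy with one missing sale (row 1 is District A / Male).
RegionSalesNA <- data.frame(Region, Gender, Age, AmountSold)
RegionSalesNA$AmountSold[1] <- NA

# Select the District A / Male rows.
male_a <- RegionSalesNA$Region == "District A" & RegionSalesNA$Gender == "Male"

# Without na.rm, a single NA makes the whole group mean NA.
mean_default <- mean(RegionSalesNA$AmountSold[male_a])

# With na.rm = TRUE, the NA is dropped and the mean uses the remaining value.
mean_na_rm <- mean(RegionSalesNA$AmountSold[male_a], na.rm = TRUE)

print(mean_default)
print(mean_na_rm)
```

The same behaviour applies inside summarize(): a group containing an NA returns NA unless na.rm = TRUE is passed to mean().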

With dplyr, another option is to specify the variables in across:

library(dplyr)
RegionSales %>%
    group_by(Region, Gender) %>%
    summarise(across(c(Age, AmountSold),
             ~ mean(., na.rm = TRUE), .names = "mean_{.col}"))

A base R option using aggregate may also help:

> aggregate(. ~ Region + Gender, RegionSales, mean)
      Region Gender Age AmountSold
1 District A Female  23         20
2 District B Female  22         12
3 District A   Male  27         27
4 District B   Male  21         10
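
If a wide, spreadsheet-style layout is wanted (regions as rows, genders as columns), one possible sketch in base R uses tapply. This is an alternative to the answers above, not a replacement for them, and the object names mean_amount and mean_age are chosen here for illustration.

```r
# Rebuild the example data from the question.
Region <- c("District A", "District B", "District A", "District A", "District B")
Gender <- c("Male", "Male", "Female", "Male", "Female")
Age <- c(20, 21, 23, 34, 22)
AmountSold <- c(50, 10, 20, 4, 12)
RegionSales <- data.frame(Region, Gender, Age, AmountSold)

# tapply with two grouping vectors returns a matrix:
# rows are regions, columns are genders, cells hold the group mean.
mean_amount <- with(RegionSales, tapply(AmountSold, list(Region, Gender), mean))
mean_age <- with(RegionSales, tapply(Age, list(Region, Gender), mean))

print(mean_amount)
print(mean_age)
```

Each call produces one matrix per numeric variable, so this layout suits a quick look at a single measure; the long format returned by group_by/summarize or aggregate is usually easier to work with when several measures are needed together.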