R table function: how to sum instead of counting?

Question

Suppose I have data in R (a data frame df) that looks like this:

Id  Name  Price  sales  Profit  Month  Category  Mode
1   A     2      5      8       1      X         K
1   A     2      6      9       2      X         K
1   A     2      5      8       3      X         K
1   B     2      4      6       1      Y         L
1   B     2      3      4       2      Y         L
1   B     2      5      7       3      Y         L
2   C     2      5      11      1      X         M
2   C     2      5      11      2      X         L
2   C     2      5      11      3      X         K
2   D     2      8      10      1      Y         M
2   D     2      8      10      2      Y         K
2   D     2      5      7       3      Y         K
3   E     2      5      9       1      Y         M
3   E     2      5      9       2      Y         L
3   E     2      5      9       3      Y         M
3   F     2      4      7       1      Z         M
3   F     2      5      8       2      Z         L
3   F     2      5      8       3      Z         M

If I use the table function on this data, for example:

table(df$Category, df$Mode)

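it tells me how many observations fall into each combination of Category and Mode. In other words, it counts the items in each category under each mode.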
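For reference, here is a minimal reproducible sketch of that counting behaviour. It rebuilds only the three columns that matter here (Category, Mode, Profit) from the table above; the name counts is just an illustrative variable, not something from the original post.

```r
# Rebuild the relevant columns of the example data (rows in the same order as the table above)
df <- data.frame(
  Category = rep(c("X", "Y", "X", "Y", "Y", "Z"), each = 3),
  Mode     = c("K", "K", "K", "L", "L", "L", "M", "L", "K",
               "M", "K", "K", "M", "L", "M", "M", "L", "M"),
  Profit   = c(8, 9, 8, 6, 4, 7, 11, 11, 11,
               10, 10, 7, 9, 9, 9, 7, 8, 8)
)

# table() cross-tabulates and counts rows; Profit is not used at all
counts <- table(df$Category, df$Mode)
print(counts)
```

The X/K cell is 4 because four rows have Category X and Mode K, regardless of how much Profit those rows carry.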

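But what if I want the table to show how much Profit (sum or mean) each Mode earned within each Category, rather than a count?

Is there a way to do this with the table function, or with some other function in R?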

Answer 1

We can use xtabs from base R. By default, xtabs computes the sum of the left-hand side variable:

xtabs(Profit~Category+Mode, df)
#           Mode
#Category  K  L  M
#       X 36 11 11
#       Y 17 26 28
#       Z  0  8 15

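Another base R option, more flexible because it accepts any FUN, is tapply. Note that it returns NA rather than 0 for combinations that have no rows: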

with(df, tapply(Profit, list(Category, Mode), FUN=sum))
#  K  L  M
#X 36 11 11
#Y 17 26 28
#Z NA  8 15
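Since the question also asks about the mean, here is a small sketch of the same tapply call with FUN = mean. It assumes a df containing only the Category, Mode and Profit columns, rebuilt from the question's table; the names avg and avg0 are illustrative.

```r
# Rebuild the relevant columns of the example data from the question
df <- data.frame(
  Category = rep(c("X", "Y", "X", "Y", "Y", "Z"), each = 3),
  Mode     = c("K", "K", "K", "L", "L", "L", "M", "L", "K",
               "M", "K", "K", "M", "L", "M", "M", "L", "M"),
  Profit   = c(8, 9, 8, 6, 4, 7, 11, 11, 11,
               10, 10, 7, 9, 9, 9, 7, 8, 8)
)

# Mean Profit per Category (rows) and Mode (columns)
avg <- with(df, tapply(Profit, list(Category, Mode), FUN = mean))
print(avg)

# Optional: show empty combinations as 0 instead of NA (only if 0 is meaningful for you)
avg0 <- avg
avg0[is.na(avg0)] <- 0
```

Whether to replace NA with 0 is a judgment call: for a mean, NA arguably describes "no data" more honestly than 0 does.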

Or we can use dcast to reshape from 'long' to 'wide' format. It is more flexible because fun.aggregate can be set to sum, mean, median, etc.

library(reshape2)
dcast(df, Category~Mode, value.var='Profit', sum)
# Category  K  L  M
#1        X 36 11 11
#2        Y 17 26 28
#3        Z  0  8 15

If you need the result in 'long' format, here is an option with data.table. We convert the 'data.frame' to a 'data.table' (setDT(df)), group by 'Category' and 'Mode', and get the sum of 'Profit'.

library(data.table)
setDT(df)[, .(Profit = sum(Profit)), by = .(Category, Mode)]

Answer 2

Another possibility is to use the aggregate() function:

profit_dat <- aggregate(Profit ~ Category + Mode, data=df, sum)
#> profit_dat
#  Category Mode Profit
#1        X    K     36
#2        Y    K     17
#3        X    L     11
#4        Y    L     26
#5        Z    L      8
#6        X    M     11
#7        Y    M     28
#8        Z    M     15
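If both the sum and the mean are wanted in one pass, aggregate() can take a function that returns a named vector. The sketch below assumes a df with only the Category, Mode and Profit columns rebuilt from the question; res and flat are illustrative names.

```r
# Rebuild the relevant columns of the example data from the question
df <- data.frame(
  Category = rep(c("X", "Y", "X", "Y", "Y", "Z"), each = 3),
  Mode     = c("K", "K", "K", "L", "L", "L", "M", "L", "K",
               "M", "K", "K", "M", "L", "M", "M", "L", "M"),
  Profit   = c(8, 9, 8, 6, 4, 7, 11, 11, 11,
               10, 10, 7, 9, 9, 9, 7, 8, 8)
)

# FUN returns two named values, so the Profit column of the result is a two-column matrix
res <- aggregate(Profit ~ Category + Mode, data = df,
                 FUN = function(x) c(sum = sum(x), mean = mean(x)))

# Flatten the matrix column into ordinary columns Profit.sum and Profit.mean
flat <- do.call(data.frame, res)
print(flat)
```

As with the sum-only call above, combinations with no rows (Z with K here) simply do not appear in the result.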

Answer 3

I prefer to use dplyr (and ggplot2) for most data analysis:

library(dplyr)

# Sum Profit and count rows within each Category/Mode group
group_by(df, Category, Mode) %>%
  summarise(sum = sum(Profit), count = n())

https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html
