What is the Base R equivalent of this dplyr group_by code?

Question

The R4DS book has the following code block:

library(tidyverse)
by_age2 <- gss_cat %>%
  filter(!is.na(age)) %>%
  count(age, marital) %>%
  group_by(age) %>%
  mutate(prop = n / sum(n))

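Is there a simple equivalent of this code in base R? The filter can be replaced with gss_cat[!is.na(gss_cat$age),], but I run into trouble after that. This is clearly a job for by, tapply, or aggregate, but I have not been able to find the right way. by(gss_2, with(gss_2, list(age, marital)), length) is a step in the right direction, but the result is awful.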

Answer 1

We can use proportions on the table output after subsetting the data to remove the NA rows (complete.cases) and select the two columns.

The data comes from the forcats package, so load the package and get the data:

library(forcats)
data(gss_cat)

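Use table/proportions as mentioned above. The second argument, 1, makes proportions divide each row (each age) by its row total: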

by_age2_base <- proportions(table(subset(gss_cat, complete.cases(age), 
       select = c(age, marital))), 1)

-output

head(by_age2_base, 3)
    marital
age    No answer Never married   Separated    Divorced     Widowed     Married
  18 0.000000000   0.978021978 0.000000000 0.000000000 0.000000000 0.021978022
  19 0.000000000   0.939759036 0.000000000 0.012048193 0.004016064 0.044176707
  20 0.000000000   0.904382470 0.003984064 0.007968127 0.000000000 0.083665339

-compare with the OP's output

head(by_age2, 3)
# A tibble: 3 x 4
# Groups:   age [2]
    age marital           n   prop
  <int> <fct>         <int>  <dbl>
1    18 Never married    89 0.978 
2    18 Married           2 0.0220
3    19 Never married   234 0.940
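To see what the margin argument is doing, here is a minimal sketch on a small made-up data frame (the toy object and its values are hypothetical and are not taken from gss_cat). It should show that each age row of the result sums to 1, which is the wide-format counterpart of prop = n / sum(n) within group_by(age).

```r
# Toy data standing in for gss_cat (hypothetical values, not from the dataset)
toy <- data.frame(
  age     = c(18, 18, 18, 19, 19, NA),
  marital = factor(c("Never married", "Never married", "Married",
                     "Never married", "Divorced", "Married"))
)

# Drop rows with a missing age, keep the two columns, and cross-tabulate
counts <- table(subset(toy, complete.cases(age), select = c(age, marital)))

# margin = 1 divides every row (one row per age) by its row total
props <- proportions(counts, 1)

# Each age should sum to 1
rowSums(props)
```

On R versions older than 4.0.1, prop.table(counts, 1) is the older name for the same operation.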

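If we need the output in 'long' format, convert the table to a data.frame with as.data.frame. The subset on Freq > 0 drops the age/marital combinations that never occur: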

by_age2_base_long <- subset(as.data.frame(by_age2_base), Freq > 0)
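The long table above holds only the proportion (in Freq), while the dplyr result carries both n and prop. A possible way to get both columns, sketched on hypothetical toy data rather than gss_cat, is to convert the counts to long format first and then add the proportion with ave. The names toy, kept and long are made up for this illustration.

```r
# Toy data standing in for gss_cat (hypothetical values, not from the dataset)
toy <- data.frame(
  age     = c(18, 18, 18, 19, 19, NA),
  marital = factor(c("Never married", "Never married", "Married",
                     "Never married", "Divorced", "Married"))
)
kept <- subset(toy, !is.na(age), select = c(age, marital))

# One row per table cell; responseName names the count column "n" instead of "Freq"
long <- as.data.frame(table(kept), responseName = "n")

# Drop age/marital combinations with no observations
long <- long[long$n > 0, ]

# ave() returns a vector aligned with the rows, much like mutate() after group_by()
long$prop <- ave(long$n, long$age, FUN = function(x) x / sum(x))

long[order(long$age, long$marital), ]
```

Note that age becomes a factor after the round trip through table, so it may need converting back with as.integer(as.character(long$age)) if a numeric column is wanted.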

Another option is aggregate/ave (this needs R 4.1.0 or later for the native pipe |> and the \(x) lambda syntax):

subset(gss_cat, complete.cases(age), select = c(age, marital)) |> 
    {\(dat) aggregate(cbind(n = age) ~ age + marital, 
      data = dat, FUN = length)}() |> 
   transform(prop = ave(n, age, FUN = \(x) x/sum(x)))
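Since the question mentions tapply, here is a rough sketch of that route as well. It is not part of the approach above, and it runs on hypothetical toy data instead of gss_cat. tapply leaves NA in cells with no observations, so those are set to 0 before dividing by the row totals.

```r
# Toy data standing in for gss_cat (hypothetical values, not from the dataset)
toy <- data.frame(
  age     = c(18, 18, 18, 19, 19, NA),
  marital = factor(c("Never married", "Never married", "Married",
                     "Never married", "Divorced", "Married"))
)
kept <- toy[!is.na(toy$age), c("age", "marital")]

# Count rows for every age/marital combination; empty cells come back as NA
counts <- with(kept, tapply(marital, list(age = age, marital = marital), length))
counts[is.na(counts)] <- 0

# Dividing a matrix by a vector of length nrow recycles down the columns,
# so every row is divided by its own total
props <- counts / rowSums(counts)

props
```

The result is a plain matrix in the same wide layout as the table/proportions version, so the table approach is probably the shorter one in practice.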

r

tapply

dplyr