What is the Base R equivalent of this dplyr group_by code?
The R4DS book has the following code block:
library(tidyverse)
by_age2 <- gss_cat %>%
  filter(!is.na(age)) %>%
  count(age, marital) %>%
  group_by(age) %>%
  mutate(prop = n / sum(n))
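Is there a simple equivalent to this code in base R? The filter can be replaced with gss_cat[!is.na(gss_cat$age), ], but I run into trouble after that. This is obviously a job for by, tapply, or aggregate, but I have been unable to find the right way. by(gss_2, with(gss_2, list(age, marital)), length) is a step in the right direction, but the result is terrible.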
We can use proportions on the table output after subsetting the data to remove the NA rows (complete.cases) and selecting the two columns.
The data comes from the forcats package, so load the package and get the data:
library(forcats)
data(gss_cat)
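Using the table/proportions approach mentioned above (the second argument, 1, computes the proportions within each row, i.e. within each age):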
by_age2_base <- proportions(table(subset(gss_cat, complete.cases(age),
                                         select = c(age, marital))), 1)
-output
head(by_age2_base, 3)
    marital
age    No answer Never married   Separated    Divorced     Widowed     Married
  18 0.000000000   0.978021978 0.000000000 0.000000000 0.000000000 0.021978022
  19 0.000000000   0.939759036 0.000000000 0.012048193 0.004016064 0.044176707
  20 0.000000000   0.904382470 0.003984064 0.007968127 0.000000000 0.083665339
-compare with the OP's output
head(by_age2, 3)
# A tibble: 3 x 4
# Groups:   age [2]
    age marital           n   prop
  <int> <fct>         <int>  <dbl>
1    18 Never married    89 0.978
2    18 Married           2 0.0220
3    19 Never married   234 0.940
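To make the role of the margin argument concrete, here is a minimal sketch on a made-up data frame. The names toy, grp and cat are hypothetical and not part of gss_cat, and proportions() assumes R 4.0.1 or later (prop.table() is the older name for the same function).

```r
# Toy data (hypothetical, not from gss_cat): two groups, two categories
toy <- data.frame(
  grp = c("a", "a", "a", "b", "b"),
  cat = c("x", "x", "y", "x", "y")
)

counts <- table(toy)              # cross-tabulate grp (rows) by cat (columns)
props  <- proportions(counts, 1)  # margin = 1: divide each row by its row total

rowSums(props)                    # each group's proportions sum to 1
```

With margin = 2 the division would be by column totals instead, which is not what the group_by(age) step in the dplyr code does.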
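If we need the output in 'long' format, convert the table to a data.frame with as.data.frame. The Freq > 0 filter drops the age/marital combinations that never occur, which count() does not return either: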
by_age2_base_long <- subset(as.data.frame(by_age2_base), Freq > 0)
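The long table above carries only the proportion (in a column named Freq), not the count n that the dplyr result has. One possible way to get both columns is sketched below on made-up data; toy, grp and cat are hypothetical names, and this is an illustration rather than part of the original answer.

```r
# Toy data (hypothetical): the combination grp "b" / cat "y" never occurs
toy <- data.frame(
  grp = c("a", "a", "b"),
  cat = c("x", "y", "x")
)

# Long-format counts; responseName renames the default "Freq" column to "n"
counts <- as.data.frame(table(toy), responseName = "n")

# Drop combinations with zero count, as count() would
counts <- subset(counts, n > 0)

# ave() returns each row's group total, so this is n / sum(n) within grp
counts$prop <- counts$n / ave(counts$n, counts$grp, FUN = sum)

counts
```

Converting the count table first and computing prop afterwards keeps n available, at the cost of one extra step compared with calling proportions() directly.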
Another option is aggregate/ave (using R 4.1.0 or later, which is needed for the native pipe |> and the \(x) lambda shorthand):
subset(gss_cat, complete.cases(age), select = c(age, marital)) |>
  {\(dat) aggregate(cbind(n = age) ~ age + marital,
                    data = dat, FUN = length)}() |>
  transform(prop = ave(n, age, FUN = \(x) x / sum(x)))
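For R versions before 4.1.0, the same aggregate/ave idea can be written with intermediate variables and function(x) instead of |> and \(x). The sketch below uses made-up data (toy, grp and cat are hypothetical names) so that it runs without forcats.

```r
# Toy data (hypothetical): numeric group id plus a category
toy <- data.frame(
  grp = c(1, 1, 1, 2, 2),
  cat = c("x", "x", "y", "x", "y")
)

# Count rows per grp/cat combination; cbind(n = grp) only names the result "n"
counts <- aggregate(cbind(n = grp) ~ grp + cat, data = toy, FUN = length)

# Within each grp, divide every count by the group total
counts$prop <- ave(counts$n, counts$grp, FUN = function(x) x / sum(x))

# Sort so that rows of the same group sit together
counts <- counts[order(counts$grp, counts$cat), ]
counts
```

Unlike the table-based approach, aggregate only returns combinations that actually occur, so no Freq > 0 filter is needed.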