R: Summary statistics for groups / subsets within panel data - code and layout
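I have the following data (posted in the comments):
Now I would like to get summary statistics. I only need the mean and the number of observations. The summary statistics should be grouped by the product's Rating, and there should be a difference-in-means test for Ratings 1 and 5. In the end it should look like this:
I came across the describeBy function. However, the problem is that I cannot get the layout I want (see picture), and I cannot include the difference-in-means test for Ratings 1 and 5 or the mean for the whole sample.
I also tried the stargazer package, but I ran into similar problems.
Can anyone help me?
You can use this dplyr/tidy pipeline: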
library(tidyverse)

dt %>%
  # group means and group sizes, one row per Rating
  group_by(Rating) %>%
  summarize(mean_Revenue = mean(Revenue),
            mean_Costs = mean(Costs),
            mean_Age = mean(Age),
            Observations = n()) %>%
  # transpose: one row per statistic, one column per Rating
  pivot_longer(cols = !Rating) %>%
  pivot_wider(id_cols = "name", names_from = Rating, values_from = value,
              names_glue = "Rating{.name}") %>%
  # p-value of the F-test for each variable; NA for the Observations row
  mutate(`Anova F-Test (p-value)` = c(
    sapply(dt %>% select(Revenue:Age),
           function(y) anova(lm(y ~ dt$Rating))$`Pr(>F)`[[1]]),
    NA)) %>%
  # append the overall (whole-sample) means
  left_join(
    dt %>%
      pivot_longer(cols = Revenue:Age) %>%
      group_by(name = paste0("mean_", name)) %>%
      summarize(Total_means = mean(value)),
    by = "name"
  )
Output:
  name         Rating1 Rating2 Rating3 Rating4 Rating5 `Anova F-Test (p-value)` Total_means
  <chr>          <dbl>   <dbl>   <dbl>   <dbl>   <dbl>                    <dbl>       <dbl>
1 mean_Revenue     200   400       250     300     200                    0.742       289.
2 mean_Costs        45    26.7      40      30      20                    0.196        33.3
3 mean_Age           2     3         4       4       2                    0.552         3
4 Observations       2     3         2       1       1                   NA            NA
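One detail of the pipeline above that may be worth checking is how `anova(lm(...))$`Pr(>F)`[[1]]` behaves depending on the type of the grouping column. The sketch below uses the built-in `mtcars` data as a stand-in (an assumption: `cyl` plays the role of `Rating`; it is not the asker's data). If `Rating` is stored as a number, `lm()` fits a linear trend with 1 degree of freedom; wrapping it in `factor()` gives the classical one-way ANOVA across groups.

```r
# mtcars ships with base R; cyl (4, 6, 8) stands in for Rating here
fit_numeric <- lm(mpg ~ cyl, data = mtcars)          # cyl as a number: linear trend
fit_factor  <- lm(mpg ~ factor(cyl), data = mtcars)  # cyl as groups: one-way ANOVA

tab_numeric <- anova(fit_numeric)
tab_factor  <- anova(fit_factor)

# The ANOVA table has one row for the term and one for Residuals,
# so the `Pr(>F)` column is c(p_value, NA); [[1]] picks the p-value.
p_numeric <- tab_numeric$`Pr(>F)`[[1]]
p_factor  <- tab_factor$`Pr(>F)`[[1]]

tab_numeric$Df  # 1 (cyl) and 30 (Residuals)
tab_factor$Df   # 2 (factor(cyl)) and 29 (Residuals)
```

Whether the numeric or the factor version is the right one depends on how `Rating` is stored in `dt` and on which hypothesis is intended, so it is worth running `str(dt$Rating)` before interpreting the p-value column.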
Update 4/22/22
- The original answer did not restrict the ANOVA to Ratings 1 and 5
# small function to get the ANOVA p-value, using only the given ratings
get_anova <- function(y, rating, ratings = c(1, 5)) {
  y_ <- y[rating %in% ratings]
  x_ <- rating[rating %in% ratings]
  anova(lm(y_ ~ x_))$`Pr(>F)`[[1]]
}

dt %>%
  group_by(Rating) %>%
  summarize(mean_Revenue = mean(Revenue),
            mean_Costs = mean(Costs),
            mean_Age = mean(Age),
            Observations = n()) %>%
  pivot_longer(cols = !Rating) %>%
  pivot_wider(id_cols = "name", names_from = Rating, values_from = value,
              names_glue = "Rating{.name}") %>%
  # p-value for Ratings 1 vs 5 only; NA for the Observations row
  mutate(anova = c(
    sapply(dt %>% select(Revenue:Age),
           function(y) get_anova(y, rating = dt$Rating)),
    NA)) %>%
  left_join(
    dt %>%
      pivot_longer(cols = Revenue:Age) %>%
      group_by(name = paste0("mean_", name)) %>%
      summarize(Total_means = mean(value)),
    by = "name"
  )
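Since the question asks for a difference-in-means test, it may help to see how the restricted `get_anova()` relates to a t-test. With only two groups, the F-test from `anova(lm(...))` should give the same p-value as a pooled-variance two-sample t-test. The sketch below checks this on a small data frame `toy`, which is invented purely for illustration and is not the asker's data.

```r
# Toy data, invented for illustration only
toy <- data.frame(
  Rating  = c(1, 1, 1, 2, 3, 5, 5, 5),
  Revenue = c(100, 300, 200, 400, 250, 150, 250, 260)
)

# Same helper as in the update above
get_anova <- function(y, rating, ratings = c(1, 5)) {
  keep <- rating %in% ratings
  y_ <- y[keep]
  x_ <- rating[keep]
  anova(lm(y_ ~ x_))$`Pr(>F)`[[1]]
}

p_anova <- get_anova(toy$Revenue, toy$Rating)

# Pooled-variance (equal-variance) two-sample t-test on the same two groups
sub <- toy[toy$Rating %in% c(1, 5), ]
p_ttest <- t.test(Revenue ~ Rating, data = sub, var.equal = TRUE)$p.value

# The two p-values agree up to floating-point error
all.equal(p_anova, p_ttest)
```

Note that `t.test()` defaults to Welch's test (`var.equal = FALSE`), which does not assume equal variances and will generally give a different p-value; which variant is appropriate is a modelling choice rather than something the original answer specifies.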