R: Summary statistics for groups / subsets within panel data - code and layout

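I have the following data (posted in the comments):

Now I want summary statistics. I only need the mean and the number of observations. The summary statistics should be grouped by the product's rating, and there should be a difference-in-means test between ratings 1 and 5. In the end it should look like this:

I came across the describeBy function. However, I cannot get the layout I want (see picture), and I cannot include either the difference-in-means test between ratings 1 and 5 or the mean for the whole sample.

I also tried the stargazer package, but ran into similar problems.

Can someone help me?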

You can use this dplyr/tidyr pipeline:

library(tidyverse)

dt %>%
  # Per-rating means and group sizes
  group_by(Rating) %>%
  summarize(mean_Revenue = mean(Revenue),
            mean_Costs = mean(Costs),
            mean_Age = mean(Age),
            Observations = n()
  ) %>%
  # Reshape so that statistics are rows and ratings are columns
  pivot_longer(cols = !Rating) %>%
  pivot_wider(id_cols = "name", names_from = Rating, values_from = value,
              names_glue = "Rating{.name}") %>%
  # One p-value per variable, in the same row order as above; NA for the Observations row
  mutate(`Anova F-Test (p-value)` = c(
    sapply(dt %>% select(Revenue:Age),
           function(y) anova(lm(y ~ dt$Rating))$`Pr(>F)`[[1]]),
    NA)) %>%
  # Append the whole-sample means
  left_join(
    dt %>%
      pivot_longer(cols = Revenue:Age) %>%
      group_by(name = paste0("mean_", name)) %>%
      summarize(Total_means = mean(value)),
    by = "name"
  )

Output:

  name         Rating1 Rating2 Rating3 Rating4 Rating5 `Anova F-Test (p-value)` Total_means
  <chr>          <dbl>   <dbl>   <dbl>   <dbl>   <dbl>                    <dbl>       <dbl>
1 mean_Revenue     200   400       250     300     200                    0.742       289. 
2 mean_Costs        45    26.7      40      30      20                    0.196        33.3
3 mean_Age           2     3         4       4       2                    0.552         3  
4 Observations       2     3         2       1       1                   NA            NA  
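
One thing worth checking in the pipeline above: if Rating is stored as a number, `lm(y ~ dt$Rating)` fits a single slope, so the F-test has 1 degree of freedom rather than comparing five group means. The sketch below uses a small made-up data frame (`toy` is hypothetical, not the asker's data) to show the difference; wrapping the rating in `factor()` gives the usual one-way ANOVA.

```r
# Hypothetical toy data, for illustration only (not the data from the question)
toy <- data.frame(
  Rating  = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5),
  Revenue = c(210, 190, 380, 420, 400, 240, 260, 310, 290, 220, 180)
)

# Numeric Rating: one slope is estimated, so the F-test has 1 degree of freedom
fit_numeric <- anova(lm(Revenue ~ Rating, data = toy))

# Factor Rating: one mean per group, so the F-test has 5 - 1 = 4 degrees of freedom
fit_factor <- anova(lm(Revenue ~ factor(Rating), data = toy))

fit_numeric$Df[1]
fit_factor$Df[1]
```

Which of the two is appropriate depends on whether a linear trend across ratings or any difference between rating groups is the question of interest.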

Update, April 22, 2022

  • The original answer did not restrict the ANOVA to ratings 1 and 5. The version below does.

# Small helper: p-value of the F-test, using only rows whose rating is in `ratings`
get_anova <- function(y, rating, ratings = c(1, 5)) {
  keep <- rating %in% ratings
  y_ <- y[keep]
  x_ <- rating[keep]
  anova(lm(y_ ~ x_))$`Pr(>F)`[[1]]
}

dt %>%
  # Per-rating means and group sizes
  group_by(Rating) %>%
  summarize(mean_Revenue = mean(Revenue),
            mean_Costs = mean(Costs),
            mean_Age = mean(Age),
            Observations = n()
  ) %>%
  # Reshape so that statistics are rows and ratings are columns
  pivot_longer(cols = !Rating) %>%
  pivot_wider(id_cols = "name", names_from = Rating, values_from = value,
              names_glue = "Rating{.name}") %>%
  # p-value for ratings 1 vs 5 per variable; NA for the Observations row
  mutate(anova = c(
    sapply(dt %>% select(Revenue:Age),
           function(y) get_anova(y, rating = dt$Rating)),
    NA)) %>%
  # Append the whole-sample means
  left_join(
    dt %>%
      pivot_longer(cols = Revenue:Age) %>%
      group_by(name = paste0("mean_", name)) %>%
      summarize(Total_means = mean(value)),
    by = "name"
  )
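
As a cross-check, with only two groups the F-test from `anova(lm(...))` gives the same p-value as a two-sample t-test with pooled variance (`var.equal = TRUE`). A minimal sketch on made-up data (`toy` and its values are hypothetical, not the asker's data):

```r
# Hypothetical toy data, for illustration only (not the data from the question)
toy <- data.frame(
  Rating  = c(1, 1, 1, 5, 5, 5, 3, 3),
  Revenue = c(200, 220, 180, 260, 240, 300, 250, 270)
)

# Keep only the two ratings being compared
two <- toy[toy$Rating %in% c(1, 5), ]

# F-test p-value, computed the same way as in get_anova()
p_anova <- anova(lm(Revenue ~ Rating, data = two))$`Pr(>F)`[1]

# Classic two-sample t-test with pooled variance
p_ttest <- t.test(Revenue ~ Rating, data = two, var.equal = TRUE)$p.value

# The two p-values agree up to floating-point tolerance
all.equal(p_anova, p_ttest)
```

Keep in mind that the output above reports only 2 observations for rating 1 and 1 for rating 5, so any test on those two groups has very little information to work with.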