r summary by group and within groups

Question

Suppose this is my dataset, sorted by Subjects and Test:

ID    Subjects   Test   Score     Results
1     English    1      78        Pass
2     English    1      98        Pass    

2     English    2      81        Pass
3     English    2      81        Pass

2     English    3      15        Fail 
3     English    3      74        Pass

4     Physics    1      34        Fail
2     Physics    1      79        Pass

4     Physics    2      74        Fail
3     Physics    2      81        Pass   
3     Physics    2      81        Pass

4     Physics    3      48        Fail    
2     Physics    3      15        Fail
3     Physics    3      74        Pass

I would like to create a summary like this:

           Test1                  Test2                  Test3
Subject    FailAverage   %Fail    FailAverage   %Fail    FailAverage   %Fail
English    0             0        0             0        15            50
Physics    34            50       74            33       31.5          66

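Summary grouped by test attempt (1, 2, 3)
Summary for each subject
For each test attempt: the percentage of students who failed and the average score of the students who failed. For example, in test attempt 3 for Physics, two of the three students failed, so the failure percentage is (2/3)*100 and the average score of the students who failed is (48+15)/2.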
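The arithmetic in that example can be checked in base R. The two vectors below are simply the three Physics, test 3 rows copied from the table; taking mean() of a logical vector gives the proportion of TRUE values, which is the same trick the answer below relies on.

```r
# Physics, test attempt 3: the three rows copied from the table above
score  <- c(48, 15, 74)
result <- c("Fail", "Fail", "Pass")

# mean() of a logical vector is the share of TRUE values: (2/3) * 100, about 66.7
fail_pct <- 100 * mean(result == "Fail")

# Average over the failing rows only: (48 + 15) / 2 = 31.5
fail_avg <- mean(score[result == "Fail"])

cat("%Fail:", round(fail_pct, 1), " FailAverage:", fail_avg, "\n")
```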

Any help is much appreciated, thanks.

Answer 1

I tried to follow tidyverse principles. To get the exact format you may need a table package (e.g. gt), but the code below gets you close.

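I summarized the data into a new data frame, then used pivot_wider() to turn the test rows into columns, and finally did some minor tidying (reordering and renaming the columns).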

library(tidyverse)  # provides tribble(), dplyr and tidyr

# Recreate the table from the question
df <- tribble(
  ~ID, ~Subjects, ~Test, ~Score, ~Results,
  1,   "English", 1,     78,     "Pass",
  2,   "English", 1,     98,     "Pass",
  2,   "English", 2,     81,     "Pass",
  3,   "English", 2,     81,     "Pass",
  2,   "English", 3,     15,     "Fail",
  3,   "English", 3,     74,     "Pass",
  4,   "Physics", 1,     34,     "Fail",
  2,   "Physics", 1,     79,     "Pass",
  4,   "Physics", 2,     74,     "Fail",
  3,   "Physics", 2,     81,     "Pass",
  3,   "Physics", 2,     81,     "Pass",
  4,   "Physics", 3,     48,     "Fail",
  2,   "Physics", 3,     15,     "Fail",
  3,   "Physics", 3,     74,     "Pass"
)

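# Summarize each Subjects x Test group:
#   FailAverage = mean score of the students who failed (0 if nobody failed,
#                 instead of the NaN that mean() returns for an empty vector)
#   Failper     = percentage of students who failed
df_fail <- df %>%
  group_by(Subjects, Test) %>%
  summarize(FailAverage = if (any(Results == "Fail")) mean(Score[Results == "Fail"]) else 0,
            Failper = 100 * mean(Results == "Fail"),
            .groups = "drop")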


# Pivot wider, keep each test's columns together, then rename
# (e.g. FailAverage_1 -> FailAverage_test1)
df_fail %>%
  pivot_wider(names_from = Test,
              values_from = c(FailAverage, Failper)) %>%
  relocate(Subjects, ends_with("_1"), ends_with("_2"), ends_with("_3")) %>%
  rename_with(.cols = -Subjects, .fn = ~ gsub("_", "_test", .x))
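If you have tidyr 1.2.0 or later (an assumption, since names_vary was added in that release), pivot_wider() can produce the column order and names in one step, which replaces the relocate() and rename_with() calls. This sketch also reports 0 rather than NaN when nobody failed and gives the failure rate as a percentage, to match the layout in the question. The ID column is left out because the summary does not use it, and the column names Test1_FailAverage, Test1_FailPct and so on are illustrative choices, not something from the question.

```r
library(tibble)
library(dplyr)
library(tidyr)

# The question's data, entered column-wise (ID omitted: not needed here)
df <- tibble(
  Subjects = rep(c("English", "Physics"), times = c(6, 8)),
  Test     = c(1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 2, 3, 3, 3),
  Score    = c(78, 98, 81, 81, 15, 74, 34, 79, 74, 81, 81, 48, 15, 74),
  Results  = c("Pass", "Pass", "Pass", "Pass", "Fail", "Pass",
               "Fail", "Pass", "Fail", "Pass", "Pass", "Fail", "Fail", "Pass")
)

summary_wide <- df %>%
  group_by(Subjects, Test) %>%
  summarize(
    # Average score of the students who failed; 0 when nobody failed
    FailAverage = if (any(Results == "Fail")) mean(Score[Results == "Fail"]) else 0,
    # Share of students who failed, as a percentage
    FailPct = 100 * mean(Results == "Fail"),
    .groups = "drop"
  ) %>%
  pivot_wider(
    names_from  = Test,
    values_from = c(FailAverage, FailPct),
    names_glue  = "Test{Test}_{.value}",  # e.g. Test1_FailAverage
    names_vary  = "slowest"               # keep each test's two columns together
  )

summary_wide
```

The result has one row per subject; for Physics, Test3_FailAverage is 31.5 and Test3_FailPct is about 66.7. Rounding and the two-level header from the question are still left to a formatting step, for example with a table package such as gt.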
