R: summary by group and within groups
Suppose this is my dataset, sorted by Subjects and Test:
ID Subjects Test Score Results
1 English 1 78 Pass
2 English 1 98 Pass
2 English 2 81 Pass
3 English 2 81 Pass
2 English 3 15 Fail
3 English 3 74 Pass
4 Physics 1 34 Fail
2 Physics 1 79 Pass
4 Physics 2 74 Fail
3 Physics 2 81 Pass
3 Physics 2 81 Pass
4 Physics 3 48 Fail
2 Physics 3 15 Fail
3 Physics 3 74 Pass
I would like to create a summary like this:
         Test1               Test2               Test3
Subject  FailAverage  %Fail  FailAverage  %Fail  FailAverage  %Fail
English  0            0      0            0      15           50
Physics  34           50     74           33     31.5         66
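- A summary grouped by test attempt (1, 2, 3)
- A summary for each subject
- For each test attempt, the percentage of students who failed and the average score of those who failed. For example, for test attempt 3 in Physics, two of the three students failed, so the fail percentage is (2/3)*100 and the average score of those who failed is (48+15)/2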
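To make the last bullet concrete, here is a minimal R sketch of that calculation for Physics, test attempt 3 only. The vector names `scores` and `results` are just illustrative and hold the three rows from the table above.

```r
# The three Physics rows for test attempt 3, copied from the dataset
scores  <- c(48, 15, 74)
results <- c("Fail", "Fail", "Pass")

# Share of students who failed, as a percentage: (2/3) * 100
fail_pct <- 100 * mean(results == "Fail")

# Average score among the students who failed: (48 + 15) / 2
fail_avg <- mean(scores[results == "Fail"])

print(round(fail_pct, 2))  # 66.67
print(fail_avg)            # 31.5
```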
Any help is much appreciated. Thanks.
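I tried to stick to tidyverse principles. To get the exact formatting you may need a table package (for example, gt), but the code below gets you close.
I summarized the data into a new data frame, then used pivot_wider() to turn the rows into columns, and finally did some minor tidying.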
# Load the tidyverse (tribble, dplyr verbs, pivot_wider)
library(tidyverse)

# Recreate the table
df <- tribble(
  ~ID, ~Subjects, ~Test, ~Score, ~Results,
  1, "English", 1, 78, "Pass",
  2, "English", 1, 98, "Pass",
  2, "English", 2, 81, "Pass",
  3, "English", 2, 81, "Pass",
  2, "English", 3, 15, "Fail",
  3, "English", 3, 74, "Pass",
  4, "Physics", 1, 34, "Fail",
  2, "Physics", 1, 79, "Pass",
  4, "Physics", 2, 74, "Fail",
  3, "Physics", 2, 81, "Pass",
  3, "Physics", 2, 81, "Pass",
  4, "Physics", 3, 48, "Fail",
  2, "Physics", 3, 15, "Fail",
  3, "Physics", 3, 74, "Pass")

# Summarize the grouped data.
# Note: FailAverage is NaN for groups with no failures,
# and Failper is a proportion between 0 and 1, not a percentage.
df_fail <- df %>%
  group_by(Subjects, Test) %>%
  summarize(FailAverage = mean(Score[Results == "Fail"]),
            Failper = mean(Results == "Fail", na.rm = TRUE),
            .groups = "drop")

# Pivot the values wider, arrange the columns in test order, then rename them
df_fail %>%
  pivot_wider(names_from = c(Test),
              values_from = c(FailAverage, Failper)) %>%
  relocate(Subjects, contains("1"), contains("2"), contains("3")) %>%
  rename_with(.cols = c(-Subjects), .fn = ~gsub("_", "_test", .x))
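If you want the numbers to match the requested summary more closely (0 instead of NaN when nobody failed, and a percentage instead of a proportion), one possible variant is sketched below. It is not the code above, just an adaptation of it: the names `FailPct`, `summary_long`, `summary_wide` and the `Test1_...` column pattern are my own choices, and `names_vary = "slowest"` assumes tidyr 1.2.0 or later.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Same sample data as in the question
df <- tribble(
  ~ID, ~Subjects, ~Test, ~Score, ~Results,
  1, "English", 1, 78, "Pass",
  2, "English", 1, 98, "Pass",
  2, "English", 2, 81, "Pass",
  3, "English", 2, 81, "Pass",
  2, "English", 3, 15, "Fail",
  3, "English", 3, 74, "Pass",
  4, "Physics", 1, 34, "Fail",
  2, "Physics", 1, 79, "Pass",
  4, "Physics", 2, 74, "Fail",
  3, "Physics", 2, 81, "Pass",
  3, "Physics", 2, 81, "Pass",
  4, "Physics", 3, 48, "Fail",
  2, "Physics", 3, 15, "Fail",
  3, "Physics", 3, 74, "Pass"
)

# One row per Subjects/Test combination
summary_long <- df %>%
  group_by(Subjects, Test) %>%
  summarize(
    # mean() of an empty vector is NaN, so fall back to 0 when nobody failed
    FailAverage = if (any(Results == "Fail")) mean(Score[Results == "Fail"]) else 0,
    # Share of failing rows in the group, expressed as a percentage
    FailPct = 100 * mean(Results == "Fail"),
    .groups = "drop"
  )

# One row per subject, with both measures side by side for each test
summary_wide <- summary_long %>%
  pivot_wider(
    names_from  = Test,
    values_from = c(FailAverage, FailPct),
    names_glue  = "Test{Test}_{.value}",  # e.g. Test1_FailAverage
    names_vary  = "slowest"               # keep each test's columns together
  )

print(summary_wide)
```

For the sample data, Physics on test 3 comes out as 31.5 for the average and roughly 66.67 for the percentage, which matches the worked example in the question. Using `names_glue` and `names_vary` also replaces the separate `relocate()` and `rename_with()` steps.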