How can I pivot wider and transform my data frame?
I have a data frame like this:
df <- tibble(
  School = c(1, 1, 2, 3, 3, 4),
  City = c("A", "A", "B", "C", "C", "B"),
  Grade = c("7th", "7th", "7th", "6th", "8th", "8th"),
  Number_Students = c(20, 23, 25, 21, 28, 34),
  Type_school = c("public", "public", "private", "public", "public", "private")
)
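ID | School | City | Grade | Number_Students | Type_school |
---|---|---|---|---|---|
1 | 1 | A | 7th | 20 | public |
2 | 1 | A | 7th | 23 | public |
3 | 2 | B | 7th | 25 | private |
4 | 3 | C | 6th | 21 | public |
5 | 3 | C | 8th | 28 | public |
6 | 4 | B | 8th | 34 | private |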
The unit of analysis is the classroom, but I want to turn it into a data frame where the unit of analysis is the school, with some calculations. Like this:
tibble(
  School = c(1, 2, 3, 4),
  City = c("A", "B", "C", "B"),
  N_6th = c(0, 0, 1, 0), # number of 6th-grade classrooms in each school
  N_7th = c(2, 1, 0, 0),
  N_8th = c(0, 0, 1, 1),
  Students_6th = c(0, 0, 21, 0), # number of 6th-grade students in each school (the sum over all 6th-grade classrooms in that school)
  Students_7th = c(43, 25, 0, 0),
  Students_8th = c(0, 0, 28, 34),
  Type_school = c("public", "private", "public", "private")
)
School | City | N_6th | N_7th | N_8th | Students_6th | Students_7th | Students_8th | Type_school |
---|---|---|---|---|---|---|---|---|
1 | A | 0 | 2 | 0 | 0 | 43 | 0 | public |
2 | B | 0 | 1 | 0 | 0 | 25 | 0 | private |
3 | C | 1 | 0 | 1 | 21 | 0 | 28 | public |
4 | B | 0 | 0 | 1 | 0 | 0 | 34 | private |
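I am trying pivot_wider(), but it is not enough for what I need. For each school, I need the number of classrooms in each grade and the total number of students in each grade.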
Group the data to return the count and the sum of 'Number_Students', then use pivot_wider with names_from specified as 'Grade' and values_from given as a vector of columns.
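library(dplyr)
library(tidyr)
# df1 is the classroom-level tibble shown in the question
df1 %>%
  group_by(School, City, Grade, Type_school) %>%
  # one row per school and grade: classroom count and student total
  summarise(N = n(), Students = sum(Number_Students), .groups = 'drop') %>%
  # spread both summaries across grades; grades a school lacks become 0
  pivot_wider(names_from = Grade, values_from = c(N, Students), values_fill = 0)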
-output
# A tibble: 4 × 9
School City Type_school N_7th N_6th N_8th Students_7th Students_6th Students_8th
<dbl> <chr> <chr> <int> <int> <int> <dbl> <dbl> <dbl>
1 1 A public 2 0 0 43 0 0
2 2 B private 1 0 0 25 0 0
3 3 C public 0 1 1 0 21 28
4 4 B private 0 0 1 0 0 34
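The output above lists the grade columns in order of first appearance (7th, 6th, 8th) and places Type_school before them. If the layout should match the one requested in the question, a possible follow-up is sketched below. It assumes the classroom-level data are stored in `df1`; the `names_sort = TRUE` argument, the `relocate()` step, and the name `out` are additions for illustration and are not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Classroom-level data from the question (named df1 as in the answer above)
df1 <- tibble(
  School = c(1, 1, 2, 3, 3, 4),
  City = c("A", "A", "B", "C", "C", "B"),
  Grade = c("7th", "7th", "7th", "6th", "8th", "8th"),
  Number_Students = c(20, 23, 25, 21, 28, 34),
  Type_school = c("public", "public", "private", "public", "public", "private")
)

# `out` is a hypothetical name for the reshaped result
out <- df1 %>%
  group_by(School, City, Grade, Type_school) %>%
  summarise(N = n(), Students = sum(Number_Students), .groups = "drop") %>%
  # names_sort = TRUE orders the new columns by Grade value (6th, 7th, 8th)
  pivot_wider(names_from = Grade, values_from = c(N, Students),
              values_fill = 0, names_sort = TRUE) %>%
  # move Type_school to the last position, as in the desired layout
  relocate(Type_school, .after = last_col())

out
```

Sorting inside pivot_wider() avoids hard-coding the grade names in a later select(), so the same code should keep working if more grades appear in the data.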
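Here is another approach. It can't compete with akrun's perfect approach, but it has some interesting features that show how we can get the same result: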
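library(tidyr)
library(dplyr)
# Classroom counts per grade: values_fn = length counts the rows in each cell.
# id_cols lists only the columns that identify a school.
df1 <- df %>%
  pivot_wider(id_cols = c(School, City, Type_school),
              names_from = "Grade",
              values_from = "Number_Students",
              values_fn = list(Number_Students = length),
              values_fill = 0,
              names_glue = "N_{Grade}")
# Student totals per grade: values_fn = sum adds up the classrooms in each cell.
# No values_fill is set here, so grades a school lacks appear as NA.
df %>%
  pivot_wider(id_cols = c(School, City),
              names_from = Grade,
              values_from = Number_Students,
              values_fn = list(Number_Students = sum),
              names_glue = "Students_{Grade}"
  ) %>%
  right_join(df1, by = c("School", "City"))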
School City Students_7th Students_6th Students_8th Type_school N_7th N_6th N_8th
<dbl> <chr> <dbl> <dbl> <dbl> <chr> <int> <int> <int>
1 1 A 43 NA NA public 2 0 0
2 2 B 25 NA NA private 1 0 0
3 3 C NA 21 28 public 0 1 1
4 4 B NA NA 34 private 0 0 1
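The Students_ columns above contain NA where the first answer has 0, because the second pivot_wider() call does not set values_fill. A minimal sketch of one way to close that gap follows, assuming `df` is the tibble from the question. The names `n_wide`, `students_wide`, and `res` are hypothetical, and passing a single function to values_fn instead of a named list is a simplification, not the answer's original code.

```r
library(dplyr)
library(tidyr)

# Classroom-level data from the question
df <- tibble(
  School = c(1, 1, 2, 3, 3, 4),
  City = c("A", "A", "B", "C", "C", "B"),
  Grade = c("7th", "7th", "7th", "6th", "8th", "8th"),
  Number_Students = c(20, 23, 25, 21, 28, 34),
  Type_school = c("public", "public", "private", "public", "public", "private")
)

# Number of classrooms per school and grade
n_wide <- df %>%
  pivot_wider(id_cols = c(School, City, Type_school),
              names_from = Grade, values_from = Number_Students,
              values_fn = length, values_fill = 0,
              names_glue = "N_{Grade}")

# Number of students per school and grade; values_fill = 0 replaces the NAs
students_wide <- df %>%
  pivot_wider(id_cols = c(School, City),
              names_from = Grade, values_from = Number_Students,
              values_fn = sum, values_fill = 0,
              names_glue = "Students_{Grade}")

# Combine both wide tables, one row per school
res <- right_join(students_wide, n_wide, by = c("School", "City"))
res
```

With values_fill = 0 in both calls, the joined table should hold the same values as the group_by() and summarise() approach, with the columns in a different order.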