R - Dataframe (group_by/aggregate/pivot_wider) Manipulation
I am currently having trouble manipulating/aggregating my data frame. My current data frame is as follows:
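Farm | Year | Cow | Duck | Chicken | Sheep | Horse |
---|---|---|---|---|---|---|
Farm 1 | 2020 | 22 | 12 | 100 | 30 | 25 |
Farm 1 | 2020 | 0 | 12 | 120 | 20 | 20 |
Farm 1 | 2019 | 16 | 6 | 80 | 10 | 16 |
Farm 1 | 2019 | 12 | 0 | 50 | 0 | 11 |
Farm 1 | 2018 | 8 | 0 | 0 | 16 | 0 |
Farm 1 | 2018 | 0 | 0 | 10 | 13 | 12 |
Farm 2 | 2020 | 31 | 28 | 27 | 10 | 14 |
Farm 2 | 2020 | 0 | 13 | 31 | 20 | 0 |
Farm 2 | 2019 | 3 | 31 | 0 | 20 | 43 |
Farm 2 | 2019 | 20 | 50 | 43 | 17 | 42 |
Farm 2 | 2018 | 39 | 33 | 0 | 48 | 10 |
Farm 2 | 2018 | 34 | 20 | 28 | 12 | 12 |
Farm 3 | 2020 | 27 | 0 | 37 | 30 | 42 |
Farm 3 | 2020 | 50 | 9 | 0 | 0 | 0 |
Farm 3 | 2019 | 0 | 19 | 0 | 20 | 16 |
Farm 3 | 2019 | 0 | 2 | 0 | 0 | 7 |
Farm 3 | 2018 | 0 | 0 | 5 | 27 | 0 |
Farm 3 | 2018 | 0 | 7 | 43 | 49 | 42 |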
For convenience, here is the code that builds the data frame:
```r
Farms = c(rep("Farm 1", 6), rep("Farm 2", 6), rep("Farm 3", 6))
Year = rep(c(2020,2020,2019,2019,2018,2018),3)
Cow = c(22,0,16,12,8,0,31,0,3,20,39,34,27,50,0,0,0,0)
Duck = c(12,12,6,0,0,0,28,13,31,50,33,20,0,9,19,2,0,7)
Chicken = c(100,120,80,50,0,10,27,31,0,43,0,28,37,0,0,0,5,43)
Sheep = c(30,20,10,0,16,13,10,20,20,17,48,12,30,0,20,0,27,49)
Horse = c(25,20,16,11,0,12,14,0,43,42,10,12,42,0,16,7,0,42)
Data = data.frame(Farms, Year, Cow, Duck, Chicken, Sheep, Horse)
```
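Does anyone know how to turn this data frame into the table below using group_by and/or aggregate and/or pivot_wider, or any other method? The table below has one row per farm and year, holding the mean count of each animal for that year.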
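Farm | Year | Cow | Duck | Chicken | Sheep | Horse |
---|---|---|---|---|---|---|
Farm 1 | 2020 | 11 | 12 | 110 | 25 | 22.5 |
Farm 1 | 2019 | 14 | 3 | 65 | 5 | 13.5 |
Farm 1 | 2018 | 4 | 0 | 5 | 14.5 | 6 |
Farm 2 | 2020 | 15.5 | 20.5 | 29 | 15 | 7 |
Farm 2 | 2019 | 11.5 | 40.5 | 21.5 | 18.5 | 42.5 |
Farm 2 | 2018 | 36.5 | 26.5 | 14 | 30 | 11 |
Farm 3 | 2020 | 38.5 | 4.5 | 18.5 | 15 | 21 |
Farm 3 | 2019 | 0 | 10.5 | 0 | 10 | 11.5 |
Farm 3 | 2018 | 0 | 3.5 | 24 | 38 | 21 |

For example, the Cow value for Farm 1 in 2020 is the mean of that year's two rows: (22 + 0) / 2 = 11.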
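To check the worked example, the first target row can be computed by hand with base R: select the two Farm 1 / 2020 rows and take column means. This is only a sanity-check sketch; the data frame is rebuilt inline with the same values as the question so the snippet runs on its own, and `farm1_2020` / `first_row` are illustrative names.

```r
# Same values as the question's data frame
Data <- data.frame(
  Farms   = rep(c("Farm 1", "Farm 2", "Farm 3"), each = 6),
  Year    = rep(c(2020, 2020, 2019, 2019, 2018, 2018), 3),
  Cow     = c(22, 0, 16, 12, 8, 0, 31, 0, 3, 20, 39, 34, 27, 50, 0, 0, 0, 0),
  Duck    = c(12, 12, 6, 0, 0, 0, 28, 13, 31, 50, 33, 20, 0, 9, 19, 2, 0, 7),
  Chicken = c(100, 120, 80, 50, 0, 10, 27, 31, 0, 43, 0, 28, 37, 0, 0, 0, 5, 43),
  Sheep   = c(30, 20, 10, 0, 16, 13, 10, 20, 20, 17, 48, 12, 30, 0, 20, 0, 27, 49),
  Horse   = c(25, 20, 16, 11, 0, 12, 14, 0, 43, 42, 10, 12, 42, 0, 16, 7, 0, 42)
)

# Logical index of the rows for Farm 1 in 2020 (there are two)
farm1_2020 <- Data$Farms == "Farm 1" & Data$Year == 2020

# Column-wise means of the animal counts for those rows
first_row <- colMeans(Data[farm1_2020, c("Cow", "Duck", "Chicken", "Sheep", "Horse")])
first_row
```

The result should be 11, 12, 110, 25 and 22.5, matching the first row of the desired table.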
Thanks in advance, and happy 2022 to everyone!
Base R's `aggregate` does this directly; the `.` on the left-hand side of the formula stands for every column not used for grouping:

```r
aggregate(. ~ Year + Farms, data = Data, FUN = mean)
```

```
  Year  Farms  Cow Duck Chicken Sheep Horse
1 2018 Farm 1  4.0  0.0     5.0  14.5   6.0
2 2019 Farm 1 14.0  3.0    65.0   5.0  13.5
3 2020 Farm 1 11.0 12.0   110.0  25.0  22.5
4 2018 Farm 2 36.5 26.5    14.0  30.0  11.0
5 2019 Farm 2 11.5 40.5    21.5  18.5  42.5
6 2020 Farm 2 15.5 20.5    29.0  15.0   7.0
7 2018 Farm 3  0.0  3.5    24.0  38.0  21.0
8 2019 Farm 3  0.0 10.5     0.0  10.0  11.5
9 2020 Farm 3 38.5  4.5    18.5  15.0  21.0
```
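If only some of the animal columns should be averaged, the left-hand side of the formula can name them explicitly with `cbind()` instead of using `.`. A minimal sketch, with Cow and Duck chosen purely as an illustrative subset (the data frame is rebuilt inline so the snippet is self-contained):

```r
Data <- data.frame(
  Farms   = rep(c("Farm 1", "Farm 2", "Farm 3"), each = 6),
  Year    = rep(c(2020, 2020, 2019, 2019, 2018, 2018), 3),
  Cow     = c(22, 0, 16, 12, 8, 0, 31, 0, 3, 20, 39, 34, 27, 50, 0, 0, 0, 0),
  Duck    = c(12, 12, 6, 0, 0, 0, 28, 13, 31, 50, 33, 20, 0, 9, 19, 2, 0, 7),
  Chicken = c(100, 120, 80, 50, 0, 10, 27, 31, 0, 43, 0, 28, 37, 0, 0, 0, 5, 43),
  Sheep   = c(30, 20, 10, 0, 16, 13, 10, 20, 20, 17, 48, 12, 30, 0, 20, 0, 27, 49),
  Horse   = c(25, 20, 16, 11, 0, 12, 14, 0, 43, 42, 10, 12, 42, 0, 16, 7, 0, 42)
)

# Average only the columns listed inside cbind(), grouped by farm and year
res_subset <- aggregate(cbind(Cow, Duck) ~ Farms + Year, data = Data, FUN = mean)
res_subset
```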
Swapping the grouping variables changes which one varies fastest in the output:

```r
aggregate(. ~ Farms + Year, data = Data, FUN = mean)
```

```
   Farms Year  Cow Duck Chicken Sheep Horse
1 Farm 1 2018  4.0  0.0     5.0  14.5   6.0
2 Farm 2 2018 36.5 26.5    14.0  30.0  11.0
3 Farm 3 2018  0.0  3.5    24.0  38.0  21.0
4 Farm 1 2019 14.0  3.0    65.0   5.0  13.5
5 Farm 2 2019 11.5 40.5    21.5  18.5  42.5
6 Farm 3 2019  0.0 10.5     0.0  10.0  11.5
7 Farm 1 2020 11.0 12.0   110.0  25.0  22.5
8 Farm 2 2020 15.5 20.5    29.0  15.0   7.0
9 Farm 3 2020 38.5  4.5    18.5  15.0  21.0
```
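Neither ordering matches the desired table exactly, which lists each farm with the most recent year first. A minimal base R sketch that re-sorts the `aggregate` result by farm and then by descending year (the data frame is rebuilt inline; `res` is an illustrative name):

```r
Data <- data.frame(
  Farms   = rep(c("Farm 1", "Farm 2", "Farm 3"), each = 6),
  Year    = rep(c(2020, 2020, 2019, 2019, 2018, 2018), 3),
  Cow     = c(22, 0, 16, 12, 8, 0, 31, 0, 3, 20, 39, 34, 27, 50, 0, 0, 0, 0),
  Duck    = c(12, 12, 6, 0, 0, 0, 28, 13, 31, 50, 33, 20, 0, 9, 19, 2, 0, 7),
  Chicken = c(100, 120, 80, 50, 0, 10, 27, 31, 0, 43, 0, 28, 37, 0, 0, 0, 5, 43),
  Sheep   = c(30, 20, 10, 0, 16, 13, 10, 20, 20, 17, 48, 12, 30, 0, 20, 0, 27, 49),
  Horse   = c(25, 20, 16, 11, 0, 12, 14, 0, 43, 42, 10, 12, 42, 0, 16, 7, 0, 42)
)

res <- aggregate(. ~ Farms + Year, data = Data, FUN = mean)

# Sort by farm, then by year in descending order (negating the numeric Year)
res <- res[order(res$Farms, -res$Year), ]
rownames(res) <- NULL  # renumber rows 1..9 after sorting
res
```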
Or with dplyr:

```r
library(dplyr)

Data %>%
  group_by(Farms, Year) %>%
  summarise(across(everything(), mean), .groups = 'drop')
```

```
# A tibble: 9 x 7
  Farms   Year   Cow  Duck Chicken Sheep Horse
  <chr>  <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl>
1 Farm 1  2018   4     0       5    14.5   6  
2 Farm 1  2019  14     3      65     5    13.5
3 Farm 1  2020  11    12     110    25    22.5
4 Farm 2  2018  36.5  26.5    14    30    11  
5 Farm 2  2019  11.5  40.5    21.5  18.5  42.5
6 Farm 2  2020  15.5  20.5    29    15     7  
7 Farm 3  2018   0     3.5    24    38    21  
8 Farm 3  2019   0    10.5     0    10    11.5
9 Farm 3  2020  38.5   4.5    18.5  15    21  
```
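Since the question mentions `pivot_wider`: it isn't needed here, but for illustration the same table can be built by reshaping to long format, averaging per animal, and widening back. A sketch assuming dplyr ≥ 1.0 and tidyr ≥ 1.0; the column names `Animal`, `Count` and `Mean` are illustrative choices, not part of the original answers.

```r
library(dplyr)
library(tidyr)

Data <- data.frame(
  Farms   = rep(c("Farm 1", "Farm 2", "Farm 3"), each = 6),
  Year    = rep(c(2020, 2020, 2019, 2019, 2018, 2018), 3),
  Cow     = c(22, 0, 16, 12, 8, 0, 31, 0, 3, 20, 39, 34, 27, 50, 0, 0, 0, 0),
  Duck    = c(12, 12, 6, 0, 0, 0, 28, 13, 31, 50, 33, 20, 0, 9, 19, 2, 0, 7),
  Chicken = c(100, 120, 80, 50, 0, 10, 27, 31, 0, 43, 0, 28, 37, 0, 0, 0, 5, 43),
  Sheep   = c(30, 20, 10, 0, 16, 13, 10, 20, 20, 17, 48, 12, 30, 0, 20, 0, 27, 49),
  Horse   = c(25, 20, 16, 11, 0, 12, 14, 0, 43, 42, 10, 12, 42, 0, 16, 7, 0, 42)
)

wide <- Data %>%
  # Long format: one row per farm, year, animal and original observation
  pivot_longer(Cow:Horse, names_to = "Animal", values_to = "Count") %>%
  group_by(Farms, Year, Animal) %>%
  summarise(Mean = mean(Count), .groups = "drop") %>%
  # Back to one column per animal
  pivot_wider(names_from = Animal, values_from = Mean) %>%
  # Restore the original column order and the requested row order
  select(Farms, Year, Cow, Duck, Chicken, Sheep, Horse) %>%
  arrange(Farms, desc(Year))
wide
```

The `select()` step is there because `summarise()` sorts the `Animal` groups alphabetically, so `pivot_wider()` would otherwise create the animal columns in alphabetical order.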
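Onyambu's answer is great. One small point, which I know you didn't ask about: you may want to consider whether the mean or the median is the statistic you actually want. At first glance the data look somewhat skewed, so the median may suit you better.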
```r
library(tidyr)    # pivot_longer() and the %>% pipe
library(ggplot2)

Data %>%
  pivot_longer(cols = Cow:Horse, names_to = 'names', values_to = 'values') %>%
  ggplot(aes(x = values)) + geom_density() + facet_wrap(~names)
```
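Trying the median only requires swapping `mean` for `median` in the dplyr pipeline. In this particular sample, though, every Farm/Year group has exactly two rows, and the median of two numbers equals their mean, so both summaries come out identical here. They can diverge once groups contain three or more rows. A sketch that checks this (data rebuilt inline; `by_median` and `by_mean` are illustrative names):

```r
library(dplyr)

Data <- data.frame(
  Farms   = rep(c("Farm 1", "Farm 2", "Farm 3"), each = 6),
  Year    = rep(c(2020, 2020, 2019, 2019, 2018, 2018), 3),
  Cow     = c(22, 0, 16, 12, 8, 0, 31, 0, 3, 20, 39, 34, 27, 50, 0, 0, 0, 0),
  Duck    = c(12, 12, 6, 0, 0, 0, 28, 13, 31, 50, 33, 20, 0, 9, 19, 2, 0, 7),
  Chicken = c(100, 120, 80, 50, 0, 10, 27, 31, 0, 43, 0, 28, 37, 0, 0, 0, 5, 43),
  Sheep   = c(30, 20, 10, 0, 16, 13, 10, 20, 20, 17, 48, 12, 30, 0, 20, 0, 27, 49),
  Horse   = c(25, 20, 16, 11, 0, 12, 14, 0, 43, 42, 10, 12, 42, 0, 16, 7, 0, 42)
)

by_median <- Data %>%
  group_by(Farms, Year) %>%
  summarise(across(Cow:Horse, median), .groups = "drop")

by_mean <- Data %>%
  group_by(Farms, Year) %>%
  summarise(across(Cow:Horse, mean), .groups = "drop")

# Each group has two observations, so median and mean coincide
isTRUE(all.equal(as.data.frame(by_median), as.data.frame(by_mean)))
```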