R - Dataframe (group_by/aggregate/pivot_wider) Manipulation

I am currently having trouble manipulating/aggregating my data frame. My current data frame looks like this:

Farm    Year  Cow  Duck  Chicken  Sheep  Horse
Farm 1  2020   22    12      100     30     25
Farm 1  2020    0    12      120     20     20
Farm 1  2019   16     6       80     10     16
Farm 1  2019   12     0       50      0     11
Farm 1  2018    8     0        0     16      0
Farm 1  2018    0     0       10     13     12
Farm 2  2020   31    28       27     10     14
Farm 2  2020    0    13       31     20      0
Farm 2  2019    3    31        0     20     43
Farm 2  2019   20    50       43     17     42
Farm 2  2018   39    33        0     48     10
Farm 2  2018   34    20       28     12     12
Farm 3  2020   27     0       37     30     42
Farm 3  2020   50     9        0      0      0
Farm 3  2019    0    19        0     20     16
Farm 3  2019    0     2        0      0      7
Farm 3  2018    0     0        5     27      0
Farm 3  2018    0     7       43     49     42

For simplicity, here is the code for the data frame:

Farms = c(rep("Farm 1", 6), rep("Farm 2", 6), rep("Farm 3", 6))
Year = rep(c(2020,2020,2019,2019,2018,2018),3)
Cow = c(22,0,16,12,8,0,31,0,3,20,39,34,27,50,0,0,0,0)
Duck = c(12,12,6,0,0,0,28,13,31,50,33,20,0,9,19,2,0,7)
Chicken = c(100,120,80,50,0,10,27,31,0,43,0,28,37,0,0,0,5,43)
Sheep = c(30,20,10,0,16,13,10,20,20,17,48,12,30,0,20,0,27,49)
Horse = c(25,20,16,11,0,12,14,0,43,42,10,12,42,0,16,7,0,42)
Data = data.frame(Farms, Year, Cow, Duck, Chicken, Sheep, Horse)

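Does anyone know how to use group_by and/or aggregate and/or pivot_wider, or any other method, to turn the data frame into the table below? The table below groups each farm by year and takes the average of each animal for that year.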

Farm    Year   Cow  Duck  Chicken  Sheep  Horse
Farm 1  2020    11    12      110     25   22.5
Farm 1  2019    14     3       65      5   13.5
Farm 1  2018     4     0        5   14.5      6
Farm 2  2020  15.5  20.5       29     15      7
Farm 2  2019  11.5  40.5     21.5   18.5   42.5
Farm 2  2018  36.5  26.5       14     30     11
Farm 3  2020  38.5   4.5     18.5     15     21
Farm 3  2019     0  10.5        0     10   11.5
Farm 3  2018     0   3.5       24     38     21

(For example, Cow for Farm 1 in 2020 is the 2020 average: (22 + 0) / 2 = 11.)

Thank you in advance, and Happy 2022 to everyone!

aggregate(.~Year + Farms, Data, mean)
  Year  Farms  Cow Duck Chicken Sheep Horse
1 2018 Farm 1  4.0  0.0     5.0  14.5   6.0
2 2019 Farm 1 14.0  3.0    65.0   5.0  13.5
3 2020 Farm 1 11.0 12.0   110.0  25.0  22.5
4 2018 Farm 2 36.5 26.5    14.0  30.0  11.0
5 2019 Farm 2 11.5 40.5    21.5  18.5  42.5
6 2020 Farm 2 15.5 20.5    29.0  15.0   7.0
7 2018 Farm 3  0.0  3.5    24.0  38.0  21.0
8 2019 Farm 3  0.0 10.5     0.0  10.0  11.5
9 2020 Farm 3 38.5  4.5    18.5  15.0  21.0

aggregate(.~Farms + Year, Data, mean)
   Farms Year  Cow Duck Chicken Sheep Horse
1 Farm 1 2018  4.0  0.0     5.0  14.5   6.0
2 Farm 2 2018 36.5 26.5    14.0  30.0  11.0
3 Farm 3 2018  0.0  3.5    24.0  38.0  21.0
4 Farm 1 2019 14.0  3.0    65.0   5.0  13.5
5 Farm 2 2019 11.5 40.5    21.5  18.5  42.5
6 Farm 3 2019  0.0 10.5     0.0  10.0  11.5
7 Farm 1 2020 11.0 12.0   110.0  25.0  22.5
8 Farm 2 2020 15.5 20.5    29.0  15.0   7.0
9 Farm 3 2020 38.5  4.5    18.5  15.0  21.0

library(dplyr)

Data %>%
   group_by(Farms, Year) %>%
   summarise(across(everything(), mean), .groups = 'drop')
# A tibble: 9 x 7
  Farms   Year   Cow  Duck Chicken Sheep Horse
  <chr>  <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl>
1 Farm 1  2018   4     0       5    14.5   6  
2 Farm 1  2019  14     3      65     5    13.5
3 Farm 1  2020  11    12     110    25    22.5
4 Farm 2  2018  36.5  26.5    14    30    11  
5 Farm 2  2019  11.5  40.5    21.5  18.5  42.5
6 Farm 2  2020  15.5  20.5    29    15     7  
7 Farm 3  2018   0     3.5    24    38    21  
8 Farm 3  2019   0    10.5     0    10    11.5
9 Farm 3  2020  38.5   4.5    18.5  15    21  
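
None of the three outputs above is in the row order of the table in the question (farms ascending, years descending within each farm). If that order matters, here is a minimal base R sketch that sorts the aggregated result. The names `avg` and `avg_sorted` are illustrative only and do not appear in the original answers.

```r
# Sample data copied from the question.
Farms <- c(rep("Farm 1", 6), rep("Farm 2", 6), rep("Farm 3", 6))
Year <- rep(c(2020, 2020, 2019, 2019, 2018, 2018), 3)
Cow <- c(22, 0, 16, 12, 8, 0, 31, 0, 3, 20, 39, 34, 27, 50, 0, 0, 0, 0)
Duck <- c(12, 12, 6, 0, 0, 0, 28, 13, 31, 50, 33, 20, 0, 9, 19, 2, 0, 7)
Chicken <- c(100, 120, 80, 50, 0, 10, 27, 31, 0, 43, 0, 28, 37, 0, 0, 0, 5, 43)
Sheep <- c(30, 20, 10, 0, 16, 13, 10, 20, 20, 17, 48, 12, 30, 0, 20, 0, 27, 49)
Horse <- c(25, 20, 16, 11, 0, 12, 14, 0, 43, 42, 10, 12, 42, 0, 16, 7, 0, 42)
Data <- data.frame(Farms, Year, Cow, Duck, Chicken, Sheep, Horse)

# Mean of every animal column per farm and year.
avg <- aggregate(. ~ Farms + Year, Data, mean)

# Sort by farm (ascending), then by year (descending).
avg_sorted <- avg[order(avg$Farms, -avg$Year), ]
rownames(avg_sorted) <- NULL  # renumber rows 1..9 after sorting
print(avg_sorted)
```

With dplyr, appending `arrange(Farms, desc(Year))` to the pipeline should give the same ordering.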

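Onyambu's answer is good. One small thing, though, and I know you did not ask about this: you may want to consider whether the mean or the median is the summary statistic you want. At first glance the data look somewhat skewed, so the median might suit you better.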

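library(dplyr)
library(tidyr)
library(ggplot2)

# Reshape the five animal columns to long format, then plot one density per animal
Data %>%
  pivot_longer(cols = 3:7, names_to = 'names', values_to = 'values') %>%
  ggplot(aes(x = values)) + geom_density() + facet_wrap(~names)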
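
As a rough sketch of the median alternative suggested above, the same `aggregate` call can take `median` in place of `mean`. The names `avg` and `med` are illustrative only. On this particular sample every farm-year group has exactly two rows, and the median of two values equals their mean, so the two results coincide here. The choice would only start to matter with three or more rows per group.

```r
# Sample data copied from the question.
Farms <- c(rep("Farm 1", 6), rep("Farm 2", 6), rep("Farm 3", 6))
Year <- rep(c(2020, 2020, 2019, 2019, 2018, 2018), 3)
Cow <- c(22, 0, 16, 12, 8, 0, 31, 0, 3, 20, 39, 34, 27, 50, 0, 0, 0, 0)
Duck <- c(12, 12, 6, 0, 0, 0, 28, 13, 31, 50, 33, 20, 0, 9, 19, 2, 0, 7)
Chicken <- c(100, 120, 80, 50, 0, 10, 27, 31, 0, 43, 0, 28, 37, 0, 0, 0, 5, 43)
Sheep <- c(30, 20, 10, 0, 16, 13, 10, 20, 20, 17, 48, 12, 30, 0, 20, 0, 27, 49)
Horse <- c(25, 20, 16, 11, 0, 12, 14, 0, 43, 42, 10, 12, 42, 0, 16, 7, 0, 42)
Data <- data.frame(Farms, Year, Cow, Duck, Chicken, Sheep, Horse)

# Same grouping as before, once with the mean and once with the median.
avg <- aggregate(. ~ Farms + Year, Data, mean)
med <- aggregate(. ~ Farms + Year, Data, median)
print(med)

# Compare the two summaries; with two rows per group they should be identical.
print(all.equal(avg, med))
```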