Compute average values in panel data in R

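I'm new to R. I have unbalanced panel data. BvD_ID_Number is the identification number of each company, and TotalAsset is the value of total assets on the balance sheet in each period (year). Here is an overview: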

p_TotalAsset1 <- structure(list(BvD_ID_Number = c("FR810911719", "FR810911719", 
"GBFC024701", "GBFC024701", "GBFC024701", "GBFC32536", "GBFC32699", 
"GBFC32699", "GBFC032748", "GBFC032748"), Year = c(2017, 2016, 
2018, 2017, 2016, 2017, 2016, 2015, 2017, 2016), TotalAsset = c(2220, 
1174, 124726, 126010, 121837, 72912, 111298, 74457, 6579, 6056
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))

For each BvD_ID_Number, I want to compute the average of TotalAsset over the entire available time window.

I used this code, but it does not work as expected:

library(dplyr)
df <- 
p_TotalAsset1 %>% 
  group_by(p_TotalAsset1$BvD_ID_Number) %>% 
   mutate(TotalAsset_Avegage = round(mean(p_TotalAsset1$TotalAsset)))

Thanks for your help.

You can use either summarize or mutate:

Using summarize collapses your data completely, returning only the grouping variable (each company's ID number) and the average.

library(dplyr)

df %>% 
  group_by(BvD_ID_Number) %>% 
  summarize(TotalAsset_Average = round(mean(TotalAsset),0))

This gives us:

  BvD_ID_Number TotalAsset_Average
  <chr>                      <dbl>
1 FR810911719                1697 
2 GBFC024701               124191 
3 GBFC032748                 6318.
4 GBFC32536                 72912 
5 GBFC32699                 92878.
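
For comparison, the same per-company summary can be sketched in base R without dplyr, using stats::aggregate() with a formula. This is a swapped-in alternative rather than part of the dplyr approach. aggregate() sorts the groups by ID and names the result column after the input column, so the sketch renames it to match the output above.

```r
# Sample data from the question (same values as the dput() output)
df <- data.frame(
  BvD_ID_Number = c("FR810911719", "FR810911719", "GBFC024701", "GBFC024701",
                    "GBFC024701", "GBFC32536", "GBFC32699", "GBFC32699",
                    "GBFC032748", "GBFC032748"),
  Year = c(2017, 2016, 2018, 2017, 2016, 2017, 2016, 2015, 2017, 2016),
  TotalAsset = c(2220, 1174, 124726, 126010, 121837, 72912, 111298, 74457,
                 6579, 6056)
)

# One row per company, like summarize(): average TotalAsset by BvD_ID_Number
avg <- aggregate(TotalAsset ~ BvD_ID_Number, data = df, FUN = mean)

# Round to whole units and rename the column to match the dplyr output
avg$TotalAsset <- round(avg$TotalAsset)
names(avg)[names(avg) == "TotalAsset"] <- "TotalAsset_Average"

print(avg)
```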

Using mutate keeps every row and adds the per-company average as a new column:

df %>% 
  group_by(BvD_ID_Number) %>% 
  mutate(TotalAsset_Average = round(mean(TotalAsset),0))

This gives us:

# A tibble: 10 x 4
# Groups:   BvD_ID_Number [5]
   BvD_ID_Number  Year TotalAsset TotalAsset_Average
   <chr>         <dbl>      <dbl>              <dbl>
 1 FR810911719    2017       2220               1697
 2 FR810911719    2016       1174               1697
 3 GBFC024701     2018     124726             124191
 4 GBFC024701     2017     126010             124191
 5 GBFC024701     2016     121837             124191
 6 GBFC32536      2017      72912              72912
 7 GBFC32699      2016     111298              92878
 8 GBFC32699      2015      74457              92878
 9 GBFC032748     2017       6579               6318
10 GBFC032748     2016       6056               6318
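
A base R counterpart of the mutate version is ave(), which returns a vector the same length as its input, with each element replaced by the mean of its group (FUN = mean is the default). A minimal sketch, assuming the data is a plain data.frame named df:

```r
# Sample data from the question (same values as the dput() output)
df <- data.frame(
  BvD_ID_Number = c("FR810911719", "FR810911719", "GBFC024701", "GBFC024701",
                    "GBFC024701", "GBFC32536", "GBFC32699", "GBFC32699",
                    "GBFC032748", "GBFC032748"),
  Year = c(2017, 2016, 2018, 2017, 2016, 2017, 2016, 2015, 2017, 2016),
  TotalAsset = c(2220, 1174, 124726, 126010, 121837, 72912, 111298, 74457,
                 6579, 6056)
)

# Like mutate(): every row keeps its data and gets its company's average
df$TotalAsset_Average <- round(ave(df$TotalAsset, df$BvD_ID_Number))

print(df)
```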

Data:

df <- structure(list(BvD_ID_Number = c("FR810911719", "FR810911719", 
"GBFC024701", "GBFC024701", "GBFC024701", "GBFC32536", "GBFC32699", 
"GBFC32699", "GBFC032748", "GBFC032748"), Year = c(2017, 2016, 
2018, 2017, 2016, 2017, 2016, 2015, 2017, 2016), TotalAsset = c(2220, 
1174, 124726, 126010, 121837, 72912, 111298, 74457, 6579, 6056
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))
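
As for why the code in the question did not work: inside mutate(), p_TotalAsset1$TotalAsset refers to the entire TotalAsset column of the original data frame, not to the rows of the current group, so mean() returns the overall average and it is recycled to every row. Referring to the column by its bare name lets dplyr evaluate it per group. The sketch below reproduces both versions on the sample data; the names wrong and right are only for illustration.

```r
library(dplyr)

# Sample data from the question, under the name the question uses
p_TotalAsset1 <- tibble(
  BvD_ID_Number = c("FR810911719", "FR810911719", "GBFC024701", "GBFC024701",
                    "GBFC024701", "GBFC32536", "GBFC32699", "GBFC32699",
                    "GBFC032748", "GBFC032748"),
  Year = c(2017, 2016, 2018, 2017, 2016, 2017, 2016, 2015, 2017, 2016),
  TotalAsset = c(2220, 1174, 124726, 126010, 121837, 72912, 111298, 74457,
                 6579, 6056)
)

# Original attempt: p_TotalAsset1$TotalAsset is the whole column,
# so every group receives the mean of all 10 rows
wrong <- p_TotalAsset1 %>%
  group_by(p_TotalAsset1$BvD_ID_Number) %>%
  mutate(TotalAsset_Average = round(mean(p_TotalAsset1$TotalAsset)))

# Fixed: bare column names are evaluated within each group
right <- p_TotalAsset1 %>%
  group_by(BvD_ID_Number) %>%
  mutate(TotalAsset_Average = round(mean(TotalAsset)))

print(unique(wrong$TotalAsset_Average))  # one value, 64727, for every row
print(unique(right$TotalAsset_Average))  # one value per company
```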