Compute average values in panel data in R
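I'm very new to R.
I have unbalanced panel data. BvD_ID_Number is the identification number of each company, and TotalAsset is the value of total assets on the balance sheet in each period (year).
Here is an overview: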
structure(list(BvD_ID_Number = c("FR810911719", "FR810911719",
"GBFC024701", "GBFC024701", "GBFC024701", "GBFC32536", "GBFC32699",
"GBFC32699", "GBFC032748", "GBFC032748"), Year = c(2017, 2016,
2018, 2017, 2016, 2017, 2016, 2015, 2017, 2016), TotalAsset = c(2220,
1174, 124726, 126010, 121837, 72912, 111298, 74457, 6579, 6056
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))
For each BvD_ID_Number, I want to compute the average of TotalAsset over the entire available time window.
I used this code, but it doesn't work as expected:
library(dplyr)
df <-
p_TotalAsset1 %>%
group_by(p_TotalAsset1$BvD_ID_Number) %>%
mutate(TotalAsset_Avegage = round(mean(p_TotalAsset1$TotalAsset)))
Thanks for your help.
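The following is a minimal sketch of why the attempt above misbehaves. It assumes the sample data is stored as p_TotalAsset1, the name used in the question. Inside a dplyr verb, `p_TotalAsset1$TotalAsset` refers to the whole column of the original data frame rather than to the current group. As a result, `mean()` ignores the grouping and every row receives the overall mean. A bare column name, by contrast, is evaluated within each group.

```r
library(dplyr)

# Sample data from the question, stored under the name used in the attempt
p_TotalAsset1 <- tibble(
  BvD_ID_Number = c("FR810911719", "FR810911719", "GBFC024701", "GBFC024701",
                    "GBFC024701", "GBFC32536", "GBFC32699", "GBFC32699",
                    "GBFC032748", "GBFC032748"),
  Year = c(2017, 2016, 2018, 2017, 2016, 2017, 2016, 2015, 2017, 2016),
  TotalAsset = c(2220, 1174, 124726, 126010, 121837, 72912, 111298, 74457,
                 6579, 6056)
)

# `p_TotalAsset1$TotalAsset` is the full column of the original data frame,
# so every row gets the overall mean, whatever group it belongs to
wrong <- p_TotalAsset1 %>%
  group_by(BvD_ID_Number) %>%
  mutate(TotalAsset_Average = round(mean(p_TotalAsset1$TotalAsset)))

# A bare column name is evaluated within each group instead
right <- p_TotalAsset1 %>%
  group_by(BvD_ID_Number) %>%
  mutate(TotalAsset_Average = round(mean(TotalAsset)))
```

In `wrong`, all ten rows hold the same value. In `right`, each company gets its own average.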
You can use summarize() or mutate():
Using summarize() fully aggregates your data, returning only the grouping variable (each company's ID number) and its average.
library(dplyr)
df %>%
group_by(BvD_ID_Number) %>%
summarize(TotalAsset_Average = round(mean(TotalAsset), 0))
This gives us:
BvD_ID_Number TotalAsset_Average
<chr> <dbl>
1 FR810911719 1697
2 GBFC024701 124191
3 GBFC032748 6318
4 GBFC32536 72912
5 GBFC32699 92878
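Real panel data often records missing years as NA, and a single NA makes `mean()` return NA for the whole company. The following hedged sketch assumes that gaps are stored as NA; one is inserted hypothetically into the sample data below for illustration. Setting `na.rm = TRUE` drops the missing values. Counting the non-missing years, and reporting the first and last year, shows how many observations each average rests on. That number differs from firm to firm in an unbalanced panel.

```r
library(dplyr)

# Question's data with one hypothetical NA (GBFC024701, 2017) added for illustration
df_na <- tibble(
  BvD_ID_Number = c("FR810911719", "FR810911719", "GBFC024701", "GBFC024701",
                    "GBFC024701", "GBFC32536", "GBFC32699", "GBFC32699",
                    "GBFC032748", "GBFC032748"),
  Year = c(2017, 2016, 2018, 2017, 2016, 2017, 2016, 2015, 2017, 2016),
  TotalAsset = c(2220, 1174, 124726, NA, 121837, 72912, 111298, 74457,
                 6579, 6056)
)

avg_na <- df_na %>%
  group_by(BvD_ID_Number) %>%
  summarize(
    # na.rm = TRUE skips missing years instead of returning NA for the firm
    TotalAsset_Average = round(mean(TotalAsset, na.rm = TRUE), 0),
    # number of non-missing years behind each average
    n_years = sum(!is.na(TotalAsset)),
    # span of the available time window for each firm
    first_year = min(Year),
    last_year = max(Year)
  )
```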
Using mutate() keeps every row and repeats each company's average on all of that company's rows:
df %>%
group_by(BvD_ID_Number) %>%
mutate(TotalAsset_Average = round(mean(TotalAsset),0))
This gives us:
# A tibble: 10 x 4
# Groups: BvD_ID_Number [5]
BvD_ID_Number Year TotalAsset TotalAsset_Average
<chr> <dbl> <dbl> <dbl>
1 FR810911719 2017 2220 1697
2 FR810911719 2016 1174 1697
3 GBFC024701 2018 124726 124191
4 GBFC024701 2017 126010 124191
5 GBFC024701 2016 121837 124191
6 GBFC32536 2017 72912 72912
7 GBFC32699 2016 111298 92878
8 GBFC32699 2015 74457 92878
9 GBFC032748 2017 6579 6318
10 GBFC032748 2016 6056 6318
Data (this is the df used in the code above):
structure(list(BvD_ID_Number = c("FR810911719", "FR810911719",
"GBFC024701", "GBFC024701", "GBFC024701", "GBFC32536", "GBFC32699",
"GBFC32699", "GBFC032748", "GBFC032748"), Year = c(2017, 2016,
2018, 2017, 2016, 2017, 2016, 2015, 2017, 2016), TotalAsset = c(2220,
1174, 124726, 126010, 121837, 72912, 111298, 74457, 6579, 6056
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))
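If you would rather not load dplyr, base R has rough equivalents. The following sketch assumes the same data held in a plain data.frame. `aggregate()` returns one row per company, like `summarize()`. `ave()`, whose default `FUN` is `mean`, returns a vector aligned with the original rows, like `mutate()`.

```r
# Same data as above, as a plain data.frame (no packages needed)
df <- data.frame(
  BvD_ID_Number = c("FR810911719", "FR810911719", "GBFC024701", "GBFC024701",
                    "GBFC024701", "GBFC32536", "GBFC32699", "GBFC32699",
                    "GBFC032748", "GBFC032748"),
  Year = c(2017, 2016, 2018, 2017, 2016, 2017, 2016, 2015, 2017, 2016),
  TotalAsset = c(2220, 1174, 124726, 126010, 121837, 72912, 111298, 74457,
                 6579, 6056)
)

# Like summarize(): one row per company with its mean TotalAsset
avg_by_firm <- aggregate(TotalAsset ~ BvD_ID_Number, data = df, FUN = mean)

# Like mutate(): keep every row; ave() computes the mean within each company
df$TotalAsset_Average <- round(ave(df$TotalAsset, df$BvD_ID_Number), 0)
```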