R: adding fitted values from plm() back to the data when there are fewer fitted values than observations in the regression
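We are running a panel regression with the plm() function from the R package plm, and we want to add the fitted values as a new column to the dataset the regression was run on.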
MP_regression <- plm(operating_exp ~ HHI + rate + rate_lag1 + rate_lag2 +
                       HHI*rate + HHI*rate_lag1 + HHI*lag2,
                     data = market_power_merged, effect = "individual",
                     model = "within", index = c("firm", "date"))
When we call fitted(MP_regression) like this:
fitted_values <- fitted(MP_regression)
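it returns fewer fitted values than there are observations in the regression's input data. We therefore want to add them back to the market_power_merged data frame by date and firm. Because there are fewer fitted values (for some reason fitted() does not return one for every row), matching on date and firm matters: it lets us see which observations were left out of the fit, or drop the rows that have no fitted value.
Essentially, we want something like: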
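# fails as written: fitted() returns fewer values than nrow(market_power_merged)
market_power_merged <- mutate(market_power_merged, fitted_values = fitted(MP_regression))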
but matched by firm (individual) and date (time).
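Apparently, the value returned by fitted() carries an index attribute: a data frame holding the panel keys (individual and time) of the observations that received a fitted value. So consider cbind-ing the index attribute to the fitted values, then running left_join or merge (with all.x=TRUE) against the original data frame: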
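fitted_values_vec <- fitted(MP_regression)
fitted_values_df <- cbind(attr(fitted_values_vec, "index"),
                          fitted_values = fitted_values_vec)
# all.x=TRUE keeps every original row; unmatched rows get NA in fitted_values
market_power_merged <- base::merge(market_power_merged, fitted_values_df, by=c("firm", "date"), all.x=TRUE)
# the index columns are factors, so left_join may need the key types aligned first
# market_power_merged <- dplyr::left_join(market_power_merged, fitted_values_df, by=c("firm", "date"))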
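The cbind-then-merge pattern can also be tried without plm on a toy panel. The following is only a minimal sketch under stated assumptions: panel, fit_vec and the numbers in them are made up, and the "index" attribute is attached by hand to mimic the attribute described above, so it illustrates the join logic rather than plm itself.

```r
# Toy panel: 2 firms x 3 dates (hypothetical data, not from the question).
panel <- data.frame(
  firm = rep(c("A", "B"), each = 3),
  date = rep(2001:2003, times = 2),
  y    = c(1.0, 1.5, 2.0, 3.0, 3.5, 4.0),
  stringsAsFactors = FALSE
)

# Stand-in for fitted(): only 4 of the 6 rows have a value. The "index"
# attribute is built by hand here (an assumption made for illustration).
fit_vec <- c(1.1, 1.9, 3.2, 3.9)
attr(fit_vec, "index") <- data.frame(
  firm = c("A", "A", "B", "B"),
  date = c(2001L, 2003L, 2001L, 2003L),
  stringsAsFactors = FALSE
)

# Bind the panel keys to the fitted values; as.numeric() strips the
# attributes so the new column is a plain numeric vector.
fit_df <- cbind(attr(fit_vec, "index"),
                fitted_values = as.numeric(fit_vec))

# Left join onto the full panel; all.x = TRUE keeps the unmatched rows.
merged <- merge(panel, fit_df, by = c("firm", "date"), all.x = TRUE)

# Rows without a fitted value are the ones that were left out of the fit.
print(merged[is.na(merged$fitted_values), c("firm", "date")])
```

The merged data frame has as many rows as the original panel, and the two rows dated 2002 carry NA in fitted_values, which makes them easy to inspect or to drop.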
To demonstrate with Produc, a data frame that ships with plm:
data("Produc", package = "plm")
# ASSIGN RANDOM NAs ACROSS NON-PANEL COLUMNS
set.seed(41120)
for(col in names(Produc)[!names(Produc) %in% c("state", "year")]) {
  Produc[sample(nrow(Produc), 50), col] <- NA
}
results <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
               data = Produc, index = c("state","year"))
fitted_values_vec <- fitted(results)
str(fitted_values_vec)
# 'pseries' Named num [1:588] -0.2459 -0.2274 -0.0927 -0.0981 -0.0184 ...
# - attr(*, "names")= chr [1:588] "ALABAMA" "ALABAMA" "ALABAMA" "ALABAMA" ...
# - attr(*, "index")=Classes ‘pindex’ and 'data.frame': 588 obs. of 2 variables:
# ..$ state: Factor w/ 48 levels "ALABAMA","ARIZONA",..: 1 1 1 1 1 1 1 1 1 1 ...
# ..$ year : Factor w/ 17 levels "1970","1971",..: 1 2 5 6 7 8 9 10 12 13 ...
fitted_values_df <- cbind(attr(fitted_values_vec, "index"),
                          fitted_values = fitted_values_vec)
Produc <- merge(Produc, fitted_values_df, by= c("state","year"), all.x=TRUE)
Output
head(Produc,10)
#      state year region     pcap     hwy   water    util       pc   gsp    emp unemp fitted_values
#  1 ALABAMA 1970      6 15032.67 7325.80 1655.68 6051.20 35793.80 28418 1010.5   4.7   -0.24591969
#  2 ALABAMA 1971      6 15501.94 7525.94 1721.02 6254.98 37299.91 29375 1021.9   5.2   -0.22735513
#  3 ALABAMA 1972      6 15972.41 7765.42 1764.75 6442.23       NA 31303 1072.3    NA            NA
#  4 ALABAMA 1973   <NA>       NA 7907.66 1742.41 6756.19 40084.01 33430 1135.5   3.9            NA
#  5 ALABAMA 1974      6 16762.67 8025.52      NA 7002.29 42057.31 33749 1169.8   5.5   -0.09272471
#  6 ALABAMA 1975      6 17316.26 8158.23      NA 7405.76 43971.71 33604 1155.4   7.7   -0.09806212
#  7 ALABAMA 1976      6 17732.86      NA 1799.74 7704.93 50221.57 35764 1207.0   6.8   -0.01841929
#  8 ALABAMA 1977      6 18111.93 8365.67 1845.11 7901.15 51084.99 37463 1269.2   7.4    0.02047675
#  9 ALABAMA 1978      6 18479.74 8510.64 1960.51 8008.59 52604.05 39964 1336.5   6.3    0.07225304
# 10 ALABAMA 1979      6 18881.49 8640.61 2081.91 8158.97 54525.86 40979 1362.0   7.1    0.09364171
tail(Produc,10)
#       state year region    pcap     hwy  water    util       pc   gsp   emp unemp fitted_values
# 807 WYOMING 1977      8 4037.03 2898.34 291.64  847.04 19977.67  9779 170.5   3.6     0.0871588
# 808 WYOMING 1978      8 4115.61 2920.85 294.73  900.04 20760.24 11038 187.4    NA            NA
# 809 WYOMING 1979      8 4268.71 2950.53 313.47 1004.71 21643.50 11988 200.7   2.8     0.2346269
# 810 WYOMING 1980      8      NA 2979.23 338.06 1082.40 22628.22 13027 210.2   4.0            NA
# 811 WYOMING 1981      8 4572.67 3005.62 379.19 1187.86 26330.20 13717 223.5   4.1     0.3704301
# 812 WYOMING 1982      8 4731.98 3060.64 408.43 1262.90 27724.96 13056 217.7   5.8     0.3595080
# 813 WYOMING 1983      8 4950.82 3119.98 445.59      NA 28586.46 11922    NA   8.4            NA
# 814 WYOMING 1984      8 5184.73 3195.68 476.57      NA 28794.80 12073 204.3   6.3     0.3199823
# 815 WYOMING 1985      8 5448.38 3295.92 523.01 1629.45 29326.94 12022    NA   7.1            NA
# 816 WYOMING 1986      8 5700.41 3400.96 565.58 1733.88 27110.51    NA 196.3   9.0            NA
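In the rows shown, fitted_values is NA exactly where one of the model's variables (gsp, pcap, pc, emp, unemp) is missing, while an NA in an unused column such as hwy, water or util makes no difference. Assuming incomplete rows are the only reason for the shortfall (lagged regressors such as rate_lag1 would typically be NA at the start of each firm's series), the rows that will be dropped can be listed before fitting. The sketch below uses base R only; the data frame toy and its column names are hypothetical.

```r
# Hypothetical data: x_lag1 is NA in each firm's first period, as a lagged
# regressor would be; `other` is not part of the model.
toy <- data.frame(
  firm   = rep(c("A", "B"), each = 3),
  date   = rep(2001:2003, times = 2),
  y      = c(1.0, 1.5, NA, 3.0, 3.5, 4.0),
  x      = c(0.2, 0.4, 0.6, 0.1, 0.3, 0.5),
  x_lag1 = c(NA, 0.2, 0.4, NA, 0.1, 0.3),
  other  = c(NA, 1, 1, 1, NA, 1),
  stringsAsFactors = FALSE
)

# Variables that appear in the model formula.
model_vars <- c("y", "x", "x_lag1")

# TRUE for rows that have no NA in any model variable.
toy$used_in_fit <- complete.cases(toy[, model_vars])

# Rows expected to be missing from the fitted values.
dropped <- toy[!toy$used_in_fit, c("firm", "date")]
print(dropped)

# Cross-check: a model frame built with na.omit keeps the same rows.
mf <- model.frame(y ~ x + x_lag1, data = toy, na.action = na.omit)
print(rownames(mf))
```

If the rows that end up with NA in fitted_values after the merge differ from dropped, something other than missing data is removing observations and is worth investigating.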