R: add fitted values from plm() back to the data when there are fewer fitted values than observations in the regression

Question

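We are running a panel regression with the plm() function from the R package plm, and we want to add the fitted values as a new column to the dataset the regression was run on.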

MP_regression <- plm(operating_exp ~ HHI + rate + rate_lag1 + rate_lag2 +
                   HHI*rate + HHI*rate_lag1 + HHI*rate_lag2,
                 data = market_power_merged, effect = "individual",
                 model = "within", index = c("firm", "date"))

When we call fitted(MP_regression) like this:

fitted_values <- fitted(MP_regression)

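it returns fewer fitted values than there are observations in the regression's input data, so we want to add them back to the market_power_merged data frame by date and firm. Because fitted() returns fewer values (for some reason), matching on date and firm matters: it lets us see which observations were excluded from the fit, or drop the rows for which no fitted value is produced.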
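A likely reason, stated as an assumption: plm estimates only on rows with no missing value in any variable the formula uses, so incomplete rows get no fitted value. Lag columns such as rate_lag1 and rate_lag2 are commonly NA for each firm's earliest dates, which would account for part of the gap. The base-R sketch below flags those rows by firm and date with complete.cases(); the data frame `toy`, its values, and the simplified variable list are hypothetical stand-ins for market_power_merged and the real formula.

```r
# Hypothetical toy panel standing in for market_power_merged
toy <- data.frame(
  firm          = c("A", "A", "A", "B", "B", "B"),
  date          = c(1, 2, 3, 1, 2, 3),
  operating_exp = c(10, 12, NA, 8, 9, 11),
  HHI           = c(0.2, 0.3, 0.3, 0.5, NA, 0.4),
  rate          = c(1.0, 1.5, 2.0, 1.0, 1.5, 2.0)
)

# Variables used by the (simplified, hypothetical) formula; a row missing
# any of them cannot receive a fitted value
model_vars <- c("operating_exp", "HHI", "rate")
used <- complete.cases(toy[, model_vars])

# Firm/date keys of the rows that would be dropped (here A/3 and B/2)
dropped <- toy[!used, c("firm", "date")]
print(dropped)

# Number of fitted values to expect
sum(used)
```

Comparing sum(used) with length(fitted(MP_regression)) is a quick way to check whether missing values explain the whole difference.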

Essentially, we want:

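# Intended result; fails as written because fitted() returns fewer values than there are rows
market_power_merged <- mutate(market_power_merged,
                              fitted_values = fitted(MP_regression))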

but matched on firm (individual) and date (time).

Answer 1

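The object returned by fitted() carries an index attribute: a data frame of the panel identifiers that belong to the fitted values. So consider cbind-ing that index attribute to the fitted values, then running left_join or merge (with all.x = TRUE) against the original data frame: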

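fitted_values_vec <- fitted(MP_regression)
# Pair each fitted value with its panel keys (firm, date) from the "index" attribute
fitted_values_df <- cbind(attr(fitted_values_vec, "index"),
                          fitted_values = fitted_values_vec)

# all.x = TRUE keeps every original row; rows without a fit get NA
market_power_merged <- base::merge(market_power_merged, fitted_values_df,
                                   by = c("firm", "date"), all.x = TRUE)
# market_power_merged <- dplyr::left_join(market_power_merged, fitted_values_df, by = c("firm", "date"))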
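To see the mechanics without fitting a model, the following base-R mock imitates the structure of fitted()'s return value. The objects `fv` and `orig` and their values are hypothetical, and a plain numeric vector with an "index" attribute stands in for the pseries that plm actually returns.

```r
# Hypothetical stand-in for fitted(MP_regression): a numeric vector whose
# "index" attribute holds the firm/date keys of the rows that were fitted
fv <- c(1.1, 1.3, 0.9, 1.0)
attr(fv, "index") <- data.frame(firm = c("A", "A", "B", "B"),
                                date = c(1, 2, 1, 3))

# Original data with six rows; A/3 and B/2 were not fitted
orig <- data.frame(firm = c("A", "A", "A", "B", "B", "B"),
                   date = c(1, 2, 3, 1, 2, 3),
                   y    = c(10, 12, 11, 8, 9, 11))

# Pair each fitted value with its keys (as.numeric drops the attribute),
# then left-join onto the original so unfitted rows get NA
fv_df <- cbind(attr(fv, "index"), fitted_values = as.numeric(fv))
out <- merge(orig, fv_df, by = c("firm", "date"), all.x = TRUE)
print(out)
```

Matching on the keys rather than on row position is what keeps each fitted value on the correct firm/date row.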

To demonstrate with plm's built-in data frame Produc:

library(plm)
data("Produc", package = "plm")

# ASSIGN RANDOM NAs ACROSS NON-PANEL COLUMNS
set.seed(41120)
for(col in names(Produc)[!names(Produc) %in% c("state", "year")]) {
  Produc[sample(nrow(Produc), 50), col] <- NA
}

results <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
               data = Produc, index = c("state","year"))

fitted_values_vec <- fitted(results)
str(fitted_values_vec)
# 'pseries' Named num [1:588] -0.2459 -0.2274 -0.0927 -0.0981 -0.0184 ...
# - attr(*, "names")= chr [1:588] "ALABAMA" "ALABAMA" "ALABAMA" "ALABAMA" ...
# - attr(*, "index")=Classes ‘pindex’ and 'data.frame': 588 obs. of  2 variables:
#   ..$ state: Factor w/ 48 levels "ALABAMA","ARIZONA",..: 1 1 1 1 1 1 1 1 1 1 ...
#   ..$ year : Factor w/ 17 levels "1970","1971",..: 1 2 5 6 7 8 9 10 12 13 ...


fitted_values_df <- cbind(attr(fitted_values_vec, "index"), 
                          fitted_values = fitted_values_vec)

Produc <- merge(Produc, fitted_values_df, by= c("state","year"), all.x=TRUE)

Output

head(Produc,10)

#      state year region     pcap     hwy   water    util       pc   gsp    emp unemp fitted_values
# 1  ALABAMA 1970      6 15032.67 7325.80 1655.68 6051.20 35793.80 28418 1010.5   4.7   -0.24591969
# 2  ALABAMA 1971      6 15501.94 7525.94 1721.02 6254.98 37299.91 29375 1021.9   5.2   -0.22735513
# 3  ALABAMA 1972      6 15972.41 7765.42 1764.75 6442.23       NA 31303 1072.3    NA            NA
# 4  ALABAMA 1973   <NA>       NA 7907.66 1742.41 6756.19 40084.01 33430 1135.5   3.9            NA
# 5  ALABAMA 1974      6 16762.67 8025.52      NA 7002.29 42057.31 33749 1169.8   5.5   -0.09272471
# 6  ALABAMA 1975      6 17316.26 8158.23      NA 7405.76 43971.71 33604 1155.4   7.7   -0.09806212
# 7  ALABAMA 1976      6 17732.86      NA 1799.74 7704.93 50221.57 35764 1207.0   6.8   -0.01841929
# 8  ALABAMA 1977      6 18111.93 8365.67 1845.11 7901.15 51084.99 37463 1269.2   7.4    0.02047675
# 9  ALABAMA 1978      6 18479.74 8510.64 1960.51 8008.59 52604.05 39964 1336.5   6.3    0.07225304
# 10 ALABAMA 1979      6 18881.49 8640.61 2081.91 8158.97 54525.86 40979 1362.0   7.1    0.09364171

tail(Produc,10)

#       state year region    pcap     hwy  water    util       pc   gsp   emp unemp fitted_values
# 807 WYOMING 1977      8 4037.03 2898.34 291.64  847.04 19977.67  9779 170.5   3.6     0.0871588
# 808 WYOMING 1978      8 4115.61 2920.85 294.73  900.04 20760.24 11038 187.4    NA            NA
# 809 WYOMING 1979      8 4268.71 2950.53 313.47 1004.71 21643.50 11988 200.7   2.8     0.2346269
# 810 WYOMING 1980      8      NA 2979.23 338.06 1082.40 22628.22 13027 210.2   4.0            NA
# 811 WYOMING 1981      8 4572.67 3005.62 379.19 1187.86 26330.20 13717 223.5   4.1     0.3704301
# 812 WYOMING 1982      8 4731.98 3060.64 408.43 1262.90 27724.96 13056 217.7   5.8     0.3595080
# 813 WYOMING 1983      8 4950.82 3119.98 445.59      NA 28586.46 11922    NA   8.4            NA
# 814 WYOMING 1984      8 5184.73 3195.68 476.57      NA 28794.80 12073 204.3   6.3     0.3199823
# 815 WYOMING 1985      8 5448.38 3295.92 523.01 1629.45 29326.94 12022    NA   7.1            NA
# 816 WYOMING 1986      8 5700.41 3400.96 565.58 1733.88 27110.51    NA 196.3   9.0            NA
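After the merge, rows that plm did not fit simply carry NA in fitted_values, which covers both goals from the question: listing the excluded observations or dropping them. A base-R sketch on a small hypothetical merged data frame (`merged` and its values are illustrative stand-ins for the merged Produc or market_power_merged):

```r
# Hypothetical merged result: NA in fitted_values marks rows plm did not fit
merged <- data.frame(
  state         = c("ALABAMA", "ALABAMA", "ALABAMA", "WYOMING"),
  year          = c(1970, 1971, 1972, 1978),
  fitted_values = c(-0.25, -0.23, NA, NA)
)

# Observations excluded from the regression, identified by their keys
excluded <- merged[is.na(merged$fitted_values), c("state", "year")]
print(excluded)

# Keep only observations that received a fitted value
kept <- merged[!is.na(merged$fitted_values), ]
nrow(kept)
```

With dplyr, dplyr::filter(merged, !is.na(fitted_values)) does the same as the last subset.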
