权重优化以找出 R 中的最小平方误差

Weight optimization to find out least squared error in R

我有实际值和四个不同的模型及其预测值和拟合值。有了这些拟合值,我想找到最佳权重,以便 (summation(wifi)-actuals)^2 最小化。这里 wi 是我想要找到的最佳权重,fi 是每个模型的拟合值。

我对 wi 的限制是;

  1. 权重必须大于 0,
  2. 权重必须小于 1,
  3. 权重总和必须为 1

我在这里看到了一个类似的例子 [https://stats.stackexchange.com/questions/385372/weight-optimization-in-order-to-maximize-correlation-r] 但我无法针对我的特定问题复制它。

让我们生成示例数据以更好地理解问题

actuals <- floor(runif(10, 500,1700)) 
model1_fitted <- floor(runif(10, 600,1800)) 
model2_fitted <- floor(runif(10, 400,1600)) 
model3_fitted <- floor(runif(10, 300,1500)) 
model4_fitted <- floor(runif(10, 300,1200)) 
sample_model <- data.frame(actuals, model1_fitted, model2_fitted,model3_fitted,model4_fitted)

现在,我需要以最佳方式找到 (w1,w2,w3,w4),以便最小化 (summation(wifi)-actuals)^2。我想保存权重,正如我提到的,我也有这四个模型的预测。如果我得到最佳权重,我对集成模型的预测值将是这些权重和预测值的线性函数。集成的第一个预测值如下所示,

ensemble_pred_1 = w1*model1_pred1+w2*model2_pred1+w3*model3_pred1+w4*model4_pred1

请帮助我找到最佳 wi,以便我可以根据需要生成集成模型。

根据优化问题框定你的问题并计算出所需的约束条件:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
set.seed(123)
model1_fitted <- floor(runif(10, 600,1800)) 
model2_fitted <- floor(runif(10, 400,1600)) 
model3_fitted <- floor(runif(10, 300,1500)) 
model4_fitted <- floor(runif(10, 300,1200)) 
w <- c(0.2,0.3,0.1,0.4) # sample coefficients
sample_model <- tibble(model1_fitted, model2_fitted,model3_fitted,model4_fitted) %>%
  mutate(actuals= as.vector(as.matrix(.) %*% w)  + rnorm(10,sd=10))


X <- as.matrix(sample_model[,1:4])
y <- as.matrix(sample_model[,5])

# From solve.QP description
# solving quadratic programming problems of the form min(-d^T b + 1/2 b^T D b) with the constraints A^T b >= b_0.

# Your problem
# Minimize       || Xw - y ||^2     => Minimize 1/2 w'X'Xw - (y'X)w  => D=X'X , d= X'y
# Constraint w>0,w<1, sum(w)=1      => A'w >= b0

d <- t(X) %*% y
D <- t(X) %*% X
A <- cbind(rep(1,4),diag(4)) #constraint LHS
b0 <- c(1,numeric(4)) # constraint RHS

library(quadprog)
soln <- solve.QP(D,d,A,b0,meq = 1)
w1 <- soln$solution  # Your model wieghts
w1
#> [1] 0.20996764 0.29773563 0.07146838 0.42082836

reprex package (v0.2.1)

于 2019-05-09 创建