How to reduce the maximum peak of a fitted cubic equation

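I collected ventilation-rate data as a function of efficiency. I took several samples and fitted them to a cubic equation. I did this in Excel and obtained a third-order regression equation.

However, as the plot shows, at 90-95% efficiency the fitted ventilation rate is above 100%. The data can never exceed 100%, but the regression curve has a convex maximum, so it bulges above 100%.

Is there a way to lower that maximum and still fit the data? I want to use the measured data directly, but the fit must not exceed 100%.

Solutions in R or other statistical software are also welcome. The R value can be a bit lower.

Thanks.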

Here are some ideas in R:

First, I made some example data similar to yours and fitted a linear model with x^3, x^2, and x as predictors:

# make example data (seed set so the random noise is reproducible)
set.seed(1)
xx = rep(c(30, 50, 70, 100), each = 10)
yy = 1/(1+exp(-(xx-50)/15))  * 4798.20 + rnorm(length(xx), sd = 20)
xx = c(0, xx)
yy = c(0, yy)

# fit third-order linear model
m0 = lm(yy ~ I(xx^3) + I(xx^2) + xx)

x_to_predict = data.frame(xx = seq(0, 100, length.out = length(xx)))
lm_preds = predict(m0, newdata = x_to_predict)
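
To check whether the cubic really overshoots, one can find where its derivative is zero and compare the highest fitted value with the largest observation. This is only a minimal sketch: it rebuilds simulated stand-in data with the same recipe as above, and the seed and the evaluation grid are my own choices.

```r
# Simulated stand-in data, same recipe as the example above (seed is arbitrary)
set.seed(42)
xx <- c(0, rep(c(30, 50, 70, 100), each = 10))
yy <- c(0, 1 / (1 + exp(-(xx[-1] - 50) / 15)) * 4798.20 + rnorm(40, sd = 20))

m0 <- lm(yy ~ I(xx^3) + I(xx^2) + xx)
b  <- coef(m0)  # names: (Intercept), I(xx^3), I(xx^2), xx

# Derivative of the cubic: b1 + 2*b2*x + 3*b3*x^2
# polyroot() expects coefficients in increasing order of power
crit <- polyroot(c(b[["xx"]], 2 * b[["I(xx^2)"]], 3 * b[["I(xx^3)"]]))
crit <- Re(crit[abs(Im(crit)) < 1e-8])           # keep real roots only
crit_in_range <- crit[crit >= 0 & crit <= 100]   # stationary points inside the data range

# Highest fitted value on a fine grid versus the largest observation
grid <- data.frame(xx = seq(0, 100, by = 0.5))
c(max_fit = max(predict(m0, newdata = grid)), max_obs = max(yy))
crit_in_range
```

If `crit_in_range` contains a point below 100 and `max_fit` is larger than `max_obs`, the polynomial peaks inside the data range, which is the behaviour described in the question.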

Idea 1: You could fit a model that uses a sigmoidal (or other monotonic) curve.

# fit quasibinomial model for proportion
# first scale response variable between 0 and 1
m1 = glm(I(yy/max(yy)) ~ xx , family = quasibinomial())

# predict on the response (0 to 1) scale
preds_glm = predict(m1,
                    newdata = x_to_predict,
                    type = "response")
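
The example above rescales by `max(yy)`. If the response is already a percentage with a known ceiling of 100, as in the question, dividing by 100 may be more natural. Below is a minimal sketch under that assumption. The data are simulated, and the names `eff` and `vent` are hypothetical placeholders for the measured efficiency and ventilation values.

```r
# Simulated percent-scale data (not the asker's measurements)
set.seed(1)
eff  <- rep(c(0, 30, 50, 70, 100), each = 8)
vent <- 100 / (1 + exp(-(eff - 50) / 15)) + rnorm(length(eff), sd = 1.5)
vent <- pmin(pmax(vent, 0), 100)  # keep the simulated values inside [0, 100]

# Logistic-link fit on the proportion scale (response divided by the known ceiling)
fit <- glm(I(vent / 100) ~ eff, family = quasibinomial())

# Back-transform predictions to percent
grid     <- data.frame(eff = seq(0, 100, by = 1))
pred_pct <- 100 * predict(fit, newdata = grid, type = "response")
range(pred_pct)
```

Because the logistic link maps every linear predictor into the open interval (0, 1), the back-transformed curve stays strictly between 0 and 100 no matter how the data scatter.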

Idea 2: Fit a generalized additive model, which produces a smooth curve.

# fit Generalized Additive Model
library(mgcv)
# you have to tune "k" somewhat -- larger means more "wiggliness"
m2 = gam(yy ~ s(xx, k = 4)) 
gam_preds = predict(m2,
                    newdata = x_to_predict,
                    type = "response")

A plot of each model's predictions looks like this:

# plot data and predictions
plot(xx, yy, ylab = "result", xlab = "efficiency")
lines(x_to_predict$xx, preds_glm*max(yy),
      type = "l", col = 'red', lwd = 2)
lines(x_to_predict$xx, gam_preds,
      type = "l", col = 'blue', lwd = 2)
lines(x_to_predict$xx, lm_preds,
      type = "l", col = 'black', lwd = 2, lty = 2)
legend("bottomright", 
       lty = c(0, 1, 1, 2), 
       legend = c("data", "GLM prediction", "GAM prediction", "third-order lm"), 
       pch = c(1, NA_integer_, NA_integer_, NA_integer_), 
       col = c("black", "red", "blue", "black"))

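I extracted data from your scatterplot and found a good fit to a Gompertz-type sigmoidal equation, "a * exp(-1.0 * exp((x - b)/c)) + Offset". The extracted data gave parameters a = -4.7537951574153149E+03, b = 5.4531406419707224E+01, c = 2.1494180901343391E+01, and Offset = 4.4056239791186508E+03, yielding RMSE = 57.17 and R-squared = 0.9988, see below. If this looks useful to you, I recommend re-fitting the actual data using these values as initial parameter estimates.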
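
In R, such a re-fit could be done with `nls()`. This is a minimal sketch, not a tested recipe for the real measurements: `x` and `y` here are simulated from the reported curve as a stand-in for the actual data, and `cc` and `off` are my names for the parameters c and Offset.

```r
# Gompertz-type curve from the answer; cc and off stand for c and Offset
gompertz <- function(x, a, b, cc, off) a * exp(-1.0 * exp((x - b) / cc)) + off

# Reported parameter values, used as starting estimates
start_vals <- list(a   = -4.7537951574153149E+03,
                   b   =  5.4531406419707224E+01,
                   cc  =  2.1494180901343391E+01,
                   off =  4.4056239791186508E+03)

# Simulated stand-in for the measurements (replace x and y with the real data)
set.seed(1)
x <- seq(0, 100, by = 5)
y <- gompertz(x, start_vals$a, start_vals$b, start_vals$cc, start_vals$off) +
  rnorm(length(x), sd = 20)

# Nonlinear least-squares fit starting from the reported values
fit <- nls(y ~ gompertz(x, a, b, cc, off), start = start_vals)
coef(fit)

# Fitted curve on a fine grid
pred <- predict(fit, newdata = data.frame(x = 0:100))
```

With a negative `a` and a positive `cc`, this curve rises monotonically toward `off` as x grows, so it has no interior peak of the kind the cubic produces.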