How to do non-linear regression in R
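I have a bunch of data, and I need to find the parameter values of this equation that fit the data.
By the way, I'm a complete beginner in R...
How do I use the nls function correctly?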
values <- read.csv(file.choose())
nls(y~A/cos(B*(C + x))^2 + D,
data =values, start = c(A = 1, B=1, C=0, D=0))
Error
Error in nls(y ~ A/cos(B * (C + x))^2 + D, data = values, start = c(A = 1, :
step factor 0.000488281 reduced below 'minFactor' of 0.000976562
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
Data
structure(list(X = c(212.8, 219.12, 226.07, 232.39, 239.97, 247.55,
254.5, 262.4, 269.67, 276.94, 283.89, 289.89, 297.15, 303.79,
310.11, 317.06, 322.75, 329.06, 335.38, 341.07, 347.7, 353.71,
359.39, 365.08, 371.71, 376.45, 382.77, 388.77, 394.46, 400.78,
406.78, 412.78, 419.1, 425.27, 431.27, 437.59, 442.96, 448.97,
454.65, 460.34, 465.39, 470.77, 477.08, 482.14, 486.56, 492.25,
497.3, 502.36, 507.41, 512.47, 517.52, 522.89, 528.58, 533.95,
539.32, 545.32, 550.7, 555.75, 561.44, 567.12, 571.86, 576.92,
581.97, 587.03, 605.67, 611.04, 620.2, 624.94, 643.58, 648.95,
658.43, 663.48, 673.28, 683.38, 688.12, 693.18, 697.92, 702.02,
706.45, 711.5, 715.61, 720.35, 724.14, 737.09, 742.15, 746.89,
750.99, 756.05, 760.16, 774.43, 779.17, 788.02, 797.5, 801.6,
810.45, 814.87, 823.09, 831.93, 840.46, 849.31, 862.58, 866.68,
871.42, 880.59, 885.01, 894.8, 907.44, 916.92, 925.13, 937.45,
949.77, 958.94, 972.2, 981.68, 991.16, 1000.32, 1018.96, 1031.92,
1044.87, 1057.82, 1066.98, 1080.88, 1096.05, 1109.96, 1120.39,
1129.55, 1138.71, 1148.19, 1157.35, 1166.2, 1176.31, 1186.1,
1196.21, 1207.27, 1223.38, 1239.49, 1249.92, 1265.4, 1275.19,
1290.67, 1301.73, 1312.79, 1325.11, 1336.48, 1348.8, 1359.86,
1372.5, 1378.5, 1390.19, 1401.88, 1413.57, 1424.63, 1437.58,
1449.65, 1463.23, 1476.5, 1490.72, 1504.62, 1518.84, 1525.79,
1532.42, 1539.06, 1552.01, 1565.91, 1586.76, 1600.98, 1632.89,
1658.48, 1666.69, 1674.27, 1683.12, 1692.28, 1700.5, 1718.5,
1727.03, 1735.25, 1745.99, 1754.2, 1763.68, 1773.78), Y = c(-806.78,
-805.83, -804.89, -804.25, -802.36, -801.73, -800.78, -799.83,
-797.94, -796.67, -795.41, -795.09, -793.83, -792.88, -792.25,
-791.3, -790.35, -789.72, -788.77, -788.14, -786.88, -786.25,
-785.61, -784.03, -783.4, -782.77, -782.14, -781.19, -780.56,
-780.24, -779.61, -778.66, -777.72, -777.47, -776.52, -775.89,
-775.26, -774.31, -773.68, -772.73, -772.1, -771.78, -771.78,
-771.15, -769.89, -769.26, -768.31, -768.31, -767.68, -767.36,
-767.04, -766.1, -765.47, -765.47, -764.52, -764.2, -763.57,
-762.94, -762.31, -761.67, -761.36, -761.04, -760.41, -760.09,
-758.83, -758.51, -756.94, -756.62, -755.99, -755.36, -754.72,
-754.09, -753.46, -751.88, -752.51, -752.51, -751.88, -751.56,
-751.56, -751.25, -751.25, -750.3, -750.3, -750.3, -749.67, -749.67,
-749.67, -749.04, -748.72, -748.15, -748.15, -747.52, -747.2,
-747.2, -746.89, -746.89, -746.89, -746.89, -746.25, -745.94,
-746.25, -746.57, -745.94, -745.62, -745.31, -745.31, -745.31,
-744.67, -745.31, -745.62, -745.62, -745.62, -746.25, -745.62,
-746.25, -746.25, -747.2, -747.2, -747.83, -748.47, -748.78,
-749.41, -750.04, -750.14, -751.41, -751.41, -752.36, -752.99,
-753.3, -754.25, -754.88, -754.88, -756.15, -756.78, -758.04,
-759.62, -760.57, -762.47, -763.41, -764.04, -764.99, -766.26,
-767.84, -769.1, -770.05, -771, -772.57, -773.21, -774.47, -776.37,
-777.63, -778.58, -780.47, -781.77, -783.35, -785.25, -787.14,
-788.41, -790.3, -790.94, -791.88, -793.15, -794.73, -796.94,
-799.47, -801.68, -806.42, -810.21, -811.15, -813.05, -814.31,
-815.58, -817.47, -819.37, -821.58, -822.53, -824.42, -826.32,
-827.58, -829.36)), class = "data.frame", row.names = c(NA, -180L
))
That's the data... Stack Overflow is complaining that my post is mostly code, so I need to type some more words to keep it happy.
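Your function is over-defined. Having both B and C as adjustable parameters is redundant and gives the solver trouble when it searches for the best solution. If you can remove one of the two or fix it at a set value, nls() can find a solution more easily.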
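Your equation becomes A/cos(BC + Bx)^2 + D. If we only want to optimize the variables B and C, it simplifies to (BC + Bx).
Let's assume your dependent variable is K, and that it equals (BC + Bx).
We can now try to solve K = (B1C1 + B1x).
We can pick an arbitrary value for B, say B1, and find the value of C that minimizes the error in K = (B1C1 + B1x).
But suppose we pick a different value for B, say B2; then a different value of C minimizes the error in K = (B2C2 + B2x).
Because there are infinitely many solutions for B and C, nls() produces the error.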
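One concrete way to see that more than one (B, C) pair can describe the same curve, offered as my own illustration rather than a restatement of the argument above: because the cosine is squared, adding π to its argument leaves the model unchanged, so shifting C by π/B gives identical predictions. The parameter values and the helper name `model` below are made up for the demonstration.

```r
# Model from the question: A / cos(B * (C + x))^2 + D
model <- function(x, A, B, C, D) A / cos(B * (C + x))^2 + D

# Hypothetical parameter values, chosen only so that cos() stays away from zero
A <- 2; B <- 5e-4; C <- -900; D <- -750
x <- seq(200, 1800, by = 100)

y_original <- model(x, A, B, C, D)
y_shifted  <- model(x, A, B, C + pi / B, D)  # argument of cos() grows by exactly pi

# Both parameter sets give the same predictions (up to floating-point rounding)
same_curve <- isTRUE(all.equal(y_original, y_shifted))
print(same_curve)  # TRUE
```

A least-squares solver has no way to prefer one of these equivalent parameter sets over another, which is one reason fixing a parameter can help.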
So try:
values <- read.csv("testdata.csv")
C <- 1
nls(Y~A/cos(B*(C + X))^2 + D,
data =values, start = c(A = 1, B=1, D=1))
Nonlinear regression model
model: Y ~ A/cos(B * (C + X))^2 + D
data: values
A B D
0.03245 1.00005 -771.33115
residual sum-of-squares: 87192
Number of iterations to convergence: 23
Achieved convergence tolerance: 8.887e-06
You can try several different values for C, or fix B and solve for C. In every case the residual sum-of-squares remains large.
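Trying several fixed values of C can be sketched as a small loop. This is only a sketch under assumptions: the data frame below is a synthetic stand-in I generated (not the posted data), and `rss_for_fixed_C` and `C_candidates` are names invented here. Fits that fail to converge are recorded as NA instead of stopping the script, so I make no claim about which values of C will succeed.

```r
# Synthetic stand-in for `values` (NOT the data from the question)
set.seed(42)
values <- data.frame(X = seq(212.8, 1773.78, length.out = 180))
values$Y <- -745 - 1.2e-4 * (values$X - 900)^2 + rnorm(180, sd = 0.3)

# Fit the model with C held fixed; return the residual sum-of-squares,
# or NA if nls() stops with an error (e.g. no convergence)
rss_for_fixed_C <- function(C, data) {
  tryCatch({
    fit <- nls(Y ~ A / cos(B * (C + X))^2 + D, data = data,
               start = c(A = 1, B = 1, D = 1))
    sum(residuals(fit)^2)
  }, error = function(e) NA_real_)
}

# Hypothetical candidate values for the fixed parameter
C_candidates <- c(0, 1, 2)
rss <- vapply(C_candidates, rss_for_fixed_C, numeric(1), data = values)
print(data.frame(C = C_candidates, rss = rss))
```

Comparing the `rss` column across candidates shows whether any choice of C brings the residual sum-of-squares down meaningfully.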
Here is a better model:
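library(ggplot2)  # provides ggplot(), aes(), geom_point()
# Quadratic fit; the "+ 1" just makes the default intercept explicit
model_poly <- lm(Y ~ I(X^2) + X + 1, data = values)
values$y2 <- predict(model_poly, values)
# Observed points in black, fitted values in red
ggplot(values, aes(X, Y)) + geom_point() +
  geom_point(data = values, aes(X, y2), color = 'red')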
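To compare the quadratic fit with the nls() results numerically, you can compute its residual sum-of-squares in the same way. The sketch below runs on a synthetic parabola that I generated to roughly resemble the posted data (it is not the posted data, so the numbers it prints say nothing about the real fit); `rss_poly` and `vertex_x` are names introduced here.

```r
# Synthetic stand-in for `values` (NOT the data from the question)
set.seed(42)
values <- data.frame(X = seq(212.8, 1773.78, length.out = 180))
values$Y <- -745 - 1.2e-4 * (values$X - 900)^2 + rnorm(180, sd = 0.3)

# Same quadratic model as above
model_poly <- lm(Y ~ I(X^2) + X, data = values)

# Residual sum-of-squares, directly comparable to the value nls() reports
rss_poly <- sum(residuals(model_poly)^2)

# Location of the parabola's turning point: x = -b1 / (2 * b2)
b <- coef(model_poly)
vertex_x <- unname(-b["X"] / (2 * b["I(X^2)"]))

print(rss_poly)
print(vertex_x)
```

On your own data, replace the synthetic `values` with the real data frame and compare `rss_poly` against the residual sum-of-squares printed by nls().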
With the data as provided, I got an error because "x" and "X", and "y" and "Y", are not the same names. Fixing that lets the function run without error:
> nls(y~A/cos(B*(C + x))^2 + D,
+ data =values, start = c(A = 1, B=1, C=0, D=0))
Error in nls(y ~ A/cos(B * (C + x))^2 + D, data = values, start = c(A = 1, :
parameters without starting value in 'data': y, x
> str(values)
'data.frame': 180 obs. of 2 variables:
$ X: num 213 219 226 232 240 ...
$ Y: num -807 -806 -805 -804 -802 ...
> nls(Y~A/cos(B*(C + X))^2 + D,
+ data =values, start = c(A = 1, B=1, C=0, D=0))
Nonlinear regression model
model: Y ~ A/cos(B * (C + X))^2 + D
data: values
A B C D
1.871e-04 1.000e+00 -2.615e-02 -7.713e+02
residual sum-of-squares: 86758
Number of iterations to convergence: 10
Achieved convergence tolerance: 9.838e-07
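A quick way to catch this kind of name mismatch before calling nls() is to compare the symbols in the formula with the column names and the start values. This is a minimal sketch; `model_formula` and `unmatched` are names I made up, and the tiny data frame holds only the first three posted points.

```r
# Small stand-in with the same column names as the posted data (X and Y)
values <- data.frame(X = c(212.8, 219.12, 226.07),
                     Y = c(-806.78, -805.83, -804.89))

# Formula as written in the question, with lowercase x and y
model_formula <- y ~ A / cos(B * (C + x))^2 + D
start <- c(A = 1, B = 1, C = 0, D = 0)

# Every symbol in the formula should be a data column or a named start value
unmatched <- setdiff(all.vars(model_formula), c(names(values), names(start)))
print(unmatched)  # "y" "x"
```

Anything left in `unmatched` is what nls() reports as "parameters without starting value in 'data'".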
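The fit of this four-parameter model (residual sum-of-squares 86758) is just as poor as that of the model offered by Dave2e. The data look strongly parabolic.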