Fitting a polynomial model to data in R
Year <- c(1000,1500,1600,1700,1750,1800,1850,1900,1950,1955,1960,1965,1970,1975,1980,1985,1990,1995,2000,2005,2010,2015)
Africa <- c(70,86,114,106,106,107,111,133,229,254,285,322,366,416,478,550,632,720,814,920,1044,1186)
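How can I use polynomial linear regression to find the population for the following years: 1925, 1963, 1978, 1988, 1998?
Here is a starting point for solving your problem.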
Year <- c(1000,1500,1600,1700,1750,1800,1850,1900,1950,1955,1960,1965,
1970,1975,1980,1985,1990,1995,2000,2005,2010,2015)
Africa <- c(70,86,114,106,106,107,111,133,229,254,285,322,366,416,478,550,
632,720,814,920,1044,1186)
df <- data.frame(Year, Africa)
# Polynomial linear regression of order 5
model1 <- lm(Africa ~ poly(Year,5), data=df)
summary(model1)
###########
Call:
lm(formula = Africa ~ poly(Year, 5), data = df)

Residuals:
    Min      1Q  Median      3Q     Max
-59.639 -27.119 -12.397   9.149  97.398

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)      411.32      10.12  40.643  < 2e-16 ***
poly(Year, 5)1   881.26      47.47  18.565 3.01e-12 ***
poly(Year, 5)2   768.50      47.47  16.190 2.42e-11 ***
poly(Year, 5)3   709.43      47.47  14.945 8.07e-11 ***
poly(Year, 5)4   628.45      47.47  13.239 4.89e-10 ***
poly(Year, 5)5   359.04      47.47   7.564 1.14e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 47.47 on 16 degrees of freedom
Multiple R-squared:  0.9852,    Adjusted R-squared:  0.9805
F-statistic: 212.5 on 5 and 16 DF,  p-value: 4.859e-14
#############
pred <- predict(model1)
plot(Year, Africa, type="o", xlab="Year", ylab="Africa")
lines(Year, pred, lwd=2, col="red")
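The model estimated above fits poorly for years before 1900, so it is better to estimate the model only on the data after 1900. Note that the filter Year > 1900 used below also drops the 1900 observation itself, so the second model is fitted on 1950 to 2015.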
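One rough way to check where the order-5 fit breaks down is to compare the size of its residuals, relative to the observed population, before and after 1900. This is only an illustrative sketch: the cut at 1900 and the names `period` and `rel_err_by_period` are assumptions introduced here, not part of the original answer.

```r
# Data as given in the question
Year <- c(1000,1500,1600,1700,1750,1800,1850,1900,1950,1955,1960,1965,
          1970,1975,1980,1985,1990,1995,2000,2005,2010,2015)
Africa <- c(70,86,114,106,106,107,111,133,229,254,285,322,366,416,478,550,
            632,720,814,920,1044,1186)
df <- data.frame(Year, Africa)

# Same order-5 model as in the answer
model1 <- lm(Africa ~ poly(Year, 5), data = df)

# Hypothetical grouping: observations up to 1900 versus later ones
period <- ifelse(df$Year <= 1900, "up to 1900", "after 1900")

# Mean absolute residual as a fraction of the observed value, per period
rel_err <- abs(resid(model1)) / df$Africa
rel_err_by_period <- tapply(rel_err, period, mean)
print(rel_err_by_period)
```

A noticeably larger relative error in one period would suggest that a single polynomial is not describing both periods well.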
# Polynomial linear regression of order 2
df2 <- subset(df,Year>1900)
model2 <- lm(Africa ~ poly(Year,2), data=df2)
summary(model2)
###########
Call:
lm(formula = Africa ~ poly(Year, 2), data = df2)

Residuals:
   Min     1Q Median     3Q    Max
-9.267 -2.489 -0.011  3.334 12.482

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)     586.857      1.677  349.93  < 2e-16 ***
poly(Year, 2)1 1086.646      6.275  173.17  < 2e-16 ***
poly(Year, 2)2  245.687      6.275   39.15 3.65e-13 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.275 on 11 degrees of freedom
Multiple R-squared:  0.9997,    Adjusted R-squared:  0.9996
F-statistic: 1.576e+04 on 2 and 11 DF,  p-value: < 2.2e-16
###########
df2$pred <- predict(model2)
plot(df2$Year, df2$Africa, type="o", xlab="Year", ylab="Africa")
lines(df2$Year, df2$pred, lwd=2, col="red")
The second model fits its data much more closely: the residual standard error drops from 47.47 to 6.275.
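The comparison can also be made programmatically instead of reading it off the printed summaries. The sketch below assumes R 3.3.0 or later, where `sigma()` is available for `lm` fits; the data frame name `fit_summary` is introduced here for illustration. Because the two models are fitted on different subsets of the data, the figures are indicative rather than a formal test.

```r
# Data and models as in the answer
Year <- c(1000,1500,1600,1700,1750,1800,1850,1900,1950,1955,1960,1965,
          1970,1975,1980,1985,1990,1995,2000,2005,2010,2015)
Africa <- c(70,86,114,106,106,107,111,133,229,254,285,322,366,416,478,550,
            632,720,814,920,1044,1186)
df  <- data.frame(Year, Africa)
df2 <- subset(df, Year > 1900)

model1 <- lm(Africa ~ poly(Year, 5), data = df)
model2 <- lm(Africa ~ poly(Year, 2), data = df2)

# Residual standard error and adjusted R-squared side by side
fit_summary <- data.frame(
  model      = c("model1 (order 5, all years)", "model2 (order 2, after 1900)"),
  n_obs      = c(nobs(model1), nobs(model2)),
  sigma      = c(sigma(model1), sigma(model2)),
  adj_r_sq   = c(summary(model1)$adj.r.squared, summary(model2)$adj.r.squared)
)
print(fit_summary)
```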
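Finally, we obtain the model predictions for 1925, 1963, 1978, 1988 and 1998. Keep in mind that 1925 lies outside the 1950 to 2015 range used to fit model2, so that value is an extrapolation and should be treated with caution: the model returns a 1925 figure that is higher than the population observed in 1950 (229).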
df3 <- data.frame(Year=c(1925, 1963, 1978, 1988, 1998))
df3$pred <- predict(model2, newdata=df3)
df3
  Year     pred
1 1925 286.4863
2 1963 301.1507
3 1978 451.7210
4 1988 597.6301
5 1998 779.9623
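Point predictions alone say nothing about their uncertainty. As a possible extension, `predict()` for `lm` objects can return prediction intervals through its `interval` argument. The sketch below also flags requested years that fall outside the range of the data used to fit `model2`; the names `pred_int`, `result` and the `extrapolated` column are assumptions added here for illustration.

```r
# Data and second model as in the answer
Year <- c(1000,1500,1600,1700,1750,1800,1850,1900,1950,1955,1960,1965,
          1970,1975,1980,1985,1990,1995,2000,2005,2010,2015)
Africa <- c(70,86,114,106,106,107,111,133,229,254,285,322,366,416,478,550,
            632,720,814,920,1044,1186)
df  <- data.frame(Year, Africa)
df2 <- subset(df, Year > 1900)
model2 <- lm(Africa ~ poly(Year, 2), data = df2)

df3 <- data.frame(Year = c(1925, 1963, 1978, 1988, 1998))

# 95% prediction intervals: a matrix with columns fit, lwr, upr
pred_int <- predict(model2, newdata = df3, interval = "prediction", level = 0.95)

# Combine with the requested years and mark extrapolated ones
result <- cbind(df3, pred_int)
result$extrapolated <- df3$Year < min(df2$Year) | df3$Year > max(df2$Year)
print(result)
```

Rows marked as extrapolated rely on the quadratic shape holding outside the observed years, which the data cannot confirm.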