返回 NA 的模型总结
Summary of model returning NA
我是 r 的新手,不确定如何修复我遇到的错误。
这是我的数据摘要:
> summary(data)
Metro MrktRgn MedAge numHmSales
Abilene : 1 Austin-Waco-Hill Country : 6 20-25: 3 Min. : 302
Amarillo : 1 Far West Texas : 1 25-30: 6 1st Qu.: 1057
Arlington: 1 Gulf Coast - Brazos Bottom:10 30-35:28 Median : 2098
Austin : 1 Northeast Texas :14 35-40: 6 Mean : 7278
Bay Area : 1 Panhandle and South Plains: 5 45-50: 2 3rd Qu.: 5086
Beaumont : 1 South Texas : 7 50-55: 1 Max. :83174
(Other) :40 West Texas : 3
AvgSlPr totNumLs MedHHInc Pop
Min. :123833 Min. : 1257 Min. :37300 Min. : 2899
1st Qu.:149117 1st Qu.: 6028 1st Qu.:53100 1st Qu.: 56876
Median :171667 Median : 11106 Median :57000 Median : 126482
Mean :188637 Mean : 24302 Mean :60478 Mean : 296529
3rd Qu.:215175 3rd Qu.: 25472 3rd Qu.:66200 3rd Qu.: 299321
Max. :303475 Max. :224230 Max. :99205 Max. :2196000
NA's :1
然后我制作了一个模型,其中 AvSlPr 作为 y 变量,其他变量作为 x 变量
> model1 = lm(AvgSlPr ~ Metro + MrktRgn + MedAge + numHmSales + totNumLs + MedHHInc + Pop)
但是当我对模型进行总结时,我得到了 Std 的 NA。误差、t 值和 t p 值。
> summary(model1)
Call:
lm(formula = AvgSlPr ~ Metro + MrktRgn + MedAge + numHmSales +
totNumLs + MedHHInc + Pop)
Residuals:
ALL 45 residuals are 0: no residual degrees of freedom!
Coefficients: (15 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 143175 NA NA NA
MetroAmarillo 24925 NA NA NA
MetroArlington 35258 NA NA NA
MetroAustin 160300 NA NA NA
MetroBay Area 68642 NA NA NA
MetroBeaumont 5942 NA NA NA
...
MrktRgnWest Texas NA NA NA NA
MedAge25-30 NA NA NA NA
MedAge30-35 NA NA NA NA
MedAge35-40 NA NA NA NA
MedAge45-50 NA NA NA NA
MedAge50-55 NA NA NA NA
numHmSales NA NA NA NA
totNumLs NA NA NA NA
MedHHInc NA NA NA NA
Pop NA NA NA NA
Residual standard error: NaN on 0 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 1, Adjusted R-squared: NaN
F-statistic: NaN on 44 and 0 DF, p-value: NA
有谁知道出了什么问题以及我该如何解决这个问题?另外,我不应该使用虚拟变量。
您的 Metro
变量始终引用每个因子水平的一行。您至少需要两个点才能拟合一条线。让我举个例子:
dat = data.frame(AvgSlPr=runif(4), Metro = factor(LETTERS[1:4]), MrktRgn = runif(4))
model1 = lm(AvgSlPr ~ Metro + MrktRgn, data = dat)
summary(model1)
#Call:
#lm(formula = AvgSlPr ~ Metro + MrktRgn, data = dat)
#Residuals:
#ALL 4 residuals are 0: no residual degrees of freedom!
#Coefficients: (1 not defined because of singularities)
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.33801 NA NA NA
#MetroB 0.47350 NA NA NA
#MetroC -0.04118 NA NA NA
#MetroD 0.20047 NA NA NA
#MrktRgn NA NA NA NA
#Residual standard error: NaN on 0 degrees of freedom
#Multiple R-squared: 1, Adjusted R-squared: NaN
#F-statistic: NaN on 3 and 0 DF, p-value: NA
但是如果我们加入更多的数据,使得至少有一些因子水平有不止一行的数据,就可以计算出线性模型:
dat = rbind(dat, data.frame(AvgSlPr=2:4, Metro=factor(LETTERS[2:4]), MrktRgn = 3:5))
model2 = lm(AvgSlPr ~ Metro + MrktRgn, data=dat)
summary(model2)
#Call:
#lm(formula = AvgSlPr ~ Metro + MrktRgn, data = dat)
#Residuals:
# 1 2 3 4 5 6 7
# 9.021e-17 2.643e-01 7.304e-03 -1.498e-01 -2.643e-01 -7.304e-03 1.498e-01
#Coefficients:
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.24279 0.30406 0.798 0.50834
#MetroB -0.10207 0.38858 -0.263 0.81739
#MetroC -0.06696 0.39471 -0.170 0.88090
#MetroD 0.06804 0.41243 0.165 0.88413
#MrktRgn 0.70787 0.06747 10.491 0.00896 **
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#Residual standard error: 0.3039 on 2 degrees of freedom
#Multiple R-squared: 0.9857, Adjusted R-squared: 0.9571
#F-statistic: 34.45 on 4 and 2 DF, p-value: 0.02841
用于拟合模型的数据需要重新考虑。分析的目标是什么?实现目标需要哪些数据?
我是 r 的新手,不确定如何修复我遇到的错误。
这是我的数据摘要:
> summary(data)
Metro MrktRgn MedAge numHmSales
Abilene : 1 Austin-Waco-Hill Country : 6 20-25: 3 Min. : 302
Amarillo : 1 Far West Texas : 1 25-30: 6 1st Qu.: 1057
Arlington: 1 Gulf Coast - Brazos Bottom:10 30-35:28 Median : 2098
Austin : 1 Northeast Texas :14 35-40: 6 Mean : 7278
Bay Area : 1 Panhandle and South Plains: 5 45-50: 2 3rd Qu.: 5086
Beaumont : 1 South Texas : 7 50-55: 1 Max. :83174
(Other) :40 West Texas : 3
AvgSlPr totNumLs MedHHInc Pop
Min. :123833 Min. : 1257 Min. :37300 Min. : 2899
1st Qu.:149117 1st Qu.: 6028 1st Qu.:53100 1st Qu.: 56876
Median :171667 Median : 11106 Median :57000 Median : 126482
Mean :188637 Mean : 24302 Mean :60478 Mean : 296529
3rd Qu.:215175 3rd Qu.: 25472 3rd Qu.:66200 3rd Qu.: 299321
Max. :303475 Max. :224230 Max. :99205 Max. :2196000
NA's :1
然后我制作了一个模型,其中 AvSlPr 作为 y 变量,其他变量作为 x 变量
> model1 = lm(AvgSlPr ~ Metro + MrktRgn + MedAge + numHmSales + totNumLs + MedHHInc + Pop)
但是当我对模型进行总结时,我得到了 Std 的 NA。误差、t 值和 t p 值。
> summary(model1)
Call:
lm(formula = AvgSlPr ~ Metro + MrktRgn + MedAge + numHmSales +
totNumLs + MedHHInc + Pop)
Residuals:
ALL 45 residuals are 0: no residual degrees of freedom!
Coefficients: (15 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 143175 NA NA NA
MetroAmarillo 24925 NA NA NA
MetroArlington 35258 NA NA NA
MetroAustin 160300 NA NA NA
MetroBay Area 68642 NA NA NA
MetroBeaumont 5942 NA NA NA
...
MrktRgnWest Texas NA NA NA NA
MedAge25-30 NA NA NA NA
MedAge30-35 NA NA NA NA
MedAge35-40 NA NA NA NA
MedAge45-50 NA NA NA NA
MedAge50-55 NA NA NA NA
numHmSales NA NA NA NA
totNumLs NA NA NA NA
MedHHInc NA NA NA NA
Pop NA NA NA NA
Residual standard error: NaN on 0 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 1, Adjusted R-squared: NaN
F-statistic: NaN on 44 and 0 DF, p-value: NA
有谁知道出了什么问题以及我该如何解决这个问题?另外,我不应该使用虚拟变量。
您的 Metro
变量始终引用每个因子水平的一行。您至少需要两个点才能拟合一条线。让我举个例子:
dat = data.frame(AvgSlPr=runif(4), Metro = factor(LETTERS[1:4]), MrktRgn = runif(4))
model1 = lm(AvgSlPr ~ Metro + MrktRgn, data = dat)
summary(model1)
#Call:
#lm(formula = AvgSlPr ~ Metro + MrktRgn, data = dat)
#Residuals:
#ALL 4 residuals are 0: no residual degrees of freedom!
#Coefficients: (1 not defined because of singularities)
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.33801 NA NA NA
#MetroB 0.47350 NA NA NA
#MetroC -0.04118 NA NA NA
#MetroD 0.20047 NA NA NA
#MrktRgn NA NA NA NA
#Residual standard error: NaN on 0 degrees of freedom
#Multiple R-squared: 1, Adjusted R-squared: NaN
#F-statistic: NaN on 3 and 0 DF, p-value: NA
但是如果我们加入更多的数据,使得至少有一些因子水平有不止一行的数据,就可以计算出线性模型:
dat = rbind(dat, data.frame(AvgSlPr=2:4, Metro=factor(LETTERS[2:4]), MrktRgn = 3:5))
model2 = lm(AvgSlPr ~ Metro + MrktRgn, data=dat)
summary(model2)
#Call:
#lm(formula = AvgSlPr ~ Metro + MrktRgn, data = dat)
#Residuals:
# 1 2 3 4 5 6 7
# 9.021e-17 2.643e-01 7.304e-03 -1.498e-01 -2.643e-01 -7.304e-03 1.498e-01
#Coefficients:
# Estimate Std. Error t value Pr(>|t|)
#(Intercept) 0.24279 0.30406 0.798 0.50834
#MetroB -0.10207 0.38858 -0.263 0.81739
#MetroC -0.06696 0.39471 -0.170 0.88090
#MetroD 0.06804 0.41243 0.165 0.88413
#MrktRgn 0.70787 0.06747 10.491 0.00896 **
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#Residual standard error: 0.3039 on 2 degrees of freedom
#Multiple R-squared: 0.9857, Adjusted R-squared: 0.9571
#F-statistic: 34.45 on 4 and 2 DF, p-value: 0.02841
用于拟合模型的数据需要重新考虑。分析的目标是什么?实现目标需要哪些数据?