Calculating PRESS statistic using own data set produces error in R
I am trying to calculate the PRESS statistic using the PRESS() function from the qpcR package. I first fit a regression model to my imported data:
> job_proficiency_lm_first_order_formula_best = job_proficiency ~ T_1 + T_3 + T_4
> job_proficiency_lm_first_order_best_subs = lm(data = Job_Proficiency, formula = job_proficiency_lm_first_order_formula_best)
> summary(job_proficiency_lm_first_order_best_subs)
Call:
lm(formula = job_proficiency_lm_first_order_formula_best, data = Job_Proficiency)
Residuals:
Min 1Q Median 3Q Max
-5.4579 -3.1563 -0.2057 1.8070 6.6083
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -124.20002 9.87406 -12.578 3.04e-11 ***
T_1 0.29633 0.04368 6.784 1.04e-06 ***
T_3 1.35697 0.15183 8.937 1.33e-08 ***
T_4 0.51742 0.13105 3.948 0.000735 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.072 on 21 degrees of freedom
Multiple R-squared: 0.9615, Adjusted R-squared: 0.956
F-statistic: 175 on 3 and 21 DF, p-value: 5.16e-15
As you can see, the regression itself fits without any trouble.
But when I try to calculate the PRESS statistic, I get the following:
> PRESS(object = job_proficiency_lm_first_order_best_subs)
.
Error in eval(predvars, data, env) : object 'T_1' not found
To check whether the PRESS() function itself works, I tried computing the PRESS statistic on one of R's built-in data sets, specifically the swiss data set:
> test = lm(data = swiss, formula = Fertility ~ Agriculture + Examination)
> PRESS(test)
.........10.........20.........30.........40.......
$stat
[1] 4594.711
$residuals
[1] 5.86874937 -0.11299684 8.99475044 9.63703923 6.86207418 -4.99681787 15.67581939 21.66065932 7.37038439 11.95400827 15.75323917 0.44045951 -4.80167644
[14] 2.81771330 -0.11677715 2.18088788 0.62738886 -6.43338393 -2.03263398 0.06287026 2.99119927 -7.88458225 -7.23342328 -8.51283184 -1.12064764 1.82564228
[27] -10.11322228 -9.54214928 -4.12165698 -6.78996076 -8.18443581 -9.65615193 -3.18410523 -2.56286583 -0.78611489 -12.32904436 10.00836421 6.33398831 11.08423270
[40] 7.20518930 6.42985483 15.41461736 4.64693055 4.94386095 -18.45443801 -27.04073067 -23.95733041
$P.square
[1] 0.3598858
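As you can see, that works without any problem, so something must be going on behind the scenes with my own data. What could be causing this?
For reference, here is my imported data set. It is not too large, so I hope posting it does not break any rules: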
> dput(Job_Proficiency)
structure(list(job_proficiency = c(88, 80, 96, 76, 80, 73, 58,
116, 104, 99, 64, 126, 94, 71, 111, 109, 100, 127, 99, 82, 67,
109, 78, 115, 83), T_1 = c(86, 62, 110, 101, 100, 78, 120, 105,
112, 120, 87, 133, 140, 84, 106, 109, 104, 150, 98, 120, 74,
96, 104, 94, 91), T_2 = c(110, 97, 107, 117, 101, 85, 77, 122,
119, 89, 81, 120, 121, 113, 102, 129, 83, 118, 125, 94, 121,
114, 73, 121, 129), T_3 = c(100, 99, 103, 93, 95, 95, 80, 116,
106, 105, 90, 113, 96, 98, 109, 102, 100, 107, 108, 95, 91, 114,
93, 115, 97), T_4 = c(87, 100, 103, 95, 88, 84, 74, 102, 105,
97, 88, 108, 89, 78, 109, 108, 102, 110, 95, 90, 85, 103, 80,
104, 83)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -25L), spec = structure(list(cols = list(
job_proficiency = structure(list(), class = c("collector_double",
"collector")), T_1 = structure(list(), class = c("collector_double",
"collector")), T_2 = structure(list(), class = c("collector_double",
"collector")), T_3 = structure(list(), class = c("collector_double",
"collector")), T_4 = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 0), class = "col_spec"))
EDIT: Thanks to @Otto, the first error has been fixed, but now I am running into another error:
> job_proficiency_lm_first_order_best_subs = lm(data = Job_Proficiency, formula = job_proficiency ~ T_1 + T_3 + T_4)
> PRESS(job_proficiency_lm_first_order_best_subs)
.........10.........20.....
Error in PRESS.res^2 : non-numeric argument to binary operator
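All I did was type my formula directly into the lm() call.
For some reason, PRESS() seems to need the formula written out directly in the lm() call rather than passed in through a variable. This works: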
library('qpcR')
job_proficiency_lm_first_order_best_subs = lm(data = Job_Proficiency, formula = job_proficiency ~ T_1 + T_3 + T_4)
PRESS(job_proficiency_lm_first_order_best_subs)
.........10
$stat
[1] 56.11556
$residuals
[1] 4.24693620 -0.02950692 -0.24941392 -1.68812204 0.37184702 -3.35442911
[7] 1.86363303 -1.48719175 3.34459605 -2.62766088
$P.square
[1] 0.9785162
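If you would rather keep the formula in a variable, one possible workaround is to substitute its value into the call before lm() is evaluated. The sketch below uses the built-in swiss data and only base R; the names f, fit_var and fit_inline are made up for illustration. It shows that the stored call differs between the two approaches. I have not checked how PRESS() reads the call internally, so treat the idea that this alone avoids the first error as an assumption.

```r
# Formula kept in a variable, as in the question (built-in swiss data).
f <- Fertility ~ Agriculture + Examination

# The stored call records only the symbol `f`, not the formula itself.
fit_var <- lm(f, data = swiss)
print(fit_var$call)

# bquote() substitutes the value of `f` into the call before it is evaluated,
# so the stored call carries the full formula.
fit_inline <- eval(bquote(lm(.(f), data = swiss)))
print(fit_inline$call)

# Both fits are the same model; only the recorded call differs.
print(all.equal(coef(fit_var), coef(fit_inline)))
```

Any function that later re-evaluates the stored call (for example via update()) then no longer depends on finding the variable f in its own environment.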
Regarding your second error, "Error in PRESS.res^2 : non-numeric argument to binary operator", I suspect it occurs because your Job_Proficiency is a tibble rather than a data.frame. The two representations behave almost the same, except when they do not.
Perhaps the simplest way to fix the second error is to convert your input data from a tibble to a data.frame via
Job_Proficiency <- as.data.frame(Job_Proficiency)
and then continue with your analysis.
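To illustrate how such a difference could produce that exact message, here is a small base R sketch. It assumes (I have not verified this against the qpcR source) that PRESS() picks out single cells with `[` and stores the result in a numeric vector. A tibble's `[` always returns another tibble, which the sketch imitates on a plain data.frame with drop = FALSE so that no extra package is needed; the variable names are hypothetical.

```r
df <- as.data.frame(swiss)

# A plain data.frame drops a single cell to a bare number.
cell_df <- df[1, 1]
print(is.numeric(cell_df))

# drop = FALSE keeps a 1 x 1 data.frame, imitating what a tibble's `[` returns.
cell_tbl_like <- df[1, 1, drop = FALSE]
print(is.list(cell_tbl_like))

# Storing a data.frame result in a numeric vector turns the vector into a list.
res <- numeric(3)
res[1] <- cell_tbl_like - 70
print(is.list(res))

# Arithmetic on a list fails; capture the error message instead of stopping.
msg <- tryCatch(res^2, error = function(e) conditionMessage(e))
print(msg)
```

After as.data.frame(), single-cell subsetting returns plain numbers again, which would be consistent with the conversion making the error go away.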
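As far as I am concerned, both issues we found (a pre-assigned formula not working, and tibbles causing an error) are clear bugs and should be reported to the package developers.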
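Until that happens, the PRESS statistic of an ordinary lm fit can also be computed without qpcR at all, using the standard identity that the leave-one-out residual equals the ordinary residual divided by (1 - h_ii), where h_ii is the leverage. The sketch below uses only base R and the swiss model from the question; the P.square line assumes that qpcR defines it as 1 - PRESS / total sum of squares, which matches the numbers posted above but is my reading, not something taken from the package documentation.

```r
# Same test model as in the question, fitted on the built-in swiss data.
fit <- lm(Fertility ~ Agriculture + Examination, data = swiss)

# Leave-one-out residuals from the leverages, without refitting the model.
loo_res <- residuals(fit) / (1 - hatvalues(fit))

# PRESS is the sum of squared leave-one-out residuals.
press <- sum(loo_res^2)

# Assumed definition of P.square: 1 - PRESS / total sum of squares.
y <- model.response(model.frame(fit))
p_square <- 1 - press / sum((y - mean(y))^2)

print(c(PRESS = press, P.square = p_square))
```

The result can be compared with the $stat and $P.square values shown for the swiss model in the question. Because this approach only relies on residuals() and hatvalues(), it should not matter whether the formula was pre-assigned or whether the data is a tibble.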