Calculating the PRESS statistic on my own data set produces an error in R

I am trying to compute the PRESS statistic with the PRESS() function from the qpcR package. I first fit a regression model to my imported data:

> job_proficiency_lm_first_order_formula_best = job_proficiency ~ T_1 + T_3 + T_4
> job_proficiency_lm_first_order_best_subs = lm(data = Job_Proficiency, formula = job_proficiency_lm_first_order_formula_best)
> summary(job_proficiency_lm_first_order_best_subs)

Call:
lm(formula = job_proficiency_lm_first_order_formula_best, data = Job_Proficiency)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.4579 -3.1563 -0.2057  1.8070  6.6083 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -124.20002    9.87406 -12.578 3.04e-11 ***
T_1            0.29633    0.04368   6.784 1.04e-06 ***
T_3            1.35697    0.15183   8.937 1.33e-08 ***
T_4            0.51742    0.13105   3.948 0.000735 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.072 on 21 degrees of freedom
Multiple R-squared:  0.9615,    Adjusted R-squared:  0.956 
F-statistic:   175 on 3 and 21 DF,  p-value: 5.16e-15

As you can see, the regression fits without any problem.

But when I try to compute the PRESS statistic, I get the following:

> PRESS(object = job_proficiency_lm_first_order_best_subs)
.
Error in eval(predvars, data, env) : object 'T_1' not found

To check whether PRESS() itself works, I tried it on one of R's built-in data sets, swiss:

> test = lm(data = swiss, formula = Fertility ~ Agriculture + Examination)
> PRESS(test)
.........10.........20.........30.........40.......
$stat
[1] 4594.711

$residuals
 [1]   5.86874937  -0.11299684   8.99475044   9.63703923   6.86207418  -4.99681787  15.67581939  21.66065932   7.37038439  11.95400827  15.75323917   0.44045951  -4.80167644
[14]   2.81771330  -0.11677715   2.18088788   0.62738886  -6.43338393  -2.03263398   0.06287026   2.99119927  -7.88458225  -7.23342328  -8.51283184  -1.12064764   1.82564228
[27] -10.11322228  -9.54214928  -4.12165698  -6.78996076  -8.18443581  -9.65615193  -3.18410523  -2.56286583  -0.78611489 -12.32904436  10.00836421   6.33398831  11.08423270
[40]   7.20518930   6.42985483  15.41461736   4.64693055   4.94386095 -18.45443801 -27.04073067 -23.95733041

$P.square
[1] 0.3598858

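As you can see, that works fine, so something must be going wrong behind the scenes. What might be the problem?

For reference, here is my imported data set. It is not too large, so I hope posting it does not break any rules: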

> dput(Job_Proficiency)
structure(list(job_proficiency = c(88, 80, 96, 76, 80, 73, 58, 
116, 104, 99, 64, 126, 94, 71, 111, 109, 100, 127, 99, 82, 67, 
109, 78, 115, 83), T_1 = c(86, 62, 110, 101, 100, 78, 120, 105, 
112, 120, 87, 133, 140, 84, 106, 109, 104, 150, 98, 120, 74, 
96, 104, 94, 91), T_2 = c(110, 97, 107, 117, 101, 85, 77, 122, 
119, 89, 81, 120, 121, 113, 102, 129, 83, 118, 125, 94, 121, 
114, 73, 121, 129), T_3 = c(100, 99, 103, 93, 95, 95, 80, 116, 
106, 105, 90, 113, 96, 98, 109, 102, 100, 107, 108, 95, 91, 114, 
93, 115, 97), T_4 = c(87, 100, 103, 95, 88, 84, 74, 102, 105, 
97, 88, 108, 89, 78, 109, 108, 102, 110, 95, 90, 85, 103, 80, 
104, 83)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -25L), spec = structure(list(cols = list(
    job_proficiency = structure(list(), class = c("collector_double", 
    "collector")), T_1 = structure(list(), class = c("collector_double", 
    "collector")), T_2 = structure(list(), class = c("collector_double", 
    "collector")), T_3 = structure(list(), class = c("collector_double", 
    "collector")), T_4 = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 0), class = "col_spec"))

Edit: thanks to @Otto the first error is fixed, but now I get a different one:

> job_proficiency_lm_first_order_best_subs = lm(data = Job_Proficiency, formula = job_proficiency ~ T_1 + T_3 + T_4)
> PRESS(job_proficiency_lm_first_order_best_subs)
.........10.........20.....
Error in PRESS.res^2 : non-numeric argument to binary operator

All I did was type my formula directly into the lm() call.

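For some reason, PRESS() seems to want the formula written out directly in the lm() call rather than stored in a variable first. This works: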

library('qpcR')
job_proficiency_lm_first_order_best_subs = lm(data = Job_Proficiency, formula = job_proficiency ~ T_1 + T_3 + T_4)
PRESS(job_proficiency_lm_first_order_best_subs)
.........10
$stat
[1] 56.11556

$residuals
 [1]  4.24693620 -0.02950692 -0.24941392 -1.68812204  0.37184702 -3.35442911
 [7]  1.86363303 -1.48719175  3.34459605 -2.62766088

$P.square
[1] 0.9785162
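
A possible reason the two ways of fitting behave differently: lm() stores the call as it was typed, and functions that refit a model (update(), for example) re-evaluate that stored call. The sketch below uses the built-in swiss data and an illustrative variable name f, and only shows what ends up in the stored call. The do.call() variant is a common base-R idiom for keeping the formula in a variable while still embedding its value in the call; I have not checked whether that alone satisfies PRESS().

```r
# Formula stored in a variable first (as in the question)
f <- Fertility ~ Agriculture + Examination
fit_var <- lm(formula = f, data = swiss)

# Formula written directly in the call (as in the working example)
fit_inline <- lm(formula = Fertility ~ Agriculture + Examination, data = swiss)

# The stored call keeps only the variable NAME in the first case
print(fit_var$call)
print(fit_inline$call)
is.name(fit_var$call$formula)      # TRUE: just the symbol `f`
is.call(fit_inline$call$formula)   # TRUE: the formula expression itself

# Keep the formula in a variable, but embed its value in the stored call
fit_embedded <- do.call("lm", list(formula = f, data = quote(swiss)))
print(fit_embedded$call)
```

All three fits have identical coefficients; only the recorded call differs.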

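As for your second error, "Error in PRESS.res^2 : non-numeric argument to binary operator", I suspect it occurs because your Job_Proficiency is a tibble rather than a plain data.frame (the dput() output shows the classes "spec_tbl_df" and "tbl_df"). The two representations behave almost identically, except when they do not.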

Perhaps the easiest way to fix the second error is to convert your input data from a tibble to a data.frame:

Job_Proficiency <- as.data.frame(Job_Proficiency) 

Then continue with your analysis.
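
To illustrate where a tibble and a data.frame part ways, here is a small sketch on toy data. It assumes the tibble package is installed, and the object names are made up for the example. Whether PRESS() trips over exactly this step is my guess; I have not traced it through the qpcR source.

```r
library(tibble)  # assumption: the tibble package is installed

df <- data.frame(y = c(88, 80, 96), x = c(86, 62, 110))
tb <- as_tibble(df)

# Picking a single cell: a data.frame drops to a bare number, a tibble does not
cell_df <- df[1, 1]
cell_tb <- tb[1, 1]
is.numeric(cell_df)      # TRUE
is.data.frame(cell_tb)   # TRUE: still a 1 x 1 tibble

# Storing such a cell in a numeric vector silently turns the vector into a list
res <- numeric(3)
res[1] <- cell_tb
is.list(res)             # TRUE

# Arithmetic on a list fails; capture the message instead of stopping
msg <- tryCatch(res^2, error = function(e) conditionMessage(e))
print(msg)

# After conversion the cell is a bare number again
is.numeric(as.data.frame(tb)[1, 1])  # TRUE
```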

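As far as I am concerned, both issues we found (the formula cannot be assigned to a variable beforehand, and tibbles cause an error) are clear bugs and should be reported to the package developers.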
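
As a cross-check that does not depend on qpcR at all, PRESS for an ordinary (unweighted) lm fit can be computed from the hat values, because the leave-one-out residual equals e_i / (1 - h_ii). The helper name press_lm below is made up for this sketch. It never refits the model, so it should not matter whether the data is a tibble or whether the formula was stored in a variable.

```r
# PRESS via the leave-one-out identity for ordinary least squares:
#   e_(i) = e_i / (1 - h_ii)
# press_lm is a hypothetical helper, assuming an unweighted lm fit
press_lm <- function(fit) {
  loo_res <- residuals(fit) / (1 - hatvalues(fit))  # leave-one-out residuals
  y <- model.response(model.frame(fit))             # observed response
  stat <- sum(loo_res^2)
  list(stat = stat,
       residuals = unname(loo_res),
       P.square = 1 - stat / sum((y - mean(y))^2))
}

test <- lm(Fertility ~ Agriculture + Examination, data = swiss)
press_swiss <- press_lm(test)
press_swiss$stat
press_swiss$P.square
```

For the swiss model these values should agree with the $stat and $P.square that PRESS() printed in the question (4594.711 and 0.3598858). The same call, press_lm(job_proficiency_lm_first_order_best_subs), can then be applied to the model fitted on Job_Proficiency.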