4-way ANOVA in SAS: my code is not displaying the F values and P values correctly because of the error DF. Where am I going wrong?

Question

                     Alloy 97-1-1-1              Alloy AuCa
Dentist  Method  1500°F  1600°F  1700°F    1500°F  1600°F  1700°F

1        1          813     792     792       907     792     835
         2          782     698     665      1115     835     870
         3          752     620     835       847     560     585

2        1          715     803     813       858     907     882
         2          772     782     743       933     792     824
         3          835     715     673       698     734     681

3        1          743     627     752       858     762     724
         2          813     743     613       824     847     782
         3          743     681     743       715     824     681

4        1          792     743     762       894     792     649
         2          690     882     772       813     870     858
         3          493     707     289       715     813     312

5        1          707     698     715       772    1048     870
         2          803     665     752       824     933     835
         3          421     483     405       536     405     312

Here is my SAS code for the data above:

data gold;
    do dentist=1, 2, 3, 4, 5;
        do method=1, 2, 3;
            do alloy=1, 2;
                do temp=1500, 1600, 1700;
                    input y @@;
                    output;
                end;
            end;
        end;
    end;

cards;
813 792 792     907 792 835
782 698 665     1115 835 870
752 620 835     847 560 585

715 803 813     858 907 882
772 782 743     933 792 824
835 715 673     698 734 681

743 627 752     858 762 724
813 743 613     824 847 782
743 681 743     715 824 681

792 743 762     894 792 649
690 882 772     813 870 858
493 707 289     715 813 312

707 698 715     772 1048 870
803 665 752     824 933 835
421 483 405     536 405 312
;
ODS graphics on;
proc GLM data=gold;
class dentist method alloy temp;
model y=dentist|method|alloy|temp;
run; quit;

Where am I going wrong?

Here is part of the output:

The GLM Procedure
Dependent Variable: y 

    Source   DF   Sum of Squares   Mean Square   F Value   Pr > F
    Model    89      1891095.556     21248.265         .        .
    Error     0            0.000             .
    Total    89      1891095.556

    R-Square   Coeff Var   Root MSE     y Mean
    1.000000           .          .   741.7778

The Error line should be: Residuals 75772.0 16 4735.7 (sum of squares, DF, mean square).

The residuals/error should not be 0; the whole code is wrong. :(

I also need to know how to create an interaction plot/graph for the code above. Any help with my code would be greatly appreciated.

Answer 1

This is a classic example of overfitting.

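You have only 90 measurements, so once the intercept is fitted there are 89 degrees of freedom (DF) left to explain. To fit them, you are using: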

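- 1 intercept
- plus 5 factors for the different dentists, with one constraint: they must sum to 0, i.e. 4 DF
- plus 3 factors for the methods, again minus one constraint, i.e. 2 DF
- plus 15 factors for the dentist × method combinations, which must satisfy the 8 constraints below. Because these constraints are not fully independent (any 7 of them imply the 8th), they remove only 7 DF, i.e. you allow GLM 8 DF:
  - for each dentist, the factors summed over all methods must be 0, and
  - for each method, the factors summed over all dentists must be 0

and so on.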
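Carrying the same bookkeeping through every term of dentist|method|alloy|temp shows where all 89 DF go. Each effect gets the product of (number of levels − 1) over its factors, with d = dentist (5 levels), m = method (3), a = alloy (2) and t = temperature (3):

```latex
\begin{aligned}
\text{main effects: } & 4 + 2 + 1 + 2 = 9 \\
\text{two-way: } & \underbrace{4\cdot 2}_{dm} + \underbrace{4\cdot 1}_{da} + \underbrace{4\cdot 2}_{dt} + \underbrace{2\cdot 1}_{ma} + \underbrace{2\cdot 2}_{mt} + \underbrace{1\cdot 2}_{at} = 28 \\
\text{three-way: } & \underbrace{4\cdot 2\cdot 1}_{dma} + \underbrace{4\cdot 2\cdot 2}_{dmt} + \underbrace{4\cdot 1\cdot 2}_{dat} + \underbrace{2\cdot 1\cdot 2}_{mat} = 36 \\
\text{four-way: } & 4\cdot 2\cdot 1\cdot 2 = 16 \\
\text{total: } & 9 + 28 + 36 + 16 = 89 = 5\cdot 3\cdot 2\cdot 3 - 1
\end{aligned}
```

With exactly one observation per dentist × method × alloy × temperature cell, the model uses all 89 DF and none remain for the error term. That is why the Error row shows 0 DF and SAS prints "." for the F value and Pr > F: both need an error mean square in the denominator.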

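In short, you allow the GLM procedure to use 1 intercept plus 89 other DF to fit only 90 values, so GLM can produce a model that fits your data exactly. No wonder there is no error left!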
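If you do want to keep the dentists in the model, a common alternative is to drop only the highest-order interaction so that its 4 × 2 × 1 × 2 = 16 DF are pooled into the error term. In PROC GLM, the @3 suffix on the bar operator keeps only effects involving at most three factors. This is a sketch under two assumptions: that the four-way interaction is negligible, and that the GOLD data set from the question already exists. The resulting 16 error DF agree with the "Residuals ... 16" line the question expects, but I have not verified that the error sum of squares comes out as 75772.0.

```sas
/* Sketch: all main effects and interactions up to three factors.   */
/* The dentist*method*alloy*temp term (16 DF) is left out, so those */
/* 16 DF become the error term and F values / p-values can be       */
/* computed. Assumes the GOLD data set from the question exists.     */
proc glm data=gold;
    class dentist method alloy temp;
    model y = dentist|method|alloy|temp @3;
run;
quit;
```

The overall ANOVA table should then show 73 model DF (89 − 16) and 16 error DF.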

To understand this better:

Introduce fake measurements that differ slightly from the real ones, for example like this:

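data gold;
    do dentist=1, 2, 3, 4, 5;
        do method=1, 2, 3;
            do alloy=1, 2;
                do temp=1500, 1600, 1700;
                    input y @@;
                    output;                                 /* real measurement           */
                    y = y + 0.1 * rand('NORMAL', 0, 500);   /* add normal noise, SD 50    */
                    output;                                 /* fake, slightly different   */
                end;
            end;
        end;
    end;
    /* the same 15 data lines as in the question follow CARDS, ended by ; */
cards;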

Now your output might look like this:

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             89      19556981.91     219741.37      1.45   0.0403
Error             90      13643754.57     151597.27
Corrected Total  179      33200736.48

R-Square   Coeff Var   Root MSE     y Mean
0.589053       51.89   389.3549   750.2041

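(Not exactly, because I introduced some randomness.) You still give GLM an intercept and 89 factors (DF) to choose, but now you ask it to fit 180 values (1 intercept and 179 DF), which leaves 90 DF for the error.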

What you should do

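Unless you ask the dentists to make 90 more measurements, choose a simpler model. I suppose you are not interested in evaluating the dentists, only the technique, i.e. method, alloy and temperature, so write: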

proc GLM data=gold;
    class dentist method alloy temp;
    model y=method|alloy|temp; ** <- nothing about dentists here **;
run; quit;

The result will be:

Dependent Variable: y
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             17       905055.156     53238.539      3.89   <.0001
Error             72       986040.400     13695.006
Corrected Total   89      1891095.556

R-Square   Coeff Var   Root MSE     y Mean
0.478588    15.77638   117.0257   741.7778

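This tells you that the simpler model explains more about your numbers (mean square 53238.539) than the "error" it leaves unexplained (mean square 13695.006), and that the probability of this happening by chance is extremely small (less than 0.01%).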
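The F value and Pr > F in this table can be reproduced from the printed mean squares: F is the model mean square divided by the error mean square, and the p-value is the upper-tail probability of an F distribution with 17 and 72 DF. A small DATA step sketch using the SAS SDF function (numbers copied by hand from the output above):

```sas
/* Sketch: reproduce F value and Pr > F of the simpler model */
/* from the mean squares printed in the ANOVA table.          */
data _null_;
    ms_model = 53238.539;   /* Model mean square */
    ms_error = 13695.006;   /* Error mean square */
    df_model = 17;          /* Model DF          */
    df_error = 72;          /* Error DF          */
    f = ms_model / ms_error;              /* F value = MS(model) / MS(error) */
    p = sdf('F', f, df_model, df_error);  /* upper-tail probability, Pr > F  */
    put f= 8.2 p= pvalue6.4;              /* write both values to the log    */
run;
```

This should agree with the 3.89 and <.0001 in the table. In the original saturated model the error mean square is missing (0 error DF), so the same division gives a missing F, which is exactly the "." in the question's output.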

The last part of the output

Source              DF   Type III SS   Mean Square   F Value   Pr > F
method               2   593427.4889   296713.7444     21.67   <.0001
alloy                1   105815.5111   105815.5111      7.73   0.0069
method*alloy         2    54685.0889    27342.5444      2.00   0.1433
temp                 2    82178.0222    41089.0111      3.00   0.0560
method*temp          4    30652.4444     7663.1111      0.56   0.6927
alloy*temp           2    21725.3556    10862.6778      0.79   0.4563
method*alloy*temp    4    16571.2444     4142.8111      0.30   0.8754
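Each F value in this table is that row's mean square divided by the single error mean square of the model (13695.006 on 72 DF), and because the design is balanced (one value in every cell) the seven Type III sums of squares add up, within rounding, to the model sum of squares. Worked through for the first two rows:

```latex
\begin{aligned}
F_{\text{method}} &= \frac{593427.4889 / 2}{13695.006} = \frac{296713.7444}{13695.006} \approx 21.67 \\
F_{\text{alloy}}  &= \frac{105815.5111 / 1}{13695.006} \approx 7.73 \\
\textstyle\sum \text{Type III SS} &= 593427.4889 + 105815.5111 + 54685.0889 + 82178.0222 \\
  &\quad + 30652.4444 + 21725.3556 + 16571.2444 = 905055.1555 \approx SS_{\text{Model}} = 905055.156
\end{aligned}
```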

tells you that:

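- the method very probably makes a difference (less than a 0.01% chance that such a high mean square arises by chance);
- there is statistically significant evidence that the alloy makes a difference (a 0.69% chance that the high mean square is due to chance);
- there is some indication that temperature has an effect (a 5.6% chance that the high mean square is due to chance), but you had better collect more data before publishing that;
- there might be an interaction between method and alloy, but more data would be needed to investigate it.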
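For the interaction plot asked about in the question, one simple approach (a sketch, not the only way) is to compute the cell means with PROC MEANS and draw one line per method against temperature, with a separate panel for each alloy. It assumes the GOLD data set from the question; the names CELLMEANS, MEAN_Y and ALLOYF are my own, and the alloy labels follow the column order of the data table (alloy=1 is 97-1-1-1, alloy=2 is AuCa).

```sas
/* Sketch: interaction plot of mean y per method/alloy/temp cell,   */
/* averaged over the 5 dentists. Assumes the GOLD data set exists.  */
/* CELLMEANS, MEAN_Y and ALLOYF are illustrative names.             */
proc format;
    value alloyf 1 = '97-1-1-1'
                 2 = 'AuCa';
run;

proc means data=gold noprint nway;
    class method alloy temp;
    var y;
    output out=cellmeans mean=mean_y;   /* one row per cell: 3*2*3 = 18 */
run;

proc sgpanel data=cellmeans;
    format alloy alloyf.;
    panelby alloy;                                  /* one panel per alloy */
    series x=temp y=mean_y / group=method markers;  /* one line per method */
    colaxis values=(1500 1600 1700) label='Temperature (°F)';
    rowaxis label='Mean y';
run;
```

Roughly parallel lines suggest little interaction; lines that cross or diverge are what a method*temp interaction would look like, and differences in the pattern between the two panels point to a method*alloy or alloy*temp interaction.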

That is what I would conclude from your experiment.
