Difference between Proc univariate and Proc severity for fitting continuous (positive support) distribution

Question

My goal is to fit data to any distribution with positive support (Weibull (2p), gamma (2p), Pareto (2p), lognormal (2p), exponential (1p)). For my first attempt I used proc univariate. This is my code:

proc univariate data=fit plot outtable=table;
   var week1;
   histogram / exp gamma lognormal weibull pareto;
   inset n mean(5.3) std='Standar Deviasi'(5.3) 
          / pos = ne  header = 'Summary Statistics';
   axis1 label=(a=90 r=0);
   run;

The first thing I noticed is that the Kolmogorov statistic is not shown for the Weibull distribution. Then I switched to proc severity.

proc severity data=fit print=all plots(histogram kernel)=all;
loss week1; 
dist exp pareto gamma logn weibull;
run;

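Now I get the KS statistic for the Weibull distribution. I then compared the KS statistics produced by proc severity and proc univariate. They are different. Why? Which one should I use?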

Answer 1

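I don't have access to SAS/ETS, so I can't confirm this with proc severity, but I imagine the differences you are seeing come down to how the distribution parameters are fitted.

With your proc univariate code you are not requesting estimates for several of the parameters (in some cases a parameter is set to 1 or 0 by default; see sigma and theta in the user guide). For example: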

data have;
    do i = 1 to 1000;
        x = rand("weibull", 5, 5);
        output;
    end;
run;
ods graphics on;
proc univariate data = have;
    var x;
    /* Request maximum likelihood estimates of the scale (sigma) and threshold (theta);
       the shape (c) is estimated by default */
    histogram / weibull(theta = EST sigma = EST);
    /* Default: maximum likelihood estimates of shape and scale, threshold fixed at 0 */
    histogram / weibull;
run;

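You will notice that when the estimate of theta is requested, SAS also produces the KS statistic. This is because of the way SAS computes fit statistics, some of which require the distribution parameters to be known (full explanation here).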

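My guess is that you are seeing different fit statistics between the two procedures because they either return slightly different fits or use different calculations to estimate the fit statistics. If you are interested, you can look into how they perform the parameter estimation in the user guides (proc severity and proc univariate). If you want to investigate further, you could force the distribution parameters to match in the two procedures and then compare the fit statistics to see how much they differ.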
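As a rough sketch of the proc univariate half of that comparison, you can fix all three Weibull parameters instead of estimating them. The values 5, 5 and 0 below are placeholders taken from the simulation, not real estimates; in practice you would substitute the estimates reported by proc severity. As far as I understand (I can't run it to check), proc severity reports the Weibull scale as Theta and the shape as Tau, whereas in proc univariate theta is the threshold, so please verify the mapping against the documentation before copying values across.

```sas
/* Simulated data, as in the example above; the seed is arbitrary */
data have;
    call streaminit(2024);
    do i = 1 to 1000;
        x = rand("weibull", 5, 5);   /* shape = 5, scale = 5 */
        output;
    end;
run;

proc univariate data = have;
    var x;
    /* Fix every parameter rather than estimating it:
       c = shape, sigma = scale, theta = threshold.
       Placeholder values; replace with the estimates from the other procedure. */
    histogram / weibull(c = 5 sigma = 5 theta = 0);
run;
```

With the parameters pinned to the same values, any remaining difference in the KS statistic should come from how each procedure computes it rather than from the fit itself.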

I would recommend that, if possible, you use only one of the procedures, and select the one whose output best suits your needs.

sas

model-fitting

goodness-of-fit