R glm.fit does not return probability?

First post here, and I'm new to R, so apologies if I haven't got this post right :).

I am trying to use glm() to fit a model and then call predict on the fitted model.

  fit_GLM <- glm(y ~., data = traintemp, family = "binomial")
  pred_GLM <- predict(fit_GLM, newdata = testtemp)

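My training data has about 430,000 observations, with 6 predictors and a binary outcome. I have tried coding the outcome both as 0/1 and as FALSE/TRUE.

My test data has about 215,000 observations.

The model runs successfully, but the data returned by the predict function looks a bit odd (to me). I was expecting probabilities, but the function returns: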

         Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
    -0.0433000 -0.0006504  0.0004760  0.0103800  0.0024810  1.0020000 

Am I missing something obvious?

Also, if I run lm() instead, the results are very similar, but it runs much faster. What is going on there?

Edit: a sample of my data:

  TripType VisitNumber Weekday         Upc ScanCount DepartmentDescription FinelineNumber
1        0           7  Friday 60538815980         1                 SHOES           8931
2        0           7  Friday  7410811099         1         PERSONAL CARE           4504
3        0           8  Friday  2006613744         2 PAINT AND ACCESSORIES           1017
4        0           8  Friday  2006618783         2 PAINT AND ACCESSORIES           1017
5        0           8  Friday  7004802737         1 PAINT AND ACCESSORIES           2802
6        0           8  Friday  2238495318         1 PAINT AND ACCESSORIES           4501

Thank you, and happy Thanksgiving!

Edit 2: training data:

   TripType Weekday         Upc ScanCount    DepartmentDescription FinelineNumber
1         0  Friday 60538815980         1                    SHOES           8931
2         0  Friday  7410811099         1            PERSONAL CARE           4504
3         0  Friday  2006613744         2    PAINT AND ACCESSORIES           1017
4         0  Friday  2006618783         2    PAINT AND ACCESSORIES           1017
5         0  Friday  7004802737         1    PAINT AND ACCESSORIES           2802
6         0  Friday  2238495318         1    PAINT AND ACCESSORIES           4501
7         0  Friday  5200010239         1              DSD GROCERY           4606
8         0  Friday 88679300501         2    PAINT AND ACCESSORIES           3504
9         0  Friday  2238400200         2    PAINT AND ACCESSORIES           3565
10        0  Friday 72450408840         1    PAINT AND ACCESSORIES           1028
11        0  Friday 25541500000         2                    DAIRY           1305
12        0  Friday 72450403700         2    PAINT AND ACCESSORIES           1018
13        0  Friday  7874204967         1 HOUSEHOLD CHEMICALS/SUPP            707
14        0  Friday  3270011053         3        PETS AND SUPPLIES           1001
15        0  Friday  1070080727         1      IMPULSE MERCHANDISE            115
16        0  Friday        3107         1                  PRODUCE            103
17        0  Friday        4011         1                  PRODUCE           5501
18        0  Friday  6414410235         1              DSD GROCERY           2008
19        0  Friday  4178900743         1        GROCERY DRY GOODS           3114
20        0  Friday  7800002374         1              DSD GROCERY           3467

Test data:

   TripType Weekday         Upc ScanCount    DepartmentDescription FinelineNumber
1         0  Friday 68113152929        -1       FINANCIAL SERVICES           1000
2         0  Friday  2238403510         2    PAINT AND ACCESSORIES           3565
3         0  Friday  2006613743         1    PAINT AND ACCESSORIES           1017
4         0  Friday  2238400200        -1    PAINT AND ACCESSORIES           3565
5         0  Friday 22006000000         1    MEAT - FRESH & FROZEN           6009
6         0  Friday  2236760452         1    PAINT AND ACCESSORIES              7
7         0  Friday 88679300501        -1    PAINT AND ACCESSORIES           3504
8         0  Friday  3019294203         1    PAINT AND ACCESSORIES           2801
9         0  Friday  2310010776         1        PETS AND SUPPLIES           3300
10        0  Friday  5114139038         1    PAINT AND ACCESSORIES           4415
11        0  Friday  5114197561         1    PAINT AND ACCESSORIES           4415
12        0  Friday  2800053970         1  CANDY, TOBACCO, COOKIES            115
13        0  Friday  7794800902         1              DSD GROCERY           7950
14        0  Friday  7920018317         1      IMPULSE MERCHANDISE            110
15        0  Friday  3500076633         1            PERSONAL CARE            203
16        0  Friday  5460010568         1 HOUSEHOLD CHEMICALS/SUPP             52
17        0  Friday  2899521479         1       FABRICS AND CRAFTS           1059
18        0  Friday  2899521979         1       FABRICS AND CRAFTS           1062
19        0  Friday  1200004300         1              DSD GROCERY           9501
20        0  Friday 88743955560         1                MENS WEAR            144

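From ?predict.glm:

type: the type of prediction required. The default is on the scale of the linear predictors; the alternative "response" is on the scale of the response variable. Thus for a default binomial model the default predictions are of log-odds (probabilities on logit scale) and type = "response" gives the predicted probabilities. The "terms" option returns a matrix giving the fitted values of each term in the model formula on the linear predictor scale.

So in your case: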

pred_GLM <- predict(fit_GLM, newdata = testtemp, type = "response")
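
To see how the two scales relate, here is a minimal sketch on simulated data. The data frame, variable names and coefficients are made up for illustration and are not the asker's data. It compares the default link-scale predictions with type = "response", and checks that applying the inverse logit (plogis) to the former recovers the latter.

```r
# Simulated data: names and coefficients are assumptions for illustration only.
set.seed(42)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))
train <- data.frame(x = x, y = y)
test  <- data.frame(x = rnorm(200))

fit <- glm(y ~ x, data = train, family = binomial)

# Default (type = "link"): log-odds, which are not bounded to [0, 1]
pred_link <- predict(fit, newdata = test)

# type = "response": predicted probabilities
pred_prob <- predict(fit, newdata = test, type = "response")

# The inverse logit of the link-scale predictions should match the probabilities
print(all.equal(plogis(pred_link), pred_prob))
print(range(pred_link))
print(range(pred_prob))
```

If you already have link-scale predictions stored, plogis() converts them to probabilities without calling predict again.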