Graphical representation of a series of probabilities from logistic model with R

I want to make a series of prediction plots for a logit model in R. The model is as follows:

modelo_logit3 <- glm(formula = Sold ~ price+age+poor_prop+airport, data = datos, family = binomial)
summary(modelo_logit3)

Call:
glm(formula = Sold ~ price + age + poor_prop + airport, family = binomial, 
    data = datos)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.8327  -1.0676  -0.3743   1.0907   1.9014  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  4.275016   0.743781   5.748 9.05e-09 ***
price       -0.148547   0.021930  -6.774 1.26e-11 ***
age          0.009497   0.004592   2.068   0.0386 *  
poor_prop   -0.184504   0.029633  -6.226 4.78e-10 ***
airportYES   0.871132   0.200409   4.347 1.38e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 697.28  on 505  degrees of freedom
Residual deviance: 610.46  on 501  degrees of freedom
AIC: 620.46

Number of Fisher Scoring iterations: 4

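I want to show three probability series for the variable Sold in a scatter plot, based on three different price values: 20, 30 and 40. The variables age and airport are held constant, and poor_prop is the variable that varies. In the plot, the Y axis represents the probability and the X axis represents the poor_prop variable. This is what I did: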

# Let's make the predictions and save them in variables to use them later:
a = predict(modelo_logit3, newdata = data.frame(price=20, age=50, 
                                            poor_prop=c(5,25,35,50,65), 
                                            airport= 'YES'), type ="response")

b = predict(modelo_logit3, newdata = data.frame(price=30, age=50, 
                                            poor_prop=c(5,25,35,50,65), 
                                            airport= 'YES'), type ="response")

c = predict(modelo_logit3, newdata = data.frame(price=40, age=50, 
                                            poor_prop=c(5,25,35,50,65), 
                                            airport= 'YES'), type ="response")



# Now, we create a dataframe with the prediction results for different combinations of
# "price" and "poor_prop":

predicciones <- data.frame(
        price = c(rep(20, times=5), rep(30, times=5), rep(40, times=5)),
        
        fitted_values = c(a,b,c),
        
        poor_prop = c(5,25,35,50,65)
        
)

# Let's see the dataframe:
predicciones

# attach of the dataframe:
attach(predicciones)

# Finally, let's make the plot (requires ggplot2):
library(ggplot2)
ggplot(data = predicciones, aes(x = poor_prop, y = fitted_values,
                                col = price)) + geom_point() + geom_line() + 
  scale_color_gradient(low="blue", high="red")

This is the data frame I created:

price fitted_values poor_prop
20  8.490973e-01    5       
20  1.231930e-01    25      
20  2.171980e-02    35      
20  1.392686e-03    50      
20  8.759648e-05    65      
30  5.602225e-01    5       
30  3.082831e-02    25      
30  5.001293e-03    35      
30  3.156376e-04    50      
30  1.983277e-05    65
40  2.238433e-01    5       
40  7.149899e-03    25      
40  1.136666e-03    35      
40  7.147629e-05    50      
40  4.490112e-06    65  

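And the plot I get is the following:

However, the correct result would be for each line to connect only the points of its own price, giving three probability series, so I don't understand why all the points are joined together. If anyone has an idea and can help me, I would be very grateful.

Regards!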

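You can convert price to a factor. Because price is numeric, ggplot2 maps it to a continuous colour scale and does not split the data into groups, so geom_line() draws a single path through all the points. With a factor, each price level becomes its own group: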

ggplot(data = predicciones, 
       aes(x = poor_prop, y = fitted_values, col = factor(price))) + 
  geom_point() + 
  geom_line() + 
  scale_color_manual(values = c("blue", "purple", "red"),
                     name = "price")
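
If you would rather keep the continuous blue-to-red gradient from the question, another option is to leave price numeric and add an explicit group aesthetic. The sketch below is only an illustration: since datos is not available here, it rebuilds the fitted values from the rounded coefficients printed by summary() (the coefs vector is my own helper, not part of the original code), and it uses expand.grid() to build the price and poor_prop combinations. With the real model you would call predict() on the grid instead.

```r
library(ggplot2)

# Coefficients copied (rounded) from the summary() output in the question.
# This vector is a stand-in for the fitted model, which is not available here.
coefs <- c(intercept = 4.275016, price = -0.148547, age = 0.009497,
           poor_prop = -0.184504, airportYES = 0.871132)

# One row per combination; poor_prop varies fastest, so rows are ordered
# price = 20 (five rows), then 30, then 40.
predicciones <- expand.grid(poor_prop = c(5, 25, 35, 50, 65),
                            price = c(20, 30, 40))

# Linear predictor with age = 50 and airport = "YES", then the inverse logit.
# With the real model this would be something like:
#   predict(modelo_logit3,
#           newdata = cbind(predicciones, age = 50, airport = "YES"),
#           type = "response")
eta <- coefs["intercept"] +
  coefs["price"] * predicciones$price +
  coefs["age"] * 50 +
  coefs["poor_prop"] * predicciones$poor_prop +
  coefs["airportYES"]
predicciones$fitted_values <- unname(plogis(eta))

# group = price makes geom_line() draw one line per price value,
# while colour = price stays on the continuous gradient.
p <- ggplot(predicciones,
            aes(x = poor_prop, y = fitted_values,
                colour = price, group = price)) +
  geom_point() +
  geom_line() +
  scale_color_gradient(low = "blue", high = "red")
print(p)
```

The factor approach gives a discrete legend with one entry per price, while the group approach keeps a colour bar, which may read better if you later add many more price values.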