Graphical representation of a series of probabilities from a logistic model with R
I want to plot a series of predictions from a logit model in R. The model is as follows:
modelo_logit3 <- glm(formula = Sold ~ price+age+poor_prop+airport, data = datos, family = binomial)
summary(modelo_logit3)
Call:
glm(formula = Sold ~ price + age + poor_prop + airport, family = binomial,
    data = datos)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.8327  -1.0676  -0.3743   1.0907   1.9014

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  4.275016   0.743781   5.748 9.05e-09 ***
price       -0.148547   0.021930  -6.774 1.26e-11 ***
age          0.009497   0.004592   2.068   0.0386 *
poor_prop   -0.184504   0.029633  -6.226 4.78e-10 ***
airportYES   0.871132   0.200409   4.347 1.38e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 697.28  on 505  degrees of freedom
Residual deviance: 610.46  on 501  degrees of freedom
AIC: 620.46

Number of Fisher Scoring iterations: 4
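I want to show, in a scatter plot, three series of probabilities for the variable Sold, based on three different price values: 20, 30 and 40. The variables age and airport are held constant, and poor_prop is the variable that varies. In the plot, the Y axis represents the probability and the X axis represents poor_prop. This is what I did: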
# Let's make the predictions and save them in variables to use them later:
a = predict(modelo_logit3, newdata = data.frame(price=20, age=50,
poor_prop=c(5,25,35,50,65),
airport= 'YES'), type ="response")
b = predict(modelo_logit3, newdata = data.frame(price=30, age=50,
poor_prop=c(5,25,35,50,65),
airport= 'YES'), type ="response")
c = predict(modelo_logit3, newdata = data.frame(price=40, age=50,
poor_prop=c(5,25,35,50,65),
airport= 'YES'), type ="response")
# Now, we create a dataframe with the prediction results for different combinations of
# "price" and "poor_prop":
predicciones <- data.frame(
price = c(rep(20, times=5), rep(30, times=5), rep(40, times=5)),
fitted_values = c(a,b,c),
poor_prop = c(5,25,35,50,65)
)
# Let's see the dataframe:
predicciones
# attach of the dataframe:
attach(predicciones)
# Finally, let's make the plot:
ggplot(data = predicciones, aes(x = poor_prop, y = fitted_values,
col = price)) + geom_point() + geom_line() +
scale_color_gradient(low="blue", high="red")
Here is the data frame I created:
price fitted_values poor_prop
20 8.490973e-01 5
20 1.231930e-01 25
20 2.171980e-02 35
20 1.392686e-03 50
20 8.759648e-05 65
30 5.602225e-01 5
30 3.082831e-02 25
30 5.001293e-03 35
30 3.156376e-04 50
30 1.983277e-05 65
40 2.238433e-01 5
40 7.149899e-03 25
40 1.136666e-03 35
40 7.147629e-05 50
40 4.490112e-06 65
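And the plot I get is the following:
However, the correct result would be for each line to connect only the points for its own price, giving three series of probabilities, so I don't understand why all the points are joined together. If anyone has an idea and can help, I would be grateful.
Regards!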
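You can convert price to a factor. Because price is numeric, ggplot2 maps it to a continuous colour scale and does not split the data into groups, so geom_line() joins all the points as a single series. A factor is discrete, so each price gets its own line: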
ggplot(data = predicciones,
aes(x = poor_prop, y = fitted_values, col = factor(price))) +
geom_point() +
geom_line() +
scale_color_manual(values = c("blue", "purple", "red"),
name = "price")
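If you would rather keep price numeric and keep the blue-to-red gradient from the question, another option is to set the group aesthetic explicitly. The following is only a sketch under some assumptions: since datos is not available here, the probabilities are rebuilt from the rounded coefficients printed in the summary() output (with age = 50 and airport = "YES" held fixed, as in the question) rather than taken from predict(), and the names coefs, grid and eta are made up for the illustration. With the real model you would pass grid to predict(modelo_logit3, newdata = ..., type = "response") instead.

```r
library(ggplot2)

# Coefficients copied from the summary() output in the question (rounded as printed).
# Assumption: these stand in for the fitted model, because `datos` is not available.
coefs <- c(intercept = 4.275016, price = -0.148547, age = 0.009497,
           poor_prop = -0.184504, airportYES = 0.871132)

# One row per combination of poor_prop and price.
# expand.grid() varies the first argument fastest, so rows are ordered by price.
grid <- expand.grid(poor_prop = c(5, 25, 35, 50, 65), price = c(20, 30, 40))

# Linear predictor with age = 50 and airport = "YES" held fixed,
# then the inverse logit (plogis) to turn it into a probability.
eta <- coefs[["intercept"]] +
  coefs[["price"]] * grid$price +
  coefs[["age"]] * 50 +
  coefs[["poor_prop"]] * grid$poor_prop +
  coefs[["airportYES"]]
grid$fitted_values <- plogis(eta)

# group = price draws one line per price value,
# while colour = price keeps the continuous gradient.
p <- ggplot(grid, aes(x = poor_prop, y = fitted_values,
                      colour = price, group = price)) +
  geom_point() +
  geom_line() +
  scale_colour_gradient(low = "blue", high = "red")
print(p)
```

The factor(price) version gives a discrete legend with one entry per price, whereas group = price keeps the colour bar; which one reads better depends on whether you think of the three prices as categories or as points on a scale.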