Drawing the glm decision boundary with ggplot's stat_smooth() function returns the wrong line

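I want to plot the decision boundary after fitting a logistic regression model to my data. I use ggplot and the stat_smooth() function to define the decision boundary line. However, the plot it returns is wrong. A reproducible example follows: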

#-----------------------------------------------------------------------------------------------------
# CONSTRUCT THE DATA
#-----------------------------------------------------------------------------------------------------

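library(data.table)  # data.table(), setDT()
library(ggplot2)     # ggplot(), geom_point(), stat_smooth()
library(ggthemes)    # theme_economist()

X.1_Y.1 <- rnorm(1000, mean = 1.5, sd= 0.3)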

X.2_Y.1 <- rnorm(1000, mean = 1.5, sd= 5)

X.1_Y.0 <- rnorm(99000, mean = 0, sd = 1)

X.2_Y.0 <- rnorm(99000, mean = 0, sd = 1)

data <- data.table(X.1 = c(X.1_Y.1 , X.1_Y.0),
                   X.2 = c(X.2_Y.1  , X.2_Y.0),
                   Y = c(rep(1, 1000) , rep(0, 99000 ))
                   )


#-----------------------------------------------------------------------------------------------------
# FIT A LOGISTIC MODEL ON THE DATA
#-----------------------------------------------------------------------------------------------------


model <- glm(Y ~ X.1 + X.2, data, family = "binomial")

summary(model)

#Call:
#  glm(formula = Y ~ ., family = "binomial", data = data)

#Deviance Residuals: 
#  Min       1Q   Median       3Q      Max  
#-1.6603  -0.1194  -0.0679  -0.0384   4.6263  

#Coefficients:
#  Estimate Std. Error z value Pr(>|z|)    
#(Intercept) -6.04055    0.06636  -91.02   <2e-16 ***
#  X.1          1.60828    0.03854   41.73   <2e-16 ***
#  X.2          0.43272    0.01673   25.87   <2e-16 ***
#  ---
#  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

#(Dispersion parameter for binomial family taken to be 1)

#Null deviance: 11200.3  on 99999  degrees of freedom
#Residual deviance:  8218.5  on 99997  degrees of freedom
#AIC: 8224.5


#-------------------------------------------------------------------------------------------------------
# DEFINE AND DRAW THE DECISION BOUNDARY
#-------------------------------------------------------------------------------------------------------

# 0 = -6.04 + 1.61 * X.1 + 0.44 * X.2  =>  X.2 = 6.04/0.44 - 1.61/0.44 * X.1

setDT(data)


ggplot(data, aes(X.1, X.2, color = as.factor(Y))) +
  geom_point(alpha = 0.2) + 
   stat_smooth(formula = x.2 ~ 6.04/0.44 - (1.61/0.44) * X.1, color = "blue", size = 2) +
  coord_equal() +
  theme_economist()

This returns the following plot:

You can easily see that the line drawn is wrong. According to the formula, when X.1 = 0, X.2 should be 6.04/0.44, which is clearly not the case in this plot.
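For reference, the formula comes from setting the fitted log-odds to zero, which corresponds to a predicted probability of 0.5. A short derivation, using the rounded coefficients from the code comment (note that 0.44 is the rounding used there; the summary output reports 0.43272 for X.2):

```latex
\begin{aligned}
\log\frac{p}{1-p} &= \beta_0 + \beta_1 X_1 + \beta_2 X_2 = 0 \\
X_2 &= -\frac{\beta_0}{\beta_2} - \frac{\beta_1}{\beta_2}\,X_1 \\
    &= \frac{6.04}{0.44} - \frac{1.61}{0.44}\,X_1 \;\approx\; 13.73 - 3.66\,X_1
\end{aligned}
```

So the expected line crosses X.1 = 0 at roughly X.2 = 13.7 and slopes downward.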

Can you tell me where my code goes wrong and how to fix it?

Any advice would be appreciated.

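If you want to draw a line that you have worked out yourself, you should not use stat_smooth; you should use stat_function. stat_smooth() fits its own model to the plotted x and y aesthetics, and its formula argument is a model formula in terms of x and y, not the equation of a line to draw. For example: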

ggplot(data, aes(X.1, X.2, color = as.factor(Y))) +
  geom_point(alpha = 0.2) + 
  stat_function(fun=function(x) {6.04/0.44 - (1.61/0.44) * x}, color = "blue", size = 2) +
  coord_equal()
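
As a possible variation, the intercept and slope can be taken from coef(model) instead of being typed in by hand, which avoids rounding differences. The sketch below is not part of the original answer. It assumes a smaller simulated data set (sizes and seed chosen here only for speed and reproducibility), uses geom_abline() in place of stat_function(), and introduces the names boundary_intercept and boundary_slope purely for illustration.

```r
library(ggplot2)

# Assumption: a reduced version of the question's simulated data (10,000 rows instead of 100,000)
set.seed(1)
n1 <- 100
n0 <- 9900
data <- data.frame(
  X.1 = c(rnorm(n1, mean = 1.5, sd = 0.3), rnorm(n0, mean = 0, sd = 1)),
  X.2 = c(rnorm(n1, mean = 1.5, sd = 5),   rnorm(n0, mean = 0, sd = 1)),
  Y   = c(rep(1, n1), rep(0, n0))
)

model <- glm(Y ~ X.1 + X.2, data = data, family = "binomial")

# Boundary at predicted probability 0.5:
#   b0 + b1 * X.1 + b2 * X.2 = 0  =>  X.2 = -b0/b2 - (b1/b2) * X.1
b <- coef(model)
boundary_intercept <- unname(-b["(Intercept)"] / b["X.2"])
boundary_slope     <- unname(-b["X.1"] / b["X.2"])

# Draw the points and the straight boundary line
boundary_plot <- ggplot(data, aes(X.1, X.2, color = as.factor(Y))) +
  geom_point(alpha = 0.2) +
  geom_abline(intercept = boundary_intercept, slope = boundary_slope,
              color = "blue", linewidth = 1) +
  coord_equal()
print(boundary_plot)
```

Because the boundary is a straight line, geom_abline() is enough here; stat_function() as shown above remains the more general choice when the curve is not linear in x.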