Different objects are not showing up on my ggplot2 plot

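I am studying the returns to university admission for marginal students, and I am trying to make a ggplot2 plot of the following data: the average salary of students who did or did not complete a master's degree in medicine, by the distance from their average GPA (foreign equivalent) to the acceptance score: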

SalaryAfter <- c(287.780,305.181,323.468,339.082,344.738,370.475,373.257,
              372.682,388.939,386.994) 
DistanceGrades <- c("<=-1.0","[-0.9,-0.5]","[-0.4,-0.3]","-0,2","-0.1",
                        "0.0","0.1","[0.2,0.3]","[0.4,0.5]",">=0.5")

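I have to do a regression discontinuity design (RDD), and to run the regression, as far as I understand, I have to recode DistanceGrades as numbers, so I just created a variable z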

z <- -5:4

where 0 is the cutoff (i.e. 0 corresponds to "0.0" in DistanceGrades). Then I build a data frame

df <- data.frame(z,SalaryAfter)
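
Since z replaces the bin labels with plain integers, it can help to keep the original labels in the same data frame so the recoding stays visible. This is only a minimal sketch: the names `df_labeled` and `bin` are illustrative and do not appear in the original code.

```r
SalaryAfter <- c(287.780, 305.181, 323.468, 339.082, 344.738, 370.475, 373.257,
                 372.682, 388.939, 386.994)
DistanceGrades <- c("<=-1.0", "[-0.9,-0.5]", "[-0.4,-0.3]", "-0,2", "-0.1",
                    "0.0", "0.1", "[0.2,0.3]", "[0.4,0.5]", ">=0.5")

# `df_labeled` and `bin` are illustrative names, not part of the original code.
df_labeled <- data.frame(
  z = -5:4,
  bin = factor(DistanceGrades, levels = DistanceGrades),  # keep the bin order
  SalaryAfter = SalaryAfter
)

# The cutoff row: z == 0 should line up with the "0.0" bin.
df_labeled[df_labeled$z == 0, ]
```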

Now my attempt at creating the plot gets a bit messy (I use the package 'fpp3', but I think only ggplot2 and possibly dplyr are actually needed)

df %>% 
  select(z, SalaryAfter) %>% 
  mutate(D = as.factor(ifelse(z >= -0.1, 1, 0))) %>% 
  ggplot(aes(x = z, y = SalaryAfter, color = D)) +
  geom_point(stat = "identity") + 
  geom_smooth(method = "lm") +
  geom_vline(xintercept = 0) + 
  theme(panel.grid = element_line(color = "white",
                                  size = 0.75,
                                  linetype = 1)) +
  xlim(-6,5) +
  xlab("Distance to acceptance score") +
  labs(title = "Figur 1.1", subtitle = "Salary for every distance to the acceptance score")

This produces a plot.

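What I want to do is first split the data with a dummy variable, D=1 if z>0 and D=0 if z<0. Then I plot it with a linear regression and a vertical line at z=0. Finally I add the title and subtitle. Now I have two problems: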

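  1. The x-axis shows -5, -2.5, ... but I want it to show all integers; fractional values are meaningless for the discrete z variable. I tried several different ways to fix this, but none of them worked. I cannot remember everything I tried (theme(panel.grid...), scale_x_discrete, and so on), but the results were all very similar: they removed the x-axis entirely, so that there were no numbers, and sometimes they even removed the axis title.
  2. I want the regression line for the first part of the data to extend to z=0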

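My attempts at both problems gave similar results: when I ran the code, most of the things I tried produced no error message, but they either had no effect on the plot or removed some existing element, which left me confused. I suspect the problem is caused by some elements not working together, but I do not know.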
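
For reference, the split described in the text (D=1 if z>0) and the one in the code (`z >= -0.1`) differ only in where the cutoff observation z=0 ends up. A small sketch, where `D_code` and `D_text` are illustrative names and `z > 0` is just one strict reading of the description:

```r
z <- -5:4

# Split used in the plotting code above: the cutoff itself (z = 0) lands in group 1.
D_code <- as.integer(z >= -0.1)

# A strict reading of the description (z > 0): z = 0 then lands in group 0.
D_text <- as.integer(z > 0)

# Side-by-side comparison; only the z = 0 row differs.
data.frame(z, D_code, D_text)
```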

Try this:

library(tidyverse)

SalaryAfter <- c(287.780,305.181,323.468,339.082,344.738,370.475,373.257,
                 372.682,388.939,386.994) 
DistanceGrades <- c("<=-1.0","[-0.9,-0.5]","[-0.4,-0.3]","-0,2","-0.1",
                    "0.0","0.1","[0.2,0.3]","[0.4,0.5]",">=0.5")

z <- -5:4
df <- data.frame(z,SalaryAfter) %>%
  select(z, SalaryAfter) %>% 
  mutate(D = as.factor(ifelse(z >= -0.1, 1, 0)))

# Fit a lm model for the left part of the panel
fit_data <- lm(SalaryAfter~z, data = filter(df, z <= -0.1)) %>%
  predict(., newdata = data.frame(z = seq(-5, 0, 0.1)), interval = "confidence") %>%
  as.data.frame() %>%
  mutate(z = seq(-5, 0, 0.1), D = factor(0, levels = c(0, 1)))

# Plot
ggplot(mapping = aes(color = D)) +
  geom_ribbon(data = filter(fit_data, z <= 0 & -1 <= z), 
              aes(x = z, ymin = lwr, ymax = upr), 
              fill = "grey70", color = "transparent", alpha = 0.5) +
  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_line(data = fit_data, aes(x = z, y = fit), linewidth = 1) + 
  geom_point(data = df, aes(x = z, y = SalaryAfter), stat = "identity") + 
  geom_smooth(data = df, aes(x = z, y = SalaryAfter), method = "lm") +
  geom_vline(xintercept = 0) + 
  # element_line() takes linewidth instead of size in ggplot2 >= 3.4.0
  theme(panel.grid = element_line(color = "white",
                                  linewidth = 0.75,
                                  linetype = 1)) +
  scale_x_continuous(limits = c(-6, 5), breaks = -6:5) +
  xlab("Distance to acceptance score") +
  labs(title = "Figure 1.1", subtitle = "Salary for every distance to the acceptance score")
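
The two changes that matter in the code above can be isolated in a shorter sketch. It is a simplified variant under my own assumptions, not a replacement for the full version: the names `x_scale`, `left_fit` and `ext` are illustrative, the extension is drawn as a dashed black line without a confidence ribbon, and `z >= 0` / `z < 0` are used because they are equivalent to `z >= -0.1` / `z <= -0.1` for integer z.

```r
library(ggplot2)

SalaryAfter <- c(287.780, 305.181, 323.468, 339.082, 344.738, 370.475, 373.257,
                 372.682, 388.939, 386.994)
df <- data.frame(z = -5:4, SalaryAfter = SalaryAfter)
df$D <- factor(ifelse(df$z >= 0, 1, 0))

# 1. Integer ticks: z is numeric, so keep a continuous scale and set the
#    breaks explicitly instead of switching to scale_x_discrete().
x_scale <- scale_x_continuous(limits = c(-6, 5), breaks = -6:5)

# 2. Extend the left-hand fit to the cutoff: fit on z < 0 only, then predict
#    from the last left-hand point (z = -1) up to z = 0.
left_fit <- lm(SalaryAfter ~ z, data = subset(df, z < 0))
ext <- data.frame(z = c(-1, 0))
ext$fit <- predict(left_fit, newdata = ext)

p <- ggplot(df, aes(x = z, y = SalaryAfter, color = D)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  # inherit.aes = FALSE because `ext` has no D column to map to color
  geom_line(data = ext, aes(x = z, y = fit), inherit.aes = FALSE,
            color = "black", linetype = "dashed") +
  geom_vline(xintercept = 0) +
  x_scale
p
```

geom_smooth() only draws each line over the range of its own group, which is why the left-hand line stops at z=-1 unless the prediction at z=0 is added by hand.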