Plotting Dose Response Curve from Survival Data

Question

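I would like to make a dose-response curve with library(drc), and I have been trying to work out how to prepare my dataset properly for the plot. In particular, I am struggling with how to set up my y-axis.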

I made a data frame (df) to help clarify what I want to do.

df <- read.table("https://pastebin.com/raw/TZdjp2JX", header=T)

Load the libraries needed for today's exercise

library(drc)

library(ggplot2)

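Suppose I like hummingbirds and run an experiment with different sugar concentrations to see which one suits them best. I therefore ran the experiment in closed environments (the "room" column), with 4 different sugar concentrations (the "Concentration" column) and 10 birds per concentration. I also ran each experiment in 4 parallel replicates, which is why there are 4 "rooms". After 36 hours (the "time" column), I entered the rooms and checked how many birds had survived, creating a yes/no variable coded as 1 and 0 (the "Status" column), where 1 == survived and 0 == dead.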

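For this dataset, I deliberately made survival highest at concentration 0, with 50% surviving at concentration 1, 25% at concentration 2, and only 10% at concentration 3.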

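The first problem I ran into is: how do I convert the y-axis derived from the "Status" column into a percentage? I have done this before when making Kaplan-Meier survival curves, but unfortunately that approach does not work here. Clearly, this column should run from 0% to 100% (we could call it "mortality"). Once that works, I would like to make a dose-response curve like the one below (I found this example online and copied it here directly; it uses the ryegrass dataset, which ships with the drc package).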

ryegrass.LL.4 <- drm(rootl ~ conc, data = ryegrass, fct = LL.3())

I must admit that the next steps of the code confuse me a little.

# new dose levels as support for the line
newdata <- expand.grid(conc=exp(seq(log(0.5), log(100), length=100)))
# predictions and confidence intervals
pm <- predict(ryegrass.LL.4, newdata=newdata, interval="confidence")
# new data with predictions
newdata$p <- pm[,1]
newdata$pmin <- pm[,2]
newdata$pmax <- pm[,3]

# plot curve

# need to shift conc == 0 a bit up, otherwise there are problems with coord_trans
ryegrass$conc0 <- ryegrass$conc
ryegrass$conc0[ryegrass$conc0 == 0] <- 0.5
# plotting the curve
ggplot(ryegrass, aes(x = conc0, y = rootl)) +
  geom_point() +
  geom_ribbon(data=newdata, aes(x=conc, y=p, ymin=pmin, ymax=pmax), alpha=0.2) +
  geom_line(data=newdata, aes(x=conc, y=p)) +
  coord_trans(x="log") +
  xlab("Ferulic acid (mM)") + ylab("Root length (cm)")

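In the end, I would like to produce a similar curve, but with mortality on the y-axis running from 0 to 100 (low to high), and with the confidence interval shown as a shaded grey band around the regression line. That means my first line of code should look something like this: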

model <- drm(mortality ~ Concentration, data=df, fct = LL.3())

But I am lost on how to create "mortality" and on the ggplot step that follows.

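Can anyone help me achieve this? I am confused about how to translate the ryegrass example into something that works for my fake dataset. I hope someone here can help me with this! Many thanks, and if there are other ways I could structure my dataset, I would appreciate hearing about them.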

-Andy

Answer 1

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)
library(drc)
#> Loading required package: MASS
#> 
#> Attaching package: 'MASS'
#> The following object is masked from 'package:dplyr':
#> 
#>     select
#> 
#> 'drc' has been loaded.
#> Please cite R and 'drc' if used for a publication,
#> for references type 'citation()' and 'citation('drc')'.
#> 
#> Attaching package: 'drc'
#> The following objects are masked from 'package:stats':
#> 
#>     gaussian, getInitial

df <- read.table("https://pastebin.com/raw/sH5hCr2J", header=T)

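Creating mortality, or survival as I do here, is easy with the dplyr package, which helps with many such calculations. It seems you are interested in the fraction surviving at each concentration in each of the four rooms (replicates). So the first step is to group the data by those columns and then compute the statistic we want.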

df_calc <- df %>%
  group_by(Concentration, room) %>%
  summarise(surv = sum(Status)/n())
#> `summarise()` has grouped output by 'Concentration'. You can override using the `.groups` argument.
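
Since the question asks for mortality in percent rather than a survival fraction, the same grouping step can be sketched in base R. The toy data frame below is invented for illustration (it is not the pastebin data); only the column names Concentration, room and Status follow the question, and the names toy, toy_calc and mortality are assumptions.

```r
# Hypothetical toy data: 2 concentrations x 2 rooms x 4 birds
# (Status: 1 = survived, 0 = died)
toy <- data.frame(
  Concentration = rep(c(1, 2), each = 8),
  room          = rep(rep(c(1, 2), each = 4), times = 2),
  Status        = c(1, 1, 0, 0,  1, 0, 0, 0,  0, 0, 0, 1,  0, 0, 0, 0)
)

# Survival fraction per Concentration/room group (mean of a 0/1 column)
toy_calc <- aggregate(Status ~ Concentration + room, data = toy, FUN = mean)
names(toy_calc)[names(toy_calc) == "Status"] <- "surv"

# Mortality in percent is the complement of the survival fraction
toy_calc$mortality <- 100 * (1 - toy_calc$surv)
toy_calc
```

With the dplyr version above, the equivalent would be to add a column such as 100 * (1 - surv) after the summarise() step.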

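I don't know whether Concentration stands in for arbitrary concentration levels, so I am proceeding under the following assumptions:

1 == higher sugar content, 2 == lower sugar content
concentrations are encoded in log space, so I convert them to linear space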

df_calc <- mutate(df_calc, conc = exp(-Concentration))

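To be clear, the conc variable is just my attempt to get something close to the true known concentrations of the experiment. If your data contain the real concentrations, disregard this calculation.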

df_calc
#> # A tibble: 12 x 4
#> # Groups:   Concentration [3]
#>    Concentration  room  surv   conc
#>            <int> <int> <dbl>  <dbl>
#>  1             1     1   0.5 0.368 
#>  2             1     2   0.4 0.368 
#>  3             1     3   0.5 0.368 
#>  4             1     4   0.6 0.368 
#>  5             2     1   0   0.135 
#>  6             2     2   0.4 0.135 
#>  7             2     3   0.2 0.135 
#>  8             2     4   0.4 0.135 
#>  9             3     1   0.2 0.0498
#> 10             3     2   0   0.0498
#> 11             3     3   0   0.0498
#> 12             3     4   0.2 0.0498

mod <- drm(surv ~ conc, data =  df_calc, fct = LL.3())
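
For readers wondering what LL.3() fits: as I understand the drc documentation, it is a three-parameter log-logistic curve with the lower limit fixed at 0. The base-R function below is only a sketch of that formula; the name ll3 and the parameter values are made up for illustration and are not estimates from the model above.

```r
# Three-parameter log-logistic curve with the lower limit fixed at 0:
#   f(x) = d / (1 + exp(b * (log(x) - log(e))))
# b = slope, d = upper limit, e = dose at which the response is d / 2
ll3 <- function(x, b, d, e) {
  d / (1 + exp(b * (log(x) - log(e))))
}

# Hypothetical parameter values, chosen only to show the shape
# (a negative b gives a response that increases with dose)
ll3(c(0.05, 0.2, 1), b = -2, d = 0.8, e = 0.2)
```

At x equal to e the function returns d / 2, which is why e is usually read as the ED50.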

Create new conc data points

newdata <- data.frame(conc = exp(seq(log(0.01), log(10), length = 100)))

Edit

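In response to your comment, I will explain the code block above. Again, the conc variable is expected to be in units of concentration. In this hypothetical case we have three concentration levels, c(0.049, 0.135, 0.368). For simplicity, assume the units are mg of sugar/ml of water. Our model is fit to these three dose levels, with 4 data points at each. If we wanted, we could plot the curve only between these levels, c(0.049, 0.368), but in this example I chose c(0.01, 10) mg/ml as the domain to plot. This is just so we can see where the curve ends up according to the model fit. In short, you choose the range you are most interested in. As I show later, although we can pick points outside the range of the experimental data, the confidence interval there is very wide, which indicates that the model is not informative for those points.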

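The reason for transforming these values with log() is to make sure the points we sample appear evenly spaced on a log10 scale (most response curves are plotted with this transformation). Once we have the sequence of 100 points, we use exp() to return to linear space (which is what our model was fit in). These values are then passed to the predict function as the new dose levels, together with the fitted model.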

All of this is stored in the newdata variable, which lets us plot the line and the confidence interval.
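
To illustrate the log-spacing argument, here is a small base-R sketch comparing the grid used above with a plain linear sequence over the same range. The names conc_grid, ratios and lin_grid are chosen for this example only.

```r
# 100 doses from 0.01 to 10, evenly spaced on a log scale
conc_grid <- exp(seq(log(0.01), log(10), length.out = 100))

# On a log-spaced grid the ratio between neighbouring doses is constant
ratios <- conc_grid[-1] / conc_grid[-length(conc_grid)]
range(ratios)

# A linear sequence over the same range puts almost all points above 0.1
lin_grid <- seq(0.01, 10, length.out = 100)
sum(lin_grid < 0.1)   # linear-grid points in the lowest decade
sum(conc_grid < 0.1)  # log-grid points in the lowest decade
```

On a log10 x-axis the linear grid would leave the left part of the curve nearly unsupported, which is the reason for sampling in log space.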

Use the model and the generated data points to predict new surv values together with their upper and lower bounds

newdata <- cbind(newdata,
                 suppressWarnings(predict(mod, newdata = newdata, interval="confidence")))

Plot with ggplot2

ggplot(df_calc, aes(conc)) +
  geom_point(aes(y = surv)) +
  geom_ribbon(aes(ymin = Lower, ymax = Upper), data = newdata, alpha = 0.2) +
  geom_line(aes(y = Prediction), data = newdata) +
  scale_x_log10() +
  coord_cartesian(ylim = c(0, 1))

You may notice that the confidence interval widens considerably when we try to predict in ranges where we have no data.
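
If the wide band outside the data is distracting, one option is to predict only where the experiment provides support. The base-R sketch below flags which grid points fall inside the observed dose range; the column name interpolated and the object grid_supported are hypothetical, and the observed doses assume the exp(-Concentration) coding used in this answer.

```r
# Observed doses, computed as in the answer: conc = exp(-Concentration)
conc_obs <- exp(-c(1, 2, 3))

# Same prediction grid as used for the plot
grid <- data.frame(conc = exp(seq(log(0.01), log(10), length.out = 100)))

# TRUE where a grid point lies within the range of the observed doses
grid$interpolated <- grid$conc >= min(conc_obs) & grid$conc <= max(conc_obs)

# Keep only the part of the grid that the data actually support
grid_supported <- grid[grid$interpolated, , drop = FALSE]
range(grid_supported$conc)
```

Passing the conc column of grid_supported as newdata to predict() would then draw the line and ribbon only between the lowest and highest tested concentration.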

Created on 2021-10-27 by the reprex package (v1.0.0)
