How to plot paired means for multiple groups in one line graph?

Question

I'm still learning R, so maybe this question is easy, but I just can't figure it out.

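I want to plot the mean scores and confidence intervals of a questionnaire administered at three different time points: at baseline, after 4 treatment cycles and after 8 treatment cycles. The questionnaire contains 3 scales: sensory, motor and autonomic, so I want to plot the mean score of each of the three scales at each time point. I want a line graph with the time points (baseline; after 4 cycles; after 8 cycles) on the X axis and the scores on the Y axis, and the graph must contain three lines in different colors for the sensory, motor and autonomic scales. I want to use ggplot.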

I have a data frame with the following columns:

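ID -> the patient's ID (my data frame has 60 patients in total)
c0sen -> sensory scale score at baseline
c4sen -> sensory scale score after 4 treatment cycles
c8sen -> sensory scale score after 8 treatment cycles
c0mot -> motor scale score at baseline
c4mot -> motor scale score after 4 treatment cycles
c8mot -> motor scale score after 8 treatment cycles
c0aut -> autonomic scale score at baseline
c4aut -> autonomic scale score after 4 treatment cycles
c8aut -> autonomic scale score after 8 treatment cycles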

This is what I want:

I hope someone can help me! Thanks a lot!

Answer 1

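Here's what I came up with using made-up data. Thanks for sharing your data structure, but in the future it's better to share the data itself. You can do that by running dput(your.data.frame) in the console and copying/pasting the output into the question as code... or just create a dummy set with code, which is what I did here.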
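As a minimal sketch of that workflow (the small data frame `toy` is made up for illustration), dput() prints R code that rebuilds an object, so anyone answering can paste it straight into their own session:

```r
# Hypothetical small data frame standing in for the real data
toy <- data.frame(id = 1:3, c0sen = c(7.1, 6.8, 7.3))

# dput() prints code that recreates the object; for a large data frame,
# dput(head(x)) is usually enough for a question
dput(toy)

# The printed text parses back into an equivalent data frame
code <- paste(capture.output(dput(toy)), collapse = "\n")
rebuilt <- eval(parse(text = code))
```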

Dummy data

library(tidyr)
library(dplyr)
library(ggplot2)

raw_df <- data.frame(
  id=1:60,
  c0sen=rnorm(60, 7, 0.2),
  c4sen=rnorm(60, 8.5, 0.5),
  c8sen=rnorm(60, 11, 1.2),
  c0mot=rnorm(60, 6, 0.3),
  c4mot=rnorm(60, 7.5, 0.5),
  c8mot=rnorm(60, 9.6, 0.8),
  c0aut=rnorm(60, 3, 0.1),
  c4aut=rnorm(60, 2.9, 0.1),
  c8aut=rnorm(60, 3.5, 0.8)
)

Processing the data

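Before moving on to plotting, you need to prepare the dataset for plotting with ggplot2. As with the other packages in the Tidyverse, you should prepare your data to follow Tidy Data Principles, which is what I'll do here with the tidyr and dplyr packages.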

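As currently arranged, the data has the same kind of information spread across several columns, which we need to gather() together, but each column name also holds two pieces of information (time and type of measurement) that we need to separate().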

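The first step is to gather the data into a "long" format, with one column for measure (c0aut, c8mot, etc.) and one for score, while keeping the id column. Then we split the measure column into two columns: one describing the time and the other describing the type of measurement.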

df <- raw_df %>%
  gather(key='measure', value='score', -id) %>%
  separate(col=measure, into=c('c_time','type'), sep=2)

Finally, I want to fix c_time so that it gives me just the number, which we can do like this:

df <- df %>% separate(c_time, into=c('c', 'time'), sep=1) %>%
  select(-c)

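Note that df$time is actually a character vector (not numeric)... but that's fine here, because we want ggplot2 to treat it as a discrete, ordered variable rather than a number: on the x axis we want 0, 4 and 8 spaced evenly as categories.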
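For reference, a sketch of the same reshaping in a single step, assuming tidyr 1.0.0 or later, where pivot_longer() supersedes gather(). The regular expression in names_pattern is my own assumption about the column naming ("c", cycle digits, then the scale). Converting time to a factor with explicit levels fixes the x-axis order instead of relying on alphabetical sorting, which would break for a label like "12" (it sorts before "4" as text).

```r
library(tidyr)
library(dplyr)

# Made-up data with the same column layout as in the question
set.seed(1)
raw_df <- data.frame(id = 1:60,
                     c0sen = rnorm(60, 7), c4sen = rnorm(60, 8.5), c8sen = rnorm(60, 11),
                     c0mot = rnorm(60, 6), c4mot = rnorm(60, 7.5), c8mot = rnorm(60, 9.6),
                     c0aut = rnorm(60, 3), c4aut = rnorm(60, 2.9), c8aut = rnorm(60, 3.5))

df_long <- raw_df %>%
  # "c4mot" -> time = "4", type = "mot"
  pivot_longer(-id,
               names_to = c("time", "type"),
               names_pattern = "c(\\d+)([a-z]+)",
               values_to = "score") %>%
  # Explicit level order for the x axis
  mutate(time = factor(time, levels = c("0", "4", "8")))
```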

Plotting the data

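Since you mentioned this is new to you, I'll break the plot code into parts so it's easy to follow the steps taken to create the plot. First, we start with the base, where we set the data frame and the general aesthetics used throughout. Note that color= is mapped to type, but so is group=. This is necessary so that ggplot2 knows the data should be grouped by type only: because x is discrete, ggplot2 would otherwise form a separate group for every time/type combination, and each "line" would contain a single point. This matters a lot for the geoms we're going to draw.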

p <- ggplot(df, aes(x=time, y=score, color=type, group=type))
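To see the effect of group= directly, a small sketch on made-up data (assuming ggplot2 3.3.0 or later for fun=, as the answer already does) compares how many groups ggplot2 computes with and without it, using layer_data():

```r
library(ggplot2)

# Tiny made-up long-format data: 2 time points x 2 types x 2 patients
toy <- data.frame(time  = rep(c("0", "4"), each = 4),
                  type  = rep(c("mot", "sen"), times = 4),
                  score = c(6, 7, 6.4, 7.2, 7.5, 8.4, 7.7, 8.6))

with_group <- ggplot(toy, aes(x = time, y = score, color = type, group = type)) +
  stat_summary(geom = "line", fun = mean)
without_group <- ggplot(toy, aes(x = time, y = score, color = type)) +
  stat_summary(geom = "line", fun = mean)

# layer_data() returns the computed layer data; count the distinct groups
n_groups <- function(p) length(unique(layer_data(p)$group))
n_groups(with_group)     # one group per type
n_groups(without_group)  # one group per time/type combination
```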

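Stats and geoms. Next, we draw the data in the plot area with 3 calls to stat_summary, drawing the lines, error bars and points (in that order). The error bars are drawn as the mean +/- one standard error ("mean_se"), which is not a 95% confidence interval, though of course other summary functions can be used. We also have to override the color= aesthetic for the error bars, because we want them all black (rather than colored by type), and we have to add a shape= aesthetic to the points so that we can map it to type to match your plot.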

p <- p + stat_summary(
    geom='line', fun=mean) +
  stat_summary(
    geom='errorbar', fun.data=mean_se,
    color='black', width=0.1) +
  stat_summary(
    geom='point', fun=mean, aes(shape=type))
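Since the question asks for confidence intervals, here is a sketch of a t-based 95% interval for stat_summary(). ggplot2's mean_cl_normal does this too, but it requires the Hmisc package; the helper name mean_ci95 below is made up. It returns the y/ymin/ymax columns that fun.data= expects.

```r
library(ggplot2)

# Hypothetical helper: mean with a t-based 95% confidence interval
mean_ci95 <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  half <- qt(0.975, df = length(x) - 1) * sd(x) / sqrt(length(x))
  data.frame(y = m, ymin = m - half, ymax = m + half)
}

# Drop-in replacement for the fun.data = mean_se error-bar layer
ci_layer <- stat_summary(geom = "errorbar", fun.data = mean_ci95,
                         color = "black", width = 0.1)
```

ci_layer would be added in place of the mean_se layer; the interval matches the one t.test() reports for a single sample.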

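Scales. For the scales, I set the x-axis properties by relabeling our "0", "4", "8" axis values, and I also set the expansion smaller than the default because it looks a bit better. It's important to change the scale_color and scale_shape calls together; otherwise you lose the link between the two scales, and ggplot2 will actually show two separate legends.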

type_labels <- c('Autonomic', 'Motor', 'Sensory')

p <- p + scale_x_discrete(
    name=NULL, labels=c('Baseline', 'After 4 cycles', 'After 8 cycles'),
    expand=expansion(mult=0.05)) +
  scale_color_manual(name=NULL, labels=type_labels, values=rainbow(3)) +
  scale_shape_discrete(name=NULL, labels=type_labels)
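An alternative sketch that sidesteps keeping two labels= arguments in sync: relabel type in the data before plotting, so the colour and shape scales inherit identical labels (their legends still merge as long as both scales keep the same name). The short vector below is hypothetical; on the real data it would be df$type.

```r
# Hypothetical raw type codes, as produced by the separate() step
type_raw <- c("sen", "aut", "mot", "sen")

# levels= fixes the legend order, labels= gives the readable names
type_nice <- factor(type_raw,
                    levels = c("aut", "mot", "sen"),
                    labels = c("Autonomic", "Motor", "Sensory"))
levels(type_nice)
```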

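Theme elements. Finally, I set the theme elements: naming things, keeping the overall clean look of theme_bw(), and adding a box around the legend, which I placed at the bottom.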

p <- p + labs(
    title='EORTC QLQ-CIPN20',
    y='Symptom Score'
  ) +
  theme_bw() +
  theme(
    legend.position='bottom',
    legend.title=element_blank(),
    legend.background = element_rect(color='black')
  )
p

All of this gives you the following:

Answer 2

It's always a good idea to include your actual data in a question like this, but the following should be pretty close to what you have:

set.seed(123)

df  <- data.frame(ID    = factor(1:60),
                  c0sen = rbinom(60, 15, 8.8/15),
                  c4sen = rbinom(60, 15, 9.2/15),
                  c8sen = rbinom(60, 15, 10/15),
                  c0mot = rbinom(60, 15, 8.1/15),
                  c4mot = rbinom(60, 15, 8.4/15),
                  c8mot = rbinom(60, 15, 8.6/15),
                  c0aut = rbinom(60, 15, 3/15),
                  c4aut = rbinom(60, 15, 3/15),
                  c8aut = rbinom(60, 15, 3.5/15))
head(df)
#>   ID c0sen c4sen c8sen c0mot c4mot c8mot c0aut c4aut c8aut
#> 1  1    10     8     9     6     8     7     1     3     2
#> 2  2     7    12    11     9     8    13     2     3     5
#> 3  3     9    10    11     7    10     7     5     3     3
#> 4  4     7    10    11     9     8     7     2     2     4
#> 5  5     6     8    11     8     9     8     2     5     6
#> 6  6    12     9     6     8     7     9     4     3     2

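Now, this is simply the wrong format for plotting with ggplot. You first need to convert the data to long format and then summarize it. Here we use reshape2::melt to reshape the data into the appropriate columns, then summarize with summarize from dplyr, computing the mean and an approximate 95% confidence interval (mean +/- 1.96 standard errors):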

library(reshape2)
library(dplyr)

summary_df <- melt(df) %>% 
  mutate(time = as.numeric(substr(variable, 2, 2))) %>%
  transmute(ID, time, modality = as.factor(substr(variable, 3, 5)), 
            score = value) %>%
  group_by(modality, time) %>%
  summarize(mean = mean(score), 
            upper = mean + 1.96 * sd(score)/sqrt(length(score)),
            lower = mean - 1.96 * sd(score)/sqrt(length(score)))

This gives us something to work with:

summary_df
#> # A tibble: 9 x 5
#> # Groups:   modality [3]
#>   modality  time  mean upper lower
#>   <fct>    <dbl> <dbl> <dbl> <dbl>
#> 1 aut          0  2.93  3.35  2.52
#> 2 aut          4  2.87  3.25  2.48
#> 3 aut          8  3.45  3.89  3.01
#> 4 mot          0  7.95  8.38  7.52
#> 5 mot          4  8.48  8.99  7.98
#> 6 mot          8  8.62  9.15  8.09
#> 7 sen          0  8.7   9.18  8.22
#> 8 sen          4  9.17  9.63  8.71
#> 9 sen          8 10.1  10.5   9.70
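In the summarizing pipeline above, the time and modality are recovered from the melted variable names by character position. A small sketch of what those substr() calls do (substr() converts the factor to character first); note that this assumes single-digit cycle numbers. The last line compares the 1.96 multiplier with the t quantile for n = 60.

```r
# Melted variable names arrive as a factor, e.g. "c4mot"
variable <- factor(c("c0sen", "c4mot", "c8aut"))

time     <- as.numeric(substr(variable, 2, 2))  # 2nd character: cycle number
modality <- substr(variable, 3, 5)              # characters 3-5: scale

# 1.96 is the normal approximation; with 60 patients the t quantile
# (59 degrees of freedom) is only slightly larger
z_vs_t <- c(normal = qnorm(0.975), t = qt(0.975, df = 60 - 1))
```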

Now we plot using a combination of geom_line, geom_point and geom_errorbar:

library(ggplot2)

ggplot(summary_df, aes(x = time, y = mean, colour = modality)) + 
  geom_line(size = 1) + 
  geom_point(aes(shape = modality), size = 3) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.2, size = 1) +
  theme_classic() +
  scale_color_discrete(labels = c("Autonomic", "Motor", "Sensory")) +
  scale_shape_discrete(labels = c("Autonomic", "Motor", "Sensory")) +
  theme(legend.position = "bottom", text = element_text(size = 12)) +
  labs(x = "Cycles", y = "Symptom score")

Which gives us the desired result:

Created on 2020-07-02 by the reprex package (v0.3.0)
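If the error bars of different modalities overlap at a time point, one option is to shift the groups slightly apart with position_dodge(), using the same dodge object in every layer so lines, points and bars stay aligned. This is a sketch on a made-up summary table shaped like summary_df; the dodge width of 0.6 is an arbitrary choice.

```r
library(ggplot2)

# Made-up summary in the same shape as summary_df
summary_toy <- data.frame(
  modality = factor(rep(c("aut", "mot", "sen"), each = 3)),
  time     = rep(c(0, 4, 8), times = 3),
  mean     = c(2.9, 2.9, 3.5, 8.0, 8.5, 8.6, 8.7, 9.2, 10.1))
summary_toy$lower <- summary_toy$mean - 0.4
summary_toy$upper <- summary_toy$mean + 0.4

# One shared dodge so every layer is shifted by the same amount per group
dodge <- position_dodge(width = 0.6)

p_dodged <- ggplot(summary_toy, aes(x = time, y = mean, colour = modality)) +
  geom_line(position = dodge) +
  geom_point(aes(shape = modality), size = 3, position = dodge) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.3, position = dodge)
```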
