R:将死亡类型绘制为堆积面积图的生存曲线
R: Plot Survival curve with types of death as stacked area chart
我有一个经过右删失的数据集,其中包含给定样本的生命时间和不同类型的死亡信息,我想绘制生存曲线图(实际值将从样本中计算出来)而不是来自模型估计),将不同类型的死亡作为堆积面积图,如下所示:
我如何在 R 中完成此操作?
数据集看起来像这样:
death type time event
1 Type3 81 1
2 NA 868 0
3 Type3 1022 1
4 NA 868 0
5 NA 868 0
6 NA 868 0
7 NA 868 0
8 NA 887 0
9 Type3 156 1
10 NA 868 0
11 NA 868 0
12 NA 868 0
13 Type3 354 1
14 Type3 700 1
15 Type3 632 1
16 NA 868 0
17 Type1 308 1
18 NA 1001 0
19 NA 1054 0
20 NA 1059 0
21 Type3 120 1
22 NA 732 0
23 Type3 543 1
24 Type1 379 1
25 NA 613 0
26 NA 1082 0
27 Type3 226 1
28 Type2 1 0
29 NA 976 0
30 NA 1000 0
31 NA 706 0
32 NA 1015 0
33 Type3 882 1
34 NA 1088 0
35 NA 642 0
36 Type3 953 1
37 NA 1068 0
38 NA 819 0
39 NA 1029 0
40 Type3 34 1
41 NA 1082 0
42 Type3 498 1
43 NA 923 0
44 NA 1041 0
45 Type3 321 1
46 NA 557 0
47 NA 628 0
48 Type3 197 1
49 Type3 155 1
50 NA 955 0
其中death type带NA的表示删失数据,time为死亡时间或删失时间,event为1为死亡者,0为被删失者。 (这是 'survfit' 要求的格式,但我也有实际的开始和结束时间作为日期)
(现在,只有 50 个点不可能构建这样的曲线,但是数据有更多的行不适合这里)。
这是一段丑陋的代码,但它包含了想法。我没有花时间弄清楚如何添加图例。另请注意,这种图形虽然在概念上很有趣,但不一定反映 KM 曲线。老实说,如果您打算以这种方式呈现数据,那么在固定时间点以堆叠条形的形式呈现更有意义。
请注意,我很确定此代码中存在一些缺陷。它没有保修,但可能会让您入门。
SurvData <- structure(list(row.names = c("", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", ""), death = 1:50, type = c("Type3",
NA, "Type3", NA, NA, NA, NA, NA, "Type3", NA, NA, NA, "Type3",
"Type3", "Type3", NA, "Type1", NA, NA, NA, "Type3", NA, "Type3",
"Type1", NA, NA, "Type3", "Type2", NA, NA, NA, NA, "Type3", NA,
NA, "Type3", NA, NA, NA, "Type3", NA, "Type3", NA, NA, "Type3",
NA, NA, "Type3", "Type3", NA), time = c(81L, 868L, 1022L, 868L,
868L, 868L, 868L, 887L, 156L, 868L, 868L, 868L, 354L, 700L, 632L,
868L, 308L, 1001L, 1054L, 1059L, 120L, 732L, 543L, 379L, 613L,
1082L, 226L, 1L, 976L, 1000L, 706L, 1015L, 882L, 1088L, 642L,
953L, 1068L, 819L, 1029L, 34L, 1082L, 498L, 923L, 1041L, 321L,
557L, 628L, 197L, 155L, 955L), event = c(1L, 0L, 1L, 0L, 0L,
0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 1L,
0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L,
0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 0L)), .Names = c("row.names",
"death", "type", "time", "event"), class = "data.frame", row.names = c(NA,
-50L))
library(dplyr)
library(zoo)
library(RColorBrewer)
SurvDataSummary <-
arrange(SurvData, time, type) %>%
mutate(type = ifelse(is.na(type), "Alive", type)) %>%
group_by(time) %>%
#* Count the number of each type at each time point
summarise(n_at_time = n(),
alive_at_time = sum(type == "Alive"),
type1_at_time = sum(type == "Type1"),
type2_at_time = sum(type == "Type2"),
type3_at_time = sum(type == "Type3")) %>%
ungroup() %>%
mutate(n_alive = sum(n_at_time) - cumsum(lag(n_at_time, default = 0)),
#* Proportion of each type
p_type1_at_time = type1_at_time / n_alive,
p_type2_at_time = type2_at_time / n_alive,
p_type3_at_time = type3_at_time / n_alive,
#* convert 0 to NA
p_type1_at_time = ifelse(p_type1_at_time == 0, NA, p_type1_at_time),
p_type2_at_time = ifelse(p_type2_at_time == 0, NA, p_type2_at_time),
p_type3_at_time = ifelse(p_type3_at_time == 0, NA, p_type3_at_time),
#* Back fill NAs with last known value
p_type1_at_time = na.locf(p_type1_at_time, FALSE),
p_type2_at_time = na.locf(p_type2_at_time, FALSE),
p_type3_at_time = na.locf(p_type3_at_time, FALSE),
#* make leading NAs 0
p_type1_at_time = ifelse(is.na(p_type1_at_time), 0, p_type1_at_time),
p_type2_at_time = ifelse(is.na(p_type2_at_time), 0, p_type2_at_time),
p_type3_at_time = ifelse(is.na(p_type3_at_time), 0, p_type3_at_time),
#* Calculate cumulative proportions
p_alive_at_time = 1 - p_type1_at_time - p_type2_at_time - p_type3_at_time,
cump_type1_at_time = p_alive_at_time + p_type1_at_time,
cump_type2_at_time = cump_type1_at_time + p_type2_at_time,
cump_type3_at_time = cump_type2_at_time + p_type3_at_time,
#* Get the following time for using geom_rect
next_time = lead(time)) %>%
pal <- brewer.pal(4, "PRGn")
ggplot(SurvDataSummary,
aes(xmin = time,
xmax = next_time)) +
geom_rect(aes(ymin = 0, ymax = p_alive_at_time), fill = pal[1]) +
geom_rect(aes(ymin = p_alive_at_time, ymax = cump_type1_at_time), fill = pal[2]) +
geom_rect(aes(ymin = cump_type1_at_time, ymax = cump_type2_at_time), fill = pal[3]) +
geom_rect(aes(ymin = cump_type2_at_time, ymax = cump_type3_at_time), fill = pal[4])
我有一个经过右删失的数据集,其中包含给定样本的生命时间和不同类型的死亡信息,我想绘制生存曲线图(实际值将从样本中计算出来)而不是来自模型估计),将不同类型的死亡作为堆积面积图,如下所示:
我如何在 R 中完成此操作?
数据集看起来像这样:
death type time event
1 Type3 81 1
2 NA 868 0
3 Type3 1022 1
4 NA 868 0
5 NA 868 0
6 NA 868 0
7 NA 868 0
8 NA 887 0
9 Type3 156 1
10 NA 868 0
11 NA 868 0
12 NA 868 0
13 Type3 354 1
14 Type3 700 1
15 Type3 632 1
16 NA 868 0
17 Type1 308 1
18 NA 1001 0
19 NA 1054 0
20 NA 1059 0
21 Type3 120 1
22 NA 732 0
23 Type3 543 1
24 Type1 379 1
25 NA 613 0
26 NA 1082 0
27 Type3 226 1
28 Type2 1 0
29 NA 976 0
30 NA 1000 0
31 NA 706 0
32 NA 1015 0
33 Type3 882 1
34 NA 1088 0
35 NA 642 0
36 Type3 953 1
37 NA 1068 0
38 NA 819 0
39 NA 1029 0
40 Type3 34 1
41 NA 1082 0
42 Type3 498 1
43 NA 923 0
44 NA 1041 0
45 Type3 321 1
46 NA 557 0
47 NA 628 0
48 Type3 197 1
49 Type3 155 1
50 NA 955 0
其中death type带NA的表示删失数据,time为死亡时间或删失时间,event为1为死亡者,0为被删失者。 (这是 'survfit' 要求的格式,但我也有实际的开始和结束时间作为日期)
(现在,只有 50 个点不可能构建这样的曲线,但是数据有更多的行不适合这里)。
这是一段丑陋的代码,但它包含了想法。我没有花时间弄清楚如何添加图例。另请注意,这种图形虽然在概念上很有趣,但不一定反映 KM 曲线。老实说,如果您打算以这种方式呈现数据,那么在固定时间点以堆叠条形的形式呈现更有意义。
请注意,我很确定此代码中存在一些缺陷。它没有保修,但可能会让您入门。
SurvData <- structure(list(row.names = c("", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", ""), death = 1:50, type = c("Type3",
NA, "Type3", NA, NA, NA, NA, NA, "Type3", NA, NA, NA, "Type3",
"Type3", "Type3", NA, "Type1", NA, NA, NA, "Type3", NA, "Type3",
"Type1", NA, NA, "Type3", "Type2", NA, NA, NA, NA, "Type3", NA,
NA, "Type3", NA, NA, NA, "Type3", NA, "Type3", NA, NA, "Type3",
NA, NA, "Type3", "Type3", NA), time = c(81L, 868L, 1022L, 868L,
868L, 868L, 868L, 887L, 156L, 868L, 868L, 868L, 354L, 700L, 632L,
868L, 308L, 1001L, 1054L, 1059L, 120L, 732L, 543L, 379L, 613L,
1082L, 226L, 1L, 976L, 1000L, 706L, 1015L, 882L, 1088L, 642L,
953L, 1068L, 819L, 1029L, 34L, 1082L, 498L, 923L, 1041L, 321L,
557L, 628L, 197L, 155L, 955L), event = c(1L, 0L, 1L, 0L, 0L,
0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 1L,
0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L,
0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 0L)), .Names = c("row.names",
"death", "type", "time", "event"), class = "data.frame", row.names = c(NA,
-50L))
library(dplyr)
library(zoo)
library(RColorBrewer)
SurvDataSummary <-
arrange(SurvData, time, type) %>%
mutate(type = ifelse(is.na(type), "Alive", type)) %>%
group_by(time) %>%
#* Count the number of each type at each time point
summarise(n_at_time = n(),
alive_at_time = sum(type == "Alive"),
type1_at_time = sum(type == "Type1"),
type2_at_time = sum(type == "Type2"),
type3_at_time = sum(type == "Type3")) %>%
ungroup() %>%
mutate(n_alive = sum(n_at_time) - cumsum(lag(n_at_time, default = 0)),
#* Proportion of each type
p_type1_at_time = type1_at_time / n_alive,
p_type2_at_time = type2_at_time / n_alive,
p_type3_at_time = type3_at_time / n_alive,
#* convert 0 to NA
p_type1_at_time = ifelse(p_type1_at_time == 0, NA, p_type1_at_time),
p_type2_at_time = ifelse(p_type2_at_time == 0, NA, p_type2_at_time),
p_type3_at_time = ifelse(p_type3_at_time == 0, NA, p_type3_at_time),
#* Back fill NAs with last known value
p_type1_at_time = na.locf(p_type1_at_time, FALSE),
p_type2_at_time = na.locf(p_type2_at_time, FALSE),
p_type3_at_time = na.locf(p_type3_at_time, FALSE),
#* make leading NAs 0
p_type1_at_time = ifelse(is.na(p_type1_at_time), 0, p_type1_at_time),
p_type2_at_time = ifelse(is.na(p_type2_at_time), 0, p_type2_at_time),
p_type3_at_time = ifelse(is.na(p_type3_at_time), 0, p_type3_at_time),
#* Calculate cumulative proportions
p_alive_at_time = 1 - p_type1_at_time - p_type2_at_time - p_type3_at_time,
cump_type1_at_time = p_alive_at_time + p_type1_at_time,
cump_type2_at_time = cump_type1_at_time + p_type2_at_time,
cump_type3_at_time = cump_type2_at_time + p_type3_at_time,
#* Get the following time for using geom_rect
next_time = lead(time)) %>%
pal <- brewer.pal(4, "PRGn")
ggplot(SurvDataSummary,
aes(xmin = time,
xmax = next_time)) +
geom_rect(aes(ymin = 0, ymax = p_alive_at_time), fill = pal[1]) +
geom_rect(aes(ymin = p_alive_at_time, ymax = cump_type1_at_time), fill = pal[2]) +
geom_rect(aes(ymin = cump_type1_at_time, ymax = cump_type2_at_time), fill = pal[3]) +
geom_rect(aes(ymin = cump_type2_at_time, ymax = cump_type3_at_time), fill = pal[4])