Violin plot in ggplot2 with width taken from a column
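I'm new to R and only use it for visualization, so I may be missing something simple.
What I want is this: I have two columns that should be the x and y axes, and a third column that should define the width of the violins. Although I have tried many things from different answers, I haven't gotten very far with the code. So far I have: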
ggplot(disM, aes(x=study, y=value)) +
geom_violin() +
labs(title="Distribution", x="Studies", y="Ranges")
This doesn't produce what I want.
I have a table like this:
Col0 study value
1 30-31 breast cancer 357263
2 32-33 breast cancer 352067
3 34-35 breast cancer 340264
4 36-37 breast cancer 309827
5 38-39 breast cancer 298684
6 40-41 breast cancer 322570
7 42-43 breast cancer 338480
8 44-45 breast cancer 354451
9 46-47 breast cancer 429183
10 48-49 breast cancer 396942
11 50-51 breast cancer 415195
12 52-53 breast cancer 368217
13 54-55 breast cancer 445884
14 56-57 breast cancer 395652
15 58-59 breast cancer 386643
16 60-61 breast cancer 461940
17 62-63 breast cancer 473772
18 64-65 breast cancer 464228
19 66-67 breast cancer 485851
20 68-69 breast cancer 513411
21 70-71 breast cancer 576618
22 72-73 breast cancer 588724
23 74-75 breast cancer 634343
24 76-77 breast cancer 584662
25 78-79 breast cancer 608901
26 80-81 breast cancer 617286
27 82-83 breast cancer 659318
28 84-85 breast cancer 757167
29 86-87 breast cancer 1044465
30 88-89 breast cancer 982901
31 90-91 breast cancer 1114269
32 92-93 breast cancer 1110257
33 94-95 breast cancer 1742966
34 96-97 breast cancer 6379974
35 98-99 breast cancer 3437746
36 100-101 breast cancer 118984063
37 30-31 renal cancer 1055566
38 32-33 renal cancer 1089405
39 34-35 renal cancer 1228087
40 36-37 renal cancer 1265606
41 38-39 renal cancer 1264919
42 40-41 renal cancer 1248949
43 42-43 renal cancer 1391738
44 44-45 renal cancer 1453100
45 46-47 renal cancer 1443915
46 48-49 renal cancer 1429785
47 50-51 renal cancer 1372041
48 52-53 renal cancer 1339706
49 54-55 renal cancer 1418135
50 56-57 renal cancer 1484162
51 58-59 renal cancer 1582617
52 60-61 renal cancer 1571977
53 62-63 renal cancer 1652503
54 64-65 renal cancer 1742230
55 66-67 renal cancer 1859936
56 68-69 renal cancer 1928028
57 70-71 renal cancer 2041783
58 72-73 renal cancer 2108994
59 74-75 renal cancer 2154244
60 76-77 renal cancer 2218430
61 78-79 renal cancer 2333206
62 80-81 renal cancer 2377262
63 82-83 renal cancer 2345651
64 84-85 renal cancer 2402114
65 86-87 renal cancer 2519284
66 88-89 renal cancer 2542761
67 90-91 renal cancer 2587606
68 92-93 renal cancer 2308279
69 94-95 renal cancer 2980927
70 96-97 renal cancer 14108950
71 98-99 renal cancer 2762116
72 100-101 renal cancer 211513230
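The x axis should be the study column, the y axis should be Col0, and the width of the violins should come from the value column. I can't split Col0 into individual values, because I only have data for each range as a whole.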
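geom_violin() does not read a width column: it estimates a kernel density from the raw y observations in each x group, which is why the call above only shows how the numbers in value are distributed. For anyone who wants to try the answers, the table can be entered as a data frame. A minimal sketch, assuming the data frame is named disM as in the call above and that Col0 holds the bin labels as character strings:

```r
# Rebuild the question's table as a data frame named disM
# (columns Col0, study, value as printed above).
lower <- seq(30, 100, by = 2)                 # 30, 32, ..., 100
bins  <- paste(lower, lower + 1, sep = "-")   # "30-31", ..., "100-101"

breast <- c(357263, 352067, 340264, 309827, 298684, 322570, 338480, 354451,
            429183, 396942, 415195, 368217, 445884, 395652, 386643, 461940,
            473772, 464228, 485851, 513411, 576618, 588724, 634343, 584662,
            608901, 617286, 659318, 757167, 1044465, 982901, 1114269, 1110257,
            1742966, 6379974, 3437746, 118984063)
renal <- c(1055566, 1089405, 1228087, 1265606, 1264919, 1248949, 1391738,
           1453100, 1443915, 1429785, 1372041, 1339706, 1418135, 1484162,
           1582617, 1571977, 1652503, 1742230, 1859936, 1928028, 2041783,
           2108994, 2154244, 2218430, 2333206, 2377262, 2345651, 2402114,
           2519284, 2542761, 2587606, 2308279, 2980927, 14108950, 2762116,
           211513230)

disM <- data.frame(
  Col0  = rep(bins, times = 2),
  study = rep(c("breast cancer", "renal cancer"), each = length(bins)),
  value = c(breast, renal),
  stringsAsFactors = FALSE   # keep Col0 as character (default since R 4.0)
)
```

The desired mapping is then x = study, y = Col0, with the violin width driven by value, which the answers below achieve in different ways.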
Any pointers on what to look at, or on how to do this, would be appreciated. Sorry if I missed a similar question.
Thanks in advance
I'll take a guess. (If I'm right, you might also want to look up "pyramid plots".)
Reorder the factor levels so that "100-101" really comes last:
disM$Col0 <- factor(disM$Col0,levels=unique(disM$Col0))
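Why the reordering is needed: factor() sorts character levels alphabetically, so "100-101" ends up before "30-31". Taking the levels from unique() keeps the order in which the bins appear in the data, and as.numeric() on the factor then gives each bin its position 1, 2, 3, ... on the y axis. A small sketch with a few of the question's labels:

```r
bins <- c("30-31", "32-33", "98-99", "100-101")

# Default factor(): levels are sorted as strings, so "100-101" comes first
# because "1" sorts before "3" and "9".
levels(factor(bins))

# levels = unique(...) keeps the order of first appearance instead,
# so "100-101" is the last level.
levels(factor(bins, levels = unique(bins)))
```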
Rearrange the data to make the polygons easier to draw (there is probably an easier way to do this, but I couldn't think of one):
library(plyr)
disM2 <- ddply(disM,"study",
function(dd) with(dd,
data.frame(y=c(as.numeric(Col0),rev(as.numeric(Col0))),
x=c(-value/2,rev(value/2)))))
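If you'd rather not load plyr, the same reshaping can be done in base R with split(), lapply() and do.call(rbind, ...). This is a sketch on a three-bin subset of the table; the names toy, outline and toy2 are just for illustration, so it does not overwrite disM or disM2:

```r
# Three bins of each study from the question's table.
toy <- data.frame(
  Col0  = factor(rep(c("30-31", "32-33", "34-35"), times = 2),
                 levels = c("30-31", "32-33", "34-35")),
  study = rep(c("breast cancer", "renal cancer"), each = 3),
  value = c(357263, 352067, 340264, 1055566, 1089405, 1228087)
)

# For one study, trace a symmetric outline: up the left edge
# (x = -value/2) and back down the right edge (x = +value/2).
outline <- function(dd) {
  y <- as.numeric(dd$Col0)              # bin index used as the y position
  data.frame(study = dd$study[1],
             y = c(y, rev(y)),
             x = c(-dd$value / 2, rev(dd$value / 2)))
}

# Apply it to each study and stack the results, like ddply() does.
toy2 <- do.call(rbind, lapply(split(toy, toy$study), outline))
rownames(toy2) <- NULL
```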
library(ggplot2); theme_set(theme_bw())
ggplot(disM2)+
geom_polygon(aes(x,y),alpha=0.5)+
facet_wrap(~study)+
labs(title="Distribution")+
scale_y_continuous(breaks=as.numeric(disM$Col0),
labels=disM$Col0)+
scale_x_continuous(labels=NULL)
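A related option that skips the polygon reshaping is geom_tile(), which accepts a width aesthetic: each bin becomes a tile centred on its study, so the outline is a stepped pyramid rather than a smooth violin. This is a swapped-in technique, not part of the answer above; the scaling to 0.9 and the column name w are arbitrary choices for the sketch, which uses four bins of each study:

```r
library(ggplot2)

# Four bins of each study from the question's table.
toy <- data.frame(
  Col0  = rep(c("30-31", "32-33", "34-35", "36-37"), times = 2),
  study = rep(c("breast cancer", "renal cancer"), each = 4),
  value = c(357263, 352067, 340264, 309827,
            1055566, 1089405, 1228087, 1265606)
)
toy$Col0 <- factor(toy$Col0, levels = unique(toy$Col0))

# Tile width relative to the largest bin of the same study, so the
# widest tile of each study takes 90% of the space between studies.
toy$w <- 0.9 * toy$value / ave(toy$value, toy$study, FUN = max)

p <- ggplot(toy, aes(x = study, y = Col0, width = w)) +
  geom_tile(fill = "grey60", colour = "white") +
  labs(title = "Distribution", x = "Studies", y = "Ranges")
```

Dividing by each study's maximum makes the shapes comparable but not the absolute sizes; divide by the overall max(toy$value) instead if absolute counts should be comparable. With the full data the "100-101" bin is far larger than the others, so the remaining tiles will be very narrow either way.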
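Here is another approach.
First, compute your class marks, i.e. (class maximum + class minimum) / 2. In your case these are the midpoints of the intervals (my_data is your table; if Col0 is a factor you can list the intervals with levels(x = my_data$Col0)).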
my_data$class_mark <- rep(x = seq(from = 30.5, to = 100.5, by = 2), times = 2)
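Instead of hard-coding seq(30.5, 100.5, by = 2), the class marks can be computed from the labels themselves. A sketch, assuming every label has the form lower-upper with non-negative numbers:

```r
# Bin labels as they appear in Col0 (one copy per study).
bins <- paste(seq(30, 100, by = 2), seq(31, 101, by = 2), sep = "-")

# Split "30-31" into lower and upper bound, one row per label.
bounds <- do.call(rbind, lapply(strsplit(bins, "-", fixed = TRUE), as.numeric))

# Class mark = (lower + upper) / 2, i.e. the interval midpoint.
class_mark <- (bounds[, 1] + bounds[, 2]) / 2
```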
Then split your data by study:
my_data_br <- my_data[which(my_data$study == "breast cancer"),]
my_data_re <- my_data[which(my_data$study == "renal cancer"),]
The numbers in your value column are very large, so divide them by the minimum of each study:
my_data_br$value <- my_data_br$value/min(my_data_br$value)
my_data_re$value <- my_data_re$value/min(my_data_re$value)
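The split-and-divide step can also be done without splitting, using ave() to look up each study's minimum row by row. A sketch on a subset of the table; the names toy and rel are hypothetical:

```r
toy <- data.frame(
  study = rep(c("breast cancer", "renal cancer"), each = 3),
  value = c(357263, 352067, 340264, 1055566, 1089405, 1228087)
)

# ave() applies FUN within each study and returns a vector aligned with
# the rows, so each value is divided by the minimum of its own study.
toy$rel <- toy$value / ave(toy$value, toy$study, FUN = min)
```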
After that, repeat each class mark as many times as its (rescaled) value. Note that rep() truncates non-integer counts towards zero.
classmark_rep_br <- rep(x = my_data_br$class_mark, times = my_data_br$value)
br_rep <- rep("breast cancer", times = length(x = classmark_rep_br))
br_data <- cbind.data.frame(br_rep, classmark_rep_br)
names(br_data) <- c("study", "value")
classmark_rep_re <- rep(x = my_data_re$class_mark, times = my_data_re$value)
re_rep <- rep("renal cancer", times = length(x = classmark_rep_re))
re_data <- cbind.data.frame(re_rep, classmark_rep_re)
names(re_data) <- c("study", "value")
Finally, create your new data frame:
my_data2 <- rbind.data.frame(br_data, re_data)
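Dividing by the minimum leaves most bins with a relative value between 1 and a few units, so after rep() the violin can end up quite coarse. A sketch that handles both studies in one pass, rounds explicitly, and multiplies by a hypothetical resolution factor to keep more detail at the cost of more rows (toy, n and toy2 are illustrative names):

```r
toy <- data.frame(
  study      = rep(c("breast cancer", "renal cancer"), each = 3),
  class_mark = rep(c(30.5, 32.5, 34.5), times = 2),
  value      = c(357263, 352067, 340264, 1055566, 1089405, 1228087)
)

resolution <- 10   # hypothetical: pseudo-observations for the smallest bin

# Relative frequency within each study (smallest bin -> 1), scaled up
# and rounded, because rep() would otherwise truncate the counts.
rel <- toy$value / ave(toy$value, toy$study, FUN = min)
n   <- round(rel * resolution)

# One row per pseudo-observation, for all studies at once.
toy2 <- data.frame(
  study = rep(toy$study, times = n),
  value = rep(toy$class_mark, times = n)
)
```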
Now you can style the plot however you like (for example, the following one) and save it:
my_graph <- ggplot(data = my_data2, aes(x = study, y = value, fill = study)) + geom_violin() +
theme(legend.position = "none", panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
axis.text = element_text(size = 10, face = "bold"), panel.background = element_blank(),
axis.line = element_line(colour = "black")) +
labs(x = "", y = "") + scale_fill_brewer(palette="Pastel1") +
scale_x_discrete(labels = c("Breast cancer", "Renal cancer")) +
scale_y_continuous(breaks = c(30.5, 40.5, 50.5, 60.5, 70.5, 80.5, 90.5, 100.5),
labels = c("30-31", "40-41", "50-51", "60-61", "70-71", "80-81", "90-91", "100-101"))
ggsave(plot = my_graph, filename = "my_graph.png", path = "output/path/", device = "png", dpi = 200)
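The row replication can probably be avoided altogether: recent ggplot2 versions list weight among the aesthetics geom_violin() understands, so the density can be weighted by value directly. This is an alternative to the answer's method, not part of it, so check that your ggplot2 version supports it. A sketch on a four-bin subset (toy is an illustrative name):

```r
library(ggplot2)

toy <- data.frame(
  study      = rep(c("breast cancer", "renal cancer"), each = 4),
  class_mark = rep(c(30.5, 32.5, 34.5, 36.5), times = 2),
  value      = c(357263, 352067, 340264, 309827,
                 1055566, 1089405, 1228087, 1265606)
)

# 'weight' makes each class mark count in proportion to its value
# in the density estimate, so no rows need to be repeated.
p <- ggplot(toy, aes(x = study, y = class_mark, weight = value, fill = study)) +
  geom_violin() +
  theme(legend.position = "none") +
  labs(x = "", y = "")
```

Depending on the version, the density step may warn that the bandwidth was chosen without using the weights; the bandwidth then depends only on the evenly spaced class marks.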