Density plot for multiple groups in ggplot
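I have looked at example1, "How to overlay density plots in R?", and "Overlapped density plots in ggplot2" on how to make density plots. I can draw a density plot with the code from the second link, but I would like to know how to make such a plot in ggplot or plotly.
I have gone through all of the examples but could not solve my problem.
I have a toy data frame of gene expression values (leukemia data description), whose columns refer to 2 groups of individuals: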
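# Read the expression matrix; each column is one patient
leukemia_big <- read.csv("http://web.stanford.edu/~hastie/CASI_files/DATA/leukemia_big.csv")
# Label each column as ALL or AML based on the prefix of its name
df <- data.frame(class = ifelse(grepl("^ALL", colnames(leukemia_big),
                                      fixed = FALSE), "ALL", "AML"),
                 row.names = colnames(leukemia_big))
# Base R: draw the density of all ALL values, then overlay the AML density
plot(density(as.matrix(leukemia_big[, df$class == "ALL"])),
     lwd = 2, col = "red")
lines(density(as.matrix(leukemia_big[, df$class == "AML"])),
      lwd = 2, col = "darkgreen")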
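ggplot needs the data in tidy format, also known as a long-format data frame.
The following example does this. Be careful, though: in the provided data set the values have almost the same distribution for both patient types, so when you plot the ALL and AML patients the curves overlap and you cannot see a difference.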
library(tidyverse)
leukemia_big %>%
  as_tibble() %>% # Optional: makes df a tibble, which makes debugging easier
  gather(key = patient, value = value, 1:72) %>% # Transforms the wide df into a tidy (long) df
  mutate(type = gsub('[.].*$', '', patient)) %>% # Creates a variable with the type of patient
  ggplot(aes(x = value, fill = type)) +
  geom_density(alpha = 0.5)
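The wide-to-long step above can also be written with pivot_longer(), which tidyr recommends over gather() for new code. The following is only a sketch under stated assumptions: `toy` is a hypothetical synthetic data frame, not the leukemia data, with column names imitating what read.csv produces for repeated headers (ALL, ALL.1, AML, AML.1), so it runs without a download. It maps type to colour instead of fill to mimic the red and dark green lines of the base R plot.

```r
library(tidyr)
library(dplyr)
library(ggplot2)

set.seed(1)
# Hypothetical toy data: rows play the role of genes, columns of patients.
# The AML columns are shifted by 1 so the two curves are visibly different.
toy <- data.frame(
  ALL   = rnorm(200),
  ALL.1 = rnorm(200),
  AML   = rnorm(200, mean = 1),
  AML.1 = rnorm(200, mean = 1)
)

long <- toy %>%
  # Wide to long: one row per (patient, value) pair
  pivot_longer(cols = everything(), names_to = "patient", values_to = "value") %>%
  # Drop the ".1"-style suffix to recover the patient type
  mutate(type = sub("[.].*$", "", patient))

# One density line per type, coloured like the base R version
p <- ggplot(long, aes(x = value, colour = type)) +
  geom_density() +
  scale_colour_manual(values = c(ALL = "red", AML = "darkgreen"))
print(p)
```

With the real data, `toy` would be replaced by `leukemia_big`; the rest of the pipeline should not need to change, because `everything()` selects all 72 columns.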
In the second example, I add 1 unit to the value variable for all AML patients, to show visually that the curves overlap in the original data:
leukemia_big %>%
  as_tibble() %>% # Optional: makes df a tibble, which makes debugging easier
  gather(key = patient, value = value, 1:72) %>% # Transforms the wide df into a tidy (long) df
  mutate(type = gsub('[.].*$', '', patient)) %>% # Creates a variable with the type of patient
  mutate(value2 = if_else(condition = type == "ALL", true = value, false = value + 1)) %>% # Shifts AML by 1 so the two densities separate
  ggplot(aes(x = value2, fill = type)) +
  geom_density(alpha = 0.5)
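The question also asks about plotly. One possible route, assuming the plotly package is installed, is to build the ggplot object first and pass it to plotly::ggplotly(), which converts it to an interactive plot. The sketch below uses a hypothetical synthetic long-format data frame (`long_toy`, made up for illustration) rather than the leukemia data.

```r
library(ggplot2)
library(plotly)

set.seed(1)
# Hypothetical long-format data: one value column and one group column
long_toy <- data.frame(
  value = c(rnorm(300), rnorm(300, mean = 1)),
  type  = rep(c("ALL", "AML"), each = 300)
)

# Same kind of density plot as above, stored in a variable
p <- ggplot(long_toy, aes(x = value, fill = type)) +
  geom_density(alpha = 0.5)

# Convert the static ggplot into an interactive plotly widget
w <- ggplotly(p)
w
```

The same call should work on the plot built from the tidy leukemia data, as long as the ggplot object is assigned to a variable instead of being printed at the end of the pipe.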