How to plot a copy number variation profile in R?

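I am trying to plot a copy number variation profile in R. This is what I am looking for, but with all the cells in my data.

Ploidy is on the Y axis and chromosome number is on the X axis.

Here is my data and what I have tried so far, but it does not give me what I am looking for: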

library(ggplot2)

input <- data.frame(chrom = sample("chr1"), start = sample(c(780000, 2920000, 4920000)), stop = sample(c(2920000, 4920000, 692000)), cell_0 = sample(1), cell_1 = sample(1, 3, 1), cell_2 = sample(2, 1, 2))
ggplot(input, aes(x=chrom, y=cell_0, group=1)) +
  geom_point() +
  geom_line(color = "#00AFBB", size = 1) 

Here is a link to the whole file:

https://pastebin.com/440AX3Dr

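When I run the code from the answer, this is what I get. I would like all the chromosomes to be laid out horizontally, as in the figure above.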

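We can use facet_wrap to place each chrom side by side. I used a number of theme settings to make the plot look like the one shown above. To illustrate this better, I also made my own data with two chroms. See below: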

read.table(text="chrom start stop cell_0 cell_1 cell_2
chr1 780000 2920000 2 2 2
chr1 2920000 4920000 1 2 3
chr1 4920000 6920000 2 3 2
chr2 480000 1920000 1 2 3
chr2 1920000 2920000 2 2 2
chr2 2920000 3920000 1 3 3", header=T) -> input
library(ggplot2)
library(tidyr)

input %>% 
  pivot_longer(c(start,stop)) %>% 
    ggplot(., aes(x=value, y=as.factor(cell_0), group=1L)) +
      geom_point(colour="grey") +
      facet_wrap(~chrom, strip.position = "bottom", scales = "free_x") +
      geom_line(color = "#00AFBB", size = 1) +
      theme_bw() +
      theme(panel.spacing.x=unit(0, "lines"),
            panel.spacing.y=unit(0, "lines"),
            axis.title.x=element_blank(),
            axis.text.x=element_blank(),
            axis.ticks.x=element_blank(),
            strip.background = element_rect(color="black", fill="white")) +
      scale_x_continuous(expand = c(.01, 0)) +
      scale_y_discrete("ploidy", expand = c(.3,.3)) +
      ggtitle("cell_596, 2Mb resolution, mean ploidy 3.04")
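
To see why the code above pivots start and stop into a single column, it can help to look at the reshaped data: each segment becomes two rows (one per end), so geom_line() draws a flat run across every segment. This is a minimal sketch using one chromosome from the toy data above; the object names seg and long are illustrative and not part of the original answer.

```r
library(tidyr)

# Three segments on one chromosome (taken from the toy data above)
seg <- data.frame(
  chrom  = "chr1",
  start  = c(780000, 2920000, 4920000),
  stop   = c(2920000, 4920000, 6920000),
  cell_0 = c(2, 1, 2)
)

# Each segment contributes two rows: one for its start and one for its stop.
# With the default arguments the new columns are called "name" and "value".
long <- pivot_longer(seg, c(start, stop))
print(long)
```

Because the stop of one segment equals the start of the next, consecutive rows share an x position, and that is where the line steps from one ploidy level to the next.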

Updated solution for the whole data

I added another column to show how this works with two cell columns. The plot will get crowded, though.

# input <- read.table(file = "clipboard", header=T)
## read data from pastebin

library(ggplot2)
library(tidyr)
library(dplyr)

set.seed(123)

input %>% 
  # simulate a second cell column (cell_0 shifted by +1 or -1) for illustration
  mutate(cell_1 = cell_0 + sample(c(-1, 1), n(), replace = TRUE)) %>% 
  pivot_longer(c(start,stop), names_to = "step", values_to = "time") %>% 
  pivot_longer(c(cell_0,cell_1), names_to = "cell", values_to = "ploidy") %>% 
  ggplot(data=., aes(x=time, y=as.factor(ploidy), group=cell)) +
  geom_point(aes(colour=cell)) +
  facet_wrap(~chrom, strip.position = "bottom", scales = "free_x", nrow=1) +
  geom_line(aes(color = cell), size = 1, alpha=0.5) +
  theme_bw() +
  scale_x_continuous(expand = c(.01, 0)) +
  scale_y_discrete("ploidy", expand = c(.1,.1)) +
  theme(panel.spacing.x=unit(0, "lines"),
        panel.spacing.y=unit(0, "lines"),
        axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        strip.background = element_rect(color="black", fill="white"),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line(colour = "black"),
        plot.title = element_text(hjust = 0.5)) +
  ggtitle("cell_596, 2Mb resolution, mean ploidy 3.04")
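
If overlaying several cells in one panel is too crowded, one possible alternative (not part of the original answer) is to give each cell its own row of panels with facet_grid(). The sketch below assumes the cell columns all start with "cell_" and uses the two-chromosome toy data from earlier; the names long, end, pos and p are illustrative.

```r
library(ggplot2)
library(tidyr)

# Toy data from the first part of the answer
input <- read.table(text = "chrom start stop cell_0 cell_1 cell_2
chr1 780000 2920000 2 2 2
chr1 2920000 4920000 1 2 3
chr1 4920000 6920000 2 3 2
chr2 480000 1920000 1 2 3
chr2 1920000 2920000 2 2 2
chr2 2920000 3920000 1 3 3", header = TRUE)

# Long format: one row per segment end and per cell
long <- pivot_longer(input, c(start, stop), names_to = "end", values_to = "pos")
long <- pivot_longer(long, starts_with("cell_"), names_to = "cell", values_to = "ploidy")

# One row of panels per cell, one column per chromosome
p <- ggplot(long, aes(x = pos, y = ploidy, group = 1)) +
  geom_line(colour = "#00AFBB") +
  facet_grid(cell ~ chrom, scales = "free_x", switch = "x") +
  theme_bw() +
  theme(panel.spacing.x = unit(0, "lines"),
        axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank())
p
```

Here ploidy is kept numeric rather than converted to a factor, so the y axis is continuous; whether that is preferable depends on how the ploidy values should be read.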

Final update:

library(ggplot2)
library(tidyr)
library(dplyr)
library(stringr)

input %>% 
  pivot_longer(c(start,stop), names_to = "step", values_to = "time") %>% 
  mutate(chrom = factor(chrom, levels = str_sort(unique(chrom), numeric = T))) %>% 
  ggplot(data=., aes(x=time, y=as.factor(cell_0), group=1L)) +
  geom_point(colour="grey", size=0.5) +
  geom_line(color = "#00AFBB", size = 1, alpha=0.5) +
  facet_wrap(~as.factor(chrom), 
             strip.position = "bottom", scales = "free_x", nrow=1) +
  theme_bw() +
  scale_x_continuous(expand = c(.01, 0)) +
  scale_y_discrete("ploidy", expand = c(.1,.1)) +
  theme(panel.spacing.x=unit(0, "lines"),panel.spacing.y=unit(0, "lines"),
        axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        strip.background = element_rect(color="black", fill="white"),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line(colour = "black"),
        plot.title = element_text(hjust = 0.5)) +
  ggtitle("cell_596, 2Mb resolution, mean ploidy 3.04")
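
The mutate() step in the code above is what puts the chromosome panels in natural order. As a small illustration, str_sort() with numeric = TRUE sorts embedded numbers by value, so chr2 comes before chr10, and the resulting factor levels decide the panel order in facet_wrap(). The chromosome names below are made up for the example and are not taken from the pastebin file.

```r
library(stringr)

# Illustrative chromosome labels in arbitrary order
chroms <- c("chr10", "chr2", "chr1", "chrX", "chr21")

# Natural ordering: digits inside the strings are compared as numbers
ordered_chroms <- str_sort(unique(chroms), numeric = TRUE)

# Factor levels follow that ordering, and so do the facet panels
f <- factor(chroms, levels = ordered_chroms)
levels(f)
```

Without this step, a character chrom column is faceted in plain alphabetical order, which typically places chr10 before chr2.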

Created on 2019-12-10 by the reprex package (v0.3.0)