Plot DNA nucleotide data using a loop and PDF graphics in R
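My boss asked me to plot a DNA nucleotide matrix using the pdf graphics functions in R. I have some code I'm working with, but I can't figure it out and have spent too much time trying! I know there are probably other methods/packages for visualizing this kind of genetic data, and I'd be very interested to hear about them, but I also need to do it the way it was assigned to me.
I have sequence data in R like this: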
> head(b)
Sequence X236 X237 X238 X239 X240 X241 X242 X244 X246 X247 X248 X249 X250 X251 X252 X253 X254 X255 X256 X257 X258 X259
1 L19088.1 G G G G G A G A C C A A G A T G G C C G A A
2 chr1_43580199_43586187 · · · · · · · · · · · · · · · · · · · · g g
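There are 1040 rows and 483 columns in total, and the possible characters are A, a, G, g, T, t, C, c, a middle dot, or X.
I want to color the different characters and plot them in a heatmap-like fashion. The dots and X's don't need to be colored. The code I'm working with so far is: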
pdf(
  sprintf(
    "%s/L1.pdf",
    out_dir),
  width = 8.5, height = 11 )
par(omi = rep(0.5,4))
par(mai = rep(0.5,4))
par(bg = "#eeeeee")
plot( NULL,
  xlim = c(1,100), ylim = c(1,140),
  xlab = NA, ylab = NA,
  xaxt = "n", yaxt = "n",
  bty = "n", asp = 1 )
plot_width <- 100
w <- plot_width / 600
genome_colors <- list()
genome_colors[["A"]] <- "#ea0064"
genome_colors[["a"]] <- "#ea0064"
genome_colors[["C"]] <- "#008a3f"
genome_colors[["c"]] <- "#008a3f"
genome_colors[["G"]] <- "#116eff"
genome_colors[["g"]] <- "#116eff"
genome_colors[["T"]] <- "#cf00dc"
genome_colors[["t"]] <- "#cf00dc"
I <- nrow(b)
J <- ncol(b)
for ( i in 1:I ){
  for ( j in i:J ){
    # plot nucleotide as rectangle with color and text label, something like:
    # plot nucleotides with genome_colors
    # rect( (j-1)*w, top-(i-1)*w, j*w, top-i*w, col = color, border = NA )
  }
  # text( (j+1)*w, top-(i-1)*w, labels = i, cex = 0.05, col = "#dddddd" )
}
dev.off()
If anyone can help me with the plotting loop or point me in a useful direction, I'd be very grateful!
Assuming df is a data frame in wide format (one column per position, one row per sequence), for example:
df <- structure(list(sequence = c("L19088.1", "chr1_43580199_43586187"
), X236 = c("G", "."), X237 = c("G", "."), X238 = c("A", "a"),
X239 = c("T", "C"), X240 = c("A", "c"), X241 = c("G", "G"
)), class = "data.frame", row.names = 1:2)
## > df
## sequence X236 X237 X238 X239 X240 X241
## 1 L19088.1 G G A T A G
## 2 chr1_43580199_43586187 . . a C c G
... you can use the tidyverse packages ggplot2 and tidyr like this:
library(tidyr)
library(ggplot2)
df %>%
  ## reshape to long table
  ## (one column each for sequence, position and nucleotide):
  pivot_longer(-sequence, ## stack all columns *except* sequence
               names_to = 'position',
               values_to = 'nucleotide'
  ) %>%
  ## create the plot:
  ggplot() +
  geom_tile(aes(x = position, y = sequence, fill = nucleotide),
            height = .9 ## adjust to visually separate sequences
  ) +
  scale_fill_manual(values = c('A'='#ea0064', 'a'='#ea0064', 'C'='#008a3f',
                               'c'='#008a3f', 'G'='#116eff', 'g'='#116eff',
                               'T'='#cf00dc', 't'='#cf00dc', '.'='#a0a0a0'
                    )
  ) +
  labs(x = 'x-axis-title', y='y-axis-title') +
  ## remove x-axis (=position) elements: they'll probably be too dense:
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank()
  )
^^^ For simple styling, see e.g. ggplot themes.
Save the plot with the convenient wrapper ggsave:
ggsave(filename = 'my_plot.pdf',
width = 12, ## inches; to fill DIN A4 landscape
height = 8
)
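One thing to watch with the plot above: the position column holds strings such as "X236", so ggplot2 orders the tiles alphabetically, and "X100" would sort before "X98". Here is a minimal sketch of one way around that, assuming every position column is named "X" plus an integer. It also folds lower case into upper case and drops anything that is not A/C/G/T, so dots and X's stay uncolored as the question asks. The toy data, the `base` column and the output file name are made up for illustration.

```r
library(tidyr)
library(ggplot2)

# Toy data in the same wide layout as df above (values are illustrative only)
df <- data.frame(
  sequence = c("L19088.1", "chr1_43580199_43586187"),
  X98 = c("G", "."), X99 = c("G", "X"), X100 = c("A", "a"), X101 = c("T", "c"),
  stringsAsFactors = FALSE
)

long <- pivot_longer(
  df, -sequence,
  names_to = "position",
  names_prefix = "X",                             # strip the leading "X"
  names_transform = list(position = as.integer),  # "100" becomes 100L
  values_to = "nucleotide"
)

# Fold case so "a" and "A" share one legend entry and one colour
long$base <- toupper(long$nucleotide)

# Keep only the four bases; dots and X's are simply not drawn
long_acgt <- long[long$base %in% c("A", "C", "G", "T"), ]

nuc_cols <- c(A = "#ea0064", C = "#008a3f", G = "#116eff", T = "#cf00dc")

p <- ggplot(long_acgt) +
  geom_tile(aes(x = position, y = sequence, fill = base), height = 0.9) +
  scale_fill_manual(values = nuc_cols)

# Hypothetical output location; pass the plot explicitly to ggsave
out_file <- file.path(tempdir(), "nucleotides.pdf")
ggsave(out_file, p, width = 12, height = 8)
```

Folding case loses the upper/lower distinction in the legend. If that distinction matters, keep `nucleotide` as the fill and list both cases in the colour vector as in the original plot.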
When using the pdf() function, don't forget to explicitly print your plot:
pdf(file = 'my_plot.pdf'
    ## ... other parameters
)
print( ## you need to print the plot
  ## (qplot() is deprecated since ggplot2 3.4.0, so ggplot() is used here)
  ggplot(cars, aes(x = speed, y = dist)) + geom_point()
)
dev.off()
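Since the assignment asks for base pdf graphics, here is a rough sketch of how the loop from the question could be filled in with rect(). It is not the only way to do it, and several things are my assumptions: the toy data frame `b`, the output path under tempdir(), cells that are one user unit wide and tall (instead of the `w` scaling in the question), and the `n_drawn` counter, which is only there to check the result. The inner loop runs over the nucleotide columns only, because the first column of `b` holds the sequence name.

```r
# Toy stand-in for the real 1040 x 483 data frame (values are illustrative)
b <- data.frame(
  Sequence = c("L19088.1", "chr1_43580199_43586187"),
  X236 = c("G", "."), X237 = c("A", "g"), X238 = c("T", "X"), X239 = c("C", "c"),
  stringsAsFactors = FALSE
)

# One colour per base; lower case is folded to upper case before lookup
genome_colors <- c(A = "#ea0064", C = "#008a3f", G = "#116eff", T = "#cf00dc")

nuc <- as.matrix(b[, -1])  # character matrix without the name column
I <- nrow(nuc)
J <- ncol(nuc)

out_file <- file.path(tempdir(), "L1.pdf")  # hypothetical output path
pdf(out_file, width = 8.5, height = 11)
par(mai = rep(0.5, 4), bg = "#eeeeee")

# Empty canvas: x runs over positions (plus room for labels), y over sequences
plot.new()
plot.window(xlim = c(0, J + 1), ylim = c(0, I), xaxs = "i", yaxs = "i")

n_drawn <- 0
for (i in seq_len(I)) {
  for (j in seq_len(J)) {
    col <- genome_colors[toupper(nuc[i, j])]
    if (is.na(col)) next  # dots, X's and anything else stay uncoloured
    # Row 1 is drawn at the top, so y counts down from I
    rect(j - 1, I - i, j, I - i + 1, col = col, border = NA)
    n_drawn <- n_drawn + 1
  }
  # Row number to the right of the last column
  text(J + 0.5, I - i + 0.5, labels = i, cex = 0.5, col = "#888888")
}
dev.off()
```

rect() is vectorized, so for the full matrix the double loop could be replaced by a single call on precomputed coordinate and colour vectors, which should be noticeably faster for roughly half a million cells.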