geom_bar ggplot2 stacked, grouped bar plot with positive and negative values - pyramid plot

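I don't even know how to describe properly the plot I'm trying to generate, which is not a good start. I'll show you my data first, then try to explain it and show images that contain elements of what I want.

My data: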

   strain condition count.up count.down
1    phbA  balanced      120       -102
2    phbA   limited      114       -319
3    phbB  balanced      122       -148
4    phbB   limited       97       -201
5   phbAB  balanced      268       -243
6   phbAB   limited      140       -189
7    phbC  balanced       55        -65
8    phbC   limited      104       -187
9    phaZ  balanced       99        -28
10   phaZ   limited      147       -205
11   bdhA  balanced      246       -159
12   bdhA   limited      143       -383
13  acsA2  balanced      491       -389
14  acsA2   limited      131       -295

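I have seven samples, each under two conditions. For each sample I have the number of down-regulated genes and the number of up-regulated genes (count.down and count.up).

I want to plot this so that each sample is grouped, so phbA balanced is dodged next to phbA limited. Each bar would have one portion on the positive side of the plot (representing count.up) and one portion on the negative side (representing count.down).

I'd like the bars for the 'balanced' condition to be one colour and the bars for the 'limited' condition to be another. Ideally there would be two shades of each colour (one for count.up, one for count.down), just to make a visual difference between the two parts of each bar.

Some images that contain elements I'm trying to pull together: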

I've also tried applying parts of this Stack Overflow example, but I can't work out how to make it fit my dataset. I like the positive vs. negative bars here: a single bar that covers both, with colour differentiating the two parts. It does not have the grouping of conditions within one sample, or the extra layer of colour coding that distinguishes condition.

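I've tried many things but just can't get it right. I think I'm really struggling because many geom_bar examples use raw data that the plot tallies by itself, whereas I'm giving it the counts directly. I can't seem to make that distinction successfully in my code, and when I switch to stat = "identity" everything turns into a mess. Any ideas or suggestions would be much appreciated!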
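
For reference, here is a minimal sketch of the count vs. identity distinction mentioned above. The `toy` data frame is a hypothetical three-row excerpt of the data, and `layer_data()` is used only to inspect the bar heights ggplot2 computes, so treat this as an illustration rather than the full solution.

```r
library(ggplot2)

# Hypothetical three-row excerpt of the data above, one row per bar
toy <- data.frame(
  strain = c("phbA", "phbB", "phbC"),
  count.up = c(120, 122, 55)
)

# Default geom_bar() uses stat = "count": it tallies rows per x value,
# so each bar here gets height 1 and count.up is ignored
p_count <- ggplot(toy, aes(strain)) + geom_bar()

# stat = "identity" uses the supplied y values as the bar heights
p_identity <- ggplot(toy, aes(strain, count.up)) + geom_bar(stat = "identity")

# Inspect the computed bar heights without drawing the plots
heights_count <- layer_data(p_count)$y
heights_identity <- layer_data(p_identity)$y
```

With pre-computed counts like these, the second form (or the equivalent `geom_col()`) is the one that matches the data.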

Following the suggestion in the link: I've been using it as a template, but I'm stuck.

df <- read.csv("countdata.csv", header=T) 
df.m <- melt(df, id.vars = c("strain", "condition")) 
ggplot(df.m, aes(condition)) +
  geom_bar(subset = .(variable == "count.up"),
    aes(y = value, fill = strain), stat = "identity") +
  geom_bar(subset = .(variable == "count.down"),
    aes(y = -value, fill = strain), stat = "identity") +
  xlab("") +
  scale_y_continuous("Export - Import", formatter = "comma")

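When I try to run the ggplot line, it returns the error: could not find function ".". I realised I didn't have dplyr installed/loaded, so I did that. I then played around a lot and ended up with: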
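
A hedged aside on that error: as far as I know, `.()` is exported by plyr rather than dplyr, and the `subset =` layer argument is only accepted by older ggplot2 releases. A minimal sketch that sidesteps both by handing each layer its own pre-filtered data, assuming a small hypothetical excerpt of the melted data:

```r
library(ggplot2)

# Hypothetical excerpt of the melted data (values stored as positive counts)
df.m <- data.frame(
  strain = rep(c("phbA", "phbB"), times = 2),
  condition = "balanced",
  variable = rep(c("count.up", "count.down"), each = 2),
  value = c(120, 122, 102, 148)
)

# Filter with base R instead of subset = .(...)
up <- subset(df.m, variable == "count.up")
down <- subset(df.m, variable == "count.down")

# One layer per sign; the down counts are negated so they hang below zero
p <- ggplot(df.m, aes(strain, fill = condition)) +
  geom_bar(data = up, aes(y = value), stat = "identity") +
  geom_bar(data = down, aes(y = -value), stat = "identity")
```

This needs only ggplot2 and base R, so it does not depend on the load order of plyr and dplyr.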

library(ggplot2)
library(reshape2)
library(dplyr)
library(plyr)

df <- read.csv("countdata.csv", header=T)
df.m <- melt(df, id.vars = c("strain", "condition"))

# This is what df.m looks like now (relative to my initial input df, I just changed the numbers in Excel so they are all positive). Included so you can see what melt does.
df.m =read.table(text = "
strain condition   variable value
1    phbA  balanced   count.up   120
2    phbA   limited   count.up   114
3    phbB  balanced   count.up   122
4    phbB   limited   count.up    97
5   phbAB  balanced   count.up   268
6   phbAB   limited   count.up   140
7    phbC  balanced   count.up    55
8    phbC   limited   count.up   104
9    phaZ  balanced   count.up    99
10   phaZ   limited   count.up   147
11   bdhA  balanced   count.up   246
12   bdhA   limited   count.up   143
13  acsA2  balanced   count.up   491
14  acsA2   limited   count.up   131
15   phbA  balanced count.down   102
16   phbA   limited count.down   319
17   phbB  balanced count.down   148
18   phbB   limited count.down   201
19  phbAB  balanced count.down   243
20  phbAB   limited count.down   189
21   phbC  balanced count.down    65
22   phbC   limited count.down   187
23   phaZ  balanced count.down    28
24   phaZ   limited count.down   205
25   bdhA  balanced count.down   159 
26   bdhA   limited count.down   383
27  acsA2  balanced count.down   389
28  acsA2   limited count.down   295", header = TRUE)
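
As a side note, the Excel step could probably be skipped. The sketch below is an illustration only: `df` is a four-row excerpt of the original table typed in by hand (standing in for countdata.csv), with `abs()` used to make count.down positive before melting.

```r
library(reshape2)

# Four-row excerpt of the original data, with negative count.down values
df <- data.frame(
  strain = c("phbA", "phbA", "phbB", "phbB"),
  condition = c("balanced", "limited", "balanced", "limited"),
  count.up = c(120, 114, 122, 97),
  count.down = c(-102, -319, -148, -201)
)

# Make count.down positive in R instead of editing the file in Excel
df$count.down <- abs(df$count.down)

# melt() stacks the measure columns into variable/value pairs,
# repeating the id columns (strain, condition) for each one
df.m <- melt(df, id.vars = c("strain", "condition"))
```

Each input row yields two output rows, one for count.up and one for count.down, which is the long layout shown above.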

This plots by strain, with the count.up and count.down values for both conditions:

ggplot(df.m, aes(strain)) +
  geom_bar(subset = .(variable == "count.up"),
    aes(y = value, fill = condition), stat = "identity") +
  geom_bar(subset = .(variable == "count.down"),
    aes(y = -value, fill = condition), stat = "identity") +
  xlab("")

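# Leftover from the template: rewrites labels such as "2010M05" as "05\n10"; the strain names do not match the pattern, so they pass through unchanged
labels <- gsub("20([0-9]{2})M([0-9]{2})", "\\2\n\\1",
           df.m$strain)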


# this adds a horizontal line at zero to improve readability
last_plot() + geom_hline(yintercept = 0,colour = "grey90")

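The one thing I (unfortunately) can't get working is showing the number that represents 'value' inside each bar. I've got the numbers to display, but I can't get them into the right place. I'm going crazy!

My data is the same as above; this is where my code stands.

I've looked at a ton of examples that use geom_text to show labels on dodged plots, and I haven't been able to implement any of them successfully. The closest I've got is below. Any suggestions would be much appreciated!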

library(ggplot2)
library(reshape2)
library(plyr)
library(dplyr)
df <- read.csv("countdata.csv", header=T)
df.m <- melt(df, id.vars = c("strain", "condition"))
ggplot(df.m, aes(strain), ylim(-500:500)) + 
geom_bar(subset = .(variable == "count.up"), 
aes(y = value, fill = condition), stat = "identity", position = "dodge") +
geom_bar(subset = .(variable == "count.down"), 
aes(y = -value, fill = condition), stat = "identity", position = "dodge") + 
geom_hline(yintercept = 0,colour = "grey90")

last_plot() + geom_text(aes(strain, value, group=condition, label=label, ymax = 500, ymin= -500), position = position_dodge(width=0.9),size=4)

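Which gives this:

Why won't it align!

I suspect my problem is related to how I'm actually plotting, or that I'm not telling the geom_text command properly how to position itself. Any ideas?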

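Try this. Just as you position the bars with two statements (one for the positive side, one for the negative side), position the text the same way. Then use vjust to fine-tune the position (inside or outside the bar). Also, there is no 'label' variable in the data frame; I'm assuming the label is value.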

library(ggplot2)

## Using your df.m data frame
ggplot(df.m, aes(strain)) + 
  geom_bar(data = subset(df.m, variable == "count.up"), 
    aes(y = value, fill = condition), stat = "identity", position = "dodge") +
  geom_bar(data = subset(df.m, variable == "count.down"), 
    aes(y = -value, fill = condition), stat = "identity", position = "dodge") + 
  geom_hline(yintercept = 0, colour = "grey90")

## Add the labels with one layer per sign, mirroring the two geom_bar() layers
last_plot() + 
  geom_text(data = subset(df.m, variable == "count.up"), 
    aes(strain, value, group = condition, label = value),
    position = position_dodge(width = 0.9), vjust = 1.5, size = 4) +
  geom_text(data = subset(df.m, variable == "count.down"), 
    aes(strain, -value, group = condition, label = value),
    position = position_dodge(width = 0.9), vjust = -.5, size = 4) +
  ## coord_cartesian() sets the visible y range without discarding data
  coord_cartesian(ylim = c(-500, 500))
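
The question also asked for two shades per condition. One possible way, sketched here under assumptions, is to give the count.down layer a lower alpha. The 0.5 value and the four-strain-row excerpt of df.m are my own choices for illustration, not part of the original data handling.

```r
library(ggplot2)

# Small excerpt of df.m from the question (values stored as positive counts)
df.m <- data.frame(
  strain = rep(c("phbA", "phbA", "phbB", "phbB"), times = 2),
  condition = rep(c("balanced", "limited"), times = 4),
  variable = rep(c("count.up", "count.down"), each = 4),
  value = c(120, 114, 122, 97, 102, 319, 148, 201)
)
up <- subset(df.m, variable == "count.up")
down <- subset(df.m, variable == "count.down")

# Share one dodge spec so bars and labels line up
dodge <- position_dodge(width = 0.9)

p <- ggplot(df.m, aes(strain, fill = condition, group = condition)) +
  geom_bar(data = up, aes(y = value), stat = "identity", position = dodge) +
  # alpha below 1 gives the down-regulated half a lighter shade (0.5 is arbitrary)
  geom_bar(data = down, aes(y = -value), stat = "identity",
           position = dodge, alpha = 0.5) +
  geom_text(data = up, aes(y = value, label = value),
            position = dodge, vjust = 1.5, size = 4) +
  geom_text(data = down, aes(y = -value, label = value),
            position = dodge, vjust = -0.5, size = 4) +
  geom_hline(yintercept = 0, colour = "grey90")
```

Reusing the same `position_dodge(width = 0.9)` object for the bars and the text is what keeps each number over its own bar, since 0.9 matches the default bar width.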