R ggplot2 stacked barplot by percentage with several categorical variables
This is a simple question, but I'm struggling to understand the data format ggplot2 expects.
I have the following data.table in R:
print(dt)
ID category A B C totalABC
1: 10 group1 1 3 0 4
2: 11 group1 1 11 1 13
3: 12 group2 15 20 2 37
4: 13 group2 6 12 2 20
5: 14 group2 17 83 6 106
...
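My goal is to create a proportional stacked bar chart like the one in this example: https://rpubs.com/escott8908/RGC_Ch3_Gar_Graphs
where each segment shows the percentage X/totalABC, with X being the category_type A, B or C. I also want to do this per category, i.e. the x-axis values should be group1, group2, etc.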
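As a concrete example, group1 has 4 + 13 = 17 elements in total, of which A = 1 + 1 = 2, B = 3 + 11 = 14 and C = 0 + 1 = 1.
The percentages are therefore percent_A = 2/17 ≈ 11.8%, percent_B = 14/17 ≈ 82.4%, percent_C = 1/17 ≈ 5.9%.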
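The group1 figures can be checked directly in R. This is a minimal sketch, assuming the five rows printed above are the whole of dt (the real table is elided with "..."), rebuilt here with data.table(); the names g1, total and pct are illustrative only.

```r
library(data.table)

# Rebuild of the five example rows shown in the question
dt <- data.table(
  ID       = 10:14,
  category = factor(c("group1", "group1", "group2", "group2", "group2")),
  A        = c(1L, 1L, 15L, 6L, 17L),
  B        = c(3L, 11L, 20L, 12L, 83L),
  C        = c(0L, 1L, 2L, 2L, 6L),
  totalABC = c(4L, 13L, 37L, 20L, 106L)
)

# Column totals for group1: A = 1 + 1, B = 3 + 11, C = 0 + 1
g1 <- dt[category == "group1", .(A = sum(A), B = sum(B), C = sum(C))]
total <- dt[category == "group1", sum(totalABC)]  # 4 + 13 = 17

# Share of each category_type in percent, rounded to one decimal
pct <- round(100 * unlist(g1) / total, 1)
pct  # A = 11.8, B = 82.4, C = 5.9
```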
The correct ggplot2 solution seems to be something like:
library(ggplot2)
pp = ggplot(dt, aes(x=category, y=percentage, fill=category_type)) +
geom_bar(position="dodge", stat="identity")
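My confusion: how do I create a percentage column that corresponds to the three categorical values?
If the above is not correct, how should I format my data.table to create the stacked bar chart?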
Here is one solution:
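library(data.table)  # melt() for data.tables
library(ggplot2)
library(dplyr)       # for the %>% pipe
# Reshape A, B, C into long format: one row per (ID, group) with the count in nobs
melt(dt, measure.vars = c("A", "B", "C"),
     variable.name = "groups", value.name = "nobs") %>%
  ggplot(aes(x = category, y = nobs, fill = groups)) +
  # position = "fill" stacks the segments and rescales each bar to a total of 1
  geom_bar(stat = "identity", position = "fill")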
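To see what position = "fill" actually draws, ggplot2's layer_data() returns the computed segment boundaries (ymin, ymax) for each bar. A minimal sketch, assuming the five example rows are rebuilt with data.table(); the names long, p, ld and heights are illustrative, and the percent axis labels via scales::percent are an optional addition.

```r
library(data.table)
library(ggplot2)

# Rebuild of the five example rows shown in the question
dt <- data.table(
  ID       = 10:14,
  category = factor(c("group1", "group1", "group2", "group2", "group2")),
  A        = c(1L, 1L, 15L, 6L, 17L),
  B        = c(3L, 11L, 20L, 12L, 83L),
  C        = c(0L, 1L, 2L, 2L, 6L),
  totalABC = c(4L, 13L, 37L, 20L, 106L)
)

# Same reshaping as in the answer: one row per (ID, group)
long <- melt(dt, measure.vars = c("A", "B", "C"),
             variable.name = "groups", value.name = "nobs")

p <- ggplot(long, aes(x = category, y = nobs, fill = groups)) +
  geom_bar(stat = "identity", position = "fill") +
  scale_y_continuous(labels = scales::percent)  # label 0-1 as 0%-100%

# layer_data() returns the data ggplot2 draws, after the fill adjustment
ld <- layer_data(p)

# Add up the segment heights in each bar (x = 1 for group1, 2 for group2)
heights <- tapply(ld$ymax - ld$ymin, unclass(ld$x), sum)
heights  # both bars sum to 1
```

Within group1 the two A rows contribute 1/17 each, so A covers 2/17 of that bar, matching the hand calculation.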
You can use the following code:
library(data.table)
library(dplyr)
library(ggplot2)
dt %>%
  select(-ID) %>%                                   # drop the ID column
  group_by(category) %>%                            # group by category
  summarise(across(everything(), sum)) %>%          # sum A, B, C and totalABC per category
  as.data.table() %>%
  melt(id.vars = c("category", "totalABC")) %>%     # one row per (category, variable)
  ggplot(aes(x = category, y = value / totalABC, fill = variable)) +  # define x and y
  geom_bar(stat = "identity", position = "fill") +  # make the stacked bars
  scale_y_continuous(labels = scales::percent)      # y axis in % format
This produces one 100% stacked bar per category, with the y axis labelled in percent.
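Because position = "fill" rescales every bar to a total of 1 anyway, dividing by totalABC inside aes() should not change the drawing. A sketch that checks this, assuming the five example rows; here the per-category sums are computed with data.table's lapply(.SD, sum) instead of dplyr, and the names sums, long, p_ratio, p_raw and same are illustrative.

```r
library(data.table)
library(ggplot2)

# Rebuild of the five example rows shown in the question
dt <- data.table(
  ID       = 10:14,
  category = factor(c("group1", "group1", "group2", "group2", "group2")),
  A        = c(1L, 1L, 15L, 6L, 17L),
  B        = c(3L, 11L, 20L, 12L, 83L),
  C        = c(0L, 1L, 2L, 2L, 6L),
  totalABC = c(4L, 13L, 37L, 20L, 106L)
)

# Per-category sums, then long format (category, totalABC, variable, value)
sums <- dt[, lapply(.SD, sum), by = category,
           .SDcols = c("A", "B", "C", "totalABC")]
long <- melt(sums, id.vars = c("category", "totalABC"))

p_ratio <- ggplot(long, aes(x = category, y = value / totalABC, fill = variable)) +
  geom_bar(stat = "identity", position = "fill")
p_raw <- ggplot(long, aes(x = category, y = value, fill = variable)) +
  geom_bar(stat = "identity", position = "fill")

# Compare the segment boundaries ggplot2 computes for the two plots
same <- isTRUE(all.equal(layer_data(p_ratio)[, c("ymin", "ymax")],
                         layer_data(p_raw)[, c("ymin", "ymax")]))
same  # TRUE: the division by totalABC is absorbed by the rescaling
```

The division only changes the picture with position = "stack", where value / totalABC yields proportions that already sum to 1 per category.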
Data:
library(data.table)
dt <- data.table(
  ID       = 10:14,
  category = factor(c("group1", "group1", "group2", "group2", "group2")),
  A        = c(1L, 1L, 15L, 6L, 17L),
  B        = c(3L, 11L, 20L, 12L, 83L),
  C        = c(0L, 1L, 2L, 2L, 6L),
  totalABC = c(4L, 13L, 37L, 20L, 106L)
)
What if you want to stick with the plotting code from your question?
In that case, you can compute the percentages like this:
library(data.table)
library(dplyr)
df <- dt %>%
  select(-ID) %>%                                # drop the ID column
  group_by(category) %>%                         # group by category
  summarise(across(everything(), sum)) %>%       # sum each variable per category
  as.data.table() %>%
  melt(id.vars = c("category", "totalABC")) %>%  # one row per (category, variable)
  mutate(percentage = value * 100 / totalABC)    # share of the category total, in %
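The same percentage column can also be built with data.table alone, without dplyr. A sketch assuming totalABC equals A + B + C in every row (true for the five rows shown), so each category's total is simply the sum of its melted counts; the name long is illustrative.

```r
library(data.table)

# Rebuild of the five example rows shown in the question
dt <- data.table(
  ID       = 10:14,
  category = factor(c("group1", "group1", "group2", "group2", "group2")),
  A        = c(1L, 1L, 15L, 6L, 17L),
  B        = c(3L, 11L, 20L, 12L, 83L),
  C        = c(0L, 1L, 2L, 2L, 6L),
  totalABC = c(4L, 13L, 37L, 20L, 106L)
)

# Wide -> long, keeping only category and the three counts
long <- melt(dt, id.vars = "category", measure.vars = c("A", "B", "C"))

# Sum counts per (category, variable), then express each as % of its category
df <- long[, .(value = sum(value)), by = .(category, variable)]
df[, percentage := 100 * value / sum(value), by = category]
df
```

The result has one row per category and category_type, which is the shape the plotting code expects.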
However, your ggplot call needs one change to produce stacked bars:
# variable is the column holding category_type (A, B or C)
# position = "dodge" plots the bars next to each other,
# while position = "fill" stacks them and rescales each bar to 1
# (so the y axis runs from 0 to 1 rather than 0 to 100)
ggplot(df, aes(x = category, y = percentage, fill = variable)) +
  geom_bar(position = "fill", stat = "identity")
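Since percentage is already on a 0 to 100 scale and sums to 100 within each category, position = "stack" is an alternative that draws the precomputed values as-is instead of rescaling them to 1. A sketch under the same assumptions (the five example rows; percentages computed with data.table); geom_col() is ggplot2's shorthand for geom_bar(stat = "identity"), and the names p, ld and tops are illustrative.

```r
library(data.table)
library(ggplot2)

# Rebuild of the five example rows shown in the question
dt <- data.table(
  ID       = 10:14,
  category = factor(c("group1", "group1", "group2", "group2", "group2")),
  A        = c(1L, 1L, 15L, 6L, 17L),
  B        = c(3L, 11L, 20L, 12L, 83L),
  C        = c(0L, 1L, 2L, 2L, 6L),
  totalABC = c(4L, 13L, 37L, 20L, 106L)
)

# Percentage per (category, variable), computed with data.table
df <- melt(dt, id.vars = "category", measure.vars = c("A", "B", "C"))
df <- df[, .(value = sum(value)), by = .(category, variable)]
df[, percentage := 100 * value / sum(value), by = category]

# position = "stack" draws the percentages as-is, so each bar reaches 100
p <- ggplot(df, aes(x = category, y = percentage, fill = variable)) +
  geom_col(position = "stack") +
  labs(y = "percentage (%)")

# Top of each stacked bar, read from the data ggplot2 draws
ld <- layer_data(p)
tops <- tapply(ld$ymax, unclass(ld$x), max)
tops  # 100 for both group1 and group2
```

Call print(p) to display the chart; the y axis keeps the 0 to 100 values of the percentage column.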