Create a stacked bar chart from a frequency table
I am actually working with two frequency tables, named identified_modification_table and unidentified_modifications_table.
The tables are structured like this:
identified_modification_table
Modifications | Frequency
MOD:42123 | 12
MOD:1234 | 7
MOD:7618 | 36
MOD:411232 | 51
unidentified_modifications_table
Modifications | Frequency
MOD:42123 | 12
MOD:12 | 20
MOD:7618 | 36
MOD:411232 | 51
I want to merge these tables into the output below, so that I can create a stacked bar chart like the one in this example.
Modifications | Frequency.1 | Frequency.2
MOD:42123 | 12 | 12
MOD:1234 | 7 | NA
MOD:12 | NA | 20
MOD:7618 | 36 | 36
MOD:411232 | 51 | 51
I tried to merge the tables with this code, adding NA wherever a value is missing:
df_final <- cbind.data.frame(df1, df2[match(df1$modifications, df2$modifications), ]);
But this does not work properly, and I do not know why.
After that, I think I should just use melt and a ggplot2 stacked bar:
df_barplot <- melt(df,measure.vars = names(df))
ggplot((df_barplot), aes(x = value, fill = variable)) +
geom_bar(stat = "count", position = "dodge") +
theme(axis.text.x = element_text(angle = 20, hjust = 0.5, vjust = -0.1)) +
guides(fill=FALSE)+
labs("Barplot") +
xlab("Values")+
ylab("Frequency")+
theme(text = element_text(size=18), axis.text.x = element_text(angle = 90, hjust = 1, size = 15), axis.text.y=element_text(size = 15))
Does anyone know how I can do this?
Here is a reproducible example:
df1 <- data.frame(modifications=c("MOD:214", "MOD:3","MOD:24","MOD:44","MOD:123", "MOD:123", "MOD:212"), Frequency=c(1,41,616,727,828,8993,383))
df2 <- data.frame(modifications=c("MOD:214", "MOD:3","MOD:24","MOD:445","MOD:12", "MOD:123", "MOD:212"), Frequency=c(1,43,64,77,88,893,38))
Thanks
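Here is a tidyverse approach:
library(tidyverse)
# A full join keeps modifications that appear in either table; the two
# Frequency columns come back as Frequency.x and Frequency.y
merged_df <- full_join(df1, df2, by = "modifications")
# Reshape to long format: one row per modification and source column
merged_df <- gather(merged_df, key = Category, value = Frequency, -modifications)
The plot:
ggplot(merged_df, aes(x = modifications, y = Frequency, fill = Category)) +
  geom_col(position = "dodge")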
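The question asks for stacked bars, while the plot above places the two categories side by side. Below is a minimal sketch of a stacked variant, assuming the df1 and df2 from the reproducible example. Here pivot_longer is swapped in for gather (which tidyr marks as superseded), and the names long_df and p, as well as the .1/.2 suffixes, are choices made for this sketch rather than part of the original answer.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Data from the reproducible example in the question
df1 <- data.frame(modifications = c("MOD:214", "MOD:3", "MOD:24", "MOD:44",
                                    "MOD:123", "MOD:123", "MOD:212"),
                  Frequency = c(1, 41, 616, 727, 828, 8993, 383))
df2 <- data.frame(modifications = c("MOD:214", "MOD:3", "MOD:24", "MOD:445",
                                    "MOD:12", "MOD:123", "MOD:212"),
                  Frequency = c(1, 43, 64, 77, 88, 893, 38))

# Full join keeps keys from both tables; suffix names the two Frequency columns
merged_df <- full_join(df1, df2, by = "modifications", suffix = c(".1", ".2"))

# Long format: one row per (modification, source column); NA rows are dropped
long_df <- pivot_longer(merged_df,
                        cols = -modifications,
                        names_to = "Category",
                        values_to = "Frequency",
                        values_drop_na = TRUE)

# position = "stack" piles the two categories on top of each other
p <- ggplot(long_df, aes(x = modifications, y = Frequency, fill = Category)) +
  geom_col(position = "stack")
print(p)
```

Dropping the NA rows is optional; it is done here so that the plot does not have to discard missing values itself.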
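I think this is what you want:
df3 <- merge(df1, df2, by = "modifications", all = TRUE)
library(reshape2)
# Long format: one row per modification and frequency column
df3 <- melt(df3, id.vars = "modifications")
df3$variable <- factor(df3$variable, labels = c("modifications1", "modifications2"))
library(ggplot2)
ggplot(df3, aes(x = modifications, y = value, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge")
Edit: added all = TRUE so that all frequencies appearing in either table are kept.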
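As for why the cbind/match attempt in the question falls short, here is a minimal base R sketch, assuming the df1 and df2 from the reproducible example. match() looks up each df1 key in df2, so the combined result can only contain keys that occur in df1. Keys found only in df2 are never referenced. The names idx, only_in_df2 and merged are illustrative.

```r
# Data from the reproducible example in the question
df1 <- data.frame(modifications = c("MOD:214", "MOD:3", "MOD:24", "MOD:44",
                                    "MOD:123", "MOD:123", "MOD:212"),
                  Frequency = c(1, 41, 616, 727, 828, 8993, 383))
df2 <- data.frame(modifications = c("MOD:214", "MOD:3", "MOD:24", "MOD:445",
                                    "MOD:12", "MOD:123", "MOD:212"),
                  Frequency = c(1, 43, 64, 77, 88, 893, 38))

# For each key in df1, the row position of its first match in df2
# (NA where df2 has no such key, e.g. MOD:44)
idx <- match(df1$modifications, df2$modifications)
print(idx)

# Keys that exist only in df2 never show up in idx, so they are lost
only_in_df2 <- setdiff(df2$modifications, df1$modifications)
print(only_in_df2)

# A full outer join keeps the keys of both tables and fills gaps with NA
merged <- merge(df1, df2, by = "modifications", all = TRUE,
                suffixes = c(".1", ".2"))
print(merged)
```

In other words, the match-based approach behaves like a left join on df1, whereas the desired output with NA on both sides needs a full join.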