Using pheatmap to sort data by row annotations?
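I am trying to create a heatmap with test data in the columns and individual study participants in the rows. The participants fall into three distinct groups. I would like to annotate the plot with the three groups and then cluster the data within each group to see how they differ.
I am new to making heatmaps and I can't get the row annotation to work. I'm also not sure how to cluster only within each group once the annotation works. I thought the package "pheatmap.type" would do the job, but unfortunately it is not available for R version 4.0.2.
I can't post the exact data (it's confidential), but I've attached a sample file, and I'll describe what I've done so far and post the code. I have a data frame whose first column holds the labels, combining participant ID and group (set as row names with row.names=1), followed by 12 columns of numeric data (no NAs). I then sort the data by row name and use the scale function to scale the data and produce a matrix. Next, I tried to create the row annotation by adding the group information to a data frame in several different ways. Here is what I've tried so far: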
#dataframe with Group and ID as row names and 12 numerical columns
df_1_HM <- data.frame(df_1$Group_ID, df_1$Test1, df_1$Test2, df_1$Test3, df_1$Test4, df_1$Test5, df_1$Test6, df_1$Test7, df_1$Test8, df_1$Test9, df_1$Test10, df_1$Test11, df_1$Test12, row.names=1)
#ordering the dataframe so that the groups are in order
df_1_HM_ordered <- df_1_HM[ order(row.names(df_1_HM)), ]
#Z-scoring (scaling) each column of the ordered data
df_HM_matrix_1 <- scale(df_1_HM_ordered)
#creating a color palette
my_palette <- colorRampPalette(c("white", "grey", "black"))(n = 100)
#Plotting heatmap (pheatmap() comes from the pheatmap package)
install.packages("pheatmap")
library(pheatmap)
#creating the row annotation before it is used, with row names matching the matrix
annotation_row = data.frame(
df_Group = factor(rep(c("Group 1", "Group 2", "Group 3"), c(11, 10, 7)))
)
rownames(annotation_row) = rownames(df_HM_matrix_1) # name matching
#trying to plot the heatmap with annotation_row data
#The method below does not work for me. The plot will run with no errors but does not actually plot - it ends up becoming a list of 4 with no data.
pheatmap(df_HM_matrix_1,
scale="none",
color=my_palette,
fontsize=14,
annotation_row=annotation_row)
#I also tried to use a dataframe with just the groups as column 1 to get row annotation
df_Group <- data.frame(df_1$Group, df_1$ID)
pheatmap(df_HM_matrix_1,
scale="none",
color=my_palette,
fontsize=14,
annotation_row=df_Group)
#Also tried using the select function (from dplyr) to create a dataframe for the row annotation
library(dplyr)
df_Group_1 <- select(df_1, Group)
#When I use either of the data frame methods above I get the following error: Error in cut.default(a, breaks = 100) : 'x' must be numeric
Any help with this would be great!
Sample data:
df_1 <- structure(list(Group_ID = structure(1:28, .Label = c("Group1_10",
"Group1_13", "Group1_15", "Group1_2", "Group1_20", "Group1_26",
"Group1_27", "Group1_3", "Group1_6", "Group1_8", "Group2_1",
"Group2_12", "Group2_14", "Group2_16", "Group2_21", "Group2_23",
"Group2_25", "Group2_28", "Group2_7", "Group2_9", "Group3_11",
"Group3_17", "Group3_18", "Group3_19", "Group3_24", "Group3_4",
"Group3_5", "Group3_6"), class = "factor"), Test1 = c(1.44, 4.36,
0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 0.89, 0.45, 0.31, 1.56, 2.13,
0.86, 0.12, 0.26, 1.47, 2.64, 3.92, 2.19, 0.43, 0.98, 1.93, 1.49,
1.43, 2.58, 2.49, 2.64), Test2 = c(1.44, 4.36, 0.75, 0.59, 1.67,
0.41, 2.42, 0.57, 0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26,
1.47, 2.64, 3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49,
2.64), Test3 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57,
0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 3.92,
2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 2.64), Test4 = c(1.44,
4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57, 0.89, 0.45, 0.31, 1.56,
2.13, 0.86, 0.12, 0.26, 1.47, 2.64, 3.92, 2.19, 0.43, 0.98, 1.93,
1.49, 1.43, 2.58, 2.49, 0.31), Test5 = c(1.44, 4.36, 0.75, 0.59,
1.67, 0.41, 2.42, 0.57, 0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12,
0.26, 1.47, 2.64, 3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58,
2.49, 0.31), Test6 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42,
0.57, 0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64,
3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 0.31),
Test7 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57,
0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64,
3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 1.49
), Test8 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57,
0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64,
3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 1.49
), Test9 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57,
0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64,
3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 1.49
), Test10 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57,
0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64,
3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 3.92
), Test11 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57,
0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64,
3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 3.92
), Test12 = c(1.44, 4.36, 0.75, 0.59, 1.67, 0.41, 2.42, 0.57,
0.89, 0.45, 0.31, 1.56, 2.13, 0.86, 0.12, 0.26, 1.47, 2.64,
3.92, 2.19, 0.43, 0.98, 1.93, 1.49, 1.43, 2.58, 2.49, 3.92
)), class = "data.frame", row.names = c(NA, -28L))
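A minimal sketch of the preparation steps described in the question, assuming the sample data above is stored in `df_1` and that every ID follows the `GroupN_ID` pattern. Instead of hard-coding group sizes with `rep(..., c(11, 10, 7))` (the sample data actually has 10, 10 and 8 participants per group, so some rows would be mislabelled), it derives each row's group from the text before the first underscore, so the annotation always lines up with the matrix rows. The helper names `prep_matrix()` and `group_annotation()` are hypothetical.

```r
# Hypothetical helper: build the z-scored matrix. Group_ID becomes the row
# names, rows are sorted so participants of the same group sit together,
# and scale() z-scores each column (each test).
prep_matrix <- function(df) {
  m <- as.matrix(df[, setdiff(names(df), "Group_ID"), drop = FALSE])
  rownames(m) <- as.character(df$Group_ID)
  m <- m[order(rownames(m)), , drop = FALSE]
  scale(m)
}

# Hypothetical helper: one-column annotation data frame whose row names match
# the matrix row names; the group is the text before the first "_"
# (e.g. "Group2_14" -> "Group2").
group_annotation <- function(ids) {
  data.frame(Group = factor(sub("_.*$", "", ids)), row.names = ids)
}

# Usage with a few IDs taken from the sample data
ann <- group_annotation(c("Group1_10", "Group2_14", "Group3_6"))
print(ann)
```

With the question's data this would be `df_HM_matrix_1 <- prep_matrix(df_1)` followed by `annotation_row <- group_annotation(rownames(df_HM_matrix_1))`, and the result can be passed to `pheatmap()` as `annotation_row`.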
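To get the annotation working with pheatmap, build the grouping column as an ordered factor by adding ordered = TRUE to factor(), and give annotation_row the same row names as the matrix so that pheatmap can match annotation rows to heatmap rows:
annotation_row = data.frame(df_Group = factor(rep(c("Group 1", "Group 2", "Group 3"), c(11, 10, 7)), ordered = TRUE))
rownames(annotation_row) = rownames(df_HM_matrix_1)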
You can also use as.ordered() to accomplish the same thing.
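A quick check, reusing the group sizes from the question, that the two spellings produce the same ordered factor; the variable names are illustrative only.

```r
# Group sizes taken from the question's code
groups <- rep(c("Group 1", "Group 2", "Group 3"), c(11, 10, 7))

# Two ways to build the same ordered factor
f_ordered <- factor(groups, ordered = TRUE)
f_as_ordered <- as.ordered(groups)

print(identical(f_ordered, f_as_ordered))  # TRUE
print(levels(f_ordered))                   # "Group 1" "Group 2" "Group 3"
```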
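To order the heatmap rows by annotation group, add cluster_rows = FALSE to pheatmap(). This turns off row clustering, so the rows keep the matrix's existing order, which is already sorted by group: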
pheatmap(df_HM_matrix_1,
scale="none",
color=my_palette,
fontsize=14,
annotation_row=annotation_row,
cluster_rows = FALSE)
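Note that cluster_rows = FALSE keeps the groups together but switches row clustering off entirely, so it does not cover the second half of the question (clustering only within each group). One way to do that, sketched here under stated assumptions, is to run hclust() separately on each group's rows, concatenate the resulting orders, and pass the reordered matrix to pheatmap() with cluster_rows = FALSE and gaps_row marking the group boundaries. The helper within_group_order() is hypothetical, and the built-in mtcars data (grouped by cyl) stands in for the confidential data. Euclidean distance with complete linkage mirrors pheatmap's default row-clustering settings.

```r
# Hypothetical helper: keep groups in level order, but cluster rows
# (Euclidean distance, complete linkage) separately inside each group.
within_group_order <- function(mat, groups) {
  groups <- factor(groups)
  unlist(lapply(levels(groups), function(g) {
    idx <- which(groups == g)
    if (length(idx) < 2) return(idx)  # hclust() needs at least two rows
    idx[hclust(dist(mat[idx, , drop = FALSE]))$order]
  }), use.names = FALSE)
}

# Stand-in data: built-in mtcars, grouped by number of cylinders
mat <- scale(as.matrix(mtcars[, c("mpg", "hp", "wt", "qsec")]))
ann <- data.frame(cyl = factor(mtcars$cyl), row.names = rownames(mtcars))

# Reorder the matrix and the annotation the same way
ord <- within_group_order(mat, ann$cyl)
mat_ord <- mat[ord, ]
ann_ord <- ann[ord, , drop = FALSE]

# Row positions after which each group ends (except the last), for gaps_row
sizes <- as.vector(table(ann_ord$cyl))
gaps <- cumsum(sizes)[-length(sizes)]

# Draw only if pheatmap is installed
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(mat_ord,
                     annotation_row = ann_ord,
                     cluster_rows = FALSE,  # keep the within-group order
                     gaps_row = gaps)
}
```

For the question's data, df_HM_matrix_1 and annotation_row$df_Group would take the place of mat and ann$cyl. One trade-off of this approach: no row dendrogram is drawn, because the per-group trees are never merged into a single tree.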