[R] Drawing clustered heat maps with the pheatmap package, part 1 (step-by-step tutorial)

1. Introduction

1 Introduction to heat maps

       A heat map uses color to show how values differ between samples or other factors, usually after the data have been normalized. In essence it is a data matrix drawn as a grid of small squares, where each square's color encodes a numerical value. Clustering the factors or samples makes it possible to see which of them behave similarly.

2 Heat map drawing method

Commonly used drawing software: Origin, Excel, TBtools, GraphPad Prism

R packages and functions for drawing clustered heat maps: pheatmap, heatmap (base R), corrplot, ComplexHeatmap

       Among them, pheatmap is one of the most widely used R packages for clustered heat maps. It quickly produces a heat map together with the clustering results.

2. Drawing clustered heat maps with the pheatmap package

1 Data preparation

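Input data format (CSV): as the import call in the next step implies, the first line is a header with the sample names and the first column holds the row names.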
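The original demo file is not reproduced in this post. As a stand-in, here is a minimal sketch that builds a small random table in the same layout (header row, row names in the first column) and reads it back with the same read.table arguments used below. The gene and sample names, the matrix size and the temporary file name are all made up for illustration; this is not the author's data.

```r
# Hypothetical stand-in for pheatmap.csv (random numbers, not the author's data)
set.seed(1)                                   # reproducible random numbers
demo <- matrix(round(rexp(20 * 6, rate = 0.1), 2), nrow = 20, ncol = 6,
               dimnames = list(paste0("gene", 1:20), paste0("S", 1:6)))

# Write it as CSV: header row with sample names, row names in the first column
csv_path <- file.path(tempdir(), "pheatmap_demo.csv")
write.csv(demo, csv_path, quote = FALSE)

# Read it back with the same arguments the tutorial uses for its own file
data <- read.table(file = csv_path, header = TRUE, row.names = 1, sep = ",")
head(data)
```

With a table like this in `data`, the pheatmap calls in the rest of the tutorial can be tried without the original file.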

2 R package loading and data import

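# Install the packages (only needed once)
install.packages("pheatmap")
install.packages("RColorBrewer")

# Load the packages
library("pheatmap")
library("RColorBrewer")

# Load the plotting data
data <- read.table(file = 'C:/Rdata/jc/pheatmap.csv', header = TRUE, row.names = 1, sep = ',')
head(data) # inspect the data

# Optional preprocessing, commented out here:
#data=log2(data[,1:6]+1) # log-transform the gene expression values
#data <- as.matrix(data) # convert to a matrix
#head(data)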

3 Heat map drawing

3.1 Basic heat map and standardization

pheatmap(data) # basic heat map

 

Figure 1 Heat map with unnormalized data

3.2 Drawing with scaled data

3.2.1 Preparing the data: scaling

       Heat maps are usually drawn from scaled data, so that factors whose raw values differ by orders of magnitude end up on a comparable scale and their patterns across samples are easier to compare. Each factor normally occupies one row of the heat map, with the samples as columns, so scaling is applied by row. With scale = "row", pheatmap centers each row on its mean and divides by its standard deviation (a z-score).

pheatmap(data, scale='row') # scaling direction: "row" scales each row, "column" scales each column, "none" applies no scaling

 

Figure 2 Normalized heat map
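To make row scaling concrete, here is a small base R sketch of a row-wise z-score on a toy matrix. The numbers are made up, and the code illustrates the calculation rather than reproducing pheatmap's own source.

```r
# Toy matrix: two "genes" whose raw values differ by two orders of magnitude
m <- matrix(c(  1,   2,   3,
              100, 200, 300),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("geneA", "geneB"), c("S1", "S2", "S3")))

# Row-wise z-score: subtract each row's mean, divide by each row's standard deviation
row_z <- (m - rowMeans(m)) / apply(m, 1, sd)

# Same calculation with base R's scale(), which works on columns, hence the transposes
row_z2 <- t(scale(t(m)))

print(row_z)   # both rows become -1, 0, 1
```

After scaling, the two rows are identical even though their raw values are far apart, which is why the scaled heat map shows the pattern of each factor rather than its magnitude.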

3.3 Heat map clustering method and clustering tree adjustment

pheatmap(data, scale = "row", # direction of scaling: "row", "column" or "none"
         clustering_distance_rows = "euclidean", clustering_distance_cols = "euclidean", # distance measure for row/column clustering; default "euclidean", "correlation" clusters on Pearson correlation
         clustering_method = "complete", # clustering method, same values as hclust: "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", "centroid" ("ward" is the old name of "ward.D")
         cluster_rows = TRUE, cluster_cols = TRUE, # whether to cluster the rows / the columns (TRUE or FALSE)
         cutree_rows = NA, cutree_cols = 6, # if rows/columns are clustered, number of clusters to split them into (drawn as gaps); NA means no split
         treeheight_row = 30, treeheight_col = 30) # height of the row and column dendrograms

 

 Figure 3 Heat map after setting the heat map clustering method and clustering tree adjustment
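The clustering options map onto standard base R functions. The sketch below shows the equivalent steps by hand on a random toy matrix: a distance matrix, hierarchical clustering, and cutting the tree into groups. It is an illustration under the assumption that "correlation" distance means 1 minus the Pearson correlation; pheatmap's internals may differ in detail.

```r
# Random toy matrix: 8 "genes" by 6 "samples" (made-up data)
set.seed(42)
m <- matrix(rnorm(8 * 6), nrow = 8,
            dimnames = list(paste0("gene", 1:8), paste0("S", 1:6)))

# Euclidean distance between rows, then complete-linkage clustering
d_euc <- dist(m, method = "euclidean")
tree_rows <- hclust(d_euc, method = "complete")

# Correlation-based distance between rows: 1 - Pearson correlation
d_cor <- as.dist(1 - cor(t(m)))
tree_rows_cor <- hclust(d_cor, method = "complete")

# Cutting the tree into k groups is the idea behind cutree_rows / cutree_cols
groups <- cutree(tree_rows, k = 3)
table(groups)
```

Clustering the columns works the same way on the transposed matrix, for example `hclust(dist(t(m)))`.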

3.4 Clustering heat map cell format adjustment

3.4.1 Cell length, width and border color adjustment

Use "cellwidth", "cellheight" and "border_color" to set the width and height of each heat map cell and the color of its border:

pheatmap(data, scale = "row", # direction of scaling: "row", "column" or "none"
         clustering_distance_rows = "euclidean", clustering_distance_cols = "euclidean", # distance measure for row/column clustering; default "euclidean", "correlation" clusters on Pearson correlation
         clustering_method = "complete", # clustering method, same values as hclust: "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", "centroid"
         cluster_rows = TRUE, cluster_cols = TRUE, # whether to cluster the rows / the columns (TRUE or FALSE)
         cutree_rows = NA, cutree_cols = 6, # if clustered, number of clusters to split into: cutree_rows = num splits rows, cutree_cols = num splits columns
         treeheight_row = 30, treeheight_col = 30, # height of the row and column dendrograms
         border_color = NA, # color of the cell borders; default "grey60", NA draws no border
         cellwidth = 60, cellheight = 7.5)  # width / height of a single cell in points; default NA

 

 Figure 4 Heat map after setting cell width, height and border color

3.4.2 Adjustment of numerical display and numerical font size in cells

Use "display_numbers", "fontsize_number", "number_format" and "number_color" to control whether values are printed in the cells, and their font size, format and color:

pheatmap(data, scale = "row", # direction of scaling: "row", "column" or "none"
         clustering_distance_rows = "euclidean", clustering_distance_cols = "euclidean", # distance measure for row/column clustering; default "euclidean", "correlation" clusters on Pearson correlation
         clustering_method = "complete", # clustering method, same values as hclust: "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", "centroid"
         cluster_rows = TRUE, cluster_cols = TRUE, # whether to cluster the rows / the columns (TRUE or FALSE)
         cutree_rows = NA, cutree_cols = 6, # if clustered, number of clusters to split into: cutree_rows = num splits rows, cutree_cols = num splits columns
         treeheight_row = 30, treeheight_col = 30, # height of the row and column dendrograms
         border_color = NA, # color of the cell borders; default "grey60", NA draws no border
         cellwidth = 60, cellheight = 7.5,  # width / height of a single cell in points; default NA
         display_numbers = TRUE, # whether to print the values in the cells (a matrix of custom markers is also accepted)
         fontsize_number = 6, # font size of the numbers in the cells
         number_format = "%.2f", # number format: "%.2f" gives two decimals, "%.1e" scientific notation
         number_color = "grey30") # color of the numbers in the cells

Figure 5 Heat map after setting the numerical display and numerical font size in the cell

 

3.4.3 Heat map cell distinction markers

Passing a character matrix to "display_numbers" marks cells according to their values. Here a cell whose original (unscaled) value is greater than 1 is marked "***" and all other cells are left empty:

pheatmap(data, scale = "row", # direction of scaling: "row", "column" or "none"
         clustering_distance_rows = "euclidean", clustering_distance_cols = "euclidean", # distance measure for row/column clustering; default "euclidean", "correlation" clusters on Pearson correlation
         clustering_method = "complete", # clustering method, same values as hclust: "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", "centroid"
         cluster_rows = TRUE, cluster_cols = TRUE, # whether to cluster the rows / the columns (TRUE or FALSE)
         cutree_rows = NA, cutree_cols = 6, # if clustered, number of clusters to split into: cutree_rows = num splits rows, cutree_cols = num splits columns
         treeheight_row = 30, treeheight_col = 30, # height of the row and column dendrograms
         border_color = NA, # color of the cell borders; default "grey60", NA draws no border
         cellwidth = 60, cellheight = 7.5,  # width / height of a single cell in points; default NA
         display_numbers = matrix(ifelse(data > 1, "***", ""), nrow = nrow(data)), # marker matrix: "***" where the original value is greater than 1, "" otherwise
         fontsize_number = 10, # font size of the markers in the cells
         number_format = "%.2f", # number format: "%.2f" gives two decimals, "%.1e" scientific notation
         number_color = "grey30") # color of the markers in the cells

Figure 6 The heat map after setting the heat map cell distinction markers
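The marker matrix above is computed from the raw values, while the colors show row-scaled values. As a variation, here is a minimal sketch that builds the markers from row z-scores instead, so that the stars agree with the colors. The toy matrix and the threshold of 1 are arbitrary choices for illustration.

```r
# Toy matrix with made-up values
m <- matrix(c(1, 2,  9,
              5, 5, 20,
              1, 3,  3),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("gene", 1:3), paste0("S", 1:3)))

# Row-wise z-scores, the same kind of values a row-scaled heat map colors by
z <- t(scale(t(m)))

# Character matrix of markers; ifelse() keeps the shape and dimnames of its test
marks <- ifelse(z > 1, "***", "")

print(marks)
```

A matrix like `marks` can then be passed as `display_numbers = marks`, provided it has the same number of rows and columns as the plotted data.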

3.5 Heat map beautification and personalization

3.5.1 Heat map title and row and column labels

Use "show_rownames" and "show_colnames" to show or hide the row and column names, and "main" to set the title:

pheatmap(data, scale = "row", # direction of scaling: "row", "column" or "none"
         clustering_distance_rows = "euclidean", clustering_distance_cols = "euclidean", # distance measure for row/column clustering; default "euclidean", "correlation" clusters on Pearson correlation
         clustering_method = "complete", # clustering method, same values as hclust: "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", "centroid"
         cluster_rows = TRUE, cluster_cols = TRUE, # whether to cluster the rows / the columns (TRUE or FALSE)
         cutree_rows = NA, cutree_cols = 6, # if clustered, number of clusters to split into: cutree_rows = num splits rows, cutree_cols = num splits columns
         treeheight_row = 30, treeheight_col = 30, # height of the row and column dendrograms
         border_color = "grey60", # color of the cell borders; default "grey60"
         cellwidth = 60, cellheight = 7.5,  # width / height of a single cell in points; default NA
         display_numbers = TRUE, # whether to print the values in the cells (a matrix of custom markers is also accepted)
         fontsize_number = 6, # font size of the numbers in the cells
         number_format = "%.2f", # number format: "%.2f" gives two decimals, "%.1e" scientific notation
         number_color = "grey30", # color of the numbers in the cells
         show_rownames = FALSE, show_colnames = TRUE, # whether to show the row names / column names
         main = "Gene title") # title of the heat map

 

 Figure 7 The heat map after setting the heat map row names, column name display and title

3.5.2 Heat map font size setting

Use "fontsize", "fontsize_row" and "fontsize_col" to set the base font size of the heat map and the font sizes of the row and column names. By default the row and column names use the same size as fontsize:

pheatmap(data, scale = "row", # direction of scaling: "row", "column" or "none"
         clustering_distance_rows = "euclidean", clustering_distance_cols = "euclidean", # distance measure for row/column clustering; default "euclidean", "correlation" clusters on Pearson correlation
         clustering_method = "complete", # clustering method, same values as hclust: "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", "centroid"
         cluster_rows = TRUE, cluster_cols = TRUE, # whether to cluster the rows / the columns (TRUE or FALSE)
         cutree_rows = NA, cutree_cols = 6, # if clustered, number of clusters to split into: cutree_rows = num splits rows, cutree_cols = num splits columns
         treeheight_row = 30, treeheight_col = 30, # height of the row and column dendrograms
         border_color = "grey60", # color of the cell borders; default "grey60"
         cellwidth = 60, cellheight = 7.5,  # width / height of a single cell in points; default NA
         display_numbers = TRUE, # whether to print the values in the cells (a matrix of custom markers is also accepted)
         fontsize_number = 6, # font size of the numbers in the cells
         number_format = "%.2f", # number format: "%.2f" gives two decimals, "%.1e" scientific notation
         number_color = "grey30", # color of the numbers in the cells
         fontsize = 10, fontsize_row = 6, fontsize_col = 10, # base font size, row name font size, column name font size
         show_rownames = TRUE, show_colnames = TRUE, # whether to show the row names / column names
         main = "Gene title") # title of the heat map

 

 Figure 8 The heat map after setting the font size, row name, and column name font size in the heat map.

3.5.3 Heat map personalization

3.5.3.1 Heat map colors and legend

Use "color", "legend", "legend_breaks" and "legend_labels" to set the heat map colors, whether the legend is shown, and the legend breaks and their labels:

pheatmap(data, scale = "row", # direction of scaling: "row", "column" or "none"
         clustering_distance_rows = "euclidean", clustering_distance_cols = "euclidean", # distance measure for row/column clustering; default "euclidean", "correlation" clusters on Pearson correlation
         clustering_method = "complete", # clustering method, same values as hclust: "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", "centroid"
         cluster_rows = TRUE, cluster_cols = TRUE, # whether to cluster the rows / the columns (TRUE or FALSE)
         cutree_rows = NA, cutree_cols = NA, # if clustered, number of clusters to split into: cutree_rows = num splits rows, cutree_cols = num splits columns
         treeheight_row = 30, treeheight_col = 30, # height of the row and column dendrograms
         border_color = "grey60", # color of the cell borders; default "grey60"
         cellwidth = 60, cellheight = 7.5,  # width / height of a single cell in points; default NA
         display_numbers = FALSE, # whether to print the values in the cells (a matrix of custom markers is also accepted)
         fontsize_number = 6, # font size of the numbers in the cells
         number_format = "%.2f", # number format: "%.2f" gives two decimals, "%.1e" scientific notation
         number_color = "grey30", # color of the numbers in the cells
         fontsize = 10, fontsize_row = 6, fontsize_col = 10, # base font size, row name font size, column name font size
         show_rownames = TRUE, show_colnames = TRUE, # whether to show the row names / column names
         main = "Gene title", # title of the heat map
         color = colorRampPalette(c("navy","white","firebrick3"))(100), # heat map colors; (100) asks for 100 color levels
         legend = TRUE, # whether to show the legend (TRUE or FALSE)
         legend_breaks = NA, # positions of the legend breaks, e.g. legend_breaks = c(-2.5, 0, 2.5); default NA
         legend_labels = NA) # labels for the legend breaks

 

 Figure 9 Heat map with heat map color and legend set
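The color vector and the legend can also be tied to fixed value limits through pheatmap's "breaks" argument, which expects one more break than there are colors. The sketch below assumes a symmetric limit of 2.5 so that white sits at zero; the limit and the toy data are illustrative choices, and the plotting call only runs if pheatmap is installed.

```r
# 100 colors running from navy through white to firebrick3, as in the tutorial
cols <- colorRampPalette(c("navy", "white", "firebrick3"))(100)

# Assumed symmetric limit so that the middle of the palette (white) maps to 0
lim <- 2.5
brks <- seq(-lim, lim, length.out = length(cols) + 1)  # one more break than colors

if (requireNamespace("pheatmap", quietly = TRUE)) {
  set.seed(7)
  m <- matrix(rnorm(60), nrow = 10,
              dimnames = list(paste0("gene", 1:10), paste0("S", 1:6)))
  pheatmap::pheatmap(m, scale = "row", color = cols, breaks = brks,
                     legend_breaks = c(-lim, 0, lim),          # where legend ticks go
                     legend_labels = c("-2.5", "0", "2.5"))    # text shown at the ticks
}
```

Fixing the breaks this way keeps the color scale comparable between several heat maps drawn from different data.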

3.5.3.2 Column label angle and gap positions without clustering

Use "angle_col" to set the angle of the column labels. "gaps_row" and "gaps_col" set where the heat map is broken in the row and column directions; they only take effect when the rows or columns, respectively, are not clustered:

pheatmap(data, scale = "row", # direction of scaling: "row", "column" or "none"
         clustering_distance_rows = "euclidean", clustering_distance_cols = "euclidean", # distance measure for row/column clustering; default "euclidean", "correlation" clusters on Pearson correlation
         clustering_method = "complete", # clustering method, same values as hclust: "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", "centroid"
         cluster_rows = TRUE, cluster_cols = FALSE, # whether to cluster the rows / the columns (TRUE or FALSE)
         cutree_rows = NA, cutree_cols = NA, # if clustered, number of clusters to split into: cutree_rows = num splits rows, cutree_cols = num splits columns
         treeheight_row = 30, treeheight_col = 30, # height of the row and column dendrograms
         border_color = "grey60", # color of the cell borders; default "grey60"
         cellwidth = 60, cellheight = 7.5,  # width / height of a single cell in points; default NA
         display_numbers = FALSE, # whether to print the values in the cells (a matrix of custom markers is also accepted)
         fontsize_number = 6, # font size of the numbers in the cells
         number_format = "%.2f", # number format: "%.2f" gives two decimals, "%.1e" scientific notation
         number_color = "grey30", # color of the numbers in the cells
         fontsize = 10, fontsize_row = 6, fontsize_col = 10, # base font size, row name font size, column name font size
         show_rownames = TRUE, show_colnames = TRUE, # whether to show the row names / column names
         main = "Gene title", # title of the heat map
         color = colorRampPalette(c("navy","white","firebrick3"))(100), # heat map colors; (100) asks for 100 color levels
         legend = TRUE, # whether to show the legend (TRUE or FALSE)
         legend_breaks = NA, # positions of the legend breaks, e.g. legend_breaks = c(-2.5, 0, 2.5); default NA
         legend_labels = NA, # labels for the legend breaks
         angle_col = "45", # angle of the column labels
         gaps_row = NULL,  # used only when rows are not clustered: row indices after which to insert a gap
         gaps_col = c(1,2,3,4,5,6))  # used only when columns are not clustered: column indices after which to insert a gap

 

 Figure 10 Heat map after setting column label angles and column partitions
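Rather than typing gap positions by hand, they can be derived from a vector of sample groups. The sketch below assumes a hypothetical grouping of six samples into "control" and "treated", listed in the same order as the columns; it places a gap wherever the group changes.

```r
# Hypothetical sample groups, in the same order as the columns of the data
group <- c("control", "control", "control", "treated", "treated", "treated")

# A gap goes after every column whose right-hand neighbor belongs to a different group
gaps <- which(group[-1] != group[-length(group)])

print(gaps)   # 3: a single break between columns 3 and 4
```

The result can be passed as `gaps_col = gaps` together with `cluster_cols = FALSE`, so the columns stay in their original order and the groups are visually separated.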

3.5.3.3 Customize row and column labels

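Use "labels_row" and "labels_col" to replace the row and column names with custom labels:

pheatmap(data, scale = "row", # direction of scaling: "row", "column" or "none"
         clustering_distance_rows = "euclidean", clustering_distance_cols = "euclidean", # distance measure for row/column clustering; default "euclidean", "correlation" clusters on Pearson correlation
         clustering_method = "complete", # clustering method, same values as hclust: "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", "centroid"
         cluster_rows = TRUE, cluster_cols = TRUE, # whether to cluster the rows / the columns (TRUE or FALSE)
         cutree_rows = NA, cutree_cols = NA, # if clustered, number of clusters to split into: cutree_rows = num splits rows, cutree_cols = num splits columns
         treeheight_row = 30, treeheight_col = 30, # height of the row and column dendrograms
         border_color = "grey60", # color of the cell borders; default "grey60"
         cellwidth = 60, cellheight = 7.5,  # width / height of a single cell in points; default NA
         display_numbers = FALSE, # whether to print the values in the cells (a matrix of custom markers is also accepted)
         fontsize_number = 6, # font size of the numbers in the cells
         number_format = "%.2f", # number format: "%.2f" gives two decimals, "%.1e" scientific notation
         number_color = "grey30", # color of the numbers in the cells
         fontsize = 10, fontsize_row = 6, fontsize_col = 10, # base font size, row name font size, column name font size
         show_rownames = TRUE, show_colnames = TRUE, # whether to show the row names / column names
         main = "Gene title", # title of the heat map
         color = colorRampPalette(c("navy","white","firebrick3"))(100), # heat map colors; (100) asks for 100 color levels
         legend = TRUE, # whether to show the legend (TRUE or FALSE)
         legend_breaks = NA, # positions of the legend breaks, e.g. legend_breaks = c(-2.5, 0, 2.5); default NA
         legend_labels = NA, # labels for the legend breaks
         angle_col = "45", # angle of the column labels
         gaps_row = NULL,  # used only when rows are not clustered: row indices after which to insert a gap
         gaps_col = c(1,2,3,4,5,6),  # used only when columns are not clustered (so it has no effect here, where cluster_cols = TRUE)
         labels_row = NULL, # custom row labels to use instead of the row names; NULL keeps the row names
         labels_col = c("sample1","sample2","sample3","sample4","sample5","sample6"))  # custom column labels to use instead of the column names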

 

  Figure 11 Heat map with customized column labels

3.6 Heat map saving

Save the heatmap using "filename", "width" and "height":

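pheatmap(data, scale = "row", # direction of scaling: "row", "column" or "none"
         clustering_distance_rows = "euclidean", clustering_distance_cols = "euclidean", # distance measure for row/column clustering; default "euclidean", "correlation" clusters on Pearson correlation
         clustering_method = "complete", # clustering method, same values as hclust: "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", "centroid"
         cluster_rows = TRUE, cluster_cols = TRUE, # whether to cluster the rows / the columns (TRUE or FALSE)
         cutree_rows = NA, cutree_cols = NA, # if clustered, number of clusters to split into: cutree_rows = num splits rows, cutree_cols = num splits columns
         treeheight_row = 30, treeheight_col = 30, # height of the row and column dendrograms
         border_color = "grey60", # color of the cell borders; default "grey60"
         cellwidth = 60, cellheight = 7.5,  # width / height of a single cell in points; default NA
         display_numbers = FALSE, # whether to print the values in the cells (a matrix of custom markers is also accepted)
         fontsize_number = 6, # font size of the numbers in the cells
         number_format = "%.2f", # number format: "%.2f" gives two decimals, "%.1e" scientific notation
         number_color = "grey30", # color of the numbers in the cells
         fontsize = 10, fontsize_row = 6, fontsize_col = 10, # base font size, row name font size, column name font size
         show_rownames = TRUE, show_colnames = TRUE, # whether to show the row names / column names
         main = "Gene title", # title of the heat map
         color = colorRampPalette(c("navy","white","firebrick3"))(100), # heat map colors; (100) asks for 100 color levels
         legend = TRUE, # whether to show the legend (TRUE or FALSE)
         legend_breaks = NA, # positions of the legend breaks, e.g. legend_breaks = c(-2.5, 0, 2.5); default NA
         legend_labels = NA, # labels for the legend breaks
         angle_col = "45", # angle of the column labels
         gaps_row = NULL,  # used only when rows are not clustered: row indices after which to insert a gap
         gaps_col = c(1,2,3,4,5,6),  # used only when columns are not clustered (so it has no effect here, where cluster_cols = TRUE)
         labels_row = NULL, # custom row labels to use instead of the row names; NULL keeps the row names
         labels_col = c("sample1","sample2","sample3","sample4","sample5","sample6"),  # custom column labels to use instead of the column names
         filename = NA,  # path and name of the output file (the extension sets the format); NA means nothing is saved
         width = NA, height = NA) # width / height of the output file in inches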
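With filename = NA, as in the call above, nothing is written to disk. Here is a minimal sketch of an actual save, assuming a PDF in the session's temporary directory and a 6 by 5 inch page. The file name, the size and the toy data are placeholders, and the call only runs if pheatmap is installed.

```r
# Placeholder output path in the temporary directory; the extension selects the format
out_file <- file.path(tempdir(), "heatmap_demo.pdf")

if (requireNamespace("pheatmap", quietly = TRUE)) {
  set.seed(3)
  m <- matrix(rnorm(60), nrow = 10,
              dimnames = list(paste0("gene", 1:10), paste0("S", 1:6)))

  # filename writes the plot to a file; width and height are given in inches
  pheatmap::pheatmap(m, scale = "row", filename = out_file, width = 6, height = 5)

  print(file.exists(out_file))
}
```

Changing the extension, for example to ".png", switches the output format without any other change to the call.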

That is all for this post. The next one will cover drawing groups on clustered heat maps.


Origin blog.csdn.net/weixin_54004950/article/details/128177888