ClusterGVis: clustering and visualization of gene expression time series

1 Introduction

A long time ago, I wrote about how to cluster and visualize genes with Mfuzz and hclust. However, many people find the default Mfuzz plots unattractive, and the package offers few parameters for adjusting them. For the related posts, see:

We also often see heatmaps and expression trend line charts from RNA-seq time series analyses visualized together with enriched pathways, for example:

[Figure: example of a time-series heatmap with trend line charts and pathway annotations]

So I tidied up my code and created ClusterGVis, which clusters and visualizes the data with either the fuzzy c-means algorithm of Mfuzz or the k-means clustering (row_km) of ComplexHeatmap. It can save researchers a great deal of time otherwise spent writing code and polishing figures in AI (Adobe Illustrator), and it produces publication-quality graphics directly. Finally, my thanks to the authors of these two packages for their contributions!

GitHub address:

https://github.com/junjunlab/ClusterGVis

If you find it helpful, please give it a star on GitHub!

2 Installation

# install.packages("devtools")
devtools::install_github("junjunlab/ClusterGVis")

3 Usage

Method selection

The input is a normalized TPM/FPKM/RPKM expression matrix with genes in rows and samples (time points) in columns. Two clustering methods are available: the fuzzy c-means algorithm of Mfuzz and the k-means clustering (row_km) of ComplexHeatmap. Choose whichever you prefer.
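
For intuition about the input scale: the values in the wide.res output further below appear to be centered per gene, which suggests that each gene is standardized before clustering. The exact transformation is internal to ClusterGVis. As a minimal sketch, assuming a per-gene z-score (similar in spirit to Mfuzz's standardise()), the step looks like this in base R. The toy matrix uses two rows of the head(exps) output shown below, rounded to two decimals, and zscore_rows is a hypothetical helper.

```r
# Toy expression matrix: genes in rows, time points in columns
# (values rounded from the head(exps) output)
expr <- matrix(c(1.31, 1.24, 1.33, 1.26, 0.65, 0.21,
                 1.09, 1.32, 1.17, 1.06, 0.87, 0.48),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("Oog4", "Psmd9"),
                               c("zygote", "t2.cell", "t4.cell",
                                 "t8.cell", "tmorula", "blastocyst")))

# Hypothetical helper: per-gene z-score (mean 0, sd 1 across time points).
# scale() works column-wise, so transpose, scale, and transpose back.
zscore_rows <- function(m) t(scale(t(m)))

z <- zscore_rows(expr)
round(z, 2)
```

After this step, clustering compares the shape of each gene's trajectory rather than its absolute expression level.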

Note:

Clustering is a random process, because both methods start from randomly chosen initial cluster centers. To keep the results reproducible, a random seed is set inside the clusterData function, and you can change the seed to change the clustering result. From the argument list of clusterData:

...
seed: set a seed for cluster analysis in the mfuzz or Heatmap function, default 123.
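
The role of a fixed seed can be illustrated with base R's kmeans(). This is a generic sketch on random toy data, not ClusterGVis code, and run_km is a hypothetical helper.

```r
# Toy data (hypothetical): 100 random 2-D points
set.seed(1)
x <- matrix(rnorm(200), ncol = 2)

# Hypothetical helper: k-means with a fixed seed for the random initial centers
run_km <- function(seed, k = 4) {
  set.seed(seed)
  kmeans(x, centers = k)$cluster
}

a <- run_km(123)
b <- run_km(123)
identical(a, b)  # TRUE: the same seed reproduces the same assignment
```

A different seed may give a different partition, or the same partition with the cluster numbers permuted. This is why cluster IDs can shift when you change the seed.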

getClusters: selecting the number of clusters

The getClusters function computes the within-cluster sum of squares for different numbers of clusters, and you can pick the optimal number at the inflection point (the "elbow") of the curve. First, load the test data:

library(ClusterGVis)

# load data
data(exps)

# check
head(exps,3)
#           zygote  t2.cell  t4.cell  t8.cell   tmorula blastocyst
# Oog4   1.3132282 1.237078 1.325978 1.262073 0.6549312  0.2067114
# Psmd9  1.0917337 1.315989 1.174417 1.064756 0.8685598  0.4845448
# Sephs2 0.9859232 1.201026 1.123076 1.084673 0.8878931  0.7174088

Draw the curve:

# check optimal cluster numbers
getClusters(exp = exps)
[Figure: within-cluster sum of squares versus number of clusters]

You can also take the heatmap results into account when settling on the final number of clusters.
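
The elbow idea can be sketched with base R's kmeans() on toy data. This illustration is not the getClusters implementation: the range of k, nstart = 10 and the toy data are choices made here.

```r
# Toy data (hypothetical): three well-separated groups of 2-D points
set.seed(123)
x <- rbind(matrix(rnorm(60, mean = 0),  ncol = 2),
           matrix(rnorm(60, mean = 5),  ncol = 2),
           matrix(rnorm(60, mean = 10), ncol = 2))

# Total within-cluster sum of squares (WSS) for k = 1..8
ks  <- 1:8
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 10)$tot.withinss)

# The elbow is where adding clusters stops reducing the WSS sharply
# (expected near k = 3 for this toy data)
plot(ks, wss, type = "b",
     xlab = "Number of clusters", ylab = "Within-cluster sum of squares")
```

With k = 1, the WSS equals the total sum of squares around the overall mean, which is the starting point of the curve.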

clusterData: clustering

For Mfuzz clustering, set the number of clusters to 8:

# using mfuzz for clustering
# mfuzz
cm <- clusterData(exp = exps,
                  cluster.method = "mfuzz",
                  cluster.num = 8)

K-means clustering:

# using complexheatmap row_km for clustering
# kmeans
ck <- clusterData(exp = exps,
                  cluster.method = "kmeans",
                  cluster.num = 8)

clusterData returns a list that contains the clustering results in both long and wide format, and you can also use these tables to draw your own plots. The Mfuzz result has one extra column with each gene's membership value.

[Figure: the list returned by clusterData]
str(ck)
List of 3
 $ wide.res:'data.frame': 3767 obs. of  8 variables:
  ..$ zygote    : num [1:3767] -1.286 -1.205 -0.633 -0.788 -0.619 ...
  ..$ t2.cell   : num [1:3767] 0.594 0.523 -1.209 -1.197 -1.174 ...
  ..$ t4.cell   : num [1:3767] -0.239 -1.082 0.736 0.266 0.125 ...
  ..$ t8.cell   : num [1:3767] -1.011 -0.235 -0.792 -0.544 -0.6 ...
  ..$ tmorula   : num [1:3767] 1.091 1.154 0.66 1.038 0.752 ...
  ..$ blastocyst: num [1:3767] 0.851 0.846 1.238 1.226 1.516 ...
  ..$ gene      : chr [1:3767] "Cdc20" "Yrdc" "Cdca8" "Krt2" ...
  ..$ cluster   : chr [1:3767] "5" "5" "5" "5" ...
 $ long.res:'data.frame': 22602 obs. of  5 variables:
  ..$ cluster     : chr [1:22602] "5" "5" "5" "5" ...
  ..$ gene        : chr [1:22602] "Cdc20" "Yrdc" "Cdca8" "Krt2" ...
  ..$ cell_type   : Factor w/ 6 levels "zygote","t2.cell",..: 1 1 1 1 1 1 1 1 1 1 ...
  ..$ norm_value  : num [1:22602] -1.286 -1.205 -0.633 -0.788 -0.619 ...
  ..$ cluster_name: chr [1:22602] "cluster 5 (3426)" "cluster 5 (3426)" "cluster 5 (3426)" "cluster 5 (3426)" ...
 $ type    : chr "kmeans"
str(cm)
List of 3
 $ wide.res:'data.frame': 3767 obs. of  9 variables:
  ..$ gene      : chr [1:3767] "-" "1110007A13Rik" "1110054O05Rik" "1500001M20Rik" ...
  ..$ zygote    : num [1:3767] -0.777 -0.711 -0.75 -0.581 -1.084 ...
  ..$ t2.cell   : num [1:3767] -0.656 -0.319 -0.679 -0.655 -0.716 ...
  ..$ t4.cell   : num [1:3767] -0.39 -0.446 -0.478 -0.502 -0.463 ...
  ..$ t8.cell   : num [1:3767] -0.3616 -0.6177 -0.2645 -0.2745 0.0123 ...
  ..$ tmorula   : num [1:3767] 0.2874 0.1466 0.2731 0.0359 0.5938 ...
  ..$ blastocyst: num [1:3767] 1.9 1.95 1.9 1.98 1.66 ...
  ..$ cluster   : num [1:3767] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ membership: num [1:3767] 0.974 0.565 0.992 0.842 0.495 ...
 $ long.res:'data.frame': 22602 obs. of  6 variables:
  ..$ cluster     : num [1:22602] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ gene        : chr [1:22602] "-" "1110007A13Rik" "1110054O05Rik" "1500001M20Rik" ...
  ..$ membership  : num [1:22602] 0.974 0.565 0.992 0.842 0.495 ...
  ..$ cell_type   : Factor w/ 6 levels "zygote","t2.cell",..: 1 1 1 1 1 1 1 1 1 1 ...
  ..$ norm_value  : num [1:22602] -0.777 -0.711 -0.75 -0.581 -1.084 ...
  ..$ cluster_name: chr [1:22602] "cluster 1 (5754)" "cluster 1 (5754)" "cluster 1 (5754)" "cluster 1 (5754)" ...
 $ type    : chr "mfuzz"
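
The two tables hold the same values in different shapes. long.res has one row per gene and time point, which is why 3767 genes × 6 time points gives 22602 rows. Below is a minimal base-R sketch of that reshaping on a hypothetical two-gene table with made-up values. It does not reproduce package-specific columns such as membership or cluster_name.

```r
# Hypothetical wide table shaped like wide.res (values are made up)
wide <- data.frame(zygote  = c(-1.29, 0.60),
                   t2.cell = c(0.59, -0.31),
                   t4.cell = c(-0.24, 0.75),
                   gene    = c("geneA", "geneB"),
                   cluster = c("5", "1"))
time_cols <- c("zygote", "t2.cell", "t4.cell")

# Stack the time-point columns. unlist() walks column by column, so the
# gene and cluster vectors are repeated once per time point to line up.
long <- data.frame(
  cluster    = rep(wide$cluster, times = length(time_cols)),
  gene       = rep(wide$gene, times = length(time_cols)),
  cell_type  = factor(rep(time_cols, each = nrow(wide)), levels = time_cols),
  norm_value = unlist(wide[time_cols], use.names = FALSE)
)
long
```

Keeping cell_type as a factor with the time points as levels preserves the chronological order on the x-axis when plotting, which matches the factor column in long.res.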

visCluster: plotting

The visCluster function takes the result of clusterData and produces three kinds of plots: line charts, heatmaps, and heatmaps combined with line charts, optionally with GO pathway annotations (plot.type = "line", "heatmap" or "both").

Draw a line chart:

# plot line only
visCluster(object = cm,
           plot.type = "line")
[Figure: line charts of the Mfuzz clusters]

Modify color:

# change color
visCluster(object = cm,
           plot.type = "line",
           ms.col = c("green","orange","red"))
[Figure: line charts with the new colors]

Remove the median line:

# remove median line
visCluster(object = cm,
           plot.type = "line",
           ms.col = c("green","orange","red"),
           add.mline = FALSE)
[Figure: line charts without the median line]

Line charts of the k-means result have no color mapping, because k-means provides no membership information:

# plot line only with kmeans method
visCluster(object = ck,
           plot.type = "line")
[Figure: line charts of the k-means clusters]
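
To see where the membership values come from, here is the textbook fuzzy c-means membership update for given cluster centers, written in base R with fuzzifier m. It is a sketch of the standard formula, not Mfuzz's implementation (Mfuzz's mfuzz() wraps e1071::cmeans()), and fcm_membership is a hypothetical helper. K-means, by contrast, assigns each gene entirely to one cluster, so no graded value exists to color by.

```r
# Hypothetical helper: fuzzy c-means membership of each row of x (a gene)
# in each cluster, given a matrix of cluster centers (one center per row)
fcm_membership <- function(x, centers, m = 2) {
  # Euclidean distance from every row of x to every center
  d <- sapply(seq_len(nrow(centers)), function(i)
    sqrt(rowSums(sweep(x, 2, centers[i, ])^2)))
  d <- matrix(d, nrow = nrow(x))     # keep matrix shape even for a single row
  d <- pmax(d, .Machine$double.eps)  # avoid division by zero at a center
  p <- 2 / (m - 1)
  # u[k, i] = 1 / sum_j (d[k, i] / d[k, j])^p, i.e. d^-p normalized per row
  u <- d^(-p)
  u / rowSums(u)
}

x <- rbind(c(0, 0), c(1, 0), c(5, 5))
centers <- rbind(c(0, 0), c(5, 5))
u <- fcm_membership(x, centers, m = 2)
round(u, 3)
```

Each row sums to 1. The point at (1, 0) gets a membership of 41/42 in the first cluster, so a gene close to its center shows a high value, and genes between clusters show lower, split memberships.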

The line chart is a ggplot2 object, so you can add further ggplot2 layers and theme settings to adjust the details.


Plot a heatmap:

# plot heatmap only
visCluster(object = ck,
           plot.type = "heatmap")
[Figure: heatmap of the k-means clusters]

Pass additional arguments to ComplexHeatmap's Heatmap function:

# supply other arguments passed on to the Heatmap function
visCluster(object = ck,
           plot.type = "heatmap",
           column_names_rot = 45)
[Figure: heatmap with rotated column names]

Change the annotation bar colors:

# change anno bar color
visCluster(object = ck,
           plot.type = "heatmap",
           column_names_rot = 45,
           ctAnno.col = ggsci::pal_npg()(8))
[Figure: heatmap with NPG annotation bar colors]

Combine the heatmap with the line charts. The annotations are not well aligned when the figure is shown in the graphics window, so save it as a PDF instead:

# add line annotation
pdf('testHT.pdf',height = 10,width = 6)
visCluster(object = ck,
           plot.type = "both",
           column_names_rot = 45)
dev.off()
[Figure: heatmap combined with line charts]
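
Every combined plot in this section follows the same pdf() ... dev.off() pattern. If a plotting call fails in between, the PDF device stays open. A small wrapper with on.exit() avoids that. save_pdf is a hypothetical helper. It assumes the plotting call either draws directly (base graphics, returning NULL) or returns an object that draws when printed. Inside a function R does not auto-print, so the wrapper prints explicitly. If your call both draws and returns an object, drop the print() step to avoid drawing twice.

```r
# Hypothetical helper: open a PDF device, draw, and always close the device
save_pdf <- function(file, expr, width = 6, height = 10) {
  grDevices::pdf(file, width = width, height = height)
  on.exit(grDevices::dev.off(), add = TRUE)  # runs even if plotting fails
  res <- expr                                 # evaluates the plotting call now
  # objects that draw when printed need an explicit print() inside a function;
  # base graphics draw as a side effect and return NULL
  if (!is.null(res)) print(res)
  invisible(file)
}

out <- file.path(tempdir(), "demo.pdf")
save_pdf(out, plot(1:10, main = "demo"))
file.exists(out)
```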

You can also add boxplots:

# add boxplot
pdf('testbx.pdf',height = 10,width = 6)
visCluster(object = ck,
           plot.type = "both",
           column_names_rot = 45,
           add.box = TRUE)
dev.off()
[Figure: heatmap with line charts and boxplots]

Remove the lines and change the boxplot fill colors:

# remove line and change box fill color
pdf('testbxcol.pdf',height = 10,width = 6)
visCluster(object = ck,
           plot.type = "both",
           column_names_rot = 45,
           add.box = TRUE,
           add.line = FALSE,
           boxcol = ggsci::pal_npg()(8))
dev.off()
[Figure: heatmap with colored boxplots and no lines]

Add points:

# add point
pdf('testbxcolP.pdf',height = 10,width = 6)
visCluster(object = ck,
           plot.type = "both",
           column_names_rot = 45,
           add.box = TRUE,
           add.line = FALSE,
           boxcol = ggsci::pal_npg()(8),
           add.point = TRUE)
dev.off()
[Figure: heatmap with boxplots and points]

Finally, you can also add annotations for enriched pathways:

# load term info
data("termanno")

# check
head(termanno,4)
#   id                               term
# 1 C1              developmental process
# 2 C1   anatomical structure development
# 3 C1 multicellular organism development
# 4 C2                 system development

# anno with GO terms
pdf('testHTterm.pdf',height = 10,width = 10)
visCluster(object = ck,
           plot.type = "both",
           column_names_rot = 45,
           annoTerm.data = termanno)
dev.off()
[Figure: heatmap with line charts and GO term annotations]

Note that the enrichment data must contain the cluster id followed by the pathway name, in that order; the columns cannot be swapped. In addition, the cluster ids must match the cluster names of the clustering result (C1, C2, ...).
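
A quick base-R check before plotting can catch both problems. This is a sketch: terms and cluster_ids are hypothetical names, the term strings are taken from the head(termanno) output above, and the names C1 ... C8 assume 8 clusters as in the examples.

```r
# Hypothetical term table whose columns are in the wrong order (term, id)
terms <- data.frame(term = c("developmental process", "system development"),
                    id   = c("C1", "C2"))

# Reorder to the required layout: cluster id first, pathway name second
terms <- terms[, c("id", "term")]

# Cluster names assumed to be C1 ... C8, matching cluster.num = 8 above
cluster_ids <- paste0("C", 1:8)
missing_ids <- setdiff(unique(terms$id), cluster_ids)
if (length(missing_ids) > 0) {
  stop("Term ids not found among the clusters: ",
       paste(missing_ids, collapse = ", "))
}
head(terms)
```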

Notice that the connectors of the pathway annotations attach to the right side of the line charts, which looks awkward because they are matched to the annotation bar on the right side of the heatmap. In this case, move the line-chart annotation to the left with line.side = "left":

# change the line annotation side
pdf('testHTtermCmls.pdf',height = 10,width = 10)
visCluster(object = cm,
           plot.type = "both",
           column_names_rot = 45,
           annoTerm.data = termanno,
           line.side = "left")
dev.off()
[Figure: GO term annotations with the line charts on the left]

You can also remove the row dendrogram on the left for a more concise plot:

# remove tree
pdf('testHTtermCmlsrt.pdf',height = 10,width = 10)
visCluster(object = cm,
           plot.type = "both",
           column_names_rot = 45,
           annoTerm.data = termanno,
           line.side = "left",
           show_row_dend = FALSE)
dev.off()
[Figure: the same plot without the row dendrogram]

4 End

If you have any questions or ideas, you are welcome to discuss them on GitHub!

So have you studied today?

That's all for today's sharing, so stay tuned for the next one!

Origin blog.csdn.net/weixin_45822007/article/details/128566997