Concentric heatmaps to compare gene network features

By Nicolas Gambardella

A frequent outcome of network inference based on gene expression data is the discovery of “hubs”, that possibly represent master regulators of our system of interest. Analyzing and comparing those hubs is often at the core of new biological insights. A problem of visualizing network “hubs” is the sheer number of neighbours, that make identification of interesting nodes difficult, and might mask the overall message. The overlay of several features on a nodes, using for instance several colouring or size contributes to the general confusion. Here, we propose to go back from the hub to the wheel, representing each neighbour as a tile in a 360 degree heatmap. In addition, we will used several concentric heatmaps to enable a quick integration of different features. Insight will then come from the comparison of several such wheels.

Of course, tools already exist to represent complex datasets in exquisite circular representations, such as Circos or the R package circlize. But here we will only use ggplot2 and a bit of magic.

Here is the final figure, showing two transcription factors with their targets, characterised by their expression in cell types, brain regions and the strength of interaction with the hub TFs:

First, we need some data. Using a transcriptomics dataset, we inferred a gene regulatory network (this part of the work is beyond this blog post). The dataset was composed of two cells types coming from two different regions of the brain. For each gene, we computed the log fold difference between regions and between cell types. Our initial edges data table looks like:

TFx are the hub transcription factors that we identified as of interest. neighbor list the top interactors for each hub, weight is a dimensionless factor that represents the significativity of the edge. The higher the more probabl an actual inference exists. The table is ordered by decreasing edge weights.

We will need a few packages.

library(reshape)
library(ggplot2)
library(ggnewscale)   ## The magic

Then we will generate a wheel for a given transcriptions factor (we can, of course, generate a whole bunch of wheels in one go). We extract the information for all its neighbors, and we recast the table using the melt function of the package reshape. We then add an index (var2 below) that will decide where each data will be positioned on the concentric rings..

tf3 <- subset(edges,edges$hub == "TF3")[,-1]
tf3.m <- melt(tf3)
tf3.m$var2<-NA
tf3.m[tf3.m$variable == "weight",]$var2 <- 6
tf3.m[tf3.m$variable == "CellType",]$var2 <- 7
tf3.m[tf3.m$variable == "Region",]$var2 <- 8

Here is what the new table looks like:

The index starts at 6 so that we have an empty space in the centre corresponding to 5 rings. Now, let’s see the beauty of ggplot2 layered approach in action. We will start with what will be our external ring, the relative expression in different regions.

ptf3 <- ggplot( ) +
  geom_tile(data=tf3.m[tf3.m$variable == "Region",],
            aes(x=as.numeric(rownames(tf3.labs)),y=var2, fill=value),
            colour="white")
plot(ptf3)

The default colours are not so nice. Since we want to emphasize the extreme differences of expression, a divergent palette is better suited. Moreover, as we mention above, we want to compare this hub with others. Therefore, the colours must be scaled according to the values across the whole dataset (the initial table edges).

 ptf3 <- ptf3 +
  scale_fill_gradient2(midpoint=0, mid="white",
                       low=rgb(204/255,102/255,119/255),
                       high=rgb(17/255p,119/255,51/255),
                       limits = c(min(edges$Region),max(edges$Region)), name="Region" )
plot(ptf3)

We said above we wanted space in the middle. We also want space outside for further labelling.

ptf3 <- ptf3 +
  ylim(c(0, max(tf3.m$var2) + 1.5)) 
plot(ptf3)

Did we mention we wanted a circular plot?

ptf3 <- ptf3 +
  coord_polar(theta="x")
plot(ptf3) 

Now, let’s get rid of the useless graphical features that only serve to dilute the main message.

ptf3 <- ptf3 +
  theme(panel.background = element_blank(), # bg of the panel
        panel.grid.major = element_blank(), # get rid of major grid
        panel.grid.minor = element_blank(), # get rid of minor grid
        axis.title=element_blank(),
        panel.grid=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks=element_blank(),
        axis.text.y=element_text(size=0))
plot(ptf3) 

Finally, we can plot the gene names. In order to optimize the readability and use of space, we will plot the names in a radial fashion, but try to make them upright as much as possible. To do so, we need first to compute an angle that depends on the position in the ring. And then, we will compute the required “horizontal” justification, which, when use in conjunction with polar coordinates, produces something quite non-intuitive.

tf3.labs <- subset(tf3.m, variable == "weight")
tf3.labs$ang <- seq(from=(360/nrow(tf3.labs))/1.5, 
                    to=(1.5* (360/nrow(tf3.labs)))-360, 
                    length.out=nrow(tf3.labs))+80

tf3.labs$hjust <- 0
tf3.labs$hjust[which(tf3.labs$ang < -90)] <- 1
tf3.labs$ang[which(tf3.labs$ang < -90)] <- (180+tf3.labs$ang)[which(tf3.labs$ang < -90)]

Now we can add the labels to the plot. Now the use of extra space around the ring becomes clear.

ptf3 <- ptf3 +
  geom_text(data=tf3.labs, 
            aes(x=as.numeric(rownames(tf3.labs)), 
                y=var2+2.5, 
                label=neighbor, angle=ang, hjust=hjust), 
            size=2.5)
plot(ptf3) 

All good. Now that we have plotted the relative expression in different regions, let’s plot the relative expression in different cell types. The first intuitive idea would be to just add a new heatmap inside the previous one. However, since we are talking about a different feature, we want a different colour scale.

ptf3 <- ptf3 + 
  geom_tile(data=tf3.m[tf3.m$variable == "CellType",],
            aes(x=as.numeric(rownames(tf3.labs)),y=var2, fill=value),
            colour="white") +
  scale_fill_gradient2(midpoint=0, mid="white",
                       low=rgb(221/255,204/255,119/255),
                       high=rgb(68/255,119/255,170/255),
                       limits = c(min(edges$CellType),max(edges$CellType)), name="CellType" )
plot(ptf3)

Arrgh! First we get an error “Scale for ‘fill’ is already present. Adding another scale for ‘fill’, which will replace the existing scale.” And indeed, the CellType colour scale replaced the Region one for the outer ring. This is because we can only use a single colour scale within a given ggplot2 plot. But not all is lost, thanks to the package ggnewscale which allows us to redefine the colour scale.

NB: Packages like ComplexHeatmap allow to use different colorRamps for different heatmaps. However, they do not allow the use of polar coordinates.

So, here is what we are going to do. We will redefine the scale twice, in order to plot three features on three different concentric heatmaps.

ptf3<-ggplot( ) +
  geom_tile(data=tf3.m[tf3.m$variable == "Region",],
            aes(x=as.numeric(rownames(tf3.labs)),y=var2, fill=value),
            colour="white") +
  scale_fill_gradient2(midpoint=0, mid="white",
                       low=rgb(204/255,102/255,119/255),
                       high=rgb(17/255,119/255,51/255),
                       limits = c(min(edges$Region),max(edges$Region)), name="Region" )+
  new_scale("fill") +   #### MAGIC
  geom_tile(data=tf3.m[tf3.m$variable == "CellType",],
            aes(x=as.numeric(rownames(tf3.labs)),y=var2, fill=value),
            colour="white") +
  scale_fill_gradient2(midpoint=0, mid="white",
                       low=rgb(221/255,204/255,119/255),
                       high=rgb(68/255,119/255,170/255),
                       limits = c(min(edges$CellType),max(edges$CellType)), name="CellType" )+
  new_scale("fill") +   #### MAGIC
  geom_tile(data=tf3.m[tf3.m$variable == "weight",],
            aes(x=as.numeric(rownames(tf3.labs)),y=var2, fill=value),
            colour="white") +
  scale_fill_gradient2(midpoint=0, mid="white",
                       low=rgb(250/255,250/255,250/255),
                       high=rgb(0/255,0/255,0/255),
                       limits = c(min(edges$weight),max(edges$weight)), name="Weight" )+
  ylim(c(0, max(tf3.m$var2) + 1.5)) +
  coord_polar(theta="x") +
  theme(panel.background = element_blank(), # bg of the panel
        panel.grid.major = element_blank(), # get rid of major grid
        panel.grid.minor = element_blank(), # get rid of minor grid
        axis.title=element_blank(),
        panel.grid=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks=element_blank(),
        axis.text.y=element_text(size=0))+
  geom_text(data=tf3.labs, 
            aes(x=as.numeric(rownames(tf3.labs)), 
                y=var2+2.5, 
                label=neighbor, angle=ang, hjust=hjust), 
            size=2.5)
plot(ptf3)

Now we can compare the differential expression of our hub’s neighbours between cell types and between regions. We can do that for several hubs, and compare them, which brings us to our complete figure below. We can see that while the expression of TF3’s neighbours tend to differ strongly between cell types (vivid blues and yellows), they do not differ much between regions (pale greens and magentas). We observe the opposite pattern for TF7, suggesting that TF3 could be regulating spatial-independent cell identity and TF7 could be regulating spatial-dependent cell features.

We are grateful to the following sources that benefited much to this blog post:

  • https://stackoverflow.com/questions/13887365/circular-heatmap-that-looks-like-a-donut
  • https://eliocamp.github.io/codigo-r/2018/09/multiple-color-fill-scales-ggplot2/

Leave a Reply