Understanding Network Diffusion, GCN, and GAT: How Do They Relate?
Graph convolution and graph attention have been trending topics lately, and you've likely come across them in tasks such as link prediction and molecular property prediction. Network propagation, despite being less familiar, is a powerful technique in computational biology that enables learning on networks. This article provides an in-depth exploration of the concepts and principles underlying network propagation and how it relates to GCNs and GATs. Along the way, we'll see that network propagation is, in fact, a specific form of graph convolution, and we'll examine two different views of it: random walks and diffusion. My PhD research was on network-based algorithms such as random walks and information diffusion, and these methods drew me into applying them to drug target prediction. They have long been popular in computational biology for link prediction and disease gene prioritization, especially through Guilt by Association, a principle at the core of network propagation.
Considering a protein-protein interaction network, the principle of Guilt By Association (GBA) proposes that proteins with close relationships to one another, whether through physical interaction or similarity measures such as gene co-expression, are likely to be involved in similar biological processes or pathways. GBA has been used extensively in the analysis of biological networks to identify novel genes or proteins that may be involved in specific biological processes or diseases.
Network Information Propagation
Given an undirected graph G = (V, E, w), with vertex set V of n vertices, edge set E, and weight function w, let A be the corresponding n-by-n adjacency matrix, defined as follows:
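For a weighted, undirected graph, the standard definition is:

$$A_{ij} = \begin{cases} w(i, j) & \text{if } (i, j) \in E \\ 0 & \text{otherwise} \end{cases}$$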
Using the diagonal degree matrix D, whose diagonal entries are the degrees of the corresponding nodes, we can normalize A either row-wise or column-wise, resulting in two new matrices, P and W, respectively.
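In matrix form, the two normalizations are:

$$D_{ii} = \sum_{j} A_{ij}, \qquad P = D^{-1}A, \qquad W = A\,D^{-1}$$

Each row of P sums to one, making it a transition probability matrix for a random walker; each column of W sums to one, which is the form used in the diffusion view later on.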
Finally, let p0 be the one-hot encoded label vector, whose entries are one for nodes with positive labels and zero everywhere else. One way to perform network propagation is via random walks on the network. In this case, the key question we are asking is the following:
What is the probability of starting from a target node and ending up at one of the nodes with a positive label via 1-hop propagation?
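In matrix form, the answer to this question for every node at once is a single matrix-vector product:

$$y = P\,p_0$$

where the i-th entry of y is the probability that a one-step walk starting at node i lands on a positively labeled node.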
Random walk with restart (RWR) is a multi-hop propagation technique, where the propagation is not limited to a single hop. Random walks are simulated on the network, starting from a seed node and traversing the network for multiple steps. At each step, the walk either moves to a neighboring node, with probability proportional to the edge weights, or restarts at the seed. Repeating this process yields the probability of visiting each node in the network, and these visiting probabilities serve as prediction scores. The equation of random walk with restart can be written as a matrix-vector multiplication, in which the transition probability matrix P and an initial probability vector p0 are used to calculate the prediction score vector y.
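A standard reconstruction of the iterative update, consistent with the symbols listed below, is:

$$y^{(t+1)} = (1 - \alpha)\,P^{\top} y^{(t)} + \alpha\,p_0, \qquad y^{(0)} = p_0$$

which is iterated until convergence (for an undirected graph, $P^{\top}$ is exactly the column-normalized matrix W defined earlier). Here: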
- y is the prediction score vector, representing the probability of each node being associated with a particular label or trait.
- α is the restart probability, which controls the contribution of the initial probability vector to the final prediction scores.
- P is the transition probability matrix, representing the probability of moving from one node to another in the network.
- p0 is the initial probability vector, representing the probability of starting the random walk from each node in the network.
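To make this concrete, here is a minimal NumPy sketch of RWR via power iteration (my own illustrative implementation; the toy graph and parameter values are made up):

```python
import numpy as np

def random_walk_with_restart(A, p0, alpha=0.3, tol=1e-8, max_iter=1000):
    """Random walk with restart by power iteration.

    A     : (n, n) symmetric adjacency matrix of an undirected graph
    p0    : (n,) initial probability vector (seeds, summing to 1)
    alpha : restart probability
    """
    # Row-normalize A to get the transition matrix P; for an undirected
    # graph, P.T equals the column-normalized matrix W.
    d = A.sum(axis=1)
    P = A / d[:, None]
    y = p0.copy()
    for _ in range(max_iter):
        y_next = (1 - alpha) * P.T @ y + alpha * p0
        if np.abs(y_next - y).sum() < tol:  # L1 convergence check
            break
        y = y_next
    return y_next

# Toy 4-node path graph with node 0 as the only positive seed.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
p0 = np.array([1.0, 0.0, 0.0, 0.0])
print(random_walk_with_restart(A, p0))  # scores decay with distance from the seed
```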
Another popular approach is HotNet2, a network-based computational method for identifying sets of genes or proteins likely to be involved in a particular disease or biological process. HotNet2 uses a multi-hop diffusion approach, propagating information across the network by simulating random walks that traverse multiple edges and nodes.
Another way to think about network propagation is as diffusion through the network, modeled on heat diffusion. Mathematically, this operation corresponds to the matrix-vector multiplication between the column-normalized matrix W and p̃0 (a normalized version of p0), yielding the prediction score vector ỹ.
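In symbols (my reconstruction; one common choice normalizes p0 to sum to one):

$$\tilde{y} = W\,\tilde{p}_0, \qquad \tilde{p}_0 = p_0 / \lVert p_0 \rVert_1$$

Under this view, each positive node holds an equal share of heat that flows out along its edges, and ỹ records how much heat each node receives after one step.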
Although the 1-hop propagation approach is straightforward and useful, it has limitations when labeled data is scarce, which is often the case in computational biology. 1-hop propagation can only generate meaningful prediction scores for genes directly connected to disease genes; given that the human genome contains over 20 thousand genes, this can lead to suboptimal predictions. To overcome this limitation, we can expand the propagation to more distant nodes: 2-hop, 3-hop, or beyond. The figure at the top illustrates k-hop propagation, extending from k = 1 to k = 2. This expansion can improve prediction accuracy by considering a wider range of connections and relationships between genes in the network.
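To illustrate, here is a minimal NumPy sketch of k-hop propagation (my own illustrative code, generalizing the 1-hop product y = P p0 shown earlier to y = P^k p0):

```python
import numpy as np

def k_hop_propagation(A, p0, k=2):
    """Propagate labels k hops by repeated matrix-vector products."""
    d = A.sum(axis=1)
    P = A / d[:, None]   # row-normalized transition matrix
    y = p0.copy()
    for _ in range(k):
        y = P @ y        # one extra hop per multiplication
    return y
```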
Connecting Network Propagation with GCN
Graph neural networks (GNNs) are a class of neural networks that enable learning over irregular graph datasets. Each vertex (and often each edge) of the input graph usually comes with an input feature vector that encodes the semantics of the task at hand. These feature vectors are transformed through a series of GNN layers, and the last layer produces output feature vectors used for downstream ML tasks such as node classification or graph classification. Similar to the methods above, a graph convolutional network (GCN) follows a layer-wise propagation rule, which Kipf and Welling [1] showed enables fast and scalable semi-supervised classification of nodes in a graph. In a GCN, propagation is performed by a localized, convolution-like operation: information is aggregated from neighboring nodes, and the aggregate is transformed using learnable parameters. More formally, the propagation rule of a GCN layer can be defined as:
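Following Kipf and Welling [1]:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right)$$

where Ã = A + I is the adjacency matrix with self-connections added and D̃ is its degree matrix. Here: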
- H^(l) represents the node features (embeddings) at layer l.
- A is the adjacency matrix of the graph, which captures the connectivity information.
- D is the degree matrix, which is a diagonal matrix with the degree of each node on the diagonal.
- W^(l) is the learnable weight matrix at layer l.
- σ is an activation function, like ReLU or sigmoid.
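To see the rule in action, here is a minimal NumPy sketch of a single GCN layer (illustrative only; the toy graph and randomly initialized weights are my own, and no training is performed):

```python
import numpy as np

def gcn_layer(A, H, W_l):
    """One GCN layer: H' = ReLU(D̃^{-1/2} Ã D̃^{-1/2} H W_l)."""
    A_tilde = A + np.eye(A.shape[0])           # add self-connections
    d_tilde = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(d_tilde ** -0.5)      # D̃^{-1/2}
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt  # symmetric normalization
    return np.maximum(0, A_hat @ H @ W_l)      # ReLU nonlinearity

# Toy example: 4 nodes, 3 input features, 2 output features.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = rng.normal(size=(4, 3))
W_l = rng.normal(size=(3, 2))
print(gcn_layer(A, H, W_l).shape)  # (4, 2)
```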
Now here comes the interesting part: with the following substitutions, we can fully recover network propagation as a special case of graph convolution.
- Replace the spectrally normalized, self-connected adjacency matrix D̃^(-1/2) Ã D̃^(-1/2) with either the row-normalized (P) or column-normalized (W) version
- Replace H^(l) with p^(l)
- Replace the nonlinearity σ and the weight matrix W^(l) with the identity (i.e., simply drop these transformations)
Network propagation can be seen as a special case of graph convolution without any feature transformations.
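Written out (my own illustration of the substitution), the GCN layer collapses to exactly the propagation update we saw earlier:

$$H^{(l+1)} = \sigma\left(\hat{A}\,H^{(l)}\,W^{(l)}\right) \;\longrightarrow\; p^{(l+1)} = P\,p^{(l)}$$

where $\hat{A} = \tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}$.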
Graph Attention Networks (GATs) are a type of graph neural network that uses attention mechanisms to perform node feature aggregation. Like GCNs, GATs relate to network propagation through the way they propagate information over the graph: on an unweighted graph, network propagation can be recovered as a special case of a GAT in which the attention coefficients are fixed to plain neighborhood averaging and the feature transformation is dropped. The key difference between GATs and GCNs is that GATs use an attention mechanism to weigh the contributions of neighboring nodes when aggregating information, making the aggregation process adaptive and more flexible.
In a GAT, propagation is performed using an attention-based operation. The attention mechanism computes weights for the edges between nodes based on the node features, allowing the model to emphasize certain neighbors over others during aggregation. The propagation rule of a GAT layer can be defined as:
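Following Velickovic et al. [2]:

$$h_i^{(l+1)} = \sigma\left( \sum_{j \in \mathcal{N}(i)} \alpha_{ij}\, W^{(l)} h_j^{(l)} \right)$$

with attention coefficients given by a softmax over each node's neighborhood:

$$\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(a^{\top} \left[W^{(l)} h_i^{(l)} \,\Vert\, W^{(l)} h_j^{(l)}\right]\right)\right)}{\sum_{k \in \mathcal{N}(i)} \exp\left(\mathrm{LeakyReLU}\left(a^{\top} \left[W^{(l)} h_i^{(l)} \,\Vert\, W^{(l)} h_k^{(l)}\right]\right)\right)}$$

where a is a learnable attention vector and ∥ denotes concatenation.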
Here:
- h_i^(l) and h_j^(l) represent the node features (embeddings) of nodes i and j at layer l.
- N(i) is the set of neighbors of node i.
- W^(l) is the learnable weight matrix at layer l.
- α_ij is the attention coefficient that quantifies the importance of node j’s features to node i.
- σ is an activation function, like ReLU or sigmoid.
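To make the attention computation concrete, here is a minimal single-head NumPy sketch (illustrative only; the toy graph and random weights are made up, and the LeakyReLU slope of 0.2 follows the formulation above rather than any particular library):

```python
import numpy as np

def gat_layer(A, H, W_l, a):
    """Single-head GAT layer: attention-weighted neighbor aggregation."""
    n = A.shape[0]
    Z = H @ W_l                                  # transformed features W^(l) h
    # Raw attention logits e_ij = LeakyReLU(a^T [z_i || z_j]).
    logits = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            e = a @ np.concatenate([Z[i], Z[j]])
            logits[i, j] = e if e > 0 else 0.2 * e   # LeakyReLU, slope 0.2
    # Mask out non-neighbors (self-loops included, as in the GAT paper),
    # then softmax row-wise to obtain the attention coefficients alpha_ij.
    mask = (A + np.eye(n)) > 0
    logits[~mask] = -np.inf
    alpha = np.exp(logits - logits.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)
    return np.maximum(0, alpha @ Z)              # ReLU activation

rng = np.random.default_rng(1)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = rng.normal(size=(4, 3))
W_l = rng.normal(size=(3, 2))
a = rng.normal(size=4)                           # attention vector, length 2 * out_dim
print(gat_layer(A, H, W_l, a).shape)             # (4, 2)
```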
GCNs are a type of neural network that operates on graph structures. They use a convolution-like operation to propagate information between neighboring nodes, which lets them learn features representative of the graph structure for tasks such as node classification or link prediction. GCNs have shown great success in applications such as recommendation systems, social network analysis, and drug discovery.
On the other hand, GATs use an attention mechanism to weigh the importance of different neighbors when propagating information. This attention mechanism allows GATs to learn different weights for different neighbors, which can be more expressive than the fixed weights used in GCNs. GATs have shown improved performance compared to GCNs on various benchmark datasets, especially when the graph is sparse or the task involves learning long-range dependencies.
References:
[1] T. N. Kipf, M. Welling, Semi-Supervised Classification with Graph Convolutional Networks (2016)
[2] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Liò, Y. Bengio, Graph Attention Networks (2018)
If you enjoyed reading this and/or want to be kept in the loop about the next blog, follow me on Medium.
Feel free to connect with me on LinkedIn and tell me if you end up using this code in your research/job.
If you feel like leaving a tip, you can send it here.