distillpub / post--gnn-intro Goto Github PK

View Code? Open in Web Editor NEW

94.0 4.0 28.0 37.33 MB

License: Apache License 2.0

JavaScript 90.38% Shell 0.04% TeX 4.13% HTML 3.33% TypeScript 1.86% CSS 0.26%

post--gnn-intro's Introduction

GNN-distill

This is the repo for a distill post on graph neural nets

To run the demo

Install dependencies:

yarn

Watch the demo for changes with a local server:

yarn start

The demo can then be accessed at http://localhost:1234/

To make updates

Make edits to index.html
Run yarn build
Push to origin/main

post--gnn-intro's People

Contributors

Stargazers

Watchers

post--gnn-intro's Issues

Edge List representation intended in place of Adjacency List.

In the section The challenges of using graphs in machine learning, a paragraph starts with the sentence:

One elegant and memory-efficient way of representing sparse matrices is as adjacency lists.

From the representation described later, it is clear that an Edge List representation of the graph is meant.

Also from the side text:

Another way of stating this is with Big-O notation, it is preferable to have 𝑂(𝑛𝑒𝑑𝑔𝑒𝑠), rather than 𝑂(𝑛2𝑛𝑜𝑑𝑒𝑠).

It is confirmed that the Edge List representation is meant here as an Adjacency List representation takes O(V+E) space.

Double comma

Hello, maybe it's silly, but I noticed that in the "Graphs and where to find them", when separating the idea, a double comma ",," is being used.

¡Beyond that, I can only thank you for this wonderful work!

Missing $g$ fucction

Under section Passing messages between parts of the graph

For each node in the graph, gather all the neighboring node embeddings (or messages), which is the 𝑔 function described above.

But 𝑔 function seems never mentioned previously.

Peer Review #1

GNN Intro Review

The following peer review was solicited as part of the Distill review process.

The reviewer chose to waive anonymity. Distill offers reviewers a choice between anonymous review and offering reviews under their name. Non-anonymous review allows reviewers to get credit for the service they offer to the community.

Distill is grateful to Chaitanya K. Joshi for taking the time to review this article.

General Comments

Introduction

Researchers have developed neural networks that operate on graph data (called graph neural networks, or GNNs) for over a decade, and many recent developments have increased their capabilities, and have now been used in many practical applications.

So many and's...maybe reframe it into 3 parts: GNNs have been developed for over a decade. Recent developments have focused on increased capability (and maybe expressive power/representational capacity?). In the past years, we are starting to see practical applications (and maybe some examples, like Halicin-MIT-Broad Institute, Fake News-Twitter, RecSys-Pinterest).

To start, let’s establish what a graph is. A graph represents the relations (edges) between a collection of entities (nodes).

The associated diagram may be better presented if the global attributes row is the last one...

To further describe each node, edge or the entire graph, we can store information in each of these pieces of the graph to further describe them. We can further specialize graphs by associating directionality to edges (directed, undirected).

This sentence needs editing to make less use of the word 'further'!

Graphs and where to find them

You’re probably already familiar with some types of graph data, such as social networks. However, graphs are an extremely powerful and general representation of data, and you can even think of images and text as a graph.

Images as graphs

It can be confusing to start with images and text...maybe its better to start with real-world graphs and then later mention images/text as add-ons. Essentially, images/text are euclidean data while most graph data is irregular and non-euclidean. So GNNs would definitely not be the top choice for anyone working with them. Thus, speaking about images/text as graph structures, while interesting, may diverge from the main point of the article, which is to be a gentle introduction to GNNs. (P.S. the visualizations for image/text graphs are super good, though!)

This refers to the way text is represented in RNNs; other models, such as Transformers, where text can be viewed as a fully connected graph. See more in Graph Attention Networks.

This sentence needs editing for clarity.

In the graph grid/adjacency matrix, the color scheme can be better as both the blues are very similar.

Summary statistics on graphs found in the real world. Numbers are dependent on featurization decisions. More useful statistics and graphs can be found in KONECT

Maybe, instead of just giving the title of the dataset, you can talk about the data domain...e.g. QM9 - molecular graphs, Cora - academic citation network, etc. As this table is being referred to in later parts of the article and is also allowing the reader to really grapple with the complexity of graph datasets out there, it would be great to present this one better.

One example of edge-level inference is in image classification. Often deep learning models will identify objects in images, but besides just identifying objects, we also care about the relationship between them.

Could it be better to use as an example something more common like predicting possible friendships on a social network?

The challenges of using graphs in machine learning

Two adjacency matrices representing the same graph.

If the authors are planning to add interaction to this figure, it would be interesting if, e.g. I highlight one row on the left adjacency matrix and the corresponding row on the right adjacency matrix is also activated.

One elegant and memory-efficient way of representing sparse matrices is as adjacency lists. These describe the connectivity of edge ek between nodes ni and nj as a tuple (i,j) in the k-th entry of an adjacency list. They are O(nedges), rather than O(n2nodes), and since they are used to directly index the node information, they are permutation invariant.

As a practitioner, I can quickly make the link between why we want input formats to be O(n_edges) rather than O(n_nodes^2). However, it may be better to frame this in simple english as opposed to Big-O notation. Alternatively, it may be worth introducing the idea that adjacency matrices on their own are O(n_nodes^2) earlier.

Graph Neural Networks

We’re going to build GNNs using the “message passing neural network” framework proposed by Gilmer et al. using the architecture Graph Nets schematics introduced by Battaglia et al.

Isn't the graph nets framework already encompassing MPNNs? E.g. If I say I'll build GNNs based on Graph Nets from Battaglia et al., it may be sufficient already?

Also, ""architecture Graph Nets schematics"" --> ""Graph Nets architecture schematics""?

With the numerical representation of graphs that we’ve constructed above, we are now ready to build a GNN. We will start with the simplest GNN architecture, one where we learn new embeddings for all graph attributes (nodes, edges, global), but where we do not yet use the connectivity of the graph.

Maybe I am being nitpicky/the authors have made the choice for pedagogical reasons, but, at this point in the article, they are introducing the concept of vectors/embeddings as features per node/edge/global. Previously, all these features had been scaler values, so I wonder if the sudden change will confuse readers? E.g. the diagram with the caption 'Hover and click on the edges, nodes, and global graph marker to view and change attribute representations. On one side we have a small graph and on the other the information of the graph in a tensor representation.'

I would suggest (preferably) using feature vectors from the start across all diagrams, or making a note about this to explain to readers.

However, it’s not always so simple. For instance, you might have information in the graph stored in edges, but no information in nodes, but still need to make predictions on nodes.

Consider giving an example? I had a hard time thinking of one, but maybe biological interaction networks exhibit this particular scenario.

If we only have node-level features, and are trying to predict binary edge-level information, the model looks like this.

Examples would be nice to help readers.

One solution would be to have all nodes be able to pass information to each other, but for large graphs, this quickly becomes computationally expensive (although this approach, called ‘virtual edges’ has been used for small graphs, like molecules).

Sentence can be broken into two for clarity.

I also have a broader comment on this section: in the previous section, the reader spends a lot of time understanding what is an edge list and its advantage over the adjacency matrix format. This is great, because this is how many graph libraries are processing graphs, e.g. NetworkX, PyTorch Geometric. However, how does this edge list format link to the current section? You have described message passing, but how is the edge list actually used for message passing?

I think the reader would be interested to connect the two sections of this article together, e.g. you could consider describing how one could do a simple round of message passing with the edge list format. (On a tangential note, it may also be useful to show how a matrix multiplication of the adjacency and feature matrix also implements message passing with a summation aggregation.)

GNN Playground

Scatterplot of a hyperparameter sweep of several GNN architectures. Hover over a point to see the GNN architecture parameters.

What did we learn about GNN design through this exercise? Are there any global insights about GNN architecture design choices that one can draw from this experiment, e.g. does global node help? And do these intuitions line up with some recent works on benchmarking and comparing GNN architectural paradigms, e.g. Dwivedi-etal, 2020; You-etal, 2020?

In the Final Thoughts section, the authors say ""We’ve walked through some of the important design choices that must be made when using these architectures, and hopefully the GNN playground can give an intuition on what the empirical results of these design choices are.""

The playground it very welcome but it may be nice to concretely state some of these intuitions.

Or even just highlight what the top architectural elements were for this particular dataset.

And then discuss whether they align well/are opposed to conventional ideas in the literature.

Into the weeds

In general, I would have liked to see more citations to recent work and new ideas in GNN literature in this section. Figures would also be nice.

Other types of graphs (multigraphs, hypergraphs, hypernodes)

There are several recent and interesting works generalizing GNNs for hypergraphs and multigraphs that could be mentioned here. One recent I am aware of is Yadati-etal, 2019.

Batching in GNNs

It may be worth talking about/citing prevalent GNN sampling algorithms in literature, e.g. GraphSaint, ClusterGCN.

Inductive biases

It may be interesting to speak about the link between inductive biases and generalization/extrapolation beyond training distribution, e.g. recent work on GNNs for neural execution of graph algorithms by groups from DeepMind (Petar Velickovic's work) as well as MIT (Keyulu Xu's work).

Since this operation is one of the most important building blocks of these models, let’s dig deeper into what sort of properties we want in aggregation operations, and which types of operations have these sorts of properties.

Missing text after this?"

Structured Review

Distill employs a reviewer worksheet as a help for reviewers.

The first three parts of this worksheet ask reviewers to rate a submission along certain dimensions on a scale from 1 to 5. While the scale meaning is consistently "higher is better", please read the explanations for our expectations for each score—we do not expect even exceptionally good papers to receive a perfect score in every category, and expect most papers to be around a 3 in most categories.

Any concerns or conflicts of interest that you are aware of?: No known conflicts of interest
What type of contributions does this article make?: General (Introduction to an emerging research topic)

Advancing the Dialogue	Score
How significant are these contributions?	4/5

Outstanding Communication	Score
Article Structure	3/5
Writing Style	3/5
Diagram & Interface Style	3/5
Impact of diagrams / interfaces / tools for thought?	3/5
Readability	3/5

Comments on Communication:

I think there are a few places where the writing may be polished (and I've mentioned these in my long-form comments).

The article structure is coherent overall, but there are places where I feel the various sections lack a sense of harmony/continuity with each other.

The diagrams are well designed and useful for understanding the concepts.

Scientific Correctness & Integrity	Score
Are claims in the article well supported?	3/5
Does the article critically evaluate its limitations? How easily would a lay person understand them?	3/5
How easy would it be to replicate (or falsify) the results?	4/5
Does the article cite relevant work?	3/5
Does the article exhibit strong intellectual honesty and scientific hygiene?	3/5

Comments on Scientific Correctness & Integrity:

On ""Does the article critically evaluate its limitations? How easily would a lay person understand them?"" --> I am not sure this is relevant for this particular article.

The GNN playground interactive diagram in this article is really worth commending and would fit right in with my understanding of what good Distil articles should do. However, I would have like to see it accompanied with the authors discussing their findings via their playground tool. I have emphasized this in my long-form review.

Misnomer in diagram subtitle

I believe there's a misnomer in the diagrams subheading, right above the "Learning Edge Representations" sections. I believe it should be GNN, instead of GCN. Please review it.

Schematic for a ~~GCN~~ GNN architecture, which updates node representations of a graph by pooling neighboring nodes at a distance of one degree.

I believe GCN refers to graph convolutional neural nets, The diagram discusses only GNNs. Please review it.

Some typos

Thanks for the wonderful article!

A few minor typos and comments below:

Edge pooling figure has wrong number of embeddings?

Usually "generalized linear model" refers to the link function in binary classification not the generalization to multi-class in: "adapted to multi-class classification using a generalized linear model". Consider rephrasing to "adapted to multi-class classification using multinomial logistic regression".
Typo (two -> to): "We wish two highlight two general directions"
Typo (our -> or): "If we care about preserving structure at a neighborhood level, one way would be to randomly sample a uniform number of nodes, our node-set"
Subscript is off:

End of sentence space missing in "neighboring nodes we could also consider using a weighted sum.The challenge then is to associate"
Missing "as" in "This has been done via an auto-regressive model, such a RNN"

Typo: We wish two highlight two general directions

Also the figure that breaks down performance in terms of aggregation type has a shade of yellow that at least to my eyes and display makes it impossible to see the white line. It is visible with the other shade of yellow in other figures.

Nice article!

Incorrect boxplots + one minor typo

Dear authors,

Your article "A Gentle Introduction to Graph Neural Networks" on distill is very helpful and well written.
Thank you!

Allow me to suggest two minor corrections:

In the GNN playground section, the third plot (Architectures colored by type of Aggregations) has incorrect box-plots on the right hand-side. It seems like the grouping by aggregation types has not been performed.
In one of the concluding paragraphs of the "GNN playground" section: "We wish two highlight two general directions" --> We wish to highlight.

Best wishes,
Florian

Regarding Definition of Adjacency List

Thanks for the amazing explanation to GNNs. In the section where the graph representations are discussed I believe the node and edge representation are not clear along with the difference between edge list and adjacency list.
Thanks

Incorrect interactive figure

It seems that the figure illustrating edge pooling (with caption "Hover over a node (black node) to visualize which edges are gathered and aggregated to produce an embedding for that target node.") is wrong. It shows one too many boxplots, e.g. hovering over the bottom left node shows 3 boxes that are aggregated even though there are only 2 edge.
I believe that what it is showing is the message passing operation rather than the pooling operation.

Pooling Symbol

There seems to be a mistake in this sentence. It starts by stating that the pooling symbol is $\rho$ but it then uses p.

Replace: $p_{E_n \to V_{n}}$

with: $\rho_{E_n \to V_{n}}$

Typo: John A or John H

In "Node-level task" paragraph, there is John H (Administrator) in Zach’s karate club. In the figure below, there is John A.
I guess 'John A' is correct.
BTW, John H appears 5 times.

A minor spelling error

Peer Review #2

The following peer review was solicited as part of the Distill review process.

Distill is grateful to Patricia Robinson for taking the time to review this article.

General Comments

Overall a pleasure to review. Caught a few simple errors that might be worth changing. ie "Which graph attributes we update and in which order we update them is one design design behind GNNs." (change to: one design) Might be good to run through an editor to catch these spelling and grammar. :). My biggest ask would be adding in my engaging visuals if possible. Wonderful work. I learned a lot.

Distill employs a reviewer worksheet as a help for reviewers.

Any concerns or conflicts of interest that you are aware of?: No known conflicts of interest
What type of contributions does this article make?: Exposition on an emerging research direction

Advancing the Dialogue	Score
How significant are these contributions?	4/5

Outstanding Communication	Score
Article Structure	5/5
Writing Style	5/5
Diagram & Interface Style	3/5
Impact of diagrams / interfaces / tools for thought?	3/5
Readability	5/5

Comments on Readability

Language is really the strength in this piece. Authors take readers on a delightful journey. Truly fun to read. Visuals fell a little short. Would have been engaged to see more interactive moments. Aesthetics aside, I think more compelling diagrams have the potential to help first timers learn in more dynamic and intuitive ways.

Scientific Correctness & Integrity	Score
Are claims in the article well supported?	5/5
Does the article critically evaluate its limitations? How easily would a lay person understand them?	5/5
How easy would it be to replicate (or falsify) the results?	5/5
Does the article cite relevant work?	5/5
Does the article exhibit strong intellectual honesty and scientific hygiene?	5/5

Comments on Scientific Integrity

Enjoyed the balance of citations that are more recent (5 years) and important foundational works (circa 1969 and 1976). Found sources very trustworthy. Reproducibility seems fairly simple.

Minor Speling Error

A minor spelling error in the third last passage of Some empirical GNN Design Lessons

Correction:
There are many directions you could go from here to get better performance. We wish to highlight two general directions, one related to more sophisticated graph algorithms and another towards the graph itself.

Pooling types: max, mean, sum

Thank you for this interactive article!

In the sentence: "No pooling type can always distinguish between graph pairs such as max pooling on the left and sum / mean pooling on the right."
"sum" should be "max".

An alternative might be to replace the image so it shows why "mean" also is not uniformly the best choice. Then, the sentence "There is no operation that is uniformly the best choice." would match the image.

Peer Review #3

The following peer review was solicited as part of the Distill review process.

Distill is grateful to Humza Iqbal for taking the time to review this article.

General Comments

Highly enjoyed the article! It was a great look into GNNs and various aspects of them as well as the problems they are used in. My favorite part was how thorough the article was in exploring the mechanics; diving into various aspects such as different pooling functions, how to batch them and so on. The diagrams were very fun to play around with, being able to manipulate the graphs and understand how they were effected by changing the different building blocks was very easy to see.

One thing that may be nice to add or at reference is this article which talks about the equivalence between Transformers and GNNs https://graphdeeplearning.github.io/post/transformers-are-gnns/. I thought about this when Transformers were mentioned in the article "This refers to the way text is represented in RNNs; other models, such as Transformers". I think an aside could be added in the section where Graph Attention Networks are mentioned.

It may also be good to point out that there is research going on in message passing to find the optimal way to get information to flow through. As an example, this paper https://arxiv.org/abs/2009.03717 deals with the issue of encoding global information well. On that note, it might be good to add a sentence talking about the limitations of message passing (ie if I increase my window size too much I risk my node representations converging and losing my ability to update)

Distill employs a reviewer worksheet as a help for reviewers.

Advancing the Dialogue	Score
How significant are these contributions?	4/5

Outstanding Communication	Score
Article Structure	5/5
Writing Style	4/5
Diagram & Interface Style	4/5
Impact of diagrams / interfaces / tools for thought?	4/5
Readability	4/5

Comments on Readability

The diagrams were overall quite good. One nitpick I have is that for the diagram showing the difference between max, sum, and mean pooling it might be better to write "No pooling type can always distinguish between graph pairs such as max pooling on the left and sum / mean pooling on the right".

Some minor grammatical nitpicks:

in the section on Graph Attention Networks the Latex doesn't seem formatted quite right for the phrase "( f(node_i, node_j))" perhaps there was some slight Latex error?
the phrase "design design" appears in the section on "Learning Edge Representations" when I believe "design decision" was meant.
In the section "GNN Playground" I believe ‘allyl alcohol’ and ‘depth’ were meant to be italicized

Scientific Correctness & Integrity	Score
Are claims in the article well supported?	4/5
Does the article critically evaluate its limitations? How easily would a lay person understand them?	4/5
How easy would it be to replicate (or falsify) the results?	4/5
Does the article cite relevant work?	4/5
Does the article exhibit strong intellectual honesty and scientific hygiene?	3/5

Comments on Scientific Integrity

The article talks about the limitations involved in setting up GNNs and working with them (such as the tradeoffs between different aggregation functions) however it would have been nice to see some notes on how well GNNs work on various problems such as generative modeling or interpretability. I put the overall score for the limitations category at a 4 however if I were to break limitations down into how well particular limitations were explained and overall limitation coverage I would give each a score of 4 and 3 respectively.

Redundant edge in "Graph Attention Networks" figure

This issue refers to this figure.

I believe that the edge between the yellow "key" node and the very light grey node should not be there, as in this case the light grey node should also be a query node and used for computing a pairwise score with the query node.

Btw, congratulations for the excellent article! By far the best introductory resource on GNNs I have come across, thank you for this.

In line 365, the sentence reads "which is the $g$ function described above", but the $g$ function isn't described yet. I believe we want to describe the function P_En->Vn as the gather function.
In line 460, I think the conclusion should be, "The first thing to notice is that, surprisingly, a higher number of parameters does not necessarily correlate with higher performance."
In line 503, it should be "We wish to highlight two general directions..."

Happy to make pull requests if these changes make sense!

distillpub / post--gnn-intro Goto Github PK

post--gnn-intro's Introduction

GNN-distill

To run the demo

To make updates

post--gnn-intro's People

Contributors

Stargazers

Watchers

Forkers

post--gnn-intro's Issues

General Comments

Introduction

Graphs and where to find them

The challenges of using graphs in machine learning

Graph Neural Networks

GNN Playground

Into the weeds

Structured Review

General Comments

General Comments

Recommend Projects

Recommend Topics

Recommend Org

Jobs