tkipf / pygcn
Graph Convolutional Networks in PyTorch
License: MIT License
Hello, thank you for your great work.
I want to extend gcn which involves message passing,
but I'm new to GCN so I have a minor question.
I have two types of nodes, A and B.
Basically I want to train different weights jointly. (Weight_AA, Weight_AB, Weight_BA, Weight_BB)
During the node representation update,
A(t+1) = Weight_AA * A(t) * adj(AA) + Weight_AB * B(t) * adj(BA)
B(t+1) = Weight_BB * B(t) * adj(BB) + Weight_BA * A(t) * adj(AB)
The first terms are simple graph convolution layers, where adj(AA) and adj(BB) are both square,
but adj(BA) and adj(AB) might not be square (the numbers of nodes of the two types will differ).
Can I use a non-square adj matrix throughout the whole process (normalize, forward, ...)?
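A minimal NumPy sketch of the two-type update asked about above, assuming the layer convention `adj @ features @ weight` and row normalization, which is well defined for rectangular matrices because it only divides each row by its own sum. The toy sizes, the random blocks and the helper `row_normalize` are illustrative assumptions, not part of pygcn:

```python
import numpy as np

rng = np.random.default_rng(0)
n_a, n_b, d_in, d_out = 4, 3, 5, 2  # hypothetical sizes: 4 A-nodes, 3 B-nodes

def row_normalize(m):
    """Divide each row by its sum; all-zero rows stay zero. Works for any shape."""
    row_sum = m.sum(axis=1, keepdims=True)
    return m / np.where(row_sum == 0, 1.0, row_sum)

# Random 0/1 blocks standing in for a real graph.
raw_ab = (rng.random((n_a, n_b)) < 0.5).astype(float)  # rows: A nodes, cols: B nodes
adj_aa = row_normalize((rng.random((n_a, n_a)) < 0.5) + np.eye(n_a))
adj_bb = row_normalize((rng.random((n_b, n_b)) < 0.5) + np.eye(n_b))
adj_ab = row_normalize(raw_ab)    # (n_a, n_b): A nodes aggregate from B nodes
adj_ba = row_normalize(raw_ab.T)  # (n_b, n_a): B nodes aggregate from A nodes

h_a = rng.normal(size=(n_a, d_in))
h_b = rng.normal(size=(n_b, d_in))
w_aa, w_ab, w_ba, w_bb = (rng.normal(size=(d_in, d_out)) for _ in range(4))

# One joint update with four separate weight matrices.
h_a_next = adj_aa @ h_a @ w_aa + adj_ab @ h_b @ w_ab  # (n_a, d_out)
h_b_next = adj_bb @ h_b @ w_bb + adj_ba @ h_a @ w_ba  # (n_b, d_out)
```

Symmetric normalization of a rectangular block would need both the row and the column degrees, so it is less of a drop-in replacement than the row-normalized version sketched here.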
Hello and thanks for your work.
I would like to apply the GCN architecture on a graph whose nodes have no features, and also very few nodes have labels. More specifically, this is going to be a graph of words, where related words are connected with an edge, and also I have some document nodes that are connected to the words they contain. Some document nodes have labels, and the rest are left to be predicted. Word nodes are just there to help associate document nodes with one another, and, hopefully, propagate the labels from the training document nodes to the testing document nodes. The first dataset I tried is OHSUMED, if that makes a difference.
I started transforming the code to fit my needs, but I have a couple of issues:
1. What do I replace the feature matrix with? F is a (number of nodes) x (feature size) matrix that I have no way to populate. I tried setting it to the identity matrix, but that seems arbitrary. I also tried making this node-to-feature matrix another trainable parameter.
2. In the original problem, every node has a label associated with it. However, in my case, fewer than 0.1% of the nodes have a label. I decided to just provide the indexes of the adjacency matrix that are associated with the training/validation/testing document nodes. Is there an optimal way to represent non-labeled nodes?
So far, I haven't been able to get the model to work on this problem. With several permutations of the modifications above, I get an accuracy of about 20%, far below my other baselines. Am I missing something obvious in the model definition or the optimization process?
Any help is welcome.
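A small NumPy sketch of the two ideas in this question, under stated assumptions: identity features turn the first-layer weight into a per-node embedding table, and unlabeled nodes can simply be left out of the loss by indexing with the labeled node ids. The `-1` marker for unlabeled nodes and all sizes are illustrative conventions, not something pygcn defines:

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_hidden, n_classes = 6, 4, 3  # hypothetical toy sizes

# Identity features: X @ W simply selects one row of W per node,
# so the first-layer weight acts as a learnable embedding table.
features = np.eye(n_nodes)
w1 = rng.normal(size=(n_nodes, n_hidden))
first_layer = features @ w1

def log_softmax(x):
    shifted = x - x.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

# Stand-in for the network output on all nodes.
log_probs = log_softmax(rng.normal(size=(n_nodes, n_classes)))

# -1 marks an unlabeled node (an illustrative convention).
labels = np.array([0, -1, -1, 2, -1, -1])
idx_train = np.flatnonzero(labels >= 0)

# Negative log-likelihood averaged over the labeled nodes only.
loss = -log_probs[idx_train, labels[idx_train]].mean()
```

The unlabeled nodes still take part in propagation through the adjacency matrix; they just contribute no term to the loss.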
An early stopping scheme could improve performance to a certain extent. Why is it not used in this code?
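For reference, a patience-based early stopping rule can be sketched in plain Python as below. The function name `early_stopping` and the `patience` parameter are hypothetical; this is one common variant, not something the repository implements:

```python
def early_stopping(val_losses, patience=10):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve on its best value
    for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

val_losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]
stop_epoch, best_epoch = early_stopping(val_losses, patience=3)
```

In a training loop one would typically also keep a copy of the model parameters from `best_epoch` and restore them before evaluating on the test set.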
What is the difference between the TensorFlow version and the PyTorch version?
And why did you make the changes (the split of the training dataset and the added dropout layer)?
Hi! I'd be very curious to know what the words in the vocabulary are. Could I find that somewhere?
In the module named "models":
class GCN(nn.Module):
    def __init__(self, nfeat, nhid, nclass, dropout):
but the values of these parameters (nfeat, nhid, nclass) are not specified in the code. I am confused about how the code gets these values.
It would be great if you could help me out.
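As far as the signature shows, these values are not fixed inside the model class; they are supplied by whoever constructs the model. A sketch of how they can be derived from the loaded data, assuming Cora-like shapes (2708 nodes, 1433 features, 7 classes) and a hidden size of 16. The arrays below are placeholders, not the real dataset:

```python
import numpy as np

# Placeholder arrays with Cora-like shapes (values are dummies).
features = np.zeros((2708, 1433), dtype=np.float32)
labels = np.arange(2708) % 7      # 7 classes, encoded as 0..6

nfeat = features.shape[1]         # input dimension = number of feature columns
nclass = int(labels.max()) + 1    # number of distinct class ids
nhid = 16                         # hidden size: a free hyperparameter

print(nfeat, nhid, nclass)
```

The resulting numbers would then be passed as the `nfeat`, `nhid` and `nclass` arguments when the model is created in the training script.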
Hi,
Thank you for your implementation, both in Tensorflow and Pytorch.
I would like to ask if you can provide the order of nodes used for the Cora, Citeseer and Pubmed datasets in your Tensorflow implementation.
I looked into your code in this repository and I can see the order comes from cora.content.
However, it must be different from the order in the Tensorflow repo (where binary files are provided) because when I use the adjacency matrix created by this repo (Pytorch) for the Tensorflow code, the result produced by the Tensorflow code is very low, thus the two adjacency matrices must be different.
Many thanks.
Hi @tkipf, thank you so much for providing the code.
I'm wondering if it's possible to scale this implementation up to millions of nodes (obviously the number of edges must scale linearly), for example a grid. I'm not familiar with PyTorch's sparse matrix implementation, so I'm not sure if representing the adjacency matrix as a sparse matrix is enough to deal with large graphs?
In your code, the following is used to create the adj matrix:
adj = sp.coo_matrix((np.ones(edges.shape[0]), (edges[:, 0], edges[:, 1])),
                    shape=(labels.shape[0], labels.shape[0]),
                    dtype=np.float32)
That means adj[edges[:, 0][k], edges[:, 1][k]] = np.ones(edges.shape[0])[k]. But the file data/cora/README says that the direction of the link is from right to left. Details are as follows:
The .cites file contains the citation graph of the corpus. Each line describes a link in the following format:
<ID of cited paper> <ID of citing paper>
Each line contains two paper IDs. The first entry is the ID of the paper being cited and the second ID stands for the paper which contains the citation. The direction of the link is from right to left. If a line is represented by "paper1 paper2" then the link is "paper2->paper1".
I would like to ask why the code that produces the adjacency matrix is not like this:
adj = sp.coo_matrix((np.ones(edges.shape[0]), (edges[:, 1], edges[:, 0])),
                    shape=(labels.shape[0], labels.shape[0]),
                    dtype=np.float32)
That means adj[edges[:, 1][k], edges[:, 0][k]] = np.ones(edges.shape[0])[k].
Thanks a lot ~
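One point that may be relevant here: the utils.py snippet quoted elsewhere in this thread symmetrizes the adjacency matrix right after building it, and once that happens the original edge direction no longer matters. A dense NumPy sketch with a hypothetical four-node edge list (dense arrays stand in for the sparse matrices used in the repository):

```python
import numpy as np

n = 4
# Hypothetical "cited citing" pairs, one per row.
edges = np.array([[0, 1],
                  [2, 1],
                  [3, 0]])

def build(rows, cols):
    adj = np.zeros((n, n), dtype=np.float32)
    adj[rows, cols] = 1.0
    return adj

adj_left_to_right = build(edges[:, 0], edges[:, 1])
adj_right_to_left = build(edges[:, 1], edges[:, 0])

def symmetrize(adj):
    # Element-wise maximum of adj and its transpose.
    return np.maximum(adj, adj.T)

same_after_symmetrizing = np.array_equal(
    symmetrize(adj_left_to_right), symmetrize(adj_right_to_left)
)
```

The two directed matrices are transposes of each other, so they differ before symmetrization and coincide after it.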
Hi,
Calling output = model(features, adj) does not seem to give probabilities as output. If I want the model to return probabilities, what should I change?
If I change log_softmax to softmax, should the loss function F.nll_loss be changed as well?
thanks.
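A short NumPy illustration of the relationship involved: exponentiating log-softmax outputs gives probabilities, while a negative log-likelihood loss is computed from the log-probabilities themselves. In PyTorch the analogous step would be `torch.exp(output)`, leaving the model and `F.nll_loss` unchanged. The helper and the toy logits are assumptions for illustration:

```python
import numpy as np

def log_softmax(x):
    shifted = x - x.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 2])

log_probs = log_softmax(logits)   # what a log_softmax output layer returns
probs = np.exp(log_probs)         # probabilities; each row sums to 1

# Negative log-likelihood uses the log-probabilities directly.
nll = -log_probs[np.arange(len(labels)), labels].mean()
```

If the model were changed to return softmax probabilities instead, a loss that expects log-probabilities would need the logarithm applied first.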
Hi, I notice that in this PyTorch version, the adjacency matrix is row-normalized instead of symmetrically normalized. However, the accuracy (82.5%) is higher than that of the TensorFlow version (81.6%). Moreover, I also tried symmetric normalization of the adjacency matrix in this PyTorch version, but the result dropped (to 79.9%). The result of the TensorFlow version, on the other hand, does not change when the normalization is modified. In summary, these are the experiments I did:
| Cora dataset | TensorFlow | PyTorch |
|---|---|---|
| Symmetric normalization | 81.6 | 79.9 |
| Row normalization | 81.6 | 82.5 |
Any idea why this happens?
I have my own arrays for the adjacency matrix and the features, so I did not use the load function in utils.py.
Is the following the right way to generate the adjacency and feature data for training the GCN? features and adj are full NumPy arrays:
features = sp.csr_matrix(features, dtype=np.float32)
features = normalize(features)
features = np.array(features.todense())
adj = sp.csr_matrix(adj)
adj = normalize(adj + sp.eye(adj.shape[0]))
adj = sparse_mx_to_torch_sparse_tensor(adj)
Another question: why is "adj + sp.eye(adj.shape[0])" needed if adj is already an adjacency matrix? Thanks.
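A small NumPy sketch of what the added identity does, assuming row normalization and a hypothetical three-node path graph: without self-loops a node's new value is the average of its neighbours only, while with self-loops its own feature is included in the average.

```python
import numpy as np

def row_normalize(m):
    row_sum = m.sum(axis=1, keepdims=True)
    return m / np.where(row_sum == 0, 1.0, row_sum)

# Path graph 0 - 1 - 2, no self-loops in the raw adjacency.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
x = np.array([[1.0], [10.0], [100.0]])  # one scalar feature per node

without_loops = row_normalize(adj) @ x              # neighbours only
with_loops = row_normalize(adj + np.eye(3)) @ x     # neighbours plus the node itself
```

Node 0, for example, becomes 10.0 without self-loops (its own value 1.0 is ignored) and 5.5 with them.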
Hi,
I have a multi-label classification problem, where one node can have multiple labels. Do I need to change the code, which is written for multi-class classification? Thanks.
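One common way to handle multi-label targets is to replace the softmax output and negative log-likelihood with an independent sigmoid per class and a binary cross-entropy loss; in PyTorch, `torch.nn.BCEWithLogitsLoss` plays that role. A NumPy sketch of the idea, with toy logits and a hypothetical helper `bce_with_logits`:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce_with_logits(logits, targets):
    """Numerically stable binary cross-entropy on raw scores."""
    return np.mean(
        np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    )

# Two nodes, three labels; a node may carry several labels at once.
logits = np.array([[2.0, -1.0, 0.5],
                   [-0.5, 3.0, -2.0]])
targets = np.array([[1.0, 0.0, 1.0],
                    [0.0, 1.0, 0.0]])

loss = bce_with_logits(logits, targets)
preds = (sigmoid(logits) > 0.5).astype(int)  # independent decision per label
```

The 0.5 threshold is an arbitrary illustrative choice; the rest of the network can stay as it is, with the last layer emitting one raw score per label.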
Hi @tkipf, excellent work on pygcn! Really nice engineering setting up sparse adjacency multiplications and super clean code. I'm curious to hear how you suggest dealing with batch operations? Unless I am misunderstanding, it looks in train.py that each epoch operates on a single large graph, and the labels are per-node labels. If this interpretation is correct, do you have any suggestions for datasets consisting of many graphs (a series of sparse matrices), each mapped to a graph-level output/label? This would be solved if PyTorch could accept a list of tensors as an input, but that does not seem (easily) supported right now. Thanks for any advice!
Cheers,
Evan
PS Great to meet you at Stanford a few weeks ago!
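One approach that is often suggested for this setting is to place the per-graph adjacency matrices on the diagonal of one large matrix, stack the feature matrices, run a single propagation, and then pool node outputs per graph. A dense NumPy sketch under those assumptions (toy complete graphs, mean pooling; in practice the big matrix would be kept sparse):

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = np.array([2, 3, 4])   # hypothetical node counts of three graphs
n_feat = 5

# Row-normalized complete graphs with self-loops, as simple stand-ins.
adjs = [np.full((n, n), 1.0 / n) for n in sizes]
feats = [rng.normal(size=(n, n_feat)) for n in sizes]

# Block-diagonal adjacency: no edges between different graphs.
total = sizes.sum()
offsets = np.concatenate(([0], np.cumsum(sizes)))
big_adj = np.zeros((total, total))
for a, start in zip(adjs, offsets[:-1]):
    big_adj[start:start + len(a), start:start + len(a)] = a
big_x = np.vstack(feats)

# One propagation step for all graphs at once.
h = big_adj @ big_x

# Mean-pool node outputs into one vector per graph.
graph_id = np.repeat(np.arange(len(sizes)), sizes)
pooled = np.zeros((len(sizes), n_feat))
np.add.at(pooled, graph_id, h)
pooled /= sizes[:, None]
```

Each row of `pooled` can then be fed to a graph-level classifier or regressor.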
I saw that you assign train/val/test in the following way:
idx_train = range(140)
idx_val = range(200, 500)
idx_test = range(500, 1500)
I don't understand why you assign many more samples to the test set than to the training and validation sets.
Hi, I cannot find the implementation of the convolved signal matrix shown in the following picture.
Can you tell me where it is?
@tkipf
Hello, thanks for the amazing work. In your implementation, you use D^-1 A, but I noticed that some other works use D^-1/2 A D^-1/2. I suppose these two calculations won't give the same normalized adjacency matrix. Which one should I choose? Or will they have the same performance?
I think that maybe in a large graph (A is very big), D^-1/2 A D^-1/2 will be roughly equal to D^-1 A. Is that correct?
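The two normalizations can be compared directly on small graphs. The NumPy sketch below (toy graphs and helper names are illustrative) suggests that what makes them coincide is equal node degrees rather than graph size: on a graph where every node has the same degree they agree, while on a star graph they do not.

```python
import numpy as np

def row_norm(adj):
    """D^-1 A: every row sums to 1."""
    deg = adj.sum(axis=1)
    return adj / deg[:, None]

def sym_norm(adj):
    """D^-1/2 A D^-1/2: result stays symmetric for symmetric A."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# Star graph with self-loops: node 0 is the hub, so degrees differ.
star = np.eye(4)
star[0, 1:] = star[1:, 0] = 1.0

# 4-cycle with self-loops: every node has degree 3.
cycle = np.eye(4)
for i in range(4):
    cycle[i, (i + 1) % 4] = cycle[(i + 1) % 4, i] = 1.0

differ_on_star = not np.allclose(row_norm(star), sym_norm(star))
agree_on_cycle = np.allclose(row_norm(cycle), sym_norm(cycle))
```

Which of the two works better on a given dataset is an empirical question that this sketch does not answer.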
Hi @tkipf, thanks for your awesome work and providing this code!
I have a bit of a novice question: I'm trying to process my graph's features through a GCN in mini-batches. I.e. let's say I have a 1000-node graph and I want to process it through the GCN in mini-batches of size 50.
It doesn't seem like the code currently supports this because we have to multiply by the full adjacency matrix in the GCN layer's forward pass - do you have any sense of how I can support these "batch" operations? Better yet, do you have example code that does this?
My code looks something like:
z0 = F.tanh(self.gcn1(x, self.fancy_adj))
where x is a sampled batch of size 50 x F (not the full batch of 1000 x F) and self.fancy_adj is the adjacency matrix transformed as suggested in your paper (adjacency + identity, row-normalized). The problem, of course, is that self.fancy_adj is a 1000 x 1000 matrix. Even if I take just the rows of self.fancy_adj corresponding to the 50 points in the batch, the adjacency matrix becomes a 50 x 1000 matrix, which can't be multiplied by the 50 x F sampled batch.
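One way to read the shape problem above: the 50 x 1000 slice of the adjacency matrix can be multiplied with the full 1000 x F feature matrix, since each batch node still needs the features of all its neighbours. A NumPy sketch for a single layer, with random stand-in data and a dense adjacency for simplicity:

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_feat, n_hid, batch_size = 1000, 8, 4, 50

# Random graph with self-loops, row-normalized (stand-in for fancy_adj).
adj = (rng.random((n_nodes, n_nodes)) < 0.01).astype(float) + np.eye(n_nodes)
adj = adj / adj.sum(axis=1, keepdims=True)

x = rng.normal(size=(n_nodes, n_feat))   # full feature matrix
w = rng.normal(size=(n_feat, n_hid))     # layer weight
batch = rng.choice(n_nodes, size=batch_size, replace=False)

# Rows of adj for the batch nodes, but features of ALL nodes.
z0_batch = np.tanh(adj[batch] @ x @ w)   # (50, 1000) @ (1000, F) @ (F, H)

# Reference: full-graph computation, then select the batch rows.
z0_full = np.tanh(adj @ x @ w)
```

For a second layer the hidden states of the batch nodes' neighbours are needed as well, which is why deeper models usually rely on neighbour sampling or subgraph extraction rather than plain row slicing.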
Dear professor,
Hello!
I am very interested in your recent GCN work.
Thanks for sharing the code. I used the GCN network on the Citeseer dataset, but the accuracy could not reach 70.3. How did you set the parameters to get such a high accuracy? Thanks a lot for sharing the code, anyway.
Many thanks for your help.
I notice the standard deviation for initialization used output size instead of input size. Is this implementation intended?
self.weight = Parameter(torch.FloatTensor(in_features, out_features))
stdv = 1. / math.sqrt(self.weight.size(1))
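To make the observation concrete: with a weight of shape (in_features, out_features), `size(1)` is the output size. The short calculation below uses the first-layer sizes mentioned elsewhere in this thread (1433 inputs, 16 hidden units) to compare the two candidate bounds; the variable names are illustrative:

```python
import math

in_features, out_features = 1433, 16

# weight has shape (in_features, out_features), so size(1) is out_features.
stdv_fan_out = 1.0 / math.sqrt(out_features)  # what the quoted line computes
stdv_fan_in = 1.0 / math.sqrt(in_features)    # the alternative raised in the question

print(stdv_fan_out, round(stdv_fan_in, 4))
```

For this layer the fan-out based bound is roughly ten times larger than the fan-in based one.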
I want to know how I can get the feature matrix of the nodes for my own training data. Suppose the number of nodes is N and the dimension of every feature vector is d; how can I get the input X with shape N x d?
Thanks a lot!
adj = sp.coo_matrix((np.ones(edges.shape[0]), (edges[:, 0], edges[:, 1])),
                    shape=(labels.shape[0], labels.shape[0]),
                    dtype=np.float32)
# build symmetric adjacency matrix
adj = adj + adj.T.multiply(adj.T > adj) - adj.multiply(adj.T > adj)
I don't understand how the last line of code produces the symmetric matrix.
I think it would be more intuitive to build the symmetric matrix like this:
adj = adj + adj.T
Can anyone help answer my question? Thanks a lot.
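The two formulas can be compared on a tiny example. In the dense NumPy sketch below (a hypothetical three-node graph with one reciprocal edge pair and one one-way edge), the quoted expression behaves like an element-wise maximum of `adj` and its transpose, whereas `adj + adj.T` counts the reciprocal pair twice:

```python
import numpy as np

adj = np.zeros((3, 3))
adj[0, 1] = 1.0   # edge 0 -> 1
adj[1, 0] = 1.0   # edge 1 -> 0 (reciprocal)
adj[1, 2] = 1.0   # edge 1 -> 2 (one-way)

naive = adj + adj.T

greater = adj.T > adj
# Dense equivalent of adj + adj.T.multiply(adj.T > adj) - adj.multiply(adj.T > adj)
trick = adj + adj.T * greater - adj * greater
```

Here `naive[0, 1]` is 2 while `trick[0, 1]` stays 1, so the quoted expression keeps the matrix binary when both directions of an edge are present in the input.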
Hi
Thanks for providing this clean example of GCN in PyTorch. I am new to graph machine learning, so I would like to ask whether this work fits the following scenario (see the figure and text below).
Given many small graphs (G1, G2, ..., Gn), each with a node feature matrix X (#nodes x #features) and an adjacency matrix A (#nodes x #nodes), I just want to use GCN as an encoder to get the representation of each graph for the downstream task (the labels are a kind of sequential tags).
(There is only one GCN layer; all blue blocks are the same.)
In addition, I checked the batch operation issue (#1). In my scenario, I wonder whether it is possible to just stack the node feature matrices X and their adjacency matrices A as shown below, and then apply the same batch operation as usual instead of creating a big block-diagonal matrix?
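Stacking is possible when all graphs are padded to a common number of nodes, because matrix multiplication then broadcasts over the leading batch dimension. A NumPy sketch under that assumption, with toy sizes and zero padding, and with the same weight shared by all graphs:

```python
import numpy as np

rng = np.random.default_rng(0)
n_feat, n_hid = 3, 2
sizes = [2, 3]                # hypothetical node counts
max_n = max(sizes)

adjs, feats = [], []
for n in sizes:
    a = (rng.random((n, n)) < 0.5).astype(float)
    a = np.maximum(a, a.T) + np.eye(n)            # symmetric, with self-loops
    adjs.append(a / a.sum(axis=1, keepdims=True))  # row-normalize
    feats.append(rng.normal(size=(n, n_feat)))

# Zero-pad every graph to max_n nodes and stack along a batch axis.
adj_batch = np.zeros((len(sizes), max_n, max_n))
x_batch = np.zeros((len(sizes), max_n, n_feat))
for g, (a, x) in enumerate(zip(adjs, feats)):
    n = a.shape[0]
    adj_batch[g, :n, :n] = a
    x_batch[g, :n] = x

w = rng.normal(size=(n_feat, n_hid))
h_batch = adj_batch @ x_batch @ w   # shape (batch, max_n, n_hid)
```

Padded rows come out as zeros here because no bias is added; with a bias term or pooling, a mask over the real nodes would be needed.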
Hello, tkipf!
Thanks for sharing! In the training process, the feature matrix covers 2708 nodes. Does it include the test samples?
Thank you very much!
Hi @tkipf. I am confused about residual connections in GCN.
def forward(self, x, adj):
    # x size: [2708, 1433]
    # adj size: [2708, 2708]
    x = F.relu(self.gc1(x, adj))  # [2708, 16]
    ......
The input size is [2708, 1433], but the first layer's output size is [2708, 16].
If I implement the residual as in the equation above:
x = F.relu(self.gc1(x, adj)) + x
Error! Size mismatch.
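A residual sum needs both terms to have the same width. Two common workarounds are to project the input to the hidden width on the skip path, or to add identity skips only between layers whose input and output sizes already match. A NumPy sketch with small stand-in sizes; `w_skip` is a hypothetical extra projection, not a pygcn parameter:

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_in, n_hid = 6, 10, 4           # stand-ins for 2708, 1433, 16
adj = np.full((n_nodes, n_nodes), 1.0 / n_nodes)
x = rng.normal(size=(n_nodes, n_in))

w_gc1 = rng.normal(size=(n_in, n_hid))
w_skip = rng.normal(size=(n_in, n_hid))   # projects the input to the hidden width

# First layer: widths differ, so the skip path goes through a projection.
h1 = np.maximum(adj @ x @ w_gc1, 0.0) + x @ w_skip        # (n_nodes, n_hid)

# Later layer: input and output widths match, so a plain identity skip works.
w_gc2 = rng.normal(size=(n_hid, n_hid))
h2 = np.maximum(adj @ h1 @ w_gc2, 0.0) + h1               # (n_nodes, n_hid)
```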
Dear authors,
I was wondering why in the function normalize, the power is to -1 and not -1/2 as in the tensorflow code. Is there any reason for this ?
Thanks for the answer !
When I trained it, the result of this repository was more accurate than the one reported in the original paper (SEMI-SUPERVISED CLASSIFICATION WITH GRAPH CONVOLUTIONAL NETWORKS) on the Cora dataset.
What is different from the original code?
Hi tkipf, thank you for sharing the source code.
I ran it on Pytorch 0.4.0 and Python 2.7, but got this type error. However, if I used python 3.5 it can be run.
Loading cora dataset...
Traceback (most recent call last):
File "train.py", line 49, in <module>
dropout=args.dropout)
File "build/bdist.linux-x86_64/egg/pygcn/models.py", line 11, in __init__
self.gc2 = GraphConvolution(nhid, nclass)
File "build/bdist.linux-x86_64/egg/pygcn/layers.py", line 43, in __init__
self.bias = Parameter(torch.Tensor(out_features))
TypeError: expected torch.FloatTensor (got torch.LongTensor)
May I ask how to solve this issue? Thank you
Hi,
The normalize function in utils.py only normalizes the rows of the adjacency matrix, while the TensorFlow version is implemented differently: there you normalize both rows and columns. I am wondering whether this leads to a difference in the accuracy of GCN.
Best,
Xiaoyun
Hi! How do you process the data and save it in the .content format? What are the contents of 'cora.cites' and 'cora.content'? Thanks!
In the cora.content file, I don't know what the features mean.
Hello,
In your PyTorch version, where is the Chebyshev convolution implemented?
I can't find it in https://github.com/tkipf/pygcn/tree/master/pygcn
I would like to see the kernel size of the convolutional filters.
Thank you
Thank you for sharing the work. Is the model for detecting communities? (classifying nodes with community labels?) Please include the intended output in the Readme. Then it will be easier for beginners like me.
I have multiple graphs for node classification task. All the examples I've seen so far was for graph classification(or there is just one graph for node classification task). Although I've seen building block diagonal adjacency matrix, I'm not sure if it is for graph classification or node. Also I didn't understand whether should I create a block diagonal matrix with feature matrix and labels or not.
Let's suppose I have 20 different graphs(with different number of nodes, edges, features). And each node of every graph is labeled.
All the nodes in first 10 graphs are for training, all the nodes of the next 5 graphs are for val, and all the nodes of last 5 graphs are for test, and what I try to do is predicting labels of the nodes for the graphs in test-set. How can I input multiple graphs into GCN with these conditions for node classification task(not for graph classification).? If the solution is diagonal adj. matrix, should I do the same for labels and feature matrix?
Why does this PyTorch implementation row-normalize the sparse matrices for both the features and adj, which differs from the TensorFlow implementation?
Hi,
Can I get node embeddings of a certain length based on this code? The output of which step should I extract?
Thanks,
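One possible reading of "node embedding" here is the hidden representation after the first graph convolution, whose length equals the hidden size. A NumPy sketch of a two-layer forward pass that returns both the class scores and that hidden representation; sizes, weights and the function `forward` are stand-ins, not the repository's model:

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_in, n_hid, n_cls = 5, 8, 4, 3   # n_hid is the embedding length
adj = np.full((n_nodes, n_nodes), 1.0 / n_nodes)
x = rng.normal(size=(n_nodes, n_in))
w1 = rng.normal(size=(n_in, n_hid))
w2 = rng.normal(size=(n_hid, n_cls))

def forward(x, adj, w1, w2):
    hidden = np.maximum(adj @ x @ w1, 0.0)  # first layer + ReLU: node embeddings
    logits = adj @ hidden @ w2              # second layer: class scores
    return logits, hidden

logits, embeddings = forward(x, adj, w1, w2)
```

Choosing the hidden size therefore fixes the embedding length.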
Hi tkipf, thank you for your amazing work!
In utils.py, starting from line 36:
# build symmetric adjacency matrix
adj = adj + adj.T.multiply(adj.T > adj) - adj.multiply(adj.T > adj)
As far as I understand, these lines turn a directed adjacency matrix into an undirected adjacency matrix?
Since adj is a 0-1 matrix, for the positions adj[i,j] where adj.T > adj we should have adj[i,j] = 0, so the - adj.multiply(adj.T > adj) part is always zero.
Then what is the purpose of that part, or am I misunderstanding it?
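The reasoning above holds for 0-1 matrices. The subtraction only has an effect when entries can take other values, for example with weighted edges. A dense NumPy sketch with a hypothetical two-node weighted graph:

```python
import numpy as np

# Weighted, asymmetric adjacency: 0 -> 1 has weight 0.2, 1 -> 0 has weight 0.7.
adj = np.array([[0.0, 0.2],
                [0.7, 0.0]])

greater = adj.T > adj
subtracted = adj * greater                  # the term under discussion
without_term = adj + adj.T * greater        # dropping the subtraction
with_term = adj + adj.T * greater - subtracted
```

With the term included, the result is the element-wise maximum of `adj` and `adj.T` and is symmetric; without it, the entry at position (0, 1) becomes 0.9 and the matrix is no longer symmetric.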
Hi,
When I run train.py, I get the following error message:
Loading cora dataset...
Traceback (most recent call last):
File "train.py", line 104, in <module>
train(epoch)
File "train.py", line 69, in train
output = model(features, adj)
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "build/bdist.linux-x86_64/egg/pygcn/models.py", line 15, in forward
x = F.relu(self.gc1(x, adj))
File "/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "build/bdist.linux-x86_64/egg/pygcn/layers.py", line 61, in forward
return output + self.bias
File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 745, in __add__
return self.add(other)
File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 283, in add
return self._add(other, False)
File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 277, in _add
return Add(inplace)(self, other)
File "/usr/local/lib/python2.7/dist-packages/torch/autograd/_functions/basic_ops.py", line 20, in forward
return a.add(b)
RuntimeError: inconsistent tensor size at /b/wheel/pytorch-src/torch/lib/TH/generic/THTensorMath.c:831
I noticed that SpecialSpmmFunction is the subclass of torch.autograd.Function and there is only one object in class SpGraphAttentionLayer.
class SpGraphAttentionLayer(nn.Module):
    def __init__(self, in_features, out_features, dropout, alpha, concat=True):
        super(SpGraphAttentionLayer, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.alpha = alpha
        self.concat = concat
        self.W = nn.Parameter(torch.zeros(size=(in_features, out_features)))
        nn.init.xavier_normal_(self.W.data, gain=1.414)
        self.a = nn.Parameter(torch.zeros(size=(1, 2*out_features)))
        nn.init.xavier_normal_(self.a.data, gain=1.414)
        self.dropout = nn.Dropout(dropout)
        self.leakyrelu = nn.LeakyReLU(self.alpha)
        self.special_spmm = SpecialSpmm()
But the official documentation says that each function object is meant to be used only once (in the forward pass). I found that self.special_spmm is called twice in the forward pass:
    e_rowsum = self.special_spmm(edge, edge_e, torch.Size([N, N]), torch.ones(size=(N,1), device=dv))
    # e_rowsum: N x 1
    edge_e = self.dropout(edge_e)
    # edge_e: E
    # Each function object is meant to be used only once (in the forward pass).
    h_prime = self.special_spmm(edge, edge_e, torch.Size([N, N]), h)
Have I misunderstood something?
Hi,
Thanks for your work. I'm a complete beginner in computer vision.
Recently, I have been trying to do medical image segmentation using GCNs.
However, reading your code, I found that the input is a single whole graph that includes the training, test, and validation data. In my framework, each training image is treated as a graph, and there are no connections between images.
Thus, I'm confused about how to achieve this.
Besides, I would like to know whether the input feature of a node can be a patch extracted around that node.
Look forward to your reply.
Thanks,
lei
Hi
Thank you for sharing your implementation in Pytorch.
I am using a similar GCN structure for regression analysis. Therefore, the last layer is the same as the others. My proposed GCN has the structure below.
model GCN(
(gc1): GraphConvolution (2 -> 2)
(gc2): GraphConvolution (2 -> 20)
(gc3): GraphConvolution (20 -> 20)
(gc4): GraphConvolution (20 -> 20)
(gc5): GraphConvolution (20 -> 2)
(gc6): GraphConvolution (2 -> 2)
)
The inputs are locations of 2D vertices and adjacency matrix of synthetic data (for simplicity a circular shape graphs).
The activation functions are tanh and the loss function is L2norm (because the problem is regression).
I’ve also initialized the weights and bias parameters as following:
def reset_parameters(self):
    stdv = 1. / math.sqrt(10/self.nhid)
    self.weight.data.uniform_(-stdv, stdv)
    if self.bias is not None:
        self.bias.data.fill_(0)
I feed the network some noisy data (as input graphs), and the target is a circle. The network is expected to regress a circular shape, but the outputs have an elliptic shape. I found that this network is highly sensitive to weight initialization.
Why can't this GCN solve a regression problem? Could you please give me some advice and feedback about this?
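One thing that may be worth checking is the size of the initialization bound itself. The arithmetic below evaluates the formula from `reset_parameters` above for a hidden size of 20 (the width in the printed model) and compares it with a bound of the form 1/sqrt(width). It assumes true division as in Python 3; under Python 2, `10/20` with integers would evaluate to 0:

```python
import math

nhid = 20

# Bound from the custom reset_parameters: 1 / sqrt(10 / nhid)
custom_bound = 1.0 / math.sqrt(10 / nhid)

# For comparison: a bound that shrinks with the layer width.
width_bound = 1.0 / math.sqrt(nhid)

print(round(custom_bound, 4), round(width_bound, 4))
```

The custom bound is about 1.41 and grows with `nhid`, while the width-based bound is about 0.22 and shrinks with it, which could be related to the sensitivity to initialization described above.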
Hi,
Thank you for providing such a great model. I would like to ask: can we apply multiple instance learning to text-GCN? And at which level would that be, graph classification or node classification?
Thank you in advance
Hello, thank you for your amazing work! I am a beginner in this field, and I'm confused about the result. I ran the code in PyCharm, and I was wondering why I get different output every time I run it. As far as I can see, the random seeds are fixed at the beginning. Am I missing something?
Hi tkipf, thanks for your sharing.
There are a total of 2708 lines in cora.content. However, in utils.py, the data is split as follows:
idx_train = range(140)
idx_val = range(200, 500)
idx_test = range(500, 1500)
May I ask what is the reason for splitting in this way? Thank you
Hi Kipf,
Nice work. I am trying to run the model but fail to get the same result every time, even with the lines below.
np.random.seed(args.seed)
torch.manual_seed(args.seed)
if args.cuda:
    torch.cuda.manual_seed(args.seed)
When training with the default setting, GPU usage is 0% while CPU is 100%.
The training code seems to be using CUDA, however, it doesn't seem helpful in boosting the speed.
Why?
Hello, tkipf!
Thanks for sharing! In the training process, the feature matrix covers 2708 nodes. Does it include the test samples? Thank you very much!