
graphsage-simple's People

Contributors

williamleif


graphsage-simple's Issues

TypeError: float() argument must be a string or a number, not 'map'

Traceback (most recent call last):
File "/home/machinelearning/anaconda3/envs/graphsage-simple-master/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/home/machinelearning/anaconda3/envs/graphsage-simple-master/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/machinelearning/Documents/graphsage-simple-master/graphsage/model.py", line 183, in
run_cora()
File "/home/machinelearning/Documents/graphsage-simple-master/graphsage/model.py", line 69, in run_cora
feat_data, labels, adj_lists = load_cora()
File "/home/machinelearning/Documents/graphsage-simple-master/graphsage/model.py", line 49, in load_cora
feat_data[i, :] = map(float(), info[1:-1])
TypeError: float() argument must be a string or a number, not 'map'
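
For what it's worth, the offending line has two problems: float() is being called with no arguments rather than passed as a function, and in Python 3 map() returns a lazy iterator that NumPy cannot assign into an array row. A minimal fix (assuming feat_data is a NumPy array and info is the split line from cora.content, as in load_cora()):

# graphsage/model.py, load_cora(): pass the float function itself and
# materialize the map iterator into a concrete list
feat_data[i, :] = list(map(float, info[1:-1]))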

Does this PyTorch version support batch-level learning?

Hi Dear Author,

It seems the run functions currently target transductive (node-level) classification. To implement the inductive (graph-level) setting with a batched dataloader, we may need batches that store the adjacencies and feature matrices: if their original shapes are (N, N) and (N, F), with batching they would become (B, N, N) and (B, N, F), where B is the batch size, F the feature dimension, and N the number of nodes.

Thus, to support batch-level computation, are there any sections of Encoder and Aggregator that would need to change?
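
Not an answer, but to make the shapes concrete, here is a minimal sketch of a batched mean aggregation under the shapes described above (the function name and setup are hypothetical, not part of this repo):

import torch

# hypothetical batched mean aggregation
# adj: (B, N, N) batched adjacency, feats: (B, N, F) batched node features
def batched_mean_aggregate(adj, feats):
    deg = adj.sum(dim=2, keepdim=True).clamp(min=1)  # (B, N, 1); avoid divide-by-zero
    return torch.bmm(adj, feats) / deg               # (B, N, F) neighbor means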

Cannot run the sample without a small fix in encoders.py

Hi William,

I cloned the code and tried run_cora. The interpreter raised a dimension problem until I fixed the code in encoders.py (line 39) from:

neigh_feats = self.aggregator.forward(nodes, [self.adj_lists[node] for node in nodes],

to

neigh_feats = self.aggregator.forward(nodes, [self.adj_lists[int(node)] for node in nodes],

Then it works fine. I think it is a type issue. Hope this is helpful for others.

Can the code for inductive learning on the PPI dataset be added to the PyTorch implementation?

I'm following the PyTorch version of the GraphSAGE code, and it looks like the implementation is for a transductive setting (the Cora and PubMed datasets). It would be really helpful if support for the PPI dataset (the inductive, graph-level setting) could be added to the PyTorch implementation. That way, one could train on a set of graphs and then test on a completely new set of graphs of different dimensions, just like with PPI.

question about calling agg1 and enc1 twice

I wonder why enc1 needs to be called (through self.features) in both the second-layer encoder and aggregator.
Why don't we just use the raw features, since agg2 already aggregates features from neighbors?
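
For context, the relevant stacking in run_cora() (quoted in full in a later issue below) passes enc1's output as the feature function to both agg2 and enc2:

agg2 = MeanAggregator(lambda nodes : enc1(nodes).t(), cuda=False)
enc2 = Encoder(lambda nodes : enc1(nodes).t(), enc1.embed_dim, 128, adj_lists, agg2, base_model=enc1, gcn=True, cuda=False)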

A question about the learning rate.

Hello!

I have a question about the learning rate. As stated in the appendix of GraphSAGE, and as adopted by many other works, the learning rate is usually set to around 1e-2. Meanwhile, those works usually normalize the input features.

However, in your work the learning rate is set to 0.7, which is surprisingly high, and the input features are not normalized either. When I reset the learning rate to a common value and train on normalized features, I find that the model converges to extremely poor performance.

This issue confuses me a lot. Could you help explain a bit?
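
For reference, a sketch of the conventional setup the comment describes — row-normalized features plus a small learning rate — assuming the feat_data and graphsage variables from model.py (the exact values here are illustrative, not a confirmed fix):

import numpy as np
import torch

# row-normalize the input features (common GraphSAGE preprocessing)
feat_data = feat_data / np.maximum(feat_data.sum(axis=1, keepdims=True), 1)

# the conventional small learning rate from the paper's appendix
optimizer = torch.optim.SGD(
    filter(lambda p: p.requires_grad, graphsage.parameters()), lr=1e-2)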

Dataset separation

Can anyone explain the logic behind the train/valid/test node separation in this code?
For the Cora dataset, out of 2708 shuffled nodes, the first 1000 are taken as test nodes, the next 500 as validation nodes, and the rest as train nodes. Similarly, for the PubMed dataset, out of 19717 shuffled nodes, the first 1000 are taken as test nodes, the next 500 as validation nodes, and the rest as train nodes.
So the test:valid:train proportions are 36.9 : 18.5 : 44.6 for Cora and 5.1 : 2.5 : 92.4 for PubMed.

  1. Don't we have to keep the same test:valid:train ratio across datasets?

  2. How can I separate a new dataset into these categories? (See the sketch below.)

I believe that we need to separate nodes into train/valid/test categories for a node classification problem. What about the link prediction problem?
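
For reference, the split in run_cora() is just a positional slice of a random permutation, so a new dataset can be carved up the same way by changing the sizes (Cora numbers shown; the ratios are a free choice rather than something the method requires):

import numpy as np

num_nodes = 2708                      # Cora
rand_indices = np.random.permutation(num_nodes)
test = rand_indices[:1000]            # first 1000 shuffled nodes
val = rand_indices[1000:1500]         # next 500
train = list(rand_indices[1500:])     # remaining 1208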

Memory Leak Problem

I am focusing on adversarial attacks on graphs. When I use GraphSAGE as my target model, I find that if I call model.forward() many times, memory usage gradually grows (and eventually occupies all of the memory).
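
One possible cause (an assumption on my part, not confirmed against this repo): each forward() call builds a fresh autograd graph, and if the outputs are kept alive without ever calling backward(), those graphs accumulate. Wrapping evaluation-only calls in torch.no_grad() usually stops the growth:

import torch

# evaluation-only forward passes: disable autograd graph construction so
# intermediate activations are not retained across repeated calls
with torch.no_grad():
    scores = graphsage.forward(attack_nodes)  # graphsage, attack_nodes: hypothetical names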

Does not support very large dataset

I note that the feature matrix of all nodes is fed into the initialization of the model (encoder, aggregator), which consumes a great deal of memory when the feature matrix is large.
The corresponding code is here:

agg = MeanAggregator(features, cuda=True)
enc = Encoder(features, n_feats_dim, args.n_hidden, refined_adj_lists, agg, gcn=args.gcn, cuda=args.cuda)

I think this method could support a large feature matrix if it were not implemented this way: maybe we can feed only the batch's feature matrix into the model each time.

This would make the code more broadly applicable. Any advice?
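
A minimal sketch of that idea (hypothetical file name and setup; the repo itself preloads everything into an nn.Embedding): keep the feature matrix memory-mapped on disk and hand the aggregator a lookup function that materializes only the current batch:

import numpy as np
import torch

# hypothetical: features saved with np.save and memory-mapped, not loaded into RAM
feat_memmap = np.load("feats.npy", mmap_mode="r")  # shape (num_nodes, num_feats)

def features(nodes):
    # materialize only the rows needed for the current batch
    rows = np.asarray(feat_memmap[np.asarray(nodes)], dtype=np.float32)
    return torch.from_numpy(rows)

agg = MeanAggregator(features, cuda=False)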

question about the GCN type aggregators

The code in run_cora(), model.py:

agg1 = MeanAggregator(features, cuda=True)
enc1 = Encoder(features, 1433, 128, adj_lists, agg1, gcn=True, cuda=False)
agg2 = MeanAggregator(lambda nodes : enc1(nodes).t(), cuda=False)
enc2 = Encoder(lambda nodes : enc1(nodes).t(), enc1.embed_dim, 128, adj_lists, agg2, base_model=enc1, gcn=True, cuda=False)
enc1's gcn flag is set to True, while agg1's gcn flag defaults to False and is never set to True by enc1's code.

So the GCN self-loop is in fact not included.

I wonder whether something is wrong here.
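
If the GCN-style (self-loop) behavior is intended, one fix — assuming I am reading aggregators.py correctly and MeanAggregator does accept a gcn argument — is to pass the flag through explicitly:

agg1 = MeanAggregator(features, cuda=True, gcn=True)
agg2 = MeanAggregator(lambda nodes : enc1(nodes).t(), cuda=False, gcn=True)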

Applicable license

Hi,

There is no license file in the repository. Under what license is this code released?

Potential typo - enc.num_sample[s]

In model.py, whenever we set the number of samples for the encoders, we should write enc1.num_sample rather than enc1.num_samples.
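
Concretely, the assignments in run_cora() would become the following, since the Encoder attribute is named num_sample and writing num_samples just creates a new, unused attribute:

enc1.num_sample = 5
enc2.num_sample = 5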

Is this for inductive learning?

Hello.
It seems that GraphSAGE incorporates feature information from neighbors even when the neighbor belongs to the test data while building the model. I think this is not allowed in inductive learning.
Would you let me know whether this code is for inductive learning?

Is this the implementation of transductive learning?

I want to confirm whether the implementation is inductive or transductive. From the code, it looks like the variable adj_lists contains all the edges in the graph, and therefore test edges are also used during training. However, the results reported in the paper correspond to inductive learning. Please let me know if I am missing anything.
