
graphsage-simple's People

Contributors

williamleif


graphsage-simple's Issues

TypeError: float() argument must be a string or a number, not 'map'

Traceback (most recent call last):
File "/home/machinelearning/anaconda3/envs/graphsage-simple-master/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/home/machinelearning/anaconda3/envs/graphsage-simple-master/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/machinelearning/Documents/graphsage-simple-master/graphsage/model.py", line 183, in
run_cora()
File "/home/machinelearning/Documents/graphsage-simple-master/graphsage/model.py", line 69, in run_cora
feat_data, labels, adj_lists = load_cora()
File "/home/machinelearning/Documents/graphsage-simple-master/graphsage/model.py", line 49, in load_cora
feat_data[i, :] = map(float(), info[1:-1])
TypeError: float() argument must be a string or a number, not 'map'
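
For what it's worth, the offending line has two problems: float() is being called with no arguments rather than passed as a function, and in Python 3 map() returns a lazy iterator that NumPy cannot assign into an array row. A minimal fix (assuming feat_data is a NumPy array and info is the split line from cora.content, as in load_cora()):

# graphsage/model.py, load_cora(): pass the float function itself and
# materialize the map iterator into a concrete list
feat_data[i, :] = list(map(float, info[1:-1]))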

Does this PyTorch version support batch-level learning?

Hi Dear Author,

It seems the run functions currently target transductive (node-level) classification. To implement the inductive (graph-level) setting with a batched dataloader, we may need batches that store the adjacencies and feature matrices: if their original shapes are (N, N) and (N, F), with batching they would become (B, N, N) and (B, N, F), where B is the batch size, F the feature dimension, and N the number of nodes.

Thus, to support batch-level computation, are there any sections of Encoder and Aggregator that would need to change?
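
Not an answer, but to make the shapes concrete, here is a minimal sketch of a batched mean aggregation under the shapes described above (the function name and setup are hypothetical, not part of this repo):

import torch

# hypothetical batched mean aggregation
# adj: (B, N, N) batched adjacency, feats: (B, N, F) batched node features
def batched_mean_aggregate(adj, feats):
    deg = adj.sum(dim=2, keepdim=True).clamp(min=1)  # (B, N, 1); avoid divide-by-zero
    return torch.bmm(adj, feats) / deg               # (B, N, F) neighbor means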

Cannot run the sample without a small fix in encoders.py

Hi William,

I cloned the code and tried run_cora. The interpreter raised a dimension problem until I fixed the code in encoders.py (line 39) from:

neigh_feats = self.aggregator.forward(nodes, [self.adj_lists[node] for node in nodes],

to

neigh_feats = self.aggregator.forward(nodes, [self.adj_lists[int(node)] for node in nodes],

Then it works fine. I think it is a type issue. Hope this is helpful for others.

Can the code for inductive learning on the PPI dataset be added to the PyTorch implementation?

I'm following the PyTorch version of the GraphSAGE code, and it looks like the implementation is for a transductive setting (the Cora and PubMed datasets). It would be really helpful if support for the PPI dataset (the inductive, graph-level setting) could be added to the PyTorch implementation. That way, one could train on a set of graphs and then test on a completely new set of graphs of different dimensions, just like with PPI.

question about calling agg1 and enc1 twice

I wonder why enc1 needs to be called (through self.features) in both the second-layer encoder and aggregator.
Why don't we just use the raw features, since agg2 already aggregates features from neighbors?
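
For context, the relevant stacking in run_cora() (quoted in full in a later issue below) passes enc1's output as the feature function to both agg2 and enc2:

agg2 = MeanAggregator(lambda nodes : enc1(nodes).t(), cuda=False)
enc2 = Encoder(lambda nodes : enc1(nodes).t(), enc1.embed_dim, 128, adj_lists, agg2, base_model=enc1, gcn=True, cuda=False)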

A question about the learning rate.

Hello!

I have a question about the learning rate. As stated in the appendix of GraphSAGE, and as adopted by many other works, the learning rate is usually set to around 1e-2. Meanwhile, those works usually normalize the input features.

However, in your work the learning rate is set to 0.7, which is surprisingly high, and the input features are not normalized either. When I reset the learning rate to a common value and train on normalized features, I find that the model converges to extremely poor performance.

This issue confuses me a lot. Could you help explain a bit?
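
For reference, a sketch of the conventional setup the comment describes — row-normalized features plus a small learning rate — assuming the feat_data and graphsage variables from model.py (the exact values here are illustrative, not a confirmed fix):

import numpy as np
import torch

# row-normalize the input features (common GraphSAGE preprocessing)
feat_data = feat_data / np.maximum(feat_data.sum(axis=1, keepdims=True), 1)

# the conventional small learning rate from the paper's appendix
optimizer = torch.optim.SGD(
    filter(lambda p: p.requires_grad, graphsage.parameters()), lr=1e-2)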

Dataset separation

Can anyone explain the logic behind the train/valid/test node separation in this code?
For the Cora dataset, out of 2708 shuffled nodes, the first 1000 are taken as test nodes, the next 500 as validation nodes, and the rest as train nodes. Similarly, for the PubMed dataset, out of 19717 shuffled nodes, the first 1000 are taken as test nodes, the next 500 as validation nodes, and the rest as train nodes.
So the test:valid:train proportions are 36.9 : 18.5 : 44.6 for Cora and 5.1 : 2.5 : 92.4 for PubMed.

  1. Don't we have to keep the same test:valid:train ratio across datasets?

  2. How can I separate a new dataset into these categories? (See the sketch below.)

I believe that we need to separate nodes into train/valid/test categories for a node classification problem. What about the link prediction problem?
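
For reference, the split in run_cora() is just a positional slice of a random permutation, so a new dataset can be carved up the same way by changing the sizes (Cora numbers shown; the ratios are a free choice rather than something the method requires):

import numpy as np

num_nodes = 2708                      # Cora
rand_indices = np.random.permutation(num_nodes)
test = rand_indices[:1000]            # first 1000 shuffled nodes
val = rand_indices[1000:1500]         # next 500
train = list(rand_indices[1500:])     # remaining 1208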

Memory Leak Problem

I am focusing on adversarial attacks on graphs. When I use GraphSAGE as my target model, I find that if I call model.forward() many times, memory usage gradually grows (and eventually occupies all of the memory).
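
One possible cause (an assumption on my part, not confirmed against this repo): each forward() call builds a fresh autograd graph, and if the outputs are kept alive without ever calling backward(), those graphs accumulate. Wrapping evaluation-only calls in torch.no_grad() usually stops the growth:

import torch

# evaluation-only forward passes: disable autograd graph construction so
# intermediate activations are not retained across repeated calls
with torch.no_grad():
    scores = graphsage.forward(attack_nodes)  # graphsage, attack_nodes: hypothetical names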

Does not support very large dataset

I note that the feature matrix of all nodes is fed into the initialization of the model (encoder, aggregator), which consumes a great deal of memory when the feature matrix is large.
The corresponding code is here:

agg = MeanAggregator(features, cuda=True)
enc = Encoder(features, n_feats_dim, args.n_hidden, refined_adj_lists, agg, gcn=args.gcn, cuda=args.cuda)

I think this method could support a large feature matrix if it were not implemented this way: maybe we can feed only the batch's feature matrix into the model each time.

This would make the code more broadly applicable. Any advice?
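
A minimal sketch of that idea (hypothetical file name and setup; the repo itself preloads everything into an nn.Embedding): keep the feature matrix memory-mapped on disk and hand the aggregator a lookup function that materializes only the current batch:

import numpy as np
import torch

# hypothetical: features saved with np.save and memory-mapped, not loaded into RAM
feat_memmap = np.load("feats.npy", mmap_mode="r")  # shape (num_nodes, num_feats)

def features(nodes):
    # materialize only the rows needed for the current batch
    rows = np.asarray(feat_memmap[np.asarray(nodes)], dtype=np.float32)
    return torch.from_numpy(rows)

agg = MeanAggregator(features, cuda=False)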

question about the GCN type aggregators

The code in run_cora(), model.py:

agg1 = MeanAggregator(features, cuda=True)
enc1 = Encoder(features, 1433, 128, adj_lists, agg1, gcn=True, cuda=False)
agg2 = MeanAggregator(lambda nodes : enc1(nodes).t(), cuda=False)
enc2 = Encoder(lambda nodes : enc1(nodes).t(), enc1.embed_dim, 128, adj_lists, agg2, base_model=enc1, gcn=True, cuda=False)
enc1's gcn flag is set to True, while agg1's gcn flag defaults to False and is never set to True by enc1's code.

So the GCN self-loop is in fact not included.

I wonder whether something is wrong here.
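
If the GCN-style (self-loop) behavior is intended, one fix — assuming I am reading aggregators.py correctly and MeanAggregator does accept a gcn argument — is to pass the flag through explicitly:

agg1 = MeanAggregator(features, cuda=True, gcn=True)
agg2 = MeanAggregator(lambda nodes : enc1(nodes).t(), cuda=False, gcn=True)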

Applicable license

Hi,

There is no license file in the repository. Under what license is this code released?

Potential typo - enc.num_sample[s]

In model.py, whenever we set the number of samples for the encoders, we should write enc1.num_sample rather than enc1.num_samples.
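
Concretely, the assignments in run_cora() would become the following, since the Encoder attribute is named num_sample and writing num_samples just creates a new, unused attribute:

enc1.num_sample = 5
enc2.num_sample = 5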

Is this for inductive learning?

Hello.
It seems that GraphSAGE incorporates feature information from neighbors even when the neighbor belongs to the test data while building the model. I think this is not allowed in inductive learning.
Would you let me know whether this code is for inductive learning?

Is this the implementation of transductive learning?

I want to confirm whether the implementation is inductive or transductive. From the code, it looks like the variable adj_lists contains all the edges in the graph, and therefore test edges are also used during training. However, the results reported in the paper correspond to inductive learning. Please let me know if I am missing anything.
