muhanzhang / SEAL

SEAL (learning from Subgraphs, Embeddings, and Attributes for Link prediction). "M. Zhang, Y. Chen, Link Prediction Based on Graph Neural Networks, NeurIPS 2018 spotlight".

seal's Introduction

SEAL -- learning from Subgraphs, Embeddings, and Attributes for Link prediction

About

Code for SEAL (learning from Subgraphs, Embeddings, and Attributes for Link prediction). SEAL is a novel framework for link prediction which systematically transforms link prediction into a subgraph classification problem. For each target link, SEAL extracts its h-hop enclosing subgraph A and builds its node information matrix X (containing structural node labels, latent embeddings, and explicit attributes of the nodes). Then, SEAL feeds (A, X) into a graph neural network (GNN) to classify link existence, so that it learns from both graph structure features (from A) and latent/explicit features (from X) simultaneously for link prediction.
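
As a rough illustration of the first step only, here is a minimal networkx sketch of extracting the h-hop enclosing subgraph around a candidate link (x, y); function and variable names are illustrative and this is not the repository's actual implementation:

    import networkx as nx

    def enclosing_subgraph(G, x, y, h=1):
        # collect every node within h hops of either endpoint of the target link
        dist_x = nx.single_source_shortest_path_length(G, x, cutoff=h)
        dist_y = nx.single_source_shortest_path_length(G, y, cutoff=h)
        nodes = set(dist_x) | set(dist_y)
        # the induced subgraph is A; the node information matrix X (structural
        # labels, embeddings, attributes) is then built on exactly these nodes
        return G.subgraph(nodes).copy()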

For more information, please check our paper:

M. Zhang and Y. Chen, Link Prediction Based on Graph Neural Networks, Advances in Neural Information Processing Systems (NIPS-18). [PDF]

Version

SEAL is implemented in both MATLAB and Python. The MATLAB version was used to generate the experimental results in the paper and also contains the evaluation code for the other baseline methods. The Python version has better flexibility and scalability.

There is also a PyTorch Geometric implementation here, which tests SEAL on the Open Graph Benchmark (OGB) datasets. It also supports Planetoid datasets such as Cora and CiteSeer, as well as custom PyTorch Geometric datasets.

Note

Neither embeddings nor attributes are necessary for SEAL. On most networks, SEAL can learn a very good model without using any embeddings or attributes (thus relying purely on graph structure). As the experiments show, including embeddings in X might even hurt performance. SEAL becomes an inductive link prediction model if we do not include node embeddings in X.

Reference

If you find the code useful, please cite our paper:

@inproceedings{zhang2018link,
  title={Link prediction based on graph neural networks},
  author={Zhang, Muhan and Chen, Yixin},
  booktitle={Advances in Neural Information Processing Systems},
  pages={5165--5175},
  year={2018}
}

Muhan Zhang, Washington University in St. Louis [email protected] 9/5/2018


seal's Issues

Questions about the paper

In your paper, some information is referred to the supplementary material, but I could not find such a part in the paper. Where can I get it? Thank you.

Share baselines code

Great job on the paper and code! It is really impressive. By the way, would you mind also sharing the code for the MF and SBM baselines from Table 2 (publicly or privately)?

Inquiry about scalability

Hi Muhan,

Thank you for sharing this great code.

I am trying it on a larger dataset with around 1800 nodes and 28000 links. However, the subgraph extraction process takes a lot of time (on USAir it is pretty fast). Is this anticipated? What is the complexity of this process, or am I doing something wrong? Thank you!


GPU running

Hello,
excuse me, when I run main.py with USAir, the code runs on my CPU.
How can I run SEAL on the GPU?
I installed CUDA, torch, and DGCNN too.
Thanks.
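
A generic PyTorch diagnostic (not specific to this repository) is to first confirm that torch can actually see the GPU; judging from the Namespace printouts in other issues here, Main.py also appears to have a --no-cuda flag that must not be set:

    import torch

    print(torch.cuda.is_available())   # must be True for GPU training
    print(torch.cuda.device_count())   # number of visible GPUs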

DGCNN

In order to use this code, shall I download pytorch_DGCNN first? DGCNN is based on structure2vec, so could you please simplify the dependency? I believe DGCNN will become a benchmark for graph deep learning! Thanks a lot!

A small question

Hi, Mr. Zhang, may I write in Chinese? My English is not good.
Thank you very much for your code; it has been very useful to me. I downloaded the Python version and wanted to run it, but I keep hitting a problem: util_functions contains "from util import GNNGraph", and I tried to install a third-party package for it without success. Is the util file something you implemented yourself? If so, could it be that it was not included?

main program

Hello, when I run the main program, the following problems come up:
'rm' is not recognized as an internal or external command, operable program or batch file.
Saving split-1 of 3...
Error using SEAL (line 90)
Cannot create 'split_1.mat' because 'tempdata\USAir_1' does not exist.

Error in Main (line 83)
parfor (ith_experiment = 1:numOfExperiment, workers)

How can I solve this? Thank you.

WCN, WAA

Hi, I found weighted WCN and WAA in your project, but I'm confused about how the 'alpha' parameter is assigned. Does it depend on the case? Could you tell me what I should do if I want to experiment with WCN and WAA? Thanks a lot.

About the node2vec settings

In the comparison experiments, which edge-feature option did node2vec use? The original paper describes four options, but it does not clearly state the parameters used in its link prediction experiments either. Which option did you use, and how did you set the parameters? Please advise.

Error in loading .dat file

Running on USAir_1...
/home/RAAVAN/torch/install/bin/luajit: cannot open <tempdata/USAir_1/USAir_1.dat> in mode r at /tmp/luarocks_torch-scm-1-5020/torch7/lib/TH/THDiskFile.c:673
stack traceback:
[C]: at 0x7f0fb4144160
[C]: in function 'DiskFile'
/home/RAAVAN/.luarocks/share/lua/5.1/torch/File.lua:405: in function 'load'
../DGCNN/main.lua:331: in function 'load_data'
../DGCNN/main.lua:747: in main chunk
[C]: in function 'dofile'
...AVAN/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x004056e0

This error is coming from DGCNN.

dilibgnn.so error

Hello,
when I run main.py I get an error like this:
pythonw.exe - Bad Image
E: SEAL-master pytorch DGCNN\lib build\dilibgnn.so is either not designed to run on Windows or it contains an error. Try installing the program again using the original installation media or contact your system administrator or the software vendor for support. Error status 0xc000012f.
What should I do?

how to run SEAL in notebook jupyter google colab

Hi Mr. Muhan!
I am a master's student in data science in Morocco. I have a project on link prediction in social networks; I have read and implemented some papers, and now I am reading your paper. But when I try to implement it in Google Colab I run into a problem, because the C++ part does not run in the Colab Jupyter notebook. How can I run SEAL in a Jupyter notebook?

Thanks for your time and help!

Some application problems

My English is not very good, please allow me to use Chinese.
Hello Dr. Zhang, my undergraduate thesis is on social network link prediction based on graph convolutional neural networks. I would like to ask whether your SEAL algorithm fits this topic, and whether I can complete it without configuring embeddings or node attributes.

Reproduce the experiment in the paper

I want to reproduce the experiments in the paper Link Prediction Based on Graph Neural Networks, but I cannot obtain the state-of-the-art results reported there. Take USAir for example: I run python Main.py --data-name USAir --hop 'auto' --test-ratio 0.1 and get a best accuracy of 0.92. Are there any special settings that I am missing? Thanks.

Questions regarding methods

Hi,

I had a couple of questions about the methods:

  1. The GNN training is for each subgraph extraction, yes? If so, how are the GNNs related to one another if the parameterization is different for each GNN?
  2. How exactly is the edge information encoded into the learning? To my understanding, you take two nodes, produce the feature matrix for the neighbors in the enclosing subgraph, apply the GNN, and produce a probability score. If the enclosing subgraphs can have different sizes depending on the topology, how is this addressed?

Thanks!

from mian import *

Hello, installing the main package in an Ubuntu environment is unsuccessful.

Running setup.py install for main ... error
Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-fJikpq/main/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-pB5uyz-record/install-record.txt --single-version-externally-managed --compile --user --prefix=:
running install
running build
running build_scripts
creating build
creating build/scripts-2.7
error: file '/tmp/pip-build-fJikpq/main/main' does not exist

Node Attributes and Edge Attributes

Hi, I am trying to use SEAL for a link prediction problem on multi-relational graphs, and I want to ask you two questions:

  1. parser.add_argument('--use-attribute', action='store_true', default=False, help='whether to use node attributes')
    Is the code above for adding node attributes to the training process?
    For example, in my own dataset there are 1000 nodes, divided into 6 classes, and I also want to use the class information for training the model.
    How can I build the file (attribute.txt/attribute.mat)? What is the format of the file for adding node attributes?

  2. Your model can only predict whether two nodes have a link or not, am I right? Is there any way to extend it to multi-relational link prediction?

Thanks
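
For reference, a rough SciPy sketch of how such a .mat file could be built; the key names net and group follow the readme and the other issues here, but the exact format expected by the loader is an assumption, and MyData.mat, the toy edge list, and the random labels are placeholders:

    import numpy as np
    import scipy.sparse as ssp
    from scipy.io import savemat

    n, k = 1000, 6                                   # 1000 nodes, 6 classes
    edges = [(0, 1), (1, 2), (2, 3)]                 # toy edge list
    labels = np.random.randint(0, k, size=n)         # one class id per node

    # symmetric adjacency matrix saved under 'net'
    row, col = zip(*edges)
    net = ssp.csc_matrix((np.ones(len(edges)), (row, col)), shape=(n, n))
    net = net + net.T

    # one-hot class attributes saved under 'group' (n x k)
    group = ssp.csc_matrix((np.ones(n), (np.arange(n), labels)), shape=(n, k))

    savemat('MyData.mat', {'net': net, 'group': group})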

Question about the dataset

Hi, this is such an interesting paper!
I want to train SEAL (the Python version) on a specific dataset, e.g. the human interactome. I saw that the data is provided either in .mat form, composed of net and group (for explicit features), or in raw form as a plain .txt file. But most interactome data identifies nodes by alphanumeric protein names. How did you do the mapping between node name <-> node number?
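
A generic sketch of one way to handle string node identifiers (not necessarily how the provided datasets were built; the protein ids below are just examples): assign each name a consecutive integer id and keep the mapping to translate predictions back.

    import numpy as np
    import scipy.sparse as ssp

    pairs = [('P04637', 'Q00987'), ('P04637', 'P38936')]   # example interaction pairs

    name2id = {}                                # protein name -> integer node id
    for u, v in pairs:
        for name in (u, v):
            if name not in name2id:
                name2id[name] = len(name2id)

    n = len(name2id)
    row = [name2id[u] for u, v in pairs]
    col = [name2id[v] for u, v in pairs]
    net = ssp.csc_matrix((np.ones(len(pairs)), (row, col)), shape=(n, n))
    net = net + net.T                           # undirected adjacency matrix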

Saved models for given datasets

Hi Dr. Zhang!

First of all - thank you for your great work. Your paper helped me to understand many concepts and inspired me a lot.

Do you happen to have saved models and hyperparameters (.pkl and .pth files) for some of the included datasets? Particularly for "Facebook"?

I am trying to train it myself, but it seems like it is going to take a very long time.

continue waiting for output?

It's a great job! In the state shown below, what should I do: keep waiting, or has some error already happened?

[screenshot of the current program output]

label

Hello, there are no label files in the datasets included with your code. How did you obtain the label data?

Some issues about reproducing the baseline

Hi, Dr. Zhang
The code you provided is very detailed, so I can learn a lot from it, thank you! However, I still have some problems when reproducing the baselines, and I hope you can give me some guidance.
First, I reproduced node2vec on "facebook" without changing anything, but the AUC is only about 0.5 (in your paper it is 0.99). I have tried many times with different "p" and "q", but the results did not improve. Is there any detail I haven't noticed? Meanwhile, the embedding method "SPC" produces the same results as in the paper.
Second, on the dataset "arxiv", the command "cmd = sprintf('python ../../software/node2vec/src/main.py --input %s.edgelist --output %s.emd --p %f --q %f --dimensions 128 --window-size 10', data_name, data_name, p, q); system(cmd);" runs successfully, but the resulting "data_name.emd" file is not saved in tempdata; with the other datasets this problem does not happen. What could be the reason?
By the way, I ran these experiments on Windows 10 using MATLAB 2017a.
Thank you very much!

'Namespace' object has no attribute 'attr_dim'

Hi, Muhan

I just found that main.py in DGCNN seems to be missing something, as shown in the screenshot below:
[screenshot of the 'Namespace' object has no attribute 'attr_dim' traceback]

Could you give me some advice? What is this attr_dim? And what does feat_dim mean? Thank you~

Xu Chen

Link prediction with multiple node types

Hey there Muhan,

I was wondering whether SEAL works on graphs with multiple types of nodes (i.e. heterogeneous networks). Take the graph below, for example:
[figure: example heterogeneous graph with green circles and blue triangles]
If I would like my GNN to predict only whether there is a link between green circles and blue triangles, would I just make sure that the central nodes x and y are a blue triangle and a green circle, or vice versa, during the subgraph extraction process?

Thanks!

New Nodes => New Edges

Hello,
if I have new nodes in my graph, i.e. nodes with no observed links yet, and I want to predict the possible links for these new nodes, how should I proceed?

AttributeError: module 'node2vec' has no attribute 'Graph'

When I try python Main.py --data-name NS --test-ratio 0.5 --hop 'auto' --use-embedding, this error appears.
Below is my full error report (including the printout of A that I added myself):
====== begin of gnn configuration ======
| msg_average = 0
====== end of gnn configuration ======
Namespace(all_unknown_as_negative=False, batch_size=50, cuda=True, data_name='NS', hop='auto', max_nodes_per_hop=None, max_train_num=100000, no_cuda=False, no_parallel=False, only_predict=False, save_model=False, seed=1, test_name=None, test_ratio=0.5, train_name=None, use_attribute=False, use_embedding=True)
sampling negative links for train and test
Traceback (most recent call last):
File "Main.py", line 138, in
embeddings = generate_node2vec_embeddings(A, 128, True, train_neg)
File "/home/czw/SEAL/Python/util_functions.py", line 216, in generate_node2vec_embeddings
G = node2vec.Graph(nx_G, is_directed=False, p=1, q=1)
AttributeError: module 'node2vec' has no attribute 'Graph'
(base) czw@sy-NF5280M5:~/SEAL/Python$ python Main.py --data-name NS --test-ratio 0.5 --hop 'auto' --use-embedding
====== begin of gnn configuration ======
| msg_average = 0
====== end of gnn configuration ======
Namespace(all_unknown_as_negative=False, batch_size=50, cuda=True, data_name='NS', hop='auto', max_nodes_per_hop=None, max_train_num=100000, no_cuda=False, no_parallel=False, only_predict=False, save_model=False, seed=1, test_name=None, test_ratio=0.5, train_name=None, use_attribute=False, use_embedding=True)
sampling negative links for train and test
(1588, 0) 1.0
(4, 1) 1.0
(5, 1) 1.0
(5, 3) 1.0
(1, 4) 1.0
(5, 4) 1.0
(1, 5) 1.0
(3, 5) 1.0
(4, 5) 1.0
(7, 6) 1.0
(8, 6) 1.0
(10, 6) 1.0
(6, 7) 1.0
(6, 8) 1.0
(9, 8) 1.0
(10, 8) 1.0
(1423, 8) 1.0
(1531, 8) 1.0
(8, 9) 1.0
(10, 9) 1.0
(6, 10) 1.0
(8, 10) 1.0
(9, 10) 1.0
(12, 11) 1.0
(1047, 11) 1.0
: :
(336, 1571) 1.0
(630, 1571) 1.0
(1569, 1571) 1.0
(1570, 1571) 1.0
(1572, 1571) 1.0
(336, 1572) 1.0
(630, 1572) 1.0
(1571, 1572) 1.0
(782, 1573) 1.0
(1575, 1574) 1.0
(1577, 1574) 1.0
(1574, 1575) 1.0
(1576, 1575) 1.0
(1575, 1576) 1.0
(1577, 1576) 1.0
(1574, 1577) 1.0
(1576, 1577) 1.0
(1583, 1582) 1.0
(1582, 1583) 1.0
(1586, 1584) 1.0
(1584, 1586) 1.0
(75, 1587) 1.0
(521, 1587) 1.0
(0, 1588) 1.0
(1083, 1588) 1.0
Traceback (most recent call last):
File "Main.py", line 138, in
embeddings = generate_node2vec_embeddings(A, 128, True, train_neg)
File "/home/czw/SEAL/Python/util_functions.py", line 216, in generate_node2vec_embeddings
G = node2vec.Graph(nx_G, is_directed=False, p=1, q=1)
AttributeError: module 'node2vec' has no attribute 'Graph'

By the way, running the code in the other modes described in readme.md works fine for me.

In fact, I tested python Main.py --train-name PB_train.txt --test-name PB_test.txt --hop 1 and it runs correctly, and datasets I defined myself also give correct results. But as soon as I add --use-embedding, this same error appears. Finally I directly tested python Main.py --data-name NS --test-ratio 0.5 --hop 'auto' --use-embedding and got the output above; the printed A has the same form. I don't know where the problem is.
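
One hedged explanation: this error can appear when Python imports a node2vec module other than the reference implementation that util_functions.py expects (which defines a Graph class), for example a pip-installed package that is also named node2vec but exposes a different API. A quick check of what is actually being imported:

    import node2vec
    print(node2vec.__file__)   # which node2vec module is actually being imported?
    print(dir(node2vec))       # does it define Graph, as util_functions.py expects?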

which os are you working on?

I am testing on your USAir dataset and I encountered several errors concerning CUDA. Are you working on a Linux or Windows system? I am currently working on a Mac; maybe this is the cause of the errors?

Determining the value of h

In util_functions.py the value of "h" gets selected between 1 and 2 on the basis of:

    if val_auc_AA >= val_auc_CN:
        h = 2
        print('\033[91mChoose h=2\033[0m')
    else:
        h = 1
        print('\033[91mChoose h=1\033[0m')

From what I understood from the paper, the value of h can remain small because of the gamma-decaying property. Which is why:

  1. Is there a particular reason why you check whether val_auc_AA >= val_auc_CN to set the value of h?
  2. Can I set the value of h arbitrarily to 1 or 2? Will it not give accurate results then?

Thank you.

EColi .txt file

Hi Muhan,

I am using the datasets you provided in the repo; however, I cannot find the Ecoli dataset in .txt form. Do you know where I could find it, or am I looking in the wrong place?

Thanks,
Jozef

Encoding

My console prints the training progress bar as mojibake instead of the block characters, e.g.:
loss: 0.27501 acc: 0.87500: 100%|鈻堚枅鈻堚枅鈻堚枅鈻堚枅鈻堚枅鈻堚枅鈻堚枅鈻堚枅鈻堚枅鈻堚枅| 9/9 [00:00<00:00, 12.10batch/s]

Where should I make changes to fix this?

OSError: dlopen(.../pytorch_DGCNN/lib/build/dll/libgnn.so, 6): no suitable image found.

Hi, when I try to run, I receive the OSError:

OSError: dlopen(.../pytorch_DGCNN/lib/build/dll/libgnn.so, 6): no suitable image found.

I've searched online and cannot solve this problem (even using a virtualenv). Do you know what's wrong with it?

Actually, whether I use Python 3 or Python 2.7, as soon as I run:

import ctypes
ctypes.CDLL(".../pytorch_DGCNN/lib/build/dll/libgnn.so")

I've got OSError:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ctypes/__init__.py", line 366, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: dlopen(.../pytorch_DGCNN/lib/build/dll/libgnn.so, 6): no suitable image found.  Did find:
	.../pytorch_DGCNN/lib/build/dll/libgnn.so: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00
	.../pytorch_DGCNN/lib/build/dll/libgnn.so: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00

Inquiry about consumption of memory

Dear Dr. Zhang,

I am running on 12-core server with 220G memory and 4 GPUs. The data has 2511 nodes, 37154 edges and 9073 attributes, and the average degree is 29.59.

In SEAL, max-train-num is set to 10000 and args.max_nodes_per_hop is set to 100.

The algorithm runs successfully when hop = 1. However, in the case of hop =2, it throws memory error due to memory exhaustion.

Could you please give me some suggestions?

Thanks
Wei

Details about the train/test data

I saw that in your PB_train.txt and PB_test.txt, there are observed links for training, as well as "future" links for testing, right?

Do they share the same node set? Meaning, if the test file contains a node id that doesn't appear in the train file, is it ok? Or should all node ids be specified in the train file first? (I know that if a future link has both endpoints absent from the train file, it would not be predictable since it is too far away from the community; I am just wondering about the engineering/coding side. 😄 )

Another question about the test file: do you mean that, when you compute the ACC, a link is TRUE if it is in the test file and FALSE otherwise? (Meaning the test file shouldn't contain only a subset of the test links.)

Thank you very much!! 😸

how can add explicit features into SEAL?

hi,
as your readme.md says (in the Usage part), "The node attributes are assumed to be saved in the group of the .mat file." I looked into the file and it looks like the following:

[screenshot of the group matrix printout, with lines such as (131, 0) 1.0]

I want to know that

  • How do I interpret these lines, such as (131, 0) 1.0?
  • How can I add explicit features to SEAL if I have profile information for each node in the network?

thanks.
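
Lines such as (131, 0) 1.0 are simply how SciPy prints a sparse matrix: the entry at row 131, column 0 has value 1.0, i.e. node 131 has value 1 for attribute 0. A minimal sketch for inspecting and replacing the attribute matrix; the key names net and group follow the readme, while the file names and my_feature_array are placeholders:

    from scipy.io import loadmat, savemat
    import scipy.sparse as ssp

    data = loadmat('USAir.mat')              # assumed .mat file with 'net' and 'group'
    net, group = data['net'], data['group']
    print(group.shape)                       # (num_nodes, num_attributes)
    print(group)                             # prints "(row, col)  value" triples

    # to use your own profile features, build a (num_nodes x num_features) matrix:
    # new_group = ssp.csc_matrix(my_feature_array)   # my_feature_array holds your data
    # savemat('MyData.mat', {'net': net, 'group': new_group})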

About only_predict

Hi, Dr. Zhang.
I have a question regarding the only_predict option. With the command "python Main.py --train-name PB_train.txt --test-name PB_test.txt --hop 1 --only-predict", does the model first build a network based on PB_train.txt, and then predict whether the link pairs in PB_test.txt are valid in the PB_train network?

Thanks

node2vec : TypeError: object of type 'map' has no len()

I tried to run the node2vec code on Zachary's karate club network by executing !python3 src/main.py --input graph/karate.edgelist --output emb/karate.emd on Google Colab, but I get this error:

Walk iteration:
1 / 10
2 / 10
3 / 10
4 / 10
5 / 10
6 / 10
7 / 10
8 / 10
9 / 10
10 / 10
Traceback (most recent call last):
File "src/main.py", line 104, in
main(args)
File "src/main.py", line 100, in main
learn_embeddings(walks)
File "src/main.py", line 87, in learn_embeddings
model = Word2Vec(walks, size=args.dimensions, window=args.window_size, min_count=0, sg=1, workers=args.workers, iter=args.iter)
File "/usr/local/lib/python3.6/dist-packages/gensim/models/word2vec.py", line 767, in init
fast_version=FAST_VERSION)
File "/usr/local/lib/python3.6/dist-packages/gensim/models/base_any2vec.py", line 759, in init
self.build_vocab(sentences=sentences, corpus_file=corpus_file, trim_rule=trim_rule)
File "/usr/local/lib/python3.6/dist-packages/gensim/models/base_any2vec.py", line 936, in build_vocab
sentences=sentences, corpus_file=corpus_file, progress_per=progress_per, trim_rule=trim_rule)
File "/usr/local/lib/python3.6/dist-packages/gensim/models/word2vec.py", line 1571, in scan_vocab
total_words, corpus_count = self._scan_vocab(sentences, progress_per, trim_rule)
File "/usr/local/lib/python3.6/dist-packages/gensim/models/word2vec.py", line 1556, in _scan_vocab
total_words += len(sentence)
TypeError: object of type 'map' has no len()
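
This is the well-known Python 3 incompatibility in the node2vec reference code: learn_embeddings builds the walks with map(str, ...), and gensim then fails when it calls len() on a map object. A common fix (hedged; line numbers in your copy may differ) is to materialize the walks as lists of strings before constructing Word2Vec:

    # inside learn_embeddings(walks), before constructing Word2Vec:
    walks = [list(map(str, walk)) for walk in walks]
    model = Word2Vec(walks, size=args.dimensions, window=args.window_size,
                     min_count=0, sg=1, workers=args.workers, iter=args.iter)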

PrepareFeatureLabel throwing out of bounds error

Hi, thanks for sharing your code and the related paper. I trained it on my own data, saved the model, and then tried to predict link probabilities using the command below. Could you please shed some light on what might have caused the error below?

python Main.py --data-name DATA --train-name DATA_train.txt --test-name DATA_test.txt --hop 1 --use-attribute --max-nodes-per-hop 50 --max-train-num 50000 --only-predict
====== begin of gnn configuration ======
| msg_average = 0
====== end of gnn configuration ======
Namespace(all_unknown_as_negative=False, batch_size=50, cuda=False, data_name='DATA', hop='1', max_nodes_per_hop='50', max_train_num=50000, no_cuda=False, no_parallel=False, only_predict=True, save_model=False, seed=1, test_name='DATA_test.txt', test_ratio=0.1, train_name='DATA_train.txt', use_attribute=True, use_embedding=False)
sampling negative links for train and test
Enclosing subgraph extraction begins...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 32/32 [02:18<00:00, 2.18s/it]
Time eplased for subgraph extraction: 147.22653317451477s

test: 62717

Initializing DGCNN
Traceback (most recent call last):
File "Main.py", line 190, in
predictions.append(classifier(batch_graph)[0][:, 1].exp().cpu().detach())
File "/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in call_impl
result = self.forward(*input, **kwargs)
File "/Code/SEAL/Python/../../pytorch_DGCNN/main.py", line 119, in forward
feature_label = self.PrepareFeatureLabel(batch_graph)
File "
/Code/SEAL/Python/../../pytorch_DGCNN/main.py", line 90, in PrepareFeatureLabel
node_tag.scatter
(1, concat_tag, 1)
RuntimeError: index 38 is out of bounds for dimension 1 with size 33

The use of the network

I have a new graph and I want to add some edges. I have already trained the network, but how can I use this trained network to predict links in my graph and see the results? I want to know the changes to my graph. Thank you so much.

How to determine the index of the nodes in the enclosing subgraph?

Hi~
From your paper, we know that the GNN takes (A, X) as input, and I have a question about this:
In my opinion, the label assigned to each node of the enclosing subgraph by Double-Radius Node Labeling is a node tag, equivalent to the node types in datasets such as MUTAG or PROTEINS used with GNNs, not the index of the node. So how is the index of a node in the enclosing subgraph determined, and how is the other input A constructed?
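
For reference, a simplified sketch of Double-Radius Node Labeling following the hashing formula in the paper; the repository's util_functions.py may differ in details (e.g. how unreachable nodes are handled). Here dx and dy are a node's shortest-path distances to the two target nodes x and y, each computed with the other target temporarily removed:

    def drnl_label(dx, dy):
        # the target nodes x and y themselves receive label 1
        if dx == 0 or dy == 0:
            return 1
        d = dx + dy
        return 1 + min(dx, dy) + (d // 2) * ((d // 2) + (d % 2) - 1)

These labels play the role of node tags (like atom types in MUTAG) for the GNN; the row index of a node in A is independent of its label and is just a fixed ordering of the subgraph's nodes, with the rows of X aligned to that same ordering.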

About running time

Hi muhan,

Thanks for sharing the code. I've run it with both python Main.py --data-name USAir --hop 'auto' --batch-size 1 and python Main.py --data-name USAir --hop 1 --batch-size 1. The sampling of subgraphs runs smoothly, but it seems to get stuck after printing Initializing DGCNN.
The first command has run for over 2 days and still produced no output file auc_results.txt, and the second has now run for over 4 hours without printing a log or writing a file.
I thought it was because the subgraphs are very large due to the 'auto' hop, but it still gets stuck after changing the hop argument to 1.
Is this expected? How long did training take for you to get the AUC results?

Data Leakage Issue

Hi there,

I just had a question about data leakage with SEAL. Say there are two subgraphs, one in the training set and the other in the testing set. They happen to overlap a bit (i.e. share some of the same nodes and edges), but they are distinct subgraphs, and the link being predicted is different between the two of them. If I am understanding correctly, this is not a data leakage issue because SEAL treats each subgraph as its own entity and does not care about the subgraph's position in the larger network, and DRNL is done on individual subgraphs. So even though they overlap in the larger network, that has no implications for SEAL's performance, and it is okay for one of them to be in the testing set and the other in the training set. Sorry if this is a basic question; I just wanted to clarify my thinking.

Cheers!

How are your predicted results stored and saved?

Hi, great work. How can I access your predicted link results on those eight datasets? Is there any code for that? Currently, I only have the AUC and ACC results stored in txt files.

Great work..

About DGCNN hyper-parameter

Hi, Dr. Zhang

I have a question regarding the DGCNN hyperparameter cmd_args.feat_dim. It seems this parameter is set to 16; may I ask what it is used for? I was trying to run a test on my data and ran into an error: in main.py line 87, concat_tag contains values greater than 16, which leads to an invalid index error when node_tag.scatter_(1, concat_tag, 1) is called. Is there any suggestion that can help me solve this problem?

Thanks.
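
The constraint behind this error is generic PyTorch behavior rather than anything specific to this repository: when the node labels in concat_tag are one-hot encoded via scatter_, every label must be smaller than feat_dim, so feat_dim has to be at least the maximum label plus one. A minimal sketch with made-up label values:

    import torch

    concat_tag = torch.tensor([[3], [15], [38]])   # node labels used as column indices
    feat_dim = int(concat_tag.max()) + 1           # with feat_dim=16 the scatter_ below
                                                   # would raise "index 38 is out of bounds ..."
    node_tag = torch.zeros(concat_tag.size(0), feat_dim)
    node_tag.scatter_(1, concat_tag, 1)            # one-hot encode the labels
    print(node_tag.sum(dim=1))                     # each row now has exactly one 1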

ImportError: No module named networkx

Running Main.m in MATLAB produces this error:

Traceback (most recent call last):
  File "../../software/node2vec/src/main.py", line 14, in <module>
    import networkx as nx
ImportError: No module named networkx
Error using dlmread (line 62)
The file 'data/embedding/USAir_1.emd' could not be opened because: No such file
or directory

Error in generate_embeddings (line 59)
    node_embeddings = dlmread(['data/embedding/', data_name, '.emd']);

Error in graph2mat (line 62)
    node_embeddings = generate_embeddings(A1, data_name_i, emd_method);

Error in SEAL (line 66)
    [data, max_size] = graph2mat([train_pos; train_neg], [test_pos; test_neg],
    A, h, ith_experiment, 0, data_name, include_embedding, include_attribute);

Error in Main (line 65)
    parfor (ith_experiment = 1:numOfExperiment, workers)

I think the problem is that MATLAB does not detect the correct Python environment. Can you please help me solve this?
