hkust-knowcomp / mne

Source Code for IJCAI 2018 paper "Scalable Multiplex Network Embedding"

C++ 2.25% Shell 4.40% Batchfile 0.01% Makefile 25.09% C 65.38% Objective-C 0.10% M4 0.15% Python 0.63% Roff 0.03% TeX 1.96%

network-embedding multiplex-networks link-prediction

mne's Introduction

MNE (Multiplex Network Embedding)

This is the source code for IJCAI 2018 paper "Scalable Multiplex Network Embedding".

Readers are welcome to star or fork this repository and use it to train their own models, reproduce our experiments, and follow our future work. Please cite our paper:

@inproceedings{zhang2018MNE,
  author    = {Hongming Zhang and
               Liwei Qiu and
               Lingling Yi and
               Yangqiu Song},
  title     = {Scalable Multiplex Network Embedding},
  booktitle = {Proceedings of the Twenty-Seventh International Joint Conference on
               Artificial Intelligence, {IJCAI} 2018, July 13-19, 2018, Stockholm,
               Sweden.},
  pages     = {3082--3088},
  year      = {2018},
  url       = {https://doi.org/10.24963/ijcai.2018/428},
  doi       = {10.24963/ijcai.2018/428},
  timestamp = {Sat, 28 Jul 2018 14:39:21 +0200}
}

Note that due to the repository's size limit, we provide only a few small datasets to help you understand our code and reproduce our experiments. You are welcome to download the larger datasets yourself or use your own dataset.

Requirements

Python 3
networkx >= 1.11
sklearn >= 0.18.1
gensim >= 3.4

Dataset

Here we provide the Vickers dataset as an example. You can download all the other datasets from Twitter Higgs, Multiplex (old), or Multiplex (new). You can also use your own multiplex network dataset, as long as it fits the following template.

edge_type head tail weight
    r1     n1   n2    1
    r2     n2   n3    1
    .
    .
    .
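A minimal sketch of reading a file in this template into per-type edge lists, assuming whitespace-separated columns, an optional header row starting with `edge_type`, and a weight that defaults to 1 when the column is missing (an assumption; the template above always includes it). `load_multiplex_edges` is a hypothetical helper, not the loader used by train_model.py.

```python
from collections import defaultdict


def load_multiplex_edges(lines):
    """Group edges by edge type.

    Each non-empty line is assumed to be "edge_type head tail [weight]",
    separated by whitespace; a missing weight defaults to 1.0 (assumption).
    Returns {edge_type: [(head, tail, weight), ...]}.
    """
    edges_by_type = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields or fields[0] == "edge_type":  # skip blank lines and a header row
            continue
        if len(fields) not in (3, 4):
            raise ValueError(f"line {line_no}: expected 3 or 4 fields, got {len(fields)}")
        edge_type, head, tail = fields[:3]
        weight = float(fields[3]) if len(fields) == 4 else 1.0
        edges_by_type[edge_type].append((head, tail, weight))
    return dict(edges_by_type)


# In-memory sample in the template's format; for a real file use:
#     with open(path) as f:
#         edges = load_multiplex_edges(f)
sample = ["edge_type head tail weight", "r1 n1 n2 1", "r2 n2 n3 1"]
print(load_multiplex_edges(sample))
```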

Model

Before training, you should first create a folder 'Model' with:

mkdir Model

The embedding model will be automatically saved into that folder.

Train

To train the embedding model, simply run:

python3 train_model.py data/Vickers-Chan-7thGraders_multiplex.edges

You can replace the name of provided dataset with your own dataset.

Demo

To repeat the experiment in the paper, simply run:

python3 main.py data/Vickers-Chan-7thGraders_multiplex.edges

Acknowledgment

We built the training framework on top of the original Gensim Word2Vec. We used the code of LINE and Node2Vec, and the algorithm of PMNE, to complete our experiments.

Others

If you have questions about the code, you are welcome to open an issue or send me an email; I will respond as soon as possible.


mne's Issues

How to adapt this to our own embedding method?

Hello,

I have my own method of multiplex network embedding and I would like to test it on link prediction and compare it with your method. My embedding method needs the whole training multiplex graph.
How can I use the same graph for training? I am a bit confused about which data I have to use.
For example, at line 457 in main.py, I don't understand why training_data_by_type['Base'] has more entries than there are edges in Vickers.edges.

Thank you

Does this program work only with directed graphs?

From the data/###.edges file it is evident that this example graph is directed. If I want to use an undirected graph, do I need to add a reverse edge "b--a" for every edge "a--b" in the edge list file?

At line 262 of MNE.py there is a parameter directed = False, but your example uses a directed graph. Further, line 331 says:

directed = if False (default), take neighbour from two directions, if True, take neighbour from one direction.

Does this mean that for a directed graph we should use directed = False and, vice versa, for an undirected graph we should set directed = True?
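If the code only follows edges in the direction they are listed, one generic way to feed it an undirected layer is to list every edge in both directions. A minimal sketch, assuming edges are `(edge_type, head, tail, weight)` tuples; `add_reverse_edges` is hypothetical, and whether MNE actually needs this depends on how its `directed` flag behaves, which the question above leaves open.

```python
def add_reverse_edges(edges):
    """Return the edges plus the reverse of each, without duplicates.

    edges: iterable of (edge_type, head, tail, weight) tuples.
    If an edge's reverse is already listed, it is not added again, and the
    first-seen weight is kept. Self-loops appear once.
    """
    seen = set()
    result = []
    for edge_type, head, tail, weight in edges:
        for u, v in ((head, tail), (tail, head)):
            key = (edge_type, u, v)
            if key not in seen:
                seen.add(key)
                result.append((edge_type, u, v, weight))
    return result


print(add_reverse_edges([("r1", "a", "b", 1), ("r2", "c", "c", 1)]))
```

Keying on the edge type keeps the layers separate: the same pair of nodes may be connected in one layer and not in another.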

RuntimeWarning: overflow encountered in add

When I use the C. elegans dataset to train the model, I encounter a RuntimeWarning in MNE.py:

"RuntimeWarning: overflow encountered in add
model.in_local[context_index] += (1-base_weight)*dot(neu1e * lock_factor, model.in_tran.T)"

How can I fix it?
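One generic mitigation for an overflowing update (not a fix prescribed by the MNE authors) is to clip the size of each update before adding it and to skip updates that are already NaN or infinite; lowering the learning rate is another option to try. A minimal NumPy sketch; `clipped_update` and its `max_norm` threshold are hypothetical and would need tuning.

```python
import numpy as np


def clipped_update(update, max_norm=1.0):
    """Scale each row of `update` so its L2 norm is at most `max_norm`.

    Rows containing NaN or inf are replaced by zeros instead of being applied.
    The returned array has the same shape as the input.
    """
    update = np.asarray(update, dtype=np.float64)
    original_shape = update.shape
    rows = np.atleast_2d(update)
    finite_rows = np.all(np.isfinite(rows), axis=1)
    rows = np.where(finite_rows[:, None], rows, 0.0)
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    scale = np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))
    return (rows * scale).reshape(original_shape)


# Hypothetical use at the line from the warning (names taken from it):
#     delta = (1 - base_weight) * np.dot(neu1e * lock_factor, model.in_tran.T)
#     model.in_local[context_index] += clipped_update(delta, max_norm=1.0)
print(clipped_update(np.array([3.0, 4.0])))
```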

txt file

Hello, is the file "LINE_tmp_embedding1.txt" the node embedding produced by LINE? Could you provide this file?

Which vector do you use for node classification?

Hi, Hongming. I am wondering which vector you use for node classification. Every node has a series of vectors; how can we get a unified vector for each node?
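Two generic ways to turn per-layer vectors into a single vector per node are averaging and concatenation. This is a minimal sketch of those two options, not necessarily what the paper used for node classification; the input layout `{layer: {node: vector}}` and the helper name are assumptions. Averaging keeps the dimension fixed, while concatenation preserves layer-specific information but grows with the number of layers (a node missing from a layer gets a zero block).

```python
import numpy as np


def unify_node_vectors(vectors_by_layer, method="mean"):
    """Combine per-layer node vectors into one vector per node.

    vectors_by_layer: {layer: {node: 1-D vector}}, all vectors the same length.
    method="mean": average over the layers in which the node appears.
    method="concat": concatenate layers in sorted layer order, zero-filling
    layers where the node is missing.
    """
    layers = sorted(vectors_by_layer)
    dim = next(len(v) for layer in layers for v in vectors_by_layer[layer].values())
    # Nodes in first-seen order across layers.
    nodes = list(dict.fromkeys(n for layer in layers for n in vectors_by_layer[layer]))
    unified = {}
    for node in nodes:
        per_layer = [vectors_by_layer[layer].get(node) for layer in layers]
        if method == "mean":
            present = [np.asarray(v, dtype=float) for v in per_layer if v is not None]
            unified[node] = np.mean(present, axis=0)
        elif method == "concat":
            blocks = [np.zeros(dim) if v is None else np.asarray(v, dtype=float)
                      for v in per_layer]
            unified[node] = np.concatenate(blocks)
        else:
            raise ValueError(f"unknown method: {method}")
    return unified


example = {"r1": {"a": [1.0, 2.0], "b": [0.0, 0.0]}, "r2": {"a": [3.0, 4.0]}}
print(unify_node_vectors(example, "mean"))
print(unify_node_vectors(example, "concat"))
```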

Could you help me understand why there is C++ code?

Do we need the C++ code, or only the Python code?
https://github.com/HKUST-KnowComp/MNE/tree/master/C%2B%2B

Error demo with main.py

Hello, I get an error when I run the demo command:

Input: python3 main.py data/Vickers-Chan-7thGraders_multiplex.edges
Output:
usage: main.py [-h] [--input [INPUT]] [--output [OUTPUT]]
                [--dimensions DIMENSIONS] [--walk-length WALK_LENGTH]
                [--num-walks NUM_WALKS] [--window-size WINDOW_SIZE]
                [--iter ITER] [--workers WORKERS] [--p P] [--q Q] [--weighted]
                [--unweighted] [--directed] [--undirected]
main.py: error: unrecognized arguments: data/Vickers-Chan-7thGraders_multiplex.edges

something about common embedding bn

Hi, Hongming ZHANG.
First, I would like to say that it is very nice of you to share your code with the community.
I have a question. In your paper, the common embedding bn changes layer by layer. However, in the code, bn is generated from all the walks before Xn and Un are trained. Wouldn't that take too much time?

Error with big multiplex network

Hi,
I applied your method to a bigger network, for example Drosophila_Multiplex_Genetic (7-layer multiplex; nodes: 8215, edges: 43366) from https://comunelab.fbk.eu/data.php.

With your parameters, I get this error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

I reduced the iter parameter to 10, but I still get the same error. Any idea?

Best.
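scikit-learn raises this ValueError when its input contains NaN or infinite values (or values that overflow when cast to float32), so a quick first step is to find which node vectors are affected before they reach the classifier. A minimal sketch, assuming the embeddings are held as `{node: vector}`; `problematic_nodes` is a hypothetical helper.

```python
import numpy as np

FLOAT32_MAX = np.finfo(np.float32).max


def problematic_nodes(embeddings):
    """Return nodes whose vectors contain NaN/inf or values too large for float32."""
    bad = []
    for node, vec in embeddings.items():
        arr = np.asarray(vec, dtype=np.float64)
        if not np.all(np.isfinite(arr)) or np.any(np.abs(arr) > FLOAT32_MAX):
            bad.append(node)
    return bad


emb = {"n1": [0.1, 0.2], "n2": [float("nan"), 0.0], "n3": [1e39, 1.0]}
print(problematic_nodes(emb))
```

If many nodes are affected, the problem most likely arises during training rather than in the evaluation code.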

Training the model is taking too much time.

Hi,
I'm trying to run the code on a network with 2 layers and 13224 nodes. It's been more than 12 hours and the train_model.py script is still running.
Am I using too large a dataset, or am I missing something?
Can you please help?

C extension not loaded for Word2Vec, training will be slow

MNE.py:680: UserWarning: C extension not loaded for Word2Vec, training will be slow. Install a C compiler and reinstall gensim for fast training

I ran into this problem and tried everything (installing with conda, changing the gensim, scipy, and numpy versions, and pip-installing different versions), but it still doesn't work.

My PC runs Win10, and I even set up MNE from scratch on three PCs, but it still didn't work.
I spent nearly 6 hours looking for solutions on Google, Stack Overflow, Bing, Baidu and all kinds of blogs. It's just not working on my PC. I hope someone can help me.
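To check whether gensim's own compiled Word2Vec routines load in a given environment, gensim exposes `FAST_VERSION` in `gensim.models.word2vec`, which is -1 when the C extension failed to load. A minimal diagnostic sketch; note that the warning above is raised from MNE.py itself, which may rely on its own compiled module, so a passing check here does not guarantee that MNE.py uses its fast path.

```python
import importlib


def gensim_fast_version():
    """Return gensim's Word2Vec FAST_VERSION, or None if gensim can't be imported.

    FAST_VERSION is -1 when gensim's compiled routines failed to load;
    a value >= 0 means they are available.
    """
    try:
        word2vec = importlib.import_module("gensim.models.word2vec")
    except Exception:  # broad on purpose: this is a diagnostic, so report instead of crashing
        return None
    return getattr(word2vec, "FAST_VERSION", None)


version = gensim_fast_version()
if version is None:
    print("gensim could not be imported in this environment")
elif version < 0:
    print("gensim's C extension is NOT loaded")
else:
    print(f"gensim's C extension is loaded (FAST_VERSION={version})")
```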

positive and negative edges

Hi, I have a question about the positive and negative edge sampling.
In your code at line 488 (pasted here):

for edge in evaluation_data_by_type[edge_type]:
    if edge[0] in tmp_training_nodes and edge[1] in tmp_training_nodes:
        if edge[0] == edge[1]:
            continue
        selected_true_edges.append(edge)
if len(selected_true_edges) == 0:
    continue

I don't quite understand how you select the positive edges. Why is an edge counted as positive just because both of its nodes appear in the training set?
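A common reason for this filter in link-prediction evaluation is that a held-out edge can only be scored if both of its endpoints received embeddings during training. The held-out edges that pass the filter become positives, and randomly sampled non-edges become negatives. The sketch below shows that generic protocol under those assumptions. It illustrates the protocol rather than transcribing main.py, and the helper names are hypothetical.

```python
import random


def select_positive_edges(held_out_edges, training_nodes):
    """Keep held-out edges whose two endpoints both have trained embeddings.

    Edges touching an unseen node cannot be scored; self-loops are dropped.
    """
    return [(u, v) for u, v in held_out_edges
            if u != v and u in training_nodes and v in training_nodes]


def sample_negative_edges(nodes, known_edges, count, seed=0, max_tries_per_edge=100):
    """Sample up to `count` distinct node pairs that are not known edges in either direction.

    known_edges should normally contain both training and held-out edges.
    """
    rng = random.Random(seed)
    nodes = list(nodes)
    known = set(known_edges) | {(v, u) for u, v in known_edges}
    negatives, seen = [], set()
    for _ in range(max_tries_per_edge * count):
        if len(negatives) == count:
            break
        u, v = rng.sample(nodes, 2)  # two distinct nodes
        if (u, v) in known or frozenset((u, v)) in seen:
            continue
        seen.add(frozenset((u, v)))
        negatives.append((u, v))
    return negatives


training_nodes = {"a", "b", "c", "d"}
positives = select_positive_edges([("a", "b"), ("a", "e"), ("c", "c"), ("c", "d")], training_nodes)
negatives = sample_negative_edges(sorted(training_nodes), positives, count=len(positives))
print(positives, negatives)
```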

Getting an error during training the model

When I use the command "python3 train_model.py data/Vickers-Chan-7thGraders_multiplex.edges" I get this error. Could you help me solve the issue?

We are loading data from: data/Vickers-Chan-7thGraders_multiplex.edges
Finish loading data
Traceback (most recent call last):
  File "train_model.py", line 13, in <module>
    model = train_model(edge_data_by_type)
  File "../MNE/MNE.py", line 118, in train_model
    base_network = network_data['Base']
KeyError: 'Base'

main.py

Running MNE, I encountered the following problem:
D:\phy\MNE> python main.py data/Vickers-Chan-7thGraders_multiplex.edges
usage: main.py [-h] [--input [INPUT]] [--output [OUTPUT]]
                [--dimensions DIMENSIONS] [--walk-length WALK_LENGTH]
                [--num-walks NUM_WALKS] [--window-size WINDOW_SIZE]
                [--iter ITER] [--workers WORKERS] [--p P] [--q Q] [--weighted]
                [--unweighted] [--directed] [--undirected]
main.py: error: unrecognized arguments: data/Vickers-Chan-7thGraders_multiplex.edges
How to solve it? Thank you!

C extension not loaded for Word2Vec

Hi everyone,

I tried to run the MNE project, but I got the warning "C extension not loaded for Word2Vec, training will be slow. Install a C compiler and reinstall gensim for fast training." and it takes a long time to return results.

I tried to uninstall gensim and reinstall it but this does not resolve the problem.

Any help, please!

Segmentation fault

I recompiled line.cpp and then ran ./line.., but I get a "Segmentation fault (core dumped)" error.
And when I run main.py, I get a "TypeError: object of type 'dict_keyiterator' has no len()" error.

Does base embedding only update when sending aggregate graph?

Hi, I have a question about your code.
Correct me if I am wrong. It seems that you use base_weight to control whether to update the vectors. But the base vectors are updated only when base_weight=0, which means your base vectors are not co-learned with the local vectors and the transition matrix, right? So you first compute the base vectors using the aggregate graph (the base network in your code) and then compute the local vectors and transition matrices for the other graphs? Or am I missing something?

Could you help me understand how it is better than other approaches?

VERSE: https://arxiv.org/abs/1803.04742
https://github.com/xgfs/verse
and

node2vec: Scalable Feature Learning for Networks.

https://snap.stanford.edu/node2vec/
https://github.com/eliorc/Medium/blob/master/Nod2Vec-FIFA17-Example.ipynb
https://towardsdatascience.com/node2vec-embeddings-for-graph-data-32a866340fef
https://github.com/aditya-grover/node2vec
Plain Python code:
https://github.com/eliorc/node2vec

Why isn't the program implemented with TensorFlow?

Why isn't the program implemented with TensorFlow?

Error in Node2Vec_LayerSelect.py, line 47, in node2vec_walk: if len(cur_nbrs) > 0:

Hongming, many thanks.
I commented out the code responsible for LINE, but now there is another error.
Could you please help me fix it?

File "E:\graphs ML\code\Scalable_Multiplex_Network_Embedding_MNE_27may_removed_LINE_Cpp_code\MNE-master\MNE_main_original.py", line 537, in <module>
  performance_1, performance_2, performance_3 = Evaluate_PMNE_methods(merged_networks)
File "E:\graphs ML\code\Scalable_Multiplex_Network_Embedding_MNE_27may_removed_LINE_Cpp_code\MNE-master\MNE_main_original.py", line 412, in Evaluate_PMNE_methods
  MK_walks = MK_G.simulate_walks(args.num_walks, args.walk_length)
File "E:\graphs ML\code\Scalable_Multiplex_Network_Embedding_MNE_27may_removed_LINE_Cpp_code\MNE-master\Node2Vec_LayerSelect.py", line 77, in simulate_walks
  walks.append(self.node2vec_walk(walk_length=walk_length, start_node=node, G=G))
File "E:\graphs ML\code\Scalable_Multiplex_Network_Embedding_MNE_27may_removed_LINE_Cpp_code\MNE-master\Node2Vec_LayerSelect.py", line 47, in node2vec_walk
  if len(cur_nbrs) > 0:

builtins.TypeError: object of type 'dict_keyiterator' has no len()

2018-05-27 08:26:06,719 : INFO : worker thread finished; awaiting finish of 5 more threads
2018-05-27 08:26:06,719 : INFO : worker thread finished; awaiting finish of 4 more threads
2018-05-27 08:26:06,719 : INFO : worker thread finished; awaiting finish of 3 more threads
2018-05-27 08:26:06,719 : INFO : worker thread finished; awaiting finish of 2 more threads
2018-05-27 08:26:06,719 : INFO : worker thread finished; awaiting finish of 1 more threads
2018-05-27 08:26:06,719 : INFO : worker thread finished; awaiting finish of 0 more threads
2018-05-27 08:26:06,719 : INFO : EPOCH - 100 : training on 5111 raw words (975 effective words) took 0.0s, 255010 effective words/s
2018-05-27 08:26:06,719 : INFO : training on a 511100 raw words (96045 effective words) took 1.7s, 57795 effective words/s
Performance of PMNE method two: 0.8566698841698841
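In networkx 2.x, `G.neighbors(node)` returns an iterator (here a `dict_keyiterator`) instead of a list, so calling `len()` on it fails. Materializing it first, for example with `sorted(...)` or `list(...)`, fixes that. The sketch below is a simplified uniform random walk (no p/q biasing) that shows the change. It is not the code in Node2Vec_LayerSelect.py, and `AdjGraph` is a hypothetical stand-in that reproduces the networkx 2.x return type.

```python
import random


class AdjGraph:
    """Tiny stand-in for a networkx 2.x graph: neighbors() returns a dict_keyiterator."""

    def __init__(self, adj):
        self._adj = adj

    def neighbors(self, node):
        return iter(self._adj[node])


def uniform_walk(G, start_node, walk_length, rng=random):
    """Uniform random walk of up to `walk_length` nodes starting at `start_node`."""
    walk = [start_node]
    while len(walk) < walk_length:
        cur = walk[-1]
        # networkx 2.x returns an iterator here; sorted() turns it into a list
        # (with a deterministic order), so len() and rng.choice() work.
        cur_nbrs = sorted(G.neighbors(cur))
        if len(cur_nbrs) > 0:
            walk.append(rng.choice(cur_nbrs))
        else:
            break
    return walk


G = AdjGraph({0: {1: {}}, 1: {0: {}, 2: {}}, 2: {1: {}}})  # path 0 - 1 - 2
print(uniform_walk(G, 0, 5, random.Random(42)))
```

With networkx installed, a real graph such as `nx.path_graph(3)` can be passed in place of the stand-in.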

How to use the Twitter dataset?

As stated in the paper, only the Twitter dataset may show the code's real performance.
Which dataset from
http://deim.urv.cat/~manlio.dedomenico/data.php
should be used, and which Python code file would build an interface between it and this repo?
Thanks

There are many datasets:

HIGGS TWITTER: The Higgs dataset has been built after monitoring the spreading processes on Twitter before, during and after the announcement of the discovery of a new particle with the features of the elusive Higgs boson on 4th July 2012. The messages posted in Twitter about this discovery between 1st and 7th July 2012 are considered. Ref: M. De Domenico, A. Lima, P. Mougel and M. Musolesi. The Anatomy of a Scientific Rumor. (Nature Open Access) Scientific Reports 3, 2980 (2013).

and:

Friends/follower graph: Nodes: 456631, Edges: 14855875
Graph of who retweets whom: Nodes: 425008, Edges: 733647
Graph of who replies to whom: Nodes: 37366, Edges: 30836
Graph of who mentions whom: Nodes: 302975, Edges: 449827

HIGGS MULTIPLEX: Multiplex of social interactions in Twitter corresponding to the different actions (friendship, replying, mentioning and retweeting) monitored in the Higgs dataset (see above). There are two multiplex networks: 1) two layers, friendship and aggregated interactions, respectively; 2) four layers, friendship and each type of interaction in each layer separately. See more details in the webpage dedicated to Higgs Rumor. Ref: M. De Domenico, A. Lima, P. Mougel and M. Musolesi. The Anatomy of a Scientific Rumor. (Nature Open Access) Scientific Reports 3, 2980 (2013).

2-layer multiplex: Nodes: 456631
4-layer multiplex: Nodes: 456631
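If the layers are available as separate edge lists (one `head tail [weight]` pair per line), they can be merged into the `edge_type head tail weight` template from the Dataset section by prefixing each line with a layer label. A minimal sketch; the layer labels, the input line format, and the default weight of 1 are assumptions, and the actual Higgs files should be inspected before relying on it. `merge_layers` is hypothetical.

```python
def merge_layers(layer_lines):
    """Merge per-layer edge lists into "edge_type head tail weight" lines.

    layer_lines: {layer_label: iterable of "head tail [weight]" lines}.
    Blank lines and lines starting with '#' are skipped; a missing weight becomes 1.
    """
    merged = []
    for layer, lines in layer_lines.items():
        for line_no, line in enumerate(lines, start=1):
            fields = line.split()
            if not fields or fields[0].startswith("#"):
                continue
            if len(fields) < 2:
                raise ValueError(f"layer {layer}, line {line_no}: need at least head and tail")
            head, tail = fields[0], fields[1]
            weight = fields[2] if len(fields) > 2 else "1"
            merged.append(f"{layer} {head} {tail} {weight}")
    return merged


# Numeric layer labels, as in the Vickers example file; for real files, pass
# open file objects instead of lists and write "\n".join(...) to a .edges file.
print("\n".join(merge_layers({"1": ["a b 2"], "2": ["a c"]})))
```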

About the file question

Hello, it seems that some necessary files are missing here:

Traceback (most recent call last):
  File "E:\Mylibrary\Multiplex Embedding\MNE\main.py", line 547, in <module>
    LINE_model = train_LINE_model(training_data_by_type[edge_type])
  File "E:\Mylibrary\Multiplex Embedding\MNE\main.py", line 275, in train_LINE_model
    first_order_embedding = read_LINE_vectors('LINE_tmp_embedding1.txt')
  File "E:\Mylibrary\Multiplex Embedding\MNE\main.py", line 250, in read_LINE_vectors
    file = open(file_name, 'r')
FileNotFoundError: [Errno 2] No such file or directory: 'LINE_tmp_embedding1.txt'

LINE_tmp_embedding1.txt

Hi, running MNE with both Python 2.7 and Python 3.5, I encountered the following problem:
'LD_LIBRARY_PATH' is not recognized as an internal or external command,
operable program or batch file.
finish training
Traceback (most recent call last):
  File "main.py", line 513, in <module>
    LINE_model = train_LINE_model(training_data_by_type[edge_type])
  File "main.py", line 267, in train_LINE_model
    first_order_embedding = read_LINE_vectors('LINE_tmp_embedding1.txt')
  File "main.py", line 243, in read_LINE_vectors
    file = open(file_name, 'r')
FileNotFoundError: [Errno 2] No such file or directory: 'LINE_tmp_embedding1.txt'

Also, how is Vickers-Chan-7thGraders_multiplex.edges generated? Thank you
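The first message suggests that a command of the form `LD_LIBRARY_PATH=... ./line ...` was handed to the Windows command prompt. That `VAR=value command` prefix is POSIX shell syntax, so cmd.exe treats `LD_LIBRARY_PATH` as a command name. The LINE binary therefore never runs, and LINE_tmp_embedding1.txt is never written. Passing the variable through Python's `subprocess` avoids shell syntax entirely. A minimal sketch; the library directory is hypothetical, and we have not verified how main.py builds its command. LD_LIBRARY_PATH only affects the Linux dynamic loader (Windows locates DLLs through PATH), so on Windows the LINE binary also has to be built for Windows.

```python
import os
import subprocess
import sys


def run_with_library_path(cmd, library_dir):
    """Run `cmd` (a list of arguments) with `library_dir` prepended to LD_LIBRARY_PATH."""
    env = os.environ.copy()
    existing = env.get("LD_LIBRARY_PATH")
    env["LD_LIBRARY_PATH"] = library_dir if not existing else library_dir + os.pathsep + existing
    return subprocess.run(cmd, env=env, check=True, capture_output=True, text=True)


# Demonstration with the current Python interpreter standing in for the LINE binary:
result = run_with_library_path(
    [sys.executable, "-c", "import os; print(os.environ['LD_LIBRARY_PATH'])"],
    "/opt/gsl/lib",  # hypothetical library directory
)
print(result.stdout.strip())
```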

Node classification

Hi,

I would like to replicate the results on the node classification task. Could you guide me on what I should do to replicate your results?

Thank you!
Looking forward to your answer!

Output files

I trained the model on its default dataset. However, it's not clear how to use the output. There are files named "tran_.dat", "addition_.dat", and "base.dat". When I open them in a text editor, none of them is readable; they seem to be stored as binary files. What steps should I take now?
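Because the format of these files isn't documented here, one safe first step is to check their leading bytes. NumPy `.npy` files start with `\x93NUMPY`. Pickles written with protocol 2 or later start with `\x80`, and that includes files written by gensim's `save()`. A minimal sketch; `sniff_format` is hypothetical. An "unknown" result may mean a raw array dump (for example from `ndarray.tofile`), which needs the dtype and shape to be known in advance. Only unpickle files you trust.

```python
import os
import pickle
import tempfile


def sniff_format(path):
    """Guess how a binary file was written from its first bytes."""
    with open(path, "rb") as f:
        head = f.read(6)
    if head.startswith(b"\x93NUMPY"):
        return "npy"      # load with numpy.load(path)
    if head[:1] == b"\x80":
        return "pickle"   # try pickle.load, or the saving library's own load()
    return "unknown"


# Self-check on temporary files written in known formats.
with tempfile.TemporaryDirectory() as tmp:
    paths = {name: os.path.join(tmp, name + ".dat") for name in ("pickle", "npy", "raw")}
    with open(paths["pickle"], "wb") as f:
        pickle.dump({"a": [1, 2]}, f, protocol=2)
    with open(paths["npy"], "wb") as f:
        f.write(b"\x93NUMPY\x01\x00")  # just the .npy magic string and version bytes
    with open(paths["raw"], "wb") as f:
        f.write(b"\x00\x01\x02\x03")
    detected = tuple(sniff_format(paths[name]) for name in ("pickle", "npy", "raw"))
print(detected)
```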

main.py: error: unrecognized arguments: data/Vickers-Chan-7thGraders_multiplex.edges

When I run the demo command python3 main.py data/Vickers-Chan-7thGraders_multiplex.edges as recommended, this error occurs. I think it's because the command-line arguments don't include the dataset.
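The usage message shows that main.py defines an `--input` option but no positional argument, which is why argparse rejects the bare path. Below is a minimal sketch of a parser that accepts both forms. Only `--input` comes from the usage message; the positional `input_path` is a hypothetical addition. Until the script itself changes, passing the path as `--input data/Vickers-Chan-7thGraders_multiplex.edges` may work, provided main.py actually reads `args.input`, which we have not verified.

```python
import argparse


def parse_args(argv=None):
    """Accept the dataset path either positionally or via --input (hypothetical parser)."""
    parser = argparse.ArgumentParser(description="MNE demo (sketch)")
    parser.add_argument("input_path", nargs="?", default=None,
                        help="path to a multiplex .edges file")
    parser.add_argument("--input", nargs="?", default=None,
                        help="same as the positional path; takes precedence if both are given")
    args = parser.parse_args(argv)
    if args.input is None:
        args.input = args.input_path
    if args.input is None:
        parser.error("an input .edges file is required")
    return args


print(parse_args(["data/Vickers-Chan-7thGraders_multiplex.edges"]).input)
print(parse_args(["--input", "data/Vickers-Chan-7thGraders_multiplex.edges"]).input)
```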

line.cpp:(.text+0x13a8): undefined reference to `gsl_rng_uniform'

Hello, I can't generate LINE's embedding file "LINE_tmp_embedding1.txt", so I want to recompile "line.cpp", but I get a "line.cpp:(.text+0x13a8): undefined reference to `gsl_rng_uniform'" error. How can I solve it?

error: builtins.TypeError: __init__() missing 1 required positional argument: 'vector_size'

Please help me fix this error.
I used a Windows computer.
nx.__version__
'2.1'
gs.__version__
'3.4.0'

Messages from the start:
C:\Users\sndr\Anaconda3\lib\site-packages\gensim\utils.py:1197: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")
We are loading data from: Vickers-Chan-7thGraders_multiplex.edges
Finish loading data
finish building the graph
Slow version of MNE is being used

Error:
  File "E:\graphs ML\code\Scalable_Multiplex_Network-EmbeddingMNE_master\MNE-master\train_model.py", line 14, in <module>
    model = train_model(edge_data_by_type)
  File "E:\graphs ML\code\Scalable_Multiplex_Network-EmbeddingMNE_master\MNE-master\MNE.py", line 116, in train_model
    base_embedding, _, _, context_embedding, index2word = train_embedding(None, base_walks, 'Base', 100, 10, 1)
  File "E:\graphs ML\code\Scalable_Multiplex_Network-EmbeddingMNE_master\MNE-master\MNE.py", line 104, in train_embedding
    new_model = MNE(training_data, size=200, window=5, min_count=0, sg=1, workers=4, iter=iter, small_size=info_size, initial_embedding=initial_embedding, base_weight=base_weight)
  File "E:\graphs ML\code\Scalable_Multiplex_Network-EmbeddingMNE_master\MNE-master\MNE.py", line 322, in __init__
    self.initialize_word_vectors()
  File "E:\graphs ML\code\Scalable_Multiplex_Network-EmbeddingMNE_master\MNE-master\MNE.py", line 366, in initialize_word_vectors
    self.wv = KeyedVectors()

builtins.TypeError: __init__() missing 1 required positional argument: 'vector_size'

Error location:
def initialize_word_vectors(self):
    self.wv = KeyedVectors()

ValueError: Input contains NaN, infinity or a value too large for dtype('float32')

Hi Hongming,

Thanks a lot for your work! I ran your code on the given dataset successfully, but I encountered a problem with my own datasets.

The format of my datasets is:

layer	head	tail
l1	n1	n2
l2	n2	n3
...

There are two datasets (two-layer multiplex networks). One has 609 nodes and 6604 edges (across both layers), and the other has 892 nodes and 21704 edges. I ran the code on both and got the same error as in the title. So I want to know whether my datasets are too large for my computer or whether there is some other problem with my setup.

The configuration of my computer is:

  Memory 7.7GiB
  Processor Intel® Core™ i7-4790 CPU @ 3.60GHz × 8
  OS type 64-bit
  Disk 115.8GB
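As a rough sanity check on size: a dense float32 embedding table needs nodes × dimension × 4 bytes. Assume 200-dimensional vectors for every table (the size=200 seen in an earlier traceback; MNE's additional vectors may well be smaller, so this errs high) and, hypothetically, one base table plus one table per layer. Under those assumptions, the larger dataset needs about 2 MiB, far below 7.7 GiB. That suggests the cause is more likely numerical than memory-related. The estimate ignores the random walks and Python overhead. `embedding_table_bytes` is a hypothetical helper.

```python
def embedding_table_bytes(num_nodes, dim, num_tables=1, bytes_per_value=4):
    """Bytes for `num_tables` dense tables of num_nodes x dim values (float32 by default)."""
    return num_nodes * dim * num_tables * bytes_per_value


# Larger dataset from the issue: 892 nodes, 2 layers -> assume 1 base + 2 layer tables.
size = embedding_table_bytes(892, 200, num_tables=3)
print(f"{size} bytes = {size / 2**20:.2f} MiB")
```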