mSHINE

The official TensorFlow implementation of mSHINE: A Multiple-meta-paths Simultaneous Learning Framework for Heterogeneous Information Network Embedding.

Package Version

 Keras-Preprocessing==1.1.2
 numpy==1.19.2
 scikit-learn==0.24.1
 scipy==1.4.1
 tensorflow-gpu==2.2.0

Data Preparation

The experimental datasets used in the paper are available at https://drive.google.com/file/d/1g3Ln0fzCIqUO7A1GTZpKntXSUaiJdqgZ/view?usp=sharing

To run mSHINE on your own HIN named XXX (we use the cora dataset as an example below), two files must be provided: 1) XXX.hin (the HIN dataset); 2) XXX.config (meta-path information).

The supported input HIN format is an edgelist (separated by space):

 node_type_1:node_id_1 node_type_2:node_id_2 edge_weights node_type_1:node_type_2
 ...
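
As a rough illustration of the edgelist format above, the following sketch parses one line into its fields. The node types `p` and `a` and the example line are hypothetical, and `parse_hin_line` is not part of the mSHINE code base; it only mirrors the column layout described here.

```python
from typing import NamedTuple


class HinEdge(NamedTuple):
    src_type: str
    src_id: str
    dst_type: str
    dst_id: str
    weight: float
    edge_type: str


def parse_hin_line(line: str) -> HinEdge:
    """Parse one space-separated line of a .hin edgelist."""
    src, dst, weight, edge_type = line.split()
    # Split only on the first colon so that node ids may contain colons.
    src_type, src_id = src.split(":", 1)
    dst_type, dst_id = dst.split(":", 1)
    return HinEdge(src_type, src_id, dst_type, dst_id, float(weight), edge_type)


# Hypothetical example line: node 12 of type p links to node 7 of type a.
edge = parse_hin_line("p:12 a:7 1 p:a")
print(edge)
```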

Generate XXX.config:
- An XXX.relation file that lists all possible edge types must be provided. The format of XXX.relation:
```
 node_type_1-node_type_2
 node_type_2-node_type_1
 node_type_1-node_type_3
 ...
```
(NOTE: Edges are assumed to be directed by default; use two directed edges to represent an undirected edge type.)
- To generate XXX.config:
```
 python config_file_gen/metapath_gen.py --dn cora --output_dir config_file_gen/config_files/ --relation_dir config_file_gen/relation_files/
```
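
Because every undirected edge type needs two directed lines in XXX.relation, it can help to generate those lines programmatically. The sketch below is one possible way to do this; the helper `directed_relations` and the node types `p`, `a` and `t` are hypothetical and not part of the repository.

```python
def directed_relations(undirected_pairs):
    """Expand undirected edge types into the two directed lines each one needs."""
    lines = []
    for a, b in undirected_pairs:
        for rel in (f"{a}-{b}", f"{b}-{a}"):
            if rel not in lines:  # avoid writing the same relation twice
                lines.append(rel)
    return lines


# Hypothetical undirected edge types between node types p, a and t.
relation_lines = directed_relations([("p", "a"), ("p", "t")])
print("\n".join(relation_lines))
```

The printed lines can be saved as XXX.relation in the directory passed via `--relation_dir`.
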
Generate the other necessary dataset files (i.e. XXX.HIN_dic, XXX_id_to_index.p and XXX_info_dict.p):
- Make sure XXX.hin and XXX.config are in the folder data/XXX/, then:
```
 cd data_prepare/
 python data_prepare.py --input_data cora
```
The resulting files can be found in data/XXX/
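
Before training, a quick check that all of the files named above are present can save a failed run. This is a minimal sketch, assuming it is run from the repository root so that `data/` resolves correctly; `missing_dataset_files` is a hypothetical helper, not something shipped with mSHINE.

```python
from pathlib import Path


def missing_dataset_files(data_root, name):
    """Return the expected dataset files that are absent from data_root/name/."""
    folder = Path(data_root) / name
    expected = [
        f"{name}.hin",
        f"{name}.config",
        f"{name}.HIN_dic",
        f"{name}_id_to_index.p",
        f"{name}_info_dict.p",
    ]
    return [fname for fname in expected if not (folder / fname).exists()]


# An empty list means every expected file was found.
missing = missing_dataset_files("data", "cora")
print(missing)
```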

Execute

To run mSHINE:

 python main.py --graph_name cora --dimensions 128 --batch_size 128 --iter 1000

Output

The node representations XXX_{epoch}_{emb_size}.emb can be found in: data/XXX/experi_data/node_emb/

The embeddings are stored as a dictionary in a pickle file and can be loaded as follows:
```
import pickle

# Open the file in binary mode; it holds a dict of embedding matrices.
with open('{path_of_emb_file}', 'rb') as f:
    emb = pickle.load(f)
state_embedding = emb['s_embedding']
input_embedding = emb['i_embedding']
output_embedding = emb['o_embedding']
```
(NOTE: the mapping from a node's row index in the state_embedding matrix to its node id can be found in XXX_id_to_index.p.)

The transform matrix, stored in trans_metric_{epoch}_{emb_size}, can be found in: data/XXX/experi_data/record/
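
To go from a row of the embedding matrix back to a node id, the mapping in XXX_id_to_index.p has to be inverted. The sketch below assumes that file unpickles to a flat dictionary of the form `{node_id: row_index}`; that layout is an assumption, not something confirmed here, so inspect the loaded object before relying on it. The helper `index_to_id` and the toy ids are hypothetical.

```python
def index_to_id(id_to_index):
    """Invert a {node_id: row_index} mapping into {row_index: node_id}."""
    inverse = {index: node_id for node_id, index in id_to_index.items()}
    if len(inverse) != len(id_to_index):
        raise ValueError("two node ids share the same row index")
    return inverse


# Toy stand-in for the dictionary loaded from XXX_id_to_index.p
# (assumed layout: node id -> row index).
toy_id_to_index = {"p:12": 0, "p:40": 1, "a:7": 2}
row_to_node = index_to_id(toy_id_to_index)
print(row_to_node[1])
```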

Evaluation

An example of evaluation code is available in node_emb_classification.py, which requires a label file XXX_label.txt.

The format of XXX_label.txt is:
```
 node_id_1 node_label
 node_id_2 node_label
 ...
```
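
For reference, a label file in this format can be read with a few lines of Python. This is only a sketch of the format, assuming whitespace-separated columns; `read_labels` is a hypothetical helper and does not reflect how node_emb_classification.py parses the file.

```python
import io


def read_labels(lines):
    """Read `node_id node_label` pairs from an iterable of text lines."""
    labels = {}
    for line in lines:
        parts = line.split()
        if not parts:  # skip blank lines
            continue
        if len(parts) != 2:
            raise ValueError(f"expected 'node_id node_label', got: {line!r}")
        node_id, label = parts
        labels[node_id] = label
    return labels


# In practice, pass an open file handle for XXX_label.txt instead of this sample.
sample = io.StringIO("12 3\n40 0\n")
labels = read_labels(sample)
print(labels)
```
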
To run the evaluation:
```
  python node_emb_classification.py --dimensions 128 --experi_dir experi_data --target_node_type p --graph_name cora
```
During evaluation, XXX_TEST.label is generated to store classification-related information.

(NOTE: the results reported in the paper were obtained with an SVM using kernel='rbf', gamma='scale'.)

Citing

If you find mSHINE useful for your research, please consider citing the following paper:

@ARTICLE{9201301,  
    author={X. {Zhang} and L. {Chen}},  
    journal={IEEE Transactions on Knowledge and Data Engineering},   
    title={mSHINE: A Multiple-meta-paths Simultaneous Learning Framework for Heterogeneous Information Network Embedding},   
    year={2020},
    volume={},
    number={},
    pages={1-1},
    doi={10.1109/TKDE.2020.3025464}}

Please send any questions about the code and/or the algorithm to [email protected].
