This project forked from thudm/gatne

Source code and dataset for KDD 2019 paper "Representation Learning for Attributed Multiplex Heterogeneous Network"

License: MIT License

Python 99.88% Shell 0.12%

GATNE

Representation Learning for Attributed Multiplex Heterogeneous Network.

Yukuo Cen, Xu Zou, Jianwei Zhang, Hongxia Yang, Jingren Zhou, Jie Tang

Accepted to KDD 2019 Research Track!

โ— News

Our GATNE models have been implemented by many popular graph toolkits:

Some recent papers have listed GATNE models as a strong baseline:

Please let me know if your toolkit includes GATNE models or your paper uses GATNE models as baselines.

Prerequisites

  • Python 3
  • TensorFlow >= 1.8 (or PyTorch)

Getting Started

Installation

Clone this repo.

git clone https://github.com/THUDM/GATNE
cd GATNE

Install the dependencies with:

pip install -r requirements.txt

Dataset

These datasets are sampled from the original datasets.

  • Amazon contains 10,166 nodes and 148,865 edges. Source
  • Twitter contains 10,000 nodes and 331,899 edges. Source
  • YouTube contains 2,000 nodes and 1,310,617 edges. Source
  • Alibaba contains 6,163 nodes and 17,865 edges.

Training

Training on the existing datasets

To train the GATNE-T model on the example data, run ./scripts/run_example.sh, python src/main.py --input data/example, or python src/main_pytorch.py --input data/example. If you share the server with others or want to use specific GPU(s), you may need to set CUDA_VISIBLE_DEVICES.
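If you launch training from a Python script rather than from the shell, one way to pin the GPU(s) is to set CUDA_VISIBLE_DEVICES in the environment of the child process. The following is a minimal sketch: the helper name build_training_command and the GPU index "0" are illustrative assumptions and are not part of this repository.

```python
import os
import sys


def build_training_command(input_dir, script="src/main.py", gpu_ids="0"):
    """Return (argv, env) for launching training on the selected GPU(s).

    Hypothetical helper for illustration; it is not part of the repository.
    """
    env = dict(os.environ)
    # Only the listed GPU(s) will be visible to the child process.
    env["CUDA_VISIBLE_DEVICES"] = gpu_ids
    argv = [sys.executable, script, "--input", input_dir]
    return argv, env


argv, env = build_training_command("data/example", gpu_ids="0")
print(" ".join(argv[1:]))
```

Passing the result to subprocess.run(argv, env=env, check=True) from the repository root would then start training with only the chosen GPU(s) visible.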

To train on the Amazon dataset, run python src/main.py --input data/amazon for the GATNE-T model, or python src/main.py --input data/amazon --features data/amazon/feature.txt for the GATNE-I model.

To train GATNE-T on the Twitter and YouTube datasets, run python src/main.py --input data/twitter --eval-type 1 or python src/main.py --input data/youtube, respectively. On the Twitter dataset we evaluate only the edges of the first edge type, because the other edge types have too few edges.

The Twitter and YouTube datasets do not have node attributes, so you can generate heuristic features for them, such as DeepWalk embeddings. You can then train the GATNE-I model on these two datasets by adding the --features argument.
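As a rough illustration of producing a feature file for a dataset without node attributes, the sketch below uses a much simpler heuristic than DeepWalk: each node's degree per edge type, computed from train.txt. This is a stand-in chosen only because it needs no extra libraries. It is not the authors' method, and the function names are hypothetical. The output follows the feature.txt layout described under "Training on your own datasets".

```python
from collections import defaultdict


def degree_features(train_path):
    """Build per-edge-type degree vectors from a train.txt-style file.

    Assumed heuristic for illustration only (not DeepWalk).
    """
    degrees = defaultdict(lambda: defaultdict(int))
    edge_types = set()
    with open(train_path) as f:
        for line in f:
            tokens = line.split()
            if len(tokens) != 3:
                continue  # skip blank or malformed lines
            edge_type, u, v = tokens
            edge_types.add(edge_type)
            degrees[u][edge_type] += 1
            degrees[v][edge_type] += 1
    ordered_types = sorted(edge_types)
    # One feature per edge type, in sorted edge-type order.
    return {
        node: [float(counts.get(t, 0)) for t in ordered_types]
        for node, counts in degrees.items()
    }


def write_feature_file(features, out_path):
    """Write '<num> <dim>' followed by '<node> <f_1> ... <f_dim>' lines."""
    dim = len(next(iter(features.values()))) if features else 0
    with open(out_path, "w") as f:
        f.write(f"{len(features)} {dim}\n")
        for node, vec in features.items():
            f.write(node + " " + " ".join(str(x) for x in vec) + "\n")
```

The resulting file can then be passed through the --features argument. Embeddings from DeepWalk or another method could be written with the same write_feature_file helper, given a dict that maps each node to its vector.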

Training on your own datasets

To train GATNE-T/I on your own dataset, prepare the following three (or four) files:

  • train.txt: Each line represents an edge and contains three tokens, <edge_type> <node1> <node2>, where each token can be either a number or a string.
  • valid.txt: Each line represents an edge or a non-edge and contains four tokens, <edge_type> <node1> <node2> <label>, where <label> is 1 for an edge and 0 for a non-edge.
  • test.txt: Same format as valid.txt.
  • feature.txt (optional): The first line contains two numbers, <num> <dim>, giving the number of nodes and the feature dimension. Each subsequent line describes the features of one node: <node> <f_1> <f_2> ... <f_dim>.
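Before training on a new dataset, it can help to check that the files match the formats listed above. The following is a minimal sketch of such a check. The function names read_edges and read_features are hypothetical and not part of the repository, and the parsing assumes whitespace-separated tokens.

```python
def read_edges(path, labeled=False):
    """Parse train.txt (3 tokens per line) or valid.txt/test.txt (4 tokens)."""
    expected = 4 if labeled else 3
    edges = []
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            tokens = line.split()
            if not tokens:
                continue  # ignore blank lines
            if len(tokens) != expected:
                raise ValueError(
                    f"{path}:{lineno}: expected {expected} tokens, got {len(tokens)}"
                )
            if labeled:
                if tokens[3] not in ("0", "1"):
                    raise ValueError(f"{path}:{lineno}: label must be 0 or 1")
                edges.append((tokens[0], tokens[1], tokens[2], int(tokens[3])))
            else:
                edges.append(tuple(tokens))
    return edges


def read_features(path):
    """Parse feature.txt: a '<num> <dim>' header, then one node per line."""
    features = {}
    with open(path) as f:
        num, dim = (int(x) for x in f.readline().split())
        for lineno, line in enumerate(f, start=2):
            tokens = line.split()
            if not tokens:
                continue
            if len(tokens) != dim + 1:
                raise ValueError(f"{path}:{lineno}: expected {dim} feature values")
            features[tokens[0]] = [float(x) for x in tokens[1:]]
    if len(features) != num:
        raise ValueError(f"{path}: header says {num} nodes, found {len(features)}")
    return features
```

For example, read_edges("data/example/valid.txt", labeled=True) would raise a ValueError that names the first offending line if a label is missing or is not 0 or 1.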

If your dataset contains several node types and you want to use meta-path-based random walks, you should also provide an additional file:

  • node_type.txt: Each line contains two tokens, <node> <node_type>, where <node_type> should be consistent with the meta-path schema given in the training command, i.e., --schema node_type_1-node_type_2-...-node_type_k-node_type_1. (Note that the first node type in the schema should equal the last node type.)
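The two constraints above (the schema starts and ends with the same node type, and every type in node_type.txt appears in the schema) can be checked before training. The following is a minimal sketch that handles a single schema string of the form shown above. The function names are hypothetical and not part of the repository.

```python
def parse_schema(schema):
    """Split 'type_1-type_2-...-type_1' and check it is a closed path."""
    node_types = schema.split("-")
    if len(node_types) < 2 or "" in node_types:
        raise ValueError(f"malformed schema: {schema!r}")
    if node_types[0] != node_types[-1]:
        raise ValueError(
            f"schema must start and end with the same node type: {schema!r}"
        )
    return node_types


def read_node_types(path):
    """Parse node_type.txt lines of the form '<node> <node_type>'."""
    node_type_of = {}
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            tokens = line.split()
            if not tokens:
                continue  # ignore blank lines
            if len(tokens) != 2:
                raise ValueError(f"{path}:{lineno}: expected 2 tokens")
            node_type_of[tokens[0]] = tokens[1]
    return node_type_of


def unknown_node_types(node_type_of, schema):
    """Return the node types that appear in the data but not in the schema."""
    allowed = set(parse_schema(schema))
    return sorted({t for t in node_type_of.values() if t not in allowed})
```

An empty list from unknown_node_types means every node type in the file is covered by the schema. A non-empty list names the types to add to the schema or fix in the file.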

If you have ANY difficulty getting things working in the above steps, feel free to open an issue. You can expect a reply within 24 hours.

Cite

Please cite our paper if you find this code useful for your research:

@inproceedings{cen2019representation,
  title = {Representation Learning for Attributed Multiplex Heterogeneous Network},
  author = {Cen, Yukuo and Zou, Xu and Zhang, Jianwei and Yang, Hongxia and Zhou, Jingren and Tang, Jie},
  booktitle = {Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
  year = {2019},
  pages = {1358--1368},
  publisher = {ACM},
}
