GithubHelp home page GithubHelp logo

dgb's Introduction

Towards Better Evaluation for Dynamic Link Prediction

NeurIPS arXiv Video Link Blog Post

  • All dynamic graph datasets can be downloaded from here.
  • Our package is now available on pip (pip install dgb). Detailed documentations can be found here.

Introduction

Despite the prevalence of recent success in learning from static graphs, learning from time-evolving graphs remains an open challenge. In this work, we design new, more stringent evaluation procedures for link prediction specific to dynamic graphs, which reflect real-world considerations, to better compare the strengths and weaknesses of methods. First, we create two visualization techniques to understand the reoccurring patterns of edges over time and show that many edges reoccur at later time steps. Based on this observation, we propose a pure memorization baseline called EdgeBank. EdgeBank achieves surprisingly strong performance across multiple settings because easy negative edges are often used in current evaluation setting. To evaluate against more difficult negative edges, we introduce two more challenging negative sampling strategies that improve robustness and better match real-world applications. Lastly, we introduce six new dynamic graph datasets from a diverse set of domains missing from current benchmarks, providing new challenges and opportunities for future research.

The ranking of different methods changes in the proposed negative sampling settings which contains more difficult negative edges. Our proposed baselines (horizontal lines) show competitive performance, in particular in standard setup.

Running the experiments

Set up Environment

conda create -n dgb python=3.9

then run

source install.sh

Datasets and Processing

All dynamic graph datasets can be downloaded from here. Then, they can be located in "DG_data" folder. For conducting any experiments, the required data should be in the data folder under each model of interest.

  • For example, to train a TGN model on Wikipedia dataset, we can use the following command to move the edgelist to the right folder:
cp DG_data/wikipedia.csv tgn/data/wikipedia.csv
  • Then, the edgelist should be pre-processed to have the right format. Considering the example of Wikipedia edgelist, we can use the following command for pre-processing the dataset:
# JODIE, DyRep, or TGN
python tgn/utils/preprocess_data.py --data wikipedia

# TGAT
python tgat/preprocess_data.py --data wikipedia

# CAWN
python CAW/preprocess_data.py --data wikipedia

Model Training

  • Example of training different graph representation learning models on Wikipedia dataset:
data=wikipedia
n_runs=5

# JODIE
method=jodie
prefix="${method}_rnn"
python tgn/train_self_supervised.py -d $data --use_memory --memory_updater rnn --embedding_module time --prefix "$prefix" --n_runs "$n_runs" --gpu 0

# DyRep
method=dyrep
prefix="${method}_rnn"
python tgn/train_self_supervised.py -d "$data" --use_memory --memory_updater rnn --dyrep --use_destination_embedding_in_message --prefix "$prefix" --n_runs "$n_runs" --gpu 0

# TGN
method=tgn
prefix="${method}_attn"
python tgn/train_self_supervised.py -d $data --use_memory --prefix "$prefix" --n_runs "$n_runs" --gpu 0

# TGAT
prefix="TGAT"
python -u tgat/learn_edge.py -d $data --bs 200 --uniform --n_degree 20 --agg_method attn --attn_mode prod --gpu 0 --n_head 2 --prefix $prefix --n_runs $n_runs --gpu 0

# CAWN
python CAW/main.py -d $data --pos_dim 108 --bs 32 --n_degree 64 1 --mode i --bias 1e-5 --pos_enc lp --walk_pool sum --seed 0 --n_runs "$n_runs" --gpu 0

  • Example of using EdgeBank for dynamic link prediction with standard random negative sampler:
data=Wikipedia
mem_mode=unlim_mem
n_runs=5
neg_sample=rnd  # can be one of these options: "rnd": standard randome, "hist_nre": Historical NS, or "induc_nre": Inductive NS
python EdgeBank/link_pred/edge_bank_baseline.py --data "$data" --mem_mode "$mem_mode" --n_runs "$n_runs" --neg_sample "$neg_sample"

Testing Trained Models

  • Testing trained models with different negative edge sampling strategies:
n_runs=5
data=wikipedia
neg_sample=hist_nre  # can be either "hist_nre" for historical NS, or "induc_nre" for inductive NS

# JODIE
method=jodie
python tgn/tgn_test_trained_model_self_sup.py -d "$data" --use_memory --memory_updater rnn --embedding_module time --model $method --neg_sample $neg_sample --n_runs $n_runs --gpu 0

# DyRep
method=dyrep
python tgn/tgn_test_trained_model_self_sup.py -d "$data" --use_memory --memory_updater rnn --dyrep --use_destination_embedding_in_message --model $method --neg_sample $neg_sample --n_runs $n_runs --gpu 0

# TGN
method=tgn
python tgn/tgn_test_trained_model_self_sup.py -d $data --use_memory --model $method --neg_sample $neg_sample --n_runs $n_runs --gpu 0

# TGAT
prefix="TGAT"
python tgat/tgat_test_trained_model_learn_edge.py -d $data --uniform --n_degree 20 --agg_method attn --attn_mode prod --gpu 0 --n_head 2 --prefix $prefix --n_runs $n_runs --neg_sample $neg_sample

# CAWN
python CAW/caw_test_trained_model_main.py -d $data --pos_dim 108 --bs 32 --n_degree 64 1 --mode i --bias 1e-5 --pos_enc lp --walk_pool sum --seed 0 --n_runs "$n_runs" --gpu 0 --neg_sample $neg_sample

Visualizing Dynamic Graphs

For visualizing the dynamic networks, their edgelists should be located in the "visualization/data/" folder.

  • For generating TEA plots:
python visualization/TEA_plots.py

(Different networks can be selected directly in the "visualization/TEA_plots.py" file.)

  • For generating TET plots:
python visualization/TET_plots.py

(Different networks can be selected directly in the "visualization/TET_plots.py" file.)

The outputs are saved in "visualization/figs/TEA" or "visualization/figs/TET" folder for the TEA or TET plots, respectively.

Environment Requirements

  • python >= 3.7, PyTorch >= 1.4
  • Other requirements:
pandas==1.1.0
scikit_learn==0.23.1
tqdm==4.41.1
numpy==1.16.4
matploblib==3.3.1

Maintenance Plan

Since investigation of temporal graphs has significant impacts on several domains, we plan to develop and extend this project in several ways. In particular, we plan on the following directions:

  • Including more temporal graph datasets from even larger selection of domains; e.g. biological, chemical graphs, etc.
  • Including additional temporal graph learning methods as they become available.
  • Including inductive link prediction settings.

Acknowledgment

We would like to thank the authors of TGAT, TGN, and CAWN for providing access to their projects' code.

Citation

If this work is useful for your research, please cite us:

@inproceedings{dgb_neurips_D&B_2022,
    title={Towards Better Evaluation for Dynamic Link Prediction},
    author={Poursafaei, Farimah and Huang, Shenyang and Pelrine, Kellin and and Rabbany, Reihaneh},
    booktitle={Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks},
    year={2022}
}

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.