
liangke23 / dink-net

This project forked from yueliu1999/dink-net

[ICML 2023] The official source code for the paper "Dink-Net: Neural Clustering on Large Graphs".

License: MIT License

Introduction

Deep graph clustering, which aims to group the nodes of a graph into disjoint clusters with deep neural networks, has achieved promising progress in recent years. However, existing methods fail to scale to large graphs with millions of nodes. To solve this problem, we propose a scalable deep graph clustering method, Dink-Net, based on the idea of dilation and shrink. First, node representations are learned in a self-supervised manner by discriminating whether nodes have been corrupted by augmentations. Meanwhile, the cluster centers are initialized as learnable neural parameters. Subsequently, the clustering distribution is optimized by minimizing the proposed cluster dilation loss and cluster shrink loss in an adversarial manner. In this way, we unify the two steps of clustering, i.e., representation learning and clustering optimization, into an end-to-end framework that guides the network to learn clustering-friendly features. Moreover, Dink-Net scales well to large graphs because the designed loss functions optimize the clustering distribution on mini-batch data without performance drops. Both experimental results and theoretical analyses demonstrate the superiority of our method.
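The dilation and shrink losses can be sketched as follows. This is a minimal NumPy illustration under our own assumptions, not the repository's implementation: it assumes squared Euclidean distances, a dilation loss equal to the negative mean pairwise distance between the K learnable centers, and a shrink loss equal to the mean distance between each mini-batch embedding and every center. The function names are hypothetical, and in the real model the centers would be trainable parameters updated by gradient descent rather than fixed arrays.

```python
import numpy as np


def cluster_dilation_loss(centers: np.ndarray) -> float:
    """Negative mean squared distance over all ordered pairs of distinct centers.

    Minimizing this value pushes the K cluster centers apart (requires K >= 2).
    """
    k = centers.shape[0]
    diff = centers[:, None, :] - centers[None, :, :]  # (K, K, d)
    sq_dist = (diff ** 2).sum(axis=-1)                # (K, K), zero diagonal
    return float(-sq_dist.sum() / (k * (k - 1)))


def cluster_shrink_loss(batch_emb: np.ndarray, centers: np.ndarray) -> float:
    """Mean squared distance between each mini-batch embedding and every center.

    Minimizing this value pulls the B embeddings in the batch toward the centers;
    the cost is O(B * K * d), independent of the total number of nodes.
    """
    diff = batch_emb[:, None, :] - centers[None, :, :]  # (B, K, d)
    return float((diff ** 2).sum(axis=-1).mean())


def assign_clusters(batch_emb: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Label each embedding with the index of its nearest center."""
    diff = batch_emb[:, None, :] - centers[None, :, :]
    return (diff ** 2).sum(axis=-1).argmin(axis=1)


rng = np.random.default_rng(0)
centers = rng.normal(size=(7, 16))   # e.g. K = 7 clusters, as for Cora
batch = rng.normal(size=(256, 16))   # one mini-batch of node embeddings
loss = cluster_dilation_loss(centers) + cluster_shrink_loss(batch, centers)
labels = assign_clusters(batch, centers)
```

The two terms pull in opposite directions (centers apart versus embeddings toward the centers), which is one way to read "adversarial" here. Because both only touch the current batch and the K centers, memory does not grow with the graph size.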

Table of Contents
  1. Usage
  2. Acknowledgements
  3. Citations

Usage

Datasets

Dataset         | Type            | # Nodes     | # Edges       | # Feature Dimensions | # Classes
Cora            | Attribute Graph | 2,708       | 5,278         | 1,433                | 7
CiteSeer        | Attribute Graph | 3,327       | 4,614         | 3,703                | 6
Amazon-Photo    | Attribute Graph | 7,650       | 119,081       | 745                  | 8
ogbn-arxiv      | Attribute Graph | 169,343     | 1,166,243     | 128                  | 40
Reddit          | Attribute Graph | 232,965     | 23,213,838    | 602                  | 41
ogbn-products   | Attribute Graph | 2,449,029   | 61,859,140    | 100                  | 47
ogbn-papers100M | Attribute Graph | 111,059,956 | 1,615,685,872 | 128                  | 172

Requirements

The code is tested on Python 3.7.

dgl==0.6.1
munkres==1.1.4
networkx==2.8.3
numpy==1.23.2
scikit_learn==1.3.0
scipy==1.6.0
torch==2.0.1
torch-scatter==2.0.9
torch-sparse==0.6.12
torch-spline-conv==1.2.1
torch-geometric==2.1.0.post1
tqdm==4.65.0
wandb==0.15.4

Configurations

--device     |  running device
--dataset    |  dataset name
--hid_units  |  hidden units
--activate   |  activation function
--tradeoff   |  tradeoff parameter
--lr         |  learning rate
--epochs     |  training epochs
--eval_inter |  evaluation interval
--wandb      |  wandb logging
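A sketch of how the flags above could be declared with Python's standard argparse module. The parser is hypothetical and is not copied from the repository's main.py: defaults for device, dataset, hid_units, lr, and epochs are borrowed from the Cora command in Quick Start, while None marks values this README does not specify.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the repo's main.py)."""
    parser = argparse.ArgumentParser(description="Dink-Net options (sketch)")
    # Defaults come from the Cora command where available; None = not documented here.
    parser.add_argument("--device", type=str, default="cuda:0", help="running device")
    parser.add_argument("--dataset", type=str, default="cora", help="dataset name")
    parser.add_argument("--hid_units", type=int, default=512, help="hidden units")
    parser.add_argument("--activate", type=str, default=None, help="activation function")
    parser.add_argument("--tradeoff", type=float, default=None, help="tradeoff parameter")
    parser.add_argument("--lr", type=float, default=1e-2, help="learning rate")
    parser.add_argument("--epochs", type=int, default=200, help="training epochs")
    parser.add_argument("--eval_inter", type=int, default=None, help="evaluation interval")
    # A bare switch: passing --wandb turns logging on, omitting it leaves it off.
    parser.add_argument("--wandb", action="store_true", help="wandb logging")
    return parser


args = build_parser().parse_args(
    ["--dataset", "citeseer", "--hid_units", "1536", "--lr", "5e-4", "--wandb"]
)
```

Declaring --wandb with action="store_true" matches the usage below, where logging is disabled simply by leaving the flag out.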

Quick Start

Clone this repository and change into the Dink-Net directory:

git clone https://github.com/yueliu1999/Dink-Net.git
cd ./Dink-Net

Unzip the datasets and model parameters:

unzip -d ./data/ ./data/datasets.zip
unzip -d ./models/ ./models/models.zip

Run the code with the provided scripts:

bash ./scripts/train_cora.sh

bash ./scripts/train_citeseer.sh

bash ./scripts/train_amazon_photo.sh

or run the code directly with commands:

python main.py --device cuda:0 --dataset cora --hid_units 512 --lr 1e-2 --epochs 200 --wandb

python main.py --device cuda:0 --dataset citeseer --hid_units 1536 --lr 5e-4 --epochs 200 --wandb

python main.py --device cuda:0 --dataset amazon_photo --hid_units 512 --lr 1e-2 --epochs 100 --wandb

Tip: remove "--wandb" to disable wandb logging if a logging error occurs.
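The three commands above differ only in their hyperparameters, so they can be driven from one small launcher. This is a sketch, not part of the repository: RUNS, build_command, and run_all are hypothetical names, the settings are copied from the commands above, and run_all assumes it is called from the repository root where main.py lives.

```python
import subprocess

# Per-dataset settings copied from the commands above.
RUNS = {
    "cora": {"hid_units": 512, "lr": "1e-2", "epochs": 200},
    "citeseer": {"hid_units": 1536, "lr": "5e-4", "epochs": 200},
    "amazon_photo": {"hid_units": 512, "lr": "1e-2", "epochs": 100},
}


def build_command(dataset: str, device: str = "cuda:0", use_wandb: bool = True) -> list:
    """Return the argv list for one run of main.py."""
    cfg = RUNS[dataset]
    cmd = [
        "python", "main.py",
        "--device", device,
        "--dataset", dataset,
        "--hid_units", str(cfg["hid_units"]),
        "--lr", cfg["lr"],
        "--epochs", str(cfg["epochs"]),
    ]
    if use_wandb:
        cmd.append("--wandb")  # omit this flag if wandb logging fails
    return cmd


def run_all(device: str = "cuda:0", use_wandb: bool = False) -> None:
    """Run every configuration in turn, stopping at the first failing run."""
    for dataset in RUNS:
        cmd = build_command(dataset, device, use_wandb)
        print(" ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling run_all() with the default use_wandb=False follows the tip above and skips wandb logging entirely.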

Results

Table 1. Clustering performance (%) of our method and fourteen state-of-the-art baselines. The bold and underlined values are the best and the runner-up results, respectively. “OOM” indicates that the method raises an out-of-memory failure. “-” denotes that the method does not converge.
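Cluster IDs produced by an unsupervised method are arbitrary, so clustering accuracy is usually computed after finding the best one-to-one mapping between predicted clusters and ground-truth classes; the munkres package in the requirements suggests Hungarian matching is used for this. A minimal sketch, assuming that definition of accuracy and using SciPy's linear_sum_assignment in place of munkres; clustering_accuracy is a hypothetical name, and the result is a fraction (multiply by 100 for the percentages in Table 1).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def clustering_accuracy(y_true, y_pred) -> float:
    """Accuracy after the best one-to-one mapping of cluster IDs to class labels."""
    y_true = np.asarray(y_true, dtype=np.int64)
    y_pred = np.asarray(y_pred, dtype=np.int64)
    n = int(max(y_true.max(), y_pred.max())) + 1
    # counts[i, j] = number of nodes placed in cluster i whose true class is j
    counts = np.zeros((n, n), dtype=np.int64)
    np.add.at(counts, (y_pred, y_true), 1)
    # Hungarian matching: maximize matched counts by minimizing their negation
    rows, cols = linear_sum_assignment(-counts)
    return float(counts[rows, cols].sum() / y_true.size)


acc = clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])
```

Here the prediction is a pure relabeling of the ground truth, so the matched accuracy is 1.0 even though no raw IDs agree except for class 2.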

Figure 1. t-SNE visualization of seven methods on the Cora dataset.
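Figure 1 projects learned node embeddings to two dimensions with t-SNE. A sketch of that projection using scikit-learn (already in the requirements), assuming the embeddings are available as an (N, d) NumPy array; tsne_2d is a hypothetical helper, how the repository extracts embeddings is not shown, and plotting (for example with matplotlib, which is not in the requirements) is left out.

```python
import numpy as np
from sklearn.manifold import TSNE


def tsne_2d(embeddings: np.ndarray, perplexity: float = 30.0, seed: int = 0) -> np.ndarray:
    """Project (N, d) node embeddings to (N, 2) coordinates with t-SNE.

    perplexity must be smaller than N; a fixed seed makes the layout repeatable.
    """
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(embeddings)


rng = np.random.default_rng(0)
emb = rng.normal(size=(60, 16))        # stand-in for learned node embeddings
coords = tsne_2d(emb, perplexity=10.0)
```

The resulting coordinates are typically drawn as a scatter plot colored by predicted or ground-truth cluster.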

Acknowledgements

Our code is partly based on the following GitHub repositories. Thanks for their awesome work.

  • Awesome Deep Graph Clustering: a collection of deep graph clustering (papers, codes, and datasets).
  • Graph-Group-Discrimination: the official implementation of the Graph Group Discrimination (GGD) model.
  • S3GC: the official implementation of the Scalable Self-Supervised Graph Clustering (S3GC) model.
  • HSAN: the official implementation of the Hard Sample Aware Network (HSAN) model.
  • DCRN: the official implementation of the Dual Correlation Reduction Network (DCRN) model.

Citations

If you find this repository helpful, please cite our paper.

@inproceedings{Dink-Net,
  title={Dink-Net: Neural Clustering on Large Graphs},
  author={Liu, Yue and Liang, Ke and Xia, Jun and Zhou, Sihang and Yang, Xihong and Liu, Xinwang and Li, Stan Z.},
  booktitle={International Conference on Machine Learning},
  year={2023},
  organization={PMLR}
}
