PyGCL: A PyTorch Library for Graph Contrastive Learning

Home Page: https://PyGCL.readthedocs.io

License: Apache License 2.0

Python 100.00%
contrastive-learning graph-representation-learning machine-learning graph-contrastive-learning

pygcl's Introduction


PyGCL is a PyTorch-based open-source Graph Contrastive Learning (GCL) library, which features modularized GCL components from published papers, standardized evaluation, and experiment management.



What is Graph Contrastive Learning?

Graph Contrastive Learning (GCL) establishes a new paradigm for learning graph representations without human annotations. A typical GCL algorithm first constructs multiple graph views via stochastic augmentation of the input and then learns representations by contrasting positive samples against negative ones.

👉 For a general introduction to GCL, please refer to our paper and blog. Also, this repo tracks newly published GCL papers.

Install

Prerequisites

PyGCL needs the following packages to be installed beforehand:

  • Python 3.8+
  • PyTorch 1.9+
  • PyTorch-Geometric 1.7+
  • DGL 0.7+
  • Scikit-learn 0.24+
  • Numpy
  • tqdm
  • NetworkX

Installation via PyPI

To install PyGCL with pip, simply run:

pip install PyGCL

Then, you can import GCL from your current environment.
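A quick way to check the installation is to import the main submodules (a minimal sketch; these are the same modules used by the examples later in this README):

import GCL.augmentors as A            # graph augmentation functions
import GCL.losses as L                # contrastive objectives
from GCL.eval import get_split        # dataset split utility for evaluation
from GCL.models import DualBranchContrast  # a contrasting architecture

print(A.EdgeRemoving, L.InfoNCE, get_split, DualBranchContrast)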

A note regarding DGL

Currently, the DGL team maintains two packages: dgl for CPU support and dgl-cu*** for CUDA support. Since pip treats them as different packages, it is hard for PyGCL to check the version requirement of dgl. We have therefore removed the dgl dependency check from our setup configuration and require users to install a proper version themselves.

Package Overview

PyGCL implements four main components of graph contrastive learning algorithms:

  • Graph augmentation: transforms input graphs into congruent graph views.
  • Contrasting architectures and modes: generate positive and negative pairs according to node and graph embeddings.
  • Contrastive objectives: compute likelihood scores for positive and negative pairs.
  • Negative mining strategies: improve the negative sample set by considering the relative similarity (hardness) of negative samples.

We also implement utilities for training models, evaluating model performance, and managing experiments.

Implementations and Examples

For a quick start, please check out the examples folder. We currently implemented the following methods:

  • DGI (P. Veličković et al., Deep Graph Infomax, ICLR, 2019) [Example1, Example2]
  • InfoGraph (F.-Y. Sun et al., InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization, ICLR, 2020) [Example]
  • MVGRL (K. Hassani et al., Contrastive Multi-View Representation Learning on Graphs, ICML, 2020) [Example1, Example2]
  • GRACE (Y. Zhu et al., Deep Graph Contrastive Representation Learning, GRL+@ICML, 2020) [Example]
  • GraphCL (Y. You et al., Graph Contrastive Learning with Augmentations, NeurIPS, 2020) [Example]
  • SupCon (P. Khosla et al., Supervised Contrastive Learning, NeurIPS, 2020) [Example]
  • HardMixing (Y. Kalantidis et al., Hard Negative Mixing for Contrastive Learning, NeurIPS, 2020)
  • DCL (C.-Y. Chuang et al., Debiased Contrastive Learning, NeurIPS, 2020)
  • HCL (J. Robinson et al., Contrastive Learning with Hard Negative Samples, ICLR, 2021)
  • Ring (M. Wu et al., Conditional Negative Sampling for Contrastive Learning of Visual Representations, ICLR, 2021)
  • Exemplar (N. Zhao et al., What Makes Instance Discrimination Good for Transfer Learning?, ICLR, 2021)
  • BGRL (S. Thakoor et al., Bootstrapped Representation Learning on Graphs, arXiv, 2021) [Example1, Example2]
  • G-BT (P. Bielak et al., Graph Barlow Twins: A Self-Supervised Representation Learning Framework for Graphs, arXiv, 2021) [Example]
  • VICReg (A. Bardes et al., VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning, arXiv, 2021)

Building Your Own GCL Algorithms

Besides trying the above examples for node and graph classification tasks, you can also straightforwardly build your own graph contrastive learning algorithms.

Graph Augmentation

In GCL.augmentors, PyGCL provides the Augmentor base class, which offers a universal interface for graph augmentation functions. Specifically, PyGCL implements the following augmentation functions:

| Augmentation | Class name |
| --- | --- |
| Edge Adding (EA) | EdgeAdding |
| Edge Removing (ER) | EdgeRemoving |
| Feature Masking (FM) | FeatureMasking |
| Feature Dropout (FD) | FeatureDropout |
| Edge Attribute Masking (EAR) | EdgeAttrMasking |
| Personalized PageRank (PPR) | PPRDiffusion |
| Markov Diffusion Kernel (MDK) | MarkovDiffusion |
| Node Dropping (ND) | NodeDropping |
| Node Shuffling (NS) | NodeShuffling |
| Subgraphs induced by Random Walks (RWS) | RWSampling |
| Ego-net Sampling (ES) | Identity |

Calling one of these augmentation functions on a Graph, given as a tuple of node features, edge index, and edge features (x, edge_index, edge_attrs), produces the corresponding augmented graph.
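For instance, applying a single augmentor to a toy graph looks like the following sketch (x and edge_index are placeholder tensors made up for illustration):

import torch
import GCL.augmentors as A

x = torch.randn(10, 128)                    # node features
edge_index = torch.randint(0, 10, (2, 40))  # edge index in COO format

aug = A.EdgeRemoving(pe=0.3)
# Calling the augmentor returns the augmented (x, edge_index, edge_attrs) tuple.
x_aug, edge_index_aug, edge_attr_aug = aug(x, edge_index)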

Composite Augmentations

PyGCL supports composing arbitrary numbers of augmentations together. To compose a list of augmentation instances augmentors, you need to use the Compose class:

import GCL.augmentors as A

aug = A.Compose([A.EdgeRemoving(pe=0.3), A.FeatureMasking(pf=0.3)])

You can also use the RandomChoice class to randomly draw a few augmentations each time:

import GCL.augmentors as A

aug = A.RandomChoice([A.RWSampling(num_seeds=1000, walk_length=10),
                      A.NodeDropping(pn=0.1),
                      A.FeatureMasking(pf=0.1),
                      A.EdgeRemoving(pe=0.1)],
                     num_choices=1)

Customizing Your Own Augmentation

You can write your own augmentation functions by inheriting the base Augmentor class and defining the augment function.
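As a rough sketch (assuming, as the Identity snippet quoted in the issues below suggests, that Augmentor and Graph can be imported from GCL.augmentors and that augment receives and returns a Graph), a custom augmentor that randomly picks one of two edge-removal rates could look like:

import random

import GCL.augmentors as A
from GCL.augmentors import Augmentor, Graph  # assumed import path, per the Identity snippet

class RandomEdgeDrop(Augmentor):
    # Toy custom augmentor: delegate to one of two built-in EdgeRemoving
    # augmentors, chosen at random, instead of touching Graph internals.
    def __init__(self, pe_low: float = 0.1, pe_high: float = 0.4):
        super(RandomEdgeDrop, self).__init__()
        self.low = A.EdgeRemoving(pe=pe_low)
        self.high = A.EdgeRemoving(pe=pe_high)

    def augment(self, g: Graph) -> Graph:
        chosen = self.low if random.random() < 0.5 else self.high
        return chosen.augment(g)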

Contrasting Architectures and Modes

Existing GCL architectures can be grouped into two categories: negative-sample-based methods and negative-sample-free ones.

  • Negative-sample-based approaches can have either a single branch or two branches. In single-branch contrasting, we only need to construct one graph view and perform contrastive learning within this view. In dual-branch models, we generate two graph views and perform contrastive learning within and across views.
  • Negative-sample-free approaches eschew the need for explicit negative samples. Currently, PyGCL supports bootstrap-style contrastive learning as well as contrastive learning within embeddings (such as Barlow Twins and VICReg).
| Contrastive architectures | Supported contrastive modes | Need negative samples | Class name | Examples |
| --- | --- | --- | --- | --- |
| Single-branch contrasting | G2L only | Yes | SingleBranchContrast | DGI, InfoGraph |
| Dual-branch contrasting | L2L, G2G, and G2L | Yes | DualBranchContrast | GRACE |
| Bootstrapped contrasting | L2L, G2G, and G2L | No | BootstrapContrast | BGRL |
| Within-embedding contrasting | L2L and G2G | No | WithinEmbedContrast | GBT |

Moreover, you can use add_extra_mask if you want to add positives or remove negatives. This function performs bitwise ADD to extra positive masks specified by extra_pos_mask and bitwise OR to extra negative masks specified by extra_neg_mask. It is helpful, for example, when you have supervision signals from labels and want to train the model in a semi-supervised manner.

Internally, PyGCL calls Sampler classes in GCL.models that receive embeddings and produce positive/negative masks. PyGCL implements three contrasting modes: (a) Local-Local (L2L), (b) Global-Global (G2G), and (c) Global-Local (G2L). L2L and G2G contrast embeddings at the same scale, while G2L performs cross-scale contrasting. To implement your own GCL model, you may also use these provided sampler classes:

| Contrastive modes | Class name |
| --- | --- |
| Same-scale contrasting (L2L and G2G) | SameScaleSampler |
| Cross-scale contrasting (G2L) | CrossScaleSampler |
  • For L2L and G2G, embedding pairs of the same node/graph in different views constitute positive pairs. You can refer to GRACE and GraphCL for examples.
  • For G2L, node-graph embedding pairs form positives. Note that for single-graph datasets, the G2L mode requires explicit negative sampling (otherwise no negatives for contrasting). You can refer to DGI for an example.
  • Some models (e.g., GRACE) add extra intra-view negative samples. You may manually call sampler.add_intraview_negs to enlarge the negative sample set.
  • Note that the bootstrapping latent model involves some special model design (asymmetric online/offline encoders and momentum weight updates). You may refer to BGRL for details.
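Putting these pieces together, a dual-branch graph-to-graph (G2G) setup might look like the following sketch (condensed from the GraphCL-style example quoted in the issues below; the encoder, the two graph-level embeddings g1 and g2, and data.batch are assumed to come from your own model and data loader):

import GCL.losses as L
import GCL.augmentors as A
from GCL.models import DualBranchContrast

aug1 = A.EdgeRemoving(pe=0.1)
aug2 = A.FeatureMasking(pf=0.1)

# Dual-branch, graph-to-graph contrasting with an InfoNCE objective.
contrast_model = DualBranchContrast(loss=L.InfoNCE(tau=0.2), mode='G2G')

# Inside the training loop: g1 and g2 are graph-level embeddings of the two
# augmented views produced by your encoder; data.batch maps nodes to graphs.
# loss = contrast_model(g1=g1, g2=g2, batch=data.batch)
# loss.backward()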

Contrastive Objectives

In GCL.losses, PyGCL implements the following contrastive objectives:

| Contrastive objectives | Class name |
| --- | --- |
| InfoNCE loss | InfoNCE |
| Jensen-Shannon Divergence (JSD) loss | JSD |
| Triplet Margin (TM) loss | Triplet |
| Bootstrapping Latent (BL) loss | BootstrapLatent |
| Barlow Twins (BT) loss | BarlowTwins |
| VICReg loss | VICReg |

All these objectives can contrast arbitrary positive and negative pairs, except for the Barlow Twins and VICReg losses, which perform contrastive learning within embeddings. Moreover, for the InfoNCE and Triplet losses, we further provide SP variants that compute the contrastive objective given only one positive pair per sample, to speed up computation and avoid excessive memory consumption.

Negative Sampling Strategies

PyGCL further implements several negative sampling strategies:

| Negative sampling strategies | Class name |
| --- | --- |
| Subsampling | GCL.models.SubSampler |
| Hard negative mixing | GCL.models.HardMixing |
| Conditional negative sampling | GCL.models.Ring |
| Debiased contrastive objective | GCL.losses.DebiasedInfoNCE, GCL.losses.DebiasedJSD |
| Hardness-biased negative sampling | GCL.losses.HardnessInfoNCE, GCL.losses.HardnessJSD |

The former three serve as an additional sampling step, similar to the existing Sampler classes, and can be used in conjunction with any objective. The last two are only available for the InfoNCE and JSD losses.

Utilities

PyGCL provides a variety of evaluator functions to evaluate the embedding quality:

| Evaluator | Class name |
| --- | --- |
| Logistic regression | LREvaluator |
| Support vector machine | SVMEvaluator |
| Random forest | RFEvaluator |

To use these evaluators, you first need to generate dataset splits by get_split (random split) or by from_predefined_split (according to preset splits).
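For instance, the evaluation step in the examples roughly follows this pattern (a sketch adapted from the GraphCL example quoted in the issues below; x and y are the learned embeddings and ground-truth labels):

from GCL.eval import get_split, SVMEvaluator

# x: embeddings of shape (num_samples, dim); y: labels, as produced by your encoder.
split = get_split(num_samples=x.size(0), train_ratio=0.8, test_ratio=0.1)
result = SVMEvaluator(linear=True)(x, y, split)
print(result)  # e.g. micro/macro F1 scores, as reported in the examples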

Contribution

Feel free to open an issue should you find anything unexpected or create pull requests to add your own work! We are motivated to continuously make PyGCL even better.

Citation

Please cite our paper if you use this code in your own work:

@article{Zhu:2021tu,
author = {Zhu, Yanqiao and Xu, Yichen and Liu, Qiang and Wu, Shu},
title = {{An Empirical Study of Graph Contrastive Learning}},
journal = {arXiv.org},
year = {2021},
eprint = {2109.01116v1},
eprinttype = {arxiv},
eprintclass = {cs.LG},
month = sep,
}

pygcl's People

Contributors

dependabot[bot], dongkwan-kim, linyxus, sxkdz, zlpure


pygcl's Issues

Could not find dgl>=0.7 when installing PyGCL

Hello pygcl team, thanks for your excellent work!

It seems that the latest version of DGL is 0.6.1 when I install it following the official instructions. I then get the error "Could not find dgl>=0.7" when I try to install PyGCL. Could you please give me some suggestions to fix this issue?

what is your version of pl_bolts

Dear authors,
Thank you for your code. I have a problem with the GBT implementation related to pl_bolts.

Using backend: pytorch
Traceback (most recent call last):
File "GBT.py", line 13, in <module>
from pl_bolts.optimizers import LinearWarmupCosineAnnealingLR
File "/usr/local/lib/python3.7/dist-packages/pl_bolts/__init__.py", line 19, in <module>
from pl_bolts import ( # noqa: E402
File "/usr/local/lib/python3.7/dist-packages/pl_bolts/datamodules/__init__.py", line 5, in <module>
from pl_bolts.datamodules.experience_source import DiscountedExperienceSource, ExperienceSource, ExperienceSourceDataset
File "/usr/local/lib/python3.7/dist-packages/pl_bolts/datamodules/experience_source.py", line 24, in <module>
class ExperienceSourceDataset(IterableDataset):
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_typing.py", line 273, in __new__
return super().__new__(cls, name, bases, namespace, **kwargs) # type: ignore[call-overload]
File "/usr/lib/python3.7/abc.py", line 126, in __new__
cls = super().__new__(mcls, name, bases, namespace, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_typing.py", line 371, in _dp_init_subclass
", but found {}".format(sub_cls.__name__, _type_repr(hints['return'])))
TypeError: Expected 'Iterator' as the return annotation for __iter__ of ExperienceSourceDataset, but found typing.Iterable

Some questions about the repo.

Excellent work. It will be helpful to the development of graph contrastive learning.
Here are some questions:

  1. Can you specify the papers referenced by the negative sampling strategies?
  2. Can you report experimental results for the methods in the examples folder?

issue caused by ordered augmentor

Just to bring a little problem:

Augmentor.Compose([aug1, aug2, ...]) applies all augmentation operations in order, yet some of the ops are opposites of each other. For example, if we apply EdgeRemoving and then EdgeAdding, some edges may be removed and then added back, so the two operations might cancel each other out.

Although this problem might not appear very often, it does exist.

ConvergenceWarning: Liblinear failed to converge

When I use the code in your GraphCL example without any modification, I get "ConvergenceWarning: Liblinear failed to converge, increase the number of iterations". Increasing the number of iterations did not help. May I ask why?

Running PyGCL algos on own dataset

Is there any way to run PyGCL on my own dataset?
Which files should I change and update? I have a dataset in NetworkX format.

Thanks in advance.

Installation problem for people in China

I have an installation issue. I have already installed dgl manually, but it looks like you have a requirements file that forces installing a different version or something.
Note: I am installing via a pip mirror because I am in China. I tried both the USTC and Tsinghua mirrors and both end up with the same error.

The latest DGL version is 0.6.1 as per the screenshot below, while PyGCL requires 0.7?
I already installed the latest version as shown in the error.
pip install -i https://mirrors.aliyun.com/pypi/simple/ dgl==0.7a210527

What am I doing wrong here?

[screenshots of the pip error]

Is there any way to read the documentation?

Thanks for open-sourcing such a good contribution.
I wrote some test code like this:

import torch
import GCL.augmentors as A
aug=A.Compose([A.EdgeAdding(pe=0.4),A.FeatureDropout(pf=0.5)])
edge_index=torch.randint(0,10,(2,10)) 
x=torch.randn((10,128))
auged=aug(x,edge_index)
print(auged)

and it got this:

num_edges = edge_index.size()[1]
IndexError: tuple index out of range

It would be appreciated if anyone could help tell me how to read the documentation.

After checking the code of the add_edge function in functional.py, I think there could be a little problem with this piece of code:

def add_edge(edge_index: torch.Tensor, ratio: float) -> torch.Tensor:
    num_edges = edge_index.size()[1]
    num_nodes = edge_index.max().item() + 1
    num_add = int(num_edges * ratio)

    new_edge_index = torch.randint(0, num_nodes - 1, size=(2, num_add)).to(edge_index.device)
    edge_index = torch.cat([edge_index, new_edge_index], dim=1)
    
    # This might be wrongly written: the [0] might need to be removed.
    edge_index = sort_edge_index(edge_index)[0]

    return coalesce_edge_index(edge_index)[0]

The augmentation "add edge"

Hi! I found that the augmentation "add edge" may not support mini-batch graph-level contrastive learning. In my understanding, edges should be added within each graph in the mini-batch case. Could you please check this?

About the training data split

[screenshot]
Hello, in your GCL.eval, in the part that splits the training and test sets, shouldn't the parts I marked in red, i.e. val and test, have their order swapped?

Negative mining strategies

Hello, regarding negative sampling: is there detailed documentation or an example? I could not find a negative sampling example in the examples folder...

About the Develop branch

Hello, I see that you have pushed a new Develop branch. How can I update my PyGCL to the new version on the Develop branch?

Multi-network contrastive learning

All the examples are for contrastive learning on a single network. May I ask whether multi-network contrastive learning will be added, such as "Contrastive Multi-View Multiplex Network Embedding with Applications to Robust Network Alignment"?

About the "edge_weight" in the "PPRDiffusion" augmentor

Thanks for open-sourcing.

When I read the example "MVGRL_graph.py", I found that only the augmented 'edge_index2' and 'x2' were used as input to 'gcn2'. Why is the augmented 'edge_weight2' not passed to 'gcn2'?

It would be appreciated if anyone could help tell me the reason.

label leakage

z, _, _ = encoder_model(data)

In supervised contrastive learning, nodes with the same labels are collected as positive pairs. So in my opinion, in the contrastive pretraining stage, we can't use all the data; otherwise the labels are leaked into the tuning stage.

EdgeRemoving

Awesome work! But I am confused about A.EdgeRemoving(pe=0.3): does pe=0.3 mean keep 30% of the edges or drop 30%?

Overflow encountered at GCL.augmentors.functional.random_walk_subgraph

Hi! I found that when I use A.RWSampling(), the augmentor sometimes crashes with:

python3.8/site-packages/torch_geometric/utils/subgraph.py", line 40, in subgraph
    n_mask[subset] = 1
IndexError: index 4542161131129163139 is out of bounds for dimension 0 with size 1698

Then I dove into the random_walk_subgraph function; the node_idx here is the parameter passed into subgraph as subset, which causes the index overflow (some indices in node_idx are above the upper limit).

def random_walk_subgraph(edge_index: torch.LongTensor, edge_weight: Optional[torch.FloatTensor] = None, batch_size: int = 1000, length: int = 10):
    num_nodes = edge_index.max().item() + 1

    row, col = edge_index
    adj = SparseTensor(row=row, col=col, sparse_sizes=(num_nodes, num_nodes))

    start = torch.randint(0, num_nodes, size=(batch_size, ), dtype=torch.long).to(edge_index.device)
    node_idx = adj.random_walk(start.flatten(), length).view(-1)
    edge_index, edge_weight = subgraph(node_idx, edge_index, edge_weight)

However, I had trouble working out how adj.random_walk generates the wrong node_idx, so I added a checker myself:

node_idx = adj.random_walk(start.flatten(), length).view(-1)
for index, value in enumerate(node_idx):
    if value >= edge_index.max().item() + 1 or value < 0:
        node_idx[index] = random.randint(0, edge_index.max().item())
edge_index, edge_weight = subgraph(node_idx, edge_index, edge_weight)

But this does not seem like a proper solution. Could you tell me why this problem happens, or whether there is anything wrong with adj.random_walk?

Thanks!

torch 1.10.0+cu113
torch-cluster 1.6.1
torch-geometric 1.7.0
torch-scatter 2.1.1
torch-sparse 0.6.13
torch-spline-conv 1.2.2
cuda11.3

@SXKDZ @Linyxus @dongkwan-kim @zlpure @AzureLeon1

The augmentation "Node Dropping"

Hi, I'm curious about the "Node Dropping" augmentation. I find that both your implementation and the code published by the authors of GraphCL only isolate the selected nodes, but don't remove them from the node feature matrix. In this situation, when we do a graph classification task and use operations like summation, the isolated nodes will still have an impact on the final learned representation. So, shouldn't we remove the selected nodes from the feature matrix, or is this a standard for graph augmentation?


Can PyGCL be used with torch_geometric.data.HeteroData?

Hi, PyGCL is really useful. I want to know if it is possible to extend it to torch_geometric.data.HeteroData. I scanned the code and my guess is that it cannot yet. Do you plan to support this in the future?

Why is there no batch size implementation?

Hello, in your implementation of GRACE I could not find a batch size option, so I am unable to run GRACE on DBLP and PubMed with a 24 GB GPU.

Any solution or suggestion for that?

Eval.py get_split error?

Hi,

I found an issue with get_split() in eval.py. I think the indices of valid and test should be switched. For example, the correct version would be:

def get_split(num_samples: int, train_ratio: float = 0.1, test_ratio: float = 0.8):
    assert train_ratio + test_ratio < 1
    train_size = int(num_samples * train_ratio)
    test_size = int(num_samples * test_ratio)
    indices = torch.randperm(num_samples)
    return {
        'train': indices[:train_size],
        'test': indices[train_size: test_size + train_size],
        'valid': indices[test_size + train_size:]
    }

Thanks!

Is it possible to use BGRL or BarlowTwins in the G2G setting?

I would like to implement the BGRL and or BarlowTwins in the G2G setting as my downstream task involves graph-level predictions. However, in the papers, these algorithms appear to be developed for node-level tasks. I see in the readme that BarlowTwins at least should be usable in the G2G setting, but I'm not sure how to implement it. Could you provide some guidance?

About the installation

I have installed dgl-cu11 0.7.2. When I installed PyGCL, I was prompted with "error: no matching distribution found for DGL >= 0.7 (from pygcl)". Do I have to install DGL without CUDA?

About the Identity() augmentation method

class Identity(Augmentor):
    def __init__(self):
        super(Identity, self).__init__()

    def augment(self, g: Graph) -> Graph:
        return g

This method returns the original graph. May I ask why the original graph is regarded as an ego-graph?
Thanks for the explanation.

About different results running the same code

Though I have fixed the seed, the results from running the same code are different. Do you have any ideas about that?

random.seed(0)
np.random.seed(0)
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

Can't find trial.py

Thanks for open-sourcing.

When I read the "A.2. Instructions for Reproducing Results in Our Work" section in the paper, it describes how to reproduce the experiments by running trial.py. But I can't find the trial.py file in this repository.

It would be appreciated if anyone could help tell me where to find the file.

About Negative Sampling Strategies

Hello, I am trying to add some negative sampling strategies to GRACE.py. Directly using GCL.losses.DebiasedInfoNCE or GCL.losses.HardnessInfoNCE raises an error. After checking, I found the problem is the dim argument at line 80 in the screenshot; changing line 80 to line 81 makes it run, but the test F1 is very low, around 0.2-0.3. Is there something wrong with my modification? Also, is there more detailed documentation on using the other negative sampling strategies, Ring and HardMixing?
[screenshot]

How to run GraphCL with Node classification tasks?

Dear authors, this is great work in the graph contrastive learning field. But I am encountering two small problems.

  1. When I run GRACE with large datasets such as WikiCS, a CUDA out-of-memory error occurs.
  2. How can I run GraphCL on node classification problems?
    I would be very thankful if you could reply!

JSD loss implementation does not seem to match the formula in your paper

I am having trouble figuring out how the following implementation matches the softplus version of JSD described in equation (4) of Appendix F of your paper. I would really appreciate it if you could provide any clarification.

def compute(self, anchor, sample, pos_mask, neg_mask, *args, **kwargs):
    num_neg = neg_mask.int().sum()
    num_pos = pos_mask.int().sum()
    similarity = self.discriminator(anchor, sample)

    E_pos = (np.log(2) - F.softplus(- similarity * pos_mask)).sum()
    E_pos /= num_pos

    neg_sim = similarity * neg_mask
    E_neg = (F.softplus(- neg_sim) + neg_sim - np.log(2)).sum()
    E_neg /= num_neg

    return E_neg - E_pos


Edge Flipping

Hello dear author(s),
Thank you for sharing your paper's code. May I know where the edge flipping code sections are? Best

Bug: a torch_geometric installed through other channels is not detected

Background

  • The current setup.cfg requires installing torch_geometric:

    install_requires =
      torch >= 1.9
      torch-geometric >= 1.7
      numpy
      tqdm
      scipy
      networkx
      scikit-learn
  • When torch_geometric is installed via pip or conda, the local package name may not be "torch_geometric":

    • ✅ the stable version installed via pip is named "torch_geometric";
    • ❌ the stable version installed via conda is named "pyg";
    • ❌ the nightly version installed via pip is named "pyg-nightly".
  • Part of my local environment, for reference:

    pytorch 2.3.0 py3.12_cuda12.1_cudnn8.9.2_0 pytorch
    pytorch-cuda 12.1 ha16c6d3_5 pytorch
    pytorch-lightning 2.2.2 pyhd8ed1ab_0 conda-forge
    pytorch-mutex 1.0 cuda pytorch
    torch-cluster 1.6.3 pypi_0 pypi
    torch-scatter 2.1.2+pt23cu121 pypi_0 pypi
    torch-sparse 0.6.18+pt23cu121 pypi_0 pypi
    torch-spline-conv 1.2.2+pt23cu121 pypi_0 pypi
    torchaudio 2.3.0 py312_cu121 pytorch
    torchdiffeq 0.2.2 pyhd8ed1ab_0 conda-forge
    torchmetrics 1.4.0.post0 pyhd8ed1ab_0 conda-forge
    pyg-lib 0.4.0+pt23cu121 pypi_0 pypi
    pyg-nightly 2.6.0.dev20240710 pypi_0 pypi
    pygments 2.15.1 py312h06a4308_1 defaults

Result

In my environment where pyg-nightly was installed via pip, installing PyGCL would require installing torch_geometric again via pip:

pip install pygcl --dry-run
Collecting pygcl
...
Would install PyGCL-0.1.2 torch_geometric-2.5.3

About reproducing previous models

Hello, I reproduced GRACE using your source code and adjusted all hyperparameters to match the original code, but the results are worse than the original implementation. Have you noticed this issue?

feature dropout

Hi, thank you for your code.

return F.dropout(x, p=1. - drop_prob)

May I know why you set the F.dropout probability to 1. - drop_prob instead of the original drop_prob? I.e., why is the code here not return F.dropout(x, p=drop_prob)? Thanks.

Node representation for batch

Hi, thank you for open-sourcing your code.
I wonder how I can obtain node-level representations when using 'batch'.

For example, when I use the GraphCL code, z's dimension is ((batch_size * number of nodes) x representation dimension) and g's dimension is (batch_size x representation dimension).
However, I want to obtain z as (batch_size x number of nodes x representation dimension). It is hard to use the 'contrast model' provided in the examples because z is 3D.

It would be appreciated if you could help with this issue :)

About A.EdgeAdding()

Hi, I tested all of the augmentation methods in the GraphCL example and they all work fine.
Only aug1 = A.EdgeAdding(pe=0.1) seems to have a problem.
Could you take a look?

import torch
import os.path as osp
import GCL.losses as L
import GCL.augmentors as A
import torch.nn.functional as F

from torch import nn
from tqdm import tqdm
from torch.optim import Adam
from GCL.eval import get_split, SVMEvaluator
from GCL.models import DualBranchContrast
from torch_geometric.nn import GINConv, global_add_pool
from torch_geometric.data import DataLoader
from torch_geometric.datasets import TUDataset


def make_gin_conv(input_dim, out_dim):
    return GINConv(nn.Sequential(nn.Linear(input_dim, out_dim), nn.ReLU(), nn.Linear(out_dim, out_dim)))


class GConv(nn.Module):
    def __init__(self, input_dim, hidden_dim, num_layers):
        super(GConv, self).__init__()
        self.layers = nn.ModuleList()
        self.batch_norms = nn.ModuleList()

        for i in range(num_layers):
            if i == 0:
                self.layers.append(make_gin_conv(input_dim, hidden_dim))
            else:
                self.layers.append(make_gin_conv(hidden_dim, hidden_dim))
            self.batch_norms.append(nn.BatchNorm1d(hidden_dim))

        project_dim = hidden_dim * num_layers
        self.project = torch.nn.Sequential(
            nn.Linear(project_dim, project_dim),
            nn.ReLU(inplace=True),
            nn.Linear(project_dim, project_dim))

    def forward(self, x, edge_index, batch):
        z = x
        zs = []
        for conv, bn in zip(self.layers, self.batch_norms):
            z = conv(z, edge_index)
            z = F.relu(z)
            z = bn(z)
            zs.append(z)
        gs = [global_add_pool(z, batch) for z in zs]
        z, g = [torch.cat(x, dim=1) for x in [zs, gs]]
        return z, g


class Encoder(torch.nn.Module):
    def __init__(self, encoder, augmentor):
        super(Encoder, self).__init__()
        self.encoder = encoder
        self.augmentor = augmentor

    def forward(self, x, edge_index, batch):
        aug1, aug2 = self.augmentor
        x1, edge_index1, edge_weight1 = aug1(x, edge_index)
        x2, edge_index2, edge_weight2 = aug2(x, edge_index)
        z, g = self.encoder(x, edge_index, batch)
        z1, g1 = self.encoder(x1, edge_index1, batch)
        z2, g2 = self.encoder(x2, edge_index2, batch)
        return z, g, z1, z2, g1, g2


def train(encoder_model, contrast_model, dataloader, optimizer):
    encoder_model.train()
    epoch_loss = 0
    for data in dataloader:
        data = data.to('cuda')
        optimizer.zero_grad()

        if data.x is None:
            num_nodes = data.batch.size(0)
            data.x = torch.ones((num_nodes, 1), dtype=torch.float32, device=data.batch.device)

        _, _, _, _, g1, g2 = encoder_model(data.x, data.edge_index, data.batch)
        g1, g2 = [encoder_model.encoder.project(g) for g in [g1, g2]]
        loss = contrast_model(g1=g1, g2=g2, batch=data.batch)
        loss.backward()
        optimizer.step()

        epoch_loss += loss.item()
    return epoch_loss


def test(encoder_model, dataloader):
    encoder_model.eval()
    x = []
    y = []
    for data in dataloader:
        data = data.to('cuda')
        if data.x is None:
            num_nodes = data.batch.size(0)
            data.x = torch.ones((num_nodes, 1), dtype=torch.float32, device=data.batch.device)
        _, g, _, _, _, _ = encoder_model(data.x, data.edge_index, data.batch)
        x.append(g)
        y.append(data.y)
    x = torch.cat(x, dim=0)
    y = torch.cat(y, dim=0)

    split = get_split(num_samples=x.size()[0], train_ratio=0.8, test_ratio=0.1)
    result = SVMEvaluator(linear=True)(x, y, split)
    return result


def main():
    device = torch.device('cuda')
    path = osp.join(osp.expanduser('~'), 'datasets')
    dataset = TUDataset(path, name='PTC_MR')
    dataloader = DataLoader(dataset, batch_size=128)
    input_dim = max(dataset.num_features, 1)

    aug1 = A.EdgeAdding(pe=0.1)  # <- the augmentor in question
    aug2 = A.EdgeAdding(pe=0.1)  # <- the augmentor in question

    gconv = GConv(input_dim=input_dim, hidden_dim=32, num_layers=2).to(device)
    encoder_model = Encoder(encoder=gconv, augmentor=(aug1, aug2)).to(device)
    contrast_model = DualBranchContrast(loss=L.InfoNCE(tau=0.2), mode='G2G').to(device)

    optimizer = Adam(encoder_model.parameters(), lr=0.01)

    with tqdm(total=100, desc='(T)') as pbar:
        for epoch in range(1, 101):
            loss = train(encoder_model, contrast_model, dataloader, optimizer)
            pbar.set_postfix({'loss': loss})
            pbar.update()

    test_result = test(encoder_model, dataloader)
    print(f'(E): Best test F1Mi={test_result["micro_f1"]:.4f}, F1Ma={test_result["macro_f1"]:.4f}')


if __name__ == '__main__':
    main()

PyGCL installation not finding DGL

I have tried installing PyGCL with multiple versions of DGL >= 0.7.
I am running on CentOS 7 in an HPC environment.
Here is the error I am receiving:
ERROR: Could not find a version that satisfies the requirement dgl>=0.7 (from PyGCL)

Steps to reproduce:

spaulus@test:~$ pip3.8 install --user PyGCL
Collecting PyGCL
  Using cached PyGCL-0.1.1-py3-none-any.whl (32 kB)
Requirement already satisfied: scipy in /opt/ohpc/pub/spack/opt/spack/linux-centos7-x86_64/gcc-9.2.0/py-scipy-1.6.3-pz52lt4qo22xudj5rxfswm3ohjaed2t5/lib/python3.8/site-packages (from PyGCL) (1.6.3)
ERROR: Could not find a version that satisfies the requirement dgl>=0.7 (from PyGCL) (from versions: 0.1.0, 0.1.2, 0.1.3, 0.4.3, 0.4.3.post1, 0.4.3.post2, 0.5.0, 0.5.1, 0.5.2, 0.5.3, 0.6.0, 0.6.0.post1, 0.6.1, 0.7a210406, 0.7a210407, 0.7a210408, 0.7a210409, 0.7a210410, 0.7a210412, 0.7a210413, 0.7a210414, 0.7a210415, 0.7a210416, 0.7a210420, 0.7a210421, 0.7a210422, 0.7a210423, 0.7a210424, 0.7a210425, 0.7a210426, 0.7a210427, 0.7a210429, 0.7a210501, 0.7a210503, 0.7a210506, 0.7a210507, 0.7a210508, 0.7a210511, 0.7a210512, 0.7a210513, 0.7a210514, 0.7a210515, 0.7a210517, 0.7a210518, 0.7a210519, 0.7a210520, 0.7a210525, 0.7a210527)
ERROR: No matching distribution found for dgl>=0.7 (from PyGCL)
spaulus@test:~$ pip3.8 list
Package               Version
--------------------- ----------
certifi               2021.10.8
charset-normalizer    2.0.7
decorator             5.1.0
dgl                   0.7a210525
googledrivedownloader 0.4
idna                  3.3
isodate               0.6.0
Jinja2                3.0.2
joblib                1.0.1
MarkupSafe            2.0.1
networkx              2.2
numpy                 1.20.1
pandas                1.3.4
pip                   20.2
pyparsing             3.0.4
python-dateutil       2.8.2
pytz                  2021.3
PyYAML                6.0
rdflib                6.0.2
requests              2.26.0
scikit-learn          0.24.1
scipy                 1.6.3
setuptools            50.3.2
six                   1.16.0
threadpoolctl         2.0.0
torch                 1.10.0
torch-geometric       2.0.2
torch-scatter         2.0.9
torch-sparse          0.6.12
tqdm                  4.59.0
typing-extensions     3.10.0.2
urllib3               1.26.7
yacs                  0.1.8
spaulus@test:~$ module list
Currently Loaded Modules:
  1) autotools                        9) gcc-9.2.0-gcc-8.3.0-ebpgkrt
  2) prun/1.3                        10) cuda-11.2.0-gcc-9.2.0-3fwlgae
  3) gnu8/8.3.0                      11) cudnn-8.0.4.30-11.1-gcc-9.2.0-fyvouhn
  4) ohpc                            12) py-pip-20.2-gcc-9.2.0-d66cbwk
  5) gcc/1                           13) py-scikit-learn-0.24.1-gcc-9.2.0-srlkj6p
  6) slurm/1                         14) py-numpy-1.20.1-gcc-9.2.0-25bs7fj
  7) openmpi/4.1.0                   15) py-tqdm-4.59.0-gcc-9.2.0-jliepte
  8) python-3.8.7-gcc-9.2.0-fn3m3au  16) py-networkx-2.2-gcc-8.3.0-ovwwomc

Any help would be greatly appreciated!
