nodeformer's Introduction

NodeFormer: A Graph Transformer with Linear Complexity

The official implementation for "NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification", which was accepted to NeurIPS 2022 as a spotlight presentation.

Related materials: [paper], [slides], [blog Chinese | English], [video Chinese | English], [tutorial]

Two follow-up works: DIFFormer (ICLR 2023 spotlight), a physics-inspired scalable Transformer, and SGFormer (NeurIPS 2023), a simplified Transformer that scales to billion-scale graphs.

What's new

  • [2022.11.27] We release an early version of the codes for reproducibility.

  • [2023.02.20] We provide the checkpoints of NodeFormer on ogbn-Proteins and Amazon2M (see here for details).

  • [2023.03.08] We add results on cora, citeseer, pubmed with semi-supervised random splits (see here for details).

  • [2023.04.24] Another work, DIFFormer (with linear attention), will appear at ICLR 2023. The open-source implementation is ready.

  • [2023.04.27] We upload the script for figure plotting, plot_main.ipynb, which contains the exact scores used for our figures in the paper.

  • [2023.07.03] I gave a talk at the LOG seminar about scalable graph Transformers. See the online video here.

  • [2023.09.30] A follow-up work, SGFormer (with a simplified one-layer attentional architecture), will appear at NeurIPS 2023. The open-source implementation is ready.

  • [2024.03.04] Gave a 10-min talk summarizing NodeFormer, DIFFormer and SGFormer. See the video.

This work takes an initial step toward exploring Transformer-style graph encoder networks for node classification on large graphs. The model, dubbed NodeFormer, serves as an alternative to common Graph Neural Networks, in particular for encoding the nodes of an input graph into embeddings in a latent space.

The highlights of NodeFormer

NodeFormer is a pioneering Transformer model for node classification on large graphs. It scales all-pair message passing with efficient latent structure learning to graphs with millions of nodes, and it has several merits:

  • All-Pair Message Passing on Layer-specific Adaptive Structures. The feature propagation in each layer operates on a latent graph that potentially connects all the nodes, in contrast with the local propagation of GNNs, which only aggregates the embeddings of neighboring nodes.

  • Linear Complexity w.r.t. the Number of Nodes. The all-pair message passing on jointly optimized latent graphs requires only $O(N)$ complexity, enabled by our proposed kernelized Gumbel-Softmax operator (see the sketch after this list). The largest demonstration in our paper is a graph with 2M nodes, and we believe the model can scale to even larger ones with mini-batch partition.

  • Efficient End-to-End Learning for Latent Structures. The latent structures are optimized end-to-end together with the model, which keeps the whole learning process simple and efficient. For example, training on Cora takes only 1-2 minutes, while OGBN-Proteins takes 1-2 hours per run.

  • Flexibility for Inductive Learning and Graph-Free Scenarios. NodeFormer can handle new, unseen nodes at test time, as well as predictive tasks without input graphs, e.g., image and text classification. It can also be used for interpretability analysis, since the latent interactions among data points are explicitly estimated.
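
To make the linear-complexity point above concrete, here is a minimal, self-contained sketch of the kernel trick behind kernelized softmax attention (a Performer-style positive random-feature map). It only illustrates the general idea; the function and variable names are assumptions and do not reproduce the exact code in nodeformer.py.

import torch

def phi(x, w):
    # Positive random-feature map (Performer-style); an assumed stand-in for the
    # kernel used by kernelized softmax attention.
    proj = x @ w.t()                                                # [N, m]
    return torch.exp(proj - (x ** 2).sum(-1, keepdim=True) / 2) / w.shape[0] ** 0.5

def kernelized_attention(q, k, v, num_features=64):
    # Approximates softmax(q k^T) v without ever forming the N x N attention matrix.
    n, d = q.shape
    w = torch.randn(num_features, d)                                # shared random projections
    q_prime, k_prime = phi(q, w), phi(k, w)                         # [N, m] each
    kv = k_prime.t() @ v                                            # [m, d], costs O(N m d)
    normalizer = q_prime @ k_prime.sum(dim=0, keepdim=True).t()     # [N, 1]
    return (q_prime @ kv) / (normalizer + 1e-6)                     # [N, d], linear in N

q = k = v = torch.randn(1000, 32)
out = kernelized_attention(q, k, v)   # no 1000 x 1000 matrix is ever materialized

The Gumbel-Softmax variant used during training additionally injects Gumbel noise into the implicit attention logits; the sketch above only covers the deterministic softmax kernel.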

Structures of the Codes

The key module of NodeFormer is the kernelized (Gumbel-)Softmax message passing, which achieves all-pair message passing on a latent graph in each layer with $O(N)$ complexity. nodeformer.py implements our model:

  • The functions kernelized_softmax() and kernelized_gumbel_softmax() implement the message passing per layer. The Gumbel version is only used for training.

  • The layer class NodeFormerConv implements one feed-forward layer of NodeFormer (which contains message passing on a latent graph, adding the relational bias, and computing the edge-level regularization loss from the input graph if available).

  • The model class NodeFormer implements the model, which takes standard inputs (node features, adjacency) and produces standard outputs (node predictions, edge loss); a hedged usage sketch follows the file list below.

The other files are described below:

  • main.py is the pipeline for full-graph training/evaluation.

  • main-batch.py is the pipeline for training with random mini-batch partition for large datasets.
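
To give a rough picture of how these pieces fit together, below is a hedged usage sketch of a full-graph forward pass. The call signature model(x, adjs, tau), returning node predictions and an edge-level loss, follows how main.py and main-batch.py invoke the model, but the constructor arguments shown are illustrative placeholders rather than the exact signature in nodeformer.py.

import torch
from nodeformer import NodeFormer  # the model class described above

# NOTE: the constructor arguments below are hypothetical placeholders;
# check nodeformer.py for the real signature and defaults.
model = NodeFormer(in_channels=1433, hidden_channels=32, out_channels=7,
                   num_layers=2, num_heads=4, rb_order=2)

x = torch.randn(2708, 1433)                      # node features [N, d]
edge_index = torch.randint(0, 2708, (2, 10556))  # input graph as a COO edge list
adjs = [edge_index, edge_index]                  # one edge list per relational-bias order

tau = 0.25                                       # Gumbel-Softmax temperature (args.tau)
out, link_loss = model(x, adjs, tau)             # predictions [N, C] + edge-level reg loss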

Datasets

We provide easy access to the datasets we used via Google drive. It also contains other commonly used graph datasets, except the large-scale graphs OGBN-Proteins and Amazon2M, which can be downloaded automatically with our codes. See here for how to get the datasets ready for running our codes.

The information and sources of the datasets are summarized below:

  • Transductive Node Classification (Sec 4.1 in the paper): we use two homophilous graphs, Cora and Citeseer, and two heterophilic graphs, Deezer-Europe and Actor. These graph datasets are all publicly available in PyTorch Geometric. The Deezer dataset is provided by the Non-Homophilous Benchmark, and the Actor (also called Film) dataset is provided by Geom-GCN.

  • Large Graph Datasets (Sec 4.2 in the paper): we use OGBN-Proteins and Amazon2M as two large-scale datasets. These datasets are available from OGB. The original Amazon2M was collected by ClusterGCN and later used to construct OGBN-Products.

  • Graph-Enhanced Classification (Sec 4.3 in the paper): we also consider two datasets without input graphs, i.e., Mini-ImageNet and 20News-Group, for image and text classification, respectively. The Mini-ImageNet dataset is provided by Matching Network, and 20News-Group is available from Scikit-Learn.

Key results

| Dataset | Split | Metric | Result | Hyper-parameters/Checkpoints |
| --- | --- | --- | --- | --- |
| Cora | random 50%/25%/25% | Accuracy | 88.80 (0.26) | train script |
| CiteSeer | random 50%/25%/25% | Accuracy | 76.33 (0.59) | train script |
| Deezer | random 50%/25%/25% | ROC-AUC | 71.24 (0.32) | train script |
| Actor | random 50%/25%/25% | Accuracy | 35.31 (0.89) | train script |
| OGBN-Proteins | public split | ROC-AUC | 77.45 (1.15) | train script, checkpoint, test script |
| Amazon2M | random 50%/25%/25% | Accuracy | 87.85 (0.24) | train script, checkpoint and fixed splits, test script |
| Mini-ImageNet (kNN, k=5) | random 50%/25%/25% | Accuracy | 86.77 (0.45) | train script |
| Mini-ImageNet (no graph) | random 50%/25%/25% | Accuracy | 87.46 (0.36) | train script |
| 20News-Group (kNN, k=5) | random 50%/25%/25% | Accuracy | 66.01 (1.18) | train script |
| 20News-Group (no graph) | random 50%/25%/25% | Accuracy | 64.71 (1.33) | train script |
| Cora | 20 nodes per class for train | Accuracy | 83.4 (0.2) | train script |
| CiteSeer | 20 nodes per class for train | Accuracy | 73.0 (0.3) | train script |
| Pubmed | 20 nodes per class for train | Accuracy | 81.5 (0.4) | train script |

How to run our codes?

  1. Install the required packages according to requirements.txt.

  2. Create a folder ../data and download the datasets from here (for the large graph datasets Proteins and Amazon2M, the datasets will be downloaded automatically).

  3. To run training and evaluation on the eight datasets we used, one can use the scripts in run.sh:

# node classification on small datasets
python main.py --dataset cora --rand_split --metric acc --method nodeformer --lr 0.001 \
--weight_decay 5e-3 --num_layers 2 --hidden_channels 32 --num_heads 4 --rb_order 2 --rb_trans sigmoid \
--lamda 1.0 --M 30 --K 10 --use_bn --use_residual --use_gumbel --runs 5 --epochs 1000 --device 0

# node classification on large datasets
python main-batch.py --dataset ogbn-proteins --metric rocauc --method nodeformer --lr 1e-2 \
--weight_decay 0. --num_layers 3 --hidden_channels 64 --num_heads 1 --rb_order 1 --rb_trans identity \
--lamda 0.1 --M 50 --K 5 --use_bn --use_residual --use_gumbel --use_act --use_jk --batch_size 10000 \
--runs 5 --epochs 1000 --eval_step 9 --device 0

# image and text datasets
python main.py --dataset mini --metric acc --rand_split --method nodeformer --lr 0.001 \
--weight_decay 5e-3 --num_layers 2 --hidden_channels 128 --num_heads 6 \
--rb_order 2 --rb_trans sigmoid --lamda 1.0 --M 30 --K 10 --use_bn --use_residual --use_gumbel \
--runs 5 --epochs 300 --device 0
  4. We also provide the checkpoints of NodeFormer on two large datasets, OGBN-Proteins and Amazon2M. One can download the trained models into ../model/ and run the scripts in run_test_large_graph.sh to reproduce the results.
  • For Amazon2M, to obtain the same result as ours, one needs to download the fixed splits from here into ../data/ogb/ogbn_products/split/random_0.5_0.25 (a small sketch of the expected directory layout follows below).
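
For convenience, here is a small helper that creates the directory layout the steps above expect; the paths are taken directly from the instructions, and the assumption is that data and models live one level above the repository root:

import os

# Directory layout referenced by the steps above (relative to the repo root):
#   ../data   -- datasets downloaded from the Google drive
#   ../model  -- trained checkpoints for OGBN-Proteins / Amazon2M
#   ../data/ogb/ogbn_products/split/random_0.5_0.25 -- fixed splits for Amazon2M
for path in ['../data',
             '../model',
             '../data/ogb/ogbn_products/split/random_0.5_0.25']:
    os.makedirs(path, exist_ok=True)
    print('ready:', path)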

Potential Applications and More Usability

NodeFormer can in principle be applied to solve three families of tasks:

  • Node-Level Prediction on (Large) Graphs: use NodeFormer in place of a GNN encoder as the backbone for encoding graph-structured data.

  • General Machine Learning Problems: use NodeFormer as an encoder that computes instance representations from their global all-pair interactions, for general ML tasks, e.g., classification or regression (a small k-NN graph sketch follows this list).

  • Learning Latent Graph Structures: use NodeFormer to learn latent graphs among a set of objects.
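
For the graph-free scenarios above, the experiments in Sec 4.3 either feed no graph at all or an artificial k-NN graph (k=5, as in the results table). As a minimal sketch, one can build such a k-NN edge list from raw instance features with scikit-learn; this is generic preprocessing shown for illustration, not code taken from this repository.

import numpy as np
import torch
from sklearn.neighbors import kneighbors_graph

# Build a k-NN graph (k=5) over raw instance features (e.g., image or text
# embeddings); a stand-in for the Mini-ImageNet / 20News-Group preprocessing.
features = np.random.randn(500, 128).astype('float32')
knn = kneighbors_graph(features, n_neighbors=5, mode='connectivity')
edge_index = torch.from_numpy(np.vstack(knn.nonzero())).long()   # [2, E] COO edge list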

Our work takes an initial step toward exploring how to build a scalable graph Transformer model for node classification, and we believe there is ample room for further research and development as future work. One can also use our implementations kernelized_softmax() and kernelized_gumbel_softmax() for related projects concerning, e.g., structure learning and communication, where scalability matters.

Citation

If you find our codes useful, please consider citing our work:

      @inproceedings{wu2022nodeformer,
        title = {NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification},
        author = {Qitian Wu and Wentao Zhao and Zenan Li and David Wipf and Junchi Yan},
        booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
        year = {2022}
      }

ACK

We acknowledge the implementation of the softmax kernel

https://github.com/lucidrains/performer-pytorch

and the training pipeline for GNN node classification

https://github.com/CUAI/Non-Homophily-Benchmarks

nodeformer's People

Contributors

qitianwu


nodeformer's Issues

Clarification on Visualization Techniques for Graphs 4 and 7 under Linear Complexity Constraints

I am currently examining the methodologies presented in your paper and am unclear about the techniques used to visualize Graphs 4 and 7 under linear complexity constraints. According to Equation 7, it seems infeasible to compute the attention scores between any two nodes using a linear approximation.
Could you please clarify the specific approach utilized for these visualizations under the mentioned constraints? Additionally, if there are any potential modifications or alternative methods that could be recommended to handle these calculations more feasibly, I would appreciate your insights.

$$z_{(l+1)u} \approx \frac{\phi(q_{u}/\sqrt{\tau})^T\sum_{\nu=1}^N e^{g_{\nu}/\tau} \phi(k_{\nu}/\sqrt{\tau}) v_{\nu}^T}{\phi(q_{u}/\sqrt{\tau})^T\sum_{\omega=1}^N e^{g_{\omega}/\tau} \phi(k_{\omega}/\sqrt{\tau}) } $$


How to reproduce the result of Mini-ImageNet (w/o graph)?

Thank you for your great work.
I've noticed that when the input k-NN graph is not used, NodeFormer can yield superior results on Mini-ImageNet. This suggests that the k-NN graphs are not necessarily informative and, moreover, that NodeFormer learns useful latent graph structures from the data.
I tried to train NodeFormer on Mini-ImageNet without the input k-NN graph, removing both the edge regularization and the relational bias, but I only achieved an accuracy of 83.76 ± 0.78%. When training NodeFormer on Mini-ImageNet with a k-NN graph (k=5), I was able to achieve an accuracy of 87.01 ± 0.45%. I wonder if I missed some important details during the training process.

Regarding the edge-level regularization w/o input graph

Hello,
thank you for your very awesome work!
I just came up with one question: if there is no input graph, does that mean we cannot construct any edge-level regularization loss? If so, is that enough to train a model with a relatively high degree of freedom? I would also like to know whether there are any suggestions on how to address this problem when there is no input graph and not enough supervised information. Thank you!

Same results for the last four runs

Hello, I've encountered an issue while running the model on the official splits of Cora, CiteSeer, and Pubmed. Despite conducting five separate runs, the results for the last four runs are identical. I have meticulously ensured that the parameters are reset at the beginning of each run. Could this be a mere coincidence, or might there be an underlying error that I'm missing?

Deezer Europe Dataset on SGFormer/NodeFormer

Hi,

The SGFormer paper claims that NodeFormer achieves a 66% test accuracy on the DeezerEurope dataset, whereas the NodeFormer paper claims that NodeFormer achieves a 71% test accuracy on the same dataset.

Which one is accurate, and what is the reason for the discrepancy? I ran the NodeFormer code on the DeezerEurope dataset and got around 71% test accuracy.

Thanks

Reproducing results on Deezer dataset

Hi there,

Thanks for the fantastic work!

I'm running run.sh and found that the metric for Deezer is set to "rocauc", while the paper uses accuracy as the metric on Deezer (shown in Figure 2). When I change the metric from "rocauc" to "acc" in run.sh, the average accuracy is 65.12%, which is much lower than the accuracy reported in the paper (~71%). Could you kindly let me know the proper hyperparameter settings for reproducing the results on Deezer? Thanks in advance!

Request for the slides (PPT)

Hi, the slides (PPT) for this paper seem to be unavailable (the link appears broken). Could the author please upload a new copy of the slides?

Is the relational bias still calculated in the w/o-graph setting?

Hi Qitian,

Thanks for this fantastic work!

In nodeformer.py line 284:

        # compute update by relational bias of input adjacency, requires O(E)
        for i in range(self.rb_order):
            z_next += add_conv_relational_bias(value, adjs[i], self.b[i], self.rb_trans)

I am a little confused: in the w/o-graph setting (e.g., issue #2), is the relational bias still calculated?
Thanks.

Accurate scores of Figure 2 in the paper

Hi, your paper is interesting and I would like to follow up on your work. I noticed that for the experiments on cora, citeseer, deezer-europe and actor (Figure 2 in the paper) there are only plots but not the exact scores. Would it be convenient to share the exact scores used for plotting? Thanks in advance.

Code without inductive settings?

Hello, the experiments in your paper are described as using an inductive setting, but I did not see the code for the inference stage on unseen nodes in the inductive setting. Do you have this code?

Errors occur while calling NodePropPredDataset in dataset.py

I can run your code correctly on the small datasets using the scripts in run.sh and get results similar to the paper, but when I try to reproduce NodeFormer on the large graph datasets, an error occurs on both the amazon2m and ogbn-proteins datasets.

Traceback (most recent call last):
  File "main-batch.py", line 43, in <module>
    dataset = load_dataset(args.data_dir, args.dataset, args.sub_dataset)
  File "/home/workspace/NF/NodeFormer/dataset.py", line 102, in load_dataset
    dataset = load_amazon2m_dataset(data_dir)
  File "/home/workspace/NF/NodeFormer/dataset.py", line 308, in load_amazon2m_dataset
    ogb_dataset = NodePropPredDataset(name='ogbn-products', root=f'{data_dir}/ogb')
  File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/ogb/nodeproppred/dataset.py", line 63, in __init__
    self.pre_process()
  File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/ogb/nodeproppred/dataset.py", line 111, in pre_process
    additional_node_files = self.meta_info['additional node files'].split(',')
AttributeError: 'float' object has no attribute 'split'

This error occurs in dataset.py while calling the NodePropPredDataset function:

ogb_dataset = NodePropPredDataset(name=name, root=f'{data_dir}/ogb')

ogb_dataset = NodePropPredDataset(name='ogbn-products', root=f'{data_dir}/ogb')

I tried to fix this error by going into the implementation of NodePropPredDataset and casting the 'float' object to 'str':

additional_node_files = str(self.meta_info['additional node files']).split(',')

It passed, but another error comes up:

Loading necessary files...
This might take a while.
Traceback (most recent call last):
  File "main-batch.py", line 43, in <module>
    dataset = load_dataset(args.data_dir, args.dataset, args.sub_dataset)
  File "/home/workspace/NF/NodeFormer/dataset.py", line 98, in load_dataset
    dataset = load_proteins_dataset(data_dir)
  File "/home/workspace/NF/NodeFormer/dataset.py", line 268, in load_proteins_dataset
    ogb_dataset = NodePropPredDataset(name='ogbn-proteins', root=f'{data_dir}/ogb')
  File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/ogb/nodeproppred/dataset.py", line 63, in __init__
    self.pre_process()
  File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/ogb/nodeproppred/dataset.py", line 137, in pre_process
    self.graph = read_csv_graph_raw(raw_dir, add_inverse_edge = add_inverse_edge, additional_node_files = additional_node_files, additional_edge_files = additional_edge_files)[0] # only a single graph
  File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/ogb/io/read_graph_raw.py", line 83, in read_csv_graph_raw
    temp = pd.read_csv(osp.join(raw_dir, additional_file + '.csv.gz'), compression='gzip', header = None).values
  File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 912, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 577, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1407, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1661, in _make_engine
    self.handles = get_handle(
  File "/root/anaconda3/envs/nodeformer/lib/python3.8/site-packages/pandas/io/common.py", line 753, in get_handle
    handle = gzip.GzipFile(  # type: ignore[assignment]
  File "/root/anaconda3/envs/nodeformer/lib/python3.8/gzip.py", line 173, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '../data//ogb/ogbn_proteins/raw/nan.csv.gz'

I don't know what's happening... If you need more information, please let me know.

System Info

  • WSL2 Ubuntu 20.04LTS
  • anaconda3
  • python 3.8.16
  • torch 1.9.0+cu111
  • torch-cluster 1.5.9
  • torch-geometric 1.7.2
  • torch-sparse 0.6.12
  • torch-scatter 2.0.9
  • torch-spline-conv 1.2.1
  • ogb 1.3.1
  • numpy 1.22.4
  • networkx 2.6.1
  • scipy 1.6.2
  • scikit-learn 1.1.3

What is the principle of exchanging the first two dimensions when calculating QKV attention?

When reading the source code of NodeFormer, I found that when calculating QKV attention, the first and second dimensions of query/key/value are exchanged, as in lines 169-171 of nodeformer.py. After computing attention, the first two dimensions are exchanged back again when performing normalization.
At first, I thought this step was unnecessary, until I commented out the code and ran into an out-of-memory error. Therefore, I am very curious about the principle behind this step.
Does placing the node_number in the second dimension affect the complexity of the matrix multiplication when calculating the dot product of key and value, and is that why node_number is moved to the first dimension in advance?
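
For what it's worth, the complexity argument behind kernelized attention depends only on which matrix product is evaluated first; below is a small, repository-independent illustration of that point (the tensor layout in nodeformer.py itself may involve additional batch/head dimensions).

import torch

N, d, m = 10000, 64, 64
q_prime = torch.randn(N, m)   # phi(Q)
k_prime = torch.randn(N, m)   # phi(K)
v = torch.randn(N, d)

# Contract over the node dimension first: the intermediate is only [m, d],
# so the whole computation is O(N m d) -- linear in the number of nodes.
out_linear = q_prime @ (k_prime.t() @ v)

# Contracting the other way first would materialize an [N, N] matrix
# (here 10000 x 10000), i.e. quadratic memory and compute:
# out_quadratic = (q_prime @ k_prime.t()) @ v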

the training code for the GCN, SGC models on the ogbn-proteins and amazon2m datasets

Hello! The work you have done is very interesting and has inspired me a lot. I am trying to reproduce the results of your paper for the GCN and SGC models on the ogbn-proteins and amazon2m datasets. Can you provide the training code used in your experiments (similar to main-batch.py, which only contains the training code for NodeFormer, not for GCN or SGC) and the associated hyperparameters? This would allow us to better follow your work. Thanks!

BUG: x_i doesn't match edge_index_i in main-batch.py

Description

There may be a bug in main-batch.py caused by the random permutation of indices:

NodeFormer/main-batch.py

Lines 133 to 144 in 64d2658

idx = torch.randperm(train_idx.size(0))
for i in range(num_batch):
    idx_i = train_idx[idx[i*args.batch_size:(i+1)*args.batch_size]]
    x_i = x[idx_i].to(device)
    adjs_i = []
    edge_index_i, _ = subgraph(idx_i, adjs[0], num_nodes=n, relabel_nodes=True)
    adjs_i.append(edge_index_i.to(device))
    for k in range(args.rb_order - 1):
        edge_index_i, _ = subgraph(idx_i, adjs[k+1], num_nodes=n, relabel_nodes=True)
        adjs_i.append(edge_index_i.to(device))
    optimizer.zero_grad()
    out_i, link_loss_ = model(x_i, adjs_i, args.tau)

In line 135, train_idx is sorted but idx_i is shuffled; then in line 136, x_i is also randomly permuted, which means the original node order has been changed.
However, in line 138 the node indices in adjs[0] are still in the original order, and subgraph() also preserves that order.
As a result, in line 144, x_i doesn't align with adjs_i.

Then I changed the code as follows so that the node indices keep a consistent order:

idx = torch.randperm(train_idx.size(0))
for i in range(num_batch):
    idx_i = train_idx[idx[i * args.batch_size:(i + 1) * args.batch_size]]
    x_i = x[idx_i].to(device)
    adjs_i = []
    edge_index_i, _ = subgraph(idx_i, adjs[0], num_nodes=n, relabel_nodes=True)
   
    # Modify
    idx_perm = torch.argsort(idx_i)
    edge_index_i = idx_perm[edge_index_i]

    adjs_i.append(edge_index_i.to(device))
    for k in range(args.rb_order - 1):
        edge_index_i, _ = subgraph(idx_i, adjs[k + 1], num_nodes=n, relabel_nodes=True)
        adjs_i.append(edge_index_i.to(device))
    optimizer.zero_grad()
    out_i, link_loss_ = model(x_i, adjs_i, args.tau)

Experiments

python main-batch.py --dataset ogbn-arxiv --metric acc --method nodeformer --lr 1e-2 --weight_decay 0. --num_layers 3 --hidden_channels 64 --num_heads 1 --rb_order 1 --rb_trans identity --lamda 0.1 --M 50 --K 5 --use_bn --use_residual --use_gumbel --use_act --use_jk --batch_size 20000 --runs 1 --epochs 1000 --eval_step 9 --device 0

Before the modification, the test accuracy on ogbn-arxiv is only about 55%. After the modification, it can exceed 65%.

'NoneType' object has no attribute 'origin'

When I run main.py with the dataset named 20news, the code can't run.
The traceback is:
python main.py --dataset 20news --metric acc --rand_split --method nodeformer --lr 0.001 --weight_decay 5e-3 --num_layers 2 --hidden_channels 64 --num_heads 4 --rb_order 2 --rb_trans sigmoid --lamda 1.0 --M 30 --K 10 --use_bn --use_residual --use_gumbel --run 5 --epochs 200 --device 1
Traceback (most recent call last):
  File "main.py", line 8, in <module>
    from torch_geometric.utils import to_undirected, remove_self_loops, add_self_loops
  File "/home/zhengyt/anaconda3/envs/Node/lib/python3.8/site-packages/torch_geometric/__init__.py", line 5, in <module>
    import torch_geometric.data
  File "/home/zhengyt/anaconda3/envs/Node/lib/python3.8/site-packages/torch_geometric/data/__init__.py", line 1, in <module>
    from .data import Data
  File "/home/zhengyt/anaconda3/envs/Node/lib/python3.8/site-packages/torch_geometric/data/data.py", line 8, in <module>
    from torch_sparse import coalesce, SparseTensor
  File "/home/zhengyt/anaconda3/envs/Node/lib/python3.8/site-packages/torch_sparse/__init__.py", line 14, in <module>
    torch.ops.load_library(importlib.machinery.PathFinder().find_spec(
AttributeError: 'NoneType' object has no attribute 'origin'

Runtime error when running main-batch.py

When running the following command with the amazon2m dataset:

python main-batch.py --dataset amazon2m --rand_split --metric acc --method nodeformer  --lr 1e-2 --weight_decay 0. --num_layers 3 --hidden_channels 64 --num_heads 1 --rb_order 1 --rb_trans identity --lamda 0.01 --M 50 --K 5 --use_bn --use_residual --use_gumbel --use_act --use_jk --batch_size 100000 --runs 5 --epochs 1000 --eval_step 9 --device 0

I encountered the following runtime error:

Traceback (most recent call last):
  File "/home/s111062588/NodeFormer/main-batch.py", line 136, in <module>
    x_i = x[idx_i].to(device)
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

Could you please explain why this error occurred and how to resolve it?
Thank you!

Regarding DGL Implementation

Has the author implemented this in the DGL framework, or are there any examples of implementing a graph transformer using DGL?

ValueError: Both 'src' and 'index' must be on the same device (got 'cpu' and 'cuda:1')

This bug occurs when I do node classification experiments on the dataset ogbn-proteins.

Traceback (most recent call last):
  File "/NodeFormer/main-batch.py", line 138, in <module>
    edge_index_i, _ = subgraph(idx_i, adjs[0], num_nodes=n, relabel_nodes=True)
  File "/python3.10/site-packages/torch_geometric/utils/subgraph.py", line 104, in subgraph
    edge_index, _ = map_index(
  File "/python3.10/site-packages/torch_geometric/utils/map.py", line 63, in map_index
    raise ValueError(f"Both 'src' and 'index' must be on the same device "
