deepgraphlearning / nbfnet

Official implementation of Neural Bellman-Ford Networks (NeurIPS 2021)

License: MIT License

Python 100.00%

graph-neural-networks knowledge-graph link-prediction reasoning

nbfnet's Issues

Unable to run the code with error importing 'spmm'

Hi, I followed the instructions to reproduce the results but ran into a problem with the module 'spmm'. My torch version is 1.8.2 and my torchdrug version is 0.1.2. Any ideas how to fix it?

12:53:15 Epoch 0 begin
Traceback (most recent call last):
File "script/run.py", line 78, in <module>
File "script/run.py", line 30, in train_and_validate
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torchdrug\core\engine.py", line 143, in train
loss, metric = model(batch)
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torchdrug\tasks\reasoning.py", line 85, in forward
pred = self.predict(batch, all_loss, metric)
File "C:\Users\Pengfei\Documents\cse research\NBFNet-master\nbfnet\task.py", line 288, in predict
pred = self.model(graph, h_index, t_index, r_index, all_loss=all_loss, metric=metric)
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\Pengfei\Documents\cse research\NBFNet-master\nbfnet\model.py", line 149, in forward
output = self.bellmanford(graph, h_index[:, 0], r_index[:, 0])
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torchdrug\utils\decorator.py", line 56, in wrapper
return forward(self, *args, **kwargs)
File "C:\Users\Pengfei\Documents\cse research\NBFNet-master\nbfnet\model.py", line 115, in bellmanford
hidden = layer(step_graph, layer_input)
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torchdrug\layers\conv.py", line 91, in forward
update = self.message_and_aggregate(graph, input)
File "C:\Users\Pengfei\Documents\cse research\NBFNet-master\nbfnet\layer.py", line 140, in message_and_aggregate
sum = functional.generalized_rspmm(adjacency, relation_input, input, sum="add", mul=mul)
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torchdrug\layers\functional\spmm.py", line 378, in generalized_rspmm
return Function.apply(sparse.coalesce(), relation, input)
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torchdrug\layers\functional\spmm.py", line 172, in forward
forward = spmm.rspmm_add_mul_forward_cuda
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torchdrug\utils\torch.py", line 27, in __getattr__
return getattr(self.module, key)
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torchdrug\utils\decorator.py", line 21, in __get__
result = self.func(obj)
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torchdrug\utils\torch.py", line 31, in module
return cpp_extension.load(self.name, self.sources, self.extra_cflags, self.extra_cuda_cflags,
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torch\utils\cpp_extension.py", line 1079, in load
return _jit_compile(
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torch\utils\cpp_extension.py", line 1317, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torch\utils\cpp_extension.py", line 1700, in _import_module_from_library
file, path, description = imp.find_module(module_name, [path])
File "C:\Users\Pengfei\anaconda3\envs\py38\lib\imp.py", line 296, in find_module
raise ImportError(_ERR_MSG.format(name), name=name)
ImportError: No module named 'spmm'
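The traceback ends inside torch.utils.cpp_extension, which means TorchDrug tried to JIT-compile its 'spmm' CUDA extension and then could not import the result. A quick first check is whether the build tools are visible from the Python process at all. The sketch below is a hypothetical helper (diagnose_jit_toolchain is not part of TorchDrug or PyTorch); it assumes the build needs ninja, nvcc and a host C++ compiler (cl on Windows, $CXX or c++ elsewhere).

```python
import os
import shutil
from typing import Callable, Dict, Mapping, Optional

# Executables assumed to be needed for a JIT build of the 'spmm' CUDA extension.
REQUIRED_TOOLS = ("ninja", "nvcc")


def diagnose_jit_toolchain(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
    windows: bool = os.name == "nt",
) -> Dict[str, object]:
    """Report which JIT build prerequisites are visible from this process (hypothetical helper)."""
    # MSVC's "cl" on Windows; otherwise the compiler named by CXX, defaulting to "c++".
    compiler = "cl" if windows else env.get("CXX", "c++")
    tools = {name: which(name) for name in (*REQUIRED_TOOLS, compiler)}
    return {
        "compiler": compiler,
        "tools": tools,
        "missing": sorted(name for name, path in tools.items() if path is None),
        "cuda_home": env.get("CUDA_HOME") or env.get("CUDA_PATH"),
        # torch.utils.cpp_extension uses this variable, if set, as its build cache root.
        "extensions_dir": env.get("TORCH_EXTENSIONS_DIR"),
    }
```

Running print(diagnose_jit_toolchain()) in the same environment as script/run.py shows at a glance whether anything is missing from PATH before the extension build is attempted.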

Seems unable to utilize multiple GPUs

Hi there.

I tried running this code on one of my machines with four RTX 3090 GPUs (24 GB of GPU memory each):

python -m torch.distributed.launch --nproc_per_node=4 script/run.py -c config/inductive/wn18rr.yaml --gpus [0,1,2,3]

I did not change any other part of this repo. However, I got a CUDA error saying that more GPU memory was needed. I then changed the command as follows:

python script/run.py -c config/inductive/wn18rr.yaml --gpus [0]

and ran it on a machine with a single 40 GB A100 GPU. The code runs successfully and uses roughly 32 GB of GPU memory. I am really puzzled by this: why does the code not make use of the combined 4 × 24 GB = 96 GB of GPU memory, and why does it still report a memory error? Is there something wrong with my setup?
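Data-parallel training, which is what torch.distributed.launch sets up, does not pool GPU memory: each of the four processes holds its own full copy of the model and processes its own batch. The toy calculation below illustrates why 4 × 24 GB can still run out of memory when a single process needs about 32 GB. It assumes, without having verified it for this TorchDrug version, that the batch size in the config is applied per process; MemoryModel, peak_per_gpu and all numbers are hypothetical, chosen only so that one process needs 32 GB as reported above.

```python
from dataclasses import dataclass


@dataclass
class MemoryModel:
    """Toy per-GPU memory model for data-parallel training (hypothetical numbers)."""

    state_gb: float  # parameters + gradients + optimizer state, replicated on every GPU
    activation_gb_per_sample: float  # activations kept for backward, per training sample

    def per_gpu_gb(self, batch_size_per_gpu: int) -> float:
        return self.state_gb + self.activation_gb_per_sample * batch_size_per_gpu


def peak_per_gpu(model: MemoryModel, batch_size: int, num_gpus: int, batch_is_per_process: bool) -> float:
    """Peak memory on one GPU when every process holds a full model replica."""
    # If the configured batch size is per process, adding GPUs does not shrink it;
    # otherwise each process gets ceil(batch_size / num_gpus) samples.
    local = batch_size if batch_is_per_process else -(-batch_size // num_gpus)
    return model.per_gpu_gb(local)


m = MemoryModel(state_gb=2.0, activation_gb_per_sample=3.75)
print(peak_per_gpu(m, 8, 1, True), peak_per_gpu(m, 8, 4, True), peak_per_gpu(m, 8, 4, False))
```

If the per-process assumption holds, each RTX 3090 still needs the same ~32 GB as the single A100, so reducing the batch size in the YAML config is the knob to try when spreading a run over smaller GPUs.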

Error when loading pretrained epoch

Hi,

I tried a new model on NBFNet and then tried to load it, but loading fails. The issue seems to come from torchdrug/patch.py. Do you have a good solution for this?
Traceback (most recent call last):
File "script/run.py", line 60, in <module>
solver = util.build_solver(cfg, dataset)
File "/shared-datadrive/shared-training/NBFNet/nbfnet/util.py", line 120, in build_solver
solver.load(cfg.checkpoint)
File "/home/azureuser/.pyenv/versions/nbfnet/lib/python3.8/site-packages/torchdrug-0.1.2-py3.8.egg/torchdrug/core/engine.py", line 231, in load
self.model.load_state_dict(state["model"])
File "/home/azureuser/.pyenv/versions/nbfnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1497, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for KnowledgeGraphCompletion:
While copying the parameter named "graph", expected torch.Tensor or Tensor-like object from checkpoint but received <class 'torchdrug.data.graph.Graph'>
While copying the parameter named "fact_graph", expected torch.Tensor or Tensor-like object from checkpoint but received <class 'torchdrug.data.graph.Graph'>

I also checked, and nn.Module has indeed been replaced by PatchedModule:
-> self.model.load_state_dict(state["model"])
(Pdb) nn.Module
<class 'torchdrug.patch.PatchedModule'>
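The error says the checkpoint stores "graph" and "fact_graph" as torchdrug.data.Graph objects, while this PyTorch version's load_state_dict only accepts tensors for those entries. One workaround, offered as an untested sketch, is to drop those two entries before loading, on the assumption that the task already rebuilds both graphs from the dataset when the solver is constructed. split_state is a hypothetical helper, not a TorchDrug API, and loading this way bypasses solver.load, so the optimizer state is not restored.

```python
from typing import Iterable, Mapping, Tuple


def split_state(state: Mapping[str, object], skip: Iterable[str] = ("graph", "fact_graph")) -> Tuple[dict, dict]:
    """Separate graph buffers (by final key component) from the rest of a model state dict."""
    skipped = set(skip)
    kept = {k: v for k, v in state.items() if k.rsplit(".", 1)[-1] not in skipped}
    dropped = {k: v for k, v in state.items() if k not in kept}
    return kept, dropped


# Hypothetical usage, assuming `solver` and `cfg` are built as in script/run.py:
#   state = torch.load(cfg.checkpoint, map_location="cpu")
#   kept, _ = split_state(state["model"])
#   solver.model.load_state_dict(kept, strict=False)  # graphs keep their rebuilt values
```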

Unable to run the code with error regarding 'mpiicpc'

Hello,

I followed the instructions to install the torchdrug-related packages and a matching PyTorch/CUDA version. However, I got the following error when starting the code. Any ideas how to fix this? The system has intel/19.0.3.199 loaded.

01:24:15   Epoch 0 begin
Traceback (most recent call last):
  File "script/run.py", line 62, in <module>
    train_and_validate(cfg, solver)
  File "script/run.py", line 27, in train_and_validate
    solver.train(**kwargs)
  File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/core/engine.py", line 143, in train
    loss, metric = model(batch)
  File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/tasks/reasoning.py", line 85, in forward
    pred = self.predict(batch, all_loss, metric)
  File "~/Workspace/Python/NBFNet/nbfnet/task.py", line 288, in predict
    pred = self.model(graph, h_index, t_index, r_index, all_loss=all_loss, metric=metric)
  File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "~/Workspace/Python/NBFNet/nbfnet/model.py", line 149, in forward
    output = self.bellmanford(graph, h_index[:, 0], r_index[:, 0])
  File "<decorator-gen-888>", line 2, in bellmanford
  File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/utils/decorator.py", line 56, in wrapper
    return forward(self, *args, **kwargs)
  File "~/Workspace/Python/NBFNet/nbfnet/model.py", line 115, in bellmanford
    hidden = layer(step_graph, layer_input)
  File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/layers/conv.py", line 91, in forward
    update = self.message_and_aggregate(graph, input)
  File "~/Workspace/Python/NBFNet/nbfnet/layer.py", line 124, in message_and_aggregate
    adjacency = graph.adjacency.transpose(0, 1)
  File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/utils/decorator.py", line 21, in __get__
    result = self.func(obj)
  File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/data/graph.py", line 658, in adjacency
    return utils.sparse_coo_tensor(self.edge_list.t(), self.edge_weight, self.shape)
  File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/utils/torch.py", line 182, in sparse_coo_tensor
    return torch_ext.sparse_coo_tensor_unsafe(indices, values, size)
  File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/utils/torch.py", line 27, in __getattr__
    return getattr(self.module, key)
  File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/utils/decorator.py", line 21, in __get__
    result = self.func(obj)
  File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/utils/torch.py", line 31, in module
    return cpp_extension.load(self.name, self.sources, self.extra_cflags, self.extra_cuda_cflags,
  File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1079, in load
    return _jit_compile(
  File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1292, in _jit_compile
    _write_ninja_file_and_build_library(
  File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1378, in _write_ninja_file_and_build_library
    check_compiler_abi_compatibility(compiler)
  File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 282, in check_compiler_abi_compatibility
    if not check_compiler_ok_for_platform(compiler):
  File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 249, in check_compiler_ok_for_platform
    version_string = subprocess.check_output([compiler, '-v'], stderr=subprocess.STDOUT).decode()
  File "~/anaconda3/envs/dlg_env/lib/python3.8/subprocess.py", line 415, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "~/anaconda3/envs/dlg_env/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['icpc', '-v']' returned non-zero exit status 1.
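The traceback shows torch.utils.cpp_extension probing the compiler named by the CXX environment variable (here icpc, presumably set by the loaded intel/19.0.3.199 module) and failing on "icpc -v". A likely workaround, assuming a GNU g++ is also installed, is to point CXX back at it before TorchDrug JIT-compiles its extension. gnu_build_env and INTEL_CXX are hypothetical names used only for this sketch.

```python
import os
from typing import Dict, Mapping

# Compiler names treated as Intel toolchains (assumption: icpc, icpx and the MPI wrapper).
INTEL_CXX = {"icpc", "icpx", "mpiicpc"}


def gnu_build_env(env: Mapping[str, str], fallback: str = "g++") -> Dict[str, str]:
    """Return a copy of env whose CXX points at a GNU compiler if it named an Intel one."""
    fixed = dict(env)
    current = os.path.basename(fixed.get("CXX", "c++"))
    if current in INTEL_CXX:
        fixed["CXX"] = fallback
    return fixed


# Hypothetical usage at the top of script/run.py, before any TorchDrug layer runs:
#   os.environ.update(gnu_build_env(os.environ))
```

Exporting CXX=g++ in the shell before launching script/run.py should have the same effect without touching the code.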

Issue with the experimental settings of inductive link prediction

First of all, thanks for the awesome code!

The authors state that they follow the experimental setting of GraIL, which draws 50 negative triplets for each positive triplet and uses filtered ranking. However, I could not find the corresponding step that draws 50 negative samples in the code. Could the authors please clarify this?
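For readers unfamiliar with the protocol the question refers to: each positive triplet is scored together with 50 sampled negatives, negatives that happen to be known true triplets are filtered out, and the positive's rank among these scores feeds metrics such as Hits@10. The sketch below illustrates that protocol with tail corruption only, for brevity; it is not taken from the NBFNet or GraIL code, and sample_negative_tails, rank_against and the toy scorer are hypothetical.

```python
import random
from typing import Callable, Iterable, List, Set, Tuple

Triplet = Tuple[int, int, int]  # (head, relation, tail)


def sample_negative_tails(
    triplet: Triplet, num_entity: int, known: Set[Triplet], k: int, rng: random.Random
) -> List[Triplet]:
    """Draw up to k corrupted tails, skipping corruptions that are known true triplets (filtering)."""
    h, r, t = triplet
    candidates = [(h, r, e) for e in range(num_entity) if e != t and (h, r, e) not in known]
    return rng.sample(candidates, min(k, len(candidates)))


def rank_against(score: Callable[[Triplet], float], positive: Triplet, negatives: Iterable[Triplet]) -> int:
    """1-based rank of the positive among the negatives; ties count against the positive."""
    pos = score(positive)
    return 1 + sum(score(n) >= pos for n in negatives)


rng = random.Random(0)
negatives = sample_negative_tails((0, 0, 1), num_entity=60, known={(0, 0, 1), (0, 0, 2)}, k=50, rng=rng)
rank = rank_against(lambda trip: -abs(trip[2] - 1), (0, 0, 1), negatives)  # toy scorer
hits_at_10 = rank <= 10
```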

Inverse relations

Hello,

I was wondering whether there is a particular reason for adding the inverse relations in the graph.

Thanks in advance for your answer!
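One commonly cited motivation, offered here as background rather than as the authors' answer, is that message passing from the query's head entity only follows edges in their stored direction, so without inverse edges the head cannot reach entities that merely point into its neighbourhood. Giving inverse edges their own relation ids (offset by the number of relations) keeps "r" and "inverse of r" distinguishable. The toy example below illustrates the effect; the offset convention and both helper names are assumptions, not NBFNet's actual code.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int, int]  # (head, tail, relation)


def add_inverse_edges(edges: List[Edge], num_relation: int) -> Tuple[List[Edge], int]:
    """Append (t, h, r + num_relation) for each (h, t, r); the relation vocabulary doubles."""
    inverse = [(t, h, r + num_relation) for h, t, r in edges]
    return edges + inverse, 2 * num_relation


def reachable(edges: List[Edge], source: int) -> Set[int]:
    """Nodes reachable from source when edges are followed only in their stored direction."""
    out = {}
    for h, t, _ in edges:
        out.setdefault(h, []).append(t)
    seen, stack = {source}, [source]
    while stack:
        for nxt in out.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


edges = [(0, 1, 0), (2, 1, 1)]  # entity 2 only points into entity 1
augmented, num_relation = add_inverse_edges(edges, num_relation=2)
```

In the toy graph, entity 2 is unreachable from entity 0 until the inverse edges are added, after which the path 0 → 1 → 2 exists through the inverse of relation 1.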

Problems with ninja

Hi, Doctor. I ran into some problems when running the code on Linux.
I really need your help; could you help me? It has been troubling me a lot.

15:43:32   Preprocess training set
15:43:36   >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
15:43:36   Epoch 0 begin
Traceback (most recent call last):
  File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1666, in _run_ninja_build
    subprocess.run(
  File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "script/run.py", line 62, in <module>
    train_and_validate(cfg, solver)
  File "script/run.py", line 27, in train_and_validate
    solver.train(**kwargs)
  File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/core/engine.py", line 143, in train
    loss, metric = model(batch)
  File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/tasks/reasoning.py", line 85, in forward
    pred = self.predict(batch, all_loss, metric)
  File "/data1/home/wza/nbfnet/nbfnet/task.py", line 288, in predict
    pred = self.model(graph, h_index, t_index, r_index, all_loss=all_loss, metric=metric)
  File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data1/home/wza/nbfnet/nbfnet/model.py", line 149, in forward
    output = self.bellmanford(graph, h_index[:, 0], r_index[:, 0])
  File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/utils/decorator.py", line 56, in wrapper
    return forward(self, *args, **kwargs)
  File "/data1/home/wza/nbfnet/nbfnet/model.py", line 115, in bellmanford
    hidden = layer(step_graph, layer_input)
  File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/conv.py", line 91, in forward
    update = self.message_and_aggregate(graph, input)
  File "/data1/home/wza/nbfnet/nbfnet/layer.py", line 140, in message_and_aggregate
    sum = functional.generalized_rspmm(adjacency, relation_input, input, sum="add", mul=mul)
  File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/spmm.py", line 378, in generalized_rspmm
    return Function.apply(sparse.coalesce(), relation, input)
  File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/spmm.py", line 172, in forward
    forward = spmm.rspmm_add_mul_forward_cuda
  File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/utils/torch.py", line 27, in __getattr__
    return getattr(self.module, key)
  File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/utils/decorator.py", line 21, in __get__
    result = self.func(obj)
  File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/utils/torch.py", line 31, in module
    return cpp_extension.load(self.name, self.sources, self.extra_cflags, self.extra_cuda_cflags,
  File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1080, in load
    return _jit_compile(
  File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1293, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1405, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1682, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'spmm': [1/3] /usr/local/cuda-10.2/bin/nvcc  -DTORCH_EXTENSION_NAME=spmm -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include/TH -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-10.2/include -isystem /data1/home/wza/.conda/envs/linkp/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 -std=c++14 -c /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/rspmm.cu -o rspmm.cuda.o
FAILED: rspmm.cuda.o
/usr/local/cuda-10.2/bin/nvcc  -DTORCH_EXTENSION_NAME=spmm -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include/TH -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-10.2/include -isystem /data1/home/wza/.conda/envs/linkp/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 -std=c++14 -c /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/rspmm.cu -o rspmm.cuda.o
/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/rspmm.cu: In instantiation of ‘at::rspmm_forward_cuda(const SparseTensor&, const at::Tensor&, const at::Tensor&)::<lambda()>::<lambda()> [with NaryOp = at::NaryAdd; BinaryOp = at::BinaryMul]’:
/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/rspmm.cu:246:600:   required from ‘struct at::rspmm_forward_cuda(const SparseTensor&, const at::Tensor&, const at::Tensor&)::<lambda()> [with NaryOp = at::NaryAdd; BinaryOp = at::BinaryMul]::<lambda()>’
/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/rspmm.cu:246:608:   required from ‘at::rspmm_forward_cuda(const SparseTensor&, const at::Tensor&, const at::Tensor&)::<lambda()> [with NaryOp = at::NaryAdd; BinaryOp = at::BinaryMul]’
/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/rspmm.cu:246:607:   required from ‘struct at::rspmm_forward_cuda(const SparseTensor&, const at::Tensor&, const at::Tensor&) [with NaryOp = at::NaryAdd; BinaryOp = at::BinaryMul; at::sparse::SparseTensor = at::Tensor]::<lambda()>’
/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/rspmm.cu:246:28:   required from ‘at::Tensor at::rspmm_forward_cuda(const SparseTensor&, const at::Tensor&, const at::Tensor&) [with NaryOp = at::NaryAdd; BinaryOp = at::BinaryMul; at::sparse::SparseTensor = at::Tensor]’
/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/rspmm.cu:356:193:   required from here
/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/rspmm.cu:244:37: internal compiler error: in tsubst_copy, at cp/pt.c:13189
     const int num_row_block = (num_row + row_per_block - 1) / row_per_block;
                                     ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-5/README.Bugs> for instructions.
[2/3] /usr/local/cuda-10.2/bin/nvcc  -DTORCH_EXTENSION_NAME=spmm -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include/TH -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-10.2/include -isystem /data1/home/wza/.conda/envs/linkp/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 -std=c++14 -c /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/spmm.cu -o spmm.cuda.o
FAILED: spmm.cuda.o
/usr/local/cuda-10.2/bin/nvcc  -DTORCH_EXTENSION_NAME=spmm -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include/TH -isystem /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-10.2/include -isystem /data1/home/wza/.conda/envs/linkp/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 -std=c++14 -c /data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/spmm.cu -o spmm.cuda.o
/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/spmm.cu: In instantiation of ‘at::spmm_forward_cuda(const SparseTensor&, const at::Tensor&)::<lambda()>::<lambda()> [with NaryOp = at::NaryAdd; BinaryOp = at::BinaryMul]’:
/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/spmm.cu:219:506:   required from ‘struct at::spmm_forward_cuda(const SparseTensor&, const at::Tensor&)::<lambda()> [with NaryOp = at::NaryAdd; BinaryOp = at::BinaryMul]::<lambda()>’
/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/spmm.cu:219:514:   required from ‘at::spmm_forward_cuda(const SparseTensor&, const at::Tensor&)::<lambda()> [with NaryOp = at::NaryAdd; BinaryOp = at::BinaryMul]’
/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/spmm.cu:219:512:   required from ‘struct at::spmm_forward_cuda(const SparseTensor&, const at::Tensor&) [with NaryOp = at::NaryAdd; BinaryOp = at::BinaryMul; at::sparse::SparseTensor = at::Tensor]::<lambda()>’
/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/spmm.cu:219:28:   required from ‘at::Tensor at::spmm_forward_cuda(const SparseTensor&, const at::Tensor&) [with NaryOp = at::NaryAdd; BinaryOp = at::BinaryMul; at::sparse::SparseTensor = at::Tensor]’
/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/spmm.cu:315:157:   required from here
/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torchdrug/layers/functional/extension/spmm.cu:217:37: internal compiler error: in tsubst_copy, at cp/pt.c:13189
     const int num_row_block = (num_row + row_per_block - 1) / row_per_block;
                                     ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-5/README.Bugs> for instructions.
ninja: build stopped: subcommand failed.
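Both objects fail with an internal compiler error raised by GCC 5 (note the gcc-5 README path in the log) while nvcc compiles TorchDrug's rspmm.cu and spmm.cu, which points at the host compiler rather than at NBFNet itself. A plausible workaround, untested here, is to build with a newer GCC release that CUDA 10.2 supports (check NVIDIA's installation guide for the exact range). The sketch assumes such a toolchain is installed in a directory containing plain gcc/g++ executables; newer_gcc_env and every path in it are hypothetical.

```python
import os
from typing import Dict, Mapping


def newer_gcc_env(env: Mapping[str, str], gcc_bin_dir: str, extensions_dir: str) -> Dict[str, str]:
    """Copy env so a JIT build picks up a newer GCC and a fresh extension cache (hypothetical paths)."""
    fixed = dict(env)
    # nvcc looks up its host compiler ("gcc") on PATH by default, so put the newer one first.
    fixed["PATH"] = os.pathsep.join(p for p in (gcc_bin_dir, env.get("PATH", "")) if p)
    fixed["CC"] = os.path.join(gcc_bin_dir, "gcc")
    fixed["CXX"] = os.path.join(gcc_bin_dir, "g++")
    # A new build cache is a precaution against a stale lock or partial build from the failed attempt.
    fixed["TORCH_EXTENSIONS_DIR"] = extensions_dir
    return fixed


env = newer_gcc_env(os.environ, "/opt/gcc-7/bin", "/tmp/torch_extensions_gcc7")
```

The returned dict can be passed as env= to subprocess.run when launching script/run.py, or the same variables can be exported in the shell.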

JIT compilation fails when using `functional.generalized_rspmm` with CUDA on Linux

Hey,

This is most likely an error in TorchDrug itself; however, when I try to run any of the examples from the README, the code crashes with the following error:

spmm.cuda.o.d -DTORCH_EXTENSION_NAME=spmm -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/user/miniconda3/envs/path/lib/python3.8/site-packages/torch/include -isystem /home/user/miniconda3/envs/path/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/miniconda3/envs/path/lib/python3.8/site-packages/torch/include/TH -isystem /home/user/miniconda3/envs/path/lib/python3.8/site-packages/torch/include/THC -isystem /opt/scp/software/CUDA/11.1.0/include -isystem /home/miniconda3/envs/path/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 -std=c++14 -c /home/user/miniconda3/envs/path/lib/python3.8/site-packages/torchdrug/layers/functional/extension/spmm.cu -o spmm.cuda.o
FAILED: spmm.cuda.o
/opt/scp/software/CUDA/11.1.0/bin/nvcc --generate-dependencies-with-compile --dependency-output spmm.cuda.o.d -DTORCH_EXTENSION_NAME=spmm -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/user/miniconda3/envs/path/lib/python3.8/site-packages/torch/include -isystem /home/user/miniconda3/envs/path/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /home/user/miniconda3/envs/path/lib/python3.8/site-packages/torch/include/TH -isystem /home/user/miniconda3/envs/path/lib/python3.8/site-packages/torch/include/THC -isystem /opt/scp/software/CUDA/11.1.0/include -isystem /home/user/miniconda3/envs/path/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 -std=c++14 -c /home/user/miniconda3/envs/path/lib/python3.8/site-packages/torchdrug/layers/functional/extension/spmm.cu -o spmm.cuda.o
/opt/software/CUDA/11.1.0/include/cuComplex.h: In function ‘float cuCabsf(cuFloatComplex)’:
/opt/software/CUDA/11.1.0/include/cuComplex.h:179:16: error: expected ‘)’ before numeric constant

This only occurs on a Linux GPU machine that uses CUDA 11.1 and GCC 10.3.

The conda env is as follows:

blas                      1.0                         mkl
boost                     1.74.0           py38hc10631b_3    conda-forge
boost-cpp                 1.74.0               h9359b55_0    conda-forge
brotlipy                  0.7.0           py38h497a2fe_1001    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
ca-certificates           2021.10.8            ha878542_0    conda-forge
cairo                     1.16.0            h3fc0475_1005    conda-forge
certifi                   2021.10.8        py38h578d9bd_1    conda-forge
cffi                      1.15.0           py38hd667e15_1
charset-normalizer        2.0.10             pyhd8ed1ab_0    conda-forge
colorama                  0.4.4              pyh9f0ad1d_0    conda-forge
cryptography              35.0.0           py38ha5dfef3_0    conda-forge
cudatoolkit               11.1.1               h6406543_8    conda-forge
cycler                    0.11.0             pyhd8ed1ab_0    conda-forge
decorator                 4.4.2                      py_0    conda-forge
easydict                  1.9                        py_0    conda-forge
fontconfig                2.13.1            hba837de_1005    conda-forge
freetype                  2.10.4               h0708190_1    conda-forge
glib                      2.69.1               h4ff587b_1
icu                       67.1                 he1b5a44_0    conda-forge
idna                      3.3                pyhd8ed1ab_0    conda-forge
intel-openmp              2021.4.0          h06a4308_3561
jinja2                    3.0.3              pyhd8ed1ab_0    conda-forge
joblib                    1.1.0              pyhd8ed1ab_0    conda-forge
jpeg                      9d                   h36c2ea0_0    conda-forge
kiwisolver                1.3.1            py38h2531618_0
ld_impl_linux-64          2.35.1               h7274673_9
libffi                    3.3                  he6710b0_2
libgcc-ng                 9.3.0               h5101ec6_17
libgfortran-ng            7.5.0               h14aa051_19    conda-forge
libgfortran4              7.5.0               h14aa051_19    conda-forge
libgomp                   9.3.0               h5101ec6_17
libiconv                  1.16                 h516909a_0    conda-forge
libpng                    1.6.37               h21135ba_2    conda-forge
libstdcxx-ng              9.3.0               hd4cf53a_17
libtiff                   4.0.10            hc3755c2_1005    conda-forge
libuuid                   2.32.1            h7f98852_1000    conda-forge
libuv                     1.42.0               h7f98852_0    conda-forge
libxcb                    1.13              h7f98852_1003    conda-forge
libxml2                   2.9.10               h68273f3_2    conda-forge
littleutils               0.2.2                      py_0    conda-forge
lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
markupsafe                2.0.1            py38h497a2fe_0    conda-forge
matplotlib                3.2.2                         1    conda-forge
matplotlib-base           3.2.2            py38h5d868c9_1    conda-forge
mkl                       2021.4.0           h06a4308_640
mkl-service               2.4.0            py38h497a2fe_0    conda-forge
mkl_fft                   1.3.1            py38hd3c417c_0
mkl_random                1.2.2            py38h1abd341_0    conda-forge
ncurses                   6.3                  h7f8727e_2
networkx                  2.5.1              pyhd8ed1ab_0    conda-forge
ninja                     1.10.2               h4bd325d_0    conda-forge
numpy                     1.21.2           py38h20f2e39_0
numpy-base                1.21.2           py38h79a1101_0
ogb                       1.3.2              pyhd8ed1ab_0    conda-forge
olefile                   0.46               pyh9f0ad1d_1    conda-forge
openssl                   1.1.1m               h7f8727e_0
outdated                  0.2.1              pyhd8ed1ab_0    conda-forge
pandas                    1.2.5            py38h1abd341_0    conda-forge
pcre                      8.45                 h9c3ff4c_0    conda-forge
pillow                    6.2.1            py38h6b7be26_0    conda-forge
pip                       21.2.4           py38h06a4308_0
pixman                    0.38.0            h516909a_1003    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
pycairo                   1.20.1           py38hf61ee4a_0    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pyopenssl                 21.0.0             pyhd8ed1ab_0    conda-forge
pyparsing                 3.0.7              pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1            py38h578d9bd_4    conda-forge
python                    3.8.12               h12debd9_0
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python_abi                3.8                      2_cp38    conda-forge
pytorch                   1.8.2           py3.8_cuda11.1_cudnn8.0.5_0    pytorch-lts
pytorch-scatter           2.0.8           py38_torch_1.8.0_cu111    pyg
pytz                      2021.3             pyhd8ed1ab_0    conda-forge
pyyaml                    5.4.1            py38h497a2fe_0    conda-forge
rdkit                     2020.09.5        py38h2bca085_0    conda-forge
readline                  8.1.2                h7f8727e_1
reportlab                 3.5.68           py38hadf75a6_0    conda-forge
requests                  2.27.1             pyhd8ed1ab_0    conda-forge
scikit-learn              1.0.2            py38h51133e4_1
scipy                     1.7.3            py38hc147768_0
setuptools                58.0.4           py38h06a4308_0
six                       1.16.0             pyh6c4a22f_0    conda-forge
sqlalchemy                1.3.23           py38h497a2fe_0    conda-forge
sqlite                    3.37.0               hc218d9a_0
threadpoolctl             3.0.0              pyh8a188c0_0    conda-forge
tk                        8.6.11               h1ccaba5_0
torchdrug                 0.1.2                  ha710097    milagraph
tornado                   6.1              py38h497a2fe_1    conda-forge
tqdm                      4.62.3             pyhd8ed1ab_0    conda-forge
typing_extensions         4.0.1              pyha770c72_0    conda-forge
urllib3                   1.26.8             pyhd8ed1ab_1    conda-forge
wheel                     0.37.1             pyhd3eb1b0_0
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.0.10               h7f98852_0    conda-forge
xorg-libsm                1.2.3             hd9c2040_1000    conda-forge
xorg-libx11               1.7.2                h7f98852_0    conda-forge
xorg-libxau               1.0.9                h7f98852_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h7f98852_1    conda-forge
xorg-libxrender           0.9.10            h7f98852_1003    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h7f98852_1002    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.5                h7b6447c_0
yaml                      0.2.5                h516909a_0    conda-forge
zlib                      1.2.11               h7f8727e_4
zstd                      1.4.9                ha95c52a_0    conda-forge

Any ideas how to get this to run?

Many thanks!

Unable to run the code with ImportError in cpp_extension

Hi! I followed the instructions to install the packages, but now I'm getting an ImportError when reproducing the results. The error is as follows. I also tried rm -r ~/.cache/torch_extensions/* as suggested in the README, but that causes more errors.

Traceback (most recent call last):
File "script/run.py", line 69, in
train_and_validate(cfg, solver)
File "script/run.py", line 28, in train_and_validate
solver.evaluate("test")
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torchdrug-0.1.2-py3.8.egg/torchdrug/core/engine.py", line 206, in evaluate
pred, target = model.predict_and_target(batch)
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torchdrug-0.1.2-py3.8.egg/torchdrug/tasks/task.py", line 27, in predict_and_target
return self.predict(batch, all_loss, metric), self.target(batch)
File "/home/lja/git_clone/NBFNet/nbfnet/task.py", line 277, in predict
t_pred = self.model(graph, h_index, t_index, r_index, all_loss=all_loss, metric=metric)
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/lja/git_clone/NBFNet/nbfnet/model.py", line 149, in forward
output = self.bellmanford(graph, h_index[:, 0], r_index[:, 0])
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/decorator.py", line 232, in fun
return caller(func, *(extras + args), **kw)
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torchdrug-0.1.2-py3.8.egg/torchdrug/utils/decorator.py", line 88, in wrapper
result = forward(self, *args, **kwargs)
File "/home/lja/git_clone/NBFNet/nbfnet/model.py", line 115, in bellmanford
hidden = layer(step_graph, layer_input)
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torchdrug-0.1.2-py3.8.egg/torchdrug/layers/conv.py", line 91, in forward
update = self.message_and_aggregate(graph, input)
File "/home/lja/git_clone/NBFNet/nbfnet/layer.py", line 140, in message_and_aggregate
sum = functional.generalized_rspmm(adjacency, relation_input, input, sum="add", mul=mul)
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torchdrug-0.1.2-py3.8.egg/torchdrug/layers/functional/spmm.py", line 378, in generalized_rspmm
return Function.apply(sparse.coalesce(), relation, input)
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torchdrug-0.1.2-py3.8.egg/torchdrug/layers/functional/spmm.py", line 172, in forward
forward = spmm.rspmm_add_mul_forward_cuda
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torchdrug-0.1.2-py3.8.egg/torchdrug/utils/torch.py", line 27, in getattr
return getattr(self.module, key)
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torchdrug-0.1.2-py3.8.egg/torchdrug/utils/decorator.py", line 21, in get
result = self.func(obj)
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torchdrug-0.1.2-py3.8.egg/torchdrug/utils/torch.py", line 31, in module
return cpp_extension.load(self.name, self.sources, self.extra_cflags, self.extra_cuda_cflags,
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1144, in load
return _jit_compile(
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1382, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1776, in _import_module_from_library
module = importlib.util.module_from_spec(spec)
File "", line 556, in module_from_spec
File "", line 1166, in create_module
File "", line 219, in _call_with_frames_removed
ImportError: /home/lja/.cache/torch_extensions/spmm_0/spmm.so: cannot open shared object file: No such file or directory

I'm using torch 1.11 + CUDA 11.3 and torchdrug 0.1.2.

Do you know how to deal with this? Any help is appreciated!
By the way, in other issues I noticed that an environment.yml would be released. Where can I find it? Thanks!
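
The README's suggestion wipes every cached extension at once. A more targeted variant is sketched below in Python, assuming that torchdrug's JIT-compiled spmm extension sits in a subdirectory of ~/.cache/torch_extensions whose name starts with "spmm" (as the path in the traceback suggests). Deleting only that build should make torch.utils.cpp_extension.load recompile it from source on the next run. The helper names are hypothetical; the sketch honors the TORCH_EXTENSIONS_DIR environment variable, which PyTorch documents as an override for the extension build root. On some PyTorch versions the builds may sit one level deeper, so check the path in your own traceback and pass that directory as root.

```python
import os
import shutil
from pathlib import Path
from typing import List, Optional


def default_extension_root() -> Path:
    """Return the JIT extension build root (assumed default: ~/.cache/torch_extensions).

    TORCH_EXTENSIONS_DIR overrides the build root; the fallback below is the
    path mentioned in the NBFNet README and may differ on your system.
    """
    override = os.environ.get("TORCH_EXTENSIONS_DIR")
    if override:
        return Path(override)
    return Path.home() / ".cache" / "torch_extensions"


def clear_extension_builds(prefix: str = "spmm", root: Optional[Path] = None) -> List[str]:
    """Delete build directories under `root` whose names start with `prefix`.

    Returns the sorted names of the removed directories so the caller can log them.
    Only directories are removed; plain files in the cache root are left alone.
    """
    root = Path(root) if root is not None else default_extension_root()
    if not root.is_dir():
        return []  # nothing cached yet, so nothing to remove
    removed = []
    for entry in root.iterdir():
        if entry.is_dir() and entry.name.startswith(prefix):
            shutil.rmtree(entry)  # the next cpp_extension.load call rebuilds it
            removed.append(entry.name)
    return sorted(removed)
```

Calling clear_extension_builds() before rerunning script/run.py removes only the stale spmm build; other cached extensions are kept, which avoids recompiling everything.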

[Feature Request] `Dockerfile` / `environment.yml` for better reproducibility

Congratulations to the authors on NeurIPS'21! Looking forward to your talk at LoGaG.

While installing the project on VMs and local systems, I've run into multiple issues getting the correct package versions installed, whether CUDA errors while installing torch-scatter and torchdrug or simply pybind11 issues. A Dockerfile would help prevent such errors and make reproduction and experimentation easier.

I think it would be easier and cleaner to have a Docker image for torchdrug itself, which the NBFNet image would then use as its base image. I'm more than happy to take this up.

This way, one could also use the NVIDIA Container Toolkit to run experiments across multiple GPUs/nodes easily.
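
Until an official environment file exists, a lightweight check can at least flag version drift before a long run. The sketch below compares installed distributions against a set of pins using importlib.metadata (Python 3.8+). The pins are copied from the conda listing in the first issue above (pytorch 1.8.2, torchdrug 0.1.2, pytorch-scatter 2.0.8) and are not an officially supported NBFNet environment; the pip-style distribution names (torch, torch-scatter) are an assumption, and name normalization can vary across Python versions.

```python
from importlib import metadata
from typing import Callable, Dict, Optional

# Pins copied from the conda listing in the first issue; illustrative only,
# not an officially supported NBFNet environment.
EXPECTED = {"torch": "1.8.2", "torchdrug": "0.1.2", "torch-scatter": "2.0.8"}


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of distribution `dist`, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def check_versions(
    expected: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, str]:
    """Map each mismatched distribution to a short description of the problem.

    Local version labels such as "+cu111" are ignored, so a CUDA wheel of
    torch 1.8.2 still matches the pin "1.8.2".
    """
    problems = {}
    for dist, want in expected.items():
        have = lookup(dist)
        if have is None:
            problems[dist] = f"missing (expected {want})"
        elif have.split("+", 1)[0] != want:
            problems[dist] = f"found {have}, expected {want}"
    return problems
```

An empty result from check_versions(EXPECTED) means the installed versions match the pins; a Dockerfile or environment.yml would make this check unnecessary by fixing the versions up front.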

Code for reproducing wikikg90m

Hi!

Could you provide the code required for training on WikiKG90M?

Why does this error occur, and how can I solve it?

RuntimeError: Error(s) in loading state_dict for KnowledgeGraphCompletion:
While copying the parameter named "graph", expected torch.Tensor or Tensor-like object from checkpoint but received <class 'torchdrug.data.graph.Graph'>
While copying the parameter named "fact_graph", expected torch.Tensor or Tensor-like object from checkpoint but received <class 'torchdrug.data.graph.Graph'>

I am not able to execute this code.
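
One workaround, sketched here as an assumption rather than a confirmed fix: the checkpoint stores the graph and fact_graph buffers as torchdrug Graph objects, which recent PyTorch versions refuse to copy into the module's buffers when loading. If those buffers are rebuilt from the dataset when the task is set up (assuming the same dataset and split are used), they can be dropped from the saved state dict and the remaining weights loaded with strict=False. The helper name is hypothetical, and the usage comment assumes the checkpoint is a dict with a "model" entry holding the task's state dict; verify that against your own file.

```python
from typing import Any, Dict, Iterable, Mapping, Tuple

# Buffer names taken from the error message above.
GRAPH_KEYS: Tuple[str, ...] = ("graph", "fact_graph")


def drop_graph_buffers(
    state_dict: Mapping[str, Any], keys: Iterable[str] = GRAPH_KEYS
) -> Dict[str, Any]:
    """Return a copy of `state_dict` without the listed top-level entries.

    The model keeps the graph buffers it built from the dataset, so only the
    learned weights are restored from the checkpoint. The input is not mutated.
    """
    skip = set(keys)
    return {name: value for name, value in state_dict.items() if name not in skip}


# Usage sketch (requires torch and a solver built by your own script; not run here):
#   state = torch.load(checkpoint_path, map_location="cpu")
#   weights = drop_graph_buffers(state["model"])
#   missing, unexpected = solver.model.load_state_dict(weights, strict=False)
#   print("missing:", missing, "unexpected:", unexpected)
```

Printing the missing and unexpected keys is a quick sanity check: ideally only the dropped graph buffers show up as missing.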

Train on new datasets.

Hi,
Thank you for your wonderful work and open-source code.

I would like to know how to train on other KGs for the KG completion task, such as FB15k.

Thanks very much.
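
Independently of torchdrug's dataset classes (whose API is not covered here), most KG completion pipelines start by mapping a tab-separated head, relation, tail file, the layout used by the standard FB15k and FB15k-237 releases, to integer triples with shared entity and relation vocabularies. The plain-Python sketch below shows only that preprocessing step; the function name and the three-column layout are assumptions, and wiring the result into NBFNet's configs or dataset classes is left to the repository's own code.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[int, int, int]


def index_triples(
    lines: Iterable[str],
    entity_vocab: Dict[str, int],
    relation_vocab: Dict[str, int],
) -> List[Triple]:
    """Convert "head<TAB>relation<TAB>tail" lines into (h, r, t) integer ids.

    New entities and relations get the next free id. Passing the same vocab
    dicts for the train, valid and test files keeps ids consistent across splits.
    Blank lines are skipped.
    """
    triples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        head, relation, tail = line.split("\t")
        h = entity_vocab.setdefault(head, len(entity_vocab))
        r = relation_vocab.setdefault(relation, len(relation_vocab))
        t = entity_vocab.setdefault(tail, len(entity_vocab))
        triples.append((h, r, t))
    return triples
```

Indexing the training file first means ids in the validation and test splits only grow for unseen entities, which makes such cases easy to detect.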

Pretrained models

Do you have pretrained models available for FB15k-237/WN18RR that I can use to run evaluation?

Thanks

[Question] Proportion of training triples used

Hello!

First of all thanks so much for this awesome publication & codebase.

I'm in the process of tweaking a config for training NBFNet and am trying to understand what proportion of the training triples is used when training on ogbl-biokg with the provided config config/knowledge_graph/ogbl-biokg.yaml.

Given batch_size: 8, batch_per_epoch: 200 and num_epoch: 10, and 4,762,678 training triples in ogbl-biokg, is it correct to assume that only (8 * 200 * 10) / 4,762,678 = 16,000 / 4,762,678 ≈ 0.00336 ≈ 0.34% of the training triples are used over the entire training run?
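
The arithmetic in the question can be checked directly. The sketch below assumes each step draws batch_size training triples; the world_size parameter is an added assumption covering multi-GPU runs where batch_size is counted per process, in which case the number of sampled triples scales with the number of processes. With world_size=1 it reproduces the single-process figure from the question.

```python
def fraction_of_triples_seen(
    batch_size: int,
    batch_per_epoch: int,
    num_epoch: int,
    num_train_triples: int,
    world_size: int = 1,
) -> float:
    """Upper bound on the fraction of training triples drawn during a run.

    Assumes each step consumes `batch_size` triples per process and ignores
    repeats, so the fraction of *distinct* triples seen is at most this value.
    """
    sampled = batch_size * batch_per_epoch * num_epoch * world_size
    return min(1.0, sampled / num_train_triples)


# Values from config/knowledge_graph/ogbl-biokg.yaml as quoted in the question.
frac = fraction_of_triples_seen(8, 200, 10, 4_762_678)
print(f"{frac:.6f} ({frac:.2%})")  # 16,000 sampled triples, about 0.34%
```

Note that 16,000 / 4,762,678 is about 0.00336, i.e. 0.34%; the intermediate value 0.000335 is off by a factor of ten.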

It seems very small, and I'm most likely missing some vital implementation detail. I'd appreciate your help.

Thanks so much!

Problem about wikikg90m

Hello,
Thank you for your wonderful work.
Could you provide the code for running NBFNet on WikiKG90M? How can I reproduce this result? I would appreciate your help.

Hits@10 of RotatE is higher than in the original paper.

Hi,

I found that the Hits@10 of RotatE on FB15k-237 (0.553) is higher than in the original paper (0.533), while the other metrics are the same.

Is this a recording error or did you improve the performance of RotatE?

Reproduction of SEAL baseline

Hello. I'm having difficulty reproducing the results of the SEAL baseline in the paper. Could you please provide more details, for example the codebase you used and the hyperparameters?