bindog / pytorch-model-parallel

A memory balanced and communication efficient FullyConnected layer with CrossEntropyLoss model parallel implementation in PyTorch

Python 100.00%
model-parallel distributed-training half-precision re-id pytorch

pytorch-model-parallel's People

Contributors

bindog

pytorch-model-parallel's Issues

RuntimeError: Could not run 'aten::nonzero' with arguments from the 'SparseCUDA' backend. 'aten::nonzero' is only available for these backends: [CPU, CUDA, Autograd, Profiler, Tracer].

Environment: PyTorch 1.6, CUDA 10.2, 4 × Titan RTX

output = self.am_branches[i](x.cuda(i), labels[i])

File "/home/derron/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/derron/arcface-pytorch/head/metrics_parallel.py", line 102, in forward
output[index] = phi[index]
RuntimeError: Could not run 'aten::nonzero' with arguments from the 'SparseCUDA' backend. 'aten::nonzero' is only available for these backends: [CPU, CUDA, Autograd, Profiler, Tracer].

Error in `/opt/conda/bin/python': double free or corruption (fasttop): 0x00007f0018011960

Have you ever met this problem when running the training code? It happens after training has run for a few iterations.

*** Error in `/opt/conda/bin/python': double free or corruption (fasttop): 0x00007f0018011960 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f026f6987e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7f026f6a137a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f026f6a553c]
/opt/conda/lib/python3.7/site-packages/torch/lib/libtorch.so(+0x3cead6e)[0x7f01f8755d6e]
/opt/conda/lib/python3.7/site-packages/torch/lib/libtorch.so(+0x3ceae19)[0x7f01f8755e19]
/opt/conda/lib/python3.7/site-packages/torch/lib/libtorch.so(+0x3ceaf95)[0x7f01f8755f95]
/opt/conda/lib/python3.7/site-packages/torch/lib/libtorch.so(_ZN5torch8autograd6Engine17evaluate_functionERNS0_8NodeTaskE+0x1210)[0x7f01f874d6b0]
/opt/conda/lib/python3.7/site-packages/torch/lib/libtorch.so(_ZN5torch8autograd6Engine11thread_mainEPNS0_9GraphTaskE+0x1c4)[0x7f01f874f564]
/opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so(_ZN5torch8autograd6python12PythonEngine11thread_initEi+0x2a)[0x7f026b2eebca]
/opt/conda/lib/python3.7/site-packages/torch/_C.cpython-37m-x86_64-linux-gnu.so(+0xf14f)[0x7f026be2d14f]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f026f9f26ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f026f72841d]
======= Memory map: ========
200000000-200200000 rw-s 00000000 00:06 533                              /dev/nvidiactl
200200000-200400000 ---p 00000000 00:00 0
200400000-200404000 rw-s 00000000 00:06 533                              /dev/nvidiactl
200404000-200600000 ---p 00000000 00:00 0
200600000-200a00000 rw-s 00000000 00:06 533                              /dev/nvidiactl
200a00000-201600000 ---p 00000000 00:00 0
201600000-201800000 rw-s 00000000 00:06 533                              /dev/nvidiactl
201800000-201804000 rw-s 00000000 00:06 533                              /dev/nvidiactl
201804000-201a00000 ---p 00000000 00:00 0
201a00000-201e00000 rw-s 00000000 00:06 533                              /dev/nvidiactl
201e00000-201e04000 rw-s 00000000 00:06 533                              /dev/nvidiactl
201e04000-202000000 ---p 00000000 00:00 0
202000000-202400000 rw-s 00000000 00:06 533                              /dev/nvidiactl
202400000-202404000 rw-s 00000000 00:06 533                              /dev/nvidiactl
202404000-202600000 ---p 00000000 00:00 0
202600000-202a00000 rw-s 00000000 00:06 533                              /dev/nvidiactl
202a00000-202a04000 rw-s 00000000 00:06 533                              /dev/nvidiactl
202a04000-202c00000 ---p 00000000 00:00 0
202c00000-203000000 rw-s 00000000 00:06 533                              /dev/nvidiactl
203000000-203004000 rw-s 00000000 00:06 533                              /dev/nvidiactl
