Thank you for sharing! When I run on a single GPU, it runs well, but when I run

How to use it with Multi GPU about efficientunet-pytorch HOT 12 OPEN

zhoudaxia233 commented on May 29, 2024

How to use it with Multi GPU

from efficientunet-pytorch.

Comments (12)

zhoudaxia233 commented on May 29, 2024

@Hesene Hello Hesene, in my lab I only have one single 2080Ti, therefore I cannot replicate this issue. I'm sorry about it!

Hesene commented on May 29, 2024

> @Hesene Hello Hesene, in my lab I only have one single 2080Ti, therefore I cannot replicate this issue. I'm sorry about it!

OK, thank you for your code; it helps me a lot.

AtsunoriFujita commented on May 29, 2024

I face the same problem.
Which part is the cause?

goodgoodstudy92 commented on May 29, 2024

Did you use torch.nn.DataParallel()?

zhoudaxia233 commented on May 29, 2024

> Did you use torch.nn.DataParallel()?

No, I didn't, but I think it may work.

zhoudaxia233 commented on May 29, 2024

> I face the same problem.
> Which part is the cause?

I'm not sure, but I think you can try to integrate nn.DataParallel() into the source code
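For anyone trying this, a minimal sketch of wrapping a model in nn.DataParallel might look like the following. The small nn.Sequential network is only a stand-in (an assumption for illustration) so the snippet runs anywhere; you would put your own EfficientUnet instance in its place. This shows the generic PyTorch pattern, not a confirmed fix for this repository.

```python
import torch
import torch.nn as nn

# Stand-in network (hypothetical); replace with your EfficientUnet instance.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 2, kernel_size=1),
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# DataParallel splits each batch along dim 0 across the visible GPUs.
# Only wrap when more than one GPU is actually present.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.to(device)

batch = torch.randn(4, 3, 32, 32, device=device)
with torch.no_grad():
    out = model(batch)
print(out.shape)  # torch.Size([4, 2, 32, 32])
```

On a machine with one GPU or none, the wrapper is skipped and the model runs as usual, so the same script can be used in both settings.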

goodgoodstudy92 commented on May 29, 2024

> > I face the same problem.
> > Which part is the cause?
>
> I'm not sure, but I think you can try to integrate nn.DataParallel() into the source code

I use EfficientNet as the backbone to train an object detection model, and nn.DataParallel() works fine. The only issue is that multi-GPU training is quite slow.

ryanstout commented on May 29, 2024

I'm seeing a similar issue when running with nn.DataParallel:

RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/ryanstout/.local/share/virtualenvs/arsenal_train2-TlJZ47AR/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/home/ryanstout/.local/share/virtualenvs/arsenal_train2-TlJZ47AR/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ryanstout/.local/share/virtualenvs/arsenal_train2-TlJZ47AR/lib/python3.7/site-packages/efficientunet/efficientunet.py", line 106, in forward
    x = torch.cat([x, blocks.popitem()[1]], dim=1)
RuntimeError: All input tensors must be on the same device. Received cuda:0 and cuda:1

Any ideas?

Thanks!
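One way to narrow this down could be to check the devices of the tensors just before the failing torch.cat call. The helper below is a hypothetical debugging aid (cat_same_device is not part of PyTorch or efficientunet); it reports the device of every input instead of only the first mismatch. It is a sketch for diagnosis, not a fix.

```python
import torch

def cat_same_device(tensors, dim=1):
    """Concatenate tensors, but list every device first if they differ."""
    devices = [str(t.device) for t in tensors]
    if len(set(devices)) > 1:
        raise RuntimeError(
            f"torch.cat inputs live on different devices: {devices}"
        )
    return torch.cat(tensors, dim=dim)

# CPU-only demonstration: both tensors share a device, so this succeeds.
x = torch.zeros(2, 4, 8, 8)
skip = torch.zeros(2, 6, 8, 8)
merged = cat_same_device([x, skip])
print(merged.shape)  # torch.Size([2, 10, 8, 8])
```

Temporarily swapping such a check in at the line shown in the traceback would tell you whether the skip tensor or x is the one on the unexpected GPU.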

Vipermdl commented on May 29, 2024

> I'm seeing a similar issue when running with nn.DataParallel:
>
> RuntimeError: Caught RuntimeError in replica 0 on device 0.
> Original Traceback (most recent call last):
>   File "/home/ryanstout/.local/share/virtualenvs/arsenal_train2-TlJZ47AR/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
>     output = module(*input, **kwargs)
>   File "/home/ryanstout/.local/share/virtualenvs/arsenal_train2-TlJZ47AR/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
>     result = self.forward(*input, **kwargs)
>   File "/home/ryanstout/.local/share/virtualenvs/arsenal_train2-TlJZ47AR/lib/python3.7/site-packages/efficientunet/efficientunet.py", line 106, in forward
>     x = torch.cat([x, blocks.popitem()[1]], dim=1)
> RuntimeError: All input tensors must be on the same device. Received cuda:0 and cuda:1
>
> Any ideas?
>
> Thanks!

Hi, bro.
Have you solved the problem?

If-only1 commented on May 29, 2024

I suspect that this problem is due to the sharing of a certain module in EfficientUnet, which results in this module being on only one GPU, perhaps the encoder...
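If that suspicion is right, one pattern that avoids sharing anything between replicas is to keep the skip features in a list created inside forward, so each replica only ever sees its own tensors. The toy network below is a rough sketch of that idea; TinyUNet and its layer sizes are hypothetical, and this is neither the efficientunet code nor the change made in the PR.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Toy encoder-decoder (hypothetical), only to show per-call skip storage."""

    def __init__(self, in_ch=3, out_ch=2):
        super().__init__()
        self.enc1 = nn.Conv2d(in_ch, 8, 3, padding=1)
        self.enc2 = nn.Conv2d(8, 16, 3, stride=2, padding=1)
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.dec = nn.Conv2d(16 + 8, out_ch, 3, padding=1)

    def forward(self, x):
        skips = []                       # created per call, never shared
        x = torch.relu(self.enc1(x))
        skips.append(x)                  # same device as this replica's input
        x = torch.relu(self.enc2(x))     # downsample by 2
        x = self.up(x)                   # back to the skip's resolution
        x = torch.cat([x, skips.pop()], dim=1)
        return self.dec(x)

model = TinyUNet()
out = model(torch.randn(2, 3, 16, 16))
print(out.shape)  # torch.Size([2, 2, 16, 16])
```

Because skips is a local variable, nothing collected by one replica's forward pass can leak into another replica's torch.cat.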

TianyiFranklinWang commented on May 29, 2024

> I suspect that this problem is due to the sharing of a certain module in EfficientUnet, which results in this module being on only one GPU, perhaps the encoder...

I agree, I'm now facing the same problem.

zhoudaxia233 commented on May 29, 2024

@NPU-Franklin Franklin created a PR (#11) to support multiple GPUs. I do not have multiple cards, so I cannot test it, but maybe you can give it a try.
