DPT implementation contains unused parameters (transformers, OPEN)

ducha-aiki commented on June 26, 2024

DPT implementation contains unused parameters

Comments (4)

ducha-aiki commented on June 26, 2024

@qubvel good point about the backbone. It is probably because I trained with a frozen backbone, which is fairly common.
As for the backbone itself, removing its unused parameters would probably require too many changes.
I will open a PR then, thanks.

qubvel commented on June 26, 2024

Hi @ducha-aiki, thanks for reporting!

You are right, it looks like we can safely delete layers[0].residual_layer1 from DPTFeatureFusionStage because it is never used.

Would you mind sharing why this prevents DDP training?

ducha-aiki commented on June 26, 2024

@qubvel I believe I shared this in:

Parameters which did not receive grad for rank 3: neck.fusion_stage.layers.0.residual_layer1.convolution2.bias, neck.fusion_stage.layers.0.residual_layer1.convolution2.weight, neck.fusion_stage.layers.0.residual_layer1.convolution1.bias, neck.fusion_stage.layers.0.residual_layer1.convolution1.weight

That is a quote from the crash message I get when running multi-GPU training with accelerate and setting ddp_find_unused_parameters=False in Trainer.
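For anyone hitting a similar crash, here is a minimal sketch of how the message above can be grouped by module to see which submodule is unused. The helper name `unused_modules` is hypothetical (it is not part of transformers or PyTorch), and it assumes the message keeps the single-line "did not receive grad for rank N: a, b, c" shape quoted above.

```python
import re
from collections import defaultdict


def unused_modules(message: str) -> dict[str, list[str]]:
    """Group parameter names from a DDP 'did not receive grad' message by module.

    Hypothetical helper for illustration; assumes the single-line format
    quoted in this thread.
    """
    match = re.search(r"did not receive grad for rank \d+:\s*(.+)", message)
    if match is None:
        return {}
    grouped = defaultdict(list)
    for name in match.group(1).split(","):
        name = name.strip()
        if not name:
            continue
        # Split "a.b.c.weight" into module path "a.b.c" and leaf "weight".
        module, _, leaf = name.rpartition(".")
        grouped[module].append(leaf)
    return dict(grouped)


# The message quoted in this thread.
message = (
    "Parameters which did not receive grad for rank 3: "
    "neck.fusion_stage.layers.0.residual_layer1.convolution2.bias, "
    "neck.fusion_stage.layers.0.residual_layer1.convolution2.weight, "
    "neck.fusion_stage.layers.0.residual_layer1.convolution1.bias, "
    "neck.fusion_stage.layers.0.residual_layer1.convolution1.weight"
)

for module, leaves in unused_modules(message).items():
    print(module, sorted(leaves))
```

Both reported modules sit under neck.fusion_stage.layers.0.residual_layer1, which matches the layer identified as never used.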

qubvel commented on June 26, 2024

Thank you, I missed it 🙂 I am trying to understand why the unused weights in the backbone do not block training, while the ones in the neck do. Did you try training with a fix?
Anyway, if this solves the issue, it is worth a PR.
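One possible explanation for the backbone/neck difference, offered as an assumption rather than something confirmed in this thread: DDP only tracks parameters with requires_grad=True, so frozen backbone weights are never expected to produce a gradient, whereas a trainable parameter that takes no part in the forward pass is. The sketch below separates the two cases. `split_params` is a hypothetical helper, and the stand-in objects and the names `backbone.layer.weight` and `head.weight` are made up for illustration; only the neck parameter name comes from the error message quoted in this thread.

```python
from types import SimpleNamespace


def split_params(named_params):
    """Return (frozen, unused) parameter names.

    frozen: requires_grad is False, so no gradient is expected.
    unused: requires_grad is True but grad is still None after backward.
    Hypothetical helper; works on any iterable of (name, param) pairs whose
    params expose .requires_grad and .grad.
    """
    frozen, unused = [], []
    for name, param in named_params:
        if not param.requires_grad:
            frozen.append(name)
        elif param.grad is None:
            unused.append(name)
    return frozen, unused


# Stand-in parameters (illustrative only, not real model state).
params = [
    ("backbone.layer.weight", SimpleNamespace(requires_grad=False, grad=None)),
    (
        "neck.fusion_stage.layers.0.residual_layer1.convolution1.weight",
        SimpleNamespace(requires_grad=True, grad=None),
    ),
    ("head.weight", SimpleNamespace(requires_grad=True, grad=0.1)),
]

frozen, unused = split_params(params)
print("frozen:", frozen)
print("unused:", unused)
```

With a real PyTorch model, the same function could be called on model.named_parameters() after one forward and backward pass on a single GPU, before wrapping the model in DDP.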
