Describe the bug We are in the process of fine-tuni

Slow training on Mixtral-8x22B when DP size > 1 about nemo HOT 4 OPEN

sunilitggu commented on July 17, 2024

Slow training on Mixtral-8x22B when DP size > 1

from nemo.

Comments (4)

akoumpa commented on July 17, 2024

Hi, thanks for reporting this,

Can you retry without EP and report back whether this improves the speed? In addition, I would encourage trying different TP/PP configurations to determine the optimal one.

Thank you.

sunilitggu commented on July 17, 2024

Thank you for your response. We have already tried running without EP, but it was slower than with EP. Below are the average times recorded without EP:

#nodes=4, DP=1, GBPT = 2 sec
#nodes=8, DP=2, GBPT = 12 sec
#nodes=16, DP=4, GBPT = 34 sec

We have also experimented with different combinations for TP and PP, such as 8x4, 4x8 and 8x8. All of these configurations were slower than the one reported in the issue.
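
To put the figures above in perspective, here is a small sketch (not part of the original report) that computes the slowdown of each run relative to the DP=1 run and the number of nodes per data-parallel replica. The tuples are copied from the list above; the function and field names are illustrative assumptions.

```python
# Average times reported above for the runs without EP:
# (number of nodes, DP size, GBPT in seconds as reported).
runs = [(4, 1, 2.0), (8, 2, 12.0), (16, 4, 34.0)]


def slowdown_table(runs):
    """Return one row per run with its slowdown relative to the first run."""
    baseline_time = runs[0][2]
    rows = []
    for nodes, dp, gbpt in runs:
        rows.append({
            "nodes": nodes,
            "dp": dp,
            # Nodes available to each data-parallel replica.
            "nodes_per_replica": nodes // dp,
            # How many times slower this run is than the DP=1 run.
            "slowdown_vs_dp1": gbpt / baseline_time,
        })
    return rows


for row in slowdown_table(runs):
    print(row)
```

Each replica keeps the same four nodes in all three runs, yet the reported time grows by 6x at DP=2 and 17x at DP=4, which is the behaviour this issue is about.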

github-actions commented on July 17, 2024

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

akoumpa commented on July 17, 2024

@sunilitggu, can you try with top-of-tree NeMo (git clone) and set your optimizer to mcore_distributed_optim (via model.optim.name='mcore_distributed_optim')?
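
For readers unfamiliar with dotted overrides such as model.optim.name='mcore_distributed_optim', the following is a minimal sketch of what such an override does to a nested config. It uses a plain Python dict as a stand-in for the real configuration machinery; `apply_override`, the starting config, and the placeholder optimizer name are assumptions made for illustration, not NeMo APIs.

```python
def apply_override(config, override):
    """Apply one 'dotted.key=value' override to a nested dict in place."""
    dotted_key, value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    node = config
    for key in parents:
        # Walk down the hierarchy, creating intermediate levels if missing.
        node = node.setdefault(key, {})
    # Strip optional surrounding quotes from the value.
    node[leaf] = value.strip("'\"")
    return config


# Hypothetical starting config; a real training config has many more fields.
config = {"model": {"optim": {"name": "placeholder_optimizer"}}}
apply_override(config, "model.optim.name='mcore_distributed_optim'")
print(config["model"]["optim"]["name"])
```

The override only replaces the `name` field under `model.optim`; every other setting in the config is left untouched.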
