Comments (5)
Hi @tongye98
Thank you for asking!
The short answer is no, simply because I cannot make it work. I tried to integrate DistributedDataParallel, but for some reason I keep getting errors in my single-node multi-GPU environment. Moreover, it's difficult to test multi-node distributed learning on my GPU cluster.
I totally agree, it would be really nice to have. We are waiting for a contributor who can work on it. Your help is needed!!
from joeynmt.
Reminder for myself:
- https://pytorch.org/docs/stable/notes/ddp.html
- https://pytorch.org/tutorials/intermediate/ddp_tutorial.html
- https://pytorch.org/tutorials/intermediate/dist_tuto.html
Hi @may-, I used this tutorial (https://www.youtube.com/watch?v=-K3bZYHYHEA&list=PL_lsbAsL_o2CSuhUhJIiW0IkdT5C2wGWj) to integrate DDP and torchrun in a PyTorch project and it worked well for me!
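For readers who have not seen the setup being discussed, here is a minimal sketch of one DDP training step. It is not JoeyNMT code and not taken from the tutorial: the toy `nn.Linear` model, the function name `run_ddp_step`, the `gloo` backend on CPU, and the fallback address and port are all assumptions made so the example can run as a single process without a launcher.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def run_ddp_step() -> float:
    """Run one DDP training step on a toy model and return the loss."""
    # torchrun sets these variables for each worker; the defaults below
    # (assumed values) only allow a single-process dry run.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    rank = int(os.environ.get("RANK", "0"))
    world_size = int(os.environ.get("WORLD_SIZE", "1"))

    # "gloo" runs on CPU; "nccl" is the usual choice for multi-GPU training.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    try:
        torch.manual_seed(0)
        # Hypothetical stand-in for the real model.
        model = DDP(nn.Linear(4, 2))
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

        inputs, targets = torch.randn(8, 4), torch.randn(8, 2)
        loss = nn.functional.mse_loss(model(inputs), targets)
        optimizer.zero_grad()
        loss.backward()  # DDP averages gradients across ranks during backward
        optimizer.step()
        return loss.item()
    finally:
        # Always release the process group, even if the step fails.
        dist.destroy_process_group()


if __name__ == "__main__":
    print(f"loss: {run_ddp_step():.4f}")
```

Saved under a hypothetical name such as `ddp_sketch.py`, this runs directly with `python ddp_sketch.py`, or with several workers via `torchrun --nproc_per_node=2 ddp_sketch.py`. A real integration would also need a `DistributedSampler` for the data and device placement per rank, which this sketch leaves out.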
Hi @Darwin99-debug,
Thank you for your comment!
Actually, we have been working on this (DDP) for a while. You can see our progress in the ddp branch (#225).
Although it works, we are concerned that our current DDP implementation makes JoeyNMT less readable and less novice-friendly.
So we hesitate to integrate DDP into the main branch. Any comments and suggestions will be appreciated!
FYI @juliakreutzer