Comments (9)
Addition: after some tests with 2 GPUs, I got:
with stream sync: Ep. 151 : Up. 4500 : Sen. 256 : Cost 74.35
without sync: Ep. 151 : Up. 4500 : Sen. 256 : Cost 106.45
from marian-dev.
Is this a toy model? I was training with normal models and did not see any difference from the single-GPU model. And wouldn't that synchronize automatically, since copyFrom is called from stream 0 and so are the following updates? I'll check with real models.
from marian-dev.
There is a cudaStreamSynchronize(0) in copyFrom, so that can't be it. Also, removing it does not seem to change anything for me, most probably because of the effect I described in the previous message.
from marian-dev.
That is because you are using a single GPU (there is an if-statement that handles the single-GPU case). Try with more than one.
Yes, there is a synchronize in copyFrom, but not in the parameter update.
from marian-dev.
I was testing with 3 GPUs, but misunderstood which update you meant. OK, let's check again.
from marian-dev.
Try this:

    void pushGradients(Tensor newGrads) {
      if(graphs_.size() < 2) {
        opt_->update(graphs_[0]);
      }
      else {
        std::lock_guard<std::mutex> guard(sync_);
        // add instead of copy?
        grads_->copyFrom(newGrads);
        opt_->update(params_, grads_);
      }
    }

versus this:

    void pushGradients(Tensor newGrads) {
      if(graphs_.size() < 2) {
        opt_->update(graphs_[0]);
      }
      else {
        std::lock_guard<std::mutex> guard(sync_);
        // add instead of copy?
        grads_->copyFrom(newGrads);
        opt_->update(params_, grads_);
        cudaStreamSynchronize(0);
      }
    }

Also make sure to use more than one GPU, because of the if-statement above.
Edit: the code formatting got messed up, and I'm not sure how to fix it.
mjd: three apostrophes
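
One possible reading of the comparison above, sketched in plain C++ with no GPU involved. The thread does not say why the missing synchronize hurts convergence, so this is an assumption: if the update is launched asynchronously and the lock is released before it finishes, the next caller can overwrite grads_ while the previous update is still reading it. In this sketch, `SharedUpdater`, `runSyncedUpdates`, the scalar parameters, and the learning rate are all hypothetical. `std::async` stands in for an asynchronous kernel launch and `pending_.wait()` stands in for cudaStreamSynchronize(0).

```cpp
#include <future>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the shared optimizer state discussed above.
// Scalars replace tensors to keep the sketch small.
class SharedUpdater {
public:
  explicit SharedUpdater(float lr) : lr_(lr) {}

  void pushGradients(float newGrad) {
    std::lock_guard<std::mutex> guard(sync_);
    grads_ = newGrad;  // plays the role of grads_->copyFrom(newGrads)

    // "Launch" the update asynchronously: control returns to the caller
    // before the update has necessarily finished.
    pending_ = std::async(std::launch::async,
                          [this] { params_ -= lr_ * grads_; });

    // Wait for the update while the lock is still held, so the next
    // caller cannot overwrite grads_ while it is being read.
    pending_.wait();
  }

  float params() const { return params_; }

private:
  std::mutex sync_;
  std::future<void> pending_;
  float lr_;
  float grads_ = 0.0f;
  float params_ = 0.0f;
};

// Runs numThreads workers that each push a gradient of 1.0
// pushesPerThread times, then returns the final parameter value.
float runSyncedUpdates(int numThreads, int pushesPerThread) {
  SharedUpdater updater(0.5f);
  std::vector<std::thread> workers;
  for (int t = 0; t < numThreads; ++t) {
    workers.emplace_back([&updater, pushesPerThread] {
      for (int i = 0; i < pushesPerThread; ++i)
        updater.pushGradients(1.0f);
    });
  }
  for (auto& w : workers) w.join();
  return updater.params();
}
```

With the wait in place, every pushed gradient is applied exactly once, so `runSyncedUpdates(4, 100)` ends at `-0.5 * 400`. Dropping the wait would let a later push modify grads_ while an earlier update is still running, which is a data race in this sketch. Whether the same mechanism explains the cost difference reported in this thread is not confirmed here.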
from marian-dev.
OK, I think you are right. It also seems that the speed of your implementation now matches my implementation without sharding. To fix this issue I will probably just pull your pull request.
from marian-dev.
Have you checked the non-sharding speed with the synchronize call? I believe that might slow it down a bit.
I still need to verify the correctness of my PR in terms of convergence.
from marian-dev.
I will post stats under the pull request in a moment.
from marian-dev.