Comments (16)
Hi,
Can you try it like this?
mpiexec -np 2 cntk configFile=../Config/01_OneHidden.config parallelTrain=true deviceId=0
Parallel training in CNTK needs to be launched using MPI. Refer to the example "Examples/Other/Simple2d/Config/Multigpu.config" that illustrates the CNTK config options needed for parallel training.
For example, to run parallel training using 2 workers on the same machine:
cd /Examples/Other/Simple2d/Data
mpiexec -np 2 cntk configFile=../Config/Multigpu.config
To run across multiple machines, an MPI hosts file needs to be passed to the mpiexec command to specify the hosts where the CNTK parallel training workers will be launched. Please refer to the documentation of the MPI implementation you are using for details on launching an MPI job spanning multiple machines.
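As a sketch of what that looks like (host names here are made up, and the exact flag name varies by MPI implementation):

```shell
# A hosts file is just a text file listing one machine per line
# (names are illustrative):
cat > hosts.txt <<'EOF'
machine1
machine2
EOF

# The launch then looks roughly like this; the flag is -hostfile for
# Open MPI and -machinefile for MS-MPI/MPICH. Not run here, since it
# requires CNTK and MPI to be set up on every host:
#   mpiexec -hostfile hosts.txt -np 2 cntk configFile=../Config/Multigpu.config

cat hosts.txt
```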
Just want to add some comments on top of Amit's answer. To use multiple GPUs in training, it is better to set deviceId=auto; otherwise, e.g. with deviceId=0, two MPI workers launched on the same machine will compete for the 0th GPU.
We may need to reset deviceId to auto once parallelTrain=true is detected.
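As a sketch, the two options together in a config file would look like this (option names as discussed above; whether the value auto needs quotes may depend on the config parser, and on the command line it is simply deviceId=auto):

```
parallelTrain = true
deviceId = "auto"   # each MPI worker picks a free GPU instead of all competing for GPU 0
```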
Thanks for all the answers. I have tested it on my computer, which has only one GPU. I found that if I run with "mpiexec -np 2", one process takes the GPU and the other process runs on the CPU (using all available cores). This is very smart. Next week I will test on our compute cloud. Hope everything goes smoothly.
Can a model be trained on multiple CPU-only machines? Or is it the case that, for the multi-machine examples, GPUs are required on all the machines?
CPU and GPU are equivalent, with very few image-related exceptions where we rely on cuDNN and lack CPU implementations.
The CPU code already leverages multiple cores, so you may need to experiment a little with how many CPU threads vs. MPI nodes you want to use. E.g. start with one MPI process per server, and then compare with using 2 while limiting cpuThreads to half the number of cores.
Let us know if you run into problems (we normally do not run parallelized across CPU-only machines).
Thank you for the prompt response. Could you please let me know if any of the provided examples can be run this way on multiple CPU-only machines? I am new to MPI, so starting pointers would be very helpful.
Yes, it will. Is that causing a problem for you?
We have seen that some environments do not have this directory, or do not have it write-enabled for users. It is on our TODO list to find a more universal solution.
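If you want to check whether your environment is affected, a quick test of the lock directory mentioned in this thread:

```shell
# report whether the lock directory used for deviceId=auto (per the
# discussion above) is writable by the current user
LOCK_DIR=/var/lock
if [ -w "$LOCK_DIR" ]; then
  echo "$LOCK_DIR is writable"
else
  echo "$LOCK_DIR is NOT writable; deviceId=auto may fail"
fi
```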
From: weixing.mei
Subject: Re: [CNTK] How to train a model using multiple machines? (#59)
Hi all, does anyone have problems when setting deviceId=auto? I'm running CNTK on Linux; according to the code, setting deviceId=auto will create a lock file in /var/lock/.
Please help me understand how I may use it with several CPUs. For example, I have a PC with 4 CPUs; can I train on all 4 CPUs?
The BLAS libraries will automatically use all the CPU cores you have on your computer. If you run on a single box, you can run cntk directly to exploit them, without using MPI.
Yes, you can. First of all, by default it will already use all cores on your machine through OpenMP. If you do nothing, you should see a CPU utilization >> 1 core. If not, please let us know, and try setting the global parameter numCPUThreads to the number of cores in your system.
However, this may or may not be optimal, depending on your specific HW configuration, model dimensions, and the BLAS library (which would be ACML unless you explicitly switched to MKL). The two options you have are:
· single process, using OpenMP to parallelize matrix operations using multiple threads. You can set a parameter numCPUThreads to select how many CPU cores OpenMP may use. The default is all cores (although in some cases we artificially cap this for some operations where we found it is actually slower).
· multi-process data parallelism (1-bit or model averaging). If you choose this on a single machine, you probably need to set numCPUThreads to limit the #cores that each process can use. E.g. if you have 12 cores and use 3-way data parallelism, you probably need to set numCPUThreads=4.
I cannot predict which will work better. We have seen that some BLAS libraries perform worse once you span a NUMA “socket.” E.g. if you have 3 CPU chips with 4 cores each, it may or may not be better to run 3-way data parallelism with 4-core OpenMP parallelism, compared to 12-core OpenMP parallelism. I would just try different combinations.
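The core-splitting arithmetic from the 12-core example above, as a sketch:

```shell
# 12 cores split across 3-way data parallelism:
# each MPI worker should be limited to 12 / 3 = 4 OpenMP threads
TOTAL_CORES=12
NUM_WORKERS=3
THREADS_PER_WORKER=$((TOTAL_CORES / NUM_WORKERS))
echo "numCPUThreads=$THREADS_PER_WORKER"   # prints numCPUThreads=4
```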
I tested on a computer (Windows Server 2012 R2) with 4 Titan X GPUs. Unfortunately, I didn't see any speed-up in training time. When I ran with mpiexec -np 4, I confirmed that all 4 GPUs were used, at 20-30% utilization. When I ran with a single GPU, I confirmed that only one GPU was used, but its utilization was higher (40-50%). However, in the end, training with 4 GPUs was actually slower than with a single GPU.
I evaluated on simple2d and MNIST and they may not be good examples. Do you have any example to show the benefit of training with multiple GPUs? Thank you very much!
The minibatch size is too small. We are working on updating the documentation and the sample.
Thanks!
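For context on why minibatch size matters here: with data parallelism, gradients are exchanged once per minibatch, so a small minibatch makes communication dominate the computation. A hypothetical SGD section with a larger value (the parameter name minibatchSize appears in the standard CNTK example configs; 256 is only an illustrative value, not a recommendation):

```
SGD = [
    # larger minibatches amortize the per-minibatch gradient exchange
    # across more computation per worker
    minibatchSize = 256
]
```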
Yeah, on my computer there is no write permission on /var/lock, which is a soft link to /run/lock. I have changed the lock directory used by CrossProcessMutex to the current directory, and so far everything seems OK.
Added Issue #62 on /var/lock and #73 on better documentation/samples for multi-GPU training. I will close this one instead.