Comments (16)
Hi,
Can you try it like this?
mpiexec -np 2 cntk configFile=../Config/01_OneHidden.config parallelTrain=true deviceId=0
Parallel training in CNTK needs to be launched using MPI. Refer to the example "Examples/Other/Simple2d/Config/Multigpu.config" that illustrates the CNTK config options needed for parallel training.
For example, to run parallel training using 2 workers on the same machine:
cd /Examples/Other/Simple2d/Data
mpiexec -np 2 cntk configFile=../Config/Multigpu.config
To run across multiple machines, an MPI hosts file needs to be passed to the mpiexec command to specify the hosts where the CNTK parallel training workers will be launched. Please refer to the documentation of the MPI implementation you are using for details on launching an MPI job spanning multiple machines.
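As a sketch of what that looks like (host names here are made up, and the exact flag name varies by MPI implementation):

```shell
# A hosts file is just a text file listing one machine per line
# (names are illustrative):
cat > hosts.txt <<'EOF'
machine1
machine2
EOF

# The launch then looks roughly like this; the flag is -hostfile for
# Open MPI and -machinefile for MS-MPI/MPICH. Not run here, since it
# requires CNTK and MPI to be set up on every host:
#   mpiexec -hostfile hosts.txt -np 2 cntk configFile=../Config/Multigpu.config

cat hosts.txt
```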
Just want to add some comments on top of Amit's answer. To use multiple GPUs in training, it is better to set deviceId=auto; otherwise, e.g. with deviceId=0, two MPI workers launched on the same machine will compete for the 0th GPU.
We may need to reset deviceId to auto once parallelTrain=true is detected.
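As a sketch, the two options together in a config file would look like this (option names as discussed above; whether the value auto needs quotes may depend on the config parser, and on the command line it is simply deviceId=auto):

```
parallelTrain = true
deviceId = "auto"   # each MPI worker picks a free GPU instead of all competing for GPU 0
```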
Thanks for all the answers. I have tested it on my computer, which has only one GPU. I found that if I run with "mpiexec -np 2", one process takes the GPU and the other process runs on the CPU (using all available cores). This is very smart. Next week I will test on our compute cloud. Hope everything goes smoothly.
Can a model be trained on multiple CPU-only machines? Or is it the case that, for the multi-machine examples, GPUs are required on all the machines?
CPU and GPU are equivalent, with very few image-related exceptions where we rely on cuDNN and lack CPU implementations.
The CPU code already leverages multiple cores, so you may need to experiment a little with how many CPU threads vs. MPI nodes you want to use. E.g. start with one MPI process per server, and then compare with using 2 while limiting cpuThreads to half the number of cores.
Let us know if you run into problems (we normally do not run parallelized across CPU-only machines).
Thank you for the prompt response. Could you please let me know if any of the provided examples can be run this way on multiple CPU-only machines? I am new to MPI, so starting pointers would be very helpful.
Yes, it will. Is that causing a problem for you?
We have seen that some environments do not have this directory, or do not have it write-enabled for users. It is on our TODO list to find a more universal solution.
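If you want to check whether your environment is affected, a quick test of the lock directory mentioned in this thread:

```shell
# report whether the lock directory used for deviceId=auto (per the
# discussion above) is writable by the current user
LOCK_DIR=/var/lock
if [ -w "$LOCK_DIR" ]; then
  echo "$LOCK_DIR is writable"
else
  echo "$LOCK_DIR is NOT writable; deviceId=auto may fail"
fi
```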
From: weixing.mei
Subject: Re: [CNTK] How to train a model using multiple machines? (#59)
Hi all, does anyone have problems when setting deviceId=auto? I'm running CNTK on Linux; according to the code, setting deviceId=auto will create a lock file in /var/lock/.
Please help me understand how I may use it with several CPUs. For example, I have a PC with 4 CPUs; can I train on all 4 CPUs?
The BLAS libraries will automatically use all the CPU cores you have on your computer. If you run on a single box, you can run cntk directly to exploit them, without using MPI.
Yes, you can. First of all, by default it will already use all cores on your machine through OpenMP. If you do nothing, you should see a CPU utilization >> 1 core. If not, please let us know, and try setting the global parameter numCPUThreads to the number of cores in your system.
However, this may or may not be optimal, depending on your specific HW configuration, model dimensions, and the BLAS library (which would be ACML unless you explicitly switched to MKL). The two options you have are:
· single process, using OpenMP to parallelize matrix operations using multiple threads. You can set a parameter numCPUThreads to select how many CPU cores OpenMP may use. The default is all cores (although in some cases we artificially cap this for some operations where we found it is actually slower).
· multi-process data parallelism (1-bit or model averaging). If you choose this on a single machine, you probably need to set numCPUThreads to limit the #cores that each process can use. E.g. if you have 12 cores and use 3-way data parallelism, you probably need to set numCPUThreads=4.
I cannot predict which will work better. We have seen that some BLAS libraries perform worse once you span a NUMA “socket.” E.g. if you have 3 CPU chips with 4 cores each, it may or may not be better to run 3-way data parallelism with 4-core OpenMP parallelism, compared to 12-core OpenMP parallelism. I would just try different combinations.
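The core-splitting arithmetic from the 12-core example above, as a sketch:

```shell
# 12 cores split across 3-way data parallelism:
# each MPI worker should be limited to 12 / 3 = 4 OpenMP threads
TOTAL_CORES=12
NUM_WORKERS=3
THREADS_PER_WORKER=$((TOTAL_CORES / NUM_WORKERS))
echo "numCPUThreads=$THREADS_PER_WORKER"   # prints numCPUThreads=4
```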
I tested on a computer (Windows Server 2012 R2) with 4 Titan X GPUs. Unfortunately, I didn't see any speed-up in training time. When I ran with mpiexec -np 4, I confirmed that all 4 GPUs were used, at 20-30% utilization. When I ran with a single GPU, I confirmed that only one GPU was used, but its utilization was higher (40-50%). However, in the end, training with 4 GPUs was actually slower than with a single GPU.
I evaluated on simple2d and MNIST and they may not be good examples. Do you have any example to show the benefit of training with multiple GPUs? Thank you very much!
The minibatch size is too small. We are working on updating the documentation and the sample.
Thanks!
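For context on why minibatch size matters here: with data parallelism, gradients are exchanged once per minibatch, so a small minibatch makes communication dominate the computation. A hypothetical SGD section with a larger value (the parameter name minibatchSize appears in the standard CNTK example configs; 256 is only an illustrative value, not a recommendation):

```
SGD = [
    # larger minibatches amortize the per-minibatch gradient exchange
    # across more computation per worker
    minibatchSize = 256
]
```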
Yeah, on my computer there is no write permission on /var/lock, which is a soft link to /run/lock. I have changed the lock directory used by CrossProcessMutex to the current directory, and so far everything seems OK.
Added Issue #62 on /var/lock and #73 on better documentation/samples for multi-GPU training. I will close this one instead.