
Comments (16)

such87 commented on May 7, 2024

Hi,
Can you try it like this?

mpiexec -np 2 cntk configFile=../Config/01_OneHidden.config parallelTrain=true deviceId=0

from cntk.

amitaga commented on May 7, 2024

Parallel training in CNTK needs to be launched using MPI. Refer to the example "Examples/Other/Simple2d/Config/Multigpu.config" that illustrates the CNTK config options needed for parallel training.

For e.g. parallel training using 2 workers on the same machine:

cd /Examples/Other/Simple2d/Data
mpiexec -np 2 cntk configFile=../Config/Multigpu.config

To run across multiple machines, an MPI hosts file needs to be passed to the mpiexec command to specify the hosts where the CNTK parallel-training workers will be launched. Please refer to the documentation of the MPI implementation you are using for details on launching an MPI job spanning multiple machines.
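As a sketch, an OpenMPI-style hostfile and launch command might look like the following. The hostnames are placeholders, and hostfile syntax differs between MPI implementations (e.g. MS-MPI uses a different format), so check your MPI's documentation:

```
# hosts.txt -- one line per machine (OpenMPI-style; syntax varies by MPI implementation)
machine1 slots=1
machine2 slots=1

# launch one CNTK worker per host
mpiexec -np 2 --hostfile hosts.txt cntk configFile=Multigpu.config parallelTrain=true
```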


yqwangustc commented on May 7, 2024

Just to add to Amit's answer: when training with multiple GPUs, it is better to set deviceId=auto. Otherwise, e.g. with deviceId=0, two MPI workers launched on the same machine will compete for the 0-th GPU.

We may need to reset deviceId to auto automatically once parallelTrain=true is detected.
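Following that suggestion, the single-machine two-worker launch from the earlier example would override deviceId on the command line (config path as in Amit's example; whether auto-selection avoids all contention depends on your setup):

```
mpiexec -np 2 cntk configFile=../Config/Multigpu.config parallelTrain=true deviceId=auto
```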


yefeng-zheng commented on May 7, 2024

Thanks for all the answers. I have tested it on my computer, which has only one GPU. I found that if I run with "mpiexec -np 2", one process takes the GPU and the other runs on the CPU (using all available cores). This is very smart. Next week I will test on our compute cloud; I hope everything goes smoothly.


rahulbhalerao001 commented on May 7, 2024

Can a model be trained on multiple CPU-only machines? Or is it the case that for the multi-machine examples, GPUs are required on all the machines?


frankseide commented on May 7, 2024

CPU and GPU are equivalent, with very few image-related exceptions where we rely on cuDNN and lack CPU implementations.

The CPU code already leverages multiple cores, so you may need to experiment a little with how many CPU threads vs. MPI processes you want to use. E.g. start with one MPI process per server, and then compare with using 2 while limiting cpuThreads to half the number of cores.
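The two variants described above might be launched like this on a single server. This is illustrative only: deviceId=-1 selects the CPU in CNTK v1 configs, the thread counts assume a hypothetical 16-core box, and the parameter is written numCPUThreads here as it appears later in this thread:

```
# variant 1: one MPI process per server; OpenMP uses all cores
mpiexec -np 1 cntk configFile=Multigpu.config parallelTrain=true deviceId=-1

# variant 2: two MPI processes, each limited to half the cores (e.g. 8 of 16)
mpiexec -np 2 cntk configFile=Multigpu.config parallelTrain=true deviceId=-1 numCPUThreads=8
```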

Let us know if you run into problems (we normally do not run parallelized across CPU-only machines).

from cntk.

rahulbhalerao001 commented on May 7, 2024

Thank you for the prompt response. Could you please let me know whether any of the provided examples can be run this way in a multiple-CPU-only-machine setting? I am new to MPI, so starting pointers would be very helpful.


frankseide commented on May 7, 2024

Yes, it will. Is that causing a problem for you?

We are seeing that some environments do not have this directory, or do not have it write-enabled for users. It is on our TODO list to find a more universal solution.

In reply to weixing.mei (February 1, 2016), who wrote:

Hi all, do you have a problem when setting deviceId=auto? I'm running CNTK on Linux; according to the code, setting deviceId=auto will create a lock file in /var/lock/.


Sandy4321 commented on May 7, 2024

Please help me understand how I can use this with several CPUs. For example, I have a PC with 4 CPUs; can I train on all 4?

from cntk.

dongyu888 commented on May 7, 2024

The BLAS libraries will automatically use all CPU cores on your computer. If you run on a single box, you can run CNTK directly to exploit them, without using MPI.


frankseide commented on May 7, 2024

Yes you can. First of all, by default it will already use all cores on your machine through OpenMP. If you do nothing, you should see a CPU utilization of well over 1 core. If not, please let us know, and try setting the global parameter numCPUThreads to the number of cores in your system.

However, this may or may not be optimal, depending on your specific hardware configuration, model dimensions, and the BLAS library (which would be ACML unless you explicitly switched to MKL). The two options you have are:

- single-process, using OpenMP to parallelize matrix operations across multiple threads. You can set the parameter numCPUThreads to select how many CPU cores OpenMP may use. The default is all cores (although in some cases we artificially cap this for operations where we found it is actually slower).

- multi-process data parallelism (1-bit SGD or model averaging). If you choose this on a single machine, you probably need to set numCPUThreads to limit the number of cores each process can use. E.g. if you have 12 cores and use 3-way data parallelism, you probably want numCPUThreads=4.

I cannot predict which will work better. We have seen that some BLAS libraries perform worse once you span a NUMA "socket." E.g. if you have 3 CPU chips with 4 cores each, it may or may not be better to run 3-way data parallelism with 4-core OpenMP parallelism, compared to 12-core OpenMP parallelism. I would just try different combinations.
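The 12-core arithmetic above can be sketched with a little shell. The core and worker counts are purely illustrative, not CNTK defaults:

```shell
# Illustrative only: split a machine's cores evenly across data-parallel workers.
CORES=12       # total cores on the box (hypothetical)
WORKERS=3      # data-parallel MPI processes on that box (hypothetical)
NUMCPUTHREADS=$((CORES / WORKERS))
echo "numCPUThreads=$NUMCPUTHREADS per worker"   # prints numCPUThreads=4 per worker
```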

from cntk.

yefeng-zheng commented on May 7, 2024

I tested on a computer (Windows Server 2012 R2) with 4 Titan X GPUs. Unfortunately, I didn't see any speed-up in training time. When I ran with mpiexec -np 4, I confirmed that all 4 GPUs were used, at 20-30% utilization. When I ran with a single GPU, I also confirmed that only one GPU was used, but its utilization was higher (40-50%). However, in the end, training with 4 GPUs was actually slower than with a single GPU.

I evaluated on Simple2d and MNIST, and they may not be good examples. Do you have any example that shows the benefit of training with multiple GPUs? Thank you very much!


frankseide commented on May 7, 2024

The minibatch size is too small. We are working on updating the documentation and the sample.

Thanks!
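One way to experiment with this suggestion is to raise the minibatch size from the command line so that each of the 4 workers receives a reasonably large sub-batch. The value 256 and the Train=[SGD=[...]] override path are illustrative; the exact nesting depends on the command and SGD block names in your config file:

```
mpiexec -np 4 cntk configFile=Multigpu.config parallelTrain=true Train=[SGD=[minibatchSize=256]]
```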

from cntk.

hanpum commented on May 7, 2024

Yeah, on my computer there is no write permission on /var/lock, which is a soft link to /run/lock. I have changed the lock directory used by CrossProcessMutex to the current directory, and so far everything seems OK.

from cntk.

frankseide commented on May 7, 2024

Added Issue #62 on /var/lock and #73 on better documentation/samples for multi-GPU training. I will close this one instead.


