Comments (5)
The key idea is that co-locating servers and workers causes bandwidth contention. In case you have already read that, let me give you a concrete example.
Imagine you have 4 physical machines and place one worker and one server on each machine (e.g., worker0 and server0 on machine0). Say the model size is 1. In each iteration, worker0 needs to send 1/4 of the model to each of server1, server2, and server3 (3/4 in total); meanwhile, server0 needs to send 1/4 to each of worker1, worker2, and worker3 (also 3/4 in total). Machine0 therefore sends out 3/2 per iteration, which is exactly the volume sent in the allreduce case.
(However, if you put the servers on separate machines, you will see the value is 1.)
In short, if you co-locate workers and servers, the communication overhead is the same as with allreduce.
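To make the arithmetic concrete, here is a back-of-the-envelope sketch (my own illustration, not BytePS code) that computes the per-machine outbound traffic per iteration for the placements discussed above, with n machines and model size M:

```python
# Illustrative traffic accounting only -- not part of BytePS.

def colocated_egress(n, M=1.0):
    # Each machine hosts one worker and one server.
    worker_push = (n - 1) / n * M  # worker sends M/n to each remote server
    server_send = (n - 1) / n * M  # server sends M/n to each remote worker
    return worker_push + server_send

def ring_allreduce_egress(n, M=1.0):
    # Ring allreduce sends 2 * (n - 1) / n * M per node.
    return 2 * (n - 1) / n * M

def dedicated_server_egress(n, M=1.0):
    # With servers on separate machines, a worker only pushes M in total
    # (M/n to each of the n servers).
    return M

n = 4
print(colocated_egress(n))         # 1.5 -> the 3/2 above
print(ring_allreduce_egress(n))    # 1.5 -> same as co-located
print(dedicated_server_egress(n))  # 1.0 -> the value 1 above
```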
In the same example, if remote servers are used (4 servers and 4 workers in total), each server needs to receive M/4 from each worker and send M/4 to each worker. The total data a server transfers is then 2M, which is greater than 3M/2, isn't it? That is to say, isn't the server's bandwidth bottleneck more serious?
Looking forward to your reply. Thanks.
I am talking about one direction. Sending and receiving are duplex (two different directions) and do not affect each other in terms of bandwidth.
You can imagine there are two channels between two endpoints: one carries traffic from A to B, and the other from B to A. They do not affect each other at all.
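To illustrate the point (my own sketch, assuming the full-duplex model described above), count each direction separately; the 2M in the question sums both directions of one server link, while the 3M/2 figure is per direction:

```python
# Per-direction traffic under full duplex (illustrative only).
n, M = 4, 1.0

# Dedicated servers (4 workers + 4 separate server machines): a server
# receives M/n from each worker and sends M/n to each worker.
server_send = n * (M / n)  # 1.0 on the outgoing channel
server_recv = n * (M / n)  # 1.0 on the incoming channel

# Co-located placement: each machine sends 3M/2 and also receives 3M/2,
# as computed in the earlier example.
colocated_send = 2 * (n - 1) / n * M  # 1.5

# Per direction, the dedicated server (1.0) carries less than the
# co-located machine (1.5), so the server is not the worse bottleneck.
print(server_send, server_recv, colocated_send)  # 1.0 1.0 1.5
```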
Thank you. That makes sense.
Closing this.
Related Issues (20)
- Stuck in the bps.init(). HOT 7
- Is it right to do allreduce immediately for non-zero ranks in bytescheduler? HOT 2
- When will sparse models be supported?
- Are there plans to support CPU-only mode? Our workers also run on CPU machines. HOT 2
- benchmark with cross barrier error
- Successfully installed BytePS but cannot import byteps.torch or byteps.tensorflow HOT 2
- Running multiple workers on a single GPU machine
- Release BytePS docker image support for TF2
- Installation error HOT 1
- Communication failure in MXNet with BytePS HOT 3
- support for fault tolerance and straggler mitigation
- broadcast and is_initialized api are not supported with pytorch.
- Supported environment
- Installation problem
- Mistakes of Workload calculation HOT 5
- How does the tensorflow scheduler plugin used in the tf_benchmark_cnn.py HOT 1
- segmentation fault while launching the worker HOT 1
- Is there any benchmark comparison with Megatron-LM ?
- Supported CUDA and PyTorch versions
- install failed