I tried to integrate MII into tritonserver but encountered some problems. Below is [...]

Blocks when calling client inference in multiprocessing.Process

zhaotyer commented on July 28, 2024

from deepspeed-mii.

Comments (3)

nxznm commented on July 28, 2024

I ran into a similar problem. Here is my code:

import threading

import mii

# model_dir, xx and replica_num are placeholders; set them for your deployment.

def worker(rank, this_model):
    try:
        # Rank 0 reuses the client returned by mii.serve; the other ranks connect on their own.
        if this_model is None:
            client = mii.client('qwen')
        else:
            client = this_model
        response = client.generate(["xxx"], max_new_tokens=1024, stop="<|im_end|>", do_sample=False, return_full_text=True)
        print("in worker rank:", rank, " response:", response)
    except Exception as e:
        print(f"Caught error: {e}")
    finally:
        print("final")

# Start the persistent deployment; mii.serve returns a client for it.
model = mii.serve(model_dir, deployment_name="qwen", tensor_parallel=xx, replica_num=replica_num)

# One worker thread per replica.
job_process = []
for rank in range(replica_num):
    if rank == 0:
        job_process.append(threading.Thread(target=worker, args=(rank, model)))
    else:
        job_process.append(threading.Thread(target=worker, args=(rank, None)))
for process in job_process:
    process.start()
for process in job_process:
    process.join()

This works with threading.Thread. With multiprocessing.Process, however, the call to client.generate blocks.
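One possible cause, not confirmed anywhere in this thread, is that the child processes are forked after the parent already holds a live client from mii.serve, and gRPC channels are known to behave poorly across fork(). The sketch below is a guess at a workaround under that assumption: it uses the "spawn" start method and builds each client inside the child. The names make_mii_client, run_clients and the client_factory parameter are hypothetical helpers introduced for illustration; only mii.client and client.generate come from the code above.

```python
import multiprocessing as mp


def make_mii_client(deployment_name):
    # Imported lazily so that the import happens inside the child process.
    import mii
    return mii.client(deployment_name)


def worker(rank, deployment_name, prompt, results, client_factory=make_mii_client):
    """Build a client in this process, send one prompt, and report the outcome."""
    try:
        client = client_factory(deployment_name)
        response = client.generate([prompt], max_new_tokens=1024)
        # Convert to str so the result can always be pickled back to the parent.
        results.put((rank, "ok", str(response)))
    except Exception as exc:
        results.put((rank, "error", repr(exc)))


def run_clients(deployment_name, num_clients, prompt="xxx"):
    """Start one spawned process per client and collect one result from each."""
    # "spawn" starts a fresh interpreter, so no gRPC state is inherited (assumed cause).
    ctx = mp.get_context("spawn")
    results = ctx.Queue()
    procs = [
        ctx.Process(target=worker, args=(rank, deployment_name, prompt, results))
        for rank in range(num_clients)
    ]
    for p in procs:
        p.start()
    # Read the results before joining so a child never blocks on a full pipe.
    collected = [results.get() for _ in procs]
    for p in procs:
        p.join()
    return sorted(collected)
```

With the server already running, this would be called as run_clients("qwen", replica_num) from under an `if __name__ == "__main__":` guard, which the spawn start method requires. If the hang has a different cause, this will not help.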


nxznm commented on July 28, 2024

Because of the GIL, Python threads do not run Python bytecode in parallel, so this code cannot make full use of concurrency. That means I still need multiprocessing.Process to start a new client, but as mentioned above, that blocks.
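A note on the GIL point: CPython releases the GIL while a thread waits on blocking I/O or sleeps, so threads can still overlap waiting time even though they cannot run bytecode in parallel. The sketch below shows this with a stand-in function; fake_generate and run_concurrently are hypothetical names, and whether client.generate spends its time waiting in this way is an assumption, not something verified here.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_generate(prompt, delay=0.2):
    """Stand-in for a blocking network call; time.sleep releases the GIL."""
    time.sleep(delay)
    return f"response to {prompt}"


def run_concurrently(prompts, max_workers=4):
    """Run fake_generate on every prompt in a thread pool.

    Returns the results in input order together with the elapsed wall time.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fake_generate, prompts))
    return results, time.perf_counter() - start
```

Four prompts with a 0.2 s delay each would take about 0.8 s one after another; in the thread pool the waits overlap. If the real bottleneck is CPU work in the client process rather than waiting on the server, threads will not help and separate processes are still needed.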


nxznm commented on July 28, 2024

I found the official example. Maybe we should start the server and the clients the way it does.
