Under development
Right now the total image size is around 7 GB, not including the model. The selected model is downloaded on the first run of the script.
Follow the notebook tutorial to run your first model.
The links below are only valid after the Jupyter server has been started.
```shell
docker pull caviri/sdsc-llm-playground:latest
docker pull caviri/sdsc-llm-playground:nonroot_user
docker run --rm -it --gpus all -p 8888:8888 -e JUPYTER_TOKEN=TEST caviri/sdsc-llm-playground:nonroot_user
```
Open http://127.0.0.1:8888/ in your browser and log in with the token defined above (`JUPYTER_TOKEN=TEST`).
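As a quick sanity check, you can probe the server from another terminal before opening the browser. A minimal sketch, assuming the `TEST` token from the run command above:

```shell
# Probe the Jupyter server; prints a status line either way.
if curl -sf "http://127.0.0.1:8888/?token=TEST" > /dev/null; then
  echo "Jupyter is up"
else
  echo "Jupyter is not reachable yet"
fi
```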
```shell
runai submit testllm4 -i caviri/sdsc-llm-playground:nonroot_user -e JUPYTER_TOKEN=TEST --service-type=portforward --port 8888:8888 --attach --interactive --node-type "A100" -g 0.2
```
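Once the job is submitted, you can check on it with the Run:ai CLI. A sketch, assuming the job name `testllm4` from the command above; sub-command names may differ slightly between Run:ai CLI versions:

```shell
# Inspect the submitted job (requires a configured Run:ai CLI).
if command -v runai > /dev/null; then
  runai describe job testllm4   # status, node, GPU allocation
  runai logs testllm4           # Jupyter startup log, including the access URL
else
  echo "runai CLI not found"
fi
```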
Build the image locally with:

```shell
docker build -t caviri/sdsc-llm-playground:latest .
```
- Open a new terminal in JupyterLab
- Install FastChat:

  ```shell
  pip install fschat
  ```
- Run a model:

  ```shell
  # ~10 GB VRAM
  python3 -m fastchat.serve.cli --model-path lmsys/fastchat-t5-3b-v1.0

  # ~16 GB VRAM
  python3 -m fastchat.serve.cli --model-path databricks/dolly-v2-7b
  ```
This will open an interactive chat session with the model. The `--load-8bit` flag is meant to reduce the models' memory footprint, but it is currently not working.
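Beyond the interactive CLI, FastChat can also serve models behind an OpenAI-compatible REST API. A minimal sketch, reusing the model above and FastChat's default ports (run each process in its own terminal, or background the first two as shown):

```shell
# Start the FastChat controller, a model worker, and the API server.
python3 -m fastchat.serve.controller &
python3 -m fastchat.serve.model_worker --model-path lmsys/fastchat-t5-3b-v1.0 &
python3 -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8000
```

Any OpenAI-compatible client pointed at `http://127.0.0.1:8000/v1` can then query the model.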