The goal of this project is to run an LLM locally and expose the commands via an API and a streamlit app. The API and streamlit app can be run via docker-compose services.
- Only the `llama_cpp.Llama.create_completion()` function is currently exposed via the API/app (a minimal example of that call is sketched below).
- This video was helpful in getting started with `llama_cpp` and the `vicuna` model: https://www.youtube.com/watch?v=-BidzsQYZM4
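For reference, here is a minimal sketch of calling `create_completion()` through `llama_cpp` directly, outside the API/app. The model path matches one of the files listed below and the prompt mirrors the curl example at the end; the generation parameters are illustrative, not necessarily what the app uses.

```python
from llama_cpp import Llama

# Load one of the downloaded models (path taken from the model list below).
llm = Llama(model_path="/llms/models/ggml-vic13b-q5_1.bin")

# create_completion() is the only function currently exposed via the API/app.
output = llm.create_completion(
    "Q: What is the capital of France? A: ",
    max_tokens=32,   # illustrative values, not necessarily what the app uses
    stop=["Q:"],
)
print(output["choices"][0]["text"])
```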
- Create/start the docker services via the command line: `make docker_run`
  - This may take some time; the following models are downloaded into the `/llms/models` directory of the container:
    - `ggml-vic13b-q5_1.bin`
    - `ggml-alpaca-7b-q4.bin`
- (optional) You can run the unit tests (note: these take around 15 seconds to run because I load both models, which is slow) with the `make tests` command by either:
  - attaching VS Code to one of the containers and running the command inside the VS Code terminal, or
  - attaching the command line directly to the container via the command `make zsh`
- Once the services are started, in a separate terminal:
  - run `make streamlit` to open the streamlit app in your default browser
  - run `make api_docs` to open the docs for the FastAPI app (a rough sketch of the endpoint it documents follows below)
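For orientation, the following is a rough sketch (not the project's actual code) of how a FastAPI route could wrap `create_completion()`. The route path, bearer token, and request field match the curl example below; the model path, request model, and error handling are assumptions.

```python
from fastapi import FastAPI, Header, HTTPException
from llama_cpp import Llama
from pydantic import BaseModel

app = FastAPI()
# Model path taken from the download list above; the real app may load models differently.
llm = Llama(model_path="/llms/models/ggml-vic13b-q5_1.bin")

class CompletionRequest(BaseModel):
    prompt: str

@app.post("/completions")
def completions(body: CompletionRequest, authorization: str = Header(None)):
    # Bearer-token check; "token123" matches the curl example below.
    if authorization != "Bearer token123":
        raise HTTPException(status_code=401, detail="Invalid or missing token")
    # Delegate to llama_cpp and return the completion dict as JSON.
    return llm.create_completion(body.prompt)
```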
To test out the API via the command line, run:

```
curl -X POST -H "Authorization: Bearer token123" -H "Content-Type: application/json" -d '{"prompt": "Q: What is the capital of France? A: "}' http://localhost:8080/completions
```
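Equivalently, the same request from Python (assuming the `requests` package is installed):

```python
import requests

# Same request as the curl command above.
response = requests.post(
    "http://localhost:8080/completions",
    headers={"Authorization": "Bearer token123"},
    json={"prompt": "Q: What is the capital of France? A: "},
)
print(response.json())
```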