llm_local's Introduction

llm_local

Run an LLM on your local system.

llamafile guide

Download llava-v1.5-7b-q4.llamafile (4.29 GB)
Grant your computer permission to execute the new file (on Windows, instead rename the file by adding ".exe" to the end):
```
chmod +x llava-v1.5-7b-q4.llamafile
```

Run the llamafile, then open http://localhost:8080/ in your browser:

    ./llava-v1.5-7b-q4.llamafile

To run in server mode without opening a browser tab:

    ./llava-v1.5-7b-q4.llamafile --server --nobrowser

To run on a specific NVIDIA GPU (here device 0, with a different model file):

    CUDA_VISIBLE_DEVICES=0 ./Meta-Llama-3-8B-Instruct.Q4_1.llamafile --gpu nvidia
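A model can take a while to load, so a script that talks to the server may want to wait until http://localhost:8080/ actually responds. The following is a minimal sketch using only the Python standard library; the function name `wait_for_server` and its default timeout values are illustrative assumptions, not part of llamafile.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url="http://localhost:8080/", timeout=60.0, interval=0.5):
    """Poll `url` until it answers over HTTP or `timeout` seconds elapse.

    Hypothetical helper: the name and defaults are assumptions for illustration.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=2):
                return True  # got a successful HTTP response
        except urllib.error.HTTPError:
            return True  # an error status still proves the server is listening
        except (urllib.error.URLError, OSError):
            pass  # connection refused or timed out: not ready yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_server()` right after starting the llamafile returns `True` once the port answers, or `False` if nothing responds within the timeout.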

To stop the server, kill its process (replace <PID> with the process ID):

    sudo kill <PID>
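If the server is started from a script rather than a terminal, the script already holds the process handle, so there is no PID to look up. Here is a small sketch with the standard `subprocess` module; `start_server` and `stop_server` are hypothetical helper names, and the grace period is an arbitrary assumption.

```python
import subprocess


def start_server(command):
    """Launch the server as a child process and return its Popen handle."""
    return subprocess.Popen(command)


def stop_server(process, grace_seconds=10.0):
    """Ask the process to exit (SIGTERM on POSIX); force-kill it if it does not."""
    process.terminate()
    try:
        process.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        process.kill()  # the process ignored the polite request
        process.wait()
    return process.returncode
```

For example, `server = start_server(["./llava-v1.5-7b-q4.llamafile", "--server", "--nobrowser"])` followed later by `stop_server(server)`. If launching the file directly fails with an exec format error, running it through `sh` is a workaround worth trying.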

JSON API Quickstart

curl API client example:

    curl http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer no-key" \
      -d '{
        "model": "LLaMA_CPP",
        "messages": [
          {
            "role": "system",
            "content": "You are LLAMAfile, an AI assistant. Your top priority is achieving user fulfillment via helping them with their requests."
          },
          {
            "role": "user",
            "content": "Write a limerick about python exceptions"
          }
        ]
      }' | python3 -c 'import json, sys; json.dump(json.load(sys.stdin), sys.stdout, indent=2); print()'
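The same request can be sent from Python without any third-party package. This is a minimal sketch using `urllib` from the standard library; the helper names `build_chat_request`, `extract_reply`, and `chat` are assumptions made for illustration, and the response shape assumed in `extract_reply` is the usual OpenAI-style `choices[0].message.content`.

```python
import json
import urllib.request


def build_chat_request(messages, base_url="http://localhost:8080/v1", model="LLaMA_CPP"):
    """Build the same POST request that the curl command sends."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer no-key",
        },
        method="POST",
    )


def extract_reply(response):
    """Return the assistant's text from a decoded chat completion response."""
    return response["choices"][0]["message"]["content"]


def chat(messages, **kwargs):
    """Send the request to a running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(messages, **kwargs)) as resp:
        return extract_reply(json.load(resp))
```

With the server running, `chat([{"role": "user", "content": "Write a limerick about python exceptions"}])` returns just the generated text rather than the full JSON document.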

Python API client example:

    #!/usr/bin/env python3
    from openai import OpenAI

    # Point the OpenAI client at the local llamafile server.
    client = OpenAI(
        base_url="http://localhost:8080/v1",  # or "http://<your API server IP>:<port>/v1"
        api_key="sk-no-key-required",
    )

    completion = client.chat.completions.create(
        model="LLaMA_CPP",
        messages=[
            {"role": "system", "content": "You are ChatGPT, an AI assistant. Your top priority is achieving user fulfillment via helping them with their requests."},
            {"role": "user", "content": "Write a limerick about python exceptions"},
        ],
    )

    # Prints the whole message object; use .message.content for the text only.
    print(completion.choices[0].message)
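Each chat completion request carries the full list of messages, so a multi-turn conversation means resending the earlier turns along with the new one. The sketch below shows one way to do that; `ask` is a hypothetical helper, and it assumes `client` is an OpenAI client configured as in the example above.

```python
def ask(client, history, user_text, model="LLaMA_CPP"):
    """Send `user_text` with the prior turns and record the reply in `history`.

    Hypothetical helper: `history` is a plain list of message dicts that is
    updated in place, so the next call includes this exchange as context.
    """
    history.append({"role": "user", "content": user_text})
    completion = client.chat.completions.create(model=model, messages=history)
    reply = completion.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply
```

Start with `history = [{"role": "system", "content": "..."}]`, then call `ask(client, history, "Write a limerick about python exceptions")` and follow up with further `ask` calls on the same `history`.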