Generate Kubernetes resource YAML manifests from a text prompt
Gener8-Llama2 is a simple Kubernetes resource YAML generator based on Meta's Llama-2 model
Please make sure you have Python 3.8.x or a higher version installed
Request access to the Llama models here
You will receive an email with the URL to download the model, which we will use later.
Make sure you have both repos downloaded: llama and llama.cpp
First, download the llama-2-7b-chat model using the llama repo's download script.
$ cd llama/
$ /bin/bash ./download.sh
Enter the URL from email: https://download.llamameta.net/*?XXXXXXXXXXXXX
Enter the list of models to download without spaces (7B,13B,70B,7B-chat,13B-chat,70B-chat), or press Enter for all: 7B-chat
Now we have to convert the downloaded model to f16 format and quantize it to reduce its size.
- Build the llama.cpp project
$ cd llama.cpp
$ make
- First, activate a virtual environment and install all the requirements
$ python3 -m venv llama2
$ source llama2/bin/activate
$ python3 -m pip install -r requirements.txt
- Then convert the model to f16 format and quantize it
$ python3 convert.py --outfile models/7B-chat/ggml-model-f16.bin --outtype f16 ../../llama2/llama/llama-2-7b-chat --vocab-dir ../../llama2/llama
$ ./quantize ./models/7B-chat/ggml-model-f16.bin ./models/7B-chat/ggml-model-q4_0.bin q4_0
- Make sure you change the vocab_size in llama/llama-2-7b-chat/params.json to 32000
$ cat llama/llama-2-7b-chat/params.json
{"dim": 4096, "multiple_of": 256, "n_heads": 32, "n_layers": 32, "norm_eps": 1e-06, "vocab_size": 32000}
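If you would rather patch the file than edit it by hand, a small Python snippet like the one below does the job. It is only a convenience sketch, not part of the repo; the path assumes the layout used above.

import json

# Hypothetical helper: overwrite vocab_size in the downloaded model's params.json.
path = "llama/llama-2-7b-chat/params.json"
with open(path) as f:
    params = json.load(f)
params["vocab_size"] = 32000  # the conversion step expects the real vocabulary size here
with open(path, "w") as f:
    json.dump(params, f)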
Before proceeding further, please make sure you have set up the Llama-2 model using the steps given in the Prerequisites section
- Run the Python server
$ python app.py
* Serving Flask app 'app'
* Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on http://127.0.0.1:5000
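app.py in the repo is the source of truth, but to give a sense of what the server does, a minimal sketch along these lines would work, assuming the llama-cpp-python bindings, a /generate endpoint, and the quantized model path produced by the steps above (all of which are assumptions, not the repo's exact code):

# Minimal sketch of a Flask server wrapping the quantized Llama-2 model.
# Assumptions: llama-cpp-python is installed, the route is /generate, and the
# model sits at the path created by the quantization step above.
from flask import Flask, request, jsonify
from llama_cpp import Llama

app = Flask(__name__)
llm = Llama(model_path="llama.cpp/models/7B-chat/ggml-model-q4_0.bin")

@app.route("/generate", methods=["POST"])
def generate():
    description = request.json["prompt"]
    prompt = (
        "You are a Kubernetes expert. Generate a YAML manifest for the "
        f"following resource description:\n{description}\nYAML:"
    )
    output = llm(prompt, max_tokens=1024, temperature=0.2)
    return jsonify({"yaml": output["choices"][0]["text"]})

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000)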
- Use curl or the web app to send a query to the server (a sample curl query is shown after the web-app instructions below)
To query using the web app, open /PATH/TO/REPO/Gener8-Llama2/frontend/index.html in your browser and enter the description of the K8s resource you want to generate specs for
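To query from the command line, a request along these lines should work; the endpoint name and the "prompt" payload field are assumptions from the sketch above, so check app.py for the exact route and field names.
$ curl -X POST http://127.0.0.1:5000/generate -H "Content-Type: application/json" -d '{"prompt": "nginx deployment with 3 replicas"}'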
We love your input! We want to make contributing to this project as easy and transparent as possible, whether it's:
- Reporting a bug
- Discussing the current state of the code
- Submitting a fix
- Proposing new features