LLM applications running in real time on Apple Silicon, thanks to the Apple MLX framework.
There's also a YouTube video.
git clone https://github.com/riccardomusmeci/mlx-llm
cd mlx-llm
pip install .
Check models for a summary of the available models.
To create a model with weights:
from mlx_llm.model import create_model
# loading weights from HuggingFace
model = create_model("TinyLlama-1.1B-Chat-v0.6")
# loading weights from local file
model = create_model("TinyLlama-1.1B-Chat-v0.6", weights="path/to/weights.npz")
To list all available models:
from mlx_llm.model import list_models
print(list_models())
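# e.g. ["TinyLlama-1.1B-Chat-v0.6", "LLaMA-2-7B-chat", "e5-mistral-7b-instruct", ...]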
mlx-llm comes with tools to easily run an LLM chat on Apple Silicon.
You can chat with an LLM by specifying a personality and some examples of user-model interaction (both are mandatory for a good chat experience):
from mlx_llm.playground import LLM
personality = "You're a salesman and beet farmer known as Dwight K Schrute from the TV show The Office. Dwight replies just as he would in the show. You always reply as Dwight would reply. If you don't know the answer to a question, please don't share false information."
# examples must be structured as below
examples = [
    {
        "user": "What is your name?",
        "model": "Dwight K Schrute",
    },
    {
        "user": "What is your job?",
        "model": "Assistant Regional Manager. Sorry, Assistant to the Regional Manager.",
    },
]
llm = LLM.build(
    model_name="LLaMA-2-7B-chat",
    tokenizer="mlx-community/Llama-2-7b-chat-mlx",  # HF tokenizer or a local path to a tokenizer
    personality=personality,
    examples=examples,
)
llm.chat(max_tokens=500, temp=0.1)
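Under the hood, the personality and examples prime the model's context before your first message. As a rough illustration, such a prompt could be assembled like the sketch below (a hypothetical format for clarity, not necessarily the one mlx-llm uses internally):

def build_prompt(personality: str, examples: list, question: str) -> str:
    # system description followed by few-shot user/model turns
    prompt = personality + "\n\n"
    for example in examples:
        prompt += f"User: {example['user']}\nModel: {example['model']}\n"
    prompt += f"User: {question}\nModel:"
    return prompt

print(build_prompt(personality, examples, "Do you like beets?"))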
Models in mlx-llm can now also extract embeddings from input text.
from mlx_llm.model import create_model
from transformers import AutoTokenizer
import mlx.core as mx

model = create_model("e5-mistral-7b-instruct")
tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-mistral-7b-instruct")
# Mistral's tokenizer has no padding token by default
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

text = ["I like to play basketball", "I like to play tennis"]
# pad the batch to a common length so it fits in a single array
tokens = tokenizer(text, padding=True)
x = mx.array(tokens["input_ids"])
embeds = model.embed(x)
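The resulting embeddings can be compared with cosine similarity using plain MLX ops. A minimal sketch, assuming model.embed returns one vector per input text as above:

def cosine_similarity(a, b):
    # dot product of the L2-normalized vectors
    a = a / mx.sqrt(mx.sum(a * a))
    b = b / mx.sqrt(mx.sum(b * b))
    return mx.sum(a * b)

# similarity between "I like to play basketball" and "I like to play tennis"
print(cosine_similarity(embeds[0], embeds[1]).item())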
For a complete example, check the 🤗 e5-mistral-7b-instruct page.
[ ] Make it installable from PyPI
[ ] Quantized models
[ ] Better tokenizer support
[ ] Add tests
[ ] One class to rule them all (LLaMA, Phi2 and Mixtral)
If you have any questions, please email [email protected]