llama.jl's Issues

`run_chat` cannot be interrupted with CTRL+C on MacOS

Expectation: When I run `run_chat`, I'd like to be able to terminate the interactive session with CTRL+C (as per the llama.cpp manual).

Problem: When I press CTRL+C, the interrupt control sequence gets consumed by the REPL and is not forwarded to the chat session, i.e., I cannot stop it and have to restart the REPL session.

MWE

using Llama

model = "/Users/simljx/Documents/llama.cpp/models/rocket-3b-2.76bpw.gguf"
Llama.run_chat(; model, prompt="Say hi!", nthreads=1)
# press CTRL+C to terminate
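
Note: wrapping the call in a try/catch for InterruptException (sketch below, untested; the wrapper is not part of Llama.jl) presumably does not help if the signal never reaches Julia, but it illustrates the behaviour I'd expect:

using Llama

model = "/Users/simljx/Documents/llama.cpp/models/rocket-3b-2.76bpw.gguf"
try
    Llama.run_chat(; model, prompt="Say hi!", nthreads=1)
catch e
    # if CTRL+C were forwarded, Julia would surface an InterruptException here
    e isa InterruptException || rethrow()
    @info "Chat session terminated by CTRL+C"
end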

Versions

  • llama.jl: master branch

julia> versioninfo()
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 8 × Apple M1 Pro
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 8 on 6 virtual cores
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 8

Simplify Initial Setup for LLM Newcomers Using llama.jl Package

Thank you for creating this wrapper! I was about to do it myself -- glad I noticed the link in the past jll PRs :)

I wanted to test an idea with you.

People can use llama.cpp directly if they want low-level control, but that requires knowledge of prompt templates, compiling the source code, etc.

What if this package served as a Julia-only entry point for running an LLM on your laptop? I.e., no need to install anything else; we'd provide a turnkey solution to get you started.

Ollama is awesome and super user-friendly, but it's a separate application to download, and it has its own limitations (e.g., some performance issues, ngl defaults, etc.).
There are many others (ooba, ...), but they are all separate tools to install...

What do you think? I'm happy to draft a PR.

Objective:
Enhance the onboarding experience for first-time users of LLMs with llama.jl by simplifying the initial setup process.
Just: `using Llama; run_server()`

Proposal:

  • Implement a lightweight tracker of a few models, either via the Julia artifact system or directly via the HuggingFace hub (e.g., 1-2 models in each size class). The goal is not to compete with HuggingFace or Ollama.
  • Introduce an easy way to download a model: you call an alias, and if the model isn't in the local folder, it is automatically downloaded from a provided URL (see the sketch after this list).
  • Provide a simple list of available models, e.g., `list_models` (following the example of MLJ and its model listing).
  • Introduce a mechanism to pick a default model for `run_server` if no model argument is provided.
  • Add some re-use vs. restart mechanism for the server (e.g., reuse if the kwargs don't change, restart if the model changes).
  • Over time we could roll our own server on top of the libraries, but that's super low priority for me (I'd rather focus on shipping than on duplicating existing work).
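
To make this concrete, here is a rough sketch of the user-facing API I have in mind. The names below (list_models, download_model, the default-model logic) are placeholders for the proposal, not existing Llama.jl functions:

using Llama

# list the few curated model aliases we'd track (placeholder function)
list_models()            # e.g. ["rocket-3b", ...]

# calling an alias downloads the GGUF to a local folder on first use
# and reuses the cached file afterwards (placeholder function and alias)
model_path = download_model("rocket-3b")

# with a sensible default model, the turnkey path becomes:
run_server()                        # uses the default model
run_server(; model = model_path)    # or an explicit model file

The download step would essentially be a thin wrapper around Downloads.download (or the Julia artifact system) keyed by the alias, so there would be very little for us to maintain.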

Benefits:

  • Streamlines the process for new users, allowing them to start with minimal configuration.
  • Reduces the need for understanding complex aspects of LLMs initially.

Feedback and suggestions are welcome.

Disclaimer: I'm the author of https://github.com/svilupp/PromptingTools.jl, so I'd leverage the API from there and deepen the integration.

EDIT: I have other goals/aspirations for the llama.cpp jll (e.g., data extraction via the grammar support), but I think we should first simplify the setup for users.
