llama.jl's Issues

`run_chat` cannot be interrupted with CTRL+C on MacOS

Expectation: When I run `run_chat`, I'd like to be able to terminate the interactive session with CTRL+C (as per the llama.cpp manual).

Problem: When I press CTRL+C, the interrupt control sequence gets consumed by the REPL and is not forwarded to the chat session, i.e., I cannot stop it and have to restart the REPL session.

MWE

using Llama

model = "/Users/simljx/Documents/llama.cpp/models/rocket-3b-2.76bpw.gguf"
Llama.run_chat(; model, prompt="Say hi!", nthreads=1)
# press CTRL+C to terminate
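
Note: wrapping the call in a try/catch for InterruptException (sketch below, untested; the wrapper is not part of Llama.jl) presumably does not help if the signal never reaches Julia, but it illustrates the behaviour I'd expect:

using Llama

model = "/Users/simljx/Documents/llama.cpp/models/rocket-3b-2.76bpw.gguf"
try
    Llama.run_chat(; model, prompt="Say hi!", nthreads=1)
catch e
    # if CTRL+C were forwarded, Julia would surface an InterruptException here
    e isa InterruptException || rethrow()
    @info "Chat session terminated by CTRL+C"
end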

Versions

  • llama.jl: master branch

julia> versioninfo()
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 8 × Apple M1 Pro
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 8 on 6 virtual cores
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 8

Simplify Initial Setup for LLM Newcomers Using llama.jl Package

Thank you for creating this wrapper! I was about to do it myself -- glad I noticed the link in the past jll PRs :)

I wanted to test an idea with you.

People can use llama.cpp directly if they want low-level control, but that requires knowledge of prompt templates, compiling the source code, etc.

What if this package served as a Julia-only entry point for running an LLM on your laptop? I.e., no need to install anything else; we'd provide a turnkey solution to get you started.

Ollama is awesome and super user-friendly, but it's a separate application to download, and it has its own limitations (e.g., some performance issues, ngl defaults, etc.).
There are many others (ooba, ...), but they are all separate tools to install...

What do you think? I'm happy to draft a PR.

Objective:
Enhance the onboarding experience for first-time users of LLMs with llama.jl by simplifying the initial setup process.
Just: `using Llama; run_server()`

Proposal:

  • Implement a lightweight tracker of a few models, either via the Julia artifact system or directly via the HuggingFace hub (e.g., 1-2 models in each size class). The goal is not to compete with HuggingFace or Ollama.
  • Introduce an easy way to download a model: you call an alias, and if the model isn't in the local folder, it is automatically downloaded from a provided URL (see the sketch after this list).
  • Provide a simple list of available models, e.g., `list_models` (following the example of MLJ and its model listing).
  • Introduce a mechanism to pick a default model for `run_server` if no model argument is provided.
  • Add some re-use vs. restart mechanism for the server (e.g., reuse if the kwargs don't change, restart if the model changes).
  • Over time we could roll our own server on top of the libraries, but that's super low priority for me (I'd rather focus on shipping than on duplicating existing work).
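
To make this concrete, here is a rough sketch of the user-facing API I have in mind. The names below (list_models, download_model, the default-model logic) are placeholders for the proposal, not existing Llama.jl functions:

using Llama

# list the few curated model aliases we'd track (placeholder function)
list_models()            # e.g. ["rocket-3b", ...]

# calling an alias downloads the GGUF to a local folder on first use
# and reuses the cached file afterwards (placeholder function and alias)
model_path = download_model("rocket-3b")

# with a sensible default model, the turnkey path becomes:
run_server()                        # uses the default model
run_server(; model = model_path)    # or an explicit model file

The download step would essentially be a thin wrapper around Downloads.download (or the Julia artifact system) keyed by the alias, so there would be very little for us to maintain.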

Benefits:

  • Streamlines the process for new users, allowing them to start with minimal configuration.
  • Reduces the need for understanding complex aspects of LLMs initially.

Feedback and suggestions are welcome.

Disclaimer: I'm the author of https://github.com/svilupp/PromptingTools.jl, so I'd leverage the API from there and deepen the integration.

EDIT: I have other goals/aspirations for the llama.cpp jll (e.g., data extraction via the grammar support), but I think we should first simplify the setup for users.
