jncraton / languagemodels

Explore large language models in 512MB of RAM

Home Page: https://jncraton.github.io/languagemodels/

License: MIT License

Languages: Python 83.83%, Makefile 2.92%, TeX 13.25%
Topics: llm, nlp, python

languagemodels's Introduction

Language Models


Python building blocks to explore large language models in as little as 512MB of RAM


[Demo: translation "hello world" example]

This package makes using large language models from Python as simple as possible. All inference is performed locally to keep your data private by default.

Installation and Getting Started

This package can be installed using the following command:

pip install languagemodels

Once installed, you should be able to interact with the package in Python as follows:

>>> import languagemodels as lm
>>> lm.do("What color is the sky?")
'The color of the sky is blue.'

This will require downloading a significant amount of data (~250MB) on the first run. Models will be cached for later use and subsequent calls should be quick.

Example Usage

Here are some usage examples presented as Python REPL sessions. These should work in the REPL, in notebooks, or in traditional scripts and applications.

Instruction Following

>>> import languagemodels as lm

>>> lm.do("Translate to English: Hola, mundo!")
'Hello, world!'

>>> lm.do("What is the capital of France?")
'Paris.'

Outputs can be restricted to a list of choices if desired:

>>> lm.do("Is Mars larger than Saturn?", choices=["Yes", "No"])
'No'

Adjusting Model Performance

The base model should run quickly on any system with 512MB of memory, but this memory limit can be increased to select more powerful models that will consume more resources. Here's an example:

>>> import languagemodels as lm
>>> lm.do("If I have 7 apples then eat 5, how many apples do I have?")
'You have 8 apples.'
>>> lm.config["max_ram"] = "4gb"
>>> lm.do("If I have 7 apples then eat 5, how many apples do I have?")
'I have 2 apples left.'

GPU Acceleration

If you have an NVIDIA GPU with CUDA available, you can opt in to using the GPU for inference:

>>> import languagemodels as lm
>>> lm.config["device"] = "auto"

Text Completions

>>> import languagemodels as lm

>>> lm.complete("She hid in her room until")
'she was sure she was safe'

Chat

>>> lm.chat('''
...      System: Respond as a helpful assistant.
...
...      User: What time is it?
...
...      Assistant:
...      ''')
"I'm sorry, but as an AI language model, I don't have access to real-time information. Please provide me with the specific time you are asking for so that I can assist you better."

Code

A model tuned on Python code is included. It can be used to complete code snippets.

>>> import languagemodels as lm
>>> lm.code("""
... a = 2
... b = 5
...
... # Swap a and b
... """)
'a, b = b, a'

External Retrieval

Helper functions are provided to retrieve text from external sources that can be used to augment prompt context.

>>> import languagemodels as lm

>>> lm.get_wiki('Chemistry')
'Chemistry is the scientific study...

>>> lm.get_weather(41.8, -87.6)
'Partly cloudy with a chance of rain...

>>> lm.get_date()
'Friday, May 12, 2023 at 09:27AM'

Here's an example showing how this can be used (compare to previous chat example):

>>> lm.chat(f'''
...      System: Respond as a helpful assistant. It is {lm.get_date()}
...
...      User: What time is it?
...
...      Assistant:
...      ''')
'It is currently Wednesday, June 07, 2023 at 12:53PM.'

Semantic Search

Semantic search is provided to retrieve documents from a document store that may supply helpful context for a prompt.

>>> import languagemodels as lm
>>> lm.store_doc(lm.get_wiki("Python"), "Python")
>>> lm.store_doc(lm.get_wiki("C language"), "C")
>>> lm.store_doc(lm.get_wiki("Javascript"), "Javascript")
>>> lm.get_doc_context("What does it mean for batteries to be included in a language?")
'From Python document: It is often described as a "batteries included" language due to its comprehensive standard library. Guido van Rossum began working on Python in the late 1980s as a successor to the ABC programming language and first released it in 1991 as Python 0.9.

From C document: It was designed to be compiled to provide low-level access to memory and language constructs that map efficiently to machine instructions, all with minimal runtime support.'
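The retrieved context can be combined with lm.do for simple document question answering. Here's a minimal sketch using only the functions shown above; the exact output will vary by model:

import languagemodels as lm

# Store a document, then retrieve matching context and answer from it
lm.store_doc(lm.get_wiki("Python"), "Python")

question = "Who created Python?"
context = lm.get_doc_context(question)

print(lm.do(f"{context}\n\n{question}"))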

Full documentation

Speed

This package currently outperforms Hugging Face transformers for CPU inference thanks to int8 quantization and the CTranslate2 backend. The following table compares CPU inference performance on identical models using the best available quantization on a 20-question test set.

Backend                     Inference Time   Memory Used
Hugging Face transformers   22s              1.77GB
This package                11s              0.34GB

Note that quantization does slightly degrade output quality, but the impact should be negligible at this level.
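For a rough sense of inference time on your own hardware, a timing sketch like the following can be used (this is not the benchmark harness used for the table above, just a simple standard-library timer):

import time

import languagemodels as lm

questions = [
    "What color is the sky?",
    "What is the capital of France?",
]

# Note: the first call also includes model download/load time
start = time.perf_counter()
for question in questions:
    lm.do(question)
elapsed = time.perf_counter() - start

print(f"Answered {len(questions)} questions in {elapsed:.1f}s")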

Models

Sensible default models are provided. The package should improve over time as stronger models become available. The basic models used are 1000x smaller than the largest models in use today. They are useful as learning tools, but perform far below the current state of the art.

Here are the current default models used by the package for a supplied max_ram value:

max_ram   Model Name            Parameters (B)
0.5       LaMini-Flan-T5-248M   0.248
1.0       LaMini-Flan-T5-783M   0.783
2.0       LaMini-Flan-T5-783M   0.783
4.0       flan-alpaca-gpt4-xl   3.0
8.0       openchat-3.5-0106     7.0

For code completions, the CodeT5+ series of models are used.

Commercial Use

This package itself is licensed for commercial use, but the models used may not be compatible with commercial use. In order to use this package commercially, you can filter models by license type using the require_model_license function.

>>> import languagemodels as lm
>>> lm.config['instruct_model']
'LaMini-Flan-T5-248M-ct2-int8'
>>> lm.require_model_license("apache|bsd|mit")
>>> lm.config['instruct_model']
'flan-t5-base-ct2-int8'

It is recommended to confirm that the models used meet the licensing requirements for your software.

Project Ideas

One of the goals for this package is to be a straightforward tool for learners and educators exploring how large language models intersect with modern software development. It can be used to do the heavy lifting for a number of learning projects:

  • CLI Chatbot (see examples/chat.py; a minimal sketch appears below)
  • Streamlit chatbot (see examples/streamlitchat.py)
  • Chatbot with information retrieval
  • Chatbot with access to real-time information
  • Tool use
  • Text classification
  • Extractive question answering
  • Semantic search over documents
  • Document question answering

Several example programs and notebooks are included in the examples directory.
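For instance, a CLI chatbot can be built from lm.chat in a few lines. This is a hypothetical minimal sketch (the actual examples/chat.py may differ), using the System/User/Assistant prompt format shown earlier:

import languagemodels as lm

# Accumulate the dialog and let the model continue after "Assistant:"
dialog = "System: Respond as a helpful assistant.\n\n"

while True:
    dialog += f"User: {input('> ')}\n\nAssistant:"
    reply = lm.chat(dialog)
    print(reply)
    dialog += f" {reply}\n\n"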

languagemodels's People

Contributors

eltociear, jncraton, munoztd0


languagemodels's Issues

Explore adding support for smaller Stable LM 2 models

Smaller 1.6B and 3B models are available from Stability AI that appear to have good performance characteristics for their size. It may be worth adding support for these, but this is not trivial at the moment as they are not currently supported by CTranslate2.

New Features: lm.get_URL(), lm.get_MD(), lm.get_PDF(), lm.get_TXT(), etc.

It would be wonderful if there were helper functions for external retrieval like:

  • lm.get_URL(path) for using a URL
  • lm.get_MD(path) for using Markdown documents
  • lm.get_PDF(path) for using PDF documents
  • lm.get_TXT(path) for using plain text documents
  • lm.get_DOC(path) for using MS Word documents
  • lm.get_JSON(path) for using JSON documents
  • etc.

Like the ones that exist for lm.get_wiki(), lm.get_weather(), and lm.get_date(). Thanks a lot, and keep up the great work! It's amazing!

Replacing LaMini for commercial use?

First off, a big thanks for this. Your project has inspired some great ideas and I love how you've simplified this.

I'd like to try to use this in a corporate environment, so the use of LaMini seems to be a deal breaker. I'm still struggling to understand all of this, but is there a way I could use something else? It seems like they've taken flan-t5-large and just trained it on a dataset they've created and released under Creative Commons.

So if I'm understanding things, I think I can look for other models that are based on flan-t5-large, or maybe just use flan-t5-large itself? I've already tried comparing a similar prompt using your setup vs. native flan-t5-large, and the results from your project are much better, so I assume it really is the additional training done by LaMini that makes this shine?

Either way, thanks for the help and thanks for this project :)

Killed

I am running the lm.extract('prompt', document) function, and it sometimes seems to get hung up; then Python exits and returns to the bash terminal with the single-word message: Killed.

I typed sudo dmesg and saw this message in my log:

[300974.215769] Out of memory: Killed process 336943 (python) total-vm:20061840kB, anon-rss:13408616kB, file-rss:128kB, shmem-rss:0kB, UID:1000 pgtables:27924kB oom_score_adj:0

Commercial Models? (Follow Up)

Greetings,

Following up from my previous issue, I see you've added this section on commercial licenses, but I'm struggling to understand what sort of models may be returned by such a command. Do you have a list somewhere?

My real interest though is in figuring out how to make this work with the orca-3b model. I've played with it using GPT4All and llama (my goal is CPU inference on my Intel MacBook Pro) but I like your implementation the best.

I've tried looking into things, and it looks like you're using something called CTranslate2, but it's all still a bit over my head. Could you give me some dumbed-down, high-level instructions for getting orca-3b working with this?

Is it possible to manually load the trained weights?

Right now, running lm.do() downloads the model's weights and other files.

I am working on a tiny experiment, and I would like to package the trained weights: basically, download them once and then load them manually.

Is that possible?

How to choose model

Hi, is there a way to choose the model that is used? Currently I only know how to change models by changing max_ram, but inside config.py there are references to other models (I am particularly interested in phi-1_5). Thanks for your work!

lm.extract_answer method is failing

Hello Team,

When I execute lm.store_doc(), the code runs to completion and the get_model method has no issues.
When I execute lm.extract_answer(), I notice that the get_model() method, unlike in the case above, executes the lines of Python below:

elif not tokenizer_only:
    # Make sure the model is reloaded if we've unloaded it
    try:
        modelcache[model_name][1].load_model()
    except AttributeError:
        # Encoder-only models can't be unloaded in ctranslate2
        pass

This then results in the following error:

Exception has occurred: AttributeError
'NoneType' object has no attribute 'generate_batch'
  File "C:\Users\ejmar\Documents\Eternev\ai_agent\languagemodels\languagemodels\inference.py", line 154, in generate_instruct
    results = model.generate_batch(
  File "C:\Users\ejmar\Documents\Eternev\ai_agent\languagemodels\languagemodels\__init__.py", line 189, in extract_answer
    return generate_instruct(f"{context}\n\n{question}")
  File "C:\Users\ejmar\Documents\Eternev\ai_agent\languagemodels\test.py", line 9, in <module>
    answer = lm.extract_answer(question=prompt, context=context)
AttributeError: 'NoneType' object has no attribute 'generate_batch'

I cloned your repo and ran pip install -r requirements.txt

Would you happen to know what the issue is? I have been trying to debug for the past couple of hours, and all I can see is that the get_model method is returning None for the model, which I assume is the problem.

I am working within the test.py file and I have included the project folder below. I would be super appreciative if you could help me debug since I am a huge fan of this project!

languagemodels.zip

partially initialized module 'streamlit' has no attribute 'title'

Very cool project.

> python ./examples/streamlitchat.py
Traceback (most recent call last):
  File "/Users/jeng/ai/languagemodels/./examples/streamlitchat.py", line 8, in <module>
    import streamlit as st
  File "/Users/jeng/ai/languagemodels/examples/streamlit.py", line 11, in <module>
    st.title("[languagemodels](https://github.com/jncraton/languagemodels) Demo")
    ^^^^^^^^
AttributeError: partially initialized module 'streamlit' has no attribute 'title' (most likely due to a circular import)

I just learned what Streamlit was, and it looks like it is running on my computer, so it doesn't make sense that the script doesn't work.

> streamlit hello

      👋 Welcome to Streamlit!

      If you’d like to receive helpful onboarding emails, news, offers, promotions,
      and the occasional swag, please enter your email address below. Otherwise,
...

Feature request: Training?

I know this is a big feature request, but you state that the target audience is learners and teachers. I've been trying to teach people AI concepts, and honestly I'm still a learner myself. People are pretty clueless about AI, so it doesn't take much AI knowledge to be the "expert" in the room.

This project would be really awesome if it could train a model or create a fine-tune.

Another thing that would make it awesome is a way to show how the included models were trained.

How to make it run offline?

When I use it and then turn off the internet, it still works, but when I turn off the internet before launching it, it doesn't work anymore. Is there any way to make it work offline?
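For reference, the README notes that models are cached after their first download, so one approach is to "warm" the cache while online and run offline afterwards. A rough sketch (which functions you call determines which models get cached):

import languagemodels as lm

# Run once while connected so the needed model weights are downloaded
# and cached locally; later runs can then load them from the cache
lm.do("What color is the sky?")
lm.complete("She hid in her room until")
lm.code("# Swap a and b\n")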

idea: simple assistant

Please add simple assistant interactions, like a 'Mycroft':

  • What time is it?
  • Calculate 2+18+9
  • What is the weather?
  • etc.

Simply add a script/object that resolves specific data/knowledge and use it together with the model.
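Something in this direction can already be prototyped with the documented API. A minimal sketch, with a hypothetical assistant() helper that routes requests using the choices parameter and the retrieval helpers:

import languagemodels as lm

# Hypothetical router: classify the request, then dispatch to a helper
def assistant(message):
    intent = lm.do(
        f"What is this request about: {message}",
        choices=["time", "weather", "other"],
    )

    if intent == "time":
        return lm.get_date()
    if intent == "weather":
        return lm.get_weather(41.8, -87.6)  # fixed coordinates for the sketch
    return lm.do(message)

print(assistant("what is the weather"))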

Introduce guardrails around long prompts

It is currently easy to cause an out-of-memory condition by sending a model a very long prompt. This is an expected result of the implementation of both certain tokenizers and transformer attention. Experienced users may intentionally want to use long prompts, but less experienced users may hit this by accident and run into confusing OOM conditions (#31) or extremely slow runtime performance.

It may be helpful to explore a mechanism to limit default prompt length in order to help users avoid these friction points.
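As a sketch of what such a guardrail might look like in user code today (the package itself does not provide this; do_safely and the character limit are hypothetical):

import languagemodels as lm

MAX_PROMPT_CHARS = 2000  # arbitrary limit chosen for this sketch

def do_safely(prompt):
    # Fail fast instead of risking an OOM kill or very slow inference
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(
            f"Prompt is {len(prompt)} characters; "
            f"limit is {MAX_PROMPT_CHARS}"
        )
    return lm.do(prompt)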

store response in variable

How do I store the model's response in a variable? When I tried to do this, I received the error:

SyntaxError: cannot assign to expression here. Maybe you meant '==' instead of '='?
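For reference, the return value assigns like any other Python call as long as the variable is on the left of the =; this SyntaxError typically appears when the call ends up on the left side instead (output shown is the README's example):

>>> import languagemodels as lm
>>> response = lm.do("What color is the sky?")
>>> response
'The color of the sky is blue.'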
