
getumbrel / llama-gpt

10.3K stars · 80 watchers · 656 forks · 1.75 MB

A self-hosted, offline, ChatGPT-like chatbot. Powered by Llama 2. 100% private, with no data leaving your device. New: Code Llama support!

Home Page: https://apps.umbrel.com/app/llama-gpt

License: MIT License

Shell 7.48% Dockerfile 1.57% Makefile 0.20% TypeScript 89.55% JavaScript 0.92% CSS 0.28%
ai chatgpt gpt gpt-4 gpt4all llama llama-2 llama-cpp llama2 llamacpp

llama-gpt's People

Contributors

adiestel, alanpog, anthonypuppo, aweshchoudhary, bcullman, borborborja, ch4r4f, chanzhaoyu, dasunnimantha, dotneet, ernestobarrera, huuphongnguyen, hyena459, itbm, jdban, liby, lukechilds, matriq, mayankchhabra, mckaywrigley, nauxliu, nmfretz, oznav2, riande, ryanhex53, shemarlindie, spctechdev, srsholmes, syedmuzamilm, thomasleveil


llama-gpt's Issues

UI is not working / chatbot-ui does not start

Hello!

Sorry for the dumb question, maybe. I just found llama-gpt, and this is my very first try at running a GPT model locally.

I'm trying it on my M1 Pro MacBook.

I ran run-mac.sh, it downloaded everything, and the server started successfully:

Uvicorn running on http://localhost:3001

If I go to localhost:3001/docs# I can see the API interface.

When I go to localhost:3001 itself, I get this response:

{"detail":"Not Found"}

And in the log:

INFO: ::1:55660 - "GET / HTTP/1.1" 404 Not Found

I tried to install chatbot-ui externally, but was not able to connect it to this server.

From the documentation I gather that chatbot-ui should be installed, but I can't find it running anywhere.

So how can I actually see it running?
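For anyone hitting the same confusion: the API container has no route at "/", so the 404 there is expected; the web UI is served separately (port 3000 in the Docker setup). A quick, hedged way to confirm the API itself is healthy is to query the OpenAI-compatible models route:

curl http://localhost:3001/v1/models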

Mac Pro 2019 (memory usage?)

I have a Mac Pro 2019 (28-core Xeon) with 256 GB of RAM and I would like to install LlamaGPT on it. But I only see the docker compose option for x86 CPUs. Does this mean that it is not going to use my memory?

I am sure the 28c/56t Xeon will do the job, but is there a way to make use of my memory, or even my 6900 XT?

Permission denied when pulling image

Hello, I am trying to run it on my Ubuntu machine with Docker, but I cannot pull the image for the API:

$ docker-compose up -d
Pulling llama-gpt-api (ghcr.io/getumbrel/llama-gpt-api-llama-2-7b-chat:latest)...
ERROR: Head "https://ghcr.io/v2/getumbrel/llama-gpt-api-llama-2-7b-chat/manifests/latest": denied

In any case, I was able to build the image and run it, so I just wanted to report this since the pull command is what the readme suggests.
Thanks for the awesome work!
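For anyone hitting the same denied pull, the local build the reporter mentions can be done in one step (a sketch; exact flags may vary with your Compose version):

# Build the API image locally instead of pulling it from ghcr.io:
docker-compose up -d --build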

Memory Allocation Error

Currently, I am only able to get the 7B model running, and it takes 15-20 seconds per token.

Docker Desktop shows container memory usage at only 600-800 MB out of 1.89 GB, with only 2 cores allocated.

I'm getting this error:

warning: failed to mlock 73728000-byte buffer (after previously locking 73744384 bytes): Cannot allocate memory
llama-gpt-llama-gpt-api-7b-1  | Try increasing RLIMIT_MLOCK ('ulimit -l' as root).

System specs:
Installed Physical Memory (RAM) 64.0 GB
Processor 12th Gen Intel(R) Core(TM) i7-12700K, 3600 Mhz, 12 Core(s), 20 Logical Processor(s)

Benchmark results:

llama-gpt-llama-gpt-api-7b-1  | llama_print_timings:        load time = 31486.69 ms
llama-gpt-llama-gpt-api-7b-1  | llama_print_timings:      sample time =    33.00 ms /    34 runs   (    0.97 ms per token,  1030.21 tokens per second)
llama-gpt-llama-gpt-api-7b-1  | llama_print_timings: prompt eval time = 31485.54 ms /    83 tokens (  379.34 ms per token,     2.64 tokens per second)
llama-gpt-llama-gpt-api-7b-1  | llama_print_timings:        eval time = 574080.73 ms /    33 runs   (17396.39 ms per token,     0.06 tokens per second)
llama-gpt-llama-gpt-api-7b-1  | llama_print_timings:       total time = 606068.42 ms
llama-gpt-llama-gpt-api-7b-1  |

Sorry if I'm missing something obvious, I've been troubleshooting for multiple hours now with no luck.
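Two things seem worth checking here. The 1.89 GB ceiling suggests Docker Desktop's VM memory limit is set low (Settings → Resources), which would explain the slow token rate, and the mlock warning can usually be silenced by raising the container's memlock ulimit. A hedged compose sketch (service name taken from the log prefix, values illustrative):

services:
  llama-gpt-api-7b:
    ulimits:
      memlock:
        soft: -1   # unlimited; lets llama.cpp lock the model in RAM
        hard: -1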

Can't install the 70B model

[+] Running 2/0
✔ Container llama-gpt-llama-gpt-ui-1 Created 0.0s
✔ Container llama-gpt-llama-gpt-api-70b-1 Created 0.0s
Attaching to llama-gpt-llama-gpt-api-70b-1, llama-gpt-llama-gpt-ui-1
llama-gpt-llama-gpt-ui-1 | [INFO wait] --------------------------------------------------------
llama-gpt-llama-gpt-ui-1 | [INFO wait] docker-compose-wait 2.12.0
llama-gpt-llama-gpt-ui-1 | [INFO wait] ---------------------------
llama-gpt-llama-gpt-ui-1 | [DEBUG wait] Starting with configuration:
llama-gpt-llama-gpt-ui-1 | [DEBUG wait] - Hosts to be waiting for: [llama-gpt-api-70b:8000]
llama-gpt-llama-gpt-ui-1 | [DEBUG wait] - Paths to be waiting for: []
llama-gpt-llama-gpt-ui-1 | [DEBUG wait] - Timeout before failure: 21600 seconds
llama-gpt-llama-gpt-ui-1 | [DEBUG wait] - TCP connection timeout before retry: 5 seconds
llama-gpt-llama-gpt-ui-1 | [DEBUG wait] - Sleeping time before checking for hosts/paths availability: 0 seconds
llama-gpt-llama-gpt-ui-1 | [DEBUG wait] - Sleeping time once all hosts/paths are available: 0 seconds
llama-gpt-llama-gpt-ui-1 | [DEBUG wait] - Sleeping time between retries: 1 seconds
llama-gpt-llama-gpt-ui-1 | [DEBUG wait] --------------------------------------------------------
llama-gpt-llama-gpt-ui-1 | [INFO wait] Checking availability of host [llama-gpt-api-70b:8000]
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-70b:8000] not yet available...
llama-gpt-llama-gpt-api-70b-1 | /models/llama-2-70b-chat.bin model found.
llama-gpt-llama-gpt-api-70b-1 | python3 setup.py develop
llama-gpt-llama-gpt-api-70b-1 | /usr/local/lib/python3.11/site-packages/setuptools/command/develop.py:40: EasyInstallDeprecationWarning: easy_install command is deprecated.
llama-gpt-llama-gpt-api-70b-1 | !!
llama-gpt-llama-gpt-api-70b-1 |
llama-gpt-llama-gpt-api-70b-1 | ********************************************************************************
llama-gpt-llama-gpt-api-70b-1 | Please avoid running setup.py and easy_install.
llama-gpt-llama-gpt-api-70b-1 | Instead, use pypa/build, pypa/installer or other
llama-gpt-llama-gpt-api-70b-1 | standards-based tools.
llama-gpt-llama-gpt-api-70b-1 |
llama-gpt-llama-gpt-api-70b-1 | See pypa/setuptools#917 for details.
llama-gpt-llama-gpt-api-70b-1 | ********************************************************************************
llama-gpt-llama-gpt-api-70b-1 |
llama-gpt-llama-gpt-api-70b-1 | !!
llama-gpt-llama-gpt-api-70b-1 | easy_install.initialize_options(self)
llama-gpt-llama-gpt-api-70b-1 |
llama-gpt-llama-gpt-api-70b-1 | [0/1] Install the project...
llama-gpt-llama-gpt-api-70b-1 | -- Install configuration: "Release"
llama-gpt-llama-gpt-api-70b-1 | -- Up-to-date: /app/_skbuild/linux-x86_64-3.11/cmake-install/llama_cpp/libllama.so
llama-gpt-llama-gpt-api-70b-1 | copying _skbuild/linux-x86_64-3.11/cmake-install/llama_cpp/libllama.so -> llama_cpp/libllama.so
llama-gpt-llama-gpt-api-70b-1 |
llama-gpt-llama-gpt-api-70b-1 |
llama-gpt-llama-gpt-api-70b-1 | running develop
llama-gpt-llama-gpt-api-70b-1 | /usr/local/lib/python3.11/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
llama-gpt-llama-gpt-api-70b-1 | !!
llama-gpt-llama-gpt-api-70b-1 |
llama-gpt-llama-gpt-api-70b-1 | ********************************************************************************
llama-gpt-llama-gpt-api-70b-1 | Please avoid running setup.py directly.
llama-gpt-llama-gpt-api-70b-1 | Instead, use pypa/build, pypa/installer or other
llama-gpt-llama-gpt-api-70b-1 | standards-based tools.
llama-gpt-llama-gpt-api-70b-1 |
llama-gpt-llama-gpt-api-70b-1 | See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
llama-gpt-llama-gpt-api-70b-1 | ********************************************************************************
llama-gpt-llama-gpt-api-70b-1 |
llama-gpt-llama-gpt-api-70b-1 | !!
llama-gpt-llama-gpt-api-70b-1 | self.initialize_options()
llama-gpt-llama-gpt-api-70b-1 |
llama-gpt-llama-gpt-api-70b-1 | running egg_info
llama-gpt-llama-gpt-api-70b-1 | writing llama_cpp_python.egg-info/PKG-INFO
llama-gpt-llama-gpt-api-70b-1 | writing dependency_links to llama_cpp_python.egg-info/dependency_links.txt
llama-gpt-llama-gpt-api-70b-1 | writing requirements to llama_cpp_python.egg-info/requires.txt
llama-gpt-llama-gpt-api-70b-1 | writing top-level names to llama_cpp_python.egg-info/top_level.txt
llama-gpt-llama-gpt-api-70b-1 | reading manifest file 'llama_cpp_python.egg-info/SOURCES.txt'
llama-gpt-llama-gpt-api-70b-1 | adding license file 'LICENSE.md'
llama-gpt-llama-gpt-api-70b-1 | writing manifest file 'llama_cpp_python.egg-info/SOURCES.txt'
llama-gpt-llama-gpt-api-70b-1 | running build_ext
llama-gpt-llama-gpt-api-70b-1 | Creating /usr/local/lib/python3.11/site-packages/llama-cpp-python.egg-link (link to .)
llama-gpt-llama-gpt-api-70b-1 | llama-cpp-python 0.1.78 is already the active version in easy-install.pth
llama-gpt-llama-gpt-api-70b-1 |
llama-gpt-llama-gpt-api-70b-1 | Installed /app
llama-gpt-llama-gpt-api-70b-1 |
llama-gpt-llama-gpt-api-70b-1 | Processing dependencies for llama-cpp-python==0.1.78
llama-gpt-llama-gpt-api-70b-1 | Searching for diskcache==5.6.1
llama-gpt-llama-gpt-api-70b-1 | Best match: diskcache 5.6.1
llama-gpt-llama-gpt-api-70b-1 | Processing diskcache-5.6.1-py3.11.egg
llama-gpt-llama-gpt-api-70b-1 | Adding diskcache 5.6.1 to easy-install.pth file
llama-gpt-llama-gpt-api-70b-1 |
llama-gpt-llama-gpt-api-70b-1 | Using /usr/local/lib/python3.11/site-packages/diskcache-5.6.1-py3.11.egg
llama-gpt-llama-gpt-api-70b-1 |
llama-gpt-llama-gpt-api-70b-1 | Searching for numpy==1.26.0b1
llama-gpt-llama-gpt-api-70b-1 | Best match: numpy 1.26.0b1
llama-gpt-llama-gpt-api-70b-1 | Processing numpy-1.26.0b1-py3.11-linux-x86_64.egg
llama-gpt-llama-gpt-api-70b-1 | Adding numpy 1.26.0b1 to easy-install.pth file
llama-gpt-llama-gpt-api-70b-1 | Installing f2py script to /usr/local/bin
llama-gpt-llama-gpt-api-70b-1 |
llama-gpt-llama-gpt-api-70b-1 | Using /usr/local/lib/python3.11/site-packages/numpy-1.26.0b1-py3.11-linux-x86_64.egg
llama-gpt-llama-gpt-api-70b-1 |
llama-gpt-llama-gpt-api-70b-1 | Searching for typing-extensions==4.7.1
llama-gpt-llama-gpt-api-70b-1 | Best match: typing-extensions 4.7.1
llama-gpt-llama-gpt-api-70b-1 | Adding typing-extensions 4.7.1 to easy-install.pth file
llama-gpt-llama-gpt-api-70b-1 |
llama-gpt-llama-gpt-api-70b-1 | Using /usr/local/lib/python3.11/site-packages
llama-gpt-llama-gpt-api-70b-1 |
llama-gpt-llama-gpt-api-70b-1 | Finished processing dependencies for llama-cpp-python==0.1.78
llama-gpt-llama-gpt-api-70b-1 | Initializing server with:
llama-gpt-llama-gpt-api-70b-1 | Batch size: 2096
llama-gpt-llama-gpt-api-70b-1 | Number of CPU threads: 12
llama-gpt-llama-gpt-api-70b-1 | Number of GPU layers: 0
llama-gpt-llama-gpt-api-70b-1 | Context window: 4096
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-70b:8000] not yet available...
llama-gpt-llama-gpt-api-70b-1 | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:127: UserWarning: Field "model_alias" has conflict with protected namespace "model_".
llama-gpt-llama-gpt-api-70b-1 |
llama-gpt-llama-gpt-api-70b-1 | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ('settings_',)`.
llama-gpt-llama-gpt-api-70b-1 | warnings.warn(
llama-gpt-llama-gpt-api-70b-1 |
llama-gpt-llama-gpt-api-70b-1 | llama.cpp: loading model from /models/llama-2-70b-chat.bin
llama-gpt-llama-gpt-api-70b-1 | llama_model_load_internal: warning: assuming 70B model based on GQA == 8
llama-gpt-llama-gpt-api-70b-1 | llama_model_load_internal: format = ggjt v3 (latest)
llama-gpt-llama-gpt-api-70b-1 | llama_model_load_internal: n_vocab = 32000
llama-gpt-llama-gpt-api-70b-1 | llama_model_load_internal: n_ctx = 4096
llama-gpt-llama-gpt-api-70b-1 | llama_model_load_internal: n_embd = 8192
llama-gpt-llama-gpt-api-70b-1 | llama_model_load_internal: n_mult = 4096
llama-gpt-llama-gpt-api-70b-1 | llama_model_load_internal: n_head = 64
llama-gpt-llama-gpt-api-70b-1 | llama_model_load_internal: n_head_kv = 8
llama-gpt-llama-gpt-api-70b-1 | llama_model_load_internal: n_layer = 80
llama-gpt-llama-gpt-api-70b-1 | llama_model_load_internal: n_rot = 128
llama-gpt-llama-gpt-api-70b-1 | llama_model_load_internal: n_gqa = 8
llama-gpt-llama-gpt-api-70b-1 | llama_model_load_internal: rnorm_eps = 5.0e-06
llama-gpt-llama-gpt-api-70b-1 | llama_model_load_internal: n_ff = 28672
llama-gpt-llama-gpt-api-70b-1 | llama_model_load_internal: freq_base = 10000.0
llama-gpt-llama-gpt-api-70b-1 | llama_model_load_internal: freq_scale = 1
llama-gpt-llama-gpt-api-70b-1 | llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama-gpt-llama-gpt-api-70b-1 | llama_model_load_internal: model size = 70B
llama-gpt-llama-gpt-api-70b-1 | llama_model_load_internal: ggml ctx size = 0.21 MB
llama-gpt-llama-gpt-api-70b-1 | llama_model_load_internal: mem required = 37070.96 MB (+ 1280.00 MB per state)
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-70b:8000] not yet available...
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-70b:8000] not yet available...
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-70b:8000] not yet available...
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-70b:8000] not yet available...
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-70b:8000] not yet available...
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-70b:8000] not yet available...
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-70b:8000] not yet available...
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-70b:8000] not yet available...
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-70b:8000] not yet available...
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-70b:8000] not yet available...
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-70b:8000] not yet available...
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-70b:8000] not yet available...
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-70b:8000] not yet available...
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-70b:8000] not yet available...
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-70b:8000] not yet available...
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-70b:8000] not yet available...
And it just keeps going like this, so I stopped it after an hour.

Add Kubernetes Support

I'll try today to get it working in Kubernetes. It would be great to have a Helm chart to install and run this.
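Until a chart exists, a minimal Deployment for the UI might look like the sketch below (the image name, port, and labels are assumptions for illustration, not the project's published manifests):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama-gpt-ui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llama-gpt-ui
  template:
    metadata:
      labels:
        app: llama-gpt-ui
    spec:
      containers:
        - name: ui
          image: ghcr.io/getumbrel/llama-gpt-ui:latest  # assumed image name
          ports:
            - containerPort: 3000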

70b and 13b do not work

llama-gpt-llama-gpt-api-1  | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:126: UserWarning: Field "model_alias" has conflict with protected namespace "model_".
llama-gpt-llama-gpt-api-1  |
llama-gpt-llama-gpt-api-1  | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ('settings_',)`.
llama-gpt-llama-gpt-api-1  |   warnings.warn(
llama-gpt-llama-gpt-api-1  |
llama-gpt-llama-gpt-api-1  | Traceback (most recent call last):
llama-gpt-llama-gpt-api-1  |   File "<frozen runpy>", line 198, in _run_module_as_main
llama-gpt-llama-gpt-api-1  |   File "<frozen runpy>", line 88, in _run_code
llama-gpt-llama-gpt-api-1  |   File "/app/llama_cpp/server/__main__.py", line 46, in <module>
llama-gpt-llama-gpt-api-1  |     app = create_app(settings=settings)
llama-gpt-llama-gpt-api-1  |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
llama-gpt-llama-gpt-api-1  |   File "/app/llama_cpp/server/app.py", line 313, in create_app
llama-gpt-llama-gpt-api-1  |     llama = llama_cpp.Llama(
llama-gpt-llama-gpt-api-1  |             ^^^^^^^^^^^^^^^^
llama-gpt-llama-gpt-api-1  |   File "/app/llama_cpp/llama.py", line 308, in __init__
llama-gpt-llama-gpt-api-1  |     raise ValueError(f"Model path does not exist: {model_path}")
llama-gpt-llama-gpt-api-1  | ValueError: Model path does not exist: /models/llama-2-70b-chat.bin
llama-gpt-llama-gpt-api-1  | Exception ignored in: <function Llama.__del__ at 0x7ff786f19e40>
llama-gpt-llama-gpt-api-1  | Traceback (most recent call last):
llama-gpt-llama-gpt-api-1  |   File "/app/llama_cpp/llama.py", line 1507, in __del__
llama-gpt-llama-gpt-api-1  |     if self.model is not None:
llama-gpt-llama-gpt-api-1  |        ^^^^^^^^^^
llama-gpt-llama-gpt-api-1  | AttributeError: 'Llama' object has no attribute 'model'

model_config['protected_namespaces'] = ('settings_',)

(base) ┌─(~/Downloads/llama-gpt-master)──────────────(iwis@iwisdeMacBook-Air:s000)─┐
└─(02:52:27)──> /Users/iwis/miniforge3/envs/llama-gpt/lib/python3.10/site-packages/pydantic/_internal/_fields.py:127: UserWarning: Field "model_alias" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ('settings_',)`.
warnings.warn(

Same issue on Mac and Windows.

zsh syntax highlighting error

I'm getting an error trying to run the model. It looks like one of my iTerm plugins is causing it; however, even after removing it I still get the same issue.

(base) ➜  llama-gpt git:(master) ./run-mac.sh --model 13b
/usr/local/share/zsh-syntax-highlighting/zsh-syntax-highlighting.zsh: line 31: alias: -L: invalid option
alias: usage: alias [-p] [name[=value] ... ]
/usr/local/share/zsh-syntax-highlighting/zsh-syntax-highlighting.zsh: line 36: unalias: -m: invalid option
unalias: usage: unalias [-a] name [name ...]

Wiki for examples for using api in python

This repo is amazing; no matter how many stars you give, it deserves more.
Even a single example in a reply here would be enough.
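In the meantime, a minimal Python sketch, assuming the default llama-cpp-python OpenAI-compatible API exposed on port 3001 (adjust host and port to your deployment):

import requests

resp = requests.post(
    "http://localhost:3001/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Hello, who are you?"}],
        "max_tokens": 256,
    },
    timeout=600,  # generation on CPU can be slow
)
print(resp.json()["choices"][0]["message"]["content"])

The same request shape should also work with the official openai Python client by pointing its base URL at this server.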

#feature request
Ability to select models from Hugging Face, like Wizard Uncensored GGML 7B or 13B.
One variable to set in compose or the web UI, then we can change models like in oobabooga: it auto-downloads and we can run it.

Thanks for this amazing project.

Move model download from the Docker image build step to the run script

As currently designed, the Dockerfiles stage a rather large file inside the Docker image, which arguably violates the Docker image best-practice guideline of "Don't install unnecessary packages." Ultimately, this dramatically inflates the size of the image, which now contains a model that should instead be treated as a static asset attached to a container.

By moving the model download into the run script, operators can attach a volume to /models with storage defined outside the context of the container. This could be a volume on the same host, just as before when the model was baked into the image, or storage provided by any other Docker storage driver. Changes to the Docker image would then no longer require re-downloading and re-staging the model, significantly lowering both the build time and the bandwidth needed to rebuild the image.

In summary:

  • Remove downloading the model from the Dockerfile for the API service.
  • Update the API service in the compose files to mount a Docker Volume to /models
  • Enhance run.sh with a simple model download manager that checks for the existence of a model and, if it does not exist, downloads it (see the sketch after this list)
  • Optional: Expose the model to run as an environment variable, reducing the number of Dockerfiles needed to support the project. Have the run.sh download manager use that variable as input to decide which model to check for, download, and launch with.
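A minimal sketch of such a download manager (MODEL and MODEL_DOWNLOAD_URL are hypothetical environment variables proposed here, not existing ones in the project):

#!/bin/sh
# Download the model into the mounted /models volume only if it is absent.
MODEL="${MODEL:-/models/llama-2-7b-chat.bin}"
if [ ! -f "$MODEL" ]; then
    echo "$MODEL not found, downloading..."
    curl -L -o "$MODEL" "$MODEL_DOWNLOAD_URL"
fi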

Error starting docker compose: `error loading model: llama.cpp: tensor 'layers.10.ffn_norm.weight' is missing from model`

After running:

docker compose up

I get a strange error in the API. It looks like something's missing from the model?

llama-gpt-llama-gpt-api-7b-1  | /usr/local/lib/python3.11/site-packages/setuptools/command/develop.py:40: EasyInstallDeprecationWarning: easy_install command is deprecated.
llama-gpt-llama-gpt-api-7b-1  | !!
llama-gpt-llama-gpt-api-7b-1  |
llama-gpt-llama-gpt-api-7b-1  |         ********************************************************************************
llama-gpt-llama-gpt-api-7b-1  |         Please avoid running ``setup.py`` and ``easy_install``.
llama-gpt-llama-gpt-api-7b-1  |         Instead, use pypa/build, pypa/installer or other
llama-gpt-llama-gpt-api-7b-1  |         standards-based tools.
llama-gpt-llama-gpt-api-7b-1  |
llama-gpt-llama-gpt-api-7b-1  |         See https://github.com/pypa/setuptools/issues/917 for details.
llama-gpt-llama-gpt-api-7b-1  |         ********************************************************************************
llama-gpt-llama-gpt-api-7b-1  |
llama-gpt-llama-gpt-api-7b-1  | !!
llama-gpt-llama-gpt-api-7b-1  |   easy_install.initialize_options(self)
llama-gpt-llama-gpt-api-7b-1  | [0/1] Install the project...
llama-gpt-llama-gpt-api-7b-1  | -- Install configuration: "Release"
llama-gpt-llama-gpt-api-7b-1  | -- Up-to-date: /app/_skbuild/linux-aarch64-3.11/cmake-install/llama_cpp/libllama.so
llama-gpt-llama-gpt-api-7b-1  | copying _skbuild/linux-aarch64-3.11/cmake-install/llama_cpp/libllama.so -> llama_cpp/libllama.so
llama-gpt-llama-gpt-api-7b-1  |
llama-gpt-llama-gpt-api-7b-1  | running develop
llama-gpt-llama-gpt-api-7b-1  | /usr/local/lib/python3.11/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
llama-gpt-llama-gpt-api-7b-1  | !!
llama-gpt-llama-gpt-api-7b-1  |
llama-gpt-llama-gpt-api-7b-1  |         ********************************************************************************
llama-gpt-llama-gpt-api-7b-1  |         Please avoid running ``setup.py`` directly.
llama-gpt-llama-gpt-api-7b-1  |         Instead, use pypa/build, pypa/installer or other
llama-gpt-llama-gpt-api-7b-1  |         standards-based tools.
llama-gpt-llama-gpt-api-7b-1  |
llama-gpt-llama-gpt-api-7b-1  |         See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
llama-gpt-llama-gpt-api-7b-1  |         ********************************************************************************
llama-gpt-llama-gpt-api-7b-1  |
llama-gpt-llama-gpt-api-7b-1  | !!
llama-gpt-llama-gpt-api-7b-1  |   self.initialize_options()
llama-gpt-llama-gpt-api-7b-1  | running egg_info
llama-gpt-llama-gpt-api-7b-1  | writing llama_cpp_python.egg-info/PKG-INFO
llama-gpt-llama-gpt-api-7b-1  | writing dependency_links to llama_cpp_python.egg-info/dependency_links.txt
llama-gpt-llama-gpt-api-7b-1  | writing requirements to llama_cpp_python.egg-info/requires.txt
llama-gpt-llama-gpt-api-7b-1  | writing top-level names to llama_cpp_python.egg-info/top_level.txt
llama-gpt-llama-gpt-api-7b-1  | reading manifest file 'llama_cpp_python.egg-info/SOURCES.txt'
llama-gpt-llama-gpt-api-7b-1  | adding license file 'LICENSE.md'
llama-gpt-llama-gpt-api-7b-1  | writing manifest file 'llama_cpp_python.egg-info/SOURCES.txt'
llama-gpt-llama-gpt-api-7b-1  | running build_ext
llama-gpt-llama-gpt-api-7b-1  | Creating /usr/local/lib/python3.11/site-packages/llama-cpp-python.egg-link (link to .)
llama-gpt-llama-gpt-api-7b-1  | llama-cpp-python 0.1.78 is already the active version in easy-install.pth
llama-gpt-llama-gpt-api-7b-1  |
llama-gpt-llama-gpt-api-7b-1  | Installed /app
llama-gpt-llama-gpt-api-7b-1  | Processing dependencies for llama-cpp-python==0.1.78
llama-gpt-llama-gpt-api-7b-1  | Searching for diskcache==5.6.1
llama-gpt-llama-gpt-api-7b-1  | Best match: diskcache 5.6.1
llama-gpt-llama-gpt-api-7b-1  | Processing diskcache-5.6.1-py3.11.egg
llama-gpt-llama-gpt-api-7b-1  | Adding diskcache 5.6.1 to easy-install.pth file
llama-gpt-llama-gpt-api-7b-1  |
llama-gpt-llama-gpt-api-7b-1  | Using /usr/local/lib/python3.11/site-packages/diskcache-5.6.1-py3.11.egg
llama-gpt-llama-gpt-api-7b-1  | Searching for numpy==1.26.0b1
llama-gpt-llama-gpt-api-7b-1  | Best match: numpy 1.26.0b1
llama-gpt-llama-gpt-api-7b-1  | Processing numpy-1.26.0b1-py3.11-linux-aarch64.egg
llama-gpt-llama-gpt-api-7b-1  | Adding numpy 1.26.0b1 to easy-install.pth file
llama-gpt-llama-gpt-api-7b-1  | Installing f2py script to /usr/local/bin
llama-gpt-llama-gpt-api-7b-1  |
llama-gpt-llama-gpt-api-7b-1  | Using /usr/local/lib/python3.11/site-packages/numpy-1.26.0b1-py3.11-linux-aarch64.egg
llama-gpt-llama-gpt-api-7b-1  | Searching for typing-extensions==4.7.1
llama-gpt-llama-gpt-api-7b-1  | Best match: typing-extensions 4.7.1
llama-gpt-llama-gpt-api-7b-1  | Adding typing-extensions 4.7.1 to easy-install.pth file
llama-gpt-llama-gpt-api-7b-1  |
llama-gpt-llama-gpt-api-7b-1  | Using /usr/local/lib/python3.11/site-packages
llama-gpt-llama-gpt-api-7b-1  | Finished processing dependencies for llama-cpp-python==0.1.78
llama-gpt-llama-gpt-api-7b-1  | Initializing server with:
llama-gpt-llama-gpt-api-7b-1  | Batch size: 2096
llama-gpt-llama-gpt-api-7b-1  | Number of CPU threads: 8
llama-gpt-llama-gpt-api-7b-1  | Number of GPU layers: 0
llama-gpt-llama-gpt-api-7b-1  | Context window: 4096
llama-gpt-llama-gpt-api-7b-1  | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:127: UserWarning: Field "model_alias" has conflict with protected namespace "model_".
llama-gpt-llama-gpt-api-7b-1  |
llama-gpt-llama-gpt-api-7b-1  | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ('settings_',)`.
llama-gpt-llama-gpt-api-7b-1  |   warnings.warn(
llama-gpt-llama-gpt-api-7b-1  | llama.cpp: loading model from /models/llama-2-7b-chat.bin
llama-gpt-llama-gpt-api-7b-1  | llama_model_load_internal: format     = ggjt v3 (latest)
llama-gpt-llama-gpt-api-7b-1  | llama_model_load_internal: n_vocab    = 32000
llama-gpt-llama-gpt-api-7b-1  | llama_model_load_internal: n_ctx      = 4096
llama-gpt-llama-gpt-api-7b-1  | llama_model_load_internal: n_embd     = 4096
llama-gpt-llama-gpt-api-7b-1  | llama_model_load_internal: n_mult     = 5504
llama-gpt-llama-gpt-api-7b-1  | llama_model_load_internal: n_head     = 32
llama-gpt-llama-gpt-api-7b-1  | llama_model_load_internal: n_head_kv  = 32
llama-gpt-llama-gpt-api-7b-1  | llama_model_load_internal: n_layer    = 32
llama-gpt-llama-gpt-api-7b-1  | llama_model_load_internal: n_rot      = 128
llama-gpt-llama-gpt-api-7b-1  | llama_model_load_internal: n_gqa      = 1
llama-gpt-llama-gpt-api-7b-1  | llama_model_load_internal: rnorm_eps  = 5.0e-06
llama-gpt-llama-gpt-api-7b-1  | llama_model_load_internal: n_ff       = 11008
llama-gpt-llama-gpt-api-7b-1  | llama_model_load_internal: freq_base  = 10000.0
llama-gpt-llama-gpt-api-7b-1  | llama_model_load_internal: freq_scale = 1
llama-gpt-llama-gpt-api-7b-1  | llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama-gpt-llama-gpt-api-7b-1  | llama_model_load_internal: model size = 7B
llama-gpt-llama-gpt-api-7b-1  | llama_model_load_internal: ggml ctx size =    0.03 MB
llama-gpt-llama-gpt-api-7b-1  | error loading model: llama.cpp: tensor 'layers.10.ffn_norm.weight' is missing from model
llama-gpt-llama-gpt-api-7b-1  | llama_load_model_from_file: failed to load model
llama-gpt-llama-gpt-api-7b-1  | Traceback (most recent call last):
llama-gpt-llama-gpt-api-7b-1  |   File "<frozen runpy>", line 198, in _run_module_as_main
llama-gpt-llama-gpt-api-7b-1  |   File "<frozen runpy>", line 88, in _run_code
llama-gpt-llama-gpt-api-7b-1  |   File "/app/llama_cpp/server/__main__.py", line 46, in <module>
llama-gpt-llama-gpt-api-7b-1  |     app = create_app(settings=settings)
llama-gpt-llama-gpt-api-7b-1  |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
llama-gpt-llama-gpt-api-7b-1  |   File "/app/llama_cpp/server/app.py", line 317, in create_app
llama-gpt-llama-gpt-api-7b-1  |     llama = llama_cpp.Llama(
llama-gpt-llama-gpt-api-7b-1  |             ^^^^^^^^^^^^^^^^
llama-gpt-llama-gpt-api-7b-1  |   File "/app/llama_cpp/llama.py", line 328, in __init__
llama-gpt-llama-gpt-api-7b-1  |     assert self.model is not None
llama-gpt-llama-gpt-api-7b-1  |            ^^^^^^^^^^^^^^^^^^^^^^
llama-gpt-llama-gpt-api-7b-1  | AssertionError

System specs

Apple M2 Max
32 GB memory
โฏ docker system info
Client:
 Version:    24.0.2
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.10.5
    Path:     /Users/jpmcb/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.18.1
    Path:     /Users/jpmcb/.docker/cli-plugins/docker-compose
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.0
    Path:     /Users/jpmcb/.docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.19
    Path:     /Users/jpmcb/.docker/cli-plugins/docker-extension
  init: Creates Docker-related starter files for your project (Docker Inc.)
    Version:  v0.1.0-beta.4
    Path:     /Users/jpmcb/.docker/cli-plugins/docker-init
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     /Users/jpmcb/.docker/cli-plugins/docker-sbom
  scan: Docker Scan (Docker Inc.)
    Version:  v0.26.0
    Path:     /Users/jpmcb/.docker/cli-plugins/docker-scan
  scout: Command line tool for Docker Scout (Docker Inc.)
    Version:  v0.12.0
    Path:     /Users/jpmcb/.docker/cli-plugins/docker-scout

Server:
 Containers: 16
  Running: 1
  Paused: 0
  Stopped: 15
 Images: 34
 Server Version: 24.0.2
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 3dce8eb055cbb6872793272b4f20ed16117344f8
 runc version: v1.1.7-0-g860f061
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.15.49-linuxkit-pr
 Operating System: Docker Desktop
 OSType: linux
 Architecture: aarch64
 CPUs: 8
 Total Memory: 15.61GiB
 Name: docker-desktop
 ID: ea3b77fd-563e-4bef-8f96-aa0f769b3f88
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5555
  127.0.0.0/8
 Live Restore Enabled: false
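A partially or incorrectly downloaded model file is a common cause of "tensor ... is missing from model". A hedged workaround, assuming the default ./models bind mount, is to delete the file and let the container fetch it again:

docker compose down
rm ./models/llama-2-7b-chat.bin   # remove the (possibly corrupt) model
docker compose up                 # triggers a fresh download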

Kubernetes install on Raspberry Pi (v4) K3s environment

I ran the Kubernetes install (on K3s cluster on 4-node Raspberry Pi (v4) / 64-bit DietPi OS) as documented on the home page. I do not remember seeing any errors. Two pods were created, one the API and the other the GUI. The GUI appears to be running on port 3000. When showing the output of the pods after the installation, the GUI pod shows running, but the API pod shows pending. I am not sure if that is normal. I believe I have exposed the GUI correctly, but I am not able to connect. I am getting a 'bad gateway' message. I am trying to understand if that is due to the API pod not "running" or if it is because I have not exposed the connectivity correctly. If pending is not the correct status for the API pod, are there any suggestions on where to begin looking?
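A Pending pod usually means the scheduler cannot place it, commonly because of insufficient memory or CPU on the nodes or an unbound volume claim, and that would also explain the bad gateway from the UI. The commands below show where to look (pod name and namespace are placeholders):

kubectl get pods --all-namespaces
# The Events section at the bottom of describe explains why a pod is Pending:
kubectl describe pod <api-pod-name> -n <namespace>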

Thank You.

How to download and load the model

I am trying to install with Docker, and I am now at this step: "llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-13b:8000] not yet available...". Now my question is: how is the model downloaded and loaded? Below is the guideline.

Note: On the first run, it may take a while for the model to be downloaded to the /models directory. You may see lots of output like this for a few minutes, which is normal:

llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-13b:8000] not yet available...
After the model has been downloaded and loaded, and the API server is running, you'll see an output like:

llama-gpt-llama-gpt-api-13b-1 | INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

API container stuck restarting

OS: Windows 10 22H2
CPU: 9900K
RAM: 32 GB

Attempted the install from commit c9cfd24

git clone https://github.com/getumbrel/llama-gpt.git
cd llama-gpt
docker compose -f docker-compose-13b.yml up -d

The container llama-gpt-llama-gpt-api-1 does not start, and is stuck in a restart loop repeating:

: not found19:41:13 /app/run.sh: 2: 
'.  Stop.6 19:41:13 make: *** No rule to make target 'build
: not found19:41:13 /app/run.sh: 4: 
: not found19:41:13 /app/run.sh: 7: 
: not found19:41:13 /app/run.sh: 10: 
: not found19:41:13 /app/run.sh: 13: 
2023-08-16 19:41:13 /app/run.sh: 29: Syntax error: end of file unexpected (expecting "then")
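The stray carriage returns in this log (": not found" printed before the script path) are a hedged hint that run.sh acquired Windows CRLF line endings during the clone, which breaks /bin/sh parsing. Two common fixes (apply to the repo's run.sh before rebuilding):

# Re-clone without line-ending conversion:
git config --global core.autocrlf input

# Or convert the already-cloned script in place:
dos2unix run.sh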

Assertion Error

When I try to run the "docker compose up" command, it downloads the model and then throws an AssertionError. I have tried deleting the model manually multiple times, but it still doesn't seem to work.


llama-gpt-api | Answer incomplete or foreign language in the answer

Hi,

First off, thank you for your work.
It's really nice being able to run a local version of this tool.

I'm having a weird issue with the API where the answers are either incomplete or mixed with some foreign language (mostly German or Russian in my English chat).

I'm using the v1/chat/completions endpoint with the 7B model.

Here are some examples:

    - question => "Hi, how are you?"
    - answer => "I am doing well. Hinweis: Das TTS-Modul muss aktiviert"

    - question => "Why should we hire John as a web developer?"
    - answer => "John has 7 years of experience in web development and is prof"

    - question => "Hi, are you OK?"
    - answer => "Yes. paลบdziernik 21, 2020 at 8:53am This is the newest model of AI Assistant from Apple, Siri. It can answer a lot of questions and do many tasks for users without any human intervention."

Thank you for your help.

Stuck in an infinite loop while installing

llama-gpt-llama-gpt-api-7b-1 | Warning: Failed to create the file /models/llama-2-7b-chat.bin: Permission
llama-gpt-llama-gpt-api-7b-1 | Warning: denied
0 3616M 0 15819 0 0 38489 0 27:21:54 --:--:-- 27:21:54 38489
llama-gpt-llama-gpt-api-7b-1 | curl: (23) Failure writing output to destination
llama-gpt-llama-gpt-api-7b-1 | Download failed. Trying with TLS 1.2...
llama-gpt-llama-gpt-api-7b-1 | % Total % Received % Xferd Average Speed Time Time Time Current
llama-gpt-llama-gpt-api-7b-1 | Dload Upload Total Spent Left
llama-gpt-llama-gpt-api-7b-1 | Speed
0 0 0 0 0 0 0
llama-gpt-llama-gpt-api-7b-1 | 0
llama-gpt-llama-gpt-api-7b-1 | --:--:--
llama-gpt-llama-gpt-api-7b-1 | --
llama-gpt-llama-gpt-api-7b-1 | :
llama-gpt-llama-gpt-api-7b-1 | --:-- --:
llama-gpt-llama-gpt-api-7b-1 | --:--
llama-gpt-llama-gpt-api-7b-1 | 0
100 1260 100 1260 0 0 6331 0 --:--:-- --:--:-- --:--:-- 6363
llama-gpt-llama-gpt-api-7b-1 | Warning: Failed to create the file /models/llama-2-7b-chat.bin: Permission
llama-gpt-llama-gpt-api-7b-1 | Warning: denied
0 3616M 0 15819 0 0 47647 0 22:06:19
llama-gpt-llama-gpt-api-7b-1 | --:--:-- 22:06:19 47647
llama-gpt-llama-gpt-api-7b-1 | curl: (23) Failure writing output to destination
llama-gpt-llama-gpt-api-7b-1 | python3 setup.py develop
llama-gpt-llama-gpt-api-7b-1 | /usr/local/lib/python3.11/site-packages/setuptools/command/develop.py:40: EasyInstallDeprecationWarning: easy_install command is deprecated.
llama-gpt-llama-gpt-api-7b-1 | !!
llama-gpt-llama-gpt-api-7b-1 |
llama-gpt-llama-gpt-api-7b-1 | ********************************************************************************
llama-gpt-llama-gpt-api-7b-1 | Please avoid running setup.py and easy_install.
llama-gpt-llama-gpt-api-7b-1 | Instead, use pypa/build, pypa/installer or other
llama-gpt-llama-gpt-api-7b-1 | standards-based tools.
llama-gpt-llama-gpt-api-7b-1 |
llama-gpt-llama-gpt-api-7b-1 | See pypa/setuptools#917 for details.
llama-gpt-llama-gpt-api-7b-1 | ********************************************************************************
llama-gpt-llama-gpt-api-7b-1 |
llama-gpt-llama-gpt-api-7b-1 | !!
llama-gpt-llama-gpt-api-7b-1 | easy_install.initialize_options(self)
llama-gpt-llama-gpt-api-7b-1 |
llama-gpt-llama-gpt-api-7b-1 | [0/1] Install the project...
llama-gpt-llama-gpt-api-7b-1 | -- Install configuration: "Release"
llama-gpt-llama-gpt-api-7b-1 | -- Up-to-date: /app/_skbuild/linux-x86_64-3.11/cmake-install/llama_cpp/libllama.so
llama-gpt-llama-gpt-api-7b-1 | copying _skbuild/linux-x86_64-3.11/cmake-install/llama_cpp/libllama.so -> llama_cpp/libllama.so
llama-gpt-llama-gpt-api-7b-1 |
llama-gpt-llama-gpt-api-7b-1 |
llama-gpt-llama-gpt-api-7b-1 | running develop
llama-gpt-llama-gpt-api-7b-1 | /usr/local/lib/python3.11/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
llama-gpt-llama-gpt-api-7b-1 | !!
llama-gpt-llama-gpt-api-7b-1 |
llama-gpt-llama-gpt-api-7b-1 | ********************************************************************************
llama-gpt-llama-gpt-api-7b-1 | Please avoid running setup.py directly.
llama-gpt-llama-gpt-api-7b-1 | Instead, use pypa/build, pypa/installer or other
llama-gpt-llama-gpt-api-7b-1 | standards-based tools.
llama-gpt-llama-gpt-api-7b-1 |
llama-gpt-llama-gpt-api-7b-1 | See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
llama-gpt-llama-gpt-api-7b-1 | ********************************************************************************
llama-gpt-llama-gpt-api-7b-1 |
llama-gpt-llama-gpt-api-7b-1 | !!
llama-gpt-llama-gpt-api-7b-1 | self.initialize_options()
llama-gpt-llama-gpt-api-7b-1 |
llama-gpt-llama-gpt-api-7b-1 | running egg_info
llama-gpt-llama-gpt-api-7b-1 | writing llama_cpp_python.egg-info/PKG-INFO
llama-gpt-llama-gpt-api-7b-1 | writing dependency_links to llama_cpp_python.egg-info/dependency_links.txt
llama-gpt-llama-gpt-api-7b-1 | writing requirements to llama_cpp_python.egg-info/requires.txt
llama-gpt-llama-gpt-api-7b-1 | writing top-level names to llama_cpp_python.egg-info/top_level.txt
llama-gpt-llama-gpt-api-7b-1 | reading manifest file 'llama_cpp_python.egg-info/SOURCES.txt'
llama-gpt-llama-gpt-api-7b-1 | adding license file 'LICENSE.md'
llama-gpt-llama-gpt-api-7b-1 | writing manifest file 'llama_cpp_python.egg-info/SOURCES.txt'
llama-gpt-llama-gpt-api-7b-1 | running build_ext
llama-gpt-llama-gpt-api-7b-1 | Creating /usr/local/lib/python3.11/site-packages/llama-cpp-python.egg-link (link to .)
llama-gpt-llama-gpt-api-7b-1 | llama-cpp-python 0.1.78 is already the active version in easy-install.pth
llama-gpt-llama-gpt-api-7b-1 |
llama-gpt-llama-gpt-api-7b-1 | Installed /app
llama-gpt-llama-gpt-api-7b-1 |
llama-gpt-llama-gpt-api-7b-1 | Processing dependencies for llama-cpp-python==0.1.78
llama-gpt-llama-gpt-api-7b-1 | Searching for diskcache==5.6.1
llama-gpt-llama-gpt-api-7b-1 | Best match: diskcache 5.6.1
llama-gpt-llama-gpt-api-7b-1 | Processing diskcache-5.6.1-py3.11.egg
llama-gpt-llama-gpt-api-7b-1 | Adding diskcache 5.6.1 to easy-install.pth file
llama-gpt-llama-gpt-api-7b-1 |
llama-gpt-llama-gpt-api-7b-1 | Using /usr/local/lib/python3.11/site-packages/diskcache-5.6.1-py3.11.egg
llama-gpt-llama-gpt-api-7b-1 |
llama-gpt-llama-gpt-api-7b-1 | Searching for numpy==1.26.0b1
llama-gpt-llama-gpt-api-7b-1 | Best match: numpy 1.26.0b1
llama-gpt-llama-gpt-api-7b-1 | Processing numpy-1.26.0b1-py3.11-linux-x86_64.egg
llama-gpt-llama-gpt-api-7b-1 | Adding numpy 1.26.0b1 to easy-install.pth file
llama-gpt-llama-gpt-api-7b-1 | Installing f2py script to /usr/local/bin
llama-gpt-llama-gpt-api-7b-1 |
llama-gpt-llama-gpt-api-7b-1 | Using /usr/local/lib/python3.11/site-packages/numpy-1.26.0b1-py3.11-linux-x86_64.egg
llama-gpt-llama-gpt-api-7b-1 |
llama-gpt-llama-gpt-api-7b-1 | Searching for typing-extensions==4.7.1
llama-gpt-llama-gpt-api-7b-1 | Best match: typing-extensions 4.7.1
llama-gpt-llama-gpt-api-7b-1 | Adding typing-extensions 4.7.1 to easy-install.pth file
llama-gpt-llama-gpt-api-7b-1 |
llama-gpt-llama-gpt-api-7b-1 | Using /usr/local/lib/python3.11/site-packages
llama-gpt-llama-gpt-api-7b-1 |
llama-gpt-llama-gpt-api-7b-1 | Finished processing dependencies for llama-cpp-python==0.1.78
llama-gpt-llama-gpt-api-7b-1 | Initializing server with:
llama-gpt-llama-gpt-api-7b-1 | Batch size: 2096
llama-gpt-llama-gpt-api-7b-1 | Number of CPU threads: 12
llama-gpt-llama-gpt-api-7b-1 | Number of GPU layers: 0
llama-gpt-llama-gpt-api-7b-1 | Context window: 4096
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-7b:8000] not yet available...
llama-gpt-llama-gpt-api-7b-1 | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:127: UserWarning: Field "model_alias" has conflict with protected namespace "model_".
llama-gpt-llama-gpt-api-7b-1 |
llama-gpt-llama-gpt-api-7b-1 | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ('settings_',)`.
llama-gpt-llama-gpt-api-7b-1 |   warnings.warn(
llama-gpt-llama-gpt-api-7b-1 |
llama-gpt-llama-gpt-api-7b-1 | Traceback (most recent call last):
llama-gpt-llama-gpt-api-7b-1 |   File "<frozen runpy>", line 198, in _run_module_as_main
llama-gpt-llama-gpt-api-7b-1 |   File "<frozen runpy>", line 88, in _run_code
llama-gpt-llama-gpt-api-7b-1 |   File "/app/llama_cpp/server/__main__.py", line 46, in <module>
llama-gpt-llama-gpt-api-7b-1 |     app = create_app(settings=settings)
llama-gpt-llama-gpt-api-7b-1 |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
llama-gpt-llama-gpt-api-7b-1 |   File "/app/llama_cpp/server/app.py", line 317, in create_app
llama-gpt-llama-gpt-api-7b-1 |     llama = llama_cpp.Llama(
llama-gpt-llama-gpt-api-7b-1 |             ^^^^^^^^^^^^^^^^
llama-gpt-llama-gpt-api-7b-1 |   File "/app/llama_cpp/llama.py", line 317, in __init__
llama-gpt-llama-gpt-api-7b-1 |     raise ValueError(f"Model path does not exist: {model_path}")
llama-gpt-llama-gpt-api-7b-1 | ValueError: Model path does not exist: /models/llama-2-7b-chat.bin
llama-gpt-llama-gpt-api-7b-1 exited with code 1
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-7b:8000] not yet available...
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-7b:8000] not yet available...
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-7b:8000] not yet available...
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-7b:8000] not yet available...
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-7b:8000] not yet available...
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-7b:8000] not yet available...
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-7b:8000] not yet available...
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-7b:8000] not yet available...
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-7b:8000] not yet available...
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-7b:8000] not yet available...
llama-gpt-llama-gpt-ui-1 | [INFO wait] Host [llama-gpt-api-7b:8000] not yet available...

This has been looping for the past 4 hours. I tried installing both the smallest (7B) and largest (70B) models.
Windows 10 Pro
i7-8700K CPU 3.70GHz
64.0 GB RAM
GTX 1080
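The root cause in this log is the repeated "Failed to create the file /models/llama-2-7b-chat.bin: Permission denied" warning: the container user cannot write to the bind-mounted models directory, so every download fails and the loop repeats. A hedged fix on the host, assuming the default ./models bind mount:

# Make the mounted models directory writable by the container user:
sudo chmod -R a+rwX ./models
docker compose up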

Crashes on long message - Internal Server Error.

I probably ran out of memory. It would be helpful if:

  1. There was an error message other than "Internal Server Error" when this happens
  2. The application recovered by restarting. Right now it stays broken until the docker-compose is restarted (see the sketch after this list).
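For point 2, a hedged stopgap is a Compose restart policy so Docker brings the API container back after a crash like the exit code 139 below (service name is assumed to match the compose file):

services:
  llama-gpt-api:
    restart: on-failure   # restart the API container automatically after a crash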

I ran docker compose -f docker-compose-13b.yml up

Logs

llama-gpt-llama-gpt-api-1  | INFO:     172.19.0.2:47554 - "GET /v1/models HTTP/1.1" 200 OK
llama-gpt-llama-gpt-ui-1   | {
llama-gpt-llama-gpt-ui-1   |   id: '/models/llama-2-13b-chat.bin',
llama-gpt-llama-gpt-ui-1   |   name: 'Llama 2 13B',
llama-gpt-llama-gpt-ui-1   |   maxLength: 12000,
llama-gpt-llama-gpt-ui-1   |   tokenLimit: 4000
llama-gpt-llama-gpt-ui-1   | } 'You are a helpful and friendly AI assistant. Respond very concisely.' 1 '' [
llama-gpt-llama-gpt-ui-1   |   {
llama-gpt-llama-gpt-ui-1   |   role: 'user',
llama-gpt-llama-gpt-ui-1   |   content: 'You are a journalist editor. Please summarize this article in exactly three (3) bullet points. Here is the article: "<long text>"
llama-gpt-llama-gpt-ui-1   | }
llama-gpt-llama-gpt-ui-1   | ]
llama-gpt-llama-gpt-ui-1   |  [TypeError: fetch failed] {
llama-gpt-llama-gpt-ui-1   |   cause:  [SocketError: other side closed] {
llama-gpt-llama-gpt-ui-1   |   name: 'SocketError',
llama-gpt-llama-gpt-ui-1   |   code: 'UND_ERR_SOCKET',
llama-gpt-llama-gpt-ui-1   |   socket: {
llama-gpt-llama-gpt-ui-1   |   localAddress: '172.19.0.2',
llama-gpt-llama-gpt-ui-1   |   localPort: 59410,
llama-gpt-llama-gpt-ui-1   |   remoteAddress: '172.19.0.3',
llama-gpt-llama-gpt-ui-1   |   remotePort: 8000,
llama-gpt-llama-gpt-ui-1   |   remoteFamily: 'IPv4',
llama-gpt-llama-gpt-ui-1   |   timeout: undefined,
llama-gpt-llama-gpt-ui-1   |   bytesWritten: 2812,
llama-gpt-llama-gpt-ui-1   |   bytesRead: 0
llama-gpt-llama-gpt-ui-1   | }
llama-gpt-llama-gpt-ui-1   | }
llama-gpt-llama-gpt-ui-1   | }
llama-gpt-llama-gpt-api-1 exited with code 139
llama-gpt-llama-gpt-ui-1   | making request to  http://llama-gpt-api:8000/v1/models
llama-gpt-llama-gpt-ui-1   |  [TypeError: fetch failed] {
llama-gpt-llama-gpt-ui-1   |   cause:  [Error: getaddrinfo ENOTFOUND llama-gpt-api] {
llama-gpt-llama-gpt-ui-1   |   errno: -3008,
llama-gpt-llama-gpt-ui-1   |   code: 'ENOTFOUND',
llama-gpt-llama-gpt-ui-1   |   syscall: 'getaddrinfo',
llama-gpt-llama-gpt-ui-1   |   hostname: 'llama-gpt-api'
llama-gpt-llama-gpt-ui-1   | }

replies / slow

How do I configure it to show replies in real time instead of waiting for the end of the generation?

Thanks
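Assuming the OpenAI-compatible endpoint, passing "stream": true makes the server return server-sent events, so tokens can be printed as they are generated. A minimal Python sketch (host and port are assumptions):

import json
import requests

with requests.post(
    "http://localhost:3001/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Tell me a story"}],
        "stream": True,
    },
    stream=True,
) as resp:
    for line in resp.iter_lines():
        # Each event line looks like: data: {...json chunk...}
        if line.startswith(b"data: ") and line != b"data: [DONE]":
            chunk = json.loads(line[len(b"data: "):])
            print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)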

Can Llama-gpt be installed w/out Docker?

I use Ubuntu 22.04.3 LTS and also LXD "system" containers, and I try to avoid using Docker when I can, because it duplicates in an application container (Docker) what I'd rather install into an LXD container (Fedora, Debian, Ubuntu, SUSE, Alpine, etc.).

Docker runs fine when I have to use it, and it also runs OK "nested" in an LXD container, but is there a way to install without Docker?

thanks for any info
brian
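One hedged route is to run the same API server the containers use directly, since it is plain llama-cpp-python under the hood (the model path here is an example):

pip install 'llama-cpp-python[server]'
# The server reads its settings from environment variables:
MODEL=./models/llama-2-7b-chat.bin python3 -m llama_cpp.server

The web UI is a separate Node.js app in the repo that can, in principle, be started with npm and pointed at that API.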

Assertion error on Windows

llama-gpt-llama-gpt-api-7b-1 | Traceback (most recent call last):
llama-gpt-llama-gpt-api-7b-1 |   File "<frozen runpy>", line 198, in _run_module_as_main
llama-gpt-llama-gpt-api-7b-1 |   File "<frozen runpy>", line 88, in _run_code
llama-gpt-llama-gpt-api-7b-1 |   File "/app/llama_cpp/server/__main__.py", line 46, in <module>
llama-gpt-llama-gpt-api-7b-1 |     app = create_app(settings=settings)
llama-gpt-llama-gpt-api-7b-1 |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
llama-gpt-llama-gpt-api-7b-1 |   File "/app/llama_cpp/server/app.py", line 317, in create_app
llama-gpt-llama-gpt-api-7b-1 |     llama = llama_cpp.Llama(
llama-gpt-llama-gpt-api-7b-1 |             ^^^^^^^^^^^^^^^^
llama-gpt-llama-gpt-api-7b-1 |   File "/app/llama_cpp/llama.py", line 328, in __init__
llama-gpt-llama-gpt-api-7b-1 |     assert self.model is not None
llama-gpt-llama-gpt-api-7b-1 |            ^^^^^^^^^^^^^^^^^^^^^^
llama-gpt-llama-gpt-api-7b-1 | AssertionError

Crashes on startup - memlock

Hi,

I am trying to run it, but I get this error after running the 70B docker-compose.

System specs are an Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz (12 cores) and 128 GB of RAM.

My best guess is that the Dockerfile has to be modified to bypass mlock.

Starting llama-gpt_llama-gpt-api_1 ... done
Starting llama-gpt_llama-gpt-ui_1  ... done
Attaching to llama-gpt_llama-gpt-ui_1, llama-gpt_llama-gpt-api_1
llama-gpt-api_1  | python3 setup.py develop
llama-gpt-ui_1   | [INFO  wait] --------------------------------------------------------
llama-gpt-ui_1   | [INFO  wait]  docker-compose-wait 2.12.0
llama-gpt-ui_1   | [INFO  wait] ---------------------------
llama-gpt-ui_1   | [DEBUG wait] Starting with configuration:
llama-gpt-ui_1   | [DEBUG wait]  - Hosts to be waiting for: [llama-gpt-api:8000]
llama-gpt-ui_1   | [DEBUG wait]  - Paths to be waiting for: []
llama-gpt-ui_1   | [DEBUG wait]  - Timeout before failure: 600 seconds
llama-gpt-ui_1   | [DEBUG wait]  - TCP connection timeout before retry: 5 seconds
llama-gpt-ui_1   | [DEBUG wait]  - Sleeping time before checking for hosts/paths availability: 0 seconds
llama-gpt-ui_1   | [DEBUG wait]  - Sleeping time once all hosts/paths are available: 0 seconds
llama-gpt-ui_1   | [DEBUG wait]  - Sleeping time between retries: 1 seconds
llama-gpt-ui_1   | [DEBUG wait] --------------------------------------------------------
llama-gpt-ui_1   | [INFO  wait] Checking availability of host [llama-gpt-api:8000]
llama-gpt-ui_1   | [INFO  wait] Host [llama-gpt-api:8000] not yet available...
llama-gpt-api_1  | /usr/local/lib/python3.11/site-packages/setuptools/command/develop.py:40: EasyInstallDeprecationWarning: easy_install command is deprecated.
llama-gpt-api_1  | !!
llama-gpt-api_1  |
llama-gpt-api_1  |         ********************************************************************************
llama-gpt-api_1  |         Please avoid running ``setup.py`` and ``easy_install``.
llama-gpt-api_1  |         Instead, use pypa/build, pypa/installer or other
llama-gpt-api_1  |         standards-based tools.
llama-gpt-api_1  |
llama-gpt-api_1  |         See https://github.com/pypa/setuptools/issues/917 for details.
llama-gpt-api_1  |         ********************************************************************************
llama-gpt-api_1  |
llama-gpt-api_1  | !!
llama-gpt-api_1  |   easy_install.initialize_options(self)
llama-gpt-api_1  | [0/1] Install the project...
llama-gpt-api_1  | -- Install configuration: "Release"
llama-gpt-api_1  | -- Up-to-date: /app/_skbuild/linux-x86_64-3.11/cmake-install/llama_cpp/libllama.so
llama-gpt-api_1  | copying _skbuild/linux-x86_64-3.11/cmake-install/llama_cpp/libllama.so -> llama_cpp/libllama.so
llama-gpt-api_1  |
llama-gpt-api_1  | running develop
llama-gpt-api_1  | /usr/local/lib/python3.11/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
llama-gpt-api_1  | !!
llama-gpt-api_1  |
llama-gpt-api_1  |         ********************************************************************************
llama-gpt-api_1  |         Please avoid running ``setup.py`` directly.
llama-gpt-api_1  |         Instead, use pypa/build, pypa/installer or other
llama-gpt-api_1  |         standards-based tools.
llama-gpt-api_1  |
llama-gpt-api_1  |         See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
llama-gpt-api_1  |         ********************************************************************************
llama-gpt-api_1  |
llama-gpt-api_1  | !!
llama-gpt-api_1  |   self.initialize_options()
llama-gpt-api_1  | running egg_info
llama-gpt-api_1  | writing llama_cpp_python.egg-info/PKG-INFO
llama-gpt-api_1  | writing dependency_links to llama_cpp_python.egg-info/dependency_links.txt
llama-gpt-api_1  | writing requirements to llama_cpp_python.egg-info/requires.txt
llama-gpt-api_1  | writing top-level names to llama_cpp_python.egg-info/top_level.txt
llama-gpt-api_1  | reading manifest file 'llama_cpp_python.egg-info/SOURCES.txt'
llama-gpt-api_1  | adding license file 'LICENSE.md'
llama-gpt-api_1  | writing manifest file 'llama_cpp_python.egg-info/SOURCES.txt'
llama-gpt-api_1  | running build_ext
llama-gpt-api_1  | Creating /usr/local/lib/python3.11/site-packages/llama-cpp-python.egg-link (link to .)
llama-gpt-api_1  | llama-cpp-python 0.1.77 is already the active version in easy-install.pth
llama-gpt-api_1  |
llama-gpt-api_1  | Installed /app
llama-gpt-api_1  | Processing dependencies for llama-cpp-python==0.1.77
llama-gpt-api_1  | Searching for diskcache==5.6.1
llama-gpt-api_1  | Best match: diskcache 5.6.1
llama-gpt-api_1  | Processing diskcache-5.6.1-py3.11.egg
llama-gpt-api_1  | Adding diskcache 5.6.1 to easy-install.pth file
llama-gpt-api_1  |
llama-gpt-api_1  | Using /usr/local/lib/python3.11/site-packages/diskcache-5.6.1-py3.11.egg
llama-gpt-api_1  | Searching for numpy==1.25.1
llama-gpt-api_1  | Best match: numpy 1.25.1
llama-gpt-api_1  | Processing numpy-1.25.1-py3.11-linux-x86_64.egg
llama-gpt-api_1  | Adding numpy 1.25.1 to easy-install.pth file
llama-gpt-api_1  | Installing f2py script to /usr/local/bin
llama-gpt-api_1  | Installing f2py3 script to /usr/local/bin
llama-gpt-api_1  | Installing f2py3.11 script to /usr/local/bin
llama-gpt-api_1  |
llama-gpt-api_1  | Using /usr/local/lib/python3.11/site-packages/numpy-1.25.1-py3.11-linux-x86_64.egg
llama-gpt-api_1  | Searching for typing-extensions==4.7.1
llama-gpt-api_1  | Best match: typing-extensions 4.7.1
llama-gpt-api_1  | Adding typing-extensions 4.7.1 to easy-install.pth file
llama-gpt-api_1  |
llama-gpt-api_1  | Using /usr/local/lib/python3.11/site-packages
llama-gpt-api_1  | Finished processing dependencies for llama-cpp-python==0.1.77
llama-gpt-api_1  | Initializing server with:
llama-gpt-api_1  | Batch size: 2096
llama-gpt-api_1  | Number of CPU threads: 12
llama-gpt-api_1  | Number of GPU layers: 0
llama-gpt-api_1  | Context window: 4096
llama-gpt-ui_1   | [INFO  wait] Host [llama-gpt-api:8000] not yet available...
llama-gpt-api_1  | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:126: UserWarning: Field "model_alias" has conflict with protected namespace "model_".
llama-gpt-api_1  |
llama-gpt-api_1  | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ('settings_',)`.
llama-gpt-api_1  |   warnings.warn(
llama-gpt-api_1  | llama.cpp: loading model from /models/llama-2-70b-chat.bin
llama-gpt-api_1  | llama_model_load_internal: format     = ggjt v3 (latest)
llama-gpt-api_1  | llama_model_load_internal: n_vocab    = 32000
llama-gpt-api_1  | llama_model_load_internal: n_ctx      = 4096
llama-gpt-api_1  | llama_model_load_internal: n_embd     = 8192
llama-gpt-api_1  | llama_model_load_internal: n_mult     = 4096
llama-gpt-api_1  | llama_model_load_internal: n_head     = 64
llama-gpt-api_1  | llama_model_load_internal: n_head_kv  = 64
llama-gpt-api_1  | llama_model_load_internal: n_layer    = 80
llama-gpt-api_1  | llama_model_load_internal: n_rot      = 128
llama-gpt-api_1  | llama_model_load_internal: n_gqa      = 1
llama-gpt-api_1  | llama_model_load_internal: rnorm_eps  = 1.0e-06
llama-gpt-api_1  | llama_model_load_internal: n_ff       = 24576
llama-gpt-api_1  | llama_model_load_internal: freq_base  = 10000.0
llama-gpt-api_1  | llama_model_load_internal: freq_scale = 1
llama-gpt-api_1  | llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama-gpt-api_1  | llama_model_load_internal: model size = 65B
llama-gpt-api_1  | llama_model_load_internal: ggml ctx size =    0.21 MB
llama-gpt-api_1  | warning: failed to mlock 221184-byte buffer (after previously locking 0 bytes): Cannot allocate memory
llama-gpt-api_1  | Try increasing RLIMIT_MLOCK ('ulimit -l' as root).
llama-gpt-api_1  | error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected  8192 x  8192, got  8192 x  1024
llama-gpt-api_1  | llama_load_model_from_file: failed to load model
llama-gpt-api_1  | Traceback (most recent call last):
llama-gpt-api_1  |   File "<frozen runpy>", line 198, in _run_module_as_main
llama-gpt-api_1  |   File "<frozen runpy>", line 88, in _run_code
llama-gpt-api_1  |   File "/app/llama_cpp/server/__main__.py", line 46, in <module>
llama-gpt-api_1  |     app = create_app(settings=settings)
llama-gpt-api_1  |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
llama-gpt-api_1  |   File "/app/llama_cpp/server/app.py", line 313, in create_app
llama-gpt-api_1  |     llama = llama_cpp.Llama(
llama-gpt-api_1  |             ^^^^^^^^^^^^^^^^
llama-gpt-api_1  |   File "/app/llama_cpp/llama.py", line 313, in __init__
llama-gpt-api_1  |     assert self.model is not None
llama-gpt-api_1  |            ^^^^^^^^^^^^^^^^^^^^^^
llama-gpt-api_1  | AssertionError
llama-gpt-api_1  | Exception ignored in: <function Llama.__del__ at 0x7f5f36b09e40>
llama-gpt-api_1  | Traceback (most recent call last):
llama-gpt-api_1  |   File "/app/llama_cpp/llama.py", line 1510, in __del__
llama-gpt-api_1  |     if self.ctx is not None:
llama-gpt-api_1  |        ^^^^^^^^
llama-gpt-api_1  | AttributeError: 'Llama' object has no attribute 'ctx'
llama-gpt_llama-gpt-api_1 exited with code 1
^CGracefully stopping... (press Ctrl+C again to force)
Stopping llama-gpt_llama-gpt-ui_1  ... done

Unable to start in docker

Thanks in advance for any advice!

I have Docker running on an Unraid server. I cloned the GitHub repo into my appdata folder using the instructions here: https://github.com/getumbrel/llama-gpt#install-llamagpt-anywhere-else-with-docker-cpu-only

and then started the model with docker compose up.

After a few minutes, the download completed, but I get this repeated error:

llama-gpt-llama-gpt-ui-1      | [INFO  wait] Host [llama-gpt-api-7b:8000] not yet available...
llama-gpt-llama-gpt-ui-1      | [INFO  wait] Host [llama-gpt-api-7b:8000] not yet available...
llama-gpt-llama-gpt-ui-1      | [INFO  wait] Host [llama-gpt-api-7b:8000] not yet available...
llama-gpt-llama-gpt-api-7b-1  | /usr/local/lib/python3.11/site-packages/setuptools/command/develop.py:40: EasyInstallDeprecationWarning: easy_install command is deprecated.
llama-gpt-llama-gpt-api-7b-1  | !!
llama-gpt-llama-gpt-api-7b-1  | 
llama-gpt-llama-gpt-api-7b-1  |         ********************************************************************************
llama-gpt-llama-gpt-api-7b-1  |         Please avoid running ``setup.py`` and ``easy_install``.
llama-gpt-llama-gpt-api-7b-1  |         Instead, use pypa/build, pypa/installer or other
llama-gpt-llama-gpt-api-7b-1  |         standards-based tools.
llama-gpt-llama-gpt-api-7b-1  | 
llama-gpt-llama-gpt-api-7b-1  |         See https://github.com/pypa/setuptools/issues/917 for details.
llama-gpt-llama-gpt-api-7b-1  |         ********************************************************************************
llama-gpt-llama-gpt-api-7b-1  | 
llama-gpt-llama-gpt-api-7b-1  | !!
llama-gpt-llama-gpt-api-7b-1  |   easy_install.initialize_options(self)
llama-gpt-llama-gpt-api-7b-1  | [0/1] Install the project...
llama-gpt-llama-gpt-api-7b-1  | -- Install configuration: "Release"
llama-gpt-llama-gpt-api-7b-1  | -- Up-to-date: /app/_skbuild/linux-x86_64-3.11/cmake-install/llama_cpp/libllama.so
llama-gpt-llama-gpt-api-7b-1  | copying _skbuild/linux-x86_64-3.11/cmake-install/llama_cpp/libllama.so -> llama_cpp/libllama.so
llama-gpt-llama-gpt-api-7b-1  | 
llama-gpt-llama-gpt-api-7b-1  | running develop
llama-gpt-llama-gpt-api-7b-1  | /usr/local/lib/python3.11/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
llama-gpt-llama-gpt-api-7b-1  | !!
llama-gpt-llama-gpt-api-7b-1  | 
llama-gpt-llama-gpt-api-7b-1  |         ********************************************************************************
llama-gpt-llama-gpt-api-7b-1  |         Please avoid running ``setup.py`` directly.
llama-gpt-llama-gpt-api-7b-1  |         Instead, use pypa/build, pypa/installer or other
llama-gpt-llama-gpt-api-7b-1  |         standards-based tools.
llama-gpt-llama-gpt-api-7b-1  | 
llama-gpt-llama-gpt-api-7b-1  |         See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
llama-gpt-llama-gpt-api-7b-1  |         ********************************************************************************
llama-gpt-llama-gpt-api-7b-1  | 
llama-gpt-llama-gpt-api-7b-1  | !!
llama-gpt-llama-gpt-api-7b-1  |   self.initialize_options()
llama-gpt-llama-gpt-api-7b-1  | running egg_info
llama-gpt-llama-gpt-api-7b-1  | writing llama_cpp_python.egg-info/PKG-INFO
llama-gpt-llama-gpt-api-7b-1  | writing dependency_links to llama_cpp_python.egg-info/dependency_links.txt
llama-gpt-llama-gpt-api-7b-1  | writing requirements to llama_cpp_python.egg-info/requires.txt
llama-gpt-llama-gpt-api-7b-1  | writing top-level names to llama_cpp_python.egg-info/top_level.txt
llama-gpt-llama-gpt-api-7b-1  | reading manifest file 'llama_cpp_python.egg-info/SOURCES.txt'
llama-gpt-llama-gpt-api-7b-1  | adding license file 'LICENSE.md'
llama-gpt-llama-gpt-api-7b-1  | writing manifest file 'llama_cpp_python.egg-info/SOURCES.txt'
llama-gpt-llama-gpt-api-7b-1  | running build_ext
llama-gpt-llama-gpt-api-7b-1  | Creating /usr/local/lib/python3.11/site-packages/llama-cpp-python.egg-link (link to .)
llama-gpt-llama-gpt-api-7b-1  | llama-cpp-python 0.1.79 is already the active version in easy-install.pth
llama-gpt-llama-gpt-api-7b-1  | 
llama-gpt-llama-gpt-api-7b-1  | Installed /app
llama-gpt-llama-gpt-api-7b-1  | Processing dependencies for llama-cpp-python==0.1.79
llama-gpt-llama-gpt-api-7b-1  | Searching for diskcache==5.6.1
llama-gpt-llama-gpt-api-7b-1  | Best match: diskcache 5.6.1
llama-gpt-llama-gpt-api-7b-1  | Processing diskcache-5.6.1-py3.11.egg
llama-gpt-llama-gpt-api-7b-1  | Adding diskcache 5.6.1 to easy-install.pth file
llama-gpt-llama-gpt-api-7b-1  | 
llama-gpt-llama-gpt-api-7b-1  | Using /usr/local/lib/python3.11/site-packages/diskcache-5.6.1-py3.11.egg
llama-gpt-llama-gpt-api-7b-1  | Searching for numpy==1.26.0b1
llama-gpt-llama-gpt-api-7b-1  | Best match: numpy 1.26.0b1
llama-gpt-llama-gpt-api-7b-1  | Processing numpy-1.26.0b1-py3.11-linux-x86_64.egg
llama-gpt-llama-gpt-api-7b-1  | Adding numpy 1.26.0b1 to easy-install.pth file
llama-gpt-llama-gpt-api-7b-1  | Installing f2py script to /usr/local/bin
llama-gpt-llama-gpt-api-7b-1  | 
llama-gpt-llama-gpt-api-7b-1  | Using /usr/local/lib/python3.11/site-packages/numpy-1.26.0b1-py3.11-linux-x86_64.egg
llama-gpt-llama-gpt-api-7b-1  | Searching for typing-extensions==4.7.1
llama-gpt-llama-gpt-api-7b-1  | Best match: typing-extensions 4.7.1
llama-gpt-llama-gpt-api-7b-1  | Adding typing-extensions 4.7.1 to easy-install.pth file
llama-gpt-llama-gpt-api-7b-1  | 
llama-gpt-llama-gpt-api-7b-1  | Using /usr/local/lib/python3.11/site-packages
llama-gpt-llama-gpt-api-7b-1  | Finished processing dependencies for llama-cpp-python==0.1.79
llama-gpt-llama-gpt-api-7b-1  | Initializing server with:
llama-gpt-llama-gpt-api-7b-1  | Batch size: 2096
llama-gpt-llama-gpt-api-7b-1  | Number of CPU threads: 56
llama-gpt-llama-gpt-api-7b-1  | Number of GPU layers: 0
llama-gpt-llama-gpt-api-7b-1  | Context window: 4096
llama-gpt-llama-gpt-ui-1      | [INFO  wait] Host [llama-gpt-api-7b:8000] not yet available...
llama-gpt-llama-gpt-ui-1      | [INFO  wait] Host [llama-gpt-api-7b:8000] not yet available...
llama-gpt-llama-gpt-api-7b-1  | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:127: UserWarning: Field "model_alias" has conflict with protected namespace "model_".
llama-gpt-llama-gpt-api-7b-1  | 
llama-gpt-llama-gpt-api-7b-1  | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ('settings_',)`.
llama-gpt-llama-gpt-api-7b-1  |   warnings.warn(
llama-gpt-llama-gpt-api-7b-1  | gguf_init_from_file: invalid magic number 67676a74
llama-gpt-llama-gpt-api-7b-1  | error loading model: llama_model_loader: failed to load model from /models/llama-2-7b-chat.bin
llama-gpt-llama-gpt-api-7b-1  | 
llama-gpt-llama-gpt-api-7b-1  | llama_load_model_from_file: failed to load model
llama-gpt-llama-gpt-api-7b-1  | Traceback (most recent call last):
llama-gpt-llama-gpt-api-7b-1  |   File "<frozen runpy>", line 198, in _run_module_as_main
llama-gpt-llama-gpt-api-7b-1  |   File "<frozen runpy>", line 88, in _run_code
llama-gpt-llama-gpt-api-7b-1  |   File "/app/llama_cpp/server/__main__.py", line 46, in <module>
llama-gpt-llama-gpt-api-7b-1  |     app = create_app(settings=settings)
llama-gpt-llama-gpt-api-7b-1  |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
llama-gpt-llama-gpt-api-7b-1  |   File "/app/llama_cpp/server/app.py", line 317, in create_app
llama-gpt-llama-gpt-api-7b-1  |     llama = llama_cpp.Llama(
llama-gpt-llama-gpt-api-7b-1  |             ^^^^^^^^^^^^^^^^
llama-gpt-llama-gpt-api-7b-1  |   File "/app/llama_cpp/llama.py", line 323, in __init__
llama-gpt-llama-gpt-api-7b-1  |     assert self.model is not None
llama-gpt-llama-gpt-api-7b-1  |            ^^^^^^^^^^^^^^^^^^^^^^
llama-gpt-llama-gpt-api-7b-1  | AssertionError
llama-gpt-llama-gpt-api-7b-1 exited with code 1
llama-gpt-llama-gpt-ui-1      | [INFO  wait] Host [llama-gpt-api-7b:8000] not yet available...
llama-gpt-llama-gpt-ui-1      | [INFO  wait] Host [llama-gpt-api-7b:8000] not yet available...
llama-gpt-llama-gpt-ui-1      | [INFO  wait] Host [llama-gpt-api-7b:8000] not yet available...
llama-gpt-llama-gpt-ui-1      | [INFO  wait] Host [llama-gpt-api-7b:8000] not yet available...
llama-gpt-llama-gpt-ui-1      | [INFO  wait] Host [llama-gpt-api-7b:8000] not yet available...
llama-gpt-llama-gpt-ui-1      | [INFO  wait] Host [llama-gpt-api-7b:8000] not yet available...

Do you have any advice on where to look to fix this? It happens on all three model types, and I've tried deleting everything and starting over, as well as changing permissions for the folders.
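
For reference, the key line above is the invalid magic number: 0x67676a74 is ASCII "ggjt", the old GGML container format, while the llama-cpp-python 0.1.79 build in this image only loads the newer GGUF format. A minimal check, assuming the model landed in ./models as in the compose file:

head -c 4 ./models/llama-2-7b-chat.bin; echo
# Old GGML files print "tjgg" (the on-disk byte order of 0x67676a74);
# GGUF files print "GGUF". If you see "tjgg", download a GGUF build of the
# model, or pin the API image to a llama-cpp-python version from before GGUF.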

Error getting credentials on m1 mbp

Output from trying to run it on an M1 MBP, with docker-compose installed via brew and the command run from inside the root directory after cloning. Docker Desktop is running, but I'm not signed in. Does that matter? Thank you for your time.

docker-compose up -d
Building llama-gpt-api
[+] Building 0.2s (3/3) FINISHED docker:desktop-linux
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 689B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> ERROR [internal] load metadata for ghcr.io/abetlen/llama-cpp-python:latest 0.2s

[internal] load metadata for ghcr.io/abetlen/llama-cpp-python:latest:


Dockerfile:8

6 | ARG MODEL_DOWNLOAD_URL=https://huggingface.co/TheBloke/Nous-Hermes-Llama-2-7B-GGML/resolve/main/nous-hermes-llama-2-7b.ggmlv3.q4_0.bin
7 |
8 | >>> FROM ${IMAGE}
9 |
10 | ARG MODEL_FILE

ERROR: failed to solve: ghcr.io/abetlen/llama-cpp-python:latest: error getting credentials - err: exec: "docker-credential-desktop": executable file not found in $PATH, out: ``
ERROR: Service 'llama-gpt-api' failed to build : Build failed
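
A note on this one: the failure means the docker CLI can't find the docker-credential-desktop helper on your PATH; being signed out of Docker Desktop is not the cause. A common workaround, assuming the default Docker Desktop install location:

# Put Docker Desktop's bundled credential helpers on the PATH:
export PATH="$PATH:/Applications/Docker.app/Contents/Resources/bin"
# Alternatively, delete the "credsStore": "desktop" entry from
# ~/.docker/config.json so the CLI stops invoking the helper.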

Run with models without censorship

Hey guys,

Thank you for one more great app!

Is there a way to load and run models that aren't bound by OpenAI-style censorship mechanisms?

For example, the models we have now are censored; it would be nice if we could easily load
and use other models, whether or not they follow any censorship mechanism.

This model, https://huggingface.co/NousResearch/GPT4-x-Vicuna-13b-4bit, for instance, is not censored. But I haven't
yet found a way to load and use it.

Regards!
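
A hedged sketch of how a custom model could be swapped in today: the API image's Dockerfile exposes the model location as build arguments (MODEL_FILE and MODEL_DOWNLOAD_URL appear in the build output elsewhere on this page). The URL and file name below are placeholders, and this assumes the compose file forwards these build args:

docker compose build \
  --build-arg MODEL_FILE=gpt4-x-vicuna-13b.q4_0.bin \
  --build-arg MODEL_DOWNLOAD_URL=https://huggingface.co/<repo>/resolve/main/gpt4-x-vicuna-13b.q4_0.bin
docker compose up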

CUDA / Metal support

Since the API is built locally, it would be cool if there were a Metal-specific Dockerfile for Macs (LLAMA_METAL=1 make should be enough for that) and a CUDA build for NVIDIA cards. (llama.cpp already provides CUDA images, if that helps.)
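
For anyone experimenting before official images land, llama-cpp-python itself compiles with these backends via CMake flags; a rough sketch of the two builds (outside the project's Dockerfile):

# Metal (Apple Silicon):
CMAKE_ARGS="-DLLAMA_METAL=on" pip install --force-reinstall --no-cache-dir llama-cpp-python
# cuBLAS (NVIDIA):
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install --force-reinstall --no-cache-dir llama-cpp-python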

[Question] Light Mode?

The dark UI is terrible because it is so dark. Is there a way to turn on light mode? I looked but found no such option.

docker: 'compose' is not a docker command.

Running Ubuntu 23.04.

I followed the README instructions (git clone, cd, docker compose up), and I get:

╭─arthur at aquarelle in ~/dev/ai/llama-gpt on master✔ 23-08-24 - 3:47:53
╰─⠠⠵ docker compose up                                                                                                                           on master|✔
docker: 'compose' is not a docker command.
See 'docker --help'
╭─arthur at aquarelle in ~/dev/ai/llama-gpt on master✔ 23-08-24 - 3:48:00
╰─⠠⠵ docker compose -f docker-compose-13b.yml up                                                                                                 on master|✔
unknown shorthand flag: 'f' in -f
See 'docker --help'.

Usage:  docker [OPTIONS] COMMAND

A self-sufficient runtime for containers

Options:
      --config string      Location of client config files (default "/home/arthur/.docker")
  -c, --context string     Name of the context to use to connect to the daemon (overrides DOCKER_HOST env var and default context set with "docker
                           context use")
  -D, --debug              Enable debug mode
  -H, --host list          Daemon socket(s) to connect to
  -l, --log-level string   Set the logging level ("debug"|"info"|"warn"|"error"|"fatal") (default "info")
      --tls                Use TLS; implied by --tlsverify
      --tlscacert string   Trust certs signed only by this CA (default "/home/arthur/.docker/ca.pem")
      --tlscert string     Path to TLS certificate file (default "/home/arthur/.docker/cert.pem")
      --tlskey string      Path to TLS key file (default "/home/arthur/.docker/key.pem")
      --tlsverify          Use TLS and verify the remote
  -v, --version            Print version information and quit

Management Commands:
  builder     Manage builds
  config      Manage Docker configs
  container   Manage containers
  context     Manage contexts
  image       Manage images
  manifest    Manage Docker image manifests and manifest lists
  network     Manage networks
  node        Manage Swarm nodes
  plugin      Manage plugins
  secret      Manage Docker secrets
  service     Manage services
  stack       Manage Docker stacks
  swarm       Manage Swarm
  system      Manage Docker
  trust       Manage trust on Docker images
  volume      Manage volumes

Commands:
  attach      Attach local standard input, output, and error streams to a running container
  build       Build an image from a Dockerfile
  commit      Create a new image from a container's changes
  cp          Copy files/folders between a container and the local filesystem
  create      Create a new container
  diff        Inspect changes to files or directories on a container's filesystem
  events      Get real time events from the server
  exec        Run a command in a running container
  export      Export a container's filesystem as a tar archive
  history     Show the history of an image
  images      List images
  import      Import the contents from a tarball to create a filesystem image
  info        Display system-wide information
  inspect     Return low-level information on Docker objects
  kill        Kill one or more running containers
  load        Load an image from a tar archive or STDIN
  login       Log in to a Docker registry
  logout      Log out from a Docker registry
  logs        Fetch the logs of a container
  pause       Pause all processes within one or more containers
  port        List port mappings or a specific mapping for the container
  ps          List containers
  pull        Pull an image or a repository from a registry
  push        Push an image or a repository to a registry
  rename      Rename a container
  restart     Restart one or more containers
  rm          Remove one or more containers
  rmi         Remove one or more images
  run         Run a command in a new container
  save        Save one or more images to a tar archive (streamed to STDOUT by default)
  search      Search the Docker Hub for images
  start       Start one or more stopped containers
  stats       Display a live stream of container(s) resource usage statistics
  stop        Stop one or more running containers
  tag         Create a tag TARGET_IMAGE that refers to SOURCE_IMAGE
  top         Display the running processes of a container
  unpause     Unpause all processes within one or more containers
  update      Update configuration of one or more containers
  version     Show the Docker version information
  wait        Block until one or more containers stop, then print their exit codes

Run 'docker COMMAND --help' for more information on a command.

To get more help with docker, check out our guides at https://docs.docker.com/go/guides/

╭─arthur at aquarelle in ~/dev/ai/llama-gpt on master✔ 23-08-24 - 3:48:55
╰─⠠⠵ docker compose up                                                                                                                           on master|✔
docker: 'compose' is not a docker command.
See 'docker --help'
╭─arthur at aquarelle in ~/dev/ai/llama-gpt on master✔ 23-08-24 - 3:49:03
╰─⠠⠵

Any idea what I'm doing wrong?
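
It looks like this Docker install has neither the Compose v2 plugin (docker compose) nor the standalone v1 binary (docker-compose). On Ubuntu, a sketch of the fix, assuming Docker's apt repository is configured:

sudo apt-get update && sudo apt-get install docker-compose-plugin
docker compose version
# If you instead have the older standalone binary, the hyphenated form works:
docker-compose up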

Cannot run on M1 with Oh My Zsh installed

Attempting to run the instructions for an M1 mac results in the following error on my machine whether running in a bash shell or in zsh.

Error: Oh My Zsh can't be loaded from: /bin/bash. You need to run zsh instead.
Here's the process tree:

 PPID   PID COMMAND
    1 33088 /System/Applications/Utilities/Terminal.app/Contents/MacOS/Terminal
33088 33518 login -pf armstrys
33518 33519 -zsh
33519 33594 /bin/bash ./run-mac.sh --model 7b
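
The process tree shows run-mac.sh executing under /bin/bash, so the error most likely comes from a ~/.bashrc that sources Oh My Zsh. A hedged workaround is to guard that line so it only runs under zsh:

# In ~/.bashrc, assuming the default Oh My Zsh path:
[ -n "$ZSH_VERSION" ] && source "$HOME/.oh-my-zsh/oh-my-zsh.sh"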

Good job! But when I try to replace the OpenAI model in LangChain, I get a type error.

import os
from langchain.indexes import VectorstoreIndexCreator  # import was missing in the snippet
from langchain.document_loaders import TextLoader  # 'loaders' was undefined; assuming a simple loader
os.environ["OPENAI_API_BASE"] = "http://mac-pro:3001/v1"
os.environ["OPENAI_API_KEY"] = "xxxx"
loaders = TextLoader("docs.txt")  # hypothetical source document
index = VectorstoreIndexCreator().from_loaders([loaders])

When I try to use it with LangChain, it throws this error:

Retrying langchain.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised APIError: [{'type': 'string_type', 'loc': ('body', 'input', 'str'), 'msg': 'Input should be a valid string', 'input': [[456, 3635, 3833, 271, 657, 3105, 2427, 418, ...]]}] (long token-ID arrays truncated)

Infinite loop trying to `docker compose up` on intel mac

processor: 2 GHz Quad-Core Intel Core i5

Error message looping:

llama-gpt-llama-gpt-ui-1      | [INFO  wait] Host [llama-gpt-api-7b:8000] not yet available...
llama-gpt-llama-gpt-ui-1      | [INFO  wait] Host [llama-gpt-api-7b:8000] not yet available...
llama-gpt-llama-gpt-ui-1      | [INFO  wait] Host [llama-gpt-api-7b:8000] not yet available...
llama-gpt-llama-gpt-ui-1      | [INFO  wait] Host [llama-gpt-api-7b:8000] not yet available...
llama-gpt-llama-gpt-ui-1      | [INFO  wait] Host [llama-gpt-api-7b:8000] not yet available...
llama-gpt-llama-gpt-ui-1      | [INFO  wait] Host [llama-gpt-api-7b:8000] not yet available...
llama-gpt-llama-gpt-ui-1      | [INFO  wait] Host [llama-gpt-api-7b:8000] not yet available...
llama-gpt-llama-gpt-api-7b-1  | /usr/local/lib/python3.11/site-packages/setuptools/command/develop.py:40: EasyInstallDeprecationWarning: easy_install command is deprecated.
llama-gpt-llama-gpt-api-7b-1  | !!
llama-gpt-llama-gpt-api-7b-1  |
llama-gpt-llama-gpt-api-7b-1  |         ********************************************************************************
llama-gpt-llama-gpt-api-7b-1  |         Please avoid running ``setup.py`` and ``easy_install``.
llama-gpt-llama-gpt-api-7b-1  |         Instead, use pypa/build, pypa/installer or other
llama-gpt-llama-gpt-api-7b-1  |         standards-based tools.
llama-gpt-llama-gpt-api-7b-1  |
llama-gpt-llama-gpt-api-7b-1  |         See https://github.com/pypa/setuptools/issues/917 for details.
llama-gpt-llama-gpt-api-7b-1  |         ********************************************************************************
llama-gpt-llama-gpt-api-7b-1  |
llama-gpt-llama-gpt-api-7b-1  | !!
llama-gpt-llama-gpt-api-7b-1  |   easy_install.initialize_options(self)
llama-gpt-llama-gpt-api-7b-1  | [0/1] Install the project...
llama-gpt-llama-gpt-api-7b-1  | -- Install configuration: "Release"
llama-gpt-llama-gpt-api-7b-1  | -- Up-to-date: /app/_skbuild/linux-x86_64-3.11/cmake-install/llama_cpp/libllama.so
llama-gpt-llama-gpt-api-7b-1  | copying _skbuild/linux-x86_64-3.11/cmake-install/llama_cpp/libllama.so -> llama_cpp/libllama.so
llama-gpt-llama-gpt-api-7b-1  |
llama-gpt-llama-gpt-api-7b-1  | running develop
llama-gpt-llama-gpt-api-7b-1  | /usr/local/lib/python3.11/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
llama-gpt-llama-gpt-api-7b-1  | !!
llama-gpt-llama-gpt-api-7b-1  |
llama-gpt-llama-gpt-api-7b-1  |         ********************************************************************************
llama-gpt-llama-gpt-api-7b-1  |         Please avoid running ``setup.py`` directly.
llama-gpt-llama-gpt-api-7b-1  |         Instead, use pypa/build, pypa/installer or other
llama-gpt-llama-gpt-api-7b-1  |         standards-based tools.
llama-gpt-llama-gpt-api-7b-1  |
llama-gpt-llama-gpt-api-7b-1  |         See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
llama-gpt-llama-gpt-api-7b-1  |         ********************************************************************************
llama-gpt-llama-gpt-api-7b-1  |
llama-gpt-llama-gpt-api-7b-1  | !!
llama-gpt-llama-gpt-api-7b-1  |   self.initialize_options()
llama-gpt-llama-gpt-api-7b-1  | running egg_info
llama-gpt-llama-gpt-api-7b-1  | writing llama_cpp_python.egg-info/PKG-INFO
llama-gpt-llama-gpt-api-7b-1  | writing dependency_links to llama_cpp_python.egg-info/dependency_links.txt
llama-gpt-llama-gpt-api-7b-1  | writing requirements to llama_cpp_python.egg-info/requires.txt
llama-gpt-llama-gpt-api-7b-1  | writing top-level names to llama_cpp_python.egg-info/top_level.txt
llama-gpt-llama-gpt-api-7b-1  | reading manifest file 'llama_cpp_python.egg-info/SOURCES.txt'
llama-gpt-llama-gpt-api-7b-1  | adding license file 'LICENSE.md'
llama-gpt-llama-gpt-api-7b-1  | writing manifest file 'llama_cpp_python.egg-info/SOURCES.txt'
llama-gpt-llama-gpt-api-7b-1  | running build_ext
llama-gpt-llama-gpt-api-7b-1  | Creating /usr/local/lib/python3.11/site-packages/llama-cpp-python.egg-link (link to .)
llama-gpt-llama-gpt-api-7b-1  | llama-cpp-python 0.1.79 is already the active version in easy-install.pth
llama-gpt-llama-gpt-api-7b-1  |
llama-gpt-llama-gpt-api-7b-1  | Installed /app
llama-gpt-llama-gpt-api-7b-1  | Processing dependencies for llama-cpp-python==0.1.79
llama-gpt-llama-gpt-api-7b-1  | Searching for diskcache==5.6.1
llama-gpt-llama-gpt-api-7b-1  | Best match: diskcache 5.6.1
llama-gpt-llama-gpt-api-7b-1  | Processing diskcache-5.6.1-py3.11.egg
llama-gpt-llama-gpt-api-7b-1  | Adding diskcache 5.6.1 to easy-install.pth file
llama-gpt-llama-gpt-api-7b-1  |
llama-gpt-llama-gpt-api-7b-1  | Using /usr/local/lib/python3.11/site-packages/diskcache-5.6.1-py3.11.egg
llama-gpt-llama-gpt-api-7b-1  | Searching for numpy==1.26.0b1
llama-gpt-llama-gpt-api-7b-1  | Best match: numpy 1.26.0b1
llama-gpt-llama-gpt-api-7b-1  | Processing numpy-1.26.0b1-py3.11-linux-x86_64.egg
llama-gpt-llama-gpt-api-7b-1  | Adding numpy 1.26.0b1 to easy-install.pth file
llama-gpt-llama-gpt-api-7b-1  | Installing f2py script to /usr/local/bin
llama-gpt-llama-gpt-api-7b-1  |
llama-gpt-llama-gpt-api-7b-1  | Using /usr/local/lib/python3.11/site-packages/numpy-1.26.0b1-py3.11-linux-x86_64.egg
llama-gpt-llama-gpt-api-7b-1  | Searching for typing-extensions==4.7.1
llama-gpt-llama-gpt-api-7b-1  | Best match: typing-extensions 4.7.1
llama-gpt-llama-gpt-api-7b-1  | Adding typing-extensions 4.7.1 to easy-install.pth file
llama-gpt-llama-gpt-api-7b-1  |
llama-gpt-llama-gpt-api-7b-1  | Using /usr/local/lib/python3.11/site-packages
llama-gpt-llama-gpt-api-7b-1  | Finished processing dependencies for llama-cpp-python==0.1.79
llama-gpt-llama-gpt-api-7b-1  | Initializing server with:
llama-gpt-llama-gpt-api-7b-1  | Batch size: 2096
llama-gpt-llama-gpt-api-7b-1  | Number of CPU threads: 2
llama-gpt-llama-gpt-api-7b-1  | Number of GPU layers: 0
llama-gpt-llama-gpt-api-7b-1  | Context window: 4096
llama-gpt-llama-gpt-ui-1      | [INFO  wait] Host [llama-gpt-api-7b:8000] not yet available...
llama-gpt-llama-gpt-api-7b-1  | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:127: UserWarning: Field "model_alias" has conflict with protected namespace "model_".
llama-gpt-llama-gpt-api-7b-1  |
llama-gpt-llama-gpt-api-7b-1  | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ('settings_',)`.
llama-gpt-llama-gpt-api-7b-1  |   warnings.warn(
llama-gpt-llama-gpt-api-7b-1  | gguf_init_from_file: invalid magic number 67676a74
llama-gpt-llama-gpt-api-7b-1  | error loading model: llama_model_loader: failed to load model from /models/llama-2-7b-chat.bin
llama-gpt-llama-gpt-api-7b-1  |
llama-gpt-llama-gpt-api-7b-1  | llama_load_model_from_file: failed to load model
llama-gpt-llama-gpt-api-7b-1  | Traceback (most recent call last):
llama-gpt-llama-gpt-api-7b-1  |   File "<frozen runpy>", line 198, in _run_module_as_main
llama-gpt-llama-gpt-api-7b-1  |   File "<frozen runpy>", line 88, in _run_code
llama-gpt-llama-gpt-api-7b-1  |   File "/app/llama_cpp/server/__main__.py", line 46, in <module>
llama-gpt-llama-gpt-api-7b-1  |     app = create_app(settings=settings)
llama-gpt-llama-gpt-api-7b-1  |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
llama-gpt-llama-gpt-api-7b-1  |   File "/app/llama_cpp/server/app.py", line 317, in create_app
llama-gpt-llama-gpt-api-7b-1  |     llama = llama_cpp.Llama(
llama-gpt-llama-gpt-api-7b-1  |             ^^^^^^^^^^^^^^^^
llama-gpt-llama-gpt-api-7b-1  |   File "/app/llama_cpp/llama.py", line 323, in __init__
llama-gpt-llama-gpt-api-7b-1  |     assert self.model is not None
llama-gpt-llama-gpt-api-7b-1  |            ^^^^^^^^^^^^^^^^^^^^^^
llama-gpt-llama-gpt-api-7b-1  | AssertionError
llama-gpt-llama-gpt-api-7b-1 exited with code 1

I am pretty sure the model has finished downloading since the size of the file hasn't changed for several hours. Currently at

3791725184 Aug 25 22:44 llama-2-7b-chat.bin

Crashes on launch

I get the following after running docker compose (7b):

llama-gpt-llama-gpt-api-1  | Processing dependencies for llama-cpp-python==0.1.77
llama-gpt-llama-gpt-api-1  | Searching for diskcache==5.6.1
llama-gpt-llama-gpt-api-1  | Best match: diskcache 5.6.1
llama-gpt-llama-gpt-api-1  | Processing diskcache-5.6.1-py3.11.egg
llama-gpt-llama-gpt-api-1  | Adding diskcache 5.6.1 to easy-install.pth file
llama-gpt-llama-gpt-api-1  |
llama-gpt-llama-gpt-api-1  | Using /usr/local/lib/python3.11/site-packages/diskcache-5.6.1-py3.11.egg
llama-gpt-llama-gpt-api-1  |
llama-gpt-llama-gpt-api-1  | Searching for numpy==1.25.1
llama-gpt-llama-gpt-api-1  | Best match: numpy 1.25.1
llama-gpt-llama-gpt-api-1  | Processing numpy-1.25.1-py3.11-linux-x86_64.egg
llama-gpt-llama-gpt-api-1  | Adding numpy 1.25.1 to easy-install.pth file
llama-gpt-llama-gpt-api-1  | Installing f2py script to /usr/local/bin
llama-gpt-llama-gpt-api-1  | Installing f2py3 script to /usr/local/bin
llama-gpt-llama-gpt-api-1  | Installing f2py3.11 script to /usr/local/bin
llama-gpt-llama-gpt-api-1  |
llama-gpt-llama-gpt-api-1  | Using /usr/local/lib/python3.11/site-packages/numpy-1.25.1-py3.11-linux-x86_64.egg
llama-gpt-llama-gpt-api-1  |
llama-gpt-llama-gpt-api-1  | Searching for typing-extensions==4.7.1
llama-gpt-llama-gpt-api-1  | Best match: typing-extensions 4.7.1
llama-gpt-llama-gpt-api-1  | Adding typing-extensions 4.7.1 to easy-install.pth file
llama-gpt-llama-gpt-api-1  |
llama-gpt-llama-gpt-api-1  | Using /usr/local/lib/python3.11/site-packages
llama-gpt-llama-gpt-api-1  |
llama-gpt-llama-gpt-api-1  | Finished processing dependencies for llama-cpp-python==0.1.77
llama-gpt-llama-gpt-api-1  | Initializing server with:
llama-gpt-llama-gpt-api-1  | Batch size: 2096
llama-gpt-llama-gpt-api-1  | Number of CPU threads: 4
llama-gpt-llama-gpt-api-1  | Number of GPU layers: 0
llama-gpt-llama-gpt-api-1  | Context window: 4096
llama-gpt-llama-gpt-api-1  | Traceback (most recent call last):
llama-gpt-llama-gpt-api-1  |   File "/app/llama_cpp/llama_cpp.py", line 67, in _load_shared_library
llama-gpt-llama-gpt-api-1  |     return ctypes.CDLL(str(_lib_path), **cdll_args)
llama-gpt-llama-gpt-api-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
llama-gpt-llama-gpt-api-1  |   File "/usr/local/lib/python3.11/ctypes/__init__.py", line 376, in __init__
llama-gpt-llama-gpt-api-1  |     self._handle = _dlopen(self._name, mode)
llama-gpt-llama-gpt-api-1  |                    ^^^^^^^^^^^^^^^^^^^^^^^^^
llama-gpt-llama-gpt-api-1  | OSError: /app/llama_cpp/libllama.so: file too short
llama-gpt-llama-gpt-api-1  |
llama-gpt-llama-gpt-api-1  | During handling of the above exception, another exception occurred:
llama-gpt-llama-gpt-api-1  |
llama-gpt-llama-gpt-api-1  | Traceback (most recent call last):
llama-gpt-llama-gpt-api-1  |
llama-gpt-llama-gpt-api-1  |   File "<frozen runpy>", line 189, in _run_module_as_main
llama-gpt-llama-gpt-api-1  |   File "<frozen runpy>", line 112, in _get_module_details
llama-gpt-llama-gpt-api-1  |   File "/app/llama_cpp/__init__.py", line 1, in <module>
llama-gpt-llama-gpt-api-1  |     from .llama_cpp import *
llama-gpt-llama-gpt-api-1  |   File "/app/llama_cpp/llama_cpp.py", line 80, in <module>
llama-gpt-llama-gpt-api-1  |     _lib = _load_shared_library(_lib_base_name)
llama-gpt-llama-gpt-api-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
llama-gpt-llama-gpt-api-1  |   File "/app/llama_cpp/llama_cpp.py", line 69, in _load_shared_library
llama-gpt-llama-gpt-api-1  |     raise RuntimeError(f"Failed to load shared library '{_lib_path}': {e}")
llama-gpt-llama-gpt-api-1  | RuntimeError: Failed to load shared library '/app/llama_cpp/libllama.so': /app/llama_cpp/libllama.so: file too short
llama-gpt-llama-gpt-ui-1   | [INFO  wait] Host [llama-gpt-api:8000] not yet available...
llama-gpt-llama-gpt-api-1 exited with code 1
llama-gpt-llama-gpt-ui-1   | [INFO  wait] Host [llama-gpt-api:8000] not yet available...
llama-gpt-llama-gpt-ui-1   | [INFO  wait] Host [llama-gpt-api:8000] not yet available...
llama-gpt-llama-gpt-ui-1   | [INFO  wait] Host [llama-gpt-api:8000] not yet available...
llama-gpt-llama-gpt-ui-1   | [INFO  wait] Host [llama-gpt-api:8000] not yet available...
llama-gpt-llama-gpt-ui-1   | [INFO  wait] Host [llama-gpt-api:8000] not yet available...
^CGracefully stopping... (press Ctrl+C again to force)
Aborting on container exit...
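
"file too short" points at a truncated libllama.so build artifact inside the image rather than a model problem; a clean rebuild usually clears it. A sketch, using the service name from the log above:

docker compose down
docker compose build --no-cache llama-gpt-api
docker compose up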

Any plans to implement RAG (retrieval augmented generation)?

I think this is a great project for data privacy and compliance for organizations looking to adopt generative AI with Llama 2. Adding the ability to ingest org-specific data (like docs and Confluence pages) as vectors would take this private GPT to the next level. Let me know if I can contribute. Thank you!

Max token input window & completion output

Good day,

Model: 7B

  • What is the maximum token input window?
  • How can I set or limit the number of completion tokens?

No matter how many prompt_tokens I supply, I always receive 16 completion tokens back.

   "usage": {
      "prompt_tokens": 191,
      "completion_tokens": 16,
      "total_tokens": 207
   }
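
For what it's worth: the startup logs print "Context window: 4096", so the maximum input window is 4096 tokens shared between prompt and completion, and the OpenAI-style completions API defaults to 16 completion tokens unless max_tokens is set, which matches the usage block above. A hedged example against the local endpoint, using the port from the README:

curl http://localhost:3001/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain mlock briefly.", "max_tokens": 256}'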

warning: failed to mlock 86016-byte buffer (after previously locking 0 bytes): Cannot allocate memory

Running into this mlock issue when using the docker-compose up command with the cloned repository's docker-compose.yml file. I did a git pull for the repo today.

All I found on the subject is from this thread in another repository:
abetlen/llama-cpp-python#254

I adjusted the server's mlock setting to cover the model's memory footprint, and then tried "ulimit -l unlimited".

Completely removed and re-built the images using "docker system prune -a --volumes" after making this change and still get the mlock error output on docker-compose up.

Console error output:
llama-gpt-api-7b_1 | /usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:127: UserWarning: Field "model_alias" has conflict with protected namespace "model_".
llama-gpt-api-7b_1 |
llama-gpt-api-7b_1 | You may be able to resolve this warning by setting model_config['protected_namespaces'] = ('settings_',).
llama-gpt-api-7b_1 | warnings.warn(
llama-gpt-api-7b_1 | llama.cpp: loading model from /models/llama-2-7b-chat.bin
llama-gpt-api-7b_1 | llama_model_load_internal: format = ggjt v3 (latest)
llama-gpt-api-7b_1 | llama_model_load_internal: n_vocab = 32000
llama-gpt-api-7b_1 | llama_model_load_internal: n_ctx = 4096
llama-gpt-api-7b_1 | llama_model_load_internal: n_embd = 4096
llama-gpt-api-7b_1 | llama_model_load_internal: n_mult = 5504
llama-gpt-api-7b_1 | llama_model_load_internal: n_head = 32
llama-gpt-api-7b_1 | llama_model_load_internal: n_head_kv = 32
llama-gpt-api-7b_1 | llama_model_load_internal: n_layer = 32
llama-gpt-api-7b_1 | llama_model_load_internal: n_rot = 128
llama-gpt-api-7b_1 | llama_model_load_internal: n_gqa = 1
llama-gpt-api-7b_1 | llama_model_load_internal: rnorm_eps = 5.0e-06
llama-gpt-api-7b_1 | llama_model_load_internal: n_ff = 11008
llama-gpt-api-7b_1 | llama_model_load_internal: freq_base = 10000.0
llama-gpt-api-7b_1 | llama_model_load_internal: freq_scale = 1
llama-gpt-api-7b_1 | llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama-gpt-api-7b_1 | llama_model_load_internal: model size = 7B
llama-gpt-api-7b_1 | llama_model_load_internal: ggml ctx size = 0.08 MB
llama-gpt-api-7b_1 | warning: failed to mlock 86016-byte buffer (after previously locking 0 bytes): Cannot allocate memory
llama-gpt-api-7b_1 | Try increasing RLIMIT_MLOCK ('ulimit -l' as root).
llama-gpt-api-7b_1 | error loading model: llama.cpp: tensor 'layers.30.ffn_norm.weight' is missing from model
llama-gpt-api-7b_1 | llama_load_model_from_file: failed to load model
llama-gpt-api-7b_1 | Traceback (most recent call last):
llama-gpt-api-7b_1 | File "", line 198, in _run_module_as_main
llama-gpt-api-7b_1 | File "", line 88, in _run_code
llama-gpt-api-7b_1 | File "/app/llama_cpp/server/main.py", line 46, in
llama-gpt-api-7b_1 | app = create_app(settings=settings)
llama-gpt-api-7b_1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
llama-gpt-api-7b_1 | File "/app/llama_cpp/server/app.py", line 317, in create_app
llama-gpt-api-7b_1 | llama = llama_cpp.Llama(
llama-gpt-api-7b_1 | ^^^^^^^^^^^^^^^^
llama-gpt-api-7b_1 | File "/app/llama_cpp/llama.py", line 328, in init
llama-gpt-api-7b_1 | assert self.model is not None
llama-gpt-api-7b_1 | ^^^^^^^^^^^^^^^^^^^^^^
llama-gpt-api-7b_1 | AssertionError
llama-gpt_llama-gpt-api-7b_1 exited with code 1
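
Two separate things appear in this log. The mlock line is only a warning, and "ulimit -l" on the host does not propagate into a container; the limit has to be raised on the container itself (docker run's --ulimit flag, or an equivalent ulimits: entry in the compose file). The fatal error is the missing tensor, which usually means a truncated or corrupt model file. A sketch of both, with the model path taken from this report:

# Raise the container's memlock limit (docker run form):
docker run --ulimit memlock=-1:-1 ...
# If the missing-tensor error persists, delete the model and let the stack
# fetch it again on the next start (assuming the same download step that
# ran on first launch):
rm ./models/llama-2-7b-chat.bin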

Error on Macbook Pro with M1 and 64GB (docker compose -f docker-compose-70b.yml up -d)

Using /usr/local/lib/python3.11/site-packages

Finished processing dependencies for llama-cpp-python==0.1.77

Initializing server with:

Batch size: 2096

Number of CPU threads: 8

Number of GPU layers: 0

Context window: 4096

Traceback (most recent call last):

File "", line 198, in _run_module_as_main

File "", line 88, in _run_code

File "/app/llama_cpp/server/main.py", line 46, in

app = create_app(settings=settings)

      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

File "/app/llama_cpp/server/app.py", line 313, in create_app

llama = llama_cpp.Llama(

        ^^^^^^^^^^^^^^^^

File "/app/llama_cpp/llama.py", line 313, in init

assert self.model is not None

       ^^^^^^^^^^^^^^^^^^^^^^

AssertionError

Exception ignored in: <function Llama.__del__ at 0xffff863f6200>

Traceback (most recent call last):

File "/app/llama_cpp/llama.py", line 1510, in del

if self.ctx is not None:

   ^^^^^^^^

AttributeError: 'Llama' object has no attribute 'ctx'

/usr/local/lib/python3.11/site-packages/setuptools/command/develop.py:40: EasyInstallDeprecationWarning: easy_install command is deprecated.

!!

    ********************************************************************************

    Please avoid running ``setup.py`` and ``easy_install``.

    Instead, use pypa/build, pypa/installer or other

    standards-based tools.


    See https://github.com/pypa/setuptools/issues/917 for details.

    ********************************************************************************

!!

easy_install.initialize_options(self)

/usr/local/lib/python3.11/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.

!!

    ********************************************************************************

    Please avoid running ``setup.py`` directly.

    Instead, use pypa/build, pypa/installer or other

    standards-based tools.


    See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.

    ********************************************************************************

!!

self.initialize_options()

/usr/local/lib/python3.11/site-packages/pydantic/_internal/_fields.py:126: UserWarning: Field "model_alias" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting model_config['protected_namespaces'] = ('settings_',).

warnings.warn(

llama.cpp: loading model from /models/llama-2-70b-chat.bin

llama_model_load_internal: format = ggjt v3 (latest)

llama_model_load_internal: n_vocab = 32000

llama_model_load_internal: n_ctx = 4096

llama_model_load_internal: n_embd = 8192

llama_model_load_internal: n_mult = 4096

llama_model_load_internal: n_head = 64

llama_model_load_internal: n_head_kv = 64

llama_model_load_internal: n_layer = 80

llama_model_load_internal: n_rot = 128

llama_model_load_internal: n_gqa = 1

llama_model_load_internal: rnorm_eps = 1.0e-06

llama_model_load_internal: n_ff = 24576

llama_model_load_internal: freq_base = 10000.0

llama_model_load_internal: freq_scale = 1

llama_model_load_internal: ftype = 2 (mostly Q4_0)

llama_model_load_internal: model size = 65B

llama_model_load_internal: ggml ctx size = 0.21 MB

warning: failed to mlock 221184-byte buffer (after previously locking 0 bytes): Cannot allocate memory

Try increasing RLIMIT_MLOCK ('ulimit -l' as root).

error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024

llama_load_model_from_file: failed to load model

Traceback (most recent call last):

File "", line 198, in _run_module_as_main

File "", line 88, in _run_code

File "/app/llama_cpp/server/main.py", line 46, in

app = create_app(settings=settings)

      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

File "/app/llama_cpp/server/app.py", line 313, in create_app

llama = llama_cpp.Llama(

        ^^^^^^^^^^^^^^^^

File "/app/llama_cpp/llama.py", line 313, in init

assert self.model is not None

       ^^^^^^^^^^^^^^^^^^^^^^

AssertionError

Exception ignored in: <function Llama.__del__ at 0xffff9187a200>

Traceback (most recent call last):

File "/app/llama_cpp/llama.py", line 1510, in del

if self.ctx is not None:

   ^^^^^^^^

AttributeError: 'Llama' object has no attribute 'ctx'
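
For reference, the wrong-shape error here (and in the identical 70B log near the top of this page) is characteristic of loading Llama 2 70B without its grouped-query-attention setting: the log shows n_gqa = 1, but the 70B model needs 8, which is exactly the 8192 x 1024 vs 8192 x 8192 mismatch (8192 / 8 = 1024). The mlock line is just a warning. A hedged sketch of the knob on the llama-cpp-python server of that era; the flag name is assumed from its settings:

python3 -m llama_cpp.server --model /models/llama-2-70b-chat.bin --n_gqa 8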
