
Comments (17)

telugu-boy commented on June 4, 2024

Any update, please? It keeps spamming: make: *** No rule to make target 'build'.


allan-null commented on June 4, 2024

@Wh1t3Fox Sure. I just noticed the Makefile error still appears, but things just work. I literally just removed all Docker containers and cloned the repo again, just to get this output:

~/Programs/llama-gpt$ ./run.sh --with-cuda
No model value provided. Defaulting to 7b. If you want to change the model, exit the script and use --model to provide the model value.
Supported models are 7b, 13b, 70b, code-7b, code-13b, code-34b.
[+] Building 1.4s (30/30) FINISHED                                                                                                                                                                  docker:default
 => [llama-gpt-ui internal] load build definition from Dockerfile                                                                                                                                             0.0s
 => => transferring dockerfile: 859B                                                                                                                                                                          0.0s
 => [llama-gpt-ui internal] load .dockerignore                                                                                                                                                                0.0s
 => => transferring context: 82B                                                                                                                                                                              0.0s
 => [llama-gpt-api-cuda-ggml internal] load .dockerignore                                                                                                                                                     0.0s
 => => transferring context: 2B                                                                                                                                                                               0.0s
 => [llama-gpt-api-cuda-ggml internal] load build definition from ggml.Dockerfile                                                                                                                             0.0s
 => => transferring dockerfile: 958B                                                                                                                                                                          0.0s
 => [llama-gpt-ui internal] load metadata for ghcr.io/ufoscout/docker-compose-wait:latest                                                                                                                     0.7s
 => [llama-gpt-ui internal] load metadata for docker.io/library/node:19-alpine                                                                                                                                1.2s
 => [llama-gpt-api-cuda-ggml internal] load metadata for docker.io/nvidia/cuda:12.1.1-devel-ubuntu22.04                                                                                                       1.2s
 => [llama-gpt-api-cuda-ggml 1/5] FROM docker.io/nvidia/cuda:12.1.1-devel-ubuntu22.04@sha256:7012e535a47883527d402da998384c30b936140c05e2537158c80b8143ee7425                                                 0.0s
 => [llama-gpt-api-cuda-ggml internal] load build context                                                                                                                                                     0.0s
 => => transferring context: 3.62kB                                                                                                                                                                           0.0s
 => [llama-gpt-ui base 1/3] FROM docker.io/library/node:19-alpine@sha256:8ec543d4795e2e85af924a24f8acb039792ae9fe8a42ad5b4bf4c277ab34b62e                                                                     0.0s
 => [llama-gpt-ui internal] load build context                                                                                                                                                                0.1s
 => => transferring context: 1.28MB                                                                                                                                                                           0.0s
 => [llama-gpt-ui] FROM ghcr.io/ufoscout/docker-compose-wait:latest@sha256:ee1b58447dcf9ae2aaf84e5904ffc00ed5a983bf986535b19aeb6f2d4a7ceb8a                                                                   0.0s
 => CACHED [llama-gpt-api-cuda-ggml 2/5] RUN apt-get update && apt-get upgrade -y     && apt-get install -y git build-essential     python3 python3-pip gcc wget     ocl-icd-opencl-dev opencl-headers clinf  0.0s
 => CACHED [llama-gpt-api-cuda-ggml 3/5] COPY . .                                                                                                                                                             0.0s
 => CACHED [llama-gpt-api-cuda-ggml 4/5] RUN python3 -m pip install --upgrade pip pytest cmake scikit-build setuptools fastapi uvicorn sse-starlette pydantic-settings                                        0.0s
 => CACHED [llama-gpt-api-cuda-ggml 5/5] RUN CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python==0.1.78                                                                                0.0s
 => [llama-gpt-api-cuda-ggml] exporting to image                                                                                                                                                              0.0s
 => => exporting layers                                                                                                                                                                                       0.0s
 => => writing image sha256:66bb4b0e40422bc1e57962061a253e7354b1673361d6f82f89dc521992b47272                                                                                                                  0.0s
 => => naming to docker.io/library/llama-gpt-llama-gpt-api-cuda-ggml                                                                                                                                          0.0s
 => CACHED [llama-gpt-ui base 2/3] WORKDIR /app                                                                                                                                                               0.0s
 => CACHED [llama-gpt-ui base 3/3] COPY package*.json ./                                                                                                                                                      0.0s
 => CACHED [llama-gpt-ui dependencies 1/1] RUN npm ci                                                                                                                                                         0.0s
 => CACHED [llama-gpt-ui production 3/9] COPY --from=dependencies /app/node_modules ./node_modules                                                                                                            0.0s
 => CACHED [llama-gpt-ui build 1/2] COPY . .                                                                                                                                                                  0.0s
 => CACHED [llama-gpt-ui build 2/2] RUN npm run build                                                                                                                                                         0.0s
 => CACHED [llama-gpt-ui production 4/9] COPY --from=build /app/.next ./.next                                                                                                                                 0.0s
 => CACHED [llama-gpt-ui production 5/9] COPY --from=build /app/public ./public                                                                                                                               0.0s
 => CACHED [llama-gpt-ui production 6/9] COPY --from=build /app/package*.json ./                                                                                                                              0.0s
 => CACHED [llama-gpt-ui production 7/9] COPY --from=build /app/next.config.js ./next.config.js                                                                                                               0.0s
 => CACHED [llama-gpt-ui production 8/9] COPY --from=build /app/next-i18next.config.js ./next-i18next.config.js                                                                                               0.0s
 => CACHED [llama-gpt-ui production 9/9] COPY --from=ghcr.io/ufoscout/docker-compose-wait:latest /wait /wait                                                                                                  0.0s
 => [llama-gpt-ui] exporting to image                                                                                                                                                                         0.0s
 => => exporting layers                                                                                                                                                                                       0.0s
 => => writing image sha256:54f25f18841b9f9e211026f055d2acd5d7400cd229148d550b390c13c71b2f58                                                                                                                  0.0s
 => => naming to docker.io/library/llama-gpt-llama-gpt-ui                                                                                                                                                     0.0s
[+] Running 2/2
 ✔ Container llama-gpt-llama-gpt-api-cuda-ggml-1  Created                                                                                                                                                     0.1s 
 ✔ Container llama-gpt-llama-gpt-ui-1             Created                                                                                                                                                     0.1s 
Attaching to llama-gpt-llama-gpt-api-cuda-ggml-1, llama-gpt-llama-gpt-ui-1
llama-gpt-llama-gpt-ui-1             | [INFO  wait] --------------------------------------------------------
llama-gpt-llama-gpt-ui-1             | [INFO  wait]  docker-compose-wait 2.12.1
llama-gpt-llama-gpt-ui-1             | [INFO  wait] ---------------------------
llama-gpt-llama-gpt-ui-1             | [DEBUG wait] Starting with configuration:
llama-gpt-llama-gpt-ui-1             | [DEBUG wait]  - Hosts to be waiting for: [llama-gpt-api-cuda-ggml:8000]
llama-gpt-llama-gpt-ui-1             | [DEBUG wait]  - Paths to be waiting for: []
llama-gpt-llama-gpt-ui-1             | [DEBUG wait]  - Timeout before failure: 3600 seconds 
llama-gpt-llama-gpt-ui-1             | [DEBUG wait]  - TCP connection timeout before retry: 5 seconds 
llama-gpt-llama-gpt-ui-1             | [DEBUG wait]  - Sleeping time before checking for hosts/paths availability: 0 seconds
llama-gpt-llama-gpt-ui-1             | [DEBUG wait]  - Sleeping time once all hosts/paths are available: 0 seconds
llama-gpt-llama-gpt-ui-1             | [DEBUG wait]  - Sleeping time between retries: 1 seconds
llama-gpt-llama-gpt-ui-1             | [DEBUG wait] --------------------------------------------------------
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Checking availability of host [llama-gpt-api-cuda-ggml:8000]
llama-gpt-llama-gpt-api-cuda-ggml-1  | 
llama-gpt-llama-gpt-api-cuda-ggml-1  | ==========
llama-gpt-llama-gpt-api-cuda-ggml-1  | == CUDA ==
llama-gpt-llama-gpt-api-cuda-ggml-1  | ==========
llama-gpt-llama-gpt-api-cuda-ggml-1  | 
llama-gpt-llama-gpt-api-cuda-ggml-1  | CUDA Version 12.1.1
llama-gpt-llama-gpt-api-cuda-ggml-1  | 
llama-gpt-llama-gpt-api-cuda-ggml-1  | Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
llama-gpt-llama-gpt-api-cuda-ggml-1  | 
llama-gpt-llama-gpt-api-cuda-ggml-1  | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
llama-gpt-llama-gpt-api-cuda-ggml-1  | By pulling and using the container, you accept the terms and conditions of this license:
llama-gpt-llama-gpt-api-cuda-ggml-1  | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
llama-gpt-llama-gpt-api-cuda-ggml-1  | 
llama-gpt-llama-gpt-api-cuda-ggml-1  | A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
llama-gpt-llama-gpt-api-cuda-ggml-1  | 
llama-gpt-llama-gpt-api-cuda-ggml-1  | Model file not found. Downloading...
llama-gpt-llama-gpt-api-cuda-ggml-1  | curl is not installed. Installing...
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-llama-gpt-api-cuda-ggml-1  | Hit:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
llama-gpt-llama-gpt-api-cuda-ggml-1  | Hit:2 http://archive.ubuntu.com/ubuntu jammy InRelease
llama-gpt-llama-gpt-api-cuda-ggml-1  | Get:3 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [119 kB]
llama-gpt-llama-gpt-api-cuda-ggml-1  | Get:4 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB]
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-llama-gpt-api-cuda-ggml-1  | Get:5 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [109 kB]
llama-gpt-llama-gpt-api-cuda-ggml-1  | Get:6 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages [1576 kB]
llama-gpt-llama-gpt-api-cuda-ggml-1  | Get:7 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 Packages [1304 kB]
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-llama-gpt-api-cuda-ggml-1  | Fetched 3217 kB in 2s (1546 kB/s)
llama-gpt-llama-gpt-api-cuda-ggml-1  | Reading package lists...
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-llama-gpt-api-cuda-ggml-1  | Reading package lists...
llama-gpt-llama-gpt-api-cuda-ggml-1  | Building dependency tree...
llama-gpt-llama-gpt-api-cuda-ggml-1  | Reading state information...
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-llama-gpt-api-cuda-ggml-1  | The following additional packages will be installed:
llama-gpt-llama-gpt-api-cuda-ggml-1  |   libcurl4
llama-gpt-llama-gpt-api-cuda-ggml-1  | The following NEW packages will be installed:
llama-gpt-llama-gpt-api-cuda-ggml-1  |   curl libcurl4
llama-gpt-llama-gpt-api-cuda-ggml-1  | 0 upgraded, 2 newly installed, 0 to remove and 2 not upgraded.
llama-gpt-llama-gpt-api-cuda-ggml-1  | Need to get 483 kB of archives.
llama-gpt-llama-gpt-api-cuda-ggml-1  | After this operation, 1260 kB of additional disk space will be used.
llama-gpt-llama-gpt-api-cuda-ggml-1  | Get:1 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 libcurl4 amd64 7.81.0-1ubuntu1.15 [289 kB]
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-llama-gpt-api-cuda-ggml-1  | Get:2 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 curl amd64 7.81.0-1ubuntu1.15 [194 kB]
llama-gpt-llama-gpt-api-cuda-ggml-1  | debconf: delaying package configuration, since apt-utils is not installed
llama-gpt-llama-gpt-api-cuda-ggml-1  | Fetched 483 kB in 1s (403 kB/s)
llama-gpt-llama-gpt-api-cuda-ggml-1  | Selecting previously unselected package libcurl4:amd64.
(Reading database ... 18739 files and directories currently installed.)
llama-gpt-llama-gpt-api-cuda-ggml-1  | Preparing to unpack .../libcurl4_7.81.0-1ubuntu1.15_amd64.deb ...
llama-gpt-llama-gpt-api-cuda-ggml-1  | Unpacking libcurl4:amd64 (7.81.0-1ubuntu1.15) ...
llama-gpt-llama-gpt-api-cuda-ggml-1  | Selecting previously unselected package curl.
llama-gpt-llama-gpt-api-cuda-ggml-1  | Preparing to unpack .../curl_7.81.0-1ubuntu1.15_amd64.deb ...
llama-gpt-llama-gpt-api-cuda-ggml-1  | Unpacking curl (7.81.0-1ubuntu1.15) ...
llama-gpt-llama-gpt-api-cuda-ggml-1  | Setting up libcurl4:amd64 (7.81.0-1ubuntu1.15) ...
llama-gpt-llama-gpt-api-cuda-ggml-1  | Setting up curl (7.81.0-1ubuntu1.15) ...
llama-gpt-llama-gpt-api-cuda-ggml-1  | Processing triggers for libc-bin (2.35-0ubuntu3.5) ...
llama-gpt-llama-gpt-api-cuda-ggml-1  |   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
llama-gpt-llama-gpt-api-cuda-ggml-1  |                                  Dload  Upload   Total   Spent    Left  Speed
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
100  1258  100  1258    0     0   2274      0 --:--:-- --:--:-- --:--:--  2278
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
100 3616M  100 3616M    0     0  21.7M      0  0:02:45  0:02:45 --:--:-- 20.3M
llama-gpt-llama-gpt-api-cuda-ggml-1  | make: *** No rule to make target 'build'.  Stop.
llama-gpt-llama-gpt-api-cuda-ggml-1  | Initializing server with:
llama-gpt-llama-gpt-api-cuda-ggml-1  | Batch size: 2096
llama-gpt-llama-gpt-api-cuda-ggml-1  | Number of CPU threads: 12
llama-gpt-llama-gpt-api-cuda-ggml-1  | Number of GPU layers: 10
llama-gpt-llama-gpt-api-cuda-ggml-1  | Context window: 4096
llama-gpt-llama-gpt-api-cuda-ggml-1  | ggml_init_cublas: found 1 CUDA devices:
llama-gpt-llama-gpt-api-cuda-ggml-1  |   Device 0: NVIDIA GeForce GTX 1070, compute capability 6.1
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-llama-gpt-api-cuda-ggml-1  | /usr/local/lib/python3.10/dist-packages/pydantic/_internal/_fields.py:149: UserWarning: Field "model_alias" has conflict with protected namespace "model_".
llama-gpt-llama-gpt-api-cuda-ggml-1  | 
llama-gpt-llama-gpt-api-cuda-ggml-1  | You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ('settings_',)`.
llama-gpt-llama-gpt-api-cuda-ggml-1  |   warnings.warn(
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama.cpp: loading model from /models/llama-2-7b-chat.bin
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: format     = ggjt v3 (latest)
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: n_vocab    = 32000
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: n_ctx      = 4096
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: n_embd     = 4096
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: n_mult     = 5504
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: n_head     = 32
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: n_head_kv  = 32
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: n_layer    = 32
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: n_rot      = 128
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: n_gqa      = 1
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: rnorm_eps  = 5.0e-06
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: n_ff       = 11008
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: freq_base  = 10000.0
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: freq_scale = 1
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: model size = 7B
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: ggml ctx size =    0.08 MB
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: using CUDA for GPU acceleration
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: mem required  = 3055.79 MB (+ 2048.00 MB per state)
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: allocating batch_size x (512 kB + n_ctx x 128 B) = 512 MB VRAM for the scratch buffer
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: offloading 10 repeating layers to GPU
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: offloaded 10/35 layers to GPU
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_model_load_internal: total VRAM used: 1598 MB
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-llama-gpt-api-cuda-ggml-1  | llama_new_context_with_model: kv self size  = 2048.00 MB
llama-gpt-llama-gpt-api-cuda-ggml-1  | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
llama-gpt-llama-gpt-api-cuda-ggml-1  | INFO:     Started server process [1]
llama-gpt-llama-gpt-api-cuda-ggml-1  | INFO:     Waiting for application startup.
llama-gpt-llama-gpt-api-cuda-ggml-1  | INFO:     Application startup complete.
llama-gpt-llama-gpt-api-cuda-ggml-1  | INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] is now available!
llama-gpt-llama-gpt-ui-1             | [INFO  wait] --------------------------------------------------------
llama-gpt-llama-gpt-ui-1             | [INFO  wait] docker-compose-wait - Everything's fine, the application can now start!
llama-gpt-llama-gpt-ui-1             | [INFO  wait] --------------------------------------------------------
llama-gpt-llama-gpt-ui-1             | 
llama-gpt-llama-gpt-ui-1             | > [email protected] start
llama-gpt-llama-gpt-ui-1             | > next start
llama-gpt-llama-gpt-ui-1             | 
llama-gpt-llama-gpt-ui-1             | ready - started server on 0.0.0.0:3000, url: http://localhost:3000


Wh1t3Fox commented on June 4, 2024

For anyone here having issues: update ARG CUDA_IMAGE="12.3.1-devel-ubuntu22.04" to match the CUDA version you have running on your system; the default is 12.1.1.

❯ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Sep__8_19:17:24_PDT_2023
Cuda compilation tools, release 12.3, V12.3.52
Build cuda_12.3.r12.3/compiler.33281558_0
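
If you'd rather not edit by hand, a one-liner along these lines should do it (a sketch only: it assumes the ARG lives in the cuda/*.Dockerfile files shown in the build output above, so adjust the path and the version for your checkout and GPU):

# Point the base image at the CUDA release installed on the host
sed -i 's/^ARG CUDA_IMAGE=.*/ARG CUDA_IMAGE="12.3.1-devel-ubuntu22.04"/' cuda/*.Dockerfile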


Wh1t3Fox commented on June 4, 2024

@tstechnologies the Makefile error will always be there, as there is no such file. I'm not sure whether it is actually meant to be there or not. I also have not been able to get this working on Arch and started using ollama instead, which has been flawless. It definitely appears to be related to deps and drivers.

You'll notice the loop happening because the CUDA image keeps crashing when the server attempts to start.
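
A quick way to confirm the crash loop is to check the API container's exit status directly (a sketch; the container name is taken from the logs above, so adjust if yours differs):

# Show the last exit code and recent logs for the crashing API container
docker inspect --format '{{.State.ExitCode}}' llama-gpt-llama-gpt-api-cuda-ggml-1
docker logs --tail 50 llama-gpt-llama-gpt-api-cuda-ggml-1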


tstechnologies commented on June 4, 2024

> @tstechnologies the Makefile error will always be there, as there is no such file. I'm not sure whether it is actually meant to be there or not. I also have not been able to get this working on Arch and started using ollama instead, which has been flawless. It definitely appears to be related to deps and drivers.
>
> You'll notice the loop happening because the CUDA image keeps crashing when the server attempts to start.

BIG UPS for that recommendation, holy moly. ollama-webui has been the solution I've been seeking.


StephenFacente commented on June 4, 2024

I'm seeing the same thing, running Ubuntu on WSL:

2023-11-14 16:18:42 ==========
2023-11-14 16:18:42 == CUDA ==
2023-11-14 16:18:42 ==========
2023-11-14 16:18:42 
2023-11-14 16:18:42 CUDA Version 12.1.1
2023-11-14 16:18:42 
2023-11-14 16:18:42 Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
2023-11-14 16:18:42 
2023-11-14 16:18:42 This container image and its contents are governed by the NVIDIA Deep Learning Container License.
2023-11-14 16:18:42 By pulling and using the container, you accept the terms and conditions of this license:
2023-11-14 16:18:42 https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
2023-11-14 16:18:42 
2023-11-14 16:18:42 A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
2023-11-14 16:18:42 
2023-11-14 16:18:43 /models/llama-2-7b-chat.bin model found.
2023-11-14 16:18:43 Initializing server with:
2023-11-14 16:18:43 Batch size: 1024
2023-11-14 16:18:43 Number of CPU threads: 7
2023-11-14 16:18:43 Number of GPU layers: 10
2023-11-14 16:18:43 Context window: 4096
2023-11-14 16:17:34 make: *** No rule to make target 'build'.  Stop.
2023-11-14 16:18:43 make: *** No rule to make target 'build'.  Stop.


danielbeast commented on June 4, 2024

Also seeing the same issue on Ubuntu Server 22.04.3:

llama-gpt-api-cuda-ggml_1 | ==========
llama-gpt-api-cuda-ggml_1 | == CUDA ==
llama-gpt-api-cuda-ggml_1 | ==========
llama-gpt-api-cuda-ggml_1 |
llama-gpt-api-cuda-ggml_1 | CUDA Version 12.1.1
llama-gpt-api-cuda-ggml_1 |
llama-gpt-api-cuda-ggml_1 | Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
llama-gpt-api-cuda-ggml_1 |
llama-gpt-api-cuda-ggml_1 | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
llama-gpt-api-cuda-ggml_1 | By pulling and using the container, you accept the terms and conditions of this license:
llama-gpt-api-cuda-ggml_1 | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
llama-gpt-api-cuda-ggml_1 |
llama-gpt-api-cuda-ggml_1 | A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
llama-gpt-api-cuda-ggml_1 |
llama-gpt-api-cuda-ggml_1 | /models/llama-2-7b-chat.bin model found.
llama-gpt-api-cuda-ggml_1 | make: *** No rule to make target 'build'. Stop.
llama-gpt-api-cuda-ggml_1 | Initializing server with:
llama-gpt-api-cuda-ggml_1 | Batch size: 2096
llama-gpt-api-cuda-ggml_1 | Number of CPU threads: 32
llama-gpt-api-cuda-ggml_1 | Number of GPU layers: 10
llama-gpt-api-cuda-ggml_1 | Context window: 4096
llama-gpt-ui_1 | [INFO wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-api-cuda-ggml_1 |
llama-gpt-api-cuda-ggml_1 | /models/llama-2-7b-chat.bin model found.
llama-gpt-api-cuda-ggml_1 | make: *** No rule to make target 'build'. Stop.
llama-gpt-api-cuda-ggml_1 | Initializing server with:
llama-gpt-api-cuda-ggml_1 | Batch size: 2096
llama-gpt-api-cuda-ggml_1 | Number of CPU threads: 32
llama-gpt-api-cuda-ggml_1 | Number of GPU layers: 10
llama-gpt-api-cuda-ggml_1 | Context window: 4096
llama-gpt-ui_1 | [INFO wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt_llama-gpt-api-cuda-ggml_1 exited with code 132
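(For what it's worth, exit code 132 is 128 + 4, i.e. the process died on SIGILL, an illegal CPU instruction; a common cause with llama.cpp builds is a binary compiled with SIMD extensions the CPU doesn't support. A quick host-side sanity check, as a sketch:)

# List the SIMD extensions the host CPU actually advertises
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -E '^(avx|avx2|avx512f|fma)$'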


BeardedTek commented on June 4, 2024

Same for me here:

==========
== CUDA ==
==========

CUDA Version 12.1.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

/models/code-llama-13b-chat.gguf model found.
make: *** No rule to make target 'build'.  Stop.
Initializing server with:
Batch size: 2096
Number of CPU threads: 16
Number of GPU layers: 10
Context window: 4096


tstechnologies commented on June 4, 2024

Seeing the same on my end with 13b:

llama-gpt-llama-gpt-api-cuda-ggml-1  |
llama-gpt-llama-gpt-api-cuda-ggml-1  | /models/llama-2-13b-chat.bin model found.
llama-gpt-llama-gpt-api-cuda-ggml-1  | make: *** No rule to make target 'build'.  Stop.
llama-gpt-llama-gpt-api-cuda-ggml-1  | Initializing server with:
llama-gpt-llama-gpt-api-cuda-ggml-1  | Batch size: 2096
llama-gpt-llama-gpt-api-cuda-ggml-1  | Number of CPU threads: 4
llama-gpt-llama-gpt-api-cuda-ggml-1  | Number of GPU layers: 10
llama-gpt-llama-gpt-api-cuda-ggml-1  | Context window: 4096
llama-gpt-llama-gpt-api-cuda-ggml-1 exited with code 132
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...


allan-null commented on June 4, 2024

Hoping this might help somebody: I was having the same issue, then I updated my NVIDIA drivers and it started working.
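
For anyone wanting to try the same fix on Ubuntu, something along these lines should work (a sketch only; pick whatever series ubuntu-drivers actually recommends for your card):

# List the drivers Ubuntu recommends for the detected GPU
sudo ubuntu-drivers devices
# Install a recent proprietary series and reboot (535 is just an example)
sudo apt install nvidia-driver-535
sudo reboot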


tstechnologies commented on June 4, 2024

> Hoping this might help somebody: I was having the same issue, then I updated my NVIDIA drivers and it started working.

Which driver version did you go to, and which model did you get to run?


BeardedTek commented on June 4, 2024

Also, which kernel, distro, etc.?


Wh1t3Fox commented on June 4, 2024

@allan-null this makes no sense, as there is no physical Makefile in the folder. If it's working for you, can you provide us with whatever it's using for the build, output, etc.?
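
For what it's worth, that exact message is just what GNU make prints when invoked with a target and no Makefile present, which is easy to reproduce in isolation:

# Running 'make build' in an empty directory reproduces the error verbatim
cd "$(mktemp -d)" && make build
# make: *** No rule to make target 'build'.  Stop.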


allan-null commented on June 4, 2024

> Also, which kernel, distro, etc.?
> Which driver version did you go to, and which model did you get to run?

Ubuntu 22.04 64-bit
Kernel 6.2.0
Nvidia GTX 1070 mobile
Nvidia drivers 535.129
Master branch
Default model

> @allan-null this makes no sense, as there is no physical Makefile in the folder. If it's working for you, can you provide us with whatever it's using for the build, output, etc.?

I cloned the repo and then simply ran ./run.sh --with-cuda. After everything started working, I started messing with the variable n_gpu_layers in the file cuda/run.sh so the project could make better use of my GPU; see the sketch below.
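
For reference, a minimal sketch of that tweak (the variable name comes from the comment above; it assumes the script sets n_gpu_layers=<N> on its own line, and 20 is just an illustrative value that depends on your VRAM):

# Offload more layers to the GPU by editing the variable in cuda/run.sh
sed -i 's/^n_gpu_layers=.*/n_gpu_layers=20/' cuda/run.sh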


Wh1t3Fox commented on June 4, 2024

@allan-null can you provide us with the output of docker-compose?


Wh1t3Fox commented on June 4, 2024

I suppose it's running because of:

# Run the server
exec python3 -m llama_cpp.server --n_ctx $n_ctx --n_threads $n_threads --n_gpu_layers $n_gpu_layers --n_batch $n_batch

In my instance (I've done exactly the same as you) on Arch Linux, llama_cpp.server does not exist and causes a crash.

Edit: Additionally, trying to run the Docker container standalone, the application segfaults and exits with code 0. Perhaps it is something related to drivers and differing versions.
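
A quick way to test that theory is to check whether the module is importable inside the image at all (a sketch; the image name is taken from the build output above, so adjust if yours differs):

# Try importing llama_cpp.server inside the freshly built API image
docker run --rm --entrypoint python3 llama-gpt-llama-gpt-api-cuda-ggml \
  -c "import llama_cpp.server; print('llama_cpp.server is importable')"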


tstechnologies commented on June 4, 2024

Sorry, gents, but I think this may still be bugged:

Ubuntu 22.04
Nvidia drivers: 545
CUDA Version: 12.3.1 (made the change to ./llama-gpt/cuda/*.Dockerfile per @Wh1t3Fox's suggestion above)
Model: 13b

Same make: *** No rule to make target 'build'. Stop. error and the same loop:

llama-gpt-llama-gpt-api-cuda-ggml-1  |
llama-gpt-llama-gpt-api-cuda-ggml-1  | CUDA Version 12.3.1
llama-gpt-llama-gpt-api-cuda-ggml-1  |
llama-gpt-llama-gpt-api-cuda-ggml-1  | Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
llama-gpt-llama-gpt-api-cuda-ggml-1  |
llama-gpt-llama-gpt-api-cuda-ggml-1  | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
llama-gpt-llama-gpt-api-cuda-ggml-1  | By pulling and using the container, you accept the terms and conditions of this license:
llama-gpt-llama-gpt-api-cuda-ggml-1  | https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
llama-gpt-llama-gpt-api-cuda-ggml-1  |
llama-gpt-llama-gpt-api-cuda-ggml-1  | A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
llama-gpt-llama-gpt-api-cuda-ggml-1  |
llama-gpt-llama-gpt-api-cuda-ggml-1  | /models/llama-2-13b-chat.bin model found.
llama-gpt-llama-gpt-api-cuda-ggml-1  | make: *** No rule to make target 'build'.  Stop.
llama-gpt-llama-gpt-api-cuda-ggml-1  | Initializing server with:
llama-gpt-llama-gpt-api-cuda-ggml-1  | Batch size: 2096
llama-gpt-llama-gpt-api-cuda-ggml-1  | Number of CPU threads: 8
llama-gpt-llama-gpt-api-cuda-ggml-1  | Number of GPU layers: 10
llama-gpt-llama-gpt-api-cuda-ggml-1  | Context window: 4096
llama-gpt-llama-gpt-api-cuda-ggml-1 exited with code 132
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...
llama-gpt-llama-gpt-ui-1             | [INFO  wait] Host [llama-gpt-api-cuda-ggml:8000] not yet available...

Tried letting it run in that loop for about 8 hours; no dice.

Also, several dependencies are needed specifically for Ubuntu; it might be worth adding them to the README once this is working. Happy to submit a commit/pull request with instructions once resolved.

