
segfault running rag.py (llmware issue, closed)

narimantos commented on May 14, 2024
segfault running rag.py

from llmware.

Comments (9)

turnham commented on May 14, 2024

Yes, sorry I can't be of more help yet. We've tested on native Linux (x86_64 and arm64) but haven't had the bandwidth to test on WSL, so there may be something unique going on there that we haven't encountered yet.

We are currently working on native Windows support and can add WSL testing/validation to the list of items. Is there anything special about your self-compiled kernel, or do you think testing on vanilla WSL2 should be sufficient?


narimantos commented on May 14, 2024

Thank you so much for your time!
I think the vanilla kernel will be sufficient.


turnham commented on May 14, 2024

Hi, yes, see the discussion in #48 for the fix/workaround.


narimantos commented on May 14, 2024

Thanks for the fast response, but after running ulimit -s 32768000 I'm getting the following output:

OpenBLAS blas_thread_init: pthread_create failed for thread 1 of 16: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 63943 current, 63943 max
OpenBLAS blas_thread_init: pthread_create failed for thread 2 of 16: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 63943 current, 63943 max
OpenBLAS blas_thread_init: pthread_create failed for thread 3 of 16: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 63943 current, 63943 max
OpenBLAS blas_thread_init: pthread_create failed for thread 4 of 16: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 63943 current, 63943 max
OpenBLAS blas_thread_init: pthread_create failed for thread 5 of 16: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 63943 current, 63943 max
OpenBLAS blas_thread_init: pthread_create failed for thread 6 of 16: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 63943 current, 63943 max
OpenBLAS blas_thread_init: pthread_create failed for thread 7 of 16: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 63943 current, 63943 max
OpenBLAS blas_thread_init: pthread_create failed for thread 8 of 16: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 63943 current, 63943 max
OpenBLAS blas_thread_init: pthread_create failed for thread 9 of 16: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 63943 current, 63943 max
OpenBLAS blas_thread_init: pthread_create failed for thread 10 of 16: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 63943 current, 63943 max
OpenBLAS blas_thread_init: pthread_create failed for thread 11 of 16: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 63943 current, 63943 max
OpenBLAS blas_thread_init: pthread_create failed for thread 12 of 16: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 63943 current, 63943 max
OpenBLAS blas_thread_init: pthread_create failed for thread 13 of 16: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 63943 current, 63943 max
OpenBLAS blas_thread_init: pthread_create failed for thread 14 of 16: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 63943 current, 63943 max
OpenBLAS blas_thread_init: pthread_create failed for thread 15 of 16: Resource temporarily unavailable
OpenBLAS blas_thread_init: RLIMIT_NPROC 63943 current, 63943 max
Traceback (most recent call last):
  File "/home/nari/dev/llmware/examples/rag.py", line 3, in <module>
    from llmware.library import Library
  File "/home/nari/dev/llmware/venv/lib/python3.9/site-packages/llmware/library.py", line 24, in <module>
    from llmware.util import Utilities, Graph
  File "/home/nari/dev/llmware/venv/lib/python3.9/site-packages/llmware/util.py", line 29, in <module>
    import numpy as np
  File "/home/nari/dev/llmware/venv/lib/python3.9/site-packages/numpy/__init__.py", line 130, in <module>
    from numpy.__config__ import show as show_config
  File "/home/nari/dev/llmware/venv/lib/python3.9/site-packages/numpy/__config__.py", line 4, in <module>
    from numpy.core._multiarray_umath import (
  File "/home/nari/dev/llmware/venv/lib/python3.9/site-packages/numpy/core/__init__.py", line 24, in <module>
    from . import multiarray
  File "/home/nari/dev/llmware/venv/lib/python3.9/site-packages/numpy/core/multiarray.py", line 10, in <module>
    from . import overrides
  File "/home/nari/dev/llmware/venv/lib/python3.9/site-packages/numpy/core/overrides.py", line 8, in <module>
    from numpy.core._multiarray_umath import (
  File "<frozen importlib._bootstrap>", line 203, in _lock_unlock_module
KeyboardInterrupt

I tried to set my process limit to unlimited with ulimit -u unlimited, but I think it's already at its max.

$ ulimit -a

-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             32768000
-c: core file size (blocks)         0
-m: resident set size (kbytes)      unlimited
-u: processes                       63943
-n: file descriptors                1048576
-l: locked-in-memory size (kbytes)  65536
-v: address space (kbytes)          unlimited
-x: file locks                      unlimited
-i: pending signals                 63943
-q: bytes in POSIX msg queues       819200
-e: max nice                        0
-r: max rt priority                 0
-N 15:                              unlimited
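One possible reading of the limits above, not confirmed anywhere in this thread: according to the Linux pthread_create man page, a stack soft limit other than "unlimited" is also used by glibc as the default stack size for new threads. If that applies here, ulimit -s 32768000 (the value is in kilobytes) would ask for roughly 31 GiB of address space per thread, which could explain pthread_create failing even though the process count is nowhere near 63943. The sketch below only inspects the limits with Python's standard resource module; the helper names are made up for illustration.

```python
import resource


def describe_limit(value):
    """Render an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def thread_limits():
    """Return the soft limits that matter when creating threads."""
    stack_soft, _ = resource.getrlimit(resource.RLIMIT_STACK)  # in bytes
    nproc_soft, _ = resource.getrlimit(resource.RLIMIT_NPROC)  # per-user count
    return {"stack_bytes": stack_soft, "nproc": nproc_soft}


limits = thread_limits()
print("stack soft limit (bytes):", describe_limit(limits["stack_bytes"]))
print("nproc soft limit:", describe_limit(limits["nproc"]))

# ulimit -s is given in kilobytes, so convert the value from this thread to GiB.
stack_gib = 32768000 * 1024 / 2**30
print(f"ulimit -s 32768000 is about {stack_gib:.2f} GiB per thread stack")
```

If the stack limit does turn out to be the cause, a smaller value than 32768000 may be worth trying; that is a guess to test, not a verified fix.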


turnham commented on May 14, 2024

Sorry about that. We don't see that in our testing. I did a quick Google search and found this answer:
https://stackoverflow.com/questions/52026652/openblas-blas-thread-init-pthread-create-resource-temporarily-unavailable

Can you try adding this to your test code to see if it works around the issue? (If so, we'll look at adding it to our initialization code.)

import os
os.environ['OPENBLAS_NUM_THREADS'] = '1'
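A slightly fuller sketch of the same workaround, assuming the variable is read when the BLAS library is first loaded, so it has to be set before anything imports numpy (including llmware). The helper name limit_blas_threads is hypothetical, and setting OMP_NUM_THREADS as well is an extra assumption for OpenMP-based builds, not something suggested in this thread.

```python
import os

# Assumption: these variables are read once, when the BLAS library is loaded,
# so they must be set before the first `import numpy`.
THREAD_VARS = ("OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS")


def limit_blas_threads(n=1):
    """Set the thread-count variables and return what was applied."""
    for name in THREAD_VARS:
        os.environ[name] = str(n)
    return {name: os.environ[name] for name in THREAD_VARS}


applied = limit_blas_threads(1)
print(applied)

# Import numpy (or llmware) only after this point, for example:
# from llmware.library import Library
```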

Also, can you share some specifics on the system you are testing on and the output from:

uname -a


narimantos commented on May 14, 2024

By placing

import os
os.environ['OPENBLAS_NUM_THREADS'] = '1'

at the top of rag.py, I get the following:

Traceback (most recent call last):
  File "/home/nari/dev/llmware/examples/rag.py", line 46, in <module>
    end_to_end_rag()
  File "/home/nari/dev/llmware/examples/rag.py", line 17, in end_to_end_rag
    library = Library().create_new_library("Agreements")
  File "/home/nari/dev/llmware/venv/lib/python3.9/site-packages/llmware/library.py", line 82, in __init__
    if not check_db_uri(timeout_secs=3):
  File "/home/nari/dev/llmware/venv/lib/python3.9/site-packages/llmware/resources.py", line 73, in check_db_uri
    client = MongoClient(uri_string, unicode_decode_error_handler='ignore')
  File "/home/nari/dev/llmware/venv/lib/python3.9/site-packages/pymongo/mongo_client.py", line 861, in __init__
    self._get_topology()
  File "/home/nari/dev/llmware/venv/lib/python3.9/site-packages/pymongo/mongo_client.py", line 1238, in _get_topology
    self._topology.open()
  File "/home/nari/dev/llmware/venv/lib/python3.9/site-packages/pymongo/topology.py", line 201, in open
    self._ensure_opened()
  File "/home/nari/dev/llmware/venv/lib/python3.9/site-packages/pymongo/topology.py", line 631, in _ensure_opened
    self._update_servers()
  File "/home/nari/dev/llmware/venv/lib/python3.9/site-packages/pymongo/topology.py", line 781, in _update_servers
    server.open()
  File "/home/nari/dev/llmware/venv/lib/python3.9/site-packages/pymongo/server.py", line 70, in open
    self._monitor.open()
  File "/home/nari/dev/llmware/venv/lib/python3.9/site-packages/pymongo/monitor.py", line 87, in open
    self._executor.open()
  File "/home/nari/dev/llmware/venv/lib/python3.9/site-packages/pymongo/periodic_executor.py", line 95, in open
    thread.start()
  File "/usr/lib/python3.9/threading.py", line 899, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread

Output of uname -a:
Linux Nari-pc 5.15.133.1-microsoft-standard+ #3 SMP Wed Nov 1 16:18:04 CET 2023 x86_64 x86_64 x86_64 GNU/Linux
I'm currently on a self-compiled kernel for WSL2, branch linux-msft-wsl-5.15.y.
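The RuntimeError in the traceback above comes from Python's own threading module rather than from OpenBLAS. If the large ulimit -s value is what makes thread creation fail (an assumption, not something established in this thread), one experiment is to request an explicit, smaller stack for Python threads with threading.stack_size before any threads start. The 8 MiB figure is illustrative only. This would affect only threads created by Python, such as pymongo's monitor thread, and not threads created by native libraries or other programs.

```python
import threading

# Assumption: 8 MiB is an illustrative per-thread stack size, not a value
# taken from the thread or from llmware.
STACK_BYTES = 8 * 1024 * 1024

# Applies to threads created after this call; returns the previous setting
# (0 means "use the platform default").
previous = threading.stack_size(STACK_BYTES)

results = []
worker = threading.Thread(target=lambda: results.append("started"))
worker.start()
worker.join()

# Restore the earlier setting so the experiment has no lasting effect.
threading.stack_size(previous)
print(results)
```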


narimantos commented on May 14, 2024

After running rag.py, I get the following when trying to restart Docker Compose:

docker compose kill   
runtime/cgo: pthread_create failed: Resource temporarily unavailable
SIGABRT: abort
PC=0x7f81d6f749fc m=0 sigcode=18446744073709551610

goroutine 0 [idle]:
runtime: g 0: unknown pc 0x7f81d6f749fc
stack: frame={sp:0x7ffcfbb03530, fp:0x0} stack=[0x7ff52bb04ab0,0x7ffcfbb03ac0)
0x00007ffcfbb03430:  0x000055ead9e652c0 <runtime.mmap.func1+0x0000000000000000>  0x00007ffcfbb03428 
0x00007ffcfbb03440:  0x00007f81b1d7b000  0x0000000000407000 
0x00007ffcfbb03450:  0x0000003200000003  0x00000000ffffffff 
0x00007ffcfbb03460:  0x00007ffcfbb034a8  0x000055ead9e797d9 <runtime.sysMapOS+0x0000000000000039> 
0x00007ffcfbb03470:  0x00007f81afc66000  0x000055ead9e652c0 <runtime.mmap.func1+0x0000000000000000> 
0x00007ffcfbb03480:  0x000055ead9e79765 <runtime.sysHugePageOS+0x0000000000000065>  0x00007f81b1e00000 
0x00007ffcfbb03490:  0x0000000000200000  0x000000000000000e 
0x00007ffcfbb034a0:  0x00007f8100000000  0x00007ffcfbb034d8 
0x00007ffcfbb034b0:  0x000055ead9e79667 <runtime.sysUsedOS+0x0000000000000027>  0x000055eada8fba10 
0x00007ffcfbb034c0:  0x0000000000407000  0x000055ead9ecf2be <runtime.callCgoMmap+0x000000000000003e> 
0x00007ffcfbb034d0:  0x00007f81d6f6f870  0x000000001c000004 
0x00007ffcfbb034e0:  0x00007f81d6f20520  0x0000000000000000 
0x00007ffcfbb034f0:  0x000055ead9e8a545 <runtime.(*mheap).alloc.func1+0x0000000000000065>  0x000055eadb507a80 
0x00007ffcfbb03500:  0x0000000000000001  0x0000000000000000 
0x00007ffcfbb03510:  0x0000000000000000  0x0000000000000000 
0x00007ffcfbb03520:  0x0000000000000000  0x00007f81d6f749ee 
0x00007ffcfbb03530: <0x00007ffcfbb03788  0x000055ead9e8a485 <runtime.(*mheap).alloc+0x0000000000000065> 
0x00007ffcfbb03540:  0x00007ffcfbb03760  0x00007f81d6ea7588 
0x00007ffcfbb03550:  0x000055ead9e8a4e0 <runtime.(*mheap).alloc.func1+0x0000000000000000>  0x000055eadb507a80 
0x00007ffcfbb03560:  0x0000000000000001  0x0000000000000003 
0x00007ffcfbb03570:  0x00070c0038600000  0x0000000000000007 
0x00007ffcfbb03580:  0x00007ffcfbb03640  0x0000000000000012 
0x00007ffcfbb03590:  0x0000000000000003  0x0000000000000000 
0x00007ffcfbb035a0:  0x0000000000002030  0x0000000000002031 
0x00007ffcfbb035b0:  0x00007f81d6dfbc00  0xc7ef563063d51f00 
0x00007ffcfbb035c0:  0x00007f81d6edb740  0x0000000000000006 
0x00007ffcfbb035d0:  0x000055eadaad3b94  0x00007ffcfbb03900 
0x00007ffcfbb035e0:  0x000055eadb4ee580  0x00007f81d6f20476 
0x00007ffcfbb035f0:  0x00007f81d70f8e90  0x00007f81d6f067f3 
0x00007ffcfbb03600:  0x0000000000000020  0xc7ef563063d51f00 
0x00007ffcfbb03610:  0x00007ffcfbb03640  0x00007f81d6f73881 
0x00007ffcfbb03620:  0x0000000000000000  0x00007ffcfbb03840 
runtime: g 0: unknown pc 0x7f81d6f749fc
stack: frame={sp:0x7ffcfbb03530, fp:0x0} stack=[0x7ff52bb04ab0,0x7ffcfbb03ac0)
0x00007ffcfbb03430:  0x000055ead9e652c0 <runtime.mmap.func1+0x0000000000000000>  0x00007ffcfbb03428 
0x00007ffcfbb03440:  0x00007f81b1d7b000  0x0000000000407000 
0x00007ffcfbb03450:  0x0000003200000003  0x00000000ffffffff 
0x00007ffcfbb03460:  0x00007ffcfbb034a8  0x000055ead9e797d9 <runtime.sysMapOS+0x0000000000000039> 
0x00007ffcfbb03470:  0x00007f81afc66000  0x000055ead9e652c0 <runtime.mmap.func1+0x0000000000000000> 
0x00007ffcfbb03480:  0x000055ead9e79765 <runtime.sysHugePageOS+0x0000000000000065>  0x00007f81b1e00000 
0x00007ffcfbb03490:  0x0000000000200000  0x000000000000000e 
0x00007ffcfbb034a0:  0x00007f8100000000  0x00007ffcfbb034d8 
0x00007ffcfbb034b0:  0x000055ead9e79667 <runtime.sysUsedOS+0x0000000000000027>  0x000055eada8fba10 
0x00007ffcfbb034c0:  0x0000000000407000  0x000055ead9ecf2be <runtime.callCgoMmap+0x000000000000003e> 
0x00007ffcfbb034d0:  0x00007f81d6f6f870  0x000000001c000004 
0x00007ffcfbb034e0:  0x00007f81d6f20520  0x0000000000000000 
0x00007ffcfbb034f0:  0x000055ead9e8a545 <runtime.(*mheap).alloc.func1+0x0000000000000065>  0x000055eadb507a80 
0x00007ffcfbb03500:  0x0000000000000001  0x0000000000000000 
0x00007ffcfbb03510:  0x0000000000000000  0x0000000000000000 
0x00007ffcfbb03520:  0x0000000000000000  0x00007f81d6f749ee 
0x00007ffcfbb03530: <0x00007ffcfbb03788  0x000055ead9e8a485 <runtime.(*mheap).alloc+0x0000000000000065> 
0x00007ffcfbb03540:  0x00007ffcfbb03760  0x00007f81d6ea7588 
0x00007ffcfbb03550:  0x000055ead9e8a4e0 <runtime.(*mheap).alloc.func1+0x0000000000000000>  0x000055eadb507a80 
0x00007ffcfbb03560:  0x0000000000000001  0x0000000000000003 
0x00007ffcfbb03570:  0x00070c0038600000  0x0000000000000007 
0x00007ffcfbb03580:  0x00007ffcfbb03640  0x0000000000000012 
0x00007ffcfbb03590:  0x0000000000000003  0x0000000000000000 
0x00007ffcfbb035a0:  0x0000000000002030  0x0000000000002031 
0x00007ffcfbb035b0:  0x00007f81d6dfbc00  0xc7ef563063d51f00 
0x00007ffcfbb035c0:  0x00007f81d6edb740  0x0000000000000006 
0x00007ffcfbb035d0:  0x000055eadaad3b94  0x00007ffcfbb03900 
0x00007ffcfbb035e0:  0x000055eadb4ee580  0x00007f81d6f20476 
0x00007ffcfbb035f0:  0x00007f81d70f8e90  0x00007f81d6f067f3 
0x00007ffcfbb03600:  0x0000000000000020  0xc7ef563063d51f00 
0x00007ffcfbb03610:  0x00007ffcfbb03640  0x00007f81d6f73881 
0x00007ffcfbb03620:  0x0000000000000000  0x00007ffcfbb03840 

goroutine 1 [running]:
runtime.systemstack_switch()
        /usr/local/go/src/runtime/asm_amd64.s:463 fp=0xc00008c780 sp=0xc00008c778 pc=0x55ead9ecb200
runtime.main()
        /usr/local/go/src/runtime/proc.go:170 +0x6d fp=0xc00008c7e0 sp=0xc00008c780 pc=0x55ead9e9bced
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00008c7e8 sp=0xc00008c7e0 pc=0x55ead9ecd4a1

rax    0x0
rbx    0x7f81d6edb740
rcx    0x7f81d6f749fc
rdx    0x6
rdi    0xff0
rsi    0xff0
rbp    0xff0
rsp    0x7ffcfbb03530
r8     0x7ffcfbb03600
r9     0x7fffffff
r10    0x8
r11    0x246
r12    0x6
r13    0x16
r14    0x55eadb4ee580
r15    0x1
rip    0x7f81d6f749fc
rflags 0x246
cs     0x33
fs     0x0
gs     0x0


turnham commented on May 14, 2024

It looks like your system is maxed out in terms of running processes and can't even create a new thread to run docker compose kill. Are you able to get a list of processes with ps or top, see whether there are in fact many processes running, and kill some manually (or restart your WSL instance)?


narimantos commented on May 14, 2024

Just got back from a fresh restart, but still no luck.

Maybe it's my swap space? That is set to 0.

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                                                                 
  789 root      20   0    2348    124      0 S   0.0   0.0   0:00.03 Relay(790)                                                                                                                                                              
top - 01:02:27 up 11 min,  0 users,  load average: 0.00, 0.07, 0.07
Tasks:  31 total,   1 running,  30 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  15993.1 total,   7961.2 free,   2597.0 used,   5434.9 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  13066.5 avail Mem 

And ps -aux does not show that many processes 🤷‍♂️
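Since the process limit on Linux counts threads as well as processes (per the getrlimit man page), a plain process listing can understate what is running. The sketch below is one way to total threads per process by reading the Threads field from /proc/<pid>/status. It assumes a Linux-style /proc, and the function name count_threads is made up for illustration.

```python
import os


def count_threads(proc_root="/proc"):
    """Return (pid, name, threads) for each readable process, busiest first."""
    rows = []
    if not os.path.isdir(proc_root):
        return rows  # no Linux-style /proc available; nothing to report
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                fields = dict(line.split(":", 1) for line in fh if ":" in line)
            threads = int(fields.get("Threads", "0").strip())
        except (OSError, ValueError):
            continue  # process exited, is unreadable, or has an odd status file
        rows.append((int(entry), fields.get("Name", "?").strip(), threads))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


rows = count_threads()
print("total threads:", sum(row[2] for row in rows))
for pid, name, threads in rows[:5]:
    print(pid, name, threads)
```

If the total is small, as the top output here suggests, the thread-count limit is probably not what is being hit.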
