Comments (4)
Hi @AlbelTec I just tried on WSL2 (Windows 10) and was able to get things working with:
ulimit -s 160000
(The higher 32768000 value seems to be only required when running Linux in a container on Mac)
However, in your case it looks like the ulimit setting might not be taking effect at all. You may be hitting this issue:
Can you try the workaround suggested at the bottom of that issue:
sudo prlimit --stack=unlimited --pid $$; ulimit -s unlimited
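To confirm whether the ulimit setting is actually reaching the Python process, the limit can also be read from inside Python. This is a minimal sketch using the standard-library `resource` module (Linux/WSL only). The helper name `describe_stack_limit` is made up for illustration and is not part of llmware.

```python
import resource


def describe_stack_limit():
    """Return the soft and hard stack limits of this process as readable strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)

    def fmt(value):
        # RLIM_INFINITY is the sentinel that `ulimit -s` prints as "unlimited"
        if value == resource.RLIM_INFINITY:
            return "unlimited"
        # getrlimit reports bytes, while `ulimit -s` works in 1024-byte units
        return f"{value // 1024} KB"

    return fmt(soft), fmt(hard)


if __name__ == "__main__":
    soft, hard = describe_stack_limit()
    print(f"stack soft limit: {soft}, hard limit: {hard}")
```

If the soft limit printed here does not match the value passed to `ulimit -s`, the setting did not take effect for the process running the parser.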
from llmware.
Hi @AlbelTec - please try the workaround described in #48.
Hi @JessBerl
Actually, I did use: ulimit -s 32768000
but I am still getting the error:
> Parsing folder: data...
Segmentation fault
(llmware) albel@Thinkpad:~/llmware$
Here is my code:
from llmware.parsers import Parser

dataDir = "data"  # folder name, inferred from the output above

def parsing_pdf():
    # Create a parser
    parser = Parser()
    # Parse a single PDF from the data folder
    print(f"\n > Parsing folder: {dataDir}...")
    pdf_parsed_output = parser.parse_one_pdf("/home/albel/llmware/data/", "Large Language Models.pdf")
    page_number = pdf_parsed_output[0]["master_index"]
    block_text = pdf_parsed_output[0]["text"]
    print(f"\nFirst block found on page {page_number}:\n{block_text}")
    # Parse entire folder to json
    # blocks = parser.ingest_to_json(dataDir)
    # print(f"Total Blocks: {len(parser.parser_output)}")
    # print(f"Files Parsed:")
    # for processed_file in blocks["processed_files"]:
    #     print(f" - {processed_file}")

parsing_pdf()
With JSON, the output is more verbose:
albel@Thinkpad:~/llmware$ source /home/albel/llmware/bin/activate
(llmware) albel@Thinkpad:~/llmware$ /home/albel/llmware/bin/python3.10 /home/albel/llmware/llmware_pdf.py
> Parsing folder: data...
update: pdf_parser - START NEW PDF Processing - file path-/home/albel/llmware_data/tmp/parser_tmp/process_pdf_files/Large Language Models.pdf
update: pdf_parser - build_obj_master_list - obj created - 3130
update: pdf_parser - Catalog Dict - <<
/Type /Catalog
/Version /1.4
/Pages 2 0 R
/StructTreeRoot 3 0 R
/MarkInfo 4 0 R
/Lang (en-GB)
/ViewerPreferences 5 0 R
/Metadata 6 0 R
>>
update: pdf_parser - filelen - 5447062
update: pdf_parser - created additional hidden objstm objects - 0
update: pdf_parser - page count - 31- pages_found - 31
update: pdf_parser - global font count- 40
update: pdf_parser - PAGE PROCESSING-MAIN-LOOP -0-content entries-1
Segmentation fault
@turnham Thanks! It finally worked. ulimit was stuck at a value of 8192. Running prlimit with root privileges set the stack limit to unlimited, and the issue is gone. The only drawback is that it has to be run for every session. I can live with that for now, until the Windows version is released.
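For anyone who wants to avoid retyping the command each session, one option worth trying is to have the script request a larger stack itself before parsing. This is only a sketch using the standard-library `resource` module. The helper `try_raise_stack_limit` is hypothetical, not an llmware function. An unprivileged process can only raise its soft limit up to the hard limit, so if the hard limit is also 8192 this will return False and the `sudo prlimit` workaround is still needed. Whether raising the limit in-process is enough for the llmware parser is an assumption that has not been verified here.

```python
import resource


def try_raise_stack_limit(target_bytes):
    """Try to raise the soft stack limit to at least target_bytes.

    Returns True if the soft limit is at least target_bytes afterwards,
    False if the hard limit or the OS prevents it.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    unlimited = resource.RLIM_INFINITY

    # Nothing to do if the current soft limit is already large enough
    if soft == unlimited or soft >= target_bytes:
        return True
    # Without root privileges the soft limit cannot exceed the hard limit
    if hard != unlimited and hard < target_bytes:
        return False
    try:
        resource.setrlimit(resource.RLIMIT_STACK, (target_bytes, hard))
    except (ValueError, OSError):
        return False
    return True


if __name__ == "__main__":
    # 160000 KB is the value that worked on WSL2 earlier in this thread
    ok = try_raise_stack_limit(160000 * 1024)
    print("stack limit raised" if ok else "could not raise stack limit")
```

Calling this at the top of the script, before `parsing_pdf()`, at least gives a clear message instead of a bare segmentation fault when the limit cannot be raised.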