
thiswillbeyourgithub / doctoolsllm


Summarize and query a large number of heterogeneous documents. Any LLM provider, any filetype, scalable, under development.

License: GNU General Public License v3.0

Python 84.49% Shell 15.51%
langchain llm news python question-answering summarizer

doctoolsllm's Issues

No module named 'lazy_import'

I followed your instructions, but when I run the sample command that summarizes the Ars Technica article, I get the following error:

python -m DocToolsLLM --task="summary" --path="https://arstechnica.com/science/2024/06/to-pee-or-not-to-pee-that-is-a-question-for-the-bladder-and-the-brain/"
Traceback (most recent call last):
  File "<frozen runpy>", line 189, in _run_module_as_main
  File "<frozen runpy>", line 148, in _get_module_details
  File "<frozen runpy>", line 112, in _get_module_details
  File "E:\dev\DocToolsLLM\.venv\Lib\site-packages\DocToolsLLM\__init__.py", line 19, in <module>
    import lazy_import
ModuleNotFoundError: No module named 'lazy_import'

Any idea what I am missing?
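A `ModuleNotFoundError` like this means the interpreter running `python -m DocToolsLLM` cannot locate `lazy_import`, which usually points to a dependency missing from that environment (or installed into a different one). A minimal diagnostic sketch, assuming you run it with the same interpreter as the failing command; `missing_modules` is a hypothetical helper, and only `lazy_import` in the list comes from the traceback above:

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the subset of top-level module `names` that cannot be found
    by the running interpreter (dotted names are not handled here)."""
    missing = []
    for name in names:
        # find_spec returns None when no finder on sys.path can locate the module
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Illustrative list: only 'lazy_import' is taken from the traceback;
    # extend it with any other modules you suspect are missing.
    wanted = ["lazy_import"]
    # Printing the interpreter path shows which venv is actually in use
    print("Interpreter:", sys.executable)
    missing = missing_modules(wanted)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

If `lazy_import` is reported missing, installing it into that same environment (the PyPI distribution is named `lazy-import`) should get past this import; if it is reported present, the command is probably being run by a different interpreter than the one printed.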

Not able to load a txt document

Trying to test this locally.

Python 3.12.

I added the following to requirements:

dill
youtube_dl
goose3
pdftotext
langdetect

I also had to change the langdetect import from:

from ftlangdetect import detect as language_detect

to:

from langdetect import detect as language_detect

❯ python ./DoctoolsLLM.py --task="query" --path="90-days.txt" --filetype="infer"
 ____            _____           _     _     _     __  __
|  _ \  ___   __|_   _|__   ___ | |___| |   | |   |  \/  |
| | | |/ _ \ / __|| |/ _ \ / _ \| / __| |   | |   | |\/| |
| |_| | (_) | (__ | | (_) | (_) | \__ \ |___| |___| |  | |
|____/ \___/ \___||_|\___/ \___/|_|___/_____|_____|_|  |_|


Hashing files: 100%|██████████| 1/1 [00:00<00:00, 214.54doc/s]
Deduplicating files
Loading txt: '90-days.txt'
Error when loading doc with filetype txt: 'string indices must be integers, not 'str''. Arguments: () ; {'task': 'query', 'debug': False, 'path': '90-days.txt', 'filetype': 'txt', 'file_hash': 'b114c1a8b1994861991f'}
Loading: 100%|██████████| 1/1 [00:00<00:00,  4.29doc/s]
Number of failed documents: 1
Traceback (most recent call last):
  File "/Users/dmitrymarkushevich/workspace/github/DocToolsLLM/./DoctoolsLLM.py", line 1646, in <module>
    instance = fire.Fire(DocToolsLLM)
               ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dmitrymarkushevich/workspace/github/DocToolsLLM/venv/lib/python3.12/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dmitrymarkushevich/workspace/github/DocToolsLLM/venv/lib/python3.12/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/Users/dmitrymarkushevich/workspace/github/DocToolsLLM/venv/lib/python3.12/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dmitrymarkushevich/workspace/github/DocToolsLLM/./DoctoolsLLM.py", line 698, in __init__
    self.loaded_docs = load_doc(
                       ^^^^^^^^^
  File "/Users/dmitrymarkushevich/workspace/github/DocToolsLLM/utils/file_loader.py", line 262, in load_doc
    assert docs, "No documents were succesfully loaded!"
           ^^^^
AssertionError: No documents were succesfully loaded!
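The import change and the loader error may be linked (an inference, not confirmed in this thread): `ftlangdetect`'s `detect` returns a dict such as `{'lang': 'en', 'score': ...}`, while `langdetect`'s `detect` returns a bare string such as `'en'`. Code written for the former that does `language_detect(text)["lang"]` would then index a string with a string, which raises exactly `string indices must be integers, not 'str'`. A minimal adapter sketch, assuming the calling code reads a `"lang"` key; `as_dict_detector` and `fake_detect` are hypothetical names, and `fake_detect` stands in for `langdetect.detect` so the example runs without the library:

```python
from typing import Callable, Dict, Optional, Union

DetectResult = Dict[str, Union[str, Optional[float]]]


def as_dict_detector(detect_fn: Callable[[str], str]) -> Callable[[str], DetectResult]:
    """Wrap a detector that returns a bare language code (like langdetect.detect)
    so it returns a dict with a "lang" key, the shape ftlangdetect's detect uses."""

    def language_detect(text: str) -> DetectResult:
        # langdetect.detect provides no confidence score, so "score" is None
        return {"lang": detect_fn(text), "score": None}

    return language_detect


def fake_detect(text: str) -> str:
    """Hypothetical stand-in for langdetect.detect; always answers "en"."""
    return "en"


if __name__ == "__main__":
    # Indexing the bare string the way dict-based code would reproduces the error
    try:
        fake_detect("some text")["lang"]
    except TypeError as err:
        print("TypeError:", err)

    # With the adapter, the same lookup works
    language_detect = as_dict_detector(fake_detect)
    print(language_detect("some text")["lang"])  # en
```

In the project, the swapped import would then become something like `from langdetect import detect` followed by `language_detect = as_dict_detector(detect)`. Note that `langdetect` is non-deterministic on short or ambiguous text; setting `DetectorFactory.seed = 0` (imported from `langdetect`) makes its results reproducible.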
