lfoppiano / document-qa


Scientific Document Insight Q/A

Home Page: https://lfoppiano-document-qa.hf.space/

License: Apache License 2.0

Languages: Python 99.21%, Dockerfile 0.79%
Topics: llm, rag, scientific-documents, text, text-mining

document-qa's Introduction

Hi there 👋

Artificial intelligence specialist with 10+ years of experience in software engineering. I have expertise in Text and Data Mining, Natural Language Processing, and Data Science. I'm interested in the development of specialized processes for scientific text, in particular document parsing and structuring.

I like almost anything related to the outdoors and travel.

document-qa's People

Contributors

lfoppiano, t29mato


document-qa's Issues

Streamlit-pdf-viewer Component Interfering with Chat History Auto-Scroll Functionality

Description

I've encountered an issue when implementing the streamlit-pdf-viewer component in the document-qa application. It appears that the auto-scroll functionality of the chat history, which normally brings the view to the bottom of the chat when a user gets an answer from generative AI, stops working.

Potential Cause and Solution

I suspect this might be related to the current limitation of the st.chat_input widget, which cannot be used inside st.columns. If st.chat_input were adaptable for use within st.columns, this might resolve the issue.

Relevant Information

According to a recent comment on issue #7296 in the Streamlit repository, there is a planned implementation to solve this problem within the next three months.

Current Approach

For now, the strategy is to wait for updates or fixes to the Streamlit library.

add chat memory

  • Keep a memory of previous chat messages and responses
  • Reset the memory every time a document is uploaded
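A minimal sketch of the requested behaviour, assuming the history is kept as plain (question, answer) pairs and trimmed to a fixed window. `DocumentChatMemory`, `on_document_uploaded`, `add_turn`, `as_context` and the `max_turns` window are hypothetical names for illustration, not part of document-qa; in the app such an object could be stored in `st.session_state`.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DocumentChatMemory:
    """Hypothetical chat memory that is cleared on every document upload."""

    max_turns: int = 5  # assumed window size; older turns are dropped
    doc_id: Optional[str] = None
    turns: List[Tuple[str, str]] = field(default_factory=list)

    def on_document_uploaded(self, doc_id: str) -> None:
        # Every upload resets the memory, as requested in the issue.
        self.doc_id = doc_id
        self.turns.clear()

    def add_turn(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))
        # Keep only the most recent max_turns exchanges.
        if len(self.turns) > self.max_turns:
            del self.turns[: len(self.turns) - self.max_turns]

    def as_context(self) -> str:
        # Render the history as plain text to prepend to the next prompt.
        return "\n".join(f"Q: {q}\nA: {a}" for q, a in self.turns)
```

For example, with `max_turns=2` only the last two exchanges are kept, and calling `on_document_uploaded` again empties the history.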


Interact with the loaded PDF document

We could link the paragraphs retrieved by the RAG pipeline to their coordinates in the PDF and show visual aids highlighting the parts of the document used to produce the answer.
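A sketch of the coordinate side of this idea, assuming each retrieved passage keeps a Grobid-style `coords` string (boxes written as `page,x,y,width,height` and separated by `;`) in its metadata under a hypothetical `coordinates` key. The output dictionaries, with `page`, `x`, `y`, `width`, `height` and `color`, are an assumed shape for highlight boxes that a PDF viewer component could draw; the function names are illustrative only.

```python
from typing import Dict, List, Union

Box = Dict[str, Union[int, float, str]]


def parse_grobid_coords(coords: str) -> List[Box]:
    """Parse a Grobid-style coords string: "page,x,y,w,h;page,x,y,w,h;..."."""
    boxes: List[Box] = []
    for raw in coords.split(";"):
        raw = raw.strip()
        if not raw:
            continue  # tolerate empty strings and trailing separators
        page, x, y, width, height = raw.split(",")
        boxes.append({
            "page": int(page),
            "x": float(x),
            "y": float(y),
            "width": float(width),
            "height": float(height),
        })
    return boxes


def highlights_for_passages(passages: List[Dict[str, str]], color: str = "red") -> List[Box]:
    """Flatten the boxes of all retrieved passages into highlight annotations.

    Passages without a (hypothetical) "coordinates" entry are skipped.
    """
    annotations: List[Box] = []
    for passage in passages:
        for box in parse_grobid_coords(passage.get("coordinates", "")):
            box["color"] = color
            annotations.append(box)
    return annotations
```

A passage spanning a page break simply yields several boxes, one per line fragment, so the viewer can highlight each piece separately.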

Add summarization

To summarise papers properly, we need to compress information iteratively, in a slightly different way from the Q/A.

The summarisation could be implemented as a separate function from the Q/A, within the same interface.
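The issue does not specify how the compression works. One common way to do it is a hierarchical (map-reduce style) pass: summarise groups of chunks, then summarise those summaries until a single text remains. The sketch below assumes that approach; `hierarchical_summary`, `group_size` and the `summarize` callable (for instance a wrapper around an LLM call) are hypothetical and not the project's implementation.

```python
from typing import Callable, List


def hierarchical_summary(
    chunks: List[str],
    summarize: Callable[[str], str],
    group_size: int = 4,
) -> str:
    """Repeatedly summarise groups of texts until a single summary remains."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2 so each round shrinks")
    if not chunks:
        return ""
    level = list(chunks)
    while True:
        # Compress each group of up to `group_size` texts into one summary.
        level = [
            summarize("\n".join(level[i:i + group_size]))
            for i in range(0, len(level), group_size)
        ]
        if len(level) == 1:
            return level[0]
```

With a stub summariser the call structure is easy to check: five chunks and `group_size=2` take three rounds (5 → 3 → 2 → 1).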

Could not browse (read) the uploaded pdf file

Environment: Safari Version 17.0 (19616.1.27.211.1)
Frequency: every time
Steps to reproduce error:

  1. Enter the ChatGPT API key
  2. Browse for a PDF file on the local machine
  3. The following error is raised:
TypeError: This app has encountered an error. The original error message is redacted to prevent data leaks. Full error details have been recorded in the logs (if you're on Streamlit Cloud, click on 'Manage app' in the lower right of your app).
Traceback:
File "/home/appuser/venv/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 541, in _run_script
    exec(code, module.__dict__)
File "/app/document-qa/streamlit_app.py", line 262, in <module>
    st.session_state['doc_id'] = hash = st.session_state['rqa'][model].create_memory_embeddings(tmp_file.name,
File "/app/document-qa/document_qa/document_qa_engine.py", line 204, in create_memory_embeddings
    texts, metadata, ids = self.get_text_from_document(pdf_path, chunk_size=chunk_size, perc_overlap=perc_overlap)
File "/app/document-qa/document_qa/document_qa_engine.py", line 169, in get_text_from_document

Load different structures

We can extract four types of structures, each of which is then loaded into a separate store:

  • fulltext
  • figures / tables
  • bibliographic data
  • references / text callout
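A minimal sketch of the routing step, assuming the extraction produces a flat list of elements, each tagged with one of the four structure types above. The `type`/`text` keys, the labels in `STRUCTURE_TYPES` and `split_by_structure` are hypothetical; each returned list stands in for a separate vector store that would be embedded and queried independently.

```python
from typing import Dict, Iterable, List

# Hypothetical labels for the four structure types listed in the issue.
STRUCTURE_TYPES = ("fulltext", "figures_tables", "biblio", "references")


def split_by_structure(elements: Iterable[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group extracted elements by structure type, one list per future store."""
    stores: Dict[str, List[str]] = {kind: [] for kind in STRUCTURE_TYPES}
    for element in elements:
        kind = element["type"]
        if kind not in stores:
            # Fail loudly rather than silently mixing unknown content into a store.
            raise ValueError(f"unknown structure type: {kind!r}")
        stores[kind].append(element["text"])
    return stores
```

Keeping the stores separate lets a question about a table be answered from the figures/tables store without competing against full-text paragraphs during retrieval.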
