tmcjp / kg_rag

This project forked from baranzinilab/kg_rag

Empower Large Language Models (LLM) using Knowledge Graph based Retrieval-Augmented Generation (KG-RAG) for knowledge-intensive tasks

License: Apache License 2.0

Python 100.00%

kg_rag's Introduction

Table of Contents

What is KG-RAG

Example use case of KG-RAG

How to run KG-RAG

What is KG-RAG?

KG-RAG stands for Knowledge Graph-based Retrieval Augmented Generation.

Start by watching the video of KG-RAG

KG_RAG_schematics.mov

It is a task-agnostic framework that combines the explicit knowledge of a Knowledge Graph (KG) with the implicit knowledge of a Large Language Model (LLM). Here is the arXiv preprint of the work.

Here, we use a massive biomedical KG called SPOKE as the provider of the biomedical context. SPOKE integrates over 40 biomedical knowledge repositories from diverse domains, each focusing on biomedical concepts such as genes, proteins, drugs, compounds, and diseases, along with their established connections. SPOKE consists of more than 27 million nodes of 21 different types and 53 million edges of 55 types [Ref].

The main feature of KG-RAG is that it extracts "prompt-aware context" from SPOKE KG, which is defined as:

the minimal context sufficient enough to respond to the user prompt.

Hence, this framework empowers a general-purpose LLM by incorporating an optimized domain-specific 'prompt-aware context' from a biomedical KG.
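The idea of pruning retrieved KG context down to what the prompt needs can be sketched as follows. This is a toy illustration only, not the KG-RAG implementation: the bag-of-words `embed` function stands in for a real sentence-embedding model, and the function names, the example statements, and the 0.5 threshold are all assumptions made for this sketch.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (assumption: a real system would use a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def prompt_aware_context(prompt: str, kg_statements: list[str], threshold: float = 0.5) -> list[str]:
    """Keep only the KG statements whose similarity to the prompt meets the threshold."""
    prompt_vec = embed(prompt)
    return [s for s in kg_statements if cosine(prompt_vec, embed(s)) >= threshold]


# Hypothetical statements standing in for context retrieved from a KG.
statements = [
    "Disease Bardet-Biedl syndrome is treated by compound setmelanotide",
    "Disease Bardet-Biedl syndrome associates gene BBS1",
    "Disease asthma is treated by compound albuterol",
]
prompt = "Which compound treats Bardet-Biedl syndrome?"

# Only the statement most relevant to the prompt survives the pruning step.
context = prompt_aware_context(prompt, statements)
print(context)
```

The pruned `context` would then be prepended to the user prompt before it is sent to the LLM, so the model sees only the statements relevant to the question.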

Example use case of KG-RAG

The following snippet shows the news from the FDA website about the drug "setmelanotide", approved by the FDA for weight management in patients with Bardet-Biedl syndrome.

Ask GPT-4 about the above drug:

WITHOUT KG-RAG

Note: This example was run using KG-RAG v0.3.0. We are prompting GPT from the terminal, NOT from the ChatGPT browser interface. The temperature parameter is set to 0 for all analyses. Refer to this yaml file for the parameter settings.

bbsyndrome_without_kgrag.mov

WITH KG-RAG

Note: This example was run using KG-RAG v0.3.0. The temperature parameter is set to 0 for all analyses. Refer to this yaml file for the parameter settings.

bbsyndrome_with_kgrag.mov

As you can see, KG-RAG was able to give the correct information about the FDA-approved drug.

How to run KG-RAG

Note: At the moment, KG-RAG is specifically designed for running prompts related to Diseases. We are actively working on improving its versatility.

Step 1: Clone the repo

Clone this repository. All biomedical data used in the paper are included in this repository, so you don't have to download them separately.

Step 2: Create a virtual environment

Note: Scripts in this repository were run using Python 3.10.9.

conda create -n kg_rag python=3.10.9
conda activate kg_rag
cd KG_RAG

Step 3: Install dependencies

pip install -r requirements.txt

Step 4: Update config.yaml

config.yaml holds all the information required to run the scripts on your machine. Make sure to populate this yaml file accordingly.

Note: There is another yaml file called system_prompts.yaml. This is already populated and it holds all the system prompts used in the KG-RAG framework.

Step 5: Run the setup script

Note: Make sure you are in the KG_RAG folder.

The setup script runs in an interactive fashion.

Running the setup script will:

  • create the disease vector database for KG-RAG
  • download the Llama model to your machine (optional; you can safely skip this step)

python -m kg_rag.run_setup
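To give a rough intuition for what a disease vector database provides, here is a minimal in-memory sketch. It is an assumption-laden toy, not what the setup script builds: the `DiseaseIndex` class, the character-trigram vectors, and the three example disease names are all hypothetical, whereas a real setup would typically store learned embeddings in a dedicated vector store.

```python
import math
from collections import Counter


def trigram_vector(name: str) -> Counter:
    """Represent a name as counts of its character trigrams (toy stand-in for an embedding)."""
    padded = f"  {name.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class DiseaseIndex:
    """Hypothetical in-memory index: stores one vector per disease name."""

    def __init__(self, names: list[str]) -> None:
        self._entries = [(name, trigram_vector(name)) for name in names]

    def search(self, query: str, k: int = 1) -> list[str]:
        """Return the k stored disease names most similar to the query string."""
        query_vec = trigram_vector(query)
        ranked = sorted(self._entries, key=lambda e: cosine(query_vec, e[1]), reverse=True)
        return [name for name, _ in ranked[:k]]


# Build the index once, then map a free-text mention to a known disease name.
index = DiseaseIndex(["Bardet-Biedl syndrome", "asthma", "type 2 diabetes mellitus"])
match = index.search("bardet biedl syndrome")[0]
print(match)
```

An index like this lets a disease mentioned in a user prompt be matched to a known disease name even when the spelling or punctuation differs slightly.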

Step 6: Run KG-RAG from your terminal

Note: Make sure you are in the KG_RAG folder.

You can run KG-RAG using either a GPT model or a Llama model.

Using GPT

python -m kg_rag.rag_based_generation.GPT.text_generation -g <your favorite gpt model - "gpt-4" or "gpt-35-turbo">

Example:

Note: The following example was run on an AWS p3.8xlarge EC2 instance using KG-RAG v0.3.0.

gpt_demo.mov

Using GPT interactive mode

This allows the user to go over each step of the process in an interactive fashion.

python -m kg_rag.rag_based_generation.GPT.text_generation -i True -g <your favorite gpt model - "gpt-4" or "gpt-35-turbo">
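For readers wondering why the interactive flag is passed as `-i True` rather than as a bare switch, the following sketch shows one way such flags could be parsed with `argparse`. It is hypothetical and not taken from the repository: the `build_parser` helper, the string-to-boolean conversion, and the default model are assumptions for illustration.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Interpret 'True'/'true' as True and anything else as False."""
    return value.strip().lower() == "true"


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the -g and -i flags shown in the commands above."""
    parser = argparse.ArgumentParser(description="Hypothetical KG-RAG-style CLI")
    parser.add_argument(
        "-g",
        choices=["gpt-4", "gpt-35-turbo"],
        default="gpt-35-turbo",  # assumption: the real default is not stated in this README
        help="GPT model to use",
    )
    parser.add_argument(
        "-i",
        type=str_to_bool,
        default=False,
        help="Pass 'True' to step through the process interactively",
    )
    return parser


# Parse the same arguments as the interactive-mode command above.
args = build_parser().parse_args(["-i", "True", "-g", "gpt-4"])
print(args.g, args.i)
```

Because `-i` takes a value here, omitting it leaves interactive mode off, and `-i True` turns it on.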

Using Llama

Note: If you did not download Llama during the setup step, running the following command may take some time, since it will download the model first.

python -m kg_rag.rag_based_generation.Llama.text_generation -m <method-1 or method-2, if nothing is mentioned it will take 'method-1'>

Example:

Note: The following example was run on an AWS p3.8xlarge EC2 instance using KG-RAG v0.3.0.

llama_demo.mov

Using Llama interactive mode

This allows the user to go over each step of the process in an interactive fashion.

python -m kg_rag.rag_based_generation.Llama.text_generation -i True -m <method-1 or method-2, if nothing is mentioned it will take 'method-1'>

kg_rag's People

Contributors

karthiksoman, namin, eltociear
