
strivin0311 / self-long-instruct


A project that extends the self-instruct method to automatically generate long-context instruction-tuning datasets, building on long-context LLMs, Retrieval-Augmented Generation (RAG), and LLMs-as-Agents.

License: Apache License 2.0

Jupyter Notebook 92.07% Python 7.93%

self-long-instruct's Introduction

self-long-instruct


Base

  • self-instruct
  • Retrieval-Augmented Generation (RAG)
  • LLMs-as-Agents

Preparation

  • install the pip dependencies:
    pip install -r requirements.txt
  • download the punkt tokenizer data from nltk:
    • method 1: download through the API
      import nltk
      nltk.download('punkt')
    • method 2: if the API call fails, go to the GitHub repo and follow the steps below:
      • step 1: download the whole packages directory into your conda env path, e.g. /home/user/anaconda3/envs/myenv/, and rename it nltk_data
      • step 2: unzip the zip files inside nltk_data, especially those under tokenizers/ and taggers/; for convenience, we also provide a function that does this automatically:
        from src.utils import unzip_nltk_data
        nltk_data_dir = "/home/user/anaconda3/envs/myenv/"
        unzip_nltk_data(nltk_data_dir, remove=True)
  • install the poppler tools so that pdf2image works (this assumes your OS is Linux; if not, see the pdf2image installation guide):
    sudo apt-get install poppler-utils
  • follow the guide here to install LibreOffice so that unstructured.partition.doc works
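
The preparation steps above can be sanity-checked with a small script. The following is only a minimal sketch using the Python standard library, not code from this repo: `unzip_all` is a hypothetical stand-in that illustrates what a helper like `unzip_nltk_data` might do (the real function in src.utils may behave differently), and `missing_tools` assumes that pdf2image looks for the `pdftoppm` binary from poppler-utils and that LibreOffice exposes a `soffice` command on PATH.

```python
from __future__ import annotations

import shutil
import zipfile
from pathlib import Path


def unzip_all(nltk_data_dir: str, remove: bool = False) -> list[Path]:
    """Extract every .zip found under nltk_data_dir next to its archive.

    Hypothetical illustration only; it is not the repo's unzip_nltk_data.
    Returns the list of archives that were extracted.
    """
    extracted = []
    # sorted() materializes the matches first, so archives nested inside
    # other archives are not picked up while we are still iterating.
    for archive in sorted(Path(nltk_data_dir).rglob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
        if remove:
            archive.unlink()  # delete the archive once it is extracted
        extracted.append(archive)
    return extracted


def missing_tools(tools=("pdftoppm", "soffice")) -> list[str]:
    """Return the command-line tools from `tools` that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # An empty list means both system dependencies were found.
    print("missing tools:", missing_tools())
```

If `missing_tools()` returns a non-empty list, the corresponding install step (poppler-utils or LibreOffice) has probably not taken effect in the current shell environment.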
