
zhanrongwork / mygptreader


This project forked from madawei2699/mygptreader


myGPTReader is a Slack bot that can read any web page, e-book, or document and summarize it with ChatGPT. It can also talk to you by voice, using the content in the channel.

License: MIT License

Python 99.77% Procfile 0.23%

mygptreader's Introduction

myGPTReader


myGPTReader is still in development, but you can try it out by joining this channel.

The exciting part is that this project is itself developed in pair with ChatGPT. I document the development process in this CDDR file.

Features

  • Integrated with Slack as a bot
    • The bot replies to messages in the same thread
  • Support web page reading with ChatGPT
  • Support RSS reading with ChatGPT
    • An RSS feed is a list of links, so reading it amounts to reading each linked web page to get the content.
  • Support newsletter reading with ChatGPT
    • Most newsletters are public and can be accessed online, so we can just give the URL to the Slack bot.
  • Prompt fine-tuning
    • Support for custom prompts
    • Show prompt templates via Slack app slash commands
    • Automatically collect good prompts into the #gpt-prompt channel via a message shortcut
  • Cost saving
    • Cache the llama index built for each web page
      • Consider using sqlite-vss to store and search the text embeddings
      • Use chromadb to store and search the text embeddings
      • Use the llama index file to restore the index
    • Consider using sentence-transformers or txtai to generate embeddings (vectors)
      • Not as good as OpenAI's embeddings, so this was rolled back to the OpenAI embeddings. Enabling custom embeddings also requires at least 2 GB of server memory, which still increases the cost.
    • Consider fine-tuning the chunk size of the index nodes and the prompt to save cost
      • If the chunk size is too large, the index nodes become too large and the cost is high.
  • The bot can read historical messages from the same thread, thus providing context to ChatGPT
  • Index fine-tuning
    • Use the GPTListIndex to summarize multiple URLs
    • Use the GPTTreeIndex with summarize mode to summarize a single web page
  • The bot regularly posts summaries of hot news (expensive) in the Slack channel (#daily-news)
    • Refer to this approach
      • World News
        • Zhihu daily hot answers
        • V2EX daily hot topics
        • 1point3acres daily hot topics
        • Reddit world hot news
      • Dev News
        • Hacker News daily hot topics
        • Product Hunt daily hot topics
      • Invest News
        • Xueqiu daily hot topics
        • Jisilu daily hot topics
  • Support file reading and analysis 💥
    • Because of the high cost, access to this feature is restricted by a whitelist of Slack user IDs
    • Cache the Documents extracted from files to save extraction cost
    • EPUB
    • DOCX
    • MD
    • TEXT
    • PDF
    • Image
      • May use GPT-4
  • Support voice reading with self-hosted Whisper
    • (Whisper -> ChatGPT -> Azure text-to-speech) for language speaking practice 💥
    • Supported languages
      • Chinese
      • English
        • 🇺🇸
        • 🇬🇧
        • 🇦🇺
        • 🇮🇳
      • Japanese
      • German
  • Integrated with Azure OpenAI Service
  • User access limit
    • Limit the number of requests to the bot per user per day to save cost
  • Support a Discord bot ❓
  • Rewrite the code in TypeScript ❓
  • Upgrade the chat model (gpt-3.5-turbo) to GPT-4 (gpt-4-0314) 💥
  • Documentation
  • Publish the bot so that it can be used in other workspaces
    • Slack marketplace
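
The "User access limit" item and the user ID whitelist for file reading in the list above could be combined in a small helper. The following is only a minimal in-memory sketch, not the project's actual code: the class name `DailyRequestLimiter`, its method names, and the sample user IDs are all hypothetical, and a real deployment would likely need persistent storage so that counts survive a restart.

```python
from collections import defaultdict
from datetime import date
from typing import Iterable, Optional


class DailyRequestLimiter:
    """Hypothetical helper: per-user daily request cap plus a file-reading whitelist."""

    def __init__(self, daily_limit: int, file_whitelist: Iterable[str] = ()):
        self.daily_limit = daily_limit
        self.file_whitelist = set(file_whitelist)
        # Maps (user_id, day) -> number of requests already allowed on that day.
        self._counts = defaultdict(int)

    def allow_request(self, user_id: str, today: Optional[date] = None) -> bool:
        """Return True and count the request if the user is under today's limit."""
        day = today or date.today()
        key = (user_id, day)
        if self._counts[key] >= self.daily_limit:
            return False
        self._counts[key] += 1
        return True

    def can_read_files(self, user_id: str) -> bool:
        """Only whitelisted Slack user IDs may use the file reading feature."""
        return user_id in self.file_whitelist


# Usage with made-up Slack user IDs.
limiter = DailyRequestLimiter(daily_limit=2, file_whitelist={"U123"})
day = date(2023, 4, 1)
print(limiter.allow_request("U123", day))  # first request of the day
print(limiter.allow_request("U123", day))  # second request of the day
print(limiter.allow_request("U123", day))  # third request exceeds the limit
print(limiter.can_read_files("U999"))      # not on the whitelist
```

Keying the counter on the pair of user ID and calendar day means the limit resets on its own the next day, without a scheduled cleanup job; old keys would still need to be pruned in a long-running process.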

mygptreader's People

Contributors

madawei2699
