ruravi / chatgpt-hn-plugin

This project forked from anantn/hn-chatgpt-plugin

ChatGPT plugin for Hacker News

Shell 0.26% JavaScript 2.09% Python 6.84% Go 0.97% Jupyter Notebook 89.84%

chatgpt-hn-plugin's Introduction

ChatGPT 🤝 Hacker News

Answer any question based on the discussion corpus on Hacker News through ChatGPT!

Dataset

As of early April 2023, Hacker News contained 35,663,259 items of content (story submissions, comments, and polls), and 859,467 unique users.

The dataset is not especially large and can be fetched through the Hacker News Firebase API.
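
As a rough illustration of what fetching through the Firebase API looks like, here is a minimal Python sketch. It is not the repository's fetch.js; the helper names (item_url, fetch_json, fetch_max_item_id, fetch_item) are made up for this example. The endpoints are the public /v0/item/<id>.json and /v0/maxitem.json routes of the Hacker News API.

```python
import json
import urllib.request

# Base URL of the public Hacker News Firebase API.
BASE_URL = "https://hacker-news.firebaseio.com/v0"


def item_url(item_id):
    """Return the URL of the JSON document for one item (story, comment, poll, ...)."""
    return f"{BASE_URL}/item/{item_id}.json"


def fetch_json(url, timeout=10.0):
    """GET a URL and decode its JSON body. The API answers `null` (None) for missing items."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def fetch_max_item_id():
    """Return the largest item ID currently assigned, i.e. the upper bound of a full crawl."""
    return fetch_json(f"{BASE_URL}/maxitem.json")


def fetch_item(item_id):
    """Fetch a single item as a dict, or None if it does not exist."""
    return fetch_json(item_url(item_id))
```

Calling fetch_item(1) returns the first item as a dictionary. A full crawl would walk every ID from 1 up to fetch_max_item_id(), which is why throughput and parallelism matter.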

👉 Download the SQLite DB from 🤗.

I tried a bunch of different methods to maximize download throughput: Python, Go, and Node.js. Ultimately, Node.js was the most robust and reliable mechanism, though not the fastest. The download can be parallelized, which I did, merging the resulting databases afterwards.

  • fetch.js is the core download script.
  • run.sh is a quick-and-dirty user-script to parallelize the download on AWS EC2. Note the hard-coded number of machines.
  • fetch-users.js is a script to fetch user profiles; it can be run on a single machine.
  • merge.py can be used to merge each partition into a single SQLite file.
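
The partition-then-merge flow described above can be sketched in Python as follows. This is an illustration only, not the actual run.sh or merge.py: the function names are hypothetical, and the table name "items" is an assumption, since the schema is not documented here. The sketch splits the ID space into one contiguous range per machine, then copies each partition into a single file using SQLite's ATTACH DATABASE.

```python
import sqlite3


def partition_ranges(max_id, machines):
    """Split item IDs 1..max_id into contiguous, inclusive (start, end) ranges."""
    size, extra = divmod(max_id, machines)
    ranges, start = [], 1
    for i in range(machines):
        # The first `extra` machines take one additional ID each.
        end = start + size + (1 if i < extra else 0) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def merge_partitions(target_path, partition_paths, table="items"):
    """Copy all rows of `table` (assumed name) from each partition into the target file."""
    conn = sqlite3.connect(target_path)
    try:
        for path in partition_paths:
            conn.execute("ATTACH DATABASE ? AS part", (path,))
            exists = conn.execute(
                "SELECT 1 FROM main.sqlite_master WHERE type = 'table' AND name = ?",
                (table,),
            ).fetchone()
            if exists is None:
                # Reuse the partition's own CREATE TABLE statement for the merged file.
                (create_sql,) = conn.execute(
                    "SELECT sql FROM part.sqlite_master WHERE type = 'table' AND name = ?",
                    (table,),
                ).fetchone()
                conn.execute(create_sql)
            # OR IGNORE skips rows whose primary key is already present in the target.
            conn.execute(f'INSERT OR IGNORE INTO main."{table}" SELECT * FROM part."{table}"')
            conn.commit()  # DETACH is not allowed inside an open transaction
            conn.execute("DETACH DATABASE part")
    finally:
        conn.close()
```

For example, partition_ranges(10, 3) yields [(1, 4), (5, 7), (8, 10)]. Indexes are not copied by this sketch and would need to be created on the merged file afterwards.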

The final output is a SQLite file of ~25GB. It compresses down to ~5GB with zstd, which is the version hosted on HuggingFace. The file includes indexes on some common fields; if you want to reduce the size further, you can always DROP INDEX.
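
If you do want to trim the file, a sketch along these lines lists the explicitly created indexes and drops one of them. The index names in the published database are not listed here, so the code discovers them from sqlite_master rather than assuming any. The helper names are hypothetical. Note that DROP INDEX alone only frees pages inside the file; VACUUM is what shrinks it on disk, and it needs temporary free space comparable to the database size.

```python
import sqlite3


def list_indexes(conn):
    """Return (index_name, table_name) for every index created with CREATE INDEX."""
    rows = conn.execute(
        "SELECT name, tbl_name FROM sqlite_master "
        # Automatic indexes backing PRIMARY KEY / UNIQUE constraints have sql = NULL.
        "WHERE type = 'index' AND sql IS NOT NULL ORDER BY name"
    )
    return [(name, table) for name, table in rows]


def drop_index(conn, index_name):
    """Drop one index, then rebuild the file so the freed pages are released."""
    escaped = index_name.replace('"', '""')  # quote the identifier safely
    conn.execute(f'DROP INDEX IF EXISTS "{escaped}"')
    conn.commit()
    conn.execute("VACUUM")  # must run outside a transaction
```

A typical session would open the decompressed file with sqlite3.connect(...), inspect list_indexes(conn), and call drop_index(conn, name) for the indexes you do not need.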

Decompressing it should take less than a minute on a good computer with an SSD:

โฏ time pzstd -kd hn-sqlite-20230420.db.zst
hn-sqlite-20230420.db.zst: 23664996352 bytes
real    0m38.103s

Algolia Plugin

An earlier attempt, but still useful: it integrates Algolia's Hacker News search API with ChatGPT plugins so you can have conversations about content on Hacker News.

If you have plugin access, you can try it:

$ cd algolia
$ pip install -r requirements.txt
$ python app.py

Open a chat with plugins enabled, then: Plugin store > Develop your own plugin > localhost:3333 > Fetch manifest.

ChatGPT seems to hallucinate some parameters to the API, particularly the sortBy and sortOrder arguments, which may make sense to implement.
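
One possible way to support a sortBy argument, sketched here as an assumption rather than as what app.py does, is to map it onto the two endpoints the Algolia Hacker News API exposes: /api/v1/search (ordered by relevance) and /api/v1/search_by_date (most recent first). The function name, the accepted sort_by values, and the tags parameter handling are hypothetical choices for this example.

```python
from typing import Optional
from urllib.parse import urlencode

ALGOLIA_BASE = "https://hn.algolia.com/api/v1"

# Hypothetical sortBy values mapped to the real Algolia endpoints.
ENDPOINTS = {"relevance": "search", "date": "search_by_date"}


def build_search_url(query: str, sort_by: str = "relevance", tags: Optional[str] = None) -> str:
    """Build a search URL, choosing the endpoint from the requested sort order."""
    if sort_by not in ENDPOINTS:
        raise ValueError(f"unsupported sort_by: {sort_by!r}")
    params = {"query": query}
    if tags:
        params["tags"] = tags  # e.g. "story" or "comment"
    return f"{ALGOLIA_BASE}/{ENDPOINTS[sort_by]}?{urlencode(params)}"
```

A sortOrder argument is harder to honor: neither endpoint offers an ascending option, so an "oldest first" request would have to be approximated, for instance by reversing the returned page on the plugin side.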

chatgpt-hn-plugin's People

Contributors

anantn, ruravi
