
wallstreetpulse's People

Contributors

a5pir1n, fastninja30, jeffreyi03, lordbowlingball, miragecoa, nghongwatsang

Forkers

yc-5002 a5pir1n

wallstreetpulse's Issues

Json helper function

We only want to visit a post once with GPT, so after finishing a visit we should mark the post as visited and store the result data locally.

Goal: Create a jsonHelper.py that supports the following:
1. Create
2. Search if some topics/titles already created
3. Edit
4. Delete
Users should be able to pass in multiple parameters (e.g. a dictionary such as {Title: "something", upvotes: 10, etc.}). Make sure these requests are handled correctly.

This helper is intended to act as a local database, similar to MySQL or MongoDB.

Update: we tried sqlite3, and it looks like a good approach.
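The sqlite3 approach can be sketched roughly as below. The table layout and method names here are illustrative only, not the actual jsonHelper.py API; the point is that one class can cover all four operations (create, search, edit, delete) while accepting a dictionary of fields:

```python
import sqlite3

class PostStore:
    """Sketch of a local post store backed by sqlite3 (illustrative schema)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS posts (title TEXT PRIMARY KEY, upvotes INTEGER)"
        )

    def create(self, record):
        # record is a dict, e.g. {"title": "something", "upvotes": 10};
        # named placeholders let sqlite3 pull the values out of the dict.
        self.conn.execute(
            "INSERT OR IGNORE INTO posts (title, upvotes) VALUES (:title, :upvotes)",
            record,
        )
        self.conn.commit()

    def exists(self, title):
        # Search whether a title was already created (i.e. already visited).
        cur = self.conn.execute("SELECT 1 FROM posts WHERE title = ?", (title,))
        return cur.fetchone() is not None

    def edit(self, title, upvotes):
        self.conn.execute("UPDATE posts SET upvotes = ? WHERE title = ?", (upvotes, title))
        self.conn.commit()

    def delete(self, title):
        self.conn.execute("DELETE FROM posts WHERE title = ?", (title,))
        self.conn.commit()
```

Parameterized queries (the `?` and `:name` placeholders) also keep post titles with quotes or other odd characters from breaking the SQL.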

Create a helper class to visualize stock market data

We need to create a helper class that can fetch and plot stock market data from various sources, such as Yahoo Finance or Alpha Vantage. This will allow us to compare our predictions from the GPT analysis with the actual market trends and performance.

Consider registering for API access with Yahoo Finance or Alpha Vantage.
The helper class should have:

  1. Methods to get data for any given ticker
  2. Methods to plot the data. NumPy, Matplotlib, or similar libraries will be useful
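A minimal sketch of such a helper, assuming the price series has already been fetched from Yahoo Finance or Alpha Vantage (the fetch itself is left out so as not to pin down either API; the class and method names are placeholders):

```python
class TickerSeries:
    """Holds daily closing prices for one ticker. Fetching is left to the
    chosen data source (e.g. a Yahoo Finance or Alpha Vantage client)."""

    def __init__(self, ticker, closes):
        self.ticker = ticker
        self.closes = list(closes)

    def moving_average(self, window):
        # Simple moving average over `window` days; the short prefix
        # (fewer than `window` points) is skipped.
        return [
            sum(self.closes[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(self.closes))
        ]

    def plot(self):
        # Lazy import so the rest of the class works without matplotlib.
        import matplotlib.pyplot as plt
        plt.plot(self.closes, label=self.ticker)
        plt.plot(range(4, len(self.closes)), self.moving_average(5), label="5-day MA")
        plt.legend()
        plt.show()
```

A smoothed series like the moving average is handy when comparing GPT-derived sentiment against actual price trends, since day-to-day noise is filtered out.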

Improve Reddit_Posts.py

Currently, we have a working version of getting posts from Reddit using the PRAW module. The relevant code is in Reddit_Posts.py and main.py. However, when retrieving posts/comments, PRAW returns a generator that runs extremely slowly, even for a few posts/comments.

Potential solutions

  1. Implement our own library interacting with the Reddit API in reddit_api.py (focus on the JSON returned by the get_hot_posts function).
  2. Improve our implementation of Reddit_Posts. (My current Posts class is god awful.)
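One way to sidestep PRAW's lazy generators is to hit Reddit's public listing endpoint (`/r/<subreddit>/hot.json`) and parse the JSON in one pass. A sketch of just the parsing step, assuming the standard listing shape (`data.children[].data` plus an `after` pagination cursor); the HTTP request itself and the exact fields get_hot_posts needs are left open:

```python
def parse_hot_posts(listing):
    """Flatten a Reddit listing JSON (as returned by /r/<sub>/hot.json)
    into plain dicts, so nothing is fetched lazily afterwards."""
    posts = []
    for child in listing["data"]["children"]:
        d = child["data"]
        posts.append({
            "id": d.get("id"),
            "title": d.get("title"),
            "author": d.get("author"),
            "ups": d.get("ups", 0),
            "num_comments": d.get("num_comments", 0),
        })
    # "after" is the cursor to pass as a query parameter for the next page.
    return posts, listing["data"].get("after")
```

Since everything is copied into plain dicts up front, iterating over the result is cheap, unlike PRAW objects that may trigger extra network requests per attribute access.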

Graph visualizer for communities and influential users

Connecting issue 9: #9
Suppose we have two SQL tables with the following schemas (achievable using SQLhelper.py):
Table 1, main post information: id, url, article_id, poster_username, post_content, published_date, visited_date, upvotes, downvotes
Table 2, comments: id, article_id (which post the comment is under), username, content, parent_comment_id (set if the comment replies to another comment), upvotes, downvotes

  1. Make a graph visualization function to visualize the information flow.
    (An attached image shows an example where the red user is the most influential.)
  2. Make functions to find the top k nodes with the highest degree
  3. There can be multiple influential users affecting multiple communities, and different communities are also connected (though with far fewer connections than within a community). Make a function to identify and split users into different communities (e.g. the "Louvain" community detection algorithm).
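Item 2 needs no graph library at all: the edge list built from the two tables (commenter to poster, commenter to parent-comment author) is enough to count degrees. A sketch in plain Python; for item 3, libraries such as NetworkX provide Louvain-style community detection, which could be layered on the same edge list:

```python
from collections import Counter

def top_k_by_degree(edges, k):
    """Return the k users with the most connections.
    `edges` is a list of (user_a, user_b) pairs, e.g. commenter -> poster
    or commenter -> parent-comment author, derived from tables 1 and 2."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return [user for user, _ in degree.most_common(k)]
```

In the example from the issue's image, a user replied to by many others accumulates a high degree and surfaces at the top of this list.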

Collect Posts data into a local dataset

We have already implemented access functions for Custom Search Engine to apply a time filter search on Reddit posts. We also implemented a SQL helper function that uses sqlite3 for storing dictionaries into a local .db file, functioning the same as an SQL database.

Goal:
Make use of both the access functions and the SQL functions to retrieve as much post information as possible into a local file for future analysis, including poster username, post content, comments, and commenter usernames.
Workflow:
Custom Search Engine get post URLs -> Reddit API get post info by article id -> SQL helper

I suggest making two tables with the following structure:
Table 1, main post information: id, url, article_id, poster_username, post_content, published_date, visited_date, upvotes, downvotes
Table 2, comments: id, article_id (which post the comment is under), username, content, parent_comment_id (set if the comment replies to another comment), upvotes, downvotes

We will eventually visualize a graph with nodes(users) and edges(connections)

Try language models

We are capable of retrieving post data using Reddit_Posts.py. Now we want to interact with our language model (GPT in this case) to get useful data.

Goal:

  1. Identify what each post is about
  2. Identify whether others' comments support the post
  3. Somehow record your results for each post. Make sure we only look over a post one time, not 100 times. (We may also want to record the visited dates.)
    This will be achieved with jsonHelper.py

Check gpt_api.py and use the get_response function.
I might introduce functions capable of doing online searches in the future.
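The visit-once bookkeeping from point 3 can be sketched as below, assuming a sqlite3-backed local store as in the jsonHelper issue. The `visited` table name and columns are illustrative; gpt_api's get_response is where the actual GPT call would go and is not called here:

```python
import sqlite3
from datetime import date

def _ensure_table(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS visited "
        "(article_id TEXT PRIMARY KEY, visited_date TEXT, summary TEXT)"
    )

def already_visited(conn, article_id):
    """True if this post was already analyzed by GPT."""
    _ensure_table(conn)
    cur = conn.execute("SELECT 1 FROM visited WHERE article_id = ?", (article_id,))
    return cur.fetchone() is not None

def mark_visited(conn, article_id, summary):
    """Record the GPT result (and visited date) so a post is analyzed once."""
    _ensure_table(conn)
    conn.execute(
        "INSERT OR REPLACE INTO visited VALUES (?, ?, ?)",
        (article_id, date.today().isoformat(), summary),
    )
    conn.commit()
```

The analysis loop then becomes: skip the post if already_visited, otherwise call get_response and mark_visited with the result, so each post costs exactly one GPT call.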

Apply models to WallStreetBets Forum data for final result

We are entering the final stage of this project. We need to show the results/findings of our research.

Task: Write code to try different models and find which ones are appropriate for our WallStreetBets forum data.

Models can be found in the NetLogo models library: https://ccl.northwestern.edu/netlogo/models/
Here is a list of models we can try to see which one fits:
Bidding Market: https://ccl.northwestern.edu/netlogo/models/BiddingMarket
Preferential Attachment: https://ccl.northwestern.edu/netlogo/models/PreferentialAttachment3D
Rumor Mill: https://ccl.northwestern.edu/netlogo/models/RumorMill
Virus on a Network: https://ccl.northwestern.edu/netlogo/models/VirusonaNetwork
Wealth Distribution: https://ccl.northwestern.edu/netlogo/models/WealthDistribution
Minority Game: https://ccl.northwestern.edu/netlogo/models/MinorityGame
Scatter: https://ccl.northwestern.edu/netlogo/models/Scatter
Small Worlds: https://ccl.northwestern.edu/netlogo/models/SmallWorlds
The questions here are: 1. Does the model fit our data? 2. What result can we show to others in our final project report?
An appropriate result could be, for example, whether the market will crash given data at a specific date during the GME event.

Come chat with me if you have any thoughts/findings or any idea about what our result should be.
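To make "does the model fit" concrete, the Rumor Mill / Virus on a Network dynamics can be re-implemented as a toy simulation on our own user graph and compared against the observed comment cascade. This is a hedged sketch of the simplest such dynamic (every informed user tells each neighbor with probability p per step), not a port of the NetLogo models themselves:

```python
import random

def spread(adjacency, seed_user, p, steps, rng=None):
    """Toy rumor-spread on the user graph (Rumor Mill-style dynamic).
    adjacency: {user: [neighbor, ...]} built from the comment tables."""
    rng = rng or random.Random(0)  # fixed seed for reproducible runs
    informed = {seed_user}
    for _ in range(steps):
        newly = set()
        for user in informed:
            for nb in adjacency.get(user, []):
                if nb not in informed and rng.random() < p:
                    newly.add(nb)
        informed |= newly
    return informed
```

Sweeping p and comparing the simulated reach per step against the real post's comment timeline is one way to judge whether a contagion-style model fits the GME data.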

Useful ratios for Reddit_Posts class to get influential users/posts

Currently, our Reddit_Posts.py has titles, contents, upvotes, downvotes, etc. We will need some data to determine the "influential" posts.
Think about metrics like upvote/downvote ratios, how often a poster posts, the number of distinct users appearing in our retrieved posts,
or any other numbers/ratios that might help determine the "influential" posts/users.

Add appropriate functions and comments to Reddit_Posts.py or a new .py file
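A couple of the suggested ratios can be computed straight from the post dicts Reddit_Posts already produces. A sketch (field names assume the schema from the data-collection issue; the function name is a placeholder, to live in Reddit_Posts.py or a new file):

```python
def influence_ratios(posts):
    """Per-poster summary ratios from retrieved posts.
    Each post dict needs: poster_username, upvotes, downvotes."""
    stats = {}
    for p in posts:
        s = stats.setdefault(p["poster_username"], {"posts": 0, "up": 0, "down": 0})
        s["posts"] += 1          # how often this poster posts (in our sample)
        s["up"] += p["upvotes"]
        s["down"] += p["downvotes"]
    return {
        user: {
            "num_posts": s["posts"],
            # max(..., 1) guards against posts with zero recorded votes
            "upvote_ratio": s["up"] / max(s["up"] + s["down"], 1),
        }
        for user, s in stats.items()
    }
```

Counting distinct usernames in the comments table would give the "number of different users" metric in the same style.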

Get influential users/posts

Goal:

  1. We need an algorithm to identify how much a user or a post impacts the community. Generate a score; in the future we will use this score as a weighting coefficient when determining the impactful users/posts.

Consider num_posts, num_followers, upvotes in a given time span
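One simple scoring scheme over exactly those three signals: min-max normalize each metric across users (so raw upvote counts do not swamp post counts), then take a weighted sum. The weights here are placeholders to be tuned, not values from the project:

```python
def influence_scores(users, weights=(0.2, 0.3, 0.5)):
    """users: {name: (num_posts, num_followers, upvotes)} for a time span.
    Each metric is min-max normalized to [0, 1], then combined with the
    (placeholder) weights for num_posts, num_followers, and upvotes."""
    metrics = list(zip(*users.values()))          # one tuple per metric
    lo = [min(m) for m in metrics]
    span = [max(m) - min(m) or 1 for m in metrics]  # `or 1` avoids div by zero
    return {
        name: sum(w * (v - l) / s
                  for w, v, l, s in zip(weights, vals, lo, span))
        for name, vals in users.items()
    }
```

A user who tops all three metrics scores 1.0 and one at the bottom of all three scores 0.0, which makes the number directly usable as a weighting coefficient later.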
