
wallstreetpulse's People

Contributors

a5pir1n, fastninja30, jeffreyi03, lordbowlingball, miragecoa, nghongwatsang

Forkers

yc-5002 a5pir1n

wallstreetpulse's Issues

Json helper function

We only want to visit a post once with GPT, so after finishing a visit we should mark the post as visited and store the result data locally.

Goal: Create a jsonHelper.py that supports the following:
1. Create
2. Search if some topics/titles already created
3. Edit
4. Delete
Users should be able to pass in multiple parameters (e.g. a dictionary such as {Title: "something", upvotes: 10, etc.}). Make sure these requests are handled correctly.

This helper is intended to act as a local database, similar to MySQL or MongoDB.

Update: we tried sqlite3, and it looks like a good approach.
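The sqlite3 approach can be sketched roughly as below. The table layout and method names here are illustrative only, not the actual jsonHelper.py API; the point is that one class can cover all four operations (create, search, edit, delete) while accepting a dictionary of fields:

```python
import sqlite3

class PostStore:
    """Sketch of a local post store backed by sqlite3 (illustrative schema)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS posts (title TEXT PRIMARY KEY, upvotes INTEGER)"
        )

    def create(self, record):
        # record is a dict, e.g. {"title": "something", "upvotes": 10};
        # named placeholders let sqlite3 pull the values out of the dict.
        self.conn.execute(
            "INSERT OR IGNORE INTO posts (title, upvotes) VALUES (:title, :upvotes)",
            record,
        )
        self.conn.commit()

    def exists(self, title):
        # Search whether a title was already created (i.e. already visited).
        cur = self.conn.execute("SELECT 1 FROM posts WHERE title = ?", (title,))
        return cur.fetchone() is not None

    def edit(self, title, upvotes):
        self.conn.execute("UPDATE posts SET upvotes = ? WHERE title = ?", (upvotes, title))
        self.conn.commit()

    def delete(self, title):
        self.conn.execute("DELETE FROM posts WHERE title = ?", (title,))
        self.conn.commit()
```

Parameterized queries (the `?` and `:name` placeholders) also keep post titles with quotes or other odd characters from breaking the SQL.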

Create a helper class to visualize stock market data

We need to create a helper class that can fetch and plot stock market data from various sources, such as Yahoo Finance or Alpha Vantage. This will allow us to compare our predictions from the GPT analysis with the actual market trends and performance.

Consider registering for API access with Yahoo Finance or Alpha Vantage.
The helper class should have:

  1. Methods to get data for any given ticker
  2. Methods to plot the data. NumPy, Matplotlib, or similar libraries will be useful
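A minimal sketch of such a helper, assuming the price series has already been fetched from Yahoo Finance or Alpha Vantage (the fetch itself is left out so as not to pin down either API; the class and method names are placeholders):

```python
class TickerSeries:
    """Holds daily closing prices for one ticker. Fetching is left to the
    chosen data source (e.g. a Yahoo Finance or Alpha Vantage client)."""

    def __init__(self, ticker, closes):
        self.ticker = ticker
        self.closes = list(closes)

    def moving_average(self, window):
        # Simple moving average over `window` days; the short prefix
        # (fewer than `window` points) is skipped.
        return [
            sum(self.closes[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(self.closes))
        ]

    def plot(self):
        # Lazy import so the rest of the class works without matplotlib.
        import matplotlib.pyplot as plt
        plt.plot(self.closes, label=self.ticker)
        plt.plot(range(4, len(self.closes)), self.moving_average(5), label="5-day MA")
        plt.legend()
        plt.show()
```

A smoothed series like the moving average is handy when comparing GPT-derived sentiment against actual price trends, since day-to-day noise is filtered out.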

Improve Reddit_Posts.py

Currently, we have a working version of getting posts from Reddit using the PRAW module. The relevant code is in Reddit_Posts.py and main.py. However, when retrieving posts/comments, PRAW returns a generator that runs extremely slowly, even for a few posts/comments.

Potential solutions

  1. Implement our own library interacting with the Reddit API in reddit_api.py (focus on the JSON returned by the get_hot_posts function).
  2. Improve our implementation of Reddit_Posts. (My current Posts class is god awful.)
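One way to sidestep PRAW's lazy generators is to hit Reddit's public listing endpoint (`/r/<subreddit>/hot.json`) and parse the JSON in one pass. A sketch of just the parsing step, assuming the standard listing shape (`data.children[].data` plus an `after` pagination cursor); the HTTP request itself and the exact fields get_hot_posts needs are left open:

```python
def parse_hot_posts(listing):
    """Flatten a Reddit listing JSON (as returned by /r/<sub>/hot.json)
    into plain dicts, so nothing is fetched lazily afterwards."""
    posts = []
    for child in listing["data"]["children"]:
        d = child["data"]
        posts.append({
            "id": d.get("id"),
            "title": d.get("title"),
            "author": d.get("author"),
            "ups": d.get("ups", 0),
            "num_comments": d.get("num_comments", 0),
        })
    # "after" is the cursor to pass as a query parameter for the next page.
    return posts, listing["data"].get("after")
```

Since everything is copied into plain dicts up front, iterating over the result is cheap, unlike PRAW objects that may trigger extra network requests per attribute access.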

Graph visualizer for communities and influential users

Connecting issue 9: #9
Suppose we have two SQL tables with the following schemas (achievable using SQLhelper.py):
Table 1, main post information: id, url, article_id, poster_username, post_content, published_date, visited_date, upvotes, downvotes
Table 2, comments: id, article_id (which post the comment is under), username, content, parent_comment_id (set if the comment replies to another comment), upvotes, downvotes

  1. Make a graph visualization function to visualize the information flow.
    (An attached image shows an example where the red user is the most influential.)
  2. Make functions to find the top k nodes with the highest degree
  3. There can be multiple influential users affecting multiple communities, and different communities are also connected (though with far fewer connections than within a community). Make a function to identify and split users into different communities (e.g. the "Louvain" community detection algorithm).
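Item 2 needs no graph library at all: the edge list built from the two tables (commenter to poster, commenter to parent-comment author) is enough to count degrees. A sketch in plain Python; for item 3, libraries such as NetworkX provide Louvain-style community detection, which could be layered on the same edge list:

```python
from collections import Counter

def top_k_by_degree(edges, k):
    """Return the k users with the most connections.
    `edges` is a list of (user_a, user_b) pairs, e.g. commenter -> poster
    or commenter -> parent-comment author, derived from tables 1 and 2."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return [user for user, _ in degree.most_common(k)]
```

In the example from the issue's image, a user replied to by many others accumulates a high degree and surfaces at the top of this list.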

Collect Posts data into a local dataset

We have already implemented access functions for Custom Search Engine to apply a time filter search on Reddit posts. We also implemented a SQL helper function that uses sqlite3 for storing dictionaries into a local .db file, functioning the same as an SQL database.

Goal:
Make use of both the access functions and the SQL functions to retrieve as much post information as possible into a local file for future analysis, including poster username, post content, comments, and commenter usernames.
Workflow:
Custom Search Engine get post URLs -> Reddit API get post info by article id -> SQL helper

I suggest making two tables with the following structure:
Table 1, main post information: id, url, article_id, poster_username, post_content, published_date, visited_date, upvotes, downvotes
Table 2, comments: id, article_id (which post the comment is under), username, content, parent_comment_id (set if the comment replies to another comment), upvotes, downvotes

We will eventually visualize a graph with nodes(users) and edges(connections)

Try language models

We are capable of retrieving post data using Reddit_Posts.py. Now we want to interact with our language model (GPT in this case) to get useful data.

Goal:

  1. Identify what each post is about
  2. Identify whether others' comments support the post
  3. Somehow record your results for each post. Make sure we only look over a post one time, not 100 times. (We may also want to record the visited dates.)
    This will be achieved with jsonHelper.py

Check gpt_api.py and use the get_response function.
I might introduce functions capable of doing online searches in the future.
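The visit-once bookkeeping from point 3 can be sketched as below, assuming a sqlite3-backed local store as in the jsonHelper issue. The `visited` table name and columns are illustrative; gpt_api's get_response is where the actual GPT call would go and is not called here:

```python
import sqlite3
from datetime import date

def _ensure_table(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS visited "
        "(article_id TEXT PRIMARY KEY, visited_date TEXT, summary TEXT)"
    )

def already_visited(conn, article_id):
    """True if this post was already analyzed by GPT."""
    _ensure_table(conn)
    cur = conn.execute("SELECT 1 FROM visited WHERE article_id = ?", (article_id,))
    return cur.fetchone() is not None

def mark_visited(conn, article_id, summary):
    """Record the GPT result (and visited date) so a post is analyzed once."""
    _ensure_table(conn)
    conn.execute(
        "INSERT OR REPLACE INTO visited VALUES (?, ?, ?)",
        (article_id, date.today().isoformat(), summary),
    )
    conn.commit()
```

The analysis loop then becomes: skip the post if already_visited, otherwise call get_response and mark_visited with the result, so each post costs exactly one GPT call.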

Apply models to WallStreetBets Forum data for final result

We are entering the final stage of this project. We need to show the results/findings of our research.

Task: Write code to try different models and find which ones are appropriate for our WallStreetBets forum data.

Models can be found in the NetLogo models library: https://ccl.northwestern.edu/netlogo/models/
Here is a list of models we can try to see which one fits:
Bidding Market: https://ccl.northwestern.edu/netlogo/models/BiddingMarket
Preferential Attachment: https://ccl.northwestern.edu/netlogo/models/PreferentialAttachment3D
Rumor Mill: https://ccl.northwestern.edu/netlogo/models/RumorMill
Virus on a Network: https://ccl.northwestern.edu/netlogo/models/VirusonaNetwork
Wealth Distribution: https://ccl.northwestern.edu/netlogo/models/WealthDistribution
Minority Game: https://ccl.northwestern.edu/netlogo/models/MinorityGame
Scatter: https://ccl.northwestern.edu/netlogo/models/Scatter
Small Worlds: https://ccl.northwestern.edu/netlogo/models/SmallWorlds
The questions here are: 1. Does the model fit our data? 2. What result can we show to others in our final project report?
An appropriate result could be, for example, whether the market will crash given data at a specific date during the GME event.

Come chat with me if you have any thoughts/findings or any idea about what our result should be.
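To make "does the model fit" concrete, the Rumor Mill / Virus on a Network dynamics can be re-implemented as a toy simulation on our own user graph and compared against the observed comment cascade. This is a hedged sketch of the simplest such dynamic (every informed user tells each neighbor with probability p per step), not a port of the NetLogo models themselves:

```python
import random

def spread(adjacency, seed_user, p, steps, rng=None):
    """Toy rumor-spread on the user graph (Rumor Mill-style dynamic).
    adjacency: {user: [neighbor, ...]} built from the comment tables."""
    rng = rng or random.Random(0)  # fixed seed for reproducible runs
    informed = {seed_user}
    for _ in range(steps):
        newly = set()
        for user in informed:
            for nb in adjacency.get(user, []):
                if nb not in informed and rng.random() < p:
                    newly.add(nb)
        informed |= newly
    return informed
```

Sweeping p and comparing the simulated reach per step against the real post's comment timeline is one way to judge whether a contagion-style model fits the GME data.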

Useful ratios for Reddit_Posts class to get influential users/posts

Currently, our Reddit_Posts.py has titles, contents, upvotes, downvotes, etc. We will need some data to determine the "influential" posts.
Think about metrics like upvote/downvote ratios, how often a poster posts, the number of distinct users appearing in our retrieved posts,
or any other numbers/ratios that might help determine the "influential" posts/users.

Add appropriate functions and comments to Reddit_Posts.py or a new .py file
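A couple of the suggested ratios can be computed straight from the post dicts Reddit_Posts already produces. A sketch (field names assume the schema from the data-collection issue; the function name is a placeholder, to live in Reddit_Posts.py or a new file):

```python
def influence_ratios(posts):
    """Per-poster summary ratios from retrieved posts.
    Each post dict needs: poster_username, upvotes, downvotes."""
    stats = {}
    for p in posts:
        s = stats.setdefault(p["poster_username"], {"posts": 0, "up": 0, "down": 0})
        s["posts"] += 1          # how often this poster posts (in our sample)
        s["up"] += p["upvotes"]
        s["down"] += p["downvotes"]
    return {
        user: {
            "num_posts": s["posts"],
            # max(..., 1) guards against posts with zero recorded votes
            "upvote_ratio": s["up"] / max(s["up"] + s["down"], 1),
        }
        for user, s in stats.items()
    }
```

Counting distinct usernames in the comments table would give the "number of different users" metric in the same style.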

Get influential users/posts

Goal:

  1. We need an algorithm to identify how much a user or a post impacts the community. Generate a score; in the future we will use this score as a weighting coefficient when determining the impactful users/posts.

Consider num_posts, num_followers, upvotes in a given time span
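One simple scoring scheme over exactly those three signals: min-max normalize each metric across users (so raw upvote counts do not swamp post counts), then take a weighted sum. The weights here are placeholders to be tuned, not values from the project:

```python
def influence_scores(users, weights=(0.2, 0.3, 0.5)):
    """users: {name: (num_posts, num_followers, upvotes)} for a time span.
    Each metric is min-max normalized to [0, 1], then combined with the
    (placeholder) weights for num_posts, num_followers, and upvotes."""
    metrics = list(zip(*users.values()))          # one tuple per metric
    lo = [min(m) for m in metrics]
    span = [max(m) - min(m) or 1 for m in metrics]  # `or 1` avoids div by zero
    return {
        name: sum(w * (v - l) / s
                  for w, v, l, s in zip(weights, vals, lo, span))
        for name, vals in users.items()
    }
```

A user who tops all three metrics scores 1.0 and one at the bottom of all three scores 0.0, which makes the number directly usable as a weighting coefficient later.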
