listening-to-chaotic-whishpers--code's Introduction

LCW-Code

The whole repo needs refactoring. I'll try to do it as soon as possible. In particular, I need to find a new dataset to reproduce the results.

This repo is a work in progress. It is dedicated to an implementation of Listening to Chaotic Whispers: https://arxiv.org/abs/1712.02136v1 We are writing a series of blog posts to explain each step of our workflow. You can find the first post on Medium: https://medium.com/@gkeng/make-your-computer-invest-like-a-human-ef0654ccdcff

Description of each file :

  • SP500_nasdaq100.csv : CSV file listing all companies in the S&P 500 and Nasdaq
  • extract_reuters : Parallelized scraping of articles from reuters.com
  • extract_wsj : Attempt at scraping the Wall Street Journal
  • data_process : Some data processing on the collected articles
  • doc2vec : Doc2Vec vectorization of press articles
  • word2vec : Word2Vec vectorization of press articles; we preferred to continue with Doc2Vec
  • list_firm : List of all firms we chose for this implementation
  • create_dataset : A script that creates our 4-dimensional dataset for each company
  • picklizer : A script that creates a pickle file of all press articles for each firm
  • action : A class that implements methods and objects to simulate a portfolio
  • han : Implementation of the Hybrid Attention Network
  • han_training : Implementation and training of HAN
  • pickle : A folder with all pickle files for the stock prices of companies
  • pickle_article : A folder with all pickle files for the articles on each company
  • daterange : Links the ID of a day to the actual date (year/month/day).
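
As an illustration of what daterange is for, the mapping between a day ID and a calendar date could look like the sketch below. This is not the repo's code: the function names `day_id_to_date` and `date_to_day_id`, the example start date, and the assumption that IDs count consecutive calendar days are all hypothetical. Only the zero-padded "0001" form for the first day comes from the steps described in this README.

```python
from datetime import date, timedelta

# Hypothetical first day of the scraped period; replace with your own.
FIRST_DAY = date(2015, 1, 1)

def day_id_to_date(day_id: str, first_day: date = FIRST_DAY) -> date:
    """Return the calendar date for a day ID, where "0001" is first_day."""
    return first_day + timedelta(days=int(day_id) - 1)

def date_to_day_id(day: date, first_day: date = FIRST_DAY) -> str:
    """Return the zero-padded, four-digit day ID for a calendar date."""
    return f"{(day - first_day).days + 1:04d}"

print(day_id_to_date("0032"))            # date of the 32nd day of the period
print(date_to_day_id(date(2015, 1, 1)))  # ID of the first day
```

Keeping the start date as a parameter means the same two functions work for any scraping period.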

Folders :

  • sample_of_scrap : sample of the articles we scraped
  • stock_value : Contains stock values and stock moves of the companies.
  • pickle : Contains dictionaries of all stock moves in pickle files. Used to create y_train and y_test
  • pickle_article : Contains dictionaries { str day : str [ list of all article IDs for this company on this day ] } in pickle files.
  • firm_csv_folder_old : Contains CSV files with the IDs of all articles for each company.

Steps to follow to run the project :

  1. Run extract_reuters.py. It organizes the articles in folders like this : your_chosen_folder <== day_folder <== journal_dir <== article_title.txt

  2. Use the functions in data_process.py to process the data, in this order :

    • rename_dir : renames all directories. The directory for the first day (1st January 201X) will be "0001"
    • rename_file : gives an ID to every file. The 15th article of the first day will be "0001_15.txt"
    • create_csv_firm : creates a CSV for each company, listing every day and the IDs of the articles in which the company is mentioned
  3. Run picklizer.py. This creates a dictionary for each company and saves it as a pickle file. { str day : str [ list of all article IDs for this company on this day ] }

  4. Run doc2vec.py to train the doc2vec model and vectorize all the press articles. The output file is large: for the years 2015 to 2017, our doc2vec file was about 2 GB.
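
The naming scheme from step 2 and the dictionary from step 3 can be sketched as follows. This is an illustration under stated assumptions, not the code of data_process.py or picklizer.py: the helpers `article_id` and `group_by_day`, the sample IDs, and the file name `some_company.pkl` are hypothetical, and the real scripts work on the scraped folders and CSV files, not on an in-memory list.

```python
import os
import pickle
import tempfile
from collections import defaultdict

def article_id(day_index: int, article_index: int) -> str:
    """Build an ID such as "0001_15" (first day, 15th article)."""
    return f"{day_index:04d}_{article_index}"

def group_by_day(article_ids):
    """Group article IDs into {day: [article IDs]} using the part before "_"."""
    by_day = defaultdict(list)
    for aid in article_ids:
        day, _ = aid.split("_", 1)
        by_day[day].append(aid)
    return dict(by_day)

# Hypothetical IDs of the articles that mention one company.
ids = [article_id(1, 15), article_id(1, 22), article_id(3, 4)]
articles_by_day = group_by_day(ids)

# Save the dictionary as a pickle file (one file per company), then reload it.
path = os.path.join(tempfile.mkdtemp(), "some_company.pkl")
with open(path, "wb") as f:
    pickle.dump(articles_by_day, f)
with open(path, "rb") as f:
    reloaded = pickle.load(f)

print(reloaded)
```

Because the day is encoded in the first four characters of each ID, the day key can be recovered from the ID alone.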

Next, turn to the stock prices and stock moves. We took most of our stock values from here : https://www.kaggle.com/camnugent/sandp500 You can also find many more here : https://www.kaggle.com/borismarjanovic/price-volume-data-for-all-us-stocks-etfs

  1. Run pickle_stock_value.py to transform all CSV files of stock values into a pickle file containing a dictionary : { str day : float stock_value }

  2. Run make_stock_move.py to create a CSV of stock moves from day t to day t+1.

  3. Run pickle_stock_move.py to create a dictionary of stock moves from day t to day t+1, stored in a pickle file. { str day : int stock_move }

  4. Run create_dataset.py to create the 4-dimensional dataset. Refer to the comments in the code for more details.

  5. Train the model with han_training.py

  6. Test the model with show_results.py
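
Steps 1 to 3 of this second list can be pictured with the sketch below. It is only an illustration: the function `stock_moves`, the sample prices, and the encoding of a move as 1 (up), 0 (unchanged) or -1 (down) are assumptions, since the exact labels produced by make_stock_move.py are not described here. The sketch also assumes that the day keys sort chronologically as strings, which holds for zero-padded IDs.

```python
def stock_moves(values_by_day):
    """Map each day t to the direction of the move from day t to day t+1.

    values_by_day is {str day: float stock_value}. The last day has no
    successor, so it gets no entry in the result.
    """
    days = sorted(values_by_day)  # assumes keys sort chronologically
    moves = {}
    for today, tomorrow in zip(days, days[1:]):
        change = values_by_day[tomorrow] - values_by_day[today]
        moves[today] = (change > 0) - (change < 0)  # 1, 0 or -1 (assumed labels)
    return moves

# Hypothetical stock values keyed by day ID.
values = {"0001": 100.0, "0002": 101.5, "0003": 101.5, "0004": 99.0}
moves = stock_moves(values)
print(moves)
```

The resulting dictionary has the { str day : int stock_move } shape and can be saved with the pickle module in the same way as the other dictionaries.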

listening-to-chaotic-whishpers--code's People

Contributors

gkeng, piepie33

listening-to-chaotic-whishpers--code's Issues

Crawler is unavailable now

Possibly due to site changes, the Reuters and WSJ URLs are no longer available and return a 404 error. How can I get the news text data now? Sincerely hoping for your reply.
