This project is forked from yumoxu/stocknet-dataset.

A comprehensive dataset for stock movement prediction from tweets and historical stock prices.

License: MIT License

stocknet-dataset

This repository releases a comprehensive dataset for stock movement prediction from tweets and historical stock prices. Please cite the following paper [bib] if you use this dataset:

Yumo Xu and Shay B. Cohen. 2018. Stock Movement Prediction from Tweets and Historical Prices. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Melbourne, Australia, volume 1.

Stock movement prediction is a challenging problem: the market is highly stochastic, and we make temporally-dependent predictions from chaotic data. We treat these three complexities and present a novel deep generative model jointly exploiting text and price signals for this task. Unlike the case with discriminative or topic modeling, our model introduces recurrent, continuous latent variables for a better treatment of stochasticity, and uses neural variational inference to address the intractable posterior inference. We also provide a hybrid objective with temporal auxiliary to flexibly capture predictive dependencies. We demonstrate the state-of-the-art performance of our proposed model on a new stock movement prediction dataset which we collected.

You might also be interested in our code for stock movement prediction.

Should you have any queries, please contact me at [email protected].

Dataset Overview

The dataset targets two-year price movements, from 01/01/2014 to 01/01/2016, of 88 stocks: all 8 stocks in the Conglomerates sector and the top 10 stocks by capital size in each of the other 8 sectors. The full list of the 88 stocks and their companies, drawn from 9 sectors, is available in StockTable, a facsimile of the paper appendix appendix_table_of_target_stocks.pdf.

Data Component

This dataset comprises two main components: tweet data and price data.

Each component contains raw and preprocessed data, organized by stock:

  • ./tweet/raw
  • ./tweet/preprocessed

and

  • ./price/raw
  • ./price/preprocessed
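
The four directories above can be addressed with a small helper. This is a minimal sketch, not part of the repository: the function names `data_dir` and `list_stocks` are hypothetical, and it assumes that each directory holds one entry (a file or a folder) per stock, named after the stock symbol.

```python
from pathlib import Path

COMPONENTS = ("tweet", "price")
STAGES = ("raw", "preprocessed")


def data_dir(root, component, stage):
    """Return <root>/<component>/<stage>, e.g. ./tweet/raw."""
    if component not in COMPONENTS:
        raise ValueError(f"unknown component: {component!r}")
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    return Path(root) / component / stage


def list_stocks(root, component, stage):
    """List the stock symbols found under one data directory.

    Assumption: one entry per stock. Folders are taken by name,
    files by name without their extension.
    """
    symbols = set()
    for entry in data_dir(root, component, stage).iterdir():
        if entry.name.startswith("."):
            continue  # skip hidden files such as .DS_Store
        symbols.add(entry.name if entry.is_dir() else entry.stem)
    return sorted(symbols)
```

For example, `list_stocks(".", "price", "raw")` would list the symbols with raw price data when run from the repository root, provided the layout assumption holds.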

Data Format

Raw Tweet Data

Format: JSON
Keys: see Introduction to Tweet JSON

Preprocessed Tweet Data

Format: JSON
Keys: 'text', 'user_id_str', 'created_at'
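
A preprocessed tweet record can be read with the standard `json` module. The sketch below assumes one JSON object per line, which this README does not state; the function name `parse_tweets` is hypothetical. Values are returned exactly as stored, with no date parsing.

```python
import json

# The three keys listed above for preprocessed tweets.
TWEET_KEYS = ("text", "user_id_str", "created_at")


def parse_tweets(lines):
    """Parse an iterable of JSON strings into dicts limited to TWEET_KEYS.

    Assumption: each non-empty line holds exactly one JSON object.
    """
    tweets = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = [key for key in TWEET_KEYS if key not in record]
        if missing:
            raise KeyError(f"tweet record is missing keys: {missing}")
        tweets.append({key: record[key] for key in TWEET_KEYS})
    return tweets
```

Because an open text file is itself an iterable of lines, a file handle can be passed directly to `parse_tweets`.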

Raw Price Data

Format: CSV
Entries: date, open price, high price, low price, close price, adjusted close price, volume
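
The raw price files can be read with the standard `csv` module. This is a sketch under assumptions: the columns are taken to appear in the order listed above, and a header row is assumed by default (pass `has_header=False` otherwise). The names `RawPrice` and `read_raw_prices` are hypothetical, and dates are kept as strings because the date format is not specified here.

```python
import csv
from typing import NamedTuple


class RawPrice(NamedTuple):
    date: str
    open: float
    high: float
    low: float
    close: float
    adj_close: float
    volume: float


def read_raw_prices(handle, has_header=True):
    """Read raw price rows from an open text file or any iterable of lines.

    Assumption: seven columns in the order
    date, open, high, low, close, adjusted close, volume.
    """
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the header row if there is one
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        if len(fields) != 7:
            raise ValueError(f"expected 7 fields, got {len(fields)}")
        date, *numbers = fields
        rows.append(RawPrice(date, *(float(value) for value in numbers)))
    return rows
```

When opening a real file, `open(path, newline="")` is the form recommended by the `csv` documentation.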

Preprocessed Price Data

Format: TXT
Entries: date, movement percent, open price, high price, low price, close price, volume
Note: open, high, low, close prices are normalized values.
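
A preprocessed price row can be parsed by splitting on whitespace. This is a sketch under assumptions: the delimiter (tabs or spaces) and the absence of a header row are not stated in this README, and the names `PreprocessedPrice`, `parse_preprocessed_line`, and `parse_preprocessed_prices` are hypothetical. Unlike the raw files, these rows carry a movement percent and no adjusted close price.

```python
from typing import NamedTuple


class PreprocessedPrice(NamedTuple):
    date: str
    movement_percent: float
    open: float      # normalized value
    high: float      # normalized value
    low: float       # normalized value
    close: float     # normalized value
    volume: float


def parse_preprocessed_line(line):
    """Parse one row: date, movement percent, open, high, low, close, volume.

    Assumption: fields are separated by whitespace (tabs or spaces).
    """
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    date, *numbers = fields
    return PreprocessedPrice(date, *(float(value) for value in numbers))


def parse_preprocessed_prices(lines):
    """Parse every non-blank line of an iterable of lines."""
    return [parse_preprocessed_line(line) for line in lines if line.strip()]
```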
