/All Data:
- Origin Data: original news article data retrieved from the Google News API.
- Data_sentiment: news article data with a sentiment score added for each article.
- Prediction: test data from Apr 11 to Apr 26.
- Stock Data: stock price data for four companies.
- process_result: processed CSV files used to calculate correlation coefficients and to build prediction models.
/Code:
- classification.py: train classification models over all domains.
- correlation.py: calculate correlation coefficients and train classification models for each news source separately.
- dataprocessing1.ipynb: merge news sentiment and stock price data into a single CSV file.
- dataprocessing2.ipynb: combine the three types of text corpora across all three domains into one CSV file.
- getGraphCSV.py: process data for data visualization.
- labelStock.py: label the stock price data with binary labels (0 and 1).
- sentimentality.py: parse all the URLs returned by the Google News API and calculate a sentiment score for each article.
/graphs: All visualizations we created in Tableau
Classification:Regression Results.pdf: records of all results from the regression and classification models.
correlation.xlsx: correlation coefficients between news sentiment and stock prices for different features.
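
Example (illustrative only): the merge, labeling, and correlation steps listed under /Code could be sketched roughly as below. This is not the code in this repository. The function name `label_and_correlate`, the column names `date`, `sentiment`, and `close`, and the labeling rule (1 if the closing price rose from the previous day, otherwise 0) are all assumptions made for the sketch.

```python
import pandas as pd


def label_and_correlate(sentiment: pd.DataFrame, prices: pd.DataFrame):
    """Merge daily sentiment with prices, add a 0/1 label, return (merged, r)."""
    # Average the article-level sentiment scores per day (assumed column names).
    daily = sentiment.groupby("date", as_index=False)["sentiment"].mean()

    # Keep only the days present in both tables, in chronological order.
    merged = (
        daily.merge(prices, on="date", how="inner")
        .sort_values("date")
        .reset_index(drop=True)
    )

    # Assumed labeling rule: 1 if the close rose versus the previous day, else 0.
    # The first day has no previous close and is labeled 0.
    merged["label"] = (merged["close"].diff() > 0).astype(int)

    # Pearson correlation between daily sentiment and closing price.
    r = merged["sentiment"].corr(merged["close"])
    return merged, r
```

Calling `label_and_correlate(sentiment_df, price_df)` on two DataFrames with these columns returns the merged table and a single correlation value; the actual scripts may use different columns, labels, or correlation measures.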