cody-lange / financialnewssentimentclassifier

Performing sentiment analysis of financial news headlines with a RoBERTa model

HTML 89.92% Jupyter Notebook 10.08%

FinancialNewsSentimentClassifier

Web application that classifies headlines from marketwatch.com: https://langecod.pythonanywhere.com/news_classifier

Motivation: Market sentiment derived from news articles is one factor that can influence trading and investment decisions. Knowing the sentiment of financial news headlines gives investors, traders, and financial institutions another tool for making more informed decisions, assessing risk, and potentially designing algorithmic trading strategies. This notebook demonstrates how to train a RoBERTa model on the financial_phrasebank dataset and use it to classify Marketwatch.com headlines into three sentiment categories: positive, neutral, and negative. Although this exercise uses financial headlines as input, the underlying techniques transfer to similar text classification tasks. The notebook also serves as a simpler, indirect sample of my professional data science work, which involves building a similar transformer model for multilabel text classification.

Data Source: The financial_phrasebank dataset comprises financial news headlines about companies listed on OMX Helsinki, each labeled with a sentiment by annotators from the Aalto University School of Business.

Methodology: RoBERTa, a transformer model that performs well across many NLP tasks, was chosen for its contextual understanding and strong text classification performance. The model takes raw text without additional preprocessing, which preserves semantic detail, and relies on attention mechanisms and an optimized training procedure to interpret nuanced language. Financial RoBERTa was used as the starting point for transfer learning: it had already been fine-tuned extensively on various financial documents (not including the financial phrasebank), which gives it a better understanding of financial text.
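
As a minimal sketch of the final classification step, assuming a classification head that emits one logit per sentiment class: the snippet below turns three logits into probabilities with a softmax and picks the most likely label. The label order in `LABELS` and the example logits are hypothetical; the real mapping depends on how the model was configured.

```python
import math

# Hypothetical label order; the real mapping depends on the model's configuration.
LABELS = ("negative", "neutral", "positive")


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Made-up logits for a single headline.
label, confidence = predict_label([-1.2, 0.3, 2.5])
print(label, round(confidence, 3))
```

The probability returned alongside the label can serve as a rough confidence signal, for example to flag headlines where no class clearly dominates.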

Optimization: Several optimizations were made to address the high memory usage and slow training experienced in past projects. Mixed precision training and the 8-bit Adam optimizer were used to reduce the memory footprint and speed up training without significant performance loss. Early stopping was also implemented to prevent overfitting and avoid unnecessary computation, which matters when handling large datasets and intricate models efficiently.
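
Of these optimizations, early stopping is the easiest to show in isolation. The following is a minimal sketch, not the notebook's implementation: the `EarlyStopping` class, its `patience` and `min_delta` parameters, and the validation losses are all illustrative assumptions.

```python
class EarlyStopping:
    """Signal a stop once validation loss fails to improve for `patience` epochs in a row."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience      # consecutive non-improving epochs tolerated
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses, one per epoch.
val_losses = [0.52, 0.41, 0.38, 0.39, 0.40, 0.37]

stopper = EarlyStopping(patience=2)
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break

print("stopped at epoch:", stopped_at, "best loss:", stopper.best_loss)
```

In a real training loop the model weights from the best epoch would typically be saved and restored after stopping, since the final epochs are by definition no better than the best one.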

Results: The model demonstrated robust performance across various evaluation metrics. The following are the key results obtained post-evaluation:

Test Accuracy: 95.66%
Weighted Precision: 95.87%
Macro Precision: 94.92%
Weighted Recall: 95.66%
Macro Recall: 96.80%
Weighted F1 Score: 95.71%
Macro F1 Score: 95.80%
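
The difference between the macro and weighted figures comes from how per-class scores are averaged: macro averaging treats every class equally, while weighted averaging weights each class by its number of true examples. The sketch below illustrates this on made-up toy labels (not the project's test set); the helper names `per_class_scores` and `averaged` are hypothetical.

```python
def per_class_scores(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1, support)} for each class."""
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        support = sum(t == label for t in y_true)
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1, support)
    return scores


def averaged(scores, index, weighted):
    """Average one metric over classes, optionally weighting by class support."""
    if weighted:
        total = sum(s[3] for s in scores.values())
        return sum(s[index] * s[3] for s in scores.values()) / total
    return sum(s[index] for s in scores.values()) / len(scores)


# Toy, imbalanced labels for illustration only.
y_true = ["neutral"] * 6 + ["positive"] * 3 + ["negative"] * 1
y_pred = ["neutral"] * 5 + ["positive"] + ["positive"] * 3 + ["negative"]

scores = per_class_scores(y_true, y_pred, ["negative", "neutral", "positive"])
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro_recall = averaged(scores, index=1, weighted=False)
weighted_recall = averaged(scores, index=1, weighted=True)

print(f"accuracy={accuracy:.4f} macro_recall={macro_recall:.4f} "
      f"weighted_recall={weighted_recall:.4f}")
```

Support-weighted recall always equals overall accuracy, which is why both are reported as 95.66% above. A macro recall higher than the weighted recall suggests that the smaller classes are recalled at least as well as the largest one.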

Points of Consideration: While the model shows high reliability and accuracy in classifying financial news headlines, several caveats apply. The labeling process, which involved multiple annotators, could introduce biases that affect the reliability of the dataset. Additionally, the dataset's focus on OMX Helsinki companies might limit the model's adaptability to other financial news sources and broader topics. Finally, transformer models like RoBERTa are complex, so interpretability matters in the financial sector: users need to understand model predictions well enough to make informed decisions.
