timkoornstra / fintwitbert

FinTwitBERT: Specialized BERT Model for Financial Twitter Analysis. Trained on a large corpus of financial tweets, it is well suited to sentiment analysis, trend prediction, and other financial NLP tasks.

License: MIT License

Python 91.81% Jupyter Notebook 8.19%
ai bert cryptocurrency data-science financial-tweets fintech language-model machine-learning nlp python

fintwitbert's Introduction

Hi there, I'm Tim 👋

I'm an AI developer and a recent graduate of the master's program in AI at Utrecht University. I'm passionate about all things AI, with a particular interest in Natural Language Processing (NLP) and Computer Vision.

📚 My Highlighted Repositories

Here are some of the projects I've worked on that I'm most proud of:

  • SAURON: This repository contains the code for my master's thesis. By running this code, you can create a writing style representation transformer. It's a project I'm particularly proud of, as it represents the culmination of my studies in AI.

  • Automatic Piano Fingering: This is the code for my bachelor's thesis. It implements a Q-learning algorithm from scratch to determine the optimal piano fingering. This project was a great opportunity to apply AI to a unique and interesting problem.

  • TiML: This repository is a from-scratch implementation of the most common machine learning methods, with simple explanations, written in Python and NumPy. This project was a great opportunity to delve deep into the inner workings of machine learning algorithms.

🤝 My Collaboration Efforts

I believe in the power of collaboration and have had the opportunity to work with some amazing teams. Here are a few collaborations I'm proud of:

  • FinTwit_Bot: A Discord bot written in Python that provides an overview of the financial markets discussed on Twitter. It distinguishes between markets based on the tickers mentioned in tweets and provides detailed information about the financial data discussed in each tweet.

  • Axie_Manager_Bot: A Discord bot written in Python that helps our guild manage the scholars in our Discord server. It is designed for guilds with multiple scholars and several managers, each of whom has their own scholars and wallets.

More collaboration projects to come...

📫 How to reach me

The best way to reach me is by email at [email protected]. I'm always open to new opportunities and collaborations.

Thanks for stopping by!

fintwitbert's People

Contributors

stephanakkerman, timkoornstra

fintwitbert's Issues

Add .ipynb notebook for analysis

SHAP (only works in a notebook):

import shap
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
from datasets import load_dataset

# Load pre-trained model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("StephanAkkerman/FinTwitBERT-sentiment")
model = AutoModelForSequenceClassification.from_pretrained(
    "StephanAkkerman/FinTwitBERT-sentiment"
)

classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    device=0,  # first GPU; use device=-1 to run on CPU
    top_k=None,
)


dataset = load_dataset(
    "financial_phrasebank",
    cache_dir="datasets/",
    split="train",
    name="sentences_50agree",
)

# Rename sentence to text
dataset = dataset.rename_column("sentence", "text")

# Keep the first 20 sentences, each truncated to 500 characters
short_data = [v[:500] for v in dataset["text"][:20]]

# define the explainer
explainer = shap.Explainer(classifier)

# explain the predictions of the pipeline on the first two samples
shap_values = explainer(short_data[:2])

# Render the explanations for all explained samples (displays inline in a notebook)
shap.plots.text(shap_values)

LIME

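import numpy as np
import torch
import torch.nn.functional as F
from lime.lime_text import LimeTextExplainer

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("StephanAkkerman/FinTwitBERT-sentiment")
model = AutoModelForSequenceClassification.from_pretrained(
    "StephanAkkerman/FinTwitBERT-sentiment"
)
model.eval()

# Read the label order from the model config so it matches the logit columns
class_names = [model.config.id2label[i] for i in range(model.config.num_labels)]


def predictor(texts, batch_size=64):
    # LIME passes a list of perturbed strings and expects an
    # (n_samples, n_classes) array of class probabilities
    probas = []
    with torch.no_grad():
        for start in range(0, len(texts), batch_size):
            inputs = tokenizer(
                texts[start : start + batch_size],
                return_tensors="pt",
                padding=True,
                truncation=True,
            )
            logits = model(**inputs).logits
            probas.append(F.softmax(logits, dim=-1).numpy())
    return np.concatenate(probas)


explainer = LimeTextExplainer(class_names=class_names)

str_to_predict = "surprising increase in revenue in spite of decrease in market share"
exp = explainer.explain_instance(
    str_to_predict, predictor, num_features=20, num_samples=2000
)

# Save an interactive HTML report and a static bar chart of the feature weights
exp.save_to_file("temp.html")
fig = exp.as_pyplot_figure()
fig.savefig("lime_report.jpg")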

Increase (bearish) labeled data

Bullish Sentiments: 17,368
Bearish Sentiments: 8,542
Neutral Sentiments: 12,181

It would be nice to balance the dataset by increasing every class to, for instance, 25k examples.
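As a quick check on the size of this task, the number of additional examples each class would need follows directly from the counts above. A minimal sketch; the 25k target is the example figure from this issue, not a fixed requirement:

```python
# Current label counts from this issue
counts = {"bullish": 17_368, "bearish": 8_542, "neutral": 12_181}
TARGET = 25_000  # example per-class target mentioned in the issue

# How many new (e.g. synthetic) examples each class still needs
needed = {label: max(TARGET - n, 0) for label, n in counts.items()}

for label, n in needed.items():
    print(f"{label}: {n:,} more examples")
print(f"total: {sum(needed.values()):,}")
```

This prints 7,632 bullish, 16,458 bearish and 12,819 neutral, 36,909 in total, so the bearish class needs by far the most new data.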

  • Find bullish tweet examples
  • Find bearish tweet examples
  • Find neutral tweet examples
  • Create a prompt for Mixtral 8x7B to generate synthetic tweets
  • Use together.ai, pplx.ai, or anyscale.com to run the LLM and collect its output
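The prompt step could look roughly like the sketch below: a few real examples of the target sentiment go into a few-shot prompt, and the model is asked for a numbered list of new tweets, which is easy to parse back. The function names `build_prompt` and `parse_numbered_list`, the prompt wording, and the numbered output format are all hypothetical; the actual request to together.ai, pplx.ai, or anyscale.com is left out because each provider has its own client.

```python
import re


def build_prompt(label, examples, n_new=10):
    """Hypothetical few-shot prompt asking an LLM (e.g. Mixtral 8x7B) for synthetic tweets."""
    shots = "\n".join(f"- {ex}" for ex in examples)
    return (
        f"Here are examples of {label} financial tweets:\n"
        f"{shots}\n\n"
        f"Write {n_new} new, distinct {label} financial tweets in the same style. "
        "Return them as a numbered list, one tweet per line, without extra commentary."
    )


def parse_numbered_list(completion):
    """Extract tweets from lines like '1. text' or '2) text'; other lines are ignored."""
    tweets = []
    for line in completion.splitlines():
        match = re.match(r"^\s*\d+[.)]\s+(.*\S)\s*$", line)
        if match:
            tweets.append(match.group(1))
    return tweets
```

Parsing only numbered lines makes it easy to drop chatty preambles such as "Sure, here are some tweets:", although the generated tweets would still need a manual or automatic quality check before being added to the training data.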

Increase number of datasets

Look at https://hf.co/datasets for more useful datasets

Unlabeled:

Twitter sentiment datasets (similar to tweet-eval):

Labeled:
