
Developing a comprehensive sentiment analysis model to accurately capture and interpret public sentiment about the Paris 2024 Olympics from diverse social media data.

License: MIT License


Sentiment Analysis-Paris Olympics 2024


Table of Contents

  1. Business Understanding

  2. Data Understanding

  3. Data Preparation

  4. Modeling

  5. Conclusion

  6. Recommendations

  7. Next Steps

  8. Deployment

  9. Libraries and Tools Used

  10. License

  11. Contributing Members

  12. Contacts

  13. Repository Structure

Business Understanding

Overview

The Paris Olympics 2024 promises to be one of the most significant global events of the decade, uniting nations, cultures, and athletes from around the world. As excitement builds and discussions around the event intensify, understanding public sentiment becomes crucial for stakeholders, including organizers, sponsors, media outlets, and even fans. This project aims to analyze the sentiment of conversations surrounding the Paris Olympics 2024, offering insights into public opinion, perceptions, and the overall sentiment landscape leading up to and during the event.

Goal

This project aims to capture and decode the global sentiment surrounding the Paris Olympics 2024. By analyzing emotions and opinions from diverse sources, we seek to provide real-time insights that empower organizers, brands, and media to align with public sentiment, ensuring the Paris Olympics 2024 resonates positively worldwide.

Objectives

  1. Develop a comprehensive sentiment analysis model that accurately captures and interprets public sentiment about the Paris Olympics from social media data.
  2. Extract, preprocess, and clean social media data from multiple platforms, addressing quality issues and handling multilingual content related to the Paris Olympics.
  3. Develop and train advanced natural language processing models to accurately classify sentiment, incorporating techniques to handle sarcasm and contextual nuance.
  4. Create interactive visualizations that display sentiment trends and key events, giving stakeholders actionable insights based on a comprehensive analysis of public opinion.

Stakeholders

  1. Organizers of the Paris Olympics 2024 - Sentiment analysis helps them gauge public opinion, so they can make informed decisions and adjust their strategies accordingly.
  2. Sponsors - Sentiment analysis helps them understand how their brand is perceived in relation to the Olympics.
  3. Media outlets - Sentiment analysis gives them insight into public interest and trending topics.
  4. Fans and the general public - They are the primary audience for the Olympics, and their sentiment directly affects the event's success.
  5. Athletes - They are the central figures of the Olympics, and public sentiment towards them can affect their performance and well-being.
  6. Local authorities and businesses in Paris - The Olympics significantly affect the host city, and sentiment analysis can help gauge public opinion on local issues related to the event.

Data Understanding

The data was extracted from X using the Octoparse web scraping tool. The focus was on tweets, including hashtags, comments, and retweets, that discuss various aspects of the Paris Olympics.

Data Preparation

The data preparation step involved analyzing and cleaning a merged dataset of tweets related to the 2024 Paris Olympics, originally composed of multiple CSV files. A DataUnderstanding class was created to examine the dataset. It revealed missing values and discrepancies, as well as a large number of apparent duplicates, most of which were false positives caused by partial similarities.
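The distinction between apparent and real duplicates can be illustrated with a minimal sketch. It is not the project's DataUnderstanding class. The field names `user` and `text`, the sample rows, and the helper `count_duplicates` are all hypothetical, since this README does not list the dataset's columns. The idea is that matching on a single column flags more rows than matching on the full record.

```python
from collections import Counter


def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences do not matter."""
    return " ".join(text.lower().split())


def count_duplicates(rows, key_fields):
    """Count rows whose values for key_fields repeat an earlier row."""
    seen = Counter(
        tuple(normalize(str(row.get(field, ""))) for field in key_fields)
        for row in rows
    )
    return sum(count - 1 for count in seen.values())


# Hypothetical sample rows; the real column names are not given in this README.
tweets = [
    {"user": "a", "text": "Opening ceremony was stunning! #Paris2024"},
    {"user": "b", "text": "Opening ceremony was stunning! #Paris2024"},   # same text, other user
    {"user": "a", "text": "Opening ceremony was stunning!  #paris2024"},  # true repeat of row 1
    {"user": "c", "text": "Tickets are too expensive #Paris2024"},
]

apparent = count_duplicates(tweets, ["text"])           # partial match on one column
confirmed = count_duplicates(tweets, ["user", "text"])  # match on the full record
print(f"apparent duplicates: {apparent}, confirmed duplicates: {confirmed}")
```

Only rows that repeat on every compared field are counted as confirmed, so a tweet posted with the same wording by a different user is kept.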

Modeling

The model development and evaluation process involved testing several approaches. We started with traditional machine learning models: Logistic Regression, Support Vector Machine, Random Forest, and Naive Bayes. Among these, Random Forest performed best, initially achieving 97.4% accuracy, which decreased slightly to 96.6% after tuning. We also implemented an XGBoost model, tuned with RandomizedSearchCV, which achieved 82.2% accuracy. The VADER model performed strongly, with 94.92% accuracy, 95.20% precision, 94.92% recall, and a 95.01% F1-score. In contrast, the DistilBERT model performed poorly, with 44.34% accuracy, 45.94% precision, 44.34% recall, and a 44.62% F1-score.
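For both VADER and DistilBERT, the reported recall equals the reported accuracy. That is what support-weighted averaging produces (the same scheme as `average="weighted"` in scikit-learn), so the sketch below assumes the metrics were weighted that way. The README does not confirm this. The function `weighted_metrics` and the toy labels are illustrative only and are not the project's data.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall and F1."""
    total = len(y_true)
    support = Counter(y_true)    # true examples per class
    predicted = Counter(y_pred)  # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(correct.values()) / total
    precision = recall = f1 = 0.0
    for label, n in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / n
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = n / total  # each class contributes in proportion to its support
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only; these are not the project's data.
y_true = ["pos", "pos", "pos", "neg", "neg", "neu"]
y_pred = ["pos", "pos", "neg", "neg", "neu", "neu"]
scores = weighted_metrics(y_true, y_pred)
print({name: round(value, 4) for name, value in scores.items()})
```

Weighted recall sums the correct predictions of each class and divides by the total number of examples, which is the definition of accuracy, so the two values always coincide.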

Conclusion

The VADER (Valence Aware Dictionary and sEntiment Reasoner) model was among the strongest performers, with 94.92% accuracy and a 95.01% F1-score, far ahead of XGBoost (82.2%) and DistilBERT (44.34%). By the accuracy figures reported in the Modeling section, it sits close to, though slightly below, the tuned Random Forest (96.6%). The success of VADER, a rule-based model designed specifically for social media text, highlights the importance of domain-specific tools in sentiment analysis, especially when dealing with the nuanced language of Olympic-related discussions on social media platforms.
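As a rough illustration of how a rule-based VADER classifier assigns labels, the sketch below maps VADER's compound score to three classes. The cut-offs of 0.05 and -0.05 are the conventional thresholds suggested in the vaderSentiment documentation. Whether this project used the same thresholds or the same three labels is an assumption, and the helper names `label_from_compound` and `classify` are hypothetical.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def classify(text):
    """Score text with VADER and return (label, compound score)."""
    # Imported lazily so the threshold helper works without the package installed.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    scores = SentimentIntensityAnalyzer().polarity_scores(text)
    return label_from_compound(scores["compound"]), scores["compound"]


print(label_from_compound(0.62))   # positive
print(label_from_compound(-0.31))  # negative
print(label_from_compound(0.01))   # neutral
```

With the `vaderSentiment` package installed, `classify("some tweet text")` returns the label together with the underlying compound score.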

Recommendations

  1. Implement a real-time sentiment tracking dashboard for organizers and media partners, allowing them to respond quickly to shifts in public opinion.
  2. Develop a multilingual sentiment analysis capability to cater to the international nature of the Olympics, using language-specific versions of VADER where available.
  3. Create a sentiment-based alert system for potentially controversial or viral topics, enabling rapid response from the communications team.
  4. Integrate sentiment analysis results with other data sources (e.g. ticket sales, TV ratings) to provide a comprehensive view of public engagement.
  5. Use sentiment trends to guide content creation and social media strategies, focusing on themes and athletes that generate positive engagement.
  6. Provide regular sentiment reports to sponsors, helping them optimize their Olympic-related marketing campaigns.
  7. Collaborate with local Paris businesses to use sentiment data for improving visitor experiences during the Olympics.

Next Steps

  1. Incorporate Olympics-specific features such as mentions of specific sports, athletes or events to improve classification accuracy.
  2. Create a specialized lexicon for VADER that includes Olympic-specific terms and their sentiment associations.
  3. Extend the sentiment analysis to multiple social media platforms and news sources for a more comprehensive view.
  4. Develop user-friendly and interactive dashboards for stakeholders to explore sentiment data in real-time.
  5. Set up a system to compare sentiment trends with previous Olympic events to identify unique characteristics of the Paris Olympics.
  6. Develop algorithms to automatically identify and report on significant shifts in sentiment or emerging trends.
  7. Offer training sessions for various stakeholders on how to interpret and act upon the sentiment analysis results.
  8. Set up infrastructure for continued analysis post-Olympics to track the event's lasting impact on public sentiment towards Paris and the Olympic movement.
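One simple way to approach the automatic detection of sentiment shifts mentioned in the list above is to compare each day's average score with a trailing baseline. The sketch below is only an illustration under stated assumptions: the function `detect_shifts`, the window size, the threshold, and the daily scores are all hypothetical, and a production system would likely need something more robust.

```python
from statistics import mean


def detect_shifts(daily_scores, window=3, threshold=0.3):
    """Return indices of days whose score departs from the trailing-window mean by at least threshold."""
    shifts = []
    for day in range(window, len(daily_scores)):
        baseline = mean(daily_scores[day - window:day])
        if abs(daily_scores[day] - baseline) >= threshold:
            shifts.append(day)
    return shifts


# Made-up daily mean compound scores, for illustration only.
daily = [0.40, 0.42, 0.38, 0.41, -0.05, 0.02, 0.39]
print(detect_shifts(daily))
```

A flagged index could then feed the dashboard or alerting ideas listed under Recommendations.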

Deployment

Check out our app: Paris Olympics Sentiment Analysis App

Installation

To run the application locally, follow these steps:

Clone the repository

https:

git clone https://github.com/Atieng/sentiment-analysis-2024-olympics.git

ssh:

git clone git@github.com:Atieng/sentiment-analysis-2024-olympics.git

Navigate to the project directory

cd sentiment-analysis-2024-olympics

Create a virtual environment

python -m venv vader_env

Activate the virtual environment

Windows:

vader_env\Scripts\activate

MacOS/Linux:

source vader_env/bin/activate

Install dependencies

pip install -r requirements.txt

Execute the app on Streamlit

streamlit run vader.py

🔗 Libraries and Tools Used

NumPy, pandas, Python, TensorFlow, Keras, Matplotlib, NLTK, Streamlit, vaderSentiment, scikit-learn

License

MIT License

Contributing Members

Contacts

Kindly don't hesitate to reach out to the team if you have any questions.

Repository Structure

Sentiment Analysis-Paris Olympics 2024/
│
└── Project Files/
    ├── .ipynb_checkpoints
    ├── .streamlit
    ├── Csv Files
    ├── Images
    ├── Models
    ├── Notebooks
    ├── the_team
    ├── .DS_Store
    ├── .gitattributes
    ├── .gitignore
    ├── LICENSE
    ├── README.md
    ├── Sentiment Analysis Presentation.pdf
    ├── requirements.txt
    ├── sentiment_analysis_paris_olympics.docx
    └── vader.py
      

Contributors

atieng, eva-claire, sheila-mulwa, kaluma-67, elizabethmasai
