cococokoko / awesome-misinformation-detection

Detecting misleading tweets with Decision Tree, Random Forest, LassoCV, KNeighborsClassifier and the OpenAI API ada model

Jupyter Notebook 100.00%
decision-tree lasso-regression misinformation misleadinginformation openai openai-api random-forest tweets twitter fake-news

awesome-misinformation-detection's Introduction

AWESOME-misinformation-detection

As part of our Data Science & AI specialisation, we chose to tackle the problem of fake news on Twitter. We found two public datasets with labelled tweets and applied different classification algorithms to them.

Why is it relevant?

  1. Addressing misinformation: Misinformation and fake news are pervasive in today's digital age and can have significant societal implications. They can consciously or unconsciously shape our opinions and political viewpoints (e.g., on Covid-19 or elections). Fake news can even cause chaos, panic or real-life fear. By developing and testing a classification model to identify fake news in tweets, we contribute to the ongoing efforts to combat misinformation and promote accurate information sharing.

  2. Social media impact: Twitter, as a widely used social media platform, plays a significant role in shaping public opinion and discourse. Fake news on Twitter can spread rapidly and influence public perceptions and decision-making. By focusing on tweets specifically, our project addresses the unique challenges posed by fake news dissemination on this platform.

  3. Enhancing trust and credibility: Developing effective tools to identify fake news can help improve the trustworthiness and credibility of information shared on social media. By providing a reliable classification model, our project can contribute to creating a more informed and discerning digital society.

  4. Algorithmic transparency and fairness: Testing and evaluating our classification model on multiple datasets can help uncover biases or limitations that may affect its performance. By actively addressing biases and striving for fairness, we contribute to the responsible development and deployment of AI algorithms, which is a crucial aspect of AI ethics.

  5. Generalizability: Fake news detection is a challenging problem, and testing our model on multiple datasets helps assess its generalizability. By demonstrating the effectiveness of our classification model across different datasets, we provide valuable insights into its potential real-world applicability.
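A cross-dataset check of the kind described in points 4 and 5 can be sketched as follows: train on one dataset's training split, then score on every dataset's test split. This is a minimal illustration under stated assumptions, not the notebooks' actual code. The toy tweets, the dataset names `covid` and `politics`, and the keyword-voting baseline `train_keyword_baseline` are all hypothetical stand-ins; any of the project's classifiers could be plugged in through `train_fn` instead.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Dataset = List[Tuple[str, int]]      # (tweet text, label); 1 = fake, 0 = real
Predictor = Callable[[str], int]


def f1_score(y_true: List[int], y_pred: List[int], positive: int = 1) -> float:
    """F1 for the positive class; returns 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def train_keyword_baseline(train: Dataset) -> Predictor:
    """Hypothetical toy model: each word votes for the class it was seen in more often."""
    fake_counts: Counter = Counter()
    real_counts: Counter = Counter()
    for text, label in train:
        (fake_counts if label == 1 else real_counts).update(text.lower().split())

    def predict(text: str) -> int:
        votes = 0
        for word in text.lower().split():
            if fake_counts[word] > real_counts[word]:
                votes += 1
            elif fake_counts[word] < real_counts[word]:
                votes -= 1
        return 1 if votes > 0 else 0

    return predict


def cross_dataset_f1(
    datasets: Dict[str, Dict[str, Dataset]],
    train_fn: Callable[[Dataset], Predictor],
) -> Dict[Tuple[str, str], float]:
    """Train on each dataset's 'train' split and score on every dataset's 'test' split."""
    scores = {}
    for train_name, train_splits in datasets.items():
        model = train_fn(train_splits["train"])
        for test_name, test_splits in datasets.items():
            y_true = [label for _, label in test_splits["test"]]
            y_pred = [model(text) for text, _ in test_splits["test"]]
            scores[(train_name, test_name)] = f1_score(y_true, y_pred)
    return scores


# Made-up toy data, only to make the sketch runnable.
toy_datasets = {
    "covid": {
        "train": [("miracle cure kills virus", 1), ("health ministry reports new cases", 0)],
        "test": [("miracle cure found", 1), ("ministry reports cases", 0)],
    },
    "politics": {
        "train": [("shocking secret vote fraud", 1), ("senate passes budget bill", 0)],
        "test": [("shocking fraud exposed", 1), ("senate passes bill", 0)],
    },
}

scores = cross_dataset_f1(toy_datasets, train_keyword_baseline)
for (train_name, test_name), score in sorted(scores.items()):
    print(f"train={train_name:<9} test={test_name:<9} F1={score:.2f}")
```

The off-diagonal entries (train on one dataset, test on the other) are the ones that speak to generalizability. A large drop relative to the in-domain scores suggests the model has learned dataset-specific vocabulary rather than general signals of misinformation.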

Infodemic Dataset

The first dataset comes from the paper "Fighting an Infodemic". Its authors manually annotated 10,700 social media posts and articles containing real and fake news on COVID-19.

TruthSeeker Dataset

The second dataset comes from the paper "TruthSeeker: The Largest Social Media Ground-Truth Dataset for Real/Fake Content". It is one of the most extensive benchmark datasets, with more than 180,000 labels covering 2009 to 2022.

Preprocessing, Models Implemented and Evaluation Metrics

Approach

Summary

  • The OpenAI model outperformed all other models, but it lacks explainability
  • Classical models:
    • the best approach for analysing text was a combination of semantic features, Twitter metadata, and sentiment analysis
  • Assessing the models' generalizability:
    • the classical models were more effective across different datasets
    • they are potentially better suited for real-world use
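The feature combination mentioned for the classical models can be illustrated by joining three feature groups into one vector per tweet. This is only a sketch under assumptions: the field names (`followers`, `retweets`, `verified`), the tiny sentiment lexicon, and the bag-of-words stand-in for "semantic features" are hypothetical and are not taken from the project's notebooks.

```python
import math
import re
from typing import Dict, List

# Hypothetical mini-lexicon; a real project would use a proper sentiment tool.
POSITIVE_WORDS = {"good", "great", "safe", "true"}
NEGATIVE_WORDS = {"bad", "fake", "hoax", "danger", "panic"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def text_features(tokens: List[str], vocabulary: List[str]) -> List[float]:
    """Bag-of-words counts over a fixed vocabulary (stand-in for semantic features)."""
    return [float(tokens.count(word)) for word in vocabulary]


def metadata_features(tweet: Dict) -> List[float]:
    """Log-scaled counts plus a verified flag; the field names are assumptions."""
    return [
        math.log1p(tweet.get("followers", 0)),
        math.log1p(tweet.get("retweets", 0)),
        1.0 if tweet.get("verified", False) else 0.0,
    ]


def sentiment_feature(tokens: List[str]) -> float:
    """Lexicon score in [-1, 1]: (positive hits - negative hits) / token count."""
    positive = sum(1 for t in tokens if t in POSITIVE_WORDS)
    negative = sum(1 for t in tokens if t in NEGATIVE_WORDS)
    return (positive - negative) / max(len(tokens), 1)


def build_feature_vector(tweet: Dict, vocabulary: List[str]) -> List[float]:
    """Concatenate text, metadata and sentiment features into one flat vector."""
    tokens = tokenize(tweet["text"])
    return (
        text_features(tokens, vocabulary)
        + metadata_features(tweet)
        + [sentiment_feature(tokens)]
    )


# Made-up example tweet, only to show the shape of the output.
example_tweet = {
    "text": "This vaccine hoax is a danger",
    "followers": 120,
    "retweets": 4,
    "verified": False,
}
example_vocabulary = ["vaccine", "cure", "hoax"]
example_vector = build_feature_vector(example_tweet, example_vocabulary)
print(len(example_vector), example_vector)
```

Vectors built this way can be stacked into a matrix and passed to a tree-based or nearest-neighbour classifier. Keeping the three groups in separate functions also makes it easy to drop one group and measure how much it contributes.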

Further Resources:
