
arvinsingh / biases-in-data


Experiments on biases in data & models

Jupyter Notebook 99.92% Python 0.08%
bias-correction convolutional-neural-networks keras-tensorflow machine-learning natural-language-processing


Biases in data and models.

This repository explores biases and abuses in data and studies how they affect the outcome of various experiments. The experiments will be conducted in Jupyter notebooks to analyze and understand the impact of these biases and to find ways to minimize them.

Tech Stack

  1. Keras with TensorFlow
  2. Numerical Python Stack
  3. Word2Vec
  4. Scikit-Learn
  5. Jupyter

Datasets

  1. Cat Vs Dog
  2. Titanic Dataset
  3. Statlog - German Credit Data

Introduction

In today's data-driven world, it is crucial to be aware of the biases and abuses that can exist within datasets. Biases can arise from various sources, such as data collection methods, sampling techniques, or even human judgment. These biases can lead to skewed results and unfair outcomes, impacting decision-making processes and perpetuating inequalities.

The purpose of this project is to shed light on the presence of biases and abuses in data and trained models, and to explore ways to mitigate their effects.

Topics to explore

  1. Bias in Natural Language Processing models.
  2. Convolutional Neural Network Manifold Learning.
  3. Global Black-box Explanation.
  4. Local Black-box Explanation.
  5. FairML

Biases in Data

Biases in data can occur in different forms, including:

  • Selection Bias: When certain groups or characteristics are overrepresented or underrepresented in the dataset due to biased sampling methods.
  • Confirmation Bias: When data is selectively collected or interpreted to support preconceived notions or beliefs.
  • Measurement Bias: When measurement instruments or techniques introduce systematic errors or inaccuracies.
  • Cultural Bias: When data reflects the biases and perspectives of a particular culture or group.
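
As a rough illustration of how selection bias can be checked, the sketch below compares each group's share in a sample against an assumed reference share. The function name `representation_gap`, the group labels, and the reference shares are all hypothetical and are not taken from the notebooks in this repository.

```python
from collections import Counter


def representation_gap(samples, reference_shares):
    """Return observed share minus reference share for each group."""
    counts = Counter(samples)
    total = len(samples)
    return {
        group: counts.get(group, 0) / total - expected
        for group, expected in reference_shares.items()
    }


# Hypothetical sample: 80 records from group "A" and 20 from group "B".
sample_groups = ["A"] * 80 + ["B"] * 20
# Assumed reference population: evenly split between the two groups.
reference = {"A": 0.5, "B": 0.5}

gaps = representation_gap(sample_groups, reference)
for group, gap in sorted(gaps.items()):
    # A positive gap means the group is overrepresented in the sample.
    print(f"{group}: {gap:+.2f}")
```

A large positive or negative gap for a group suggests that the sampling method over- or underrepresents it, which is the pattern described under Selection Bias above.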

Experimental Setup

The experiments will be conducted using Jupyter Notebook, a popular tool for data analysis and visualization. The datasets used in the experiments will be carefully selected to highlight different types of biases and potential abuses. The code and analysis will be documented in the Jupyter Notebook files provided in this repository.

Results and Analysis

The results obtained from the experiments will be analyzed to identify the presence and impact of biases in the data. Various statistical techniques and machine learning algorithms will be used to quantify and understand the biases. Additionally, strategies and methodologies to minimize biases and improve the fairness of the data will be explored.
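
One common way to quantify bias in model outputs is to compare the rate of positive predictions across groups. The sketch below is a minimal example under stated assumptions: the predictions and group labels are made up, the function `demographic_parity` is a hypothetical helper, and this is not necessarily the metric used in the notebooks.

```python
def positive_rate(outcomes):
    """Fraction of outcomes equal to 1 (the positive decision)."""
    return sum(outcomes) / len(outcomes)


def demographic_parity(outcomes, groups):
    """Return per-group positive rates, their max difference, and min/max ratio."""
    by_group = {}
    for outcome, group in zip(outcomes, groups):
        by_group.setdefault(group, []).append(outcome)
    rates = {group: positive_rate(ys) for group, ys in by_group.items()}
    lowest, highest = min(rates.values()), max(rates.values())
    ratio = lowest / highest if highest > 0 else float("nan")
    return rates, highest - lowest, ratio


# Hypothetical binary predictions for two groups of four records each.
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
group_labels = ["x"] * 4 + ["y"] * 4

rates, difference, ratio = demographic_parity(predictions, group_labels)
print(rates)       # positive-prediction rate per group
print(difference)  # 0 means equal rates across groups
print(ratio)       # 1 means equal rates across groups
```

A difference near 0 (or a ratio near 1) indicates that the groups receive positive predictions at similar rates. Larger gaps are a signal to investigate further, not proof of unfairness on their own.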

Conclusion

By studying biases and abuses in data, I aim to raise awareness about their existence and impact on decision-making processes. Through rigorous experimentation and analysis, I strive to develop best practices and guidelines to minimize biases and promote fairness in data-driven applications.

Please refer to the Jupyter Notebook files in this repository for detailed experiments, code, and analysis.

Insights

Insights are given as critical questions and discussions at the end of each notebook.
