splch / qbs

An effective and flexible Quantile-Based Balanced Sampling algorithm for addressing class imbalance in datasets while preserving the underlying data distribution, improving model performance across various machine learning applications.

Quantile-Based Balanced Sampling Algorithm

The Quantile-Based Balanced Sampling Algorithm balances imbalanced datasets, particularly those with a large disparity between the number of majority and minority class samples. It calculates quantiles for each feature and selects the non-minority class samples closest to each quantile permutation, producing a more balanced dataset intended to improve machine learning model performance.

Table of Contents

Overview
Steps
Usage
Example
Contributing

Overview

The algorithm works by calculating quantiles for each feature in the dataset, generating a set of permutations of the quantiles, and selecting the closest non-minority class samples to each quantile permutation to balance the dataset. This process preserves the underlying data distribution while ensuring an equal representation of both majority and minority classes.
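As a rough illustration of the "quantile permutation" idea, the sketch below computes a few quantiles per feature and then pairs them up across features. The toy data, the helper name feature_quantiles, and the choice of Python's statistics.quantiles with the "inclusive" method are assumptions made for this example; they are not taken from qbs.py.

```python
import itertools
import statistics


def feature_quantiles(column, q):
    """Return q quantile values for one feature column (hypothetical helper).

    statistics.quantiles(data, n=k) returns k - 1 cut points,
    so n = q + 1 yields exactly q values.
    """
    return statistics.quantiles(column, n=q + 1, method="inclusive")


# Hypothetical toy data: five samples with two features each.
samples = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (4.0, 40.0), (5.0, 50.0)]
columns = list(zip(*samples))  # one tuple of values per feature

q = 3
per_feature = [feature_quantiles(col, q) for col in columns]

# Take one quantile per feature in every possible way.
# With f features this gives q ** f target points.
targets = list(itertools.product(*per_feature))

print(per_feature)
print(len(targets))
```

Each element of targets is a point in feature space. The algorithm then looks for the real non-minority sample nearest to each of these points.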

Steps

1. Count the unique non-minority class labels (c), the minority class samples (m), and the features (f).
2. Create an empty set (d) and add all minority class samples to it.
3. Calculate the number of quantiles (q) such that q^f = c*m, since taking one of q quantiles for each of f features yields q^f permutations.
4. Calculate the q quantiles for each feature.
5. Generate the set (p) of all permutations of the q quantiles across the f features.
6. Sort the non-minority class samples by their distance to each quantile for each feature.
7. For each quantile permutation in p, add the closest non-minority class sample to set d.
8. Return the balanced dataset d.
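The steps above can be sketched in standard-library Python as follows. This is a minimal illustration, not the implementation in qbs.py, and it rests on several assumptions:

- The function name quantile_balanced_sample is hypothetical.
- q is taken as the smallest integer with q ** f >= c * m.
- Quantiles are computed over the non-minority samples only.
- Each sample is selected at most once.
- Selection stops once c * m samples have been added.
- All non-minority classes are treated as one pool.
- A direct nearest-sample search replaces the per-feature sort.

```python
import itertools
import math
import statistics
from collections import Counter


def quantile_balanced_sample(X, y):
    """Sketch of quantile-based balanced sampling; returns (X_bal, y_bal)."""
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    c = len(counts) - 1   # unique non-minority class labels
    m = counts[minority]  # minority class samples
    f = len(X[0])         # features

    # d starts with every minority sample (stored as row indices).
    chosen = [i for i, label in enumerate(y) if label == minority]
    pool = [i for i, label in enumerate(y) if label != minority]

    # Assumption: smallest integer q such that q ** f >= c * m.
    q = 1
    while q ** f < c * m:
        q += 1

    # q quantile values per feature, computed over the non-minority pool.
    per_feature = [
        statistics.quantiles([X[i][j] for i in pool], n=q + 1, method="inclusive")
        for j in range(f)
    ]

    # Visit each quantile permutation and take the nearest unused sample.
    for target in itertools.product(*per_feature):
        if len(chosen) >= m + c * m or not pool:
            break
        nearest = min(pool, key=lambda i: math.dist(X[i], target))
        pool.remove(nearest)
        chosen.append(nearest)

    return [X[i] for i in chosen], [y[i] for i in chosen]


# Hypothetical toy data: 8 majority samples (label 0), 2 minority (label 1).
X = [(float(i), float(i % 3)) for i in range(8)] + [(2.5, 1.0), (5.5, 2.0)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = quantile_balanced_sample(X, y)
print(Counter(y_bal))
```

Because the pool is not split by class, this sketch balances the minority class against the non-minority samples as a whole. Balancing each non-minority class separately would require running the selection once per class.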

Usage

This algorithm can be implemented in various programming languages such as Python, R, or MATLAB. You can use it to preprocess your imbalanced dataset before feeding it to your machine learning model. Please note that you may need to adjust the algorithm according to the specific data structure and requirements of your project.

Example

Look at the qbs.py file for a sample implementation.

Contributing

We welcome contributions to improve this algorithm. Feel free to submit pull requests or raise issues to discuss potential improvements, bug fixes, or feature requests.
