BlueCast

BlueCast is a lightweight and fast automl library that helps data scientists tackle real-world problems, from EDA to model explainability and even uncertainty quantification. BlueCast focuses on a few model architectures (by default XGBoost only) and a few preprocessing options (only what is needed for XGBoost). This allows for a much faster development cycle and a much more stable codebase, while keeping the library's dependencies to a minimum. Despite its lightweight core, BlueCast offers extensive customization options for advanced users. Find the full documentation here.

Here you can see our test coverage in more detail:

[Image: Codecov sunburst]

Philosophy

There are plenty of excellent automl solutions available. With BlueCast we don't follow the usual path ("Give me your data, we return the best model ensemble out of X algorithms"), but have the real-world data scientist in mind. Our philosophy can be summarized as follows:

  • automl should not be a black box
  • automl shall be a help rather than a replacement
  • automl shall not be a closed system
  • automl should be easy to install
  • explainability over another decimal place of precision
  • real world value over pure performance

We support our users with an end-to-end toolkit that allows fast and rich EDA, convenient modelling, explainability, evaluation and even uncertainty quantification.

What BlueCast has to offer

Basic usage

from bluecast.blueprints.cast import BlueCast

# df_train and df_val are DataFrames; df_train includes the target column
automl = BlueCast(
    class_problem="binary",
)

automl.fit(df_train, target_col="target")
y_probs, y_classes = automl.predict(df_val)

# From version 0.95, predict_proba is also directly available (also for BlueCastCV)
y_probs = automl.predict_proba(df_val)

Convenience features

Despite being a lightweight library, BlueCast also offers a number of convenience features:

  • rich library of EDA functions to visualize and understand the data
  • plenty of customization options via an open API
  • inbuilt uncertainty quantification framework (conformal prediction)
  • hyperparameter tuning (with lots of customization available)
  • automatic feature type detection and casting
  • automatic DataFrame schema detection: checks if unseen data has new or missing columns
  • categorical feature encoding (target encoding or directly in XGBoost)
  • datetime feature encoding
  • automated GPU availability check and usage for XGBoost
  • a fit_eval method to fit a model and evaluate it on a validation set to mimic production environment reality
  • functions to save and load a trained pipeline
  • Shapley values
  • ROC AUC curve & lift chart
  • warnings for potential misconfigurations
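
To make the schema detection item in the list above more concrete, here is a minimal sketch of the idea: compare the column names seen during training with those of unseen data. It is an illustration only and not BlueCast's implementation; the function name `compare_schema` and the column names are hypothetical.

```python
def compare_schema(train_columns, new_columns):
    """Return (missing, unexpected) column names of unseen data.

    Hypothetical helper for illustration; not part of the BlueCast API.
    """
    train_set = set(train_columns)
    new_set = set(new_columns)
    # Keep the original column order in both reports
    missing = [col for col in train_columns if col not in new_set]
    unexpected = [col for col in new_columns if col not in train_set]
    return missing, unexpected


# Made-up column names for demonstration
train_columns = ["age", "income", "signup_date"]
unseen_columns = ["age", "signup_date", "referrer"]

missing, unexpected = compare_schema(train_columns, unseen_columns)
print("missing:", missing)        # columns the model was trained on but cannot find
print("unexpected:", unexpected)  # columns the model has never seen
```

With a pandas DataFrame, the same helper could be called with `list(df.columns)`.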

The fit_eval method can be used like this:

from bluecast.blueprints.cast import BlueCast

automl = BlueCast(
    class_problem="binary",
)

automl.fit_eval(df_train, df_eval, y_eval, target_col="target")
y_probs, y_classes = automl.predict(df_val)

It is important to note that df_train contains the target column while df_eval does not. The target column is passed separately as y_eval.
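
The conformal prediction feature listed among the convenience features can be illustrated with a small sketch of split conformal prediction for regression. It assumes a held-out calibration set and an arbitrary point-prediction model; it is not BlueCast's implementation, and the function names and numbers below are made up for illustration.

```python
import math


def conformal_quantile(residuals, coverage):
    """Quantile of absolute calibration residuals with finite-sample correction.

    Uses the 1-indexed rank ceil((n + 1) * coverage) of the sorted scores.
    """
    scores = sorted(abs(r) for r in residuals)
    n = len(scores)
    rank = math.ceil((n + 1) * coverage)
    if rank > n:
        # Too few calibration points to guarantee the requested coverage
        return math.inf
    return scores[rank - 1]


def prediction_interval(point_prediction, half_width):
    """Symmetric interval around a point prediction."""
    return point_prediction - half_width, point_prediction + half_width


# Made-up calibration data: true targets and model predictions
calib_true = [12, 15, 9, 20, 18, 11, 14, 16, 13]
calib_pred = [11, 17, 12, 16, 23, 17, 7, 24, 22]
residuals = [t - p for t, p in zip(calib_true, calib_pred)]

q = conformal_quantile(residuals, coverage=0.8)
low, high = prediction_interval(15, q)
print(f"80% interval for a prediction of 15: [{low}, {high}]")
```

Under the usual exchangeability assumption, intervals built this way cover the true value with at least the requested probability on average, regardless of the underlying model.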

Kaggle competition results and example notebooks

Even though BlueCast has been designed to be a lightweight automl framework, it can still reach very good performance. We tested BlueCast in Kaggle competitions to showcase the library's capabilities, both feature- and performance-wise.

  • ICR top 20% finish with over 6000 participants (notebook)
  • An advanced example covering lots of functionalities (notebook)
  • PS3E23: Predict software defects top 12% finish (notebook)
  • PS3E25: Predict hardness of steel via regression (notebook)
  • PS4E1: Bank churn top 13% finish (notebook)
  • A comprehensive guide about BlueCast showing many capabilities (notebook)
  • BlueCast using a custom Catboost model for quantile regression (notebook)

About the code

Code quality

To ensure code quality, we use the following tools:

  • various pre-commit libraries
  • strong type hinting in the code base
  • unit tests using Pytest

Contributors are expected to make sure that all pre-commit checks and unit tests pass. New features are expected to come with unit tests.

Documentation

Documentation is provided via Read the Docs. On GitHub we offer multiple READMEs that cover all aspects of working with BlueCast.

How to contribute

Contributions are welcome. Please follow these steps:

  • Get in touch with me (e.g. via LinkedIn) if you are interested in a longer contribution
  • Create a new branch from the develop branch
  • Add your feature or fix
  • Add unit tests for new features
  • Run pre-commit checks and unit tests (using Pytest)
  • Adjust the docs/source/index.md file
  • Copy and paste the content of the docs/source/index.md file into the README.md file
  • Push your changes and create a pull request

If library or dev dependencies have to be changed, adjust the pyproject.toml. For Read the Docs it is also required to update the docs/rtd_requirements.txt file. Simply run:

poetry export --with dev -f requirements.txt --output docs/rtd_requirements.txt

Whether Read the Docs will be able to build the documentation can be tested via:

poetry run sphinx-autobuild docs/source docs/build/html

This will show a localhost link containing the documentation.

Support us

As a small open source project, we rely on the community. Please consider giving us a GitHub star and spreading the word. Your feedback will also help the project evolve.

Meta

Creator: Thomas Meißner – LinkedIn

bluecast's Issues

Incorporating Mutual Information for EDA

Mutual Information (MI) is a very useful tool for Feature Engineering. The MI between two quantities is a measure of the extent to which knowledge of one quantity reduces uncertainty about the other. I believe it can be helpful if we can have MI with BlueCast.
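
For two discrete variables, MI is the sum over all value pairs of p(x, y) * log(p(x, y) / (p(x) * p(y))). The following is a minimal, standard-library sketch of that formula using empirical frequencies. It is an illustration only, not an existing BlueCast function, and the name `mutual_information` is hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two discrete sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) / (p(x) * p(y)) simplifies to count * n / (count_x * count_y)
        mi += (count / n) * math.log(count * n / (count_x[x] * count_y[y]))
    return mi


# A feature that fully determines the target carries maximal information ...
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))
# ... while a feature that is independent of the target carries none
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```

Continuous features would need binning or a nearest-neighbour estimator, which this sketch does not cover.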

installation error

Unable to install it on Python 3.12.

Can you fix this, or help me to install it?
