
Automating Assumption Checks for Regression Models (Work in Progress, Currently Paused)

License: MIT License

Python 100.00%
statsassume regression assumption-check assumptions linear-regression logistic-regression data-science machine-learning statistics statsmodels


StatsAssume

Automating Assumption Checks for Regression Models (WORK IN PROGRESS)


Features | Download | Usage | Motivation | Contributing | Upcoming

Features

StatsAssume automates the assumption checks of regression models (e.g., linear and logistic regression) on your data and displays the results in an elegant dashboard. 

  • Automatically detects the regression task (and the relevant assumption checks) based on the target variable of the dataset.

  • Automatically executes statistical tests and visual plots of assumption checks relevant to the regression task.

  • Generates clear visual output of results in a beautiful dashboard (built on Jupyter-Dash).

  • Displays insightful information on assumption concepts and possible fixes for assumption violations.

  • Automatically encodes categorical variables to create a dataset suitable for regression modelling (unless specified otherwise).
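
The README does not say how the regression task is detected from the target variable. As a rough illustration only, one plausible heuristic is sketched below in plain Python. The function name `infer_task`, the 'logistic regression' task string, and the two-distinct-values rule are all assumptions made for this sketch, not StatsAssume's actual logic.

```python
from numbers import Number


def infer_task(target_values):
    """Guess a regression task from target values (hypothetical heuristic)."""
    distinct = set(target_values)
    if not distinct:
        raise ValueError("Target column is empty")
    # Exactly two distinct outcomes suggests a binary target.
    if len(distinct) == 2:
        return "logistic regression"
    # A purely numeric target with other cardinality suggests a continuous outcome.
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in distinct):
        return "linear regression"
    raise ValueError("Could not infer a regression task from the target values")


print(infer_task([242.0, 290.0, 340.0, 363.0]))
print(infer_task([0, 1, 1, 0, 1]))
```

With a pandas DataFrame, the same function could be called as `infer_task(df['Weight'].tolist())`.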



Download

pip install statsassume

Usage

Quickstart

from statsassume import Check
from statsassume.datasets import load_data

df = load_data('Fish_processed')  # Get toy dataset (pre-processed)

assume = Check(df, target='Weight')  # Initiate Check class and define target variable
assume.report()  # Run assumption checks and generate dashboard report

NOTE: Data should ideally be pre-processed before running StatsAssume assumption checks.

Toy datasets available in StatsAssume can be found HERE

Comprehensive Usage

  • While pre-processing should ideally be performed beforehand, StatsAssume automatically encodes categorical variables so that model runs and assumption checks can start quickly.
  • Here's how to put the Check class (core object of StatsAssume) to its best use:
from statsassume import Check
from statsassume.datasets import load_data

df = load_data('Fish')  # Get toy dataset (raw)

assume = Check(df=df, 
               target='Weight',
               task='linear regression',
               predictors=['Height', 'Width', 'Length1', 'Species'],
               keep=True,
               categorical_features=['Species'],
               categorical_encoder='ohe',
               mode='inline')

Attributes

  • df: pd.DataFrame
    Dataset (in pandas DataFrame format)

  • target: str
    Column name of target (dependent) variable

  • task: str
    Type of regression task to be performed. Options include: 'linear regression' (more tasks to come soon). If None, the task is determined automatically from the target variable.

  • predictors: list
    List of column names of predictor (independent) features. If None, all columns other than the target are treated as predictors.

  • keep: bool
    If True, variables in predictors list will be kept as predictor variables, and other non-target variables will be dropped. If False, variables in predictors list will be dropped, and other non-target variables will be retained. Default is True.

  • categorical_features: list
    List of column names to be treated as categorical, so that appropriate encoding can be performed. If None, categorical variables are detected automatically and encoded into numerical format for regression modelling. Default is None.

  • categorical_encoding: str
    Type of encoding technique to be performed on categorical variables. Options include: ohe (i.e. one-hot encoding) and ord (i.e. ordinal encoding). Default is ohe.

  • mode: str
    Type of display for dashboard report. Options include inline (displayed as output directly in Jupyter notebook), external (displayed in a new full-screen browser tab), or jupyterlab (displayed in separate tab right inside JupyterLab). Default is inline.
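
The interaction between predictors and keep described above can be sketched in plain Python. This is an illustration of the documented semantics, not StatsAssume's own code: the helper name `select_predictors` is hypothetical, and the `Length2` and `Length3` columns are assumed to exist in the raw Fish dataset.

```python
def select_predictors(columns, target, predictors=None, keep=True):
    """Return predictor column names following the documented predictors/keep rules."""
    # The target is never a predictor.
    candidates = [c for c in columns if c != target]
    # No list given: every non-target column is a predictor.
    if predictors is None:
        return candidates
    listed = set(predictors)
    if keep:
        # keep=True: retain only the listed columns.
        return [c for c in candidates if c in listed]
    # keep=False: drop the listed columns and retain the rest.
    return [c for c in candidates if c not in listed]


# Column names assumed for illustration.
columns = ["Species", "Weight", "Length1", "Length2", "Length3", "Height", "Width"]
chosen = ["Height", "Width", "Length1", "Species"]

print(select_predictors(columns, "Weight", chosen, keep=True))
print(select_predictors(columns, "Weight", chosen, keep=False))
```

The result preserves the column order of the dataset rather than the order of the predictors list, which is a choice made for this sketch.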

Notes

  • Only the df and target attributes are compulsory.
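
To make the two encoding options listed under Attributes (ohe and ord) concrete, here is a minimal plain-Python sketch of what each technique produces for a single categorical column. The function names and the sample species values are illustrative assumptions. How StatsAssume implements encoding internally (for example, whether it drops one one-hot level) is not stated in this README.

```python
def one_hot_encode(values):
    """One 0/1 indicator column per category, in sorted category order."""
    categories = sorted(set(values))
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


def ordinal_encode(values):
    """One integer code per category, assigned in sorted category order."""
    codes = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


# Sample values chosen for illustration.
species = ["Bream", "Perch", "Bream", "Pike"]

print(one_hot_encode(species))
print(ordinal_encode(species))
```

One-hot encoding adds a column per category and implies no ordering, whereas ordinal encoding keeps a single column but imposes an order on the categories, so it suits variables whose levels are genuinely ranked.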


Motivation

  • Tedious to perform assumption checks manually
  • Lack of rigour and consistency in references and notebooks online


Contributing

  1. Have a look at the existing Issues and Pull Requests to find something you would like to help with.
  2. Clone the repo ($ git clone https://github.com/kennethleungty/statsassume) and, inside it, create a new branch ($ git checkout -b name_of_new_branch).
  3. Make your changes and test them.
  4. Submit a Pull Request with a comprehensive description of the changes.

If you would like to request a feature or report a bug, please create a GitHub Issue.

See full contribution guide →

Upcoming
