
ismizu / lazy_lazypredict

Using LazyPredict, Feature Engine, and Feature Tools to narrow down base models while automating various aspects of the EDA process. Blog post at the link below.

Home Page: https://medium.com/geekculture/the-lazy-lazypredict-prediction-an-exercise-in-automated-python-libraries-f1adb4cd3f5a

Jupyter Notebook 100.00%
machine-learning eda exploratory-data-analysis lazypredict featureengineering

lazy_lazypredict's Introduction

Lazy LazyPredict: An Exercise in Automated Python Libraries

LazyPredict is an excellent Python library that automatically runs the user's data through a myriad of models. It then reports how each model performed, letting the user see which model might be best for their use case.

This is a great tool for saving time that might otherwise be spent manually testing different models. But while exploring it, I began to wonder what other manual steps I could accomplish purely with Python libraries. And with that thought, I dove into a test of lazier-than-LazyPredict predictions.


Folder Structure

  • Utilized data can be found in the /data folder, retrieved from Kaggle.com
  • All images can be found in the /images folder

The following libraries will be used:

  • LazyPredict
  • Feature Engine
  • Feature Tools

Cleaning the Data

Utilizing the MeanMedianImputer from Feature Engine:

  • After importing the data, I utilize an automatic imputer to fill in missing values for Age.

mean_impute.png

Working with Extreme Values

I then utilize Feature Engine's Winsorizer to cap extreme values.

First, I take a look at the values. I then run the Winsorizer and recheck the values to view its effect.

wins.png

Feature Engineering

Having done basic EDA, I'll now take a look at automated feature engineering with Feature Tools.

The first step will be to tell Feature Tools what each column datatype is. This will instruct it on how to create features from them. I'll also drop the Survived column and place it into a separate variable, as that is the value I am trying to predict.

categories.png

Next, I'll make an entity called survived. Feature Tools will utilize this to engineer new features.

I also create a relationship that I would like Feature Tools to explore.

entities.png

Having now set the relationships, I can run Feature Tools' Deep Feature Synthesis to engineer new features.

feature_engineering.png

Testing Models with Lazy Predict

Finally, I'll instantiate the LazyClassifier and run it using a train/test split.

After running, it creates the following table:

lazy_predict.png

Final Notes

Using Feature Tools, Feature Engine, and Lazy Predict, I was able to narrow my choice of model down to the AdaBoost Classifier and to draw on a slew of engineered features.

There were a few items I would have liked to look into further, but I felt they broke the spirit of using the three libraries for nearly 100% of the EDA/feature engineering process. Overall, I found this a great exercise in using the three libraries and feel that, with a little more tweaking and practice, they would be great additions to my usual data analysis process.
