Churn Prediction with Kedro Framework

This is a Kedro repository that tackles a data science challenge of predicting customer churn for a fictional financial institution. The goal is to build an effective pipeline for a production-ready Machine Learning model to forecast customer churn accurately.

To approach this problem, EDA, feature engineering, and model training and evaluation were first developed in Jupyter Notebooks. The notebooks are located in "churn-prediction-kedro/churn-prediction/notebooks/". Feel free to browse them and check the reasoning behind the solution before running the pipeline. :)

Exploratory Data Analysis

Feature Engineering

Model Training and Evaluation

Data Understanding:

  • The first dataset, named Abandono_clientes, contains 10,000 rows and 13 columns, including a target column "Exited" with binary data (1 if the customer has churned, 0 if not).
  • The second dataset, named Abandono_teste, consists of 1,000 rows and 12 columns, excluding the Exited column.

Key Concepts:

Customer Churn: Churn refers to the phenomenon of customers discontinuing their relationship with a company or service. In this context, it represents customers who have abandoned the financial institution.

Features: The dataset contains various features or attributes that provide information about the customers. Features include Row Number, Customer Id, Surname, Credit Score, Geography, Gender, Age, Tenure (duration of the customer's relationship with the bank), Balance, Number of Products Held, Has a Credit Card, Is Active Member, and Estimated Salary.

Exited: The target variable Exited indicates whether a customer has churned (1) or not (0).

Performance Metrics: To assess the effectiveness of the model, various evaluation metrics are used, including accuracy, precision, recall, F1-score, and AUC-ROC curve. These metrics help gauge the model's predictive capability and its ability to correctly identify customers who are likely to churn.
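As a rough illustration of the metrics named above, the sketch below computes accuracy, precision, recall, F1-score, and AUC-ROC in plain Python on a tiny made-up sample. The function names, the labels and scores, and the 0.5 threshold are all assumptions for illustration; they are not taken from the repository, which may well rely on a library such as scikit-learn instead.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 means churned."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 computed from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, y_score):
    """AUC-ROC as the share of (churner, non-churner) pairs in which the
    churner gets the higher score; ties count as half a win."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: true Exited labels and hypothetical churn probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.7, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, y_score)
print(metrics, auc)
```

Precision and recall matter here because churners are typically the minority class, so accuracy alone can look good even when few churners are caught. AUC-ROC is computed from the scores rather than the thresholded predictions, so it does not depend on the chosen cut-off.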

Getting started

Please note that this project was initially developed with Python 3.10.6 on Ubuntu.

Clone the repository

To clone the repository and set up the development environment, follow the steps below:

  1. Clone the repository using the command:

    git clone https://github.com/laizaparizotto/churn-prediction-kedro.git
    
  2. Change to the cloned repository directory:

    cd churn-prediction-kedro
    
  3. Create a virtual environment using venv:

    python -m venv .venv
    
  4. Activate the virtual environment:

    • For Windows:
      .venv\Scripts\activate
      
    • For macOS and Linux:
      source .venv/bin/activate
      

Now you have successfully cloned the repository and set up the virtual environment. You can proceed with the next steps as described in the project documentation.

Install Kedro

To install Kedro, change to the project directory and run the commands below. For more information, please check the Kedro Installation Documentation.

    cd churn-prediction/
    pip install kedro

Install dependencies

All necessary dependencies are located in src/requirements.txt.

To install them, run:

    pip install -r src/requirements.txt

How to run the pipeline

You can run the Kedro project with:

    kedro run

This runs the pipeline, which consists of data loading, preprocessing, training and evaluating a RandomForestClassifier, and finally predicting on the test set.

Final results will be stored at '/churn-prediction/data/07_model_output/resultado_teste.csv'.
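To make the flow of stages more concrete, here is a minimal plain-Python sketch of the idea: named steps that read their inputs from a shared in-memory catalog and write their output back to it. It does not use Kedro's API and is not the repository's code. The step functions, the catalog keys, the sample rows, and the majority-class "model" (a stand-in for the RandomForestClassifier) are all hypothetical.

```python
from collections import Counter


def preprocess(rows):
    """Placeholder preprocessing: drop rows that contain missing values."""
    return [r for r in rows if None not in r.values()]


def train(rows):
    """Stand-in 'model': the majority class of Exited.
    This is NOT a RandomForestClassifier, only a placeholder."""
    return Counter(r["Exited"] for r in rows).most_common(1)[0][0]


def evaluate(model, rows):
    """Accuracy of the constant prediction (on the training rows,
    purely to keep the sketch short)."""
    return sum(r["Exited"] == model for r in rows) / len(rows)


def predict(model, rows):
    """Predict the same class for every row of the test set."""
    return [model for _ in rows]


# Each step: (name, function, input catalog keys, output catalog key).
STEPS = [
    ("preprocess", preprocess, ["raw_train"], "clean_train"),
    ("train", train, ["clean_train"], "model"),
    ("evaluate", evaluate, ["model", "clean_train"], "accuracy"),
    ("predict", predict, ["model", "raw_test"], "predictions"),
]


def run(steps, catalog):
    """Run the steps in order, passing data through the catalog dict."""
    for name, func, inputs, output in steps:
        catalog[output] = func(*[catalog[key] for key in inputs])
    return catalog


# Made-up rows standing in for the training and test datasets.
catalog = run(STEPS, {
    "raw_train": [
        {"Age": 42, "Exited": 1},
        {"Age": 35, "Exited": 0},
        {"Age": None, "Exited": 0},
        {"Age": 51, "Exited": 0},
    ],
    "raw_test": [{"Age": 29}, {"Age": 60}],
})
print(catalog["accuracy"], catalog["predictions"])
```

In the real project, Kedro plays the role of `run` and the catalog: it resolves the order of nodes from their declared inputs and outputs and persists datasets such as the final predictions file.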

Interactive Visualization

You can access the interactive visualization with:

    kedro viz

The final pipeline can be seen below:
