
customersegmentationbertelsmann's Introduction

udacity_data_science_project4

Motivation

This project shows how customer data from the Bertelsmann/Arvato dataset can be used.
The data is used to complete two main tasks, which are described below.

Task 1 - Customer Segmentation: The aim of this task was to compare demographic data on the population of Germany with customer data from Bertelsmann/Arvato, and to use the similarities and differences to identify possible customers for a mail order campaign.
To achieve that, unsupervised learning methods were used. First, several dimension reduction methods were tested against each other. Afterwards, the KMeans clustering algorithm was used to cluster the data. PCA in combination with KMeans performed best. The population data was spread evenly across the clusters, whereas the customers were concentrated in just a few specific clusters.
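
The combination of dimension reduction and clustering described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual pipeline: the scaling step, the number of components, the number of clusters, and all variable names are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two data sets (hypothetical shapes and values).
population = rng.normal(size=(1000, 20))
customers = rng.normal(loc=0.8, size=(300, 20))

N_CLUSTERS = 4  # illustrative, not the value used in the project

# Scale the features, reduce the dimensionality with PCA, then cluster.
cluster_pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=5, random_state=0)),
    ("kmeans", KMeans(n_clusters=N_CLUSTERS, n_init=10, random_state=0)),
])

# Fit on the general population, then assign customers to the same clusters.
population_labels = cluster_pipeline.fit_predict(population)
customer_labels = cluster_pipeline.predict(customers)


def cluster_shares(labels, n_clusters):
    """Return the fraction of rows assigned to each cluster."""
    counts = np.bincount(labels, minlength=n_clusters)
    return counts / counts.sum()


population_share = cluster_shares(population_labels, N_CLUSTERS)
customer_share = cluster_shares(customer_labels, N_CLUSTERS)
print("population:", population_share.round(3))
print("customers: ", customer_share.round(3))
```

Comparing the two share vectors shows which clusters are over-represented among customers relative to the population.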

Task 2 - Mailorder Classification: For the second task, we received data from an email campaign that had already been run and for which the customers' responses were already known. The data was additionally enriched with the cluster labels from the clustering pipeline trained in Task 1.
An SVM classifier was used to classify the data. The data was highly imbalanced, a challenge that was addressed by upsampling the data.
The F1-Score on the training data was 0.7943.
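
One common way to upsample before training an SVM is sketched below. It assumes the minority class is resampled with replacement up to the size of the majority class. The source does not state the exact procedure, and the data and kernel choice here are synthetic placeholders.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.utils import resample

rng = np.random.default_rng(0)

# Synthetic, imbalanced stand-in for the campaign data (hypothetical).
X = rng.normal(size=(500, 5))
y = np.zeros(500, dtype=int)
y[:25] = 1          # 5% responders
X[y == 1] += 1.5    # shift the responders so the classes differ

X_minority, X_majority = X[y == 1], X[y == 0]

# Draw minority rows with replacement until both classes have equal size.
X_minority_up = resample(
    X_minority, replace=True, n_samples=len(X_majority), random_state=0
)

X_balanced = np.vstack([X_majority, X_minority_up])
y_balanced = np.concatenate([
    np.zeros(len(X_majority), dtype=int),
    np.ones(len(X_minority_up), dtype=int),
])

# Train the classifier on the balanced data (kernel is an illustrative choice).
classifier = SVC(kernel="rbf").fit(X_balanced, y_balanced)
print(classifier.predict(X[:5]))
```

Upsampling is best applied only to the training portion of a split, since duplicated rows that also appear in the validation data would inflate the score.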

Table 1: Confusion matrix for the classification

         P        N
P      353     9900
N       71    23890
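
As a sketch of how a per-class F1 score relates to a confusion matrix, the snippet below computes it from the four cells of Table 1. The table does not say which axis holds the predictions, so the cell names are assumptions. The reported F1-Score may use a different averaging or a different data split, so this sketch is not meant to reproduce it.

```python
def f1_from_counts(tp, fp, fn):
    """F1 score of one class from its true positive, false positive,
    and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Cells of Table 1, named by (row label, column label).
cell_pp, cell_pn = 353, 9900
cell_np, cell_nn = 71, 23890

# Which off-diagonal cell is "false positive" depends on the unknown axis
# orientation. F1 is symmetric in the two off-diagonal cells, so the
# per-class result is the same under either orientation.
f1_positive = f1_from_counts(tp=cell_pp, fp=cell_pn, fn=cell_np)
f1_negative = f1_from_counts(tp=cell_nn, fp=cell_np, fn=cell_pn)

print(f"F1 (class P): {f1_positive:.4f}")
print(f"F1 (class N): {f1_negative:.4f}")
```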

Libraries

The analysis makes use of widely used Python libraries, namely:

  • pandas
  • numpy
  • matplotlib
  • sklearn

In addition, pickle is used to save the results of the classification in Task 2.

Files

The repo contains one directory (data), where the results of the classification and the data from Udacity are stored. The Udacity data should not be checked in because of its large volume.

The code and the report are documented in Arvato Project Workbook.ipynb.

The two DIAS*.xlsx files describe the features, as well as the meaning of their values.

The data/RESULT_mailout_test.pkl file is a pickle file that stores the classification of the test dataset.
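
Saving and reloading results with pickle might look like the sketch below. The structure of the stored object and the file name are hypothetical, since the actual layout of RESULT_mailout_test.pkl is not described here.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical result object: one predicted label per test row.
predictions = {"row_id": [1001, 1002, 1003], "label": [0, 1, 0]}

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, written to a temporary directory.
    result_path = Path(tmp_dir) / "RESULT_example.pkl"

    # Serialize the results to disk.
    with result_path.open("wb") as fh:
        pickle.dump(predictions, fh)

    # Load them back to check the round trip.
    with result_path.open("rb") as fh:
        restored = pickle.load(fh)

print(restored)
```

Pickle files can execute arbitrary code when loaded, so they should only be opened when they come from a trusted source.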

Summary

A clustering pipeline was implemented, which consists of a dimension reduction step and a clustering step. The data was then enriched with the cluster labels it was assigned to, and a classification pipeline was trained on that data. The performance of the classifier was then evaluated on the training data set and optimized using cross-validation and grid search. The predicted labels for the test set are stored in data/RESULT_mailout_test.pkl.
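
The optimization with cross-validation and grid search could be set up along these lines. This is a minimal sketch on synthetic data. The parameter grid, the number of folds, the scaling step, and the scoring choice are assumptions, not the settings used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, moderately imbalanced stand-in for the training data.
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.8, 0.2], random_state=0
)

# Classification pipeline: scaling followed by an SVM.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("svm", SVC()),
])

# Illustrative grid; step names prefix the parameter names.
param_grid = {
    "svm__C": [0.1, 1, 10],
    "svm__kernel": ["linear", "rbf"],
}

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=cv)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best mean cross-validated F1:", round(search.best_score_, 4))
```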

Acknowledgments

Thanks to Bertelsmann and Arvato for providing this huge dataset and giving Udacity students the opportunity to experience such an interesting, real-life use case of customer segmentation.

Contributors

kaandrn
