
wanadzhar913 / rfm-analysis-using-online-retail2


Repo for RFM (Recency, Frequency & Monetary) analysis and customer segmentation using the Online Retail II dataset from the UCI Machine Learning Repository.


rfm-analysis-using-online-retail2's Introduction

TLDR

The Online Retail II data set contains all the transactions of a UK-based, registered, non-store online retail company between 1 December 2009 and 9 December 2011. The company mainly sells unique all-occasion gift-ware.

Using this data, I've attempted an RFM (Recency, Frequency & Monetary) analysis and clustered the company's customers to come up with marketing strategy recommendations.

Outcomes of clustering & recommendations

Cluster  Cluster Size  Recency (days)  Monetary  Frequency
0        1534          234.98          1915.75   4.97
1        1228          27.37           10464.66  18.89
2        1168          28.09           837.08    3.07
3        1948          389.99          311.87    1.31

  • Cluster 1 seems to be the best cluster, although it is the 2nd smallest. This group spends the most money and makes the most transactions, and on average had its last transaction 27 days ago. These are the regular customers, so loyalty programmes that encourage long-term patronage are a suitable campaign for them.

  • Cluster 0 is the next best cluster because its customers spend the 2nd highest amount of money. However, the customers in that group haven't purchased for a while (about 235 days on average). Hence, more targeted marketing efforts focused on building brand awareness and engagement are recommended for the long term. Over time, promotions (e.g. free shipping or discounts) can be offered to induce loyalty and patronage.

  • Cluster 2 can be characterised as customers who spend less money but have purchased recently (on average, their last transactions were about a month ago), even though they place only about 3 orders each. Promotions and discounts are likely to work well for this group to encourage them to occasionally spend more.

  • Customers in Cluster 3 have likely churned as they haven't spent much and haven't stopped by in a long time. This is worrisome as they make up the largest cluster.
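As a rough check on the ranking above, each cluster's share of customers and of estimated spend can be worked out from the table. This is only a sketch: it assumes the Monetary column is the mean spend per customer within each cluster (the table does not say so explicitly), and the function name `cluster_shares` is made up for illustration.

```python
# Figures copied from the cluster table above.
# Assumption: "monetary" is the mean spend per customer in each cluster.
clusters = {
    0: {"size": 1534, "recency": 234.98, "monetary": 1915.75, "frequency": 4.97},
    1: {"size": 1228, "recency": 27.37, "monetary": 10464.66, "frequency": 18.89},
    2: {"size": 1168, "recency": 28.09, "monetary": 837.08, "frequency": 3.07},
    3: {"size": 1948, "recency": 389.99, "monetary": 311.87, "frequency": 1.31},
}


def cluster_shares(table):
    """Return each cluster's share of customers and of estimated total spend."""
    total_customers = sum(row["size"] for row in table.values())
    # Estimated spend per cluster = number of customers * mean spend per customer.
    spend = {k: row["size"] * row["monetary"] for k, row in table.items()}
    total_spend = sum(spend.values())
    return {
        k: {
            "customer_share": table[k]["size"] / total_customers,
            "spend_share": spend[k] / total_spend,
        }
        for k in table
    }


shares = cluster_shares(clusters)
for k, s in shares.items():
    print(
        f"Cluster {k}: {s['customer_share']:.1%} of customers, "
        f"{s['spend_share']:.1%} of estimated spend"
    )
```

Under that assumption, Cluster 1 holds roughly a fifth of the customers but close to three quarters of the estimated spend, while Cluster 3 holds about a third of the customers and only a few percent of the spend.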

Methodology

The RFM metrics were calculated as follows:

  • Recency: The number of days since the last purchase for each customer.
  • Frequency: How often each customer makes a purchase.
  • Monetary: The total amount of money each customer has spent.
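The three definitions above can be sketched in pandas roughly as follows. This is not the notebook's code: the column names (`Invoice`, `Customer ID`, `InvoiceDate`, `Quantity`, `Price`) are assumed to match the Online Retail II file, Frequency is assumed to be the number of distinct invoices per customer, and `compute_rfm` and `LineTotal` are hypothetical names.

```python
import pandas as pd


def compute_rfm(df, snapshot_date=None):
    """Return one row per customer with Recency, Frequency and Monetary."""
    df = df.copy()
    df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])
    # Value of each invoice line (assumed: quantity times unit price).
    df["LineTotal"] = df["Quantity"] * df["Price"]

    # Reference date for recency; defaults to the day after the last transaction.
    if snapshot_date is None:
        snapshot_date = df["InvoiceDate"].max() + pd.Timedelta(days=1)
    snapshot_date = pd.Timestamp(snapshot_date)

    return df.groupby("Customer ID").agg(
        # Days between the reference date and the customer's last purchase.
        Recency=("InvoiceDate", lambda s: (snapshot_date - s.max()).days),
        # Assumption: one purchase = one distinct invoice number.
        Frequency=("Invoice", "nunique"),
        # Total amount spent across all invoice lines.
        Monetary=("LineTotal", "sum"),
    )


# Tiny made-up example, not real data from the set.
toy = pd.DataFrame({
    "Invoice": ["1001", "1001", "1002", "1003"],
    "Customer ID": [1, 1, 1, 2],
    "InvoiceDate": ["2011-12-01", "2011-12-01", "2011-12-08", "2011-11-09"],
    "Quantity": [2, 1, 3, 5],
    "Price": [1.5, 4.0, 2.0, 1.0],
})
rfm = compute_rfm(toy, snapshot_date="2011-12-10")
print(rfm)
```

Rows without a customer ID drop out of the `groupby` automatically, which is one reason the missing IDs discussed later matter for this analysis.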

After applying a logarithmic transform (since all the metrics above were right-skewed), we applied a Standard Scaler transformation so that each variable has zero mean and unit variance and the three metrics are on a comparable scale. We then used the elbow and silhouette methods to find the optimal number of clusters. Finally, the K-Means algorithm was used to cluster the customers by their RFM metrics.
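A minimal sketch of the transform-then-scale step, using scikit-learn's `StandardScaler`. Two assumptions are baked in: `np.log1p` (that is, log(1 + x)) stands in for the logarithm, since the exact transform is not stated, and all RFM values are non-negative. The helper name `preprocess_rfm` and the random toy data are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def preprocess_rfm(rfm):
    """Log-transform the RFM columns, then standardise them."""
    cols = ["Recency", "Frequency", "Monetary"]
    # log1p reduces right skew; assumes values are >= 0.
    logged = np.log1p(rfm[cols])
    # StandardScaler rescales each column to zero mean and unit variance.
    scaled = StandardScaler().fit_transform(logged)
    return pd.DataFrame(scaled, index=rfm.index, columns=cols)


# Synthetic, right-skewed toy data (not taken from the real data set).
rng = np.random.default_rng(0)
toy_rfm = pd.DataFrame({
    "Recency": rng.integers(1, 400, size=200),
    "Frequency": rng.integers(1, 30, size=200),
    "Monetary": rng.lognormal(mean=6, sigma=1, size=200),
})
scaled = preprocess_rfm(toy_rfm)
print(scaled.describe().round(2))
```

Scaling matters here because K-Means relies on Euclidean distance: without it, Monetary (values in the thousands) would dominate Frequency (values in the single digits).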

[Figure: num_cluster]
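Choosing the number of clusters with the elbow and silhouette methods might look like the sketch below. It runs on synthetic blobs rather than the real RFM matrix, and the helper `scan_k`, the range of k values and the `random_state` are assumptions, not the notebook's settings.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def scan_k(X, k_values, random_state=0):
    """Fit K-Means for each k and record inertia (elbow) and silhouette score."""
    results = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        results.append({
            "k": k,
            # Within-cluster sum of squares; plotted against k for the elbow method.
            "inertia": model.inertia_,
            # Mean silhouette coefficient; higher means better-separated clusters.
            "silhouette": silhouette_score(X, model.labels_),
        })
    return results


# Synthetic stand-in for the scaled RFM matrix: four well-separated groups in 3-D.
centers = [[0, 0, 0], [5, 5, 5], [-5, 5, 0], [5, -5, 0]]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.5, random_state=0)

results = scan_k(X, range(2, 8))
for r in results:
    print(f"k={r['k']}  inertia={r['inertia']:.1f}  silhouette={r['silhouette']:.3f}")

# Pick the k with the highest silhouette score, then fit the final model.
best_k = max(results, key=lambda r: r["silhouette"])["k"]
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)
```

On real data the two criteria do not always agree, so the inertia curve and the silhouette scores are usually read together rather than picking the maximum mechanically.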

Future Recommendations

  • Future explorations could benefit from inspecting whether there are any meaningful differences between customers with an ID and those without one. Analysis can be done on the types of products being bought, the number of items per purchase, as well as their overall purchase value.
  • The data set could benefit from better data quality by ensuring that each customer has an ID, as well as by enforcing stricter procedures for logging negative quantity/price values. Both missing IDs and negative values were prevalent in the data set. While we can reasonably assume that the negative values are returned/cancelled items, a column marking a transaction as a return (not just as a cancellation) would be useful.
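The data quality checks suggested above could start from a sketch like this one. It assumes the Online Retail II column names (`Invoice`, `Customer ID`, `Quantity`, `Price`) and that cancelled invoices carry a "C" prefix in the invoice number; the flag column names and `flag_quality_issues` are hypothetical, and the rows are made up.

```python
import pandas as pd


def flag_quality_issues(df):
    """Add boolean columns marking rows with the issues discussed above."""
    out = df.copy()
    out["missing_customer_id"] = out["Customer ID"].isna()
    out["negative_quantity"] = out["Quantity"] < 0
    out["negative_price"] = out["Price"] < 0
    # Assumption: cancellations are marked by an invoice number starting with "C".
    out["is_cancellation"] = out["Invoice"].astype(str).str.startswith("C")
    # Negative quantities that are not marked as cancellations: these are the
    # rows where an explicit "return" column would remove the guesswork.
    out["unflagged_negative"] = out["negative_quantity"] & ~out["is_cancellation"]
    return out


# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Invoice": ["489434", "C489449", "489464", "489500"],
    "Customer ID": [13085.0, 16321.0, None, 12345.0],
    "Quantity": [12, -1, -5, 3],
    "Price": [6.95, 2.10, 0.0, 1.25],
})
flagged = flag_quality_issues(toy)
flag_cols = [
    "missing_customer_id",
    "negative_quantity",
    "negative_price",
    "is_cancellation",
    "unflagged_negative",
]
print(flagged[flag_cols].sum())
```

Grouping the flagged frame by `missing_customer_id` would then be one way to compare customers with and without an ID on basket size or purchase value.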
