pradnya1208 / telecom-customer-churn-prediction

Customers in the telecom industry can choose from a variety of service providers and actively switch from one to the next. With the help of ML classification algorithms, we are going to predict the Churn.

Jupyter Notebook 100.00%
customer-churn-prediction customer-churn-analysis classification classification-algorithm feature-engineering cross-validation model-evaluation roc-auc gridsearchcv sklearn

Telecom Customer Churn Prediction

Intro

What is Customer Churn?

Customer churn occurs when customers or subscribers stop doing business with a firm or service.

Customers in the telecom industry can choose from a variety of service providers and actively switch from one to the next. The telecommunications business has an annual churn rate of 15-25 percent in this highly competitive market.

Individualized customer retention is tough because most firms have a large number of customers and can't afford to devote much time to each of them. The costs would be too great, outweighing the additional revenue. However, if a corporation could forecast which customers are likely to leave ahead of time, it could focus its retention efforts only on these "high risk" clients. The ultimate goal is to expand its coverage area and earn more customer loyalty. The key to success in this market lies with the customers themselves.

Customer churn is a critical metric because it is much less expensive to retain existing customers than it is to acquire new customers.

To detect early signs of potential churn, one must first develop a holistic view of the customers and their interactions across numerous channels. By addressing churn, these businesses may not only preserve their market position but also grow and thrive. The more customers they have in their network, the lower the cost of initiation and the larger the profit. As a result, the company's key focus for success is reducing client attrition and implementing an effective retention strategy.

Objectives:

  • Finding the % of customers who churn and of those who keep their active services.
  • Analysing the data in terms of the various features responsible for customer churn.
  • Finding the most suitable machine learning model for correctly classifying churn and non-churn customers.

Dataset:

Telco Customer Churn

The data set includes information about:

  • Customers who left within the last month – the column is called Churn
  • Services that each customer has signed up for – phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
  • Customer account information – how long they’ve been a customer, contract, payment method, paperless billing, monthly charges, and total charges
  • Demographic info about customers – gender, age range, and if they have partners and dependents

Implementation:

Libraries: sklearn, Matplotlib, pandas, seaborn, and NumPy

A few glimpses of the EDA:

1. Churn distribution:

Churn distribution

26.6% of customers switched to another firm.
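A churn share like the one above can be computed in a couple of lines with pandas. The sketch below is only illustrative: the tiny DataFrame is hypothetical toy data, and it assumes the label column is named `Churn` with `Yes`/`No` values, as the dataset description suggests.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset's "Churn" column.
df = pd.DataFrame({"Churn": ["No", "Yes", "No", "No", "Yes", "No", "No", "No"]})

# normalize=True returns the share of each class instead of raw counts.
churn_share = df["Churn"].value_counts(normalize=True).mul(100).round(1)
print(churn_share)

# Percentage of customers labelled as churned in the toy data.
churn_rate = churn_share["Yes"]
```

On the real data, the same expression applied to the full `Churn` column should give the percentage quoted in this section.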

2. Churn distribution with respect to gender:

Churn distribution wrt Gender

There is a negligible difference in the percentage/count of customers who changed service provider. Both genders behaved similarly when it comes to migrating to another service provider/firm.

3. Customer Contract distribution:

Customer contract distribution

About 75% of customers with a Month-to-Month contract opted to move out, compared to 13% of customers with a One Year contract and 3% with a Two Year contract.
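A per-contract churn rate can be obtained with a groupby. This is a minimal sketch on hypothetical toy rows, assuming the columns are named `Contract` and `Churn`; the helper column `churned` is introduced here only for illustration.

```python
import pandas as pd

# Hypothetical toy rows; column names are assumed to match the Telco dataset.
df = pd.DataFrame({
    "Contract": ["Month-to-month"] * 4 + ["One year"] * 4 + ["Two year"] * 2,
    "Churn": ["Yes", "Yes", "No", "No", "Yes", "No", "No", "No", "No", "No"],
})

# Encode the label as 0/1 so that the group mean equals the churn rate.
df["churned"] = (df["Churn"] == "Yes").astype(int)

# Churn rate (in percent) within each contract type.
rate_by_contract = df.groupby("Contract")["churned"].mean().mul(100)
print(rate_by_contract)
```

The same pattern works for the other categorical features in this section (payment method, internet service, tech support, and so on) by changing the groupby column.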

4. Payment Methods:

Distribution of Payments methods

Churn wrt payment methods

Most customers who moved out used Electronic Check as their payment method. Customers who opted for Credit Card automatic transfer, Bank automatic transfer, or Mailed Check were less likely to move out.

5. Internet services:

Many customers choose the Fiber optic service, and it is also evident that customers who use Fiber optic have a high churn rate; this might suggest dissatisfaction with this type of internet service. Customers with DSL service are the majority and have a lower churn rate than those with Fiber optic service.

Churn distribution w.r.t Internet services and Gender

6. Dependent distribution:

Customers without dependents are more likely to churn.

Churn distribution w.r.t dependents

7. Online Security:

As shown in the following graph, most customers who churn do not have online security.

Churn distribution w.r.t online security

8. Senior Citizen:

Senior citizens churn at a high rate, although they make up only a small share of the overall customer base.

Churn distribution w.r.t Senior Citizen

9. Paperless Billing:

Customers with Paperless Billing are more likely to churn.

Churn distribution w.r.t mode of billing

10. Tech support:

As shown in the following chart, customers with no TechSupport are more likely to migrate to another service provider.

Churn distribution w.r.t Tech support

11. Distribution w.r.t Charges and Tenure:

Monthly Charges Total Charges Tenure

Customers with higher Monthly Charges are also more likely to churn.
New customers are more likely to churn.

Machine Learning Model Evaluations and Predictions:

ML Algorithms

Results after K fold cross validation:

Logistic Regression KNN Naive Bayes Decision Tree Random Forest Adaboost Gradient Boost Voting Classifier

Confusion Matrix

Final Model: Voting Classifier

  • We have selected Gradient boosting, Logistic Regression, and Adaboost for our Voting Classifier.
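    from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier, VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    # Base estimators combined by the ensemble
    clf1 = GradientBoostingClassifier()
    clf2 = LogisticRegression()
    clf3 = AdaBoostClassifier()

    # Soft voting averages the predicted class probabilities of the three models
    eclf1 = VotingClassifier(estimators=[('gbc', clf1), ('lr', clf2), ('abc', clf3)], voting='soft')
    # X_train, y_train, X_test, y_test are assumed to come from an earlier train/test split
    eclf1.fit(X_train, y_train)
    predictions = eclf1.predict(X_test)
    print("Final Accuracy Score ")
    print(accuracy_score(y_test, predictions))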
Final Score 
{'LogisticRegression': [0.841331397558646, 0.010495252078550477],
 'KNeighborsClassifier': [0.7913242024807321, 0.008198993337848612],
 'GaussianNB': [0.8232386881685605, 0.00741678015498337],
 'DecisionTreeClassifier': [0.6470213137060805, 0.02196953973039052],
 'RandomForestClassifier': [0.8197874155380965, 0.011556155864106703],
 'AdaBoostClassifier': [0.8445838813774079, 0.01125665302188384],
 'GradientBoostingClassifier': [0.844630629931458, 0.010723107447558198],
 'VotingClassifier': [0.8468096379573085, 0.010887508320460332]}
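A dictionary of this shape, with one `[mean, standard deviation]` pair per model, can be produced with `cross_val_score`. The sketch below is not the notebook's actual code: it uses synthetic data from `make_classification`, only three of the models, and `scoring="roc_auc"` and 5 folds as assumptions, since the metric and fold count are not stated here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (assumption); the real features come from the Telco dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

models = [
    LogisticRegression(max_iter=1000),
    GaussianNB(),
    GradientBoostingClassifier(random_state=0),
]

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for model in models:
    # scoring="roc_auc" is an assumption; swap in "accuracy" if that is the target metric.
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    results[type(model).__name__] = [scores.mean(), scores.std()]

print(results)
```

Reporting the standard deviation next to the mean shows how much each model's score varies between folds, which helps when the means are as close as they are in the table above.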

  • Final confusion matrix we got:

From the confusion matrix we can see that there are a total of 1383+166=1549 actual non-churn values, and the algorithm predicts 1400 of them as non-churn and 149 of them as churn. There are 280+281=561 actual churn values, and the algorithm predicts 280 of them as non-churn and 281 of them as churn.
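As a worked example, the usual summary metrics follow directly from these counts. The sketch assumes the predicted split quoted in the sentence above (1400 and 149 for actual non-churn, 280 and 281 for actual churn) and treats churn as the positive class.

```python
# Counts taken from the predicted split stated in the text (assumption: churn is the positive class).
tn, fp = 1400, 149   # actual non-churn: predicted non-churn / predicted churn
fn, tp = 280, 281    # actual churn:     predicted non-churn / predicted churn

total = tn + fp + fn + tp
accuracy = (tn + tp) / total     # share of all predictions that are correct
precision = tp / (tp + fp)       # of predicted churners, how many really churned
recall = tp / (tp + fn)          # of actual churners, how many were caught

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
# prints: accuracy=0.797 precision=0.653 recall=0.501
```

With these counts the model finds only about half of the customers who actually churn, so recall may matter more than accuracy when judging further improvements.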

Optimizations

We could use hyperparameter tuning or feature engineering methods to improve the accuracy further.
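One way to approach the tuning step is a grid search with cross-validation. The following is a minimal sketch, not part of the original notebook: the data is synthetic, and the parameter grid, `cv=3`, and `scoring="roc_auc"` are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption); replace with the prepared Telco features.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Illustrative grid only; sensible ranges depend on the real data.
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

# Every parameter combination is evaluated with 3-fold cross-validation.
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="roc_auc",
)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```

The tuned estimator is available as `search.best_estimator_` and could replace the default `GradientBoostingClassifier()` inside the voting ensemble.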

Feedback

If you have any feedback, please reach out at [email protected]

🚀 About Me

Hi, I'm Pradnya! 👋

I am an AI Enthusiast and Data science & ML practitioner


telecom-customer-churn-prediction's Issues

AttributeError

#Set and compute the Correlation Matrix:
sns.set(style="white")
corr = data2.corr()

#Generate a mask for the upper triangle:

mask = np.zeros_like(corr, dtype=np.bool)
mask[np.triu_indices_from(mask)] = True

#Set up the matplotlib figure and a diverging colormap:
f, ax = plt.subplots(figsize=(18, 15))
cmap = sns.diverging_palette(220, 10, as_cmap=True)

#Draw the heatmap with the mask and correct aspect ratio:
sns.heatmap(corr, mask=mask, cmap=cmap, vmax=.3, center=0,square=True,annot = True, linewidths=.5, cbar_kws={"shrink": .5})

AttributeError Traceback (most recent call last)
Cell In[43], line 7
3 corr = data2.corr()
5 #Generate a mask for the upper triangle:
----> 7 mask = np.zeros_like(corr, dtype=np.bool)
8 mask[np.triu_indices_from(mask)] = True
10 #Set up the matplotlib figure and a diverging colormap:

File ~\anaconda3\Lib\site-packages\numpy\__init__.py:305, in __getattr__(attr)
300 warnings.warn(
301 f"In the future np.{attr} will be defined as the "
302 "corresponding NumPy scalar.", FutureWarning, stacklevel=2)
304 if attr in former_attrs:
--> 305 raise AttributeError(former_attrs[attr])
307 # Importing Tester requires importing all of UnitTest which is not a
308 # cheap import Since it is mainly used in test suits, we lazy import it
309 # here to save on the order of 10 ms of import time for most users
310 #
311 # The previous way Tester was imported also had a side effect of adding
312 # the full numpy.testing namespace
313 if attr == 'testing':

AttributeError: module 'numpy' has no attribute 'bool'.
np.bool was a deprecated alias for the builtin bool. To avoid this error in existing code, use bool by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.bool_ here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations @Pradnya1208
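Following the advice in the error message, a likely fix is to pass the builtin `bool` as the dtype. The sketch below reproduces only the mask-building step; the random DataFrame is a hypothetical stand-in for `data2`.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric frame standing in for `data2` from the report.
rng = np.random.default_rng(0)
data2 = pd.DataFrame(rng.normal(size=(50, 4)), columns=list("abcd"))
corr = data2.corr()

# Use the builtin bool: the np.bool alias was deprecated in NumPy 1.20 and removed in 1.24.
mask = np.zeros_like(corr, dtype=bool)

# Mark the upper triangle (including the diagonal) so the heatmap hides it.
mask[np.triu_indices_from(mask)] = True
print(mask)
```

The resulting `mask` can be passed to `sns.heatmap(corr, mask=mask, ...)` exactly as in the original cell.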
