lab-customer-analysis-round-4's Introduction

Lab | Customer Analysis Round 4

Today's lesson covered continuous distributions (mainly the normal distribution), linear regression, and how multicollinearity can affect a model. This lab tests those topics using the marketing_customer_analysis.csv file, the same data you used in the previous labs (rounds 2 and 3). You can continue working in the same Jupyter file. The data file is in the files_for_lab folder.

Get the data

Use the Jupyter file from the last lab (Customer Analysis Round 3).

Complete the following tasks

  • Check the data types of the columns. Put the numerical columns into a DataFrame called numerical and the categorical columns into a DataFrame called categoricals. (You can use np.number and object to select the numerical and categorical data types respectively. The np.object alias was removed in NumPy 1.24, so use the built-in object instead.)
  • Check the normality of the numerical variables visually:
    • Use the seaborn library to construct distribution plots for the numerical variables.
    • Use Matplotlib to construct histograms.
    • Do the distributions of the different numerical variables look like a normal distribution?
  • For the numerical variables, check the multicollinearity between the features. Note that the column total_claim_amount will be used later as the target variable.
  • Write code for both the correlation matrix and a seaborn heatmap. If two features show a high correlation with each other (greater than 0.9), drop one of the two. If no pair of features is that highly correlated, do not drop any features.
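The dtype split and the correlation check from the task list can be sketched roughly as follows. This is only an illustration under stated assumptions: the small DataFrame is a synthetic stand-in for marketing_customer_analysis.csv, the column `income_copy` is a hypothetical near-duplicate added so that one pair exceeds the 0.9 threshold, and all column names other than `total_claim_amount` are assumed.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for marketing_customer_analysis.csv (assumption, not the real data).
rng = np.random.default_rng(0)
n = 200
income = rng.normal(40_000, 12_000, n)
df = pd.DataFrame({
    "income": income,
    "monthly_premium_auto": rng.normal(90, 30, n),
    # Hypothetical near-duplicate of income, added to trigger the 0.9 rule.
    "income_copy": income * 1.02 + rng.normal(0, 50, n),
    "total_claim_amount": rng.normal(430, 290, n),
    "sales_channel": rng.choice(["Agent", "Branch", "Web"], n),
})

# Split by dtype. Excluding np.number keeps every non-numeric column;
# include=object is the closer match to the lab hint when text columns
# are stored with dtype object.
numerical = df.select_dtypes(include=np.number)
categoricals = df.select_dtypes(exclude=np.number)

# Correlation matrix of the features only; the target is left out of the check.
features = numerical.drop(columns="total_claim_amount")
corr = features.corr()

# Keep the strict upper triangle so each pair is inspected once,
# then flag the second column of any pair with |r| > 0.9.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
upper = corr.abs().where(mask)
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]

reduced = numerical.drop(columns=to_drop)
print("dropped:", to_drop)
```

In a notebook, `sns.heatmap(corr, annot=True)` would render the same matrix as a seaborn heatmap, and `sns.histplot(numerical[col], kde=True)` is one way to draw a distribution plot for a single column `col`.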

lab-customer-analysis-round-4's People

Contributors

haggarw3, ironhack-edu, sandrabosk

lab-customer-analysis-round-4's Issues

Task a bit vague

@sandrabosk Students freaked out a bit about the 'Normality check' task. Maybe make it clear that you can just check for normality visually, by plotting the distribution and seeing whether it is bell shaped. Also, maybe offer some extra material in case they want to check normality with a p-value or similar (the p-value class is on the day after this lab).
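As a possible piece of extra material, here is a minimal sketch of a p-value based normality check. It assumes SciPy is available and uses `scipy.stats.normaltest` (the D'Agostino-Pearson test) as one option among several; the two samples are synthetic and the helper name `normality_pvalue` is made up for this example.

```python
import numpy as np
from scipy import stats

# Synthetic samples (assumption): one bell shaped, one clearly right-skewed.
rng = np.random.default_rng(42)
bell_shaped = rng.normal(loc=50, scale=10, size=500)
right_skewed = rng.exponential(scale=10, size=500)


def normality_pvalue(values):
    """Return the p-value of the D'Agostino-Pearson normality test.

    A small p-value (for example below 0.05) suggests the sample is
    unlikely to come from a normal distribution.
    """
    _statistic, pvalue = stats.normaltest(values)
    return pvalue


p_bell = normality_pvalue(bell_shaped)
p_skewed = normality_pvalue(right_skewed)
print(f"bell shaped:  p = {p_bell:.3f}")
print(f"right-skewed: p = {p_skewed:.3g}")
```

A large p-value does not prove normality; it only means the test found no strong evidence against it, so the visual check remains useful alongside it.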

suggestion - new instructions

Regarding the upcoming Lab Customer Analysis Round 3: I inspected it and reworked the instructions. Please use the same data, but with these instructions:
For this lab, we keep using the marketing_customer_analysis.csv file. You can find the file in the files_for_lab folder.
Get the data
Use the same Jupyter file from the last lab, Customer Analysis Round 3.
EDA (Exploratory Data Analysis) - Complete the following tasks to explore the data:

  • Show DataFrame info.
  • Describe DataFrame.
  • Show a plot of the total number of responses broken down by response kind.
  • Show a plot of the response rate by sales channel.
  • Show a plot of the response rate by the total claim amount. Show a plot of the distribution of the total claim amount, broken down by response kind. Try a boxplot and distribution plot, for each response kind. For the distribution plot, try to plot both kinds of responses in one chart (you can try with seaborn's histplot, using the hue parameter).
  • Show a plot of the response rate by income. Create plots similar to those in the previous task, but for income.
  • NEW: Create a scatterplot between total claim amount and income. Play around with the parameters of the scatterplot (markersize? alpha?) and try to identify more features within the data just visually. You can also try different seaborn plots. Check to find suitable ones: https://www.python-graph-gallery.com/134-how-to-avoid-overplotting-with-python
  • flo rewrote the instructions above
