GithubHelp home page GithubHelp logo

8-week-sql-challenge's Introduction

🍔 8 Week SQL Challenge

Thanks @DataWithDanny for the excellent SQL case studies! 👋🏻

#8WeekSQLChallenge

📚 Table of Contents


Case Study #1: Danny's Diner

Image

View the case study here and my solution here and on Medium.

Business Task:

Danny wants to use the data to answer a few simple questions about his customers, especially about their visiting patterns, how much money they’ve spent and also which menu items are their favourite.

Entity Relationship Diagram:

image

Case Study Questions:

Click here to expand!
  1. What is the total amount each customer spent at the restaurant?
  2. How many days has each customer visited the restaurant?
  3. What was the first item from the menu purchased by each customer?
  4. What is the most purchased item on the menu and how many times was it purchased by all customers?
  5. Which item was the most popular for each customer?
  6. Which item was purchased first by the customer after they became a member?
  7. Which item was purchased just before the customer became a member?
  8. What is the total items and amount spent for each member before they became a member?
  9. If each $1 spent equates to 10 points and sushi has a 2x points multiplier - how many points would each customer have?
  10. In the first week after a customer joins the program (including their join date) they earn 2x points on all items, not just sushi - how many points do customer A and B have at the end of January?

Case Study #2: Pizza Runner

Image

View the case study here and my solution here and on Medium.

Business Task:

Danny is expanding his new Pizza Empire and at the same time, he wants to Uberize it, so Pizza Runner was launched!

Danny started by recruiting “runners” to deliver fresh pizza from Pizza Runner Headquarters (otherwise known as Danny’s house) and also maxed out his credit card to pay freelance developers to build a mobile app to accept orders from customers.

Entity Relationship Diagram:

image

Case Study Questions:

A. Pizza Metrics

Click here to expand!
  1. How many pizzas were ordered?
  2. How many unique customer orders were made?
  3. How many successful orders were delivered by each runner?
  4. How many of each type of pizza was delivered?
  5. How many Vegetarian and Meatlovers were ordered by each customer?
  6. What was the maximum number of pizzas delivered in a single order?
  7. For each customer, how many delivered pizzas had at least 1 change and how many had no changes?
  8. How many pizzas were delivered that had both exclusions and extras?
  9. What was the total volume of pizzas ordered for each hour of the day?
  10. What was the volume of orders for each day of the week?

B. Runner and Customer Experience

Click here to expand!
  1. How many runners signed up for each 1 week period? (i.e. week starts 2021-01-01)
  2. What was the average time in minutes it took for each runner to arrive at the Pizza Runner HQ to pickup the order?
  3. Is there any relationship between the number of pizzas and how long the order takes to prepare?
  4. What was the average distance travelled for each customer?
  5. What was the difference between the longest and shortest delivery times for all orders?
  6. What was the average speed for each runner for each delivery and do you notice any trend for these values?
  7. What is the successful delivery percentage for each runner?

C. Ingredient Optimisation

Click here to expand!
  1. What are the standard ingredients for each pizza?
  2. What was the most commonly added extra?
  3. What was the most common exclusion?
  4. Generate an order item for each record in the customers_orders table in the format of one of the following:
  • Meat Lovers
  • Meat Lovers - Exclude Beef
  • Meat Lovers - Extra Bacon
  • Meat Lovers - Exclude Cheese, Bacon - Extra Mushroom, Peppers
  1. Generate an alphabetically ordered comma separated ingredient list for each pizza order from the customer_orders table and add a 2x in front of any relevant ingredients
  2. For example: "Meat Lovers: 2xBacon, Beef, ... , Salami"
  3. What is the total quantity of each ingredient used in all delivered pizzas sorted by most frequent first?

D. Pricing and Ratings

Click here to expand!
  1. If a Meat Lovers pizza costs $12 and Vegetarian costs $10 and there were no charges for changes - how much money has Pizza Runner made so far if there are no delivery fees?
  2. What if there was an additional $1 charge for any pizza extras?
  • Add cheese is $1 extra
  1. The Pizza Runner team now wants to add an additional ratings system that allows customers to rate their runner, how would you design an additional table for this new dataset generate a schema for this new table and insert your own data for ratings for each successful customer order between 1 to 5.
  2. Using your newly generated table - can you join all of the information together to form a table which has the following information for successful deliveries?
  • customer_id
  • order_id
  • runner_id
  • rating
  • order_time
  • pickup_time
  • Time between order and pickup
  • Delivery duration
  • Average speed
  • Total number of pizzas
  1. If a Meat Lovers pizza was $12 and Vegetarian $10 fixed prices with no cost for extras and each runner is paid $0.30 per kilometre traveled - how much money does Pizza Runner have left over after these deliveries?

E. Bonus Questions

Click here to expand!

If Danny wants to expand his range of pizzas - how would this impact the existing data design? Write an INSERT statement to demonstrate what would happen if a new Supreme pizza with all the toppings was added to the Pizza Runner menu?


Case Study #3: Foodie-Fi

image

View the case study here and my solution here and on [Medium].

Business Task:

Danny and his friends launched a new startup Foodie-Fi and started selling monthly and annual subscriptions, giving their customers unlimited on-demand access to exclusive food videos from around the world.

This case study focuses on using subscription style digital data to answer important business questions on customer journey, payments, and business performances.

Entity Relationship Diagram:

image

Case Study Questions:

A. Customer Journey

Click here to expand!

Based off the 8 sample customers provided in the sample from the subscriptions table, write a brief description about each customer’s onboarding journey.

B. Data Analysis Questions

Click here to expand!
  1. How many customers has Foodie-Fi ever had?
  2. What is the monthly distribution of trial plan start_date values for our dataset - use the start of the month as the group by value
  3. What plan start_date values occur after the year 2020 for our dataset? Show the breakdown by count of events for each plan_name
  4. What is the customer count and percentage of customers who have churned rounded to 1 decimal place?
  5. How many customers have churned straight after their initial free trial - what percentage is this rounded to the nearest whole number?
  6. What is the number and percentage of customer plans after their initial free trial?
  7. What is the customer count and percentage breakdown of all 5 plan_name values at 2020-12-31?
  8. How many customers have upgraded to an annual plan in 2020?
  9. How many days on average does it take for a customer to an annual plan from the day they join Foodie-Fi?
  10. Can you further breakdown this average value into 30 day periods (i.e. 0-30 days, 31-60 days etc)
  11. How many customers downgraded from a pro monthly to a basic monthly plan in 2020?

C. Challenge Payment Question

Click here to expand!

The Foodie-Fi team wants you to create a new payments table for the year 2020 that includes amounts paid by each customer in the subscriptions table with the following requirements:

  • monthly payments always occur on the same day of month as the original start_date of any monthly paid plan
  • upgrades from basic to monthly or pro plans are reduced by the current paid amount in that month and start immediately
  • upgrades from pro monthly to pro annual are paid at the end of the current billing period and also starts at the end of the month period
  • once a customer churns they will no longer make payments

D. Outside The Box Questions

Click here to expand!

The following are open ended questions which might be asked during a technical interview for this case study - there are no right or wrong answers, but answers that make sense from both a technical and a business perspective make an amazing impression!

  1. How would you calculate the rate of growth for Foodie-Fi?
  2. What key metrics would you recommend Foodie-Fi management to track over time to assess performance of their overall business?
  3. What are some key customer journeys or experiences that you would analyse further to improve customer retention?
  4. If the Foodie-Fi team were to create an exit survey shown to customers who wish to cancel their subscription, what questions would you include in the survey?
  5. What business levers could the Foodie-Fi team use to reduce the customer churn rate? How would you validate the effectiveness of your ideas?

Case Study #4: Data Bank

Image

View the case study here and my solution here and on [Medium].

Business Task:

Danny launched a new initiative, Data Bank which runs just like any other digital bank - but it isn’t only for banking activities, they also have the world’s most secure distributed data storage platform!

Customers are allocated cloud data storage limits which are directly linked to how much money they have in their accounts. There are a few interesting caveats that go with this business model, and this is where the Data Bank team need your help!

The management team at Data Bank want to increase their total customer base - but also need some help tracking just how much data storage their customers will need.

This case study is all about calculating metrics, growth and helping the business analyse their data in a smart way to better forecast and plan for their future developments!

Entity Relationship Diagram:

image

Case Study Questions:

A. Customer Nodes Exploration

Click here to expand!
  1. How many unique nodes are there on the Data Bank system?
  2. What is the number of nodes per region?
  3. How many customers are allocated to each region?
  4. How many days on average are customers reallocated to a different node?
  5. What is the median, 80th and 95th percentile for this same reallocation days metric for each region?

B. Customer Transactions

Click here to expand!
  1. What is the unique count and total amount for each transaction type?
  2. What is the average total historical deposit counts and amounts for all customers?
  3. For each month - how many Data Bank customers make more than 1 deposit and either 1 purchase or 1 withdrawal in a single month?
  4. What is the closing balance for each customer at the end of the month?
  5. Comparing the closing balance of a customer’s first month and the closing balance from their second nth, what percentage of customers:
  • Have a negative first month balance?
  • Have a positive first month balance?
  • Increase their opening month’s positive closing balance by more than 5% in the following month?
  • Reduce their opening month’s positive closing balance by more than 5% in the following month?
  • Move from a positive balance in the first month to a negative balance in the second month?

Case Study #5: Data Mart

Image

View the case study here and my solution here and on [Medium].

Business Task:

Data Mart is an online supermarket that specialises in fresh produce.

In June 2020 - large scale supply changes were made at Data Mart. All Data Mart products now use sustainable packaging methods in every single step from the farm all the way to the customer.

Danny needs your help to analyse and quantify the impact of this change on the sales performance for Data Mart and it’s separate business areas.

The key business question to answer are the following:

  • What was the quantifiable impact of the changes introduced in June 2020?
  • Which platform, region, segment and customer types were the most impacted by this change?
  • What can we do about future introduction of similar sustainability updates to the business to minimise impact on sales?

Entity Relationship Diagram:

image

image

Here are some further details about the dataset:

  1. Data Mart has international operations using a multi-region strategy.
  2. Data Mart has both, a retail and online platform in the form of a Shopify store front to serve their customers.
  3. Customer segment and customer_type data relates to personal age and demographics information that is shared with Data Mart.
  4. transactions is the count of unique purchases made through Data Mart and sales is the actual dollar amount of purchases.

Each record in the dataset is related to a specific aggregated slice of the underlying sales data rolled up into a week_date value which represents the start of the sales week.

Case Study Questions:

A. Data Cleansing Steps

Click here to expand!

In a single query, perform the following operations and generate a new table in the data_mart schema named clean_weekly_sales:

  • Convert the week_date to a DATE format
  • Add a week_number as the second column for each week_date value, for example any value from the 1st of January to 7th of January will be 1, 8th to 14th will be 2 etc
  • Add a month_number with the calendar month for each week_date value as the 3rd column
  • Add a calendar_year column as the 4th column containing either 2018, 2019 or 2020 values
  • Add a new column called age_band after the original segment column using the following mapping on the number inside the segment value
image
  • Add a new demographic column using the following mapping for the first letter in the segment values:
segment demographic
C Couples
F Families
  • Ensure all null string values with an "unknown" string value in the original segment column as well as the new age_band and demographic columns
  • Generate a new avg_transaction column as the sales value divided by transactions rounded to 2 decimal places for each record

B. Data Exploration

Click here to expand!
  1. What day of the week is used for each week_date value?
  2. What range of week numbers are missing from the dataset?
  3. How many total transactions were there for each year in the dataset?
  4. What is the total sales for each region for each month?
  5. What is the total count of transactions for each platform
  6. What is the percentage of sales for Retail vs Shopify for each month?
  7. What is the percentage of sales by demographic for each year in the dataset?
  8. Which age_band and demographic values contribute the most to Retail sales?
  9. Can we use the avg_transaction column to find the average transaction size for each year for Retail vs Shopify? If not - how would you calculate it instead?

C. Before & After Analysis

Click here to expand!

This technique is usually used when we inspect an important event and want to inspect the impact before and after a certain point in time.

Taking the week_date value of 2020-06-15 as the baseline week where the Data Mart sustainable packaging changes came into effect.

We would include all week_date values for 2020-06-15 as the start of the period after the change and the previous week_date values would be before.

Using this analysis approach - answer the following questions:

  1. What is the total sales for the 4 weeks before and after 2020-06-15? What is the growth or reduction rate in actual values and percentage of sales?
  2. What about the entire 12 weeks before and after?
  3. How do the sale metrics for these 2 periods before and after compare with the previous years in 2018 and 2019?

D. Bonus Question

Click here to expand!

Which areas of the business have the highest negative impact in sales metrics performance in 2020 for the 12 week before and after period?

  • region
  • platform
  • age_band
  • demographic
  • customer_type

Do you have any further recommendations for Danny’s team at Data Mart or any interesting insights based off this analysis?


8-week-sql-challenge's People

Contributors

katiehuangx avatar

Stargazers

 avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.