GithubHelp home page GithubHelp logo

lab-sql-aggregation-and-transformation's Introduction

logo_ironhack_blue 7

Lab | SQL Data Aggregation and Transformation

Learning Goals

This lab allows you to practice and apply the concepts and techniques taught in class.

Upon completion of this lab, you will be able to:

  • Use SQL built-in functions such as COUNT, MAX, MIN, AVG to aggregate and summarize data, and use GROUP BY to group data by specific columns. Use the HAVING clause to filter data based on aggregate functions.

  • Use SQL to clean, transform, and prepare data for analysis by handling duplicates, null values, renaming columns, and converting data types. Use functions like ROUND, DATE_DIFF, CONCAT, and SUBSTRING to manipulate data and generate insights.

  • Use conditional expressions for creating new columns.



Prerequisites

Before this starting this lab, you should have learnt about:

  • SELECT, FROM, ORDER BY, LIMIT, WHERE, GROUP BY, and HAVING clauses.
  • DISTINCT keyword to return only unique values, AS keyword for using aliases.
  • Built-in SQL functions such as COUNT, MAX, MIN, AVG, ROUND, DATEDIFF, or DATE_FORMAT.
  • CASE statement for conditional logic.

Introduction

Welcome to the SQL Data Aggregation and Transformation lab!

In this lab, you will practice how to use SQL queries to extract insights from the Sakila database which contains information about movie rentals.

You will build on your SQL skills by practicing how to use the GROUP BY and HAVING clauses to group data and filter results based on aggregate values. You will also practice how to handle null values, rename columns, and use built-in functions like MAX, MIN, ROUND, DATE_DIFF, CONCAT, and SUBSTRING to manipulate and transform data for generating insights.

Throughout the lab, you will work with two SQL query files: sakila-schema.sql, which creates the database schema, and sakila-data.sql, which inserts the data into the database. You can download the necessary files locally by following the steps listed in Sakila sample database - installation.

You can also refer to the Entity Relationship Diagram (ERD) of the database to guide your analysis:


DB schema



Imagine you work at a movie rental company as an analyst. By using SQL in the challenges below, you are required to gain insights into different elements of its business operations.

Challenge 1

  1. You need to use SQL built-in functions to gain insights relating to the duration of movies:

    • 1.1 Determine the shortest and longest movie durations and name the values as max_duration and min_duration.
    • 1.2. Express the average movie duration in hours and minutes. Don't use decimals.
      • Hint: Look for floor and round functions.
  2. You need to gain insights related to rental dates:

    • 2.1 Calculate the number of days that the company has been operating.
      • Hint: To do this, use the rental table, and the DATEDIFF() function to subtract the earliest date in the rental_date column from the latest date.
    • 2.2 Retrieve rental information and add two additional columns to show the month and weekday of the rental. Return 20 rows of results.
    • 2.3 Bonus: Retrieve rental information and add an additional column called DAY_TYPE with values 'weekend' or 'workday', depending on the day of the week.
      • Hint: use a conditional expression.
  3. You need to ensure that customers can easily access information about the movie collection. To achieve this, retrieve the film titles and their rental duration. If any rental duration value is NULL, replace it with the string 'Not Available'. Sort the results of the film title in ascending order.

    • Please note that even if there are currently no null values in the rental duration column, the query should still be written to handle such cases in the future.
    • Hint: Look for the IFNULL() function.
  4. Bonus: The marketing team for the movie rental company now needs to create a personalized email campaign for customers. To achieve this, you need to retrieve the concatenated first and last names of customers, along with the first 3 characters of their email address, so that you can address them by their first name and use their email address to send personalized recommendations. The results should be ordered by last name in ascending order to make it easier to use the data.

Challenge 2

  1. Next, you need to analyze the films in the collection to gain some more insights. Using the film table, determine:
    • 1.1 The total number of films that have been released.
    • 1.2 The number of films for each rating.
    • 1.3 The number of films for each rating, sorting the results in descending order of the number of films. This will help you to better understand the popularity of different film ratings and adjust purchasing decisions accordingly.
  2. Using the film table, determine:
    • 2.1 The mean film duration for each rating, and sort the results in descending order of the mean duration. Round off the average lengths to two decimal places. This will help identify popular movie lengths for each category.
    • 2.2 Identify which ratings have a mean duration of over two hours in order to help select films for customers who prefer longer movies.
  3. Bonus: determine which last names are not repeated in the table actor.

Requirements

  • Fork this repo
  • Clone it to your machine

Getting Started

Complete the challenges in this readme in a .sqlfile.

Submission

  • Upon completion, run the following commands:
git add .
git commit -m "Solved lab"
git push origin master
  • Paste the link of your lab in Student Portal.

lab-sql-aggregation-and-transformation's People

Contributors

debironhack avatar sandrabosk avatar

Stargazers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.