
Data Engineering Exercise

Instructions

  1. Download the exercise.sql and exercise.tar.gz files
  2. Complete the exercise
  3. Email your optimized exercise.sql file and your answers to the questions to [email protected]
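The archive from step 1 has to be unpacked before the data can be loaded. Below is a minimal shell sketch. It assumes exercise.tar.gz is a gzip-compressed tar file containing the exercise.tsv file used in the environment steps. The function name extract_exercise is hypothetical.

```shell
# Hypothetical helper (not part of the exercise files): unpack the exercise
# archive into a target directory and list what was extracted.
# Assumes a gzip-compressed tar archive, as the .tar.gz extension suggests.
extract_exercise() {
  local archive="$1" dest="$2"
  mkdir -p "$dest"
  # -x extract, -z decompress with gzip, -f read from the given archive file,
  # -C change into the destination directory before extracting
  tar -xzf "$archive" -C "$dest"
  # List the destination so you can confirm exercise.tsv is present
  ls "$dest"
}
```

If you unpack directly into the directory you plan to use as secure_file_priv, you avoid a copy step later. For example:

     extract_exercise ~/Downloads/exercise.tar.gz /add_your_path_here

If the archive nests exercise.tsv inside a subdirectory, move the file up so it sits directly in that directory.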

Purpose

Here at Evolytics, we often receive requests to help optimize SQL. Although the original SQL generates the correct output, it consumes more resources than necessary to execute, and its poor readability makes the code difficult to update and maintain.

This data engineering exercise focuses on the ability to identify sub-optimal, poorly written SQL and refactor it to improve performance and readability. The exercise is entirely query-based, so we are not looking for solutions that incorporate stored procedures, user-defined functions, or external programming.

Any query-based solution that meets the following objectives will be considered in scope for this exercise.

Objective

Refactor the SQL with the following criteria in mind:

  1. Optimized query plan
  2. Enhanced code readability
  3. Easier code maintenance

Final Output

The final output of this exercise is an updated SQL file that returns exactly the same dataset as the original SQL.
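One way to confirm that the refactored SQL returns the same dataset is to run both files in batch mode and compare the results. Below is a minimal sketch with the following assumptions:

- It uses the evolytics database from the environment steps.
- The refactored file has the hypothetical name exercise_refactored.sql.
- Row order does not matter. Both outputs are sorted before comparison, which ignores order but keeps duplicate rows.

The helper name same_dataset is also hypothetical.

```shell
# Hypothetical helper (not part of the exercise files): succeed only if two
# result files contain exactly the same rows, ignoring row order but not
# duplicates. Prints "identical" or "different".
same_dataset() {
  local original="$1" refactored="$2" a b status
  a=$(mktemp)
  b=$(mktemp)
  # LC_ALL=C gives a byte-wise sort, so both files are ordered the same way
  LC_ALL=C sort "$original" > "$a"
  LC_ALL=C sort "$refactored" > "$b"
  if cmp -s "$a" "$b"; then
    echo "identical"
    status=0
  else
    echo "different"
    status=1
  fi
  rm -f "$a" "$b"
  return "$status"
}
```

For example, adding your usual connection options such as -u and -p:

     mysql --batch evolytics < exercise.sql > original.tsv
     mysql --batch evolytics < exercise_refactored.sql > refactored.tsv
     same_dataset original.tsv refactored.tsv

Batch mode prints a header row, so column names are compared along with the data. If the original query's ORDER BY is part of the expected output, compare the two files directly with cmp instead of sorting them.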

Environment

This exercise was written and tested using MySQL on a Mac with default install locations. All code and instructions needed to reproduce the environment are provided below. You are welcome to follow them as is or to use any other database platform you are comfortable with. Keep in mind that if you choose a different platform, you may need to modify both the environment instructions and the original SQL to match that platform's keywords and syntax.

All steps were executed on macOS Mojave.

  1. Download and install MySQL. We tested using version 8.0.11.

  2. Create or add entries to /etc/my.cnf so the load data infile command can read files from your chosen directory

     sudo nano /etc/my.cnf
    
     [mysqld]
     secure_file_priv=/add_your_path_here
    
  3. Create or add entries to ~/.bash_profile so MySQL is in the PATH variable

     nano ~/.bash_profile
    
     export PATH="/usr/local/mysql/bin:${PATH}"
    
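  4. Restart MySQL from System Preferences so the my.cnf changes take effect, and open a new terminal (or run source ~/.bash_profile) so the PATH change takes effect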

  5. Execute the following lines of code to create and populate the database. Modify the infile directory to match the secure_file_priv value set in /etc/my.cnf.

     create database evolytics;
    
     create table evolytics.exercise (visitor_id bigint, visit_num int, visit_start_timestamp datetime, hit_timestamp datetime, transaction_type varchar(6), transaction_action varchar(21));
    
     load data infile '/add_your_path_here/exercise.tsv' into table evolytics.exercise fields terminated by '\t' ignore 1 rows;
    
  6. Execute the original SQL to generate the dataset, then begin the refactoring exercise.
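The table in step 5 has six columns, and load data infile skips one header row. A quick check before loading can therefore catch a truncated or malformed exercise.tsv. Below is a minimal sketch. It assumes the file is tab-delimited with a single header line. The helper name check_tsv is hypothetical.

```shell
# Hypothetical helper (not part of the exercise files): print the number of
# data rows in a TSV file and fail if any data row does not have the expected
# number of tab-separated fields (default 6, matching evolytics.exercise).
check_tsv() {
  local file="$1" expected="${2:-6}"
  awk -F '\t' -v n="$expected" '
    NR == 1 { next }  # skip the header row, as IGNORE 1 ROWS does
    NF != n + 0 {
      bad++
      # report only the first few bad lines to keep output short
      if (bad <= 5) printf "line %d has %d fields\n", NR, NF
    }
    END {
      printf "%d data rows\n", (NR > 0 ? NR - 1 : 0)
      exit (bad > 0)
    }
  ' "$file"
}
```

For example:

     check_tsv /add_your_path_here/exercise.tsv

After loading, the following query should return the same number of rows as the check reported:

     select count(*) from evolytics.exercise;

Running show warnings; in the same session right after load data infile lists any values MySQL had to truncate or convert.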

Contributors

evo-evrmlya, evolytics-rnewton, rnewton2
