meazer / data-engineer-bootcamp-assessment

This project forked from augusto3lx/data-engineer-traineeship-assessment

License: MIT License

Scala 100.00%

Data Engineering bootcamp assessment

The goal of this test is to assess yourself and open a path to a clear and concise technical interview.

Remember, you already went through our screening and first interview; this is the beginning and preparation for a mutual technical conversation.

We do not expect you to know any of the tools used here. All of them are widely used, though, and well documented across the internet, so this is not an extremely complex job. The 8 weeks of the bootcamp will be very intensive, so being able to find your way around new concepts, technologies, and tools from day 0 is a very important ability; hence this assessment.

Hortonworks Data Platform Sandbox

Download the latest HDP sandbox and run it locally, following the instructions here to learn how to set it up. Then do the first tutorial, Learning the Ropes of the HDP Sandbox.

This was tested on a MacBook with 16 GB of RAM. Check your memory usage, or deploy in the cloud if you need more. I recommend allocating at least 8 GB to the VirtualBox VM.

Explain your steps and impressions in MyExperience.md.

Scala + Spark

Basic HDFS & Hive

Build a Scala application using Spark (you must use sbt as the build tool) and run it against the Sandbox Hive & Spark to do the following:

  • upload the .csv files in data-spark to HDFS
  • create a Hive table for each .csv file
  • output a Spark dataframe that contains DRIVERID, NAME, HOURS_LOGGED, and MILES_LOGGED, so you have aggregated information about each driver
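
As a rough sketch of how these three steps might fit together, the code below assumes two Hive tables named `drivers` and `timesheet` with columns `driverId`, `name`, `hours_logged`, and `miles_logged`. The table names, the column names, the `DriverSummary` object, and the `loadCsvAsTable` helper are all assumptions made for illustration, not part of the assignment; adjust them to the actual files in data-spark. It needs `spark-sql` and `spark-hive` on the classpath.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, sum}

object DriverSummary {

  /** Copies a local CSV into an HDFS directory and registers it as a table.
    * Hypothetical helper: paths and table name are supplied by the caller. */
  def loadCsvAsTable(spark: SparkSession, localCsv: String, hdfsDir: String, table: String): Unit = {
    val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
    val target = new Path(hdfsDir, new Path(localCsv).getName)
    // delSrc = false keeps the local file, overwrite = true replaces an old copy
    fs.copyFromLocalFile(false, true, new Path(localCsv), target)
    spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(target.toString)
      .write.mode("overwrite").saveAsTable(table)
  }

  /** Joins driver names to per-driver totals of hours and miles.
    * Assumed input columns: drivers(driverId, name),
    * timesheet(driverId, hours_logged, miles_logged). */
  def summarize(drivers: DataFrame, timesheet: DataFrame): DataFrame = {
    val totals = timesheet
      .groupBy(col("driverId"))
      .agg(
        sum(col("hours_logged")).as("HOURS_LOGGED"),
        sum(col("miles_logged")).as("MILES_LOGGED"))

    drivers
      .select(col("driverId"), col("name"))
      .join(totals, Seq("driverId"), "left") // keep drivers without timesheet rows
      .select(
        col("driverId").as("DRIVERID"),
        col("name").as("NAME"),
        col("HOURS_LOGGED"),
        col("MILES_LOGGED"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("driver-summary")
      .enableHiveSupport()
      .getOrCreate()

    // Assumes the tables were already created, e.g. via loadCsvAsTable.
    summarize(spark.table("drivers"), spark.table("timesheet")).show()
    spark.stop()
  }
}
```

Keeping `summarize` free of any session or table lookup means it can be exercised against small in-memory dataframes, which is convenient for the unit tests requested further down.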

Besides the code in a repo, explain your steps and impressions in MyExperience.md.

HBase

Extend the Scala application above so that it can:

  • create a table dangerous_driving on HBase
  • load dangerous-driver.csv
  • add a 4th entry to the table from extra-driver.csv
  • update id = 4 to display routeName as Los Angeles to Santa Clara instead of Santa Clara to San Diego
  • output to the console the name of the driver, the type of event, and the event time if the origin or destination is Los Angeles
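
The last two steps are mostly plain logic around the `routeName` value, so they can be sketched without any HBase calls. The following is a minimal model on Scala case classes, assuming `routeName` always has the form "origin to destination" as in the example above. The `DrivingEvent` fields other than `id` and `routeName`, and all function names, are assumptions for illustration; reading from and writing to HBase is deliberately left out.

```scala
// Hypothetical in-memory view of one row of dangerous_driving.
final case class DrivingEvent(
    id: String,
    driverName: String,
    eventType: String,
    eventTime: String,
    routeName: String)

object DangerousDriving {

  /** Splits "A to B" into (origin, destination); None if the pattern does not match. */
  def endpoints(routeName: String): Option[(String, String)] =
    routeName.split(" to ", 2) match {
      case Array(origin, destination) => Some((origin.trim, destination.trim))
      case _                          => None
    }

  /** True if the event's route starts or ends in the given city. */
  def touches(city: String, event: DrivingEvent): Boolean =
    endpoints(event.routeName).exists { case (origin, destination) =>
      origin == city || destination == city
    }

  /** Returns a copy of the events with the route of one id replaced. */
  def rename(events: Seq[DrivingEvent], id: String, newRoute: String): Seq[DrivingEvent] =
    events.map(e => if (e.id == id) e.copy(routeName = newRoute) else e)

  /** One tab-separated console line (name, event type, event time) per matching event. */
  def report(events: Seq[DrivingEvent], city: String): Seq[String] =
    events
      .filter(e => touches(city, e))
      .map(e => s"${e.driverName}\t${e.eventType}\t${e.eventTime}")
}
```

With rows loaded into a `Seq[DrivingEvent]` called `events`, something like `DangerousDriving.report(DangerousDriving.rename(events, "4", "Los Angeles to Santa Clara"), "Los Angeles").foreach(println)` would print the matching lines. Matching on the parsed endpoints rather than a substring avoids false hits on cities whose names merely contain "Los Angeles".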

Same thing here: besides the code in a repo, explain your steps and impressions in MyExperience.md.

Extra

  • Deliver a containerized Scala app
  • Write at least two unit tests before building the Scala app

Questions and Submission

Clone this repository and start working with your preferred git tool. When you are done, commit and push your solution and send us the link.
Feel free to reach out to Thiago de Faria.

