At Datafiniti, we have several million product records in our database, collected from retailers across the internet. We're tasked with keeping those records as up-to-date as we can, which means our data pipeline needs to be blazing fast at moving data from the internet into our database. This coding challenge exposes you to a miniature prototype of our pipeline and tasks you with speeding up the rate at which data moves through it.

To reiterate, the goal is to get data, in the form of JSON objects, from point A to point B as fast as possible. These JSON objects begin by being enqueued in a shared in-memory cache (Redis in this case). We provide you with code that simply dequeues records one at a time and imports them into Elasticsearch. Your objective is to do whatever it takes to increase the rate at which records are inserted into Elasticsearch. You can complete this challenge by improving either the Java or Node.js code we provide. Feel free to change any part of this codebase, or to introduce or replace technologies, in order to increase import rates. The only rule of this challenge is that records must be enqueued somewhere and then imported into Elasticsearch. This may seem daunting at first, but don't worry: we provide plenty of tools, which we explain in the sections below.
### Docker

- If you don't already have Docker, you can download and install it from the following links:
- For those of you on Windows or a Mac: increase the memory allocated to Docker to at least 4GB. This setting can be found within Docker's preferences. Feel free to reach out if you don't have that much memory available on your machine. Linux users don't need to worry about this, because there's no virtual machine running between your host OS and your Docker containers.
### Setup

- Fork this repository, clone it down, and `cd` into it.
- The repository contains a Docker composition that sets up the following containers:
  - Redis
  - Elasticsearch
  - Kibana, a web UI that you can use to monitor import metrics in order to benchmark your solution
  - A dev container with either Java or Node.js installed, depending on which you decide to use
- Follow the instructions below for your language of choice.
### Node.js

- From within the `coding-challenge` directory, run `./bin/setup-nodejs.sh`.
- Once the output is done printing to the terminal, click on this link.
- Click the Monitoring button on the far left.
- Click **Enable Monitoring**.
- Open another terminal and type in `docker exec -it node-dev bash`.
  - This is equivalent to `ssh`ing into a virtual machine that has Node.js installed.
- To run the code we've provided, run the following commands:

  ```
  cd code
  npm install
  node baseline.js
  ```

  - This code will seed 10,000 product records into Redis and then begin to import one record at a time into Elasticsearch.
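The dequeue-one, index-one cycle described above can be sketched as follows. This is an illustrative sketch, not the actual baseline code: `queue.lpop` and `es.index` stand in for whatever Redis and Elasticsearch client calls the provided code uses.

```javascript
// One-at-a-time import: every record waits for the previous record's
// full Elasticsearch round trip before it is even dequeued.
async function importOneAtATime(queue, es) {
  let imported = 0;
  for (;;) {
    const raw = await queue.lpop('records'); // dequeue a single JSON string
    if (raw == null) break;                  // queue is drained
    const doc = JSON.parse(raw);
    await es.index({ index: 'records', body: doc }); // one request per record
    imported += 1;
  }
  return imported;
}
```

Notice that the loop is entirely sequential: the per-record network latency is the dominant cost, which is what the Indexing Rate graph will reflect.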
- Head over to the browser window that you have Kibana open in.
- Click on Indices and then `records`.
- Watch the Indexing Rate graph to see how fast the provided solution is.
  - You'll be using this graph to benchmark your solution.
- Your solution will be assessed by first running `baseline.js` and then your solution. If the graph indicates that your solution is indexing records faster than the baseline, then congratulations: you did it!
- Include a file named `solution.js` that runs your solution. Including more than one file is perfectly fine, so long as we can kick everything off by running `node solution.js`. Additionally, feel free to modify any and all parts of the code provided in this challenge.
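For context on why the baseline is slow: it pays one network round trip per record. One common alternative (an example approach, not a requirement of the challenge) is to group records and send each group through Elasticsearch's `_bulk` API. The batching half of that idea is pure logic, and might look something like:

```javascript
// Split a flat list of records into fixed-size batches so each
// Elasticsearch request can carry many documents instead of one.
function toBatches(records, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < records.length; i += batchSize) {
    batches.push(records.slice(i, i + batchSize));
  }
  return batches;
}
```

Each batch would then be serialized into the newline-delimited format the `_bulk` endpoint expects; the batch size is something you'd tune against the Indexing Rate graph in Kibana.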
### Java

- From within the `coding-challenge` directory, run `./bin/setup-java.sh`.
- Once the output is done printing to the terminal, click on this link.
- Click the Monitoring button on the far left.
- Click **Enable Monitoring**.
- Open another terminal and type in `docker exec -it java-dev bash`.
  - This is equivalent to `ssh`ing into a virtual machine that has Java installed.
- To run the code we've provided, run the following commands:

  ```
  cd code
  # This command will compile the baseline solution we provide to you and run it.
  ./bin/run.sh
  ```

  - This code will seed 10,000 product records into Redis and then begin to import one record at a time into Elasticsearch.
- Head over to the browser window that you have Kibana open in.
- Click on Indices and then `records`.
- Watch the Indexing Rate graph to see how fast the provided solution is.
  - You'll be using this graph to benchmark your solution.
- Your solution will be assessed by first running `Baseline` and then your solution. If the graph indicates that your solution is indexing records faster than the baseline, then congratulations: you did it!
- Provide a way for us to start your solution in a class named `Solution`. You can include all of your solution in that class or create additional classes as you see fit. Just make sure that we can run your solution from the `Main` class by doing something like `Solution.start()` or `Solution.run()`. Additionally, feel free to modify any and all parts of the code provided in this challenge.
  - You can run your code using `./bin/run.sh`, so long as you run your solution from within `Main`.
- Once you're done, from within the `coding-challenge` directory, run `./bin/teardown.sh` in order to delete all containers and volumes.
- Send over an email with a link to your forked repo and we'll take a look ASAP!
- Speeding up the import rate does not require some fancy algorithm or data structure.
- We will be trying your solution against different amounts of seeded records, likely somewhere between 10,000 and 50,000. Make sure your solution isn't hardcoded for just 10,000 records.
Good Luck!