At Datafiniti, we have several million product records in our database, collected from retailers across the internet. We're tasked with keeping those records as up-to-date as we can, which means our data pipeline needs to be blazing fast at moving data from the internet into our database. This coding challenge exposes you to a miniature prototype of our pipeline and tasks you with speeding up the rate at which data moves through it.

To reiterate, the goal is to get data, in the form of JSON objects, from point A to point B as fast as possible. These JSON objects begin by being enqueued in a shared in-memory cache (Redis in this case). We provide you with code that simply dequeues records one at a time and imports them into Elasticsearch. Your objective is to do whatever it takes to increase the rate at which records are inserted into Elasticsearch. You can complete this challenge by improving either the Java or Node.js code we provide. Feel free to change any part of this codebase, or to introduce or replace technologies, in order to increase import rates. The only rule of this challenge is that records must be enqueued somewhere and then imported into Elasticsearch. This may seem daunting at first, but don't worry: we provide plenty of tools, which we explain in the sections below.
### Docker

- If you don't already have Docker, you can download and install it from the following links:
- For those of you on Windows or a Mac: increase the memory allocated to Docker to at least 4GB. This setting can be found within Docker's preferences. Feel free to reach out if you don't have that much memory available on your machine. Linux users don't need to worry about this, because there's no virtual machine running between your host OS and your Docker containers.
### Setup

- Fork this repository, clone it down, and `cd` into it.
- The repository contains a Docker composition that sets up the following containers:
  - Redis
  - Elasticsearch
  - Kibana, a web UI that you can use to monitor import metrics in order to benchmark your solution
  - A dev container with either Java or Node.js installed, depending on which you decide to use
- Follow the instructions below for your language of choice.
### Node.js

- From within the `coding-challenge` directory, run `./bin/setup-nodejs.sh`.
- Once the output is done printing to the terminal, click on this link.
- Click the Monitoring button on the far left.
- Click **Enable Monitoring**.
- Open another terminal and type in `docker exec -it node-dev bash`.
  - This is equivalent to `ssh`ing into a virtual machine that has Node.js installed.
- To run the code we've provided, run the following commands:

  ```
  cd code
  npm install
  node baseline.js
  ```

  - This code will seed 10,000 product records into Redis and then begin to import one record at a time into Elasticsearch.
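The dequeue-one, index-one cycle described above can be sketched as follows. This is an illustrative sketch, not the actual baseline code: `queue.lpop` and `es.index` stand in for whatever Redis and Elasticsearch client calls the provided code uses.

```javascript
// One-at-a-time import: every record waits for the previous record's
// full Elasticsearch round trip before it is even dequeued.
async function importOneAtATime(queue, es) {
  let imported = 0;
  for (;;) {
    const raw = await queue.lpop('records'); // dequeue a single JSON string
    if (raw == null) break;                  // queue is drained
    const doc = JSON.parse(raw);
    await es.index({ index: 'records', body: doc }); // one request per record
    imported += 1;
  }
  return imported;
}
```

Notice that the loop is entirely sequential: the per-record network latency is the dominant cost, which is what the Indexing Rate graph will reflect.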
- Head over to the browser window that you have Kibana open in.
- Click on Indices and then `records`.
- Watch the Indexing Rate graph to see how fast the provided solution is.
  - You'll be using this graph to benchmark your solution.
- Your solution will be assessed by first running `baseline.js` and then your solution. If the graph indicates that your solution is indexing records faster than the baseline, then congratulations: you did it!
- Include a file named `solution.js` that runs your solution. Including more than one file is perfectly fine, so long as we can kick everything off by running `node solution.js`. Additionally, feel free to modify any and all parts of the code provided in this challenge.
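For context on why the baseline is slow: it pays one network round trip per record. One common alternative (an example approach, not a requirement of the challenge) is to group records and send each group through Elasticsearch's `_bulk` API. The batching half of that idea is pure logic, and might look something like:

```javascript
// Split a flat list of records into fixed-size batches so each
// Elasticsearch request can carry many documents instead of one.
function toBatches(records, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < records.length; i += batchSize) {
    batches.push(records.slice(i, i + batchSize));
  }
  return batches;
}
```

Each batch would then be serialized into the newline-delimited format the `_bulk` endpoint expects; the batch size is something you'd tune against the Indexing Rate graph in Kibana.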
### Java

- From within the `coding-challenge` directory, run `./bin/setup-java.sh`.
- Once the output is done printing to the terminal, click on this link.
- Click the Monitoring button on the far left.
- Click **Enable Monitoring**.
- Open another terminal and type in `docker exec -it java-dev bash`.
  - This is equivalent to `ssh`ing into a virtual machine that has Java installed.
- To run the code we've provided, run the following commands:

  ```
  cd code
  # This command will compile the baseline solution we provide to you and run it.
  ./bin/run.sh
  ```

  - This code will seed 10,000 product records into Redis and then begin to import one record at a time into Elasticsearch.
- Head over to the browser window that you have Kibana open in.
- Click on Indices and then `records`.
- Watch the Indexing Rate graph to see how fast the provided solution is.
  - You'll be using this graph to benchmark your solution.
- Your solution will be assessed by first running `Baseline` and then your solution. If the graph indicates that your solution is indexing records faster than the baseline, then congratulations: you did it!
- Provide a way for us to start your solution in a class named `Solution`. You can include all of your solution in that class or create additional classes as you see fit. Just make sure that we can run your solution from the `Main` class by doing something like `Solution.start()` or `Solution.run()`. Additionally, feel free to modify any and all parts of the code provided in this challenge.
  - You can run your code using `./bin/run.sh`, so long as you run your solution from within `Main`.
- Once you're done, from within the `coding-challenge` directory, run `./bin/teardown.sh` in order to delete all containers and volumes.
- Send over an email with a link to your forked repo and we'll take a look ASAP!
- Speeding up the import rate does not require some fancy algorithm or data structure.
- We will be trying your solution against different amounts of seeded records, likely somewhere between 10,000 and 50,000. Make sure your solution isn't hardcoded for just 10,000 records.
Good Luck!