travaux.com technical challenge
This work is based on:
- embulk for the Extract/Load steps: configuration can be found in the /embulk/ directory
- postgres for storage and as the SQL engine
- dbt for the Transformation step: the project can be found under dbt/travaux/
Thanks to the repository's Dockerfile, one can build the tech challenge environment like this:
docker build -t travaux-tech-challenge .
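The actual Dockerfile sits at the repository root; purely as an illustration of the kind of image being assembled here (the base image, package names and layout below are assumptions, not the real file), it might look roughly like this:

```dockerfile
# Illustrative sketch only -- the repository's real Dockerfile may differ.
FROM openjdk:11-jre-slim

# postgres for storage; python for dbt
RUN apt-get update && apt-get install -y postgresql python3-pip \
    && pip3 install dbt-postgres

# embulk handles Extract/Load, dbt the Transformation, metabase the visualization
COPY embulk/ /embulk/
COPY dbt/ /dbt/
COPY metabase/ /metabase/

# postgres (5432), dbt docs server (8080), metabase (3000)
EXPOSE 5432 8080 3000
```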
Please be patient...
The built environment can be launched with the following command:
docker run -d -p 5432:5432 -p 8080:8080 -p 3000:3000 travaux-tech-challenge
Then, please get the running container id like this:
docker ps
and launch a terminal session in the environment with this container id:
sudo docker exec -it CONTAINER-ID bash
We are now going to use this terminal to process our data.
This will extract the data from the event_log.csv file and load it into the postgres instance:
cd /embulk
java -jar embulk.jar run event_log.yml
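The real configuration is /embulk/event_log.yml; as a sketch of what a CSV-to-postgres embulk config typically looks like (the column names, credentials and table name below are assumptions, not the challenge's actual file), it would follow this shape:

```yaml
# Hypothetical embulk config -- see /embulk/event_log.yml for the real one.
in:
  type: file
  path_prefix: /embulk/event_log.csv
  parser:
    type: csv
    skip_header_lines: 1
    columns:
      - {name: event_id, type: long}        # assumed column
      - {name: event_type, type: string}    # assumed column
      - {name: created_at, type: timestamp, format: '%Y-%m-%d %H:%M:%S'}
out:
  type: postgresql
  host: localhost
  user: postgres        # assumed credentials
  password: ''
  database: travaux     # assumed database name
  table: event_log
  mode: insert
```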
Now, we can use dbt to transform the data, from the raw event_log data up to the end-user datasets:
cd /dbt/travaux
dbt run --profiles-dir .
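Each dbt model is a SELECT statement that dbt materializes as a table or view in postgres. The actual models live under dbt/travaux/models/; a hypothetical staging model (the file, source and column names are assumptions, not the challenge's real project) could look like:

```sql
-- models/staging/stg_event_log.sql (hypothetical name)
-- Deduplicate and type the raw rows loaded by embulk.
select
    event_id,
    event_type,
    created_at::timestamp as created_at
from {{ source('public', 'event_log') }}
```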
The following command will validate all the transformation steps:
dbt test --profiles-dir .
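dbt tests are declared next to the models in YAML; a sketch of such a declaration (the model and column names are assumptions) looks like:

```yaml
# models/schema.yml (hypothetical) -- `dbt test` runs these checks.
version: 2
models:
  - name: stg_event_log        # assumed model name
    columns:
      - name: event_id
        tests:
          - unique
          - not_null
```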
To learn how data is transformed, one can follow these steps:
- Generate the documentation:
dbt docs generate --profiles-dir .
- Start the embedded dbt web server:
dbt docs serve --profiles-dir . &
- Open a web browser at
http://localhost:8080
Finally, we can explore the data further in a data visualization tool:
cd /metabase
java -jar metabase.jar &
- Open a web browser at
http://localhost:3000
- Authenticate with the following parameters:
[email protected]
/travaux1
Once finished, the environment can be stopped like this:
docker stop CONTAINER-ID
This Dockerfile is the first one I have ever built, and it most likely does not follow best practices...