The "Learn Druid" repository contains many resources to help you learn and apply Apache Druid.
It contains:
- Jupyter Notebooks that guide you through query, ingestion, and data management with Apache Druid.
- A Docker Compose file to get you up and running with a learning lab.
To use the "Learn Druid" Docker Compose environment, you need:

- Git or GitHub Desktop.
- Docker Desktop with Docker Compose.
- A machine with at least 6 GiB of RAM.

More resources are better: the notebooks have been tested with 6 CPUs, 8 GB of RAM, and 1 GB of swap available to Docker.
To get started quickly:

- Clone this repository locally, if you have not already done so:

      git clone https://github.com/implydata/learn-druid

- Navigate to the directory:

      cd learn-druid

  To refresh your local copy with the latest notebooks:

      git restore .
      git pull
- Launch the "Learn Druid" Docker environment:

      docker compose --profile druid-jupyter up -d

  The first time you launch the environment, it can take a while to start all the services.
- Navigate to Jupyter Lab in your browser. From there, you can read the introduction or use Jupyter Lab to browse the notebooks folder.
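The notebooks walk you through queries step by step. As a minimal sketch of the kind of call they make, the snippet below builds a request for Druid's SQL API (`/druid/v2/sql`). The router address `localhost:8888` is an assumption for a typical local setup; adjust it to wherever your Druid router is exposed.

```python
import json
from urllib.request import Request, urlopen

# Assumed address for the Druid router; adjust to match your environment.
DRUID_SQL_ENDPOINT = "http://localhost:8888/druid/v2/sql"

def build_sql_request(query: str) -> Request:
    """Build a POST request for Druid's SQL API."""
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(
        DRUID_SQL_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
    )

# List the tables Druid knows about via the INFORMATION_SCHEMA.
req = build_sql_request("SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES")

# With the druid-jupyter environment running, you could execute it:
# with urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```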
The "Learn Druid" Docker Compose file includes the following services:

- Jupyter Lab: an interactive environment to run Jupyter Notebooks. The Jupyter image used in the environment contains Python along with all the supporting libraries you need to run the notebooks. Jupyter Lab is exposed at:
- Apache Kafka: a streaming service used as a data source for Druid.
- Imply Data Generator: a tool to generate sample data for Druid. It can produce either batch or streaming data.
- Apache Druid: the currently released version of Apache Druid by default. You can use the web console to monitor ingestion tasks, compare query results, and more. To learn about the Druid web console, see Web console. The web console is exposed at:
You can use the following Docker Compose profiles to start various combinations of the components based upon your specific needs.
Individual notebooks may prescribe a specific profile that you need to use.
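Profiles work by tagging each service in the Compose file with the profile names it belongs to; `docker compose --profile X up` starts only the services tagged with X. The fragment below is a hypothetical sketch of the mechanism, not the repository's actual service definitions, and the image names are placeholders:

```yaml
# Hypothetical sketch of how Compose profiles gate services.
services:
  jupyter:
    image: example/jupyter        # placeholder image name
    profiles: ["jupyter", "druid-jupyter", "all-services"]
  druid:
    image: example/druid          # placeholder image name
    profiles: ["druid-jupyter", "all-services"]
  kafka:
    image: example/kafka          # placeholder image name
    profiles: ["all-services"]
```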
Use the jupyter profile when you want to run the notebooks against an existing Apache Druid database. Set the DRUID_HOST environment variable to the Apache Druid host address.
To start Jupyter only:
DRUID_HOST=[host address] docker compose --profile jupyter up -d
For example, if Druid is running on the local machine:
DRUID_HOST=host.docker.internal docker compose --profile jupyter up -d
To stop Jupyter:
docker compose --profile jupyter down
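Before running the notebooks against an external Druid, it can help to confirm that the host you pass via DRUID_HOST is reachable. This sketch reads the variable the same way the compose example sets it and builds a URL for Druid's /status/health endpoint; the router port 8888 is an assumption, so change it if your deployment differs.

```python
import os

# DRUID_HOST as set in the compose example; the default mirrors the
# local-machine case from the example above.
host = os.environ.get("DRUID_HOST", "host.docker.internal")

# Assumed router port; adjust for your deployment.
health_url = f"http://{host}:8888/status/health"
print(health_url)

# With Druid running, /status/health returns true when the service is healthy:
# from urllib.request import urlopen
# print(urlopen(health_url).read().decode("utf-8"))
```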
Use the druid-jupyter profile when you only need to query data and do batch ingestion.
To start Jupyter and Druid:
docker compose --profile druid-jupyter up -d
To stop Jupyter and Druid:
docker compose --profile druid-jupyter down
To start all services, including Kafka and the data generator:
docker compose --profile all-services up -d
To stop all services:
docker compose --profile all-services down
For feedback and help, start a discussion on the Discussions board or reach out in the docs and training channel in the Apache Druid Slack.