anilkulkarni87 / airflow-docker

This is my Apache Airflow Local development setup on Windows 10 WSL2/Mac using docker-compose. It will also include some sample DAGs and workflows.

Home Page: https://anilkulkarni87.github.io/airflow-docker/

License: MIT License

Airflow Made Easy | Local Setup Using Docker

This is my Apache Airflow Local development setup using docker-compose. It will also include some sample DAGs and workflows.

Recent Updates:

03-Dec-2023

  • Upgraded to Airflow 2.7.3
  • Upgraded Superset to add a secret key
  • Added Superset database connection image
  • Works on M1 Mac

03-May-2022

  • Added a Dockerfile to extend the Airflow image
  • Added an additional PyPI package (td-client)
  • Upgraded to Airflow 2.3.0

29-Jun-2021

  • Updated image to Airflow 2.1.1
  • Leveraging _PIP_ADDITIONAL_REQUIREMENTS to install additional dependencies
  • Developing and testing operators for Treasure Data
  • Read more at Treasure Data

🧐 About

Set up Apache Airflow 2.0 locally on Windows 10 (WSL2) via Docker Compose. The original docker-compose.yaml file was taken from the official GitHub repo.

It contains service definitions for:

  • airflow-scheduler
  • airflow-webserver
  • airflow-worker
  • airflow-init - To initialize db and create user
  • flower
  • redis
  • postgres - This is the backend for Airflow. I am also creating an additional database, userdata, as a backend for my data flow. This is not recommended; it's ideal to have separate databases for Airflow and your data.

I have added an additional command to the docker-compose file that adds an Airflow DB connection.

Directories I am mounting:

  • ./dags
  • ./logs
  • ./plugins
  • ./sql - For SQL files. We can leverage Jinja templating in our queries. Refer to the sample DAG.
  • ./test - Has unit tests for the Airflow DAGs.
  • ./pg-init-scripts - Has scripts to create the additional database in Postgres.

Data Engineering Projects

Here you will find some personal projects that I have worked on. These projects shed light on some of the Airflow features I have used and on what I have learned about other technologies.

Data Visualization

To experiment with Apache Superset. Read more here

🏁 Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Clone this repo to your machine

docker-compose -f docker-compose.yaml up airflow-init
docker-compose -f docker-compose.yaml up

Prerequisites

What you need to install the software, and how to install it.

You should have Docker and Docker Compose v1.27.0 or later installed on your machine.

  • Install and configure WSL2
  • I also had to reset my Ubuntu installation, and that's when it asked me to create a user.

Installing

A step by step series of examples that tell you how to get a development env running.

Clone the Repo

git clone

Start docker build

#To extend airflow image
docker-compose build

docker-compose -f docker-compose.yaml up airflow-init

docker-compose -f docker-compose.yaml up

Keep checking the Docker processes to make sure all containers are healthy

docker ps
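
The status column of `docker ps` can also be checked from a script instead of by eye. The following is a minimal sketch, not part of this repo: the helper names `container_health` and `read_docker_ps` are made up for illustration, and it assumes the `docker` CLI is on your PATH and that the services define health checks.

```python
import subprocess


def container_health(ps_output: str) -> dict:
    """Map container name -> 'healthy', 'unhealthy', 'starting' or 'no healthcheck'.

    Expects one "name<TAB>status" line per container, as printed by
    docker ps --format "{{.Names}}\t{{.Status}}".
    """
    result = {}
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        if "(healthy)" in status:
            state = "healthy"
        elif "(unhealthy)" in status:
            state = "unhealthy"
        elif "(health: starting)" in status:
            state = "starting"
        else:
            state = "no healthcheck"
        result[name.strip()] = state
    return result


def read_docker_ps() -> str:
    """Run docker ps and return its output (requires the docker CLI)."""
    completed = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}\t{{.Status}}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return completed.stdout
```

Calling `container_health(read_docker_ps())` returns a dictionary you can print or loop over until every container that has a health check reports `healthy`.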

Once all containers are healthy, add a connection to Postgres via the command line and then access the Airflow UI

docker exec -it airflow-docker_airflow-worker airflow connections add 'postgres_new' --conn-uri 'postgres://airflow:airflow@postgres:5432/airflow'
http://localhost:8080
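
Before opening the UI, you can wait for the webserver to report that it is ready. Airflow 2 serves a `/health` endpoint that returns JSON with a status for the metadata database and the scheduler. The sketch below polls it with the standard library; the function names, timeout and interval are assumptions chosen for illustration, and the URL assumes the default port mapping used above.

```python
import json
import time
import urllib.error
import urllib.request

HEALTH_URL = "http://localhost:8080/health"  # assumes the default port mapping


def is_ready(payload: dict) -> bool:
    """True when both the metadata database and the scheduler report healthy."""
    return all(
        (payload.get(component) or {}).get("status") == "healthy"
        for component in ("metadatabase", "scheduler")
    )


def wait_for_airflow(url: str = HEALTH_URL, timeout: float = 300, interval: float = 5) -> bool:
    """Poll the health endpoint until it is ready or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if is_ready(json.load(response)):
                    return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # webserver not reachable yet, or the response was not JSON
        time.sleep(interval)
    return False
```

`wait_for_airflow()` returns `True` as soon as the endpoint reports both components healthy, and `False` if that does not happen within the timeout.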

🔧 Running the tests

Unit tests for the Airflow DAGs are defined in the test folder. This folder is also mapped into the Docker containers in the docker-compose.yaml file. Follow the steps below to execute the unit tests once the Docker containers are running:

./airflow.sh bash
python -m unittest discover -v
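
To give an idea of what a test picked up by `unittest discover` could look like, here is a minimal smoke test. It is a sketch and not one of the tests shipped in this repo: it only checks that every Python file under the DAG folder parses, using the standard library, and the `DAGS_DIR` path is an assumption about where the tests are run from. A fuller check would load the DAGs through Airflow's `DagBag` and assert that its `import_errors` is empty.

```python
import ast
import pathlib
import unittest

DAGS_DIR = pathlib.Path("dags")  # assumption: tests are run from the repo root


def syntax_errors(dags_dir) -> dict:
    """Return {file path: error message} for every .py file that fails to parse."""
    errors = {}
    for path in sorted(pathlib.Path(dags_dir).rglob("*.py")):
        try:
            ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except SyntaxError as exc:
            errors[str(path)] = f"line {exc.lineno}: {exc.msg}"
    return errors


class TestDagFilesParse(unittest.TestCase):
    def test_no_syntax_errors(self):
        # An empty dict means every DAG file is at least valid Python.
        self.assertEqual(syntax_errors(DAGS_DIR), {})
```

Saved under a name matching `test*.py` in the test folder, this would be discovered by the command above. It catches typos early but does not validate operators, schedules or task dependencies.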

Github Workflow for running tests

I had to create another docker-compose file to be able to run the unit tests whenever I push code to master. Please refer

Break down into end to end tests

Another #TODO

🎈 Usage

Now you can create new DAGs, place them in the ./dags folder on your local system, and see them appear in the web UI. Refer to the sample DAG in the repo.

Important:

Edit the postgres_default connection from the UI or through the command line if you want to persist data in Postgres as part of the DAGs you create. Even better, you can always add a new connection.

Update: This is now taken care of in the updated Docker Compose file. The connection and the new database are created.
./airflow.sh bash

airflow connections add 'postgres_new' --conn-uri 'postgres://airflow:airflow@postgres:5432/airflow'
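
The `--conn-uri` value packs every connection field into a single string. As an illustration only, the sketch below splits that URI with the standard library to show which part ends up in which field; `describe_conn_uri` is a hypothetical helper, and the field names follow Airflow's connection model, where the URI path becomes the schema.

```python
from urllib.parse import urlsplit


def describe_conn_uri(uri: str) -> dict:
    """Split an Airflow-style connection URI into its connection fields."""
    parts = urlsplit(uri)
    return {
        "conn_type": parts.scheme,
        "login": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "schema": parts.path.lstrip("/"),  # the database name for Postgres
    }


fields = describe_conn_uri("postgres://airflow:airflow@postgres:5432/airflow")
```

Here the host `postgres` is the docker-compose service name, which resolves only from inside the compose network. To point the connection at the `userdata` database instead, the last path segment of the URI would change.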

connect to postgres and create new database with name 'userdata'

docker exec -it airflowdocker_postgres_1 /bin/bash
psql -U airflow
create database userdata;


Turn on Dag: PostgreOperatorTest_Dag

⛏️ Built Using

✍️ Authors

🎉 Acknowledgements

  • Apache Airflow
  • Inspiration is the Airflow Community

Cleanup

docker-compose down --volumes --rmi all

airflow-docker's Issues

Where is the superset_config.py file?


superset_1 | WARNING
superset_1 | --------------------------------------------------------------------------------
superset_1 | A Default SECRET_KEY was detected, please use superset_config.py to override it.
superset_1 | Use a strong complex alphanumeric string and use a tool to help you generate
superset_1 | a sufficiently random sequence, ex: openssl rand -base64 42
superset_1 | --------------------------------------------------------------------------------
superset_1 | --------------------------------------------------------------------------------
