acornsgrow / kube-airflow

This project forked from mumoshu/kube-airflow

A Docker image and Kubernetes config files for running Airflow on Kubernetes

License: Apache License 2.0


kube-airflow


kube-airflow provides a set of tools to run Airflow in a Kubernetes cluster. This is useful when you want:

  • Easy high availability of the Airflow scheduler
  • Easy parallelism of task executions
    • The common way to scale out workers in Airflow is to use Celery. However, managing a highly available (HA) backend database and Celery workers just to parallelize task executions sounds like a hassle. This is where Kubernetes comes into play again: if you already have a K8s cluster, just let K8s manage them for you.
    • If you would rather avoid Celery for task parallelism, K8s can still help you for a while. Keep using LocalExecutor instead of CeleryExecutor and delegate the actual tasks to Kubernetes by calling, e.g., kubectl run --restart=Never ... from your tasks. This works until the concurrent kubectl run executions (up to the concurrency implied by the scheduler's max_threads and LocalExecutor's parallelism; see this SO question for gotchas) consume all the resources a single airflow-scheduler pod provides, which should take a fairly long time.
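The LocalExecutor approach in the second bullet can be sketched as a small shell helper that a task (for example, a BashOperator command) could call. This is a minimal sketch, not something shipped in this repository: the function name `run_task_pod`, the optional `KUBECTL` override, and the choice of `--attach --rm` are assumptions. `--attach` makes `kubectl run` wait for the container and stream its output, and `--rm` deletes the pod afterwards; whether the container's exit status is passed back through `kubectl run` may depend on your kubectl version.

```shell
# Hypothetical helper (not part of kube-airflow): run one task in its own pod
# instead of on the airflow-scheduler pod.
# Usage: run_task_pod NAME IMAGE COMMAND [ARGS...]
run_task_pod() {
  if [ "$#" -lt 3 ]; then
    echo "usage: run_task_pod NAME IMAGE COMMAND [ARGS...]" >&2
    return 2
  fi
  name="$1"
  image="$2"
  shift 2
  # --restart=Never makes kubectl create a bare Pod.
  # --attach waits for the container and streams its output; --rm deletes
  # the pod once it exits. KUBECTL can be overridden (e.g. KUBECTL=echo
  # for a dry run); it defaults to kubectl.
  "${KUBECTL:-kubectl}" run "$name" --image="$image" --restart=Never \
    --attach --rm -- "$@"
}
```

For example, `run_task_pod "report-$(date +%s)" busybox sh -c 'echo hello'`. Pod names must be valid DNS-1123 labels, so a timestamp suffix is one simple way to keep concurrent runs from colliding.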

This repository contains:

  • Dockerfile(.template) for the Airflow Docker images published to the public Docker Hub registry.
  • airflow.all.yaml for creating Kubernetes services and deployments to run Airflow on Kubernetes

Information

Installation

Create all the deployments and services for Airflow:

    kubectl create -f airflow.all.yaml

Build

git clone this repository and then just run:

    make build

Usage

Create all the deployments and services to run Airflow on Kubernetes:

    kubectl create -f airflow.all.yaml

It will create deployments for:

  • postgres
  • rabbitmq
  • airflow-webserver
  • airflow-scheduler
  • airflow-flower
  • airflow-worker

and services for:

  • postgres
  • rabbitmq
  • airflow-webserver
  • airflow-flower
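Once `kubectl create -f airflow.all.yaml` has run, you can block until every deployment listed above has finished rolling out. A minimal sketch, assuming the deployment names in airflow.all.yaml match the list above; `wait_for_airflow`, `NAMESPACE`, and the `KUBECTL` override are hypothetical names introduced here.

```shell
# Hypothetical helper: wait for each Airflow deployment to finish rolling out.
# Assumes the deployment names match the list in this README.
# Set NAMESPACE (e.g. airflow-dev) if the resources are not in the current
# context's namespace. KUBECTL defaults to kubectl.
wait_for_airflow() {
  for deploy in postgres rabbitmq airflow-webserver airflow-scheduler \
                airflow-flower airflow-worker; do
    if [ -n "${NAMESPACE:-}" ]; then
      "${KUBECTL:-kubectl}" rollout status "deployment/$deploy" \
        --namespace "$NAMESPACE" || return 1
    else
      "${KUBECTL:-kubectl}" rollout status "deployment/$deploy" || return 1
    fi
  done
}
```

The loop stops at the first deployment whose rollout fails, so the failing name is the last one printed.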

You can browse the Airflow dashboard by running:

    make browse-web

and the Flower dashboard by running:

    make browse-flower
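Without the Makefile, `kubectl port-forward` against the services can expose the same dashboards locally. This is an alternative sketch, not what `make browse-web` or `make browse-flower` actually do: the helper name `forward_dashboard` is hypothetical, and the service ports 8080 (Airflow webserver default) and 5555 (Flower default) are assumptions that may not match airflow.all.yaml.

```shell
# Hypothetical helper: forward a dashboard service to localhost.
# Ports are assumed defaults (webserver 8080, Flower 5555); check the
# service definitions in airflow.all.yaml. KUBECTL defaults to kubectl.
forward_dashboard() {
  case "${1:-}" in
    web)    svc=airflow-webserver; port=8080 ;;
    flower) svc=airflow-flower;    port=5555 ;;
    *)
      echo "usage: forward_dashboard web|flower" >&2
      return 2
      ;;
  esac
  # Runs in the foreground until interrupted; open http://localhost:$port
  "${KUBECTL:-kubectl}" port-forward "svc/$svc" "$port:$port"
}
```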

If you want to use Ad Hoc Query, make sure you've configured connections: go to Admin -> Connections, edit "mysql_default", and set these values (equivalent to the values in config/airflow.cfg):

  • Host : mysql
  • Schema : airflow
  • Login : airflow
  • Password : airflow
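The same connection could also be set from the command line instead of the Admin page. This sketch assumes the image ships an Airflow 1.x release that includes the `airflow connections` subcommand with `--delete`/`--add`/`--conn_id`/`--conn_uri` (Airflow 2 uses different syntax), that `kubectl exec deploy/airflow-webserver` reaches a pod with the Airflow CLI, and that the resources live in your current context's namespace. The helper name `set_mysql_default` is hypothetical.

```shell
# Hypothetical helper: replace mysql_default with the values listed above.
# Assumes Airflow 1.x CLI flags; Airflow 2 changed this syntax.
# The URI packs the fields as mysql://LOGIN:PASSWORD@HOST/SCHEMA.
set_mysql_default() {
  uri="mysql://airflow:airflow@mysql/airflow"
  # Delete first because --add does not overwrite an existing conn_id;
  # the delete may complain if the connection is absent, so keep going.
  "${KUBECTL:-kubectl}" exec deploy/airflow-webserver -- \
    airflow connections --delete --conn_id mysql_default
  "${KUBECTL:-kubectl}" exec deploy/airflow-webserver -- \
    airflow connections --add --conn_id mysql_default --conn_uri "$uri"
}
```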

Check Airflow Documentation

Run the "tutorial" DAG as a test:

    kubectl exec web-<id> --namespace airflow-dev -- airflow backfill tutorial -s 2015-05-01 -e 2015-06-01
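To avoid looking up the pod id each time, the command can be wrapped so that kubectl picks a pod from the deployment. A minimal sketch: `backfill_dag` is a hypothetical name, the `deploy/airflow-webserver` target assumes the webserver deployment listed above has the Airflow CLI, `kubectl exec deploy/NAME` needs a reasonably recent kubectl, and the `airflow-dev` default namespace is taken from the command above.

```shell
# Hypothetical wrapper around the backfill command above.
# Usage: backfill_dag DAG_ID START_DATE END_DATE
# NAMESPACE defaults to airflow-dev; KUBECTL defaults to kubectl.
backfill_dag() {
  if [ "$#" -ne 3 ]; then
    echo "usage: backfill_dag DAG_ID START_DATE END_DATE" >&2
    return 2
  fi
  # Everything after "--" goes to the container, so -s/-e reach airflow
  # instead of being parsed as kubectl flags (-s is kubectl's --server).
  "${KUBECTL:-kubectl}" exec deploy/airflow-webserver \
    --namespace "${NAMESPACE:-airflow-dev}" -- \
    airflow backfill "$1" -s "$2" -e "$3"
}
```

For example, `backfill_dag tutorial 2015-05-01 2015-06-01`.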

Scale the number of workers

For now, update the value of the replicas field in the deployment you want to scale, and then run:

    make apply
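For a quick, temporary change, `kubectl scale` can adjust the worker count directly. This sketch assumes the worker deployment is named `airflow-worker` as listed above; `scale_workers` is a hypothetical helper. If `make apply` re-applies airflow.all.yaml (an assumption about the Makefile), the next run will reset the count to whatever the YAML says.

```shell
# Hypothetical helper: scale the Celery workers imperatively.
# A later apply of airflow.all.yaml may reset this to the YAML's replicas.
# Usage: scale_workers COUNT   (KUBECTL defaults to kubectl)
scale_workers() {
  case "${1:-}" in
    ''|*[!0-9]*)
      echo "usage: scale_workers COUNT (a non-negative integer)" >&2
      return 2
      ;;
  esac
  "${KUBECTL:-kubectl}" scale deployment/airflow-worker --replicas="$1"
}
```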

Wanna help?

Fork, improve and PR. ;-)

Contributors

  • mumoshu
  • vyper
