ibm / ada

ADA is a microservice created to retrieve analytics metrics from an Airflow database instance.

License: Apache License 2.0


Airflow DAG Analytics

ADA is a microservice created to retrieve key analytics metrics at the task and DAG level from your Airflow database instance.

Highly integrated with Airflow, ADA lets you retrieve data from your database and derive analytical insights from it. By plugging ADA into your instance, you get metrics that can help you make decisions based on your DAGs' historical behavior.

ADA was born to provide a solution for those who want historical data about their DAGs. It can be fully decoupled from your code, which is great when you use an autoscaling tool to host it.

Usage

Metrics

Using ADA's current SQL queries, you can get the following information:

Metric              Insight
score               Is it taking longer than expected?
average             What is its average duration?
count_runs          How many times did it run?
maximum             What is its longest duration?
minimum             What is its shortest duration?
median              What is the median duration?
standard_deviation  How often is my duration far from the average?
variance            How far is my duration from the average?
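To make the metrics above concrete, here is a minimal sketch that computes the same kinds of values from a plain list of run durations with Python's standard library. It is an illustration only: ADA derives its metrics with SQL queries against the Airflow database, the function name `summarize_durations` is hypothetical, and the use of population (rather than sample) formulas is an assumption. The score is left out because its formula is not reproduced in this text.

```python
import statistics


def summarize_durations(durations):
    """Summarize a non-empty list of run durations (any consistent unit)."""
    return {
        "count_runs": len(durations),
        "average": statistics.fmean(durations),
        "median": statistics.median(durations),
        "maximum": max(durations),
        "minimum": min(durations),
        # Assumption: population formulas; ADA's SQL may use sample formulas.
        "standard_deviation": statistics.pstdev(durations),
        "variance": statistics.pvariance(durations),
    }


summary = summarize_durations([10.0, 12.0, 14.0])
```

With three runs of 10, 12 and 14 time units, the average and the median are both 12, and the variance is 8/3.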

One of the most powerful metrics ADA retrieves is the score. It's calculated by:

 

 

The score is the main metric to analyze and rely on when identifying a stuck pod. You can use it as a threshold to decide whether or not to act on a given run. The factor 1.2 was chosen arbitrarily as a safety factor that rounds the score up. It makes the metric more trustworthy and robust, since it is less susceptible to outliers.

Deployment

When deploying ADA, make sure you have set all required environment variables. You will need two types:

  1. Authorization

    To encrypt and decrypt your keys, you need to set PRIV_KEY and API_KEY. Note that ADA follows the Fernet implementation style.

  2. Airflow database (Postgres)

    To access your Airflow database (Postgres is supported), you need to provide all of your connection settings: database, host, username, password and port. Check psycopg and IBM Cloud Databases for PostgreSQL for more details.

If nothing is missing, your docker run command when testing locally should look like this:

 docker run --name ada -p 7000:7000 --rm \
   -e DATABASE=$DATABASE \
   -e USER=$USER \
   -e PASS=$PASS \
   -e HOST=$HOST \
   -e API_PORT=$API_PORT \
   -e PRIV_KEY=$PRIV_KEY \
   -e API_KEY=$API_KEY \
   -i -t ada bash
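
Before running the container, it can help to confirm that every variable passed with `-e` is actually set in your shell. The following is a minimal sketch, not part of ADA: the helper `missing_env_vars` is hypothetical, and the list of names is simply copied from the docker run command above.

```python
import os

# Variable names copied from the docker run command above.
REQUIRED_VARS = ("DATABASE", "USER", "PASS", "HOST", "API_PORT", "PRIV_KEY", "API_KEY")


def missing_env_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing a dictionary instead of relying on `os.environ` makes the check easy to exercise in isolation.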

Use cases

Here are some great examples of how ADA can make your life a lot easier :)

Stuck pods

If you run Airflow integrated with Apache Spark, there's a chance stuck pods are a big pain for you. Whenever they happen, they require attention and quick action. With ADA, you'll have the metrics at hand! It means you can use the score to tell whether a run is taking longer than it should. Your workflow could look like this:
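One possible shape for such a check, as a minimal sketch: compare the elapsed time of each running task with the score ADA reports for it, and flag the ones that exceed it. The function `find_overdue_tasks`, the sample identifiers, and the assumption that elapsed times and scores share the same unit are all illustrative and not taken from ADA itself. The records mimic the fields returned by the /all endpoint.

```python
def find_overdue_tasks(running, metrics):
    """Return (dag_id, task_id) pairs whose elapsed time exceeds their score.

    running: dict mapping (dag_id, task_id) to the elapsed time of the current run.
    metrics: list of records shaped like the /all response.
    Assumption: elapsed times and scores are expressed in the same unit.
    """
    scores = {(m["dag_id"], m["task_id"]): m["score"] for m in metrics}
    overdue = []
    for key, elapsed in running.items():
        score = scores.get(key)
        # Tasks without history have no score, so they are skipped.
        if score is not None and elapsed > score:
            overdue.append(key)
    return overdue


# Hypothetical data for illustration only.
metrics = [
    {"dag_id": "etl", "task_id": "load", "score": 300},
    {"dag_id": "etl", "task_id": "extract", "score": 120},
]
running = {("etl", "load"): 450, ("etl", "extract"): 60, ("etl", "new_task"): 999}
overdue = find_overdue_tasks(running, metrics)
```

What to do with a flagged run (alert, retry, kill the pod) is left to the caller, since the score is only a threshold for deciding whether to act.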

DAG Predict

If you wish to predict your DAGs' duration, ADA can help you with that as well! By using the metrics ADA provides, you will be able to tell the average runtime of a specific DAG. It means your process can be more transparent and reliable.

If you still want to go further, ADA can provide the numbers to your machine learning model, such as an echo state network, or a math approach you design on your own!
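As a simple example of the "math approach" end of that spectrum, the sketch below turns a DAG's average and standard deviation into an expected finish time and a later bound. The function `estimate_finish` is hypothetical, and it assumes the metrics are expressed in seconds, which the text above does not state.

```python
from datetime import datetime, timedelta


def estimate_finish(started_at, average, standard_deviation=0.0):
    """Return (expected, late) finish estimates for a run.

    Assumption: `average` and `standard_deviation` are in seconds.
    """
    expected = started_at + timedelta(seconds=average)
    late = expected + timedelta(seconds=standard_deviation)
    return expected, late


# Hypothetical run that started at noon with a 10-minute average duration.
started = datetime(2022, 1, 1, 12, 0, 0)
expected, late = estimate_finish(started, average=600, standard_deviation=90)
```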

API reference

/all

Return all combinations of task_id and dag_id in your database instance.

Request

  GET /all

Response

[
   {
       "task_id": "task_id_α",
       "dag_id": "dag_id_α",
       "count_runs": 1,
       "average": 1,
       "median": 1,
       "maximum": 1,
       "minimum": 1,
       "standard_deviation": 1,
       "variance_": 1,
       "score": 1
   },
   ...,
   {
       "task_id": "task_id_β",
       "dag_id": "dag_id_β",
       "count_runs": 2,
       "average": 2,
       "median": 2,
       "maximum": 2,
       "minimum": 2,
       "standard_deviation": 2,
       "variance_": 2,
       "score": 2
   }
]
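
A response shaped like the sample above can be loaded into typed records for further processing. This is a minimal sketch under stated assumptions: the `TaskMetrics` class and `parse_all_response` function are hypothetical names, the field types are guesses, and the field list simply mirrors the keys in the sample, including `variance_` with its trailing underscore.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskMetrics:
    task_id: str
    dag_id: str
    count_runs: int
    average: float
    median: float
    maximum: float
    minimum: float
    standard_deviation: float
    variance_: float  # spelled with a trailing underscore, as in the sample
    score: float


def parse_all_response(body):
    """Parse a JSON body shaped like the /all sample into TaskMetrics records."""
    return [TaskMetrics(**record) for record in json.loads(body)]


# Hypothetical body with a single record, for illustration only.
sample_body = json.dumps([
    {"task_id": "t1", "dag_id": "d1", "count_runs": 1, "average": 1,
     "median": 1, "maximum": 1, "minimum": 1, "standard_deviation": 1,
     "variance_": 1, "score": 1}
])
records = parse_all_response(sample_body)
```

Because the dataclass lists exactly the keys from the sample, a response with extra or missing keys raises a `TypeError`, which makes schema drift visible early.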

/dag_id

Return metrics at the DAG level.

Request

  GET /dag_id/<your_dag_id>

Response

[
   {
       "dag_id": "dag_id_α",
       "count_runs": 1,
       "average": 1,
       "median": 1,
       "maximum": 1,
       "minimum": 1,
       "standard_deviation": 1,
       "variance_": 1,
       "score": 1
   }
]
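
A small client for this endpoint could look like the sketch below, which uses only the standard library. The base URL `http://localhost:7000` is an assumption derived from the `-p 7000:7000` mapping in the docker run command, and the function names are hypothetical. Authentication is deliberately omitted because this text does not describe how API_KEY is sent with a request.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: ADA is reachable locally on the port mapped with -p 7000:7000.
BASE_URL = "http://localhost:7000"


def dag_metrics_url(dag_id, base_url=BASE_URL):
    """Build the /dag_id/<your_dag_id> URL, percent-encoding the identifier."""
    return f"{base_url}/dag_id/{quote(dag_id, safe='')}"


def fetch_dag_metrics(dag_id, base_url=BASE_URL, timeout=10):
    """GET the DAG-level metrics and decode the JSON list.

    Authentication is omitted; how API_KEY is passed is not specified here.
    """
    with urlopen(dag_metrics_url(dag_id, base_url), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_dag_metrics("<your_dag_id>")` against a running instance should return a list with one record shaped like the sample response.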

/task_id

Return metrics at the task level.

Request

  GET /task_id/<your_task_id>

Response

[
   {
       "task_id": "task_id_α",
       "count_runs": 1,
       "average": 1,
       "median": 1,
       "maximum": 1,
       "minimum": 1,
       "standard_deviation": 1,
       "variance_": 1,
       "score": 1
   }
]

Engine compatibility

Contributing

Contributions are always welcome!

See contributing.md for ways to get started.

License

Copyright 2022 - IBM Inc. All rights reserved
SPDX-License-Identifier: Apache-2.0

See LICENSE for the full license text.

Contributors

danielrsfreitas, juliaalfarias, viniciuserrero
