bcgov / cas-airflow

Exploration of Airflow on OpenShift

License: Apache License 2.0
cas-airflow's Introduction

CAS Airflow

Configuration of Apache Airflow for the Climate Action Secretariat projects.

This repository contains the docker images, helm charts, and DAGs required to automate various workflows for the CAS team.

DAGs

The dags directory contains the various workflows (Directed Acyclic Graphs)

Running tasks using the Kubernetes executor and KubernetesPodOperator

  • The Kubernetes Executor allows us to run tasks on Kubernetes as Pods.
    • It lets us run each script in its own container, within its own resource quota, and schedule it on the least-congested node in the cluster.
  • The KubernetesPodOperator allows us to create Pods on Kubernetes.
    • It gives us the freedom to run the command in any arbitrary image, sandboxing the job run inside a docker container.
    • It allows us to select in which namespace to run the job.
  • Airflow Kubernetes

CI/CD

The docker images are built on CircleCI for every commit, and pushed to CAS' google cloud registry if the build occurs on the develop or master branch, or if the commit is tagged.

Deployment is done with Shipit, using the helm charts defined in the helm folder.

  • the helm install command should specify namespaces for the different CAS applications:

    helm install \
      --set namespaces.airflow=<< >> \
      --set namespaces.ggircs=<< >> \
      --set namespaces.ciip=<< >> \
      --set namespaces.cif=<< >>

There are a couple of manual steps required for installation (the first deployment) at the moment:

  1. Prior to deploying, the namespace where airflow is deployed should have:
  • A "github-registry" secret, containing the pull secrets to the docker registry. This should be taken care of by cas-pipeline's make provision.
  • An "airflow-default-user-password" secret. Airflow will create a 'cas-aiflow-admin' user with this password.
  2. Deploy with Shipit.
  3. The connections required in the various DAGs need to be manually created.

TODO

  • stream-minio should be replaced with the gcs client
  • the docker images should be imported into the cluster instead of being pulled from GH every time we spin up a pod
  • authentication should be done with GitHub (allowing members of https://github.com/orgs/bcgov/teams/cas-developers)
  • automate the creation of connections on installation

Contributing

Cloning

git clone [email protected]:bcgov/cas-airflow.git ~/cas-airflow && cd $_
git submodule update --init

This repository contains the DAGs as well as the helm chart. It submodules airflow through the cas-airflow-upstream repository, to use its helm chart as a dependency - and will eventually reference the official airflow instead.

Getting started

Use asdf to install the correct version of python.

asdf install

Use pip to install the correct version of airflow.

pip install -r requirements.txt

Then reshim asdf to ensure the correct version of airflow is in your path.

asdf reshim

Be sure to set the $AIRFLOW_HOME environment variable if this repository was cloned to a path other than ~/airflow.

airflow db init
airflow users create -r Admin -u <<username>> -e <<email>> -f <<first name>> -l <<last name>> -p <<password>>

Start airflow locally (optional).

airflow webserver --daemon
airflow scheduler --daemon

Writing

Run a specific task in a specific dag.

airflow test hello_world_dag hello_task $(date -u +"%Y-%m-%dT%H:%M:%SZ")

cas-airflow's People

Contributors

dependabot[bot], dleard, joshgamache, joshlarouche, junminahn, matthieu-foucault, mikevespi, pbastia, repo-mountie[bot], tmastrom, wenzowski

cas-airflow's Issues

Bug: nginx reload dag fails when crunchy postgres operator is installed in that namespace

Describe the Bug:

This is a combination of:

  • The monitoring sidecar generates an empty projected volume in its yaml, in the database pods cas-cif-postgres-pgha1-<...>-N
     - name: exporter-config
       projected:
         defaultMode: 420
         sources: null
    
  • The python kubernetes API client strictly enforces the projected volume spec when parsing responses from the cluster on API calls like list_namespaced_pod(...), which is why we're getting this error: ValueError: Invalid value for `sources`, must not be `None`

The issue is very similar to kubernetes-client/python#895, so one avenue to fixing it in the short term is to monkey-patch in a similar way - especially since we're not doing anything with that data in this DAG; we're just listing the pods to find the nginx ones.

Probability (how likely the bug is to happen, scored from 1-5): 5

Effect (how bad the bug is when it does happen, scored from 1-5): 3
There is a workaround: restart the nginx pods manually

Steps to reproduce

Install the postgres operator in a namespace, then run the reload_nginx_containers for that namespace (no need to actually have nginx containers present)

Dev Details


probably something like:

 from kubernetes.client.models.v1_projected_volume_source import V1ProjectedVolumeSource

 # Accept a projected volume whose sources is None instead of raising ValueError
 def sources(self, sources):
     self._sources = sources or []

 V1ProjectedVolumeSource.sources = V1ProjectedVolumeSource.sources.setter(sources)

this is the context of what we're trying to patch

As a maintainer, I want airflow to have a DAG that fetches DAG files and writes them to the DAGs PVC

We need to create a dag that fetches DAG files

  • from a given <git url, path, ref> config
  • and writes them to the DAGs PVC (needs to be enabled in airflow helm chart)

Airflow API for configuration options: https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html#operation/post_dag_run

In the helm chart, we'll need --set dags.persistence.enabled=true to enable the PVC

dag refresh time might need tweaking
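A minimal sketch of the fetch-and-write step (function and path names are hypothetical; assumes git is available in the worker image and the DAGs PVC is mounted at /opt/airflow/dags):

```python
# Hypothetical sketch: clone a repo at a given <git url, path, ref>
# and copy the DAG files into the PVC mount.
import shutil
import subprocess
import tempfile
from pathlib import Path

def copy_dag_files(src, dags_volume, repo_name):
    """Copy a directory of DAG files into the PVC mount under repo_name."""
    dest = Path(dags_volume) / repo_name
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest

def fetch_dags(git_url, path, ref, dags_volume="/opt/airflow/dags"):
    """Clone git_url at ref and copy the given sub-path into the DAGs PVC."""
    with tempfile.TemporaryDirectory() as tmp:
        subprocess.run(
            ["git", "clone", "--depth", "1", "--branch", ref, git_url, tmp],
            check=True,
        )
        return copy_dag_files(Path(tmp) / path, dags_volume, Path(git_url).stem)
```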

Design tooltips / help text to provide additional context to the calculated fields to avoid confusion

Describe the task

According to @pbastia: The rules for some calculated fields are getting too complex, even for someone familiar with the CIF program. We might want to start working on contextual help and tooltips. The generic

This field cannot be calculated due to lack of information now.

help text isn't enough.

Acceptance Criteria

  • List the calculated fields needing additional help text (refer to the Calculations doc)
  • Draft contextual help text for the fields above

Additional context

  • Tooltip components in #1573 can be reused for the implementation of this
  • @suhafa's suggestion:

Something more specific would be helpful. I have an idea - Wealthsimple has a fantastic way of handling incomplete information. When the user submits an erroneous application, an alert box lets them know something isn't right, and the user is then taken directly to the fields they need to fill in, step by step, to sort out the issue. We could also use these 'guided' field-filling paths for calculations.

Add missing topics

TL;DR

Topics greatly improve the discoverability of repos; please add the short code from the table below to the topics of your repo so that ministries can use GitHub's search to find out what repos belong to them and other visitors can find useful content (and reuse it!).

Why Topic

In short order we'll add our 800th repo. This large number clearly demonstrates the success of using GitHub and our Open Source initiative. This huge success means it's critical that we work to make our content as discoverable as possible. Through discoverability, we promote code reuse across a large decentralized organization like the Government of British Columbia, as well as allow ministries to find the repos they own.

What to do

Below is a table of abbreviations, a.k.a. short codes, for each ministry; they're the ones used in all @gov.bc.ca email addresses. Please add the short codes of the ministry or organization that "owns" this repo as a topic.

add a topic

That's it, you're done!

How to use

Once topics are added, you can use them in GitHub's search. For example, enter something like org:bcgov topic:citz to find all the repos that belong to Citizens' Services. You can refine this search by adding key words specific to a subject you're interested in. To learn more about searching through repos check out GitHub's doc on searching.

Pro Tip

  • If your org is not in the list below, or the table contains errors, please create an issue here.

  • While you're doing this, add additional topics that would help someone searching for "something". These can be the language used javascript or R; something like opendata or data for data only repos; or any other key words that are useful.

  • Add a meaningful description to your repo. This is hugely valuable to people looking through our repositories.

  • If your application is live, add the production URL.

Ministry Short Codes

Short Code Organization Name
AEST Advanced Education, Skills & Training
AGRI Agriculture
ALC Agriculture Land Commission
AG Attorney General
MCF Children & Family Development
CITZ Citizens' Services
DBC Destination BC
EMBC Emergency Management BC
EAO Environmental Assessment Office
EDUC Education
EMPR Energy, Mines & Petroleum Resources
ENV Environment & Climate Change Strategy
FIN Finance
FLNR Forests, Lands, Natural Resource Operations & Rural Development
HLTH Health
FLNR Indigenous Relations & Reconciliation
JEDC Jobs, Economic Development & Competitiveness
LBR Labour Policy & Legislation
LDB BC Liquor Distribution Branch
MMHA Mental Health & Addictions
MAH Municipal Affairs & Housing
BCPC Pension Corporation
PSA Public Safety & Solicitor General & Emergency B.C.
SDPR Social Development & Poverty Reduction
TCA Tourism, Arts & Culture
TRAN Transportation & Infrastructure

NOTE See an error or omission? Please create an issue here to get it remedied.

It's Been a While Since This Repository has Been Updated

This issue is a kind reminder that your repository has been inactive for 180 days. Some repositories are maintained in accordance with business requirements that infrequently change, thus appearing inactive, and some repositories are inactive because they are unmaintained.

To help differentiate products that are unmaintained from products that do not require frequent maintenance, repomountie will open an issue whenever a repository has not been updated in 180 days.

  • If this product is being actively maintained, please close this issue.
  • If this repository isn't being actively maintained anymore, please archive this repository. Also, for bonus points, please add a dormant or retired life cycle badge.

Thank you for your help ensuring effective governance of our open-source ecosystem!

curl commands in `airflow-dag-trigger.sh` should be run with a retry

curl commands should use --retry 5 --retry-all-errors.
This is mainly to work around a race condition where the trigger fails because it cannot find a newly created DAG, but retries also make the script more resilient in case of a network issue when connecting to the Airflow API.

airflow exposes an airflow-dag-trigger chart

To trigger dags, we need to create a script and an ad-hoc helm template for the deployment that needs to trigger the dag.
ciip-portal and ggircs are good examples of that:
https://github.com/bcgov/cas-ciip-portal/blob/develop/.bin/trigger-airflow-dag.sh
https://github.com/bcgov/cas-ciip-portal/blob/develop/helm/cas-ciip-portal/templates/airflow-dag-trigger.yaml

The airflow-dag-trigger chart needs the following values:

  • airflow endpoint
  • dag_id
  • dag_conf
  • airflow_secret_name

The dag should be added to a helm repository using the chart-releaser GH action (see cas-postgres/cas-metabase repo)

As a maintainer, I want DAGs to be in the application repositories

Using a DELL ECS or GCP bucket, we could ensure that the correct versions of the DAGs are used when deploying our applications, and we could host the DAG code in the application git repositories instead of this repository.

  • create a dag that fetches DAG files from a given <git url, path, ref> config and writes them to the DAGs PVC (needs to be enabled in airflow helm chart)

  • each application uses airflow-dag-trigger job (created in #91) to trigger the above DAG for its own repo.

  • dag refresh time might need tweaking

Possibly helpful blog post on breaking up the dag monorepo:
https://tech.scribd.com/blog/2020/breaking-up-the-dag-repo.html
Idea to do something similar with GCS:
https://github.com/GoogleCloudPlatform/gcsfuse
Related Airflow 2.0 improvement proposal:
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-5+Remote+DAG+Fetcher


docker images are tagged with the tag generated by the chart-releaser-action

Now that we use the chart-releaser action, we can automate the tagging of the docker images with the same tag that was generated by the chart-releaser-action.

The idea would be to avoid having values like this in a release: https://github.com/bcgov/cas-ciip-portal/blob/61d938c5efc2c4cb7859e30ccae1da27cfccbd8b/helm/cas-ciip-portal/values.yaml#L4

AC:

  • updated CI rule to run the tagging job with the generated release tag
  • the tag in the values.yaml file needs to be updated automatically as part of the release process - this can probably be done with a templating tpl( ... ) call, or by CI as well?

SWRS version 0 update function is broken

We had assumed the swrs_report_id remained the same across versions but it changes with every new version.
This means the way the refresh_swrs_version_data function matches versions is incorrect.

To fix this:
The match to swrs.report from ggircs_portal.application should be made using both swrs_facility_id and reporting_year

Probability: 4
Effect: 4
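The proposed matching rule can be sketched in plain Python (illustrative only; the actual fix is in the SQL that joins ggircs_portal.application to swrs.report):

```python
# Hypothetical illustration: match portal applications to SWRS reports
# on the composite key (swrs_facility_id, reporting_year) rather than
# the unstable swrs_report_id, which changes with every new version.
def match_reports(applications, reports):
    by_key = {
        (r["swrs_facility_id"], r["reporting_year"]): r for r in reports
    }
    return {
        a["id"]: by_key.get((a["swrs_facility_id"], a["reporting_year"]))
        for a in applications
    }
```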

Task: Setup a nightly build for the cas-airflow repo

Describe the task

Setup a nightly build for the cas-airflow repo to increase security in cases where code is not submitted for multiple days.

Acceptance Criteria

  • A nightly build github action has been set up
  • A yarn-audit github action has been set up

Additional context

  • Add any other context about the task here.
  • Or here

Create KubernetesJobFromCronJobOperator

dags/trigger_k8s_cronjob.py is used in most of our DAGs, and could be refactored into an operator instead of being wrapped in a PythonOperator. Maybe we would be able to contribute that upstream too.

DAG: swrs.py throwing errors in Airflow

  • Broken DAG: [/usr/local/airflow/dags/dags/swrs.py] No module named 'operators'

  • Broken DAG: [/usr/local/airflow/dags/dags/remove-child-process-logs.py] The key (delete _task) has to be made of alphanumeric characters, dashes, dots and underscores exclusively
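The second error comes from a task key containing a space ("delete _task"). A defensive check along Airflow's key rule (alphanumerics, dashes, dots and underscores only) might look like this hypothetical helper:

```python
import re

# Airflow requires DAG and task keys to consist exclusively of
# alphanumeric characters, dashes, dots and underscores.
VALID_KEY = re.compile(r"^[A-Za-z0-9._-]+$")

def check_task_id(task_id):
    """Raise early, with a clear message, instead of breaking the DAG at parse time."""
    if not VALID_KEY.match(task_id):
        raise ValueError(f"invalid task key: {task_id!r}")
    return task_id
```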

Airflow not recording email addresses from Github

Description of the Tech Debt

When logging in to Airflow using Github OAuth, users' email addresses are not being recorded. They are saved as github_<username>@email.notfound, which is the fallback when no email address is received while creating a new user.
There was a point in the process when it was recording the correct email addresses, but it stopped doing so when we fixed a more pressing problem with it.

Tech Debt Triage

The purpose of our technical debt triage process is to analyze technical debt to determine the risk level of the technical debt and the value in tackling it.

Risk Value Scoring:

Level   Value
High    3
Medium  2
Low     1

Technical Debt - Risk Types                                                          Level  Value
Business Area Risk - Risk of business area visibility / damage to user experience    1      1
Developer Fault Risk - How likely will this tech debt cause a future error
related to coding on top of it                                                       1      1
System Fault Risk - Risk of system errors or application downtime                    1      1
Time Scale Risk - Compound risk effect if left alone. How much more difficult to
fix or dangerous will this become over time?                                         1      1
Time Sink Risk - How much will this tech debt slow the development process down      2      2
TOTAL SCORE:                                                                         6      6

Tech Debt: Remove dependency on Terraform Cloud and cas-shelf

We are spinning down Terraform Cloud. Follow the patterns established in cas-registration and cas-cif for migrating and running terraform as an OpenShift Job.

Acceptance Criteria

  • Migrate Terraform resources from Terraform Cloud to a state held in a Google Cloud Storage bucket (these will have been created by cas-pipeline).
  • Create Terraform modules for resources
  • Create Helm template to import modules and run Terraform job

Update Airflow to 2.1 release

Tech debt

From https://youtrack.button.is/issue/GGIRCS-2122
Right now we are running an image built from airflow's 2.0.0rc1 release, since anything after this had trouble with the webserver's liveness and readiness probes (client timeout).
We need to revisit this to see:

  1. if there is a newer version that doesn't have this issue
  2. or if we can find the root cause of this issue to use the official release

To reproduce the issue

  • pull apache/airflow tag 2.0.0
  • build the docker image

then in our cas-airflow repository

  • edit the Dockerfile to reference that image we just built
  • build the cas-airflow docker image
  • reference that image in the cas-airflow helm chart values at airflow.defaultAirflowRepository and airflow.defaultAirflowTag
  • helm install

Airflow: As a maintainer, I want my repository to expose a BCGov Public Code file, so that I can make my work discoverable by the wider government

Description:

The info feeds into a dashboard view of all open source gov't applications, showing tech stacks, hosting, common components, and other key info at a glance. It's an easy way to see what other teams/applications are on the same tech stack/using the same components.

Acceptance Criteria:

Given I am a developer
When I visit the cas-airflow repository
Then there is a bcgovpubcode.yml at the root of the repository
And that file contains up-to-date information

Development Checklist:

Definition of Ready (Note: If any of these points are not applicable, mark N/A)

  • User story is included
  • User role and type are identified
  • Acceptance criteria are included
  • Wireframes are included (if required)
  • Design / Solution is accepted by Product Owner
  • Dependencies are identified (technical, business, regulatory/policy)
  • Story has been estimated (under 13 pts)

ยทDefinition of Done (Note: If any of these points are not applicable, mark N/A)

  • Acceptance criteria are tested by the CI pipeline
  • UI meets accessibility requirements
  • Configuration changes are documented, documentation and designs are updated
  • Passes code peer-review
  • Passes QA of Acceptance Criteria with verification in Dev and Test
  • Ticket is ready to be merged to main branch
  • Can be demoed in Sprint Review
  • Bugs or future work cards are identified and created
  • Reviewed and approved by Product Owner

Notes:


Bug - Airflow: triggering a kubernetes cron job will fail if the cluster takes too long instantiating a job

trigger_k8s_cronjob.py contains a flaw: it waits an arbitrary 10 seconds for a job to be created from the cronjob.
This can cause a failure if the cluster is slow, because the retrieved list of matching jobs will be empty.

Action Item:
Retry getting the jobs (with logging) if the returned list is empty, and abort after a much longer timeout. Maybe we can retry indefinitely and let the cluster time out the job instead?
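The proposed retry could be sketched like this (hypothetical helper; `list_jobs` stands in for the kubernetes API call, e.g. list_namespaced_job):

```python
# Sketch: poll for the job spawned by the cronjob instead of sleeping
# a fixed 10 seconds, and abort only after a generous timeout.
import time

def wait_for_job(list_jobs, job_prefix, timeout=300, interval=5, sleep=time.sleep):
    """Return the first job name starting with job_prefix, retrying
    until `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        matching = [j for j in list_jobs() if j.startswith(job_prefix)]
        if matching:
            return matching[0]
        print(f"no job matching {job_prefix!r} yet, retrying in {interval}s")
        sleep(interval)
    raise TimeoutError(f"no job matching {job_prefix!r} after {timeout}s")
```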

Tech Debt: update deprecated settings values

The latest Airflow upgrade had a few deprecation warnings:

/home/airflow/.local/lib/python3.8/site-packages/airflow/configuration.py:724 DeprecationWarning: The auth_backend option in [api] has been renamed to auth_backends - the old setting has been used, but please update your config.
/home/airflow/.local/lib/python3.8/site-packages/airflow/configuration.py:747 DeprecationWarning: The auth_backend option in [api] has been renamed to auth_backends - the old setting has been used, but please update your config.
/home/airflow/.local/lib/python3.8/site-packages/airflow/configuration.py:761 FutureWarning: The auth_backends setting in [api] has had airflow.api.auth.backend.session added in the running config, which is needed by the UI. Please update your config before Apache Airflow 3.0.
/home/airflow/.local/lib/python3.8/site-packages/airflow/models/dag.py:450 DeprecationWarning: The dag_concurrency option in [core] has been renamed to max_active_tasks_per_dag - the old setting has been used, but please update your config.
/home/airflow/.local/lib/python3.8/site-packages/airflow/models/dag.py:3854 DeprecationWarning: The dag_concurrency option in [core] has been renamed to max_active_tasks_per_dag - the old setting has been used, but please update your config.

These will need to be addressed before we update to a future version

Airflow: As a Developer, I want to have access to a python library to create wal-g backup DAGs

As a Developer,

I want to have access to a small python library that creates wal-g backup tasks

So I can break down the backup DAG into the individual repositories (ciip, ggircs, metabase)

the usage would be something like

from airflow import DAG
from walg_backups import create_backup_task

dag = DAG(..., schedule_interval='0 8 * * *')
create_backup_task(dag, task_name, namespace, deployment_name)
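A hedged sketch of what such a helper could look like (all names, the image, and the wal-g invocation are assumptions; the operator class is injected so the factory itself carries no hard Airflow dependency):

```python
def create_backup_task(dag, task_name, namespace, deployment_name,
                       operator=None):
    """Create a task that runs a wal-g backup-push in the given namespace."""
    if operator is None:
        # assumption: the real task would wrap KubernetesPodOperator
        from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
            KubernetesPodOperator,
        )
        operator = KubernetesPodOperator
    return operator(
        task_id=task_name,
        name=task_name,
        namespace=namespace,
        image="cas-wal-g:latest",  # assumption: an image with wal-g installed
        cmds=["wal-g", "backup-push", f"/pgdata/{deployment_name}"],
        dag=dag,
    )
```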

Airflow documentation curation

Assess the CIIP repo's documentation.
Things to consider:

  • What is outdated
  • What is missing
  • Is it organised in a way that makes sense and is discoverable

Todo:

  • Remove or update outdated documentation
  • Create a list of suggestions for documentation that should be added
  • Re-organise documentation if it is not discoverable following our documentation organisation strategy

Documentation organisation strategy:

  • A docs directory at the root of a repo
  • Separate doc files with names that clearly describe what the doc file contains
  • Appropriately named sub-directories where there are several related individual doc files

Bug: new k8s api deprecated the batch/v1beta1 cronjobs

Bug

The trigger_k8s_cronjob task uses the deprecated batch/v1beta1 API for CronJobs.

This is an issue, since most DAGs we have use this task to trigger cronjobs in other namespaces, and will all fail.

Fix

The wrong API is called in a few places in this file

  • update dags/trigger_k8s_cronjob.py to use the correct batch/v1 API when calling a cronjob
  • create a release and deploy it to dev
