bcgov / cas-airflow
Exploration of Airflow on OpenShift
License: Apache License 2.0
Create a new release version of CAS-Airflow that includes the migrated Terraform code from #168.
curl commands should use --retry 5 --retry-all-errors.
This is mainly to work around a race condition where the trigger fails because it cannot find a newly created DAG, but it also makes the script more resilient in case there is a network issue connecting to the Airflow API.
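The shell script only needs the two curl flags. Where the Airflow API is called from Python instead, the same "retry on any error" behaviour could look like the minimal sketch below. with_retries is a hypothetical helper, and the doubling delay is an assumption modelled loosely on curl's default backoff.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], retries: int = 5, delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying up to `retries` extra times on any exception.

    Like curl's --retry-all-errors, every failure (not just transient HTTP
    codes) triggers another attempt, e.g. a DAG that is not registered yet.
    """
    attempt = 0
    while True:
        try:
            return call()
        except Exception:
            if attempt >= retries:
                raise  # out of retries: surface the last error
            attempt += 1
            # Doubling backoff: delay, 2 * delay, 4 * delay, ...
            sleep(delay * 2 ** (attempt - 1))
```

Injecting sleep keeps the helper testable; in a DAG or script it would wrap something like a urllib.request.urlopen call to the trigger endpoint.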
Broken DAG: [/usr/local/airflow/dags/dags/swrs.py] No module named 'operators'
Broken DAG: [/usr/local/airflow/dags/dags/remove-child-process-logs.py] The key (delete _task) has to be made of alphanumeric characters, dashes, dots and underscores exclusively
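The second error comes from the space in "delete _task". A small pre-check based only on the rule quoted in that error message could catch this before Airflow parses the file; check_task_id is a hypothetical helper, not Airflow's own validator.

```python
import re

# Allowed characters, per the error message: alphanumerics, dashes, dots and
# underscores. re.ASCII keeps \w to [A-Za-z0-9_].
VALID_KEY = re.compile(r"[\w.-]+", re.ASCII)


def check_task_id(task_id: str) -> str:
    """Return task_id unchanged, or raise ValueError if it has other characters."""
    # fullmatch (rather than ^...$) also rejects a trailing newline.
    if not VALID_KEY.fullmatch(task_id):
        raise ValueError(f"invalid task id: {task_id!r}")
    return task_id
```

Renaming the task to delete_task is enough to satisfy the rule.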
This is a combination of:
cas-cif-postgres-pgha1-<...>-N
- name: exporter-config
  projected:
    defaultMode: 420
    sources: null
list_namespaced_pod(...), which is why we're getting this error: ValueError: Invalid value for `sources`, must not be `None`
The issue is very similar to kubernetes-client/python#895, so one short-term fix is to monkey-patch in a similar way, especially since we're not doing anything with that data in this DAG; we're just listing the pods to find the nginx ones.
Probability (how likely the bug is to happen, scored from 1-5): 5
Effect (how bad the bug is when it does happen, scored from 1-5): 3
There is a workaround: restart the nginx pods manually
To reproduce: install the postgres operator in a namespace, then run reload_nginx_containers for that namespace (there is no need to actually have nginx containers present).
Probably something like:
from kubernetes.client.models.v1_projected_volume_source import V1ProjectedVolumeSource

def sources(self, sources):
    # Accept a null `sources` list (treat it as empty) instead of raising ValueError
    self._sources = sources or []

V1ProjectedVolumeSource.sources = V1ProjectedVolumeSource.sources.setter(sources)
This is the context of what we're trying to patch.
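The monkey-patch relies on property.setter returning a new property that keeps the original getter but swaps in a replacement setter. Since the real model needs the kubernetes package, here is a self-contained sketch of the same technique against a hypothetical stand-in class that rejects None the way the error above shows.

```python
class ProjectedVolumeSourceStandIn:
    """Hypothetical stand-in for the generated client model: its `sources`
    setter rejects None, like the ValueError quoted above."""

    def __init__(self, sources=None):
        self.sources = sources  # goes through the property setter

    @property
    def sources(self):
        return self._sources

    @sources.setter
    def sources(self, sources):
        if sources is None:
            raise ValueError("Invalid value for `sources`, must not be `None`")
        self._sources = sources


def lenient_sources(self, sources):
    # Replacement setter: treat a null `sources` list as empty.
    self._sources = sources or []


# Reassigning the new property on the class patches every instance, including
# ones created later during deserialization of the API response.
ProjectedVolumeSourceStandIn.sources = ProjectedVolumeSourceStandIn.sources.setter(lenient_sources)
```

Because the patch is applied at class level, it has to run before list_namespaced_pod deserializes the response.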
Now that we use the chart-releaser action, we can automate tagging the Docker images with the same tag that the chart-releaser action generated.
The idea would be to avoid having values like this in a release: https://github.com/bcgov/cas-ciip-portal/blob/61d938c5efc2c4cb7859e30ccae1da27cfccbd8b/helm/cas-ciip-portal/values.yaml#L4
AC:
The tag in the values.yaml file needs to be updated automatically as part of the release process. This can probably be done with a templating tpl ( ... ) call, or by CI.
###Description of the Tech Debt###
When logging in to Airflow using GitHub OAuth, users' email addresses are not being recorded. They are saved as github_<username>@email.notfound, which is what happens when no email address is received while creating a new user.
At one point the correct email addresses were being recorded, but that stopped when we fixed a more pressing problem with the login.
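One possible cause, which is an assumption and not confirmed here: GitHub's GET /user endpoint only returns the email a user has made public, while GET /user/emails (which needs the user:email OAuth scope) lists all addresses with primary and verified flags. If the OAuth user-info lookup were extended to call that endpoint, choosing the address could be sketched like this; pick_github_email is a hypothetical helper, and the fallback reproduces the placeholder described above.

```python
from typing import Iterable, Mapping


def pick_github_email(username: str, emails: Iterable[Mapping]) -> str:
    """Choose an address from a GitHub GET /user/emails response.

    Entries look like {"email": ..., "primary": bool, "verified": bool, ...}.
    Prefer the primary verified address, then any verified one; only fall
    back to the placeholder when nothing usable is returned.
    """
    verified = [e for e in emails if e.get("verified")]
    for entry in verified:
        if entry.get("primary"):
            return entry["email"]
    if verified:
        return verified[0]["email"]
    return f"github_{username}@email.notfound"
```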
###Tech Debt Triage###
The purpose of our technical debt triage process is to analyze technical debt to determine risk level of the technical debt and the value in tackling that technical debt.
Risk Value Scoring:
Level | Value |
---|---|
High | 3 |
Medium | 2 |
Low | 1 |
Technical Debt - Risk Types | Level | Value |
---|---|---|
Business Area Risk - Risk of business area visibility / damage to user experience | 1 | 1 |
Developer Fault Risk - How likely will this tech debt cause a future error related to coding on top of it | 1 | 1 |
System Fault Risk - Risk of system errors or application downtime | 1 | 1 |
Time Scale Risk - Compound risk effect if left alone. How much more difficult to fix or dangerous will this become over time? | 1 | 1 |
Time Sink Risk - How much will this tech debt slow the development process down | 2 | 2 |
TOTAL SCORE: | 6 | 6 |
We need to create a DAG that fetches DAG files.
Airflow API for configuration options: https://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html#operation/post_dag_run
In the Helm chart, we'll need --set dags.persistence.enabled=true to enable the PVC.
The DAG refresh time might need tweaking.
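The linked post_dag_run endpoint is POST /api/v1/dags/{dag_id}/dagRuns, and its JSON body accepts a conf object that the DAG run can read. A minimal sketch of building that request with the standard library follows; the DAG id fetch_dags and the conf keys git_url, path and ref are hypothetical placeholders for whatever the fetcher DAG ends up expecting.

```python
import base64
import json
import urllib.request


def build_dag_run_request(base_url: str, dag_id: str, conf: dict,
                          username: str, password: str) -> urllib.request.Request:
    """Build a POST /api/v1/dags/{dag_id}/dagRuns request (stable REST API).

    `conf` is passed through to the DAG run, e.g. hypothetical
    {"git_url": ..., "path": ..., "ref": ...} keys for a DAG-fetching DAG.
    """
    url = f"{base_url.rstrip('/')}/api/v1/dags/{dag_id}/dagRuns"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps({"conf": conf}).encode(),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Basic auth assumes the basic_auth API backend is enabled.
            "Authorization": f"Basic {token}",
        },
    )
```

Sending it is then urllib.request.urlopen(req); the request object is built separately so it can be inspected or retried.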
The info feeds into a dashboard view of all open source gov't applications, showing tech stacks, hosting, common components, and other key info at a glance. It's an easy way to see what other teams/applications are on the same tech stack/using the same components.
Given I am a developer
When I visit the cas-airflow repository
Then there is a bcgovpubcode.yml
at the root of the repository
And that file contains up-to-date information
Definition of Ready (Note: If any of these points are not applicable, mark N/A)
Definition of Done (Note: If any of these points are not applicable, mark N/A)
According to @pbastia: The rules for some calculated fields are getting too complex, even for someone familiar with the CIF program. We might want to start working on contextual help and tooltips. The generic "This field cannot be calculated due to lack of information now." help text isn't enough.
Something more specific would be helpful. I have an idea: Wealthsimple has a fantastic way of handling incomplete information. When the user submits an erroneous application, an alert box lets them know something isn't right, and the user is then taken directly, step by step, to the fields they need to fill in to sort out the issue. We could also use these 'guided' field-filling paths for calculations.
The latest Airflow upgrade had a few deprecation warnings:
/home/airflow/.local/lib/python3.8/site-packages/airflow/configuration.py:724 DeprecationWarning: The auth_backend option in [api] has been renamed to auth_backends - the old setting has been used, but please update your config.
/home/airflow/.local/lib/python3.8/site-packages/airflow/configuration.py:747 DeprecationWarning: The auth_backend option in [api] has been renamed to auth_backends - the old setting has been used, but please update your config.
/home/airflow/.local/lib/python3.8/site-packages/airflow/configuration.py:761 FutureWarning: The auth_backends setting in [api] has had airflow.api.auth.backend.session added in the running config, which is needed by the UI. Please update your config before Apache Airflow 3.0.
/home/airflow/.local/lib/python3.8/site-packages/airflow/models/dag.py:450 DeprecationWarning: The dag_concurrency option in [core] has been renamed to max_active_tasks_per_dag - the old setting has been used, but please update your config.
/home/airflow/.local/lib/python3.8/site-packages/airflow/models/dag.py:3854 DeprecationWarning: The dag_concurrency option in [core] has been renamed to max_active_tasks_per_dag - the old setting has been used, but please update your config.
These will need to be addressed before we upgrade to a future version.
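A small check over the rendered airflow.cfg could flag these before the next upgrade. This sketch only knows about the three settings named in the warnings above; find_deprecated_options is a hypothetical helper, and options set through environment variables would not be seen by it.

```python
import configparser
from typing import List

# (old section, old option) -> (new section, new option), as named in the
# deprecation warnings above.
RENAMED_OPTIONS = {
    ("api", "auth_backend"): ("api", "auth_backends"),
    ("core", "dag_concurrency"): ("core", "max_active_tasks_per_dag"),
}
SESSION_BACKEND = "airflow.api.auth.backend.session"


def find_deprecated_options(cfg_text: str) -> List[str]:
    """Return one message per deprecated setting found in airflow.cfg text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    messages = []
    for (section, old), (new_section, new) in RENAMED_OPTIONS.items():
        if parser.has_option(section, old):
            messages.append(f"[{section}] {old} -> [{new_section}] {new}")
    # The FutureWarning: the UI needs the session backend listed explicitly.
    backends = parser.get("api", "auth_backends",
                          fallback=parser.get("api", "auth_backend", fallback=""))
    listed = [b.strip() for b in backends.split(",") if b.strip()]
    if listed and SESSION_BACKEND not in listed:
        messages.append(f"[api] auth_backends should also list {SESSION_BACKEND}")
    return messages
```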
The release action fails if Chart.lock is not up to date.
This should be handled automatically in the action itself: helm dep up should be run before calling the chart-releaser action.
Set up a nightly build for the cas-airflow repo to increase security when no code is submitted for several days.
From https://youtrack.button.is/issue/GGIRCS-2122
Right now we are running an image built from Airflow's 2.0.0rc1 release, since anything after this had trouble with the webserver's liveness and readiness probes (Client Timeout).
We need to revisit this to see:
then, in our cas-airflow repository, set airflow.defaultAirflowRepository and airflow.defaultAirflowTag accordingly and run helm install.
As a Developer,
I want to have access to a small python library that creates wal-g backup tasks
So I can break down the backup DAG into the individual repositories (ciip, ggircs, metabase)
The usage would be something like:
from airflow import DAG
from walg_backups import create_backup_task
dag = DAG(..., schedule_interval='0 8 * * *')
create_backup_task(dag, task_name, namespace, deployment_name)
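One hypothetical way to implement create_backup_task with that signature is sketched below. It assumes kubectl is available to the Airflow worker, that the database container sets $PGDATA, and that wal-g is already configured in that container; the optional operator_cls parameter is an addition so the helper can be tested without Airflow installed.

```python
import shlex


def create_backup_task(dag, task_name, namespace, deployment_name, operator_cls=None):
    """Add one task to `dag` that runs `wal-g backup-push` in a pod of the deployment.

    Hypothetical sketch: kubectl exec accepts deployment/<name> and picks one
    of its pods; $PGDATA is expanded by the shell inside that pod.
    """
    if operator_cls is None:
        # Imported lazily so the helper can be unit-tested without Airflow.
        from airflow.operators.bash import BashOperator as operator_cls
    command = [
        "kubectl", "exec", "-n", namespace, f"deployment/{deployment_name}",
        "--", "sh", "-c", 'wal-g backup-push "$PGDATA"',
    ]
    return operator_cls(
        task_id=task_name,
        bash_command=shlex.join(command),  # quotes the remote script once
        dag=dag,
    )
```

Each repository (ciip, ggircs, metabase) would then get its own call, and could just as easily live in its own DAG.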
Currently, we're using the same password, stored in an OpenShift secret.
Look into integration with our GitHub team (docs).
Deploy Airflow 2.5.3
This issue is a kind reminder that your repository has been inactive for 180 days. Some repositories are maintained in accordance with business requirements that infrequently change thus appearing inactive, and some repositories are inactive because they are unmaintained.
To help differentiate products that are unmaintained from products that do not require frequent maintenance, repomountie will open an issue whenever a repository has not been updated in 180 days.
dormant or retired life cycle badge. Thank you for your help ensuring effective governance of our open-source ecosystem!
We are spinning down Terraform Cloud. Follow the patterns established in cas-registration and cas-ciff for migrating and running Terraform as an OpenShift Job.
We had assumed the swrs_report_id remained the same across versions, but it changes with every new version.
This means the way the refresh_swrs_version_data function matches versions is incorrect.
To fix this:
The match to swrs.report from ggircs_portal.application should be made using both swrs_facility_id and reporting_year.
Probability: 4
Effect: 4
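The real function presumably lives in SQL, so this Python sketch only illustrates the join key: matching on the pair (swrs_facility_id, reporting_year) finds every version of a report, whereas swrs_report_id only ever matches one. The row shapes, including the application id field, are assumptions.

```python
from typing import Dict, Iterable, List, Mapping, Tuple

Key = Tuple[int, int]  # (swrs_facility_id, reporting_year)


def match_applications_to_reports(applications: Iterable[Mapping],
                                  reports: Iterable[Mapping]) -> Dict[int, List[int]]:
    """Map each application id to the swrs_report_ids sharing its facility and year."""
    by_key: Dict[Key, List[int]] = {}
    for r in reports:
        key = (r["swrs_facility_id"], r["reporting_year"])
        by_key.setdefault(key, []).append(r["swrs_report_id"])
    return {
        a["id"]: sorted(by_key.get((a["swrs_facility_id"], a["reporting_year"]), []))
        for a in applications
    }
```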
Using a Dell ECS or GCP bucket, we could ensure that the correct version of the DAGs is used when deploying our application, and we could host the DAG code in the applications' git repositories instead of this repository.
create a dag that fetches DAG files from a given <git url, path, ref> config and writes them to the DAGs PVC (needs to be enabled in airflow helm chart)
each application uses airflow-dag-trigger job (created in #91) to trigger the above DAG for its own repo.
dag refresh time might need tweaking
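A rough sketch of the fetching half of such a DAG, assuming its run conf supplies the git URL, a path inside the repo and a ref. Both function names are hypothetical; the git commands would be run with something like subprocess.run(cmd, check=True), and fetching a raw commit SHA only works where the git server allows it.

```python
import shutil
from pathlib import Path
from typing import List


def git_fetch_commands(git_url: str, ref: str, checkout_dir: str) -> List[List[str]]:
    """Commands for a shallow checkout of a single ref (branch or tag)."""
    return [
        ["git", "init", checkout_dir],
        ["git", "-C", checkout_dir, "fetch", "--depth", "1", git_url, ref],
        ["git", "-C", checkout_dir, "checkout", "FETCH_HEAD"],
    ]


def sync_dags(checkout_dir: str, subpath: str, dags_dir: str, repo_name: str) -> List[str]:
    """Copy checkout_dir/subpath into dags_dir/repo_name; return the .py files copied."""
    src = Path(checkout_dir) / subpath
    target = Path(dags_dir) / repo_name
    # Naive replace: drop the previous copy so deleted DAG files disappear too.
    if target.exists():
        shutil.rmtree(target)
    shutil.copytree(src, target, ignore=shutil.ignore_patterns(".git", "__pycache__"))
    return sorted(p.relative_to(target).as_posix() for p in target.rglob("*.py"))
```

Giving each application its own subdirectory under the DAGs PVC keeps one repo's sync from touching another's files.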
Possibly helpful blog post on breaking up the dag monorepo:
https://tech.scribd.com/blog/2020/breaking-up-the-dag-repo.html
Idea to do something similar with GCS:
https://github.com/GoogleCloudPlatform/gcsfuse
Related Airflow 2.0 improvement proposal:
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-5+Remote+DAG+Fetcher
The trigger_k8s_cronjob task uses the deprecated batch/v1beta1 API for CronJobs, which is no longer served as of Kubernetes 1.25.
This is an issue, since most of our DAGs use this task to trigger cronjobs in other namespaces, and they will all fail.
The wrong API is called in a few places in dags/trigger_k8s_cronjob.py; these calls need to be updated to use the correct batch/v1 API when calling a cronjob.
trigger_k8s_cronjob.py contains a flaw where it waits an arbitrary 10 seconds for a job to be created from the cronjob.
This can cause a failure if the cluster is slow, because the retrieved list of matching jobs will be empty.
Action Item:
Retry getting the jobs (with logging) if the returned list is empty, and abort with a much longer timeout. Maybe we can retry indefinitely and let the cluster timeout the job instead?
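The action item could be sketched as a polling loop with logging and a long overall timeout. wait_for_jobs is hypothetical; in the task, list_jobs might wrap the kubernetes client's BatchV1Api.list_namespaced_job(namespace, label_selector=...).items with whatever matching the task already does, and the clock and sleep parameters exist only to make the loop testable.

```python
import logging
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")
log = logging.getLogger(__name__)


def wait_for_jobs(list_jobs: Callable[[], List[T]], timeout: float = 600.0,
                  interval: float = 5.0,
                  clock: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep) -> List[T]:
    """Poll `list_jobs` until it returns a non-empty list or `timeout` expires.

    Replaces a single fixed wait: a slow cluster just means more polls.
    """
    deadline = clock() + timeout
    attempt = 0
    while True:
        attempt += 1
        jobs = list_jobs()
        if jobs:
            return jobs
        if clock() >= deadline:
            raise TimeoutError(f"no job created after {timeout}s ({attempt} attempts)")
        log.info("No matching job yet (attempt %d); retrying in %ss", attempt, interval)
        sleep(interval)
```

Passing a very large timeout approximates "retry indefinitely" while still letting the task's own execution timeout end it.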
Assess the CIIP repo's documentation.
Things to consider:
Todo:
Documentation organisation strategy:
docs directory at the root of a repo
dags/trigger_k8s_cronjob.py is used in most of our DAGs, and could be refactored into an operator instead of using the PythonOperator. Maybe we could contribute that upstream too.
This issue is a kind reminder that your repository has been inactive for 181 days. Some repositories are maintained in accordance with business requirements that infrequently change thus appearing inactive, and some repositories are inactive because they are unmaintained.
To help differentiate products that are unmaintained from products that do not require frequent maintenance, repomountie will open an issue whenever a repository has not been updated in 180 days.
dormant or retired life cycle badge. Thank you for your help ensuring effective governance of our open-source ecosystem!
Airflow 2.8.1 has been released, with new features, bug fixes, and security improvements.
To do:
To trigger dags, we need to create a script and an ad-hoc helm template for the deployment that needs to trigger the dag.
ciip-portal and ggircs are good examples of that:
https://github.com/bcgov/cas-ciip-portal/blob/develop/.bin/trigger-airflow-dag.sh
https://github.com/bcgov/cas-ciip-portal/blob/develop/helm/cas-ciip-portal/templates/airflow-dag-trigger.yaml
The airflow-dag-trigger chart needs the following values:
The chart should be added to a Helm repository using the chart-releaser GitHub action (see the cas-postgres/cas-metabase repos).
When a DAG is first loaded by Airflow, it is paused. Before triggering a run, a PATCH request should be sent to ensure that the DAG is not paused.
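In the stable REST API, unpausing is a PATCH on /api/v1/dags/{dag_id} with {"is_paused": false}, optionally restricted with update_mask=is_paused. A minimal sketch of that request in Python follows; the trigger script would send the same method, URL and body with curl. build_unpause_request is a hypothetical helper, and the Authorization header value depends on the configured auth backend.

```python
import json
import urllib.request


def build_unpause_request(base_url: str, dag_id: str, auth_header: str) -> urllib.request.Request:
    """PATCH /api/v1/dags/{dag_id} setting is_paused to false."""
    url = f"{base_url.rstrip('/')}/api/v1/dags/{dag_id}?update_mask=is_paused"
    return urllib.request.Request(
        url,
        data=json.dumps({"is_paused": False}).encode(),
        method="PATCH",
        headers={"Content-Type": "application/json", "Authorization": auth_header},
    )
```

Sending this before the POST to dagRuns makes sure the triggered run is actually scheduled.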
This issue is a kind reminder that your repository has been inactive for 286 days. Some repositories are maintained in accordance with business requirements that infrequently change thus appearing inactive, and some repositories are inactive because they are unmaintained.
To help differentiate products that are unmaintained from products that do not require frequent maintenance, repomountie will open an issue whenever a repository has not been updated in 180 days.
dormant or retired life cycle badge. Thank you for your help ensuring effective governance of our open-source ecosystem!
Topics greatly improve the discoverability of repos; please add the short code from the table below to the topics of your repo so that ministries can use GitHub's search to find out what repos belong to them and other visitors can find useful content (and reuse it!).
In short order we'll add our 800th repo. This large number clearly demonstrates the success of using GitHub and our Open Source initiative. This huge success means it's critical that we make our content as discoverable as possible. Discoverability promotes code reuse across a large decentralized organization like the Government of British Columbia, and allows ministries to find the repos they own.
Below is a table of abbreviations, a.k.a. short codes, for each ministry; they're the ones used in all @gov.bc.ca email addresses. Please add the short codes of the ministry or organization that "owns" this repo as a topic.
That's it, you're done!
Once topics are added, you can use them in GitHub's search. For example, enter something like org:bcgov topic:citz to find all the repos that belong to Citizens' Services. You can refine this search by adding keywords specific to a subject you're interested in. To learn more about searching through repos, check out GitHub's doc on searching.
If your org is not in the list below, or the table contains errors, please create an issue here.
While you're doing this, add additional topics that would help someone searching for "something". These can be the language used (javascript or R), something like opendata or data for data-only repos, or any other keywords that are useful.
Add a meaningful description to your repo. This is hugely valuable to people looking through our repositories.
If your application is live, add the production URL.
Short Code | Organization Name |
---|---|
AEST | Advanced Education, Skills & Training |
AGRI | Agriculture |
ALC | Agriculture Land Commission |
AG | Attorney General |
MCF | Children & Family Development |
CITZ | Citizens' Services |
DBC | Destination BC |
EMBC | Emergency Management BC |
EAO | Environmental Assessment Office |
EDUC | Education |
EMPR | Energy, Mines & Petroleum Resources |
ENV | Environment & Climate Change Strategy |
FIN | Finance |
FLNR | Forests, Lands, Natural Resource Operations & Rural Development |
HLTH | Health |
IRR | Indigenous Relations & Reconciliation |
JEDC | Jobs, Economic Development & Competitiveness |
LBR | Labour Policy & Legislation |
LDB | BC Liquor Distribution Branch |
MMHA | Mental Health & Addictions |
MAH | Municipal Affairs & Housing |
BCPC | Pension Corporation |
PSA | Public Safety & Solicitor General & Emergency B.C. |
SDPR | Social Development & Poverty Reduction |
TCA | Tourism, Arts & Culture |
TRAN | Transportation & Infrastructure |
NOTE See an error or omission? Please create an issue here to get it remedied.