
CWL-Airflow Using Docker Compose

Use this Docker Compose file to run Airflow 2.1.4 with CWL-Airflow 1.2.0 installed. The compose file will start a webserver, a scheduler, a worker, and all other necessary services.

Table of Contents

  • Configuration and Installation
  • Running your first CWL workflow
  • Features and Bugs
  • More Complex Examples and Tips

Configuration and Installation

  • First, make sure that each container can access persistent data via volumes. These volumes map to locations in your local filesystem, managed by Docker.
    • Navigate to the .env file to specify the location of the home directory where you will store the important folders (AIRFLOW_HOME, CWL_INPUTS_FOLDER, CWL_OUTPUTS_FOLDER, CWL_PICKLE_FOLDER, CWL_TMP_FOLDER); a sketch follows after this list.

    • If you encounter problems with folder locations, make sure the locations are also specified inside airflow.cfg. Open airflow.cfg and add or update the parameters at the bottom of the document that specify the folder locations. After updating the file, copy it back into the webserver container, where it is used to initialize Airflow and its many parameters.
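
A minimal .env might look like the following sketch; the /Users/you paths are illustrative placeholders, not values taken from this repo:

AIRFLOW_HOME=/Users/you/airflow
CWL_INPUTS_FOLDER=/Users/you/cwl_inputs_folder
CWL_OUTPUTS_FOLDER=/Users/you/cwl_outputs_folder
CWL_PICKLE_FOLDER=/Users/you/cwl_pickle_folder
CWL_TMP_FOLDER=/Users/you/cwl_tmp_folder

To copy an updated airflow.cfg back into the running webserver container, standard docker cp works; the container name and destination path here are assumptions that depend on your setup:

docker cp airflow.cfg <webserver-container-name>:/opt/airflow/airflow.cfg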

Use the command

docker compose up --build

Note: Compose will create all volumes at run time if they do not already exist. Additionally, the MySQL database and its storage will be created automatically when the stack is built.

Running your first CWL workflow

You will need three things to run your first CWL workflow:

  1. Upload your CWL file to your home directory.
  2. Make a Python file inside the dags folder that uses the CWLDAG class (example below).
#!/usr/bin/env python3
# CWLDAG parses the CWL workflow and builds the corresponding Airflow DAG.
from cwl_airflow.extensions.cwldag import CWLDAG

dag = CWLDAG(workflow="/absolute/path/to/workflow.cwl", dag_id="my_dag_name")
  3. At run time, include any necessary configuration parameters, such as "job", in a .json file.
    • When triggering your DAG, use a .json job file to specify inputs. Select "trigger w/ configuration" inside the Airflow UI. Also see the CWL-Airflow "how-to-use.md" to learn about API usage: CWL-Airflow supplies its own API to execute DAGs via POST requests with an accompanying job configuration (a sketch follows after the example job below).

      For example, upperback.cwl is a simple workflow that takes a message, changes it to uppercase, reverses the text, and outputs the result (using InlineJavascriptRequirement, CommandLineTool, and ExpressionTool). It needs an input, specified upon triggering, like this:

{
  "job": {
    "message": "whats up",
    "scale": 1
  }
}
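
As a sketch, the same configuration can also be submitted from the command line through the standard Airflow 2.x REST API; the port, the airflow:airflow credentials, and the DAG ID below are assumptions about the compose setup, not values confirmed by this repo:

curl -X POST \
  -H "Content-Type: application/json" \
  --user "airflow:airflow" \
  -d '{"conf": {"job": {"message": "whats up", "scale": 1}}}' \
  "http://localhost:8080/api/v1/dags/my_dag_name/dagRuns"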

Features and Bugs

  • The jobs folder is personal storage; it is not necessary to save your job files in this location.

  • Similarly, dag_storage is a place to store and edit potential CWL code; however, DAGs inside the dags folder may point to CWL scripts stored in this specific location.

  • Inside this repository, some files are irrelevant (just remnants of my original local files). "compressed_workflow.gz" and "compressed_workflow_base64.txt" are files that were used to POST a new CWL workflow to Airflow using the API.

The API request looked something like this:

$ curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "workflow": "/Users/john.mcauliffe/documents/dag-storage/scatter.cwl",
    "workflow_content": "/Users/John.mcauliffe/documents/project1/compressed_workflow_base64.txt"
  }' \
  "http://localhost:8081/api/experimental/dags?dag_id=my_new_dag"

Common errors:

  • schema_salad.exceptions.ValidationException: Not found: '/Users/john.mcauliffe/documents/dag-storage/new.cwl'
    • Fix: make sure your /path/to/dag is inside the same project folder, as it is here.
  • FileNotFoundError: [Errno 2] No such file or directory: '/Users/john.mcauliffe/documents/project1/cwl_pickle_folder/3c18fa08f8beabcef4278a5f54503482.p'
    • Fix: this error is similar to the one above; make sure all paths are correct and your local files are organized correctly for all documents, including your "dag.py".
  • Maximum recursion depth reached error
    • Fix: try debugging your Python operator; the problem is likely incorrect syntax inside a Python file.

More Complex Examples and Tips

  • Try using the TriggerDagRunOperator to chain multiple DAGs into one pipeline.

    • See combine.py for an example of this operator in use.
  • Another useful tool inside Airflow is the sensor operator. These operators can be used to monitor the behavior of other tasks and DAGs, and they can execute functions when certain criteria are met (e.g., the successful completion of a separate task).

    • Check out the ExternalTaskSensor operator that I use in sensor_example.py.
    • In this example the sensors reference tasks from an external DAG, hence the operator name; however, the same operator, and other sensor operators, can be used more simply (inside the DAG they're monitoring instead of pointing outside of the file).
    • Sensor operators can be used to check task status, outputs, inputs, and configuration. A combined sketch follows below.
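
Below is a minimal sketch of both ideas together, assuming Airflow 2.1-style imports; the DAG and task IDs are illustrative and are not taken from combine.py or sensor_example.py:

#!/usr/bin/env python3
from datetime import datetime

from airflow import DAG
from airflow.operators.trigger_dagrun import TriggerDagRunOperator
from airflow.sensors.external_task import ExternalTaskSensor

with DAG(
    dag_id="combine_sketch",              # illustrative DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # Wait for another DAG to finish; with external_task_id=None the sensor
    # waits for the whole DAG run (note: it matches runs by execution date).
    wait_for_upstream = ExternalTaskSensor(
        task_id="wait_for_upstream",
        external_dag_id="my_dag_name",    # the CWL DAG defined earlier
        external_task_id=None,
        poke_interval=30,
    )

    # Then trigger a downstream DAG, passing the job configuration via conf.
    trigger_downstream = TriggerDagRunOperator(
        task_id="trigger_downstream",
        trigger_dag_id="my_other_dag",    # hypothetical downstream DAG
        conf={"job": {"message": "whats up", "scale": 1}},
    )

    wait_for_upstream >> trigger_downstream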

