
Apache Airflow: A Quick Guide to Integrating CWL Workflows Into Your Airflow Infrastructure

Blake Bradford

Apache Airflow is a workflow management platform widely used for orchestrating complex data processing pipelines. Common Workflow Language (CWL) v1.0 provides a standardized way to describe bioinformatics and other scientific workflows, so that the same workflow definition can be handed to any engine that supports the standard.

In this article, we will explore cwl-airflow-parser, a package that extends Apache Airflow with support for CWL v1.0. We will cover the installation process, system requirements, and demonstrate how to integrate CWL workflows seamlessly into your existing Airflow infrastructure.

Installation
To get started with cwl-airflow-parser, ensure you have Python 3.6 installed on your system. Using pip, run the following command to install the package:

```sh
pip3.6 install -U cwl-airflow-parser
```

cwl-airflow-parser has been tested on Ubuntu 16.04.3 and on macOS Sierra and High Sierra. The package also requires Docker and Node.js to be installed on your system. Make sure both are available before proceeding.
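The requirements above can be checked before installing. The following is a minimal sketch, not part of cwl-airflow-parser: the helper name `missing_prerequisites` is hypothetical, and it assumes that Docker and Node.js are exposed on PATH as `docker` and `node` (on some Ubuntu releases the Node.js binary is called `nodejs` instead) and that Python 3.6 or newer is acceptable.

```python
import shutil
import sys


def missing_prerequisites(required_tools=("docker", "node"), min_python=(3, 6)):
    """Return a list of problems found; an empty list means every check passed.

    The tool names and the minimum Python version are assumptions based on
    the requirements listed in this article.
    """
    problems = []
    # Compare only (major, minor) against the running interpreter.
    if sys.version_info[:2] < tuple(min_python):
        problems.append("Python %d.%d or newer is required" % tuple(min_python))
    # shutil.which returns None when an executable is not found on PATH.
    for tool in required_tools:
        if shutil.which(tool) is None:
            problems.append("'%s' was not found on PATH" % tool)
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

An empty result only shows that the executables exist; it does not confirm that the Docker daemon is running or that your user is allowed to talk to it.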

Usage
Once cwl-airflow-parser is installed, you can start integrating CWL workflows into your Airflow infrastructure. Here is a basic example to get you started:

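```python
from datetime import timedelta

from cwl_airflow_parser import CWLDAG, CWLJobDispatcher, CWLJobGatherer


def cwl_workflow(workflow_file):
    """Build an Airflow DAG from the CWL workflow at `workflow_file`."""
    dag = CWLDAG(
        default_args={
            'owner': 'airflow',
            'email': ['my@email.com'],
            'email_on_failure': False,
            'email_on_retry': False,
            'retries': 20,
            'retry_exponential_backoff': True,
            'retry_delay': timedelta(minutes=30),
            'max_retry_delay': timedelta(minutes=60 * 4),
        },
        cwl_workflow=workflow_file,
    )
    # Generate the tasks that correspond to the CWL workflow steps.
    dag.create()
    # The dispatcher runs first and the gatherer runs last.
    dag.add(CWLJobDispatcher(dag=dag), to='top')
    dag.add(CWLJobGatherer(dag=dag), to='bottom')

    return dag


# Airflow discovers DAG objects through module-level names,
# so keep a reference to the returned DAG.
dag = cwl_workflow("/path/to/my/workflow.cwl")
```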

Above is a sample script that demonstrates the integration of a CWL workflow using cwl-airflow-parser. It creates a CWLDAG object, sets default arguments for the workflow, and adds the CWLJobDispatcher and CWLJobGatherer to the DAG. Finally, it returns the DAG.
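To get a feel for what the retry settings in those default arguments imply, here is a rough sketch of the wait times they produce. It assumes a simplified model in which the delay doubles after every failed attempt and is capped at `max_retry_delay`; Airflow's real exponential backoff also adds jitter and differs in detail, so treat the numbers as approximate. The function name `approximate_retry_delays` is hypothetical.

```python
from datetime import timedelta


def approximate_retry_delays(retries, retry_delay, max_retry_delay):
    """Return one estimated wait per retry under a plain doubling model.

    Simplification: Airflow's own backoff adds jitter, which is ignored here.
    """
    delays = []
    for attempt in range(retries):
        # Double the base delay on each attempt, never exceeding the cap.
        delays.append(min(retry_delay * (2 ** attempt), max_retry_delay))
    return delays


delays = approximate_retry_delays(
    retries=20,
    retry_delay=timedelta(minutes=30),
    max_retry_delay=timedelta(minutes=60 * 4),
)
# Waits before the first five retries, in minutes.
print([int(d.total_seconds() // 60) for d in delays[:5]])
# Total time spent waiting if every retry is used.
print(sum(delays, timedelta()))
```

Under this model the cap of four hours is reached by the fourth retry, so a task that keeps failing would spend roughly three days waiting before Airflow gives up. Lower `retries` or `max_retry_delay` if that is longer than you want.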

With cwl-airflow-parser, CWL v1.0 workflows run as regular Airflow DAGs, so they benefit from Airflow's job scheduling and retry handling without being rewritten by hand.
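For orientation, the file whose path is passed to `cwl_workflow` is an ordinary CWL v1.0 document. The sketch below writes a hypothetical one-step workflow that passes a string to `echo`. It relies on the fact that CWL documents may be written as JSON as well as YAML. The step name `say`, the input name `message`, and the helper `write_workflow` are illustrative assumptions, and this file has not been tested with cwl-airflow-parser itself.

```python
import json
import os
import tempfile

# Hypothetical minimal CWL v1.0 workflow with one embedded command-line tool.
MINIMAL_WORKFLOW = {
    "cwlVersion": "v1.0",
    "class": "Workflow",
    "inputs": {"message": "string"},
    "outputs": [],
    "steps": {
        "say": {
            # The tool is embedded inline; it could also live in its own file.
            "run": {
                "class": "CommandLineTool",
                "baseCommand": "echo",
                "inputs": {
                    "message": {
                        "type": "string",
                        "inputBinding": {"position": 1},
                    }
                },
                "outputs": [],
            },
            # Wire the workflow-level input into the step input of the same name.
            "in": {"message": "message"},
            "out": [],
        }
    },
}


def write_workflow(path, document=MINIMAL_WORKFLOW):
    """Serialize the workflow description to `path` as JSON and return the path."""
    with open(path, "w") as handle:
        json.dump(document, handle, indent=2)
    return path


workflow_path = write_workflow(os.path.join(tempfile.mkdtemp(), "workflow.cwl"))
print(workflow_path)
```

Real workflows usually declare outputs and chain several steps, and most are written in YAML for readability; the JSON form is used here only to keep the example in Python.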

Conclusion
In this article, we have explored cwl-airflow-parser and its role in extending Apache Airflow with CWL v1.0 support. We have covered the installation process, system requirements, and demonstrated the usage of the package. By integrating CWL workflows into your Airflow infrastructure, you can enhance workflow automation and optimize data processing.

If you are working with complex bioinformatics or scientific workflows, cwl-airflow-parser is a valuable tool to streamline your data processing pipelines. Start leveraging the power of Apache Airflow and CWL v1.0 today!

References
Apache Airflow
Common Workflow Language v1.0
cwl-airflow-parser Documentation
