Run a task in Azure Container Instances with Airflow

Azure Container Instances (ACI) is one service that Azure users can leverage for working with containers. In this tutorial, you'll learn how to orchestrate ACI using Airflow and create a DAG that runs a task in an ACI container.

info

All code in this tutorial can be found on the Astronomer Registry.

Time to complete

This tutorial takes approximately 30 minutes to complete.

Assumed knowledge

To get the most out of this tutorial, make sure you have an understanding of:

Prerequisites

To complete this tutorial, you need:

Step 1: Create an Azure service principal

An Azure service principal is required for external tools like Airflow to connect to your Azure resources. Identify the Azure resource group you want to create your ACI in (or create a new one), then create a service principal with write access to that resource group. For more information, see Use the portal to create an Azure AD application and service principal that can access resources.

Step 2: Configure your Astro project

Now that you have your Azure resources configured, you can move on to setting up Airflow.

  1. Create a new Astro project:

    mkdir astro-aci-tutorial && cd astro-aci-tutorial
    astro dev init
  2. Add the following line to the requirements.txt file of your Astro project:

    apache-airflow-providers-microsoft-azure

    This installs the Azure provider package that contains all of the relevant ACI modules.

  3. Run the following command to start your project in a local environment:

    astro dev start

Step 3: Add an Airflow connection to ACI

Add a connection that Airflow will use to connect to ACI. In the Airflow UI, go to Admin -> Connections.

Create a new connection named azure_container_conn_id and choose the Azure Container Instance connection type.

Specify your Client ID in the Login field, your Client Secret in the Password field, and your Tenant and Subscription IDs in the Extras field as JSON. It should look something like this:

ACI Connection
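
As a rough illustration of the Extras field, the sketch below builds the JSON with Python's standard json module. The key names tenantId and subscriptionId are an assumption based on recent versions of the Azure provider (older versions may expect different key names), the helper build_aci_extras is hypothetical, and the ID values are placeholders. Check the provider documentation for the version you have installed.

```python
import json

# Placeholder values: replace with the IDs of your own tenant and subscription.
TENANT_ID = "<your-tenant-id>"
SUBSCRIPTION_ID = "<your-subscription-id>"


def build_aci_extras(tenant_id: str, subscription_id: str) -> str:
    """Return a JSON string to paste into the connection's Extras field.

    The key names are an assumption; verify them against your provider version.
    """
    return json.dumps(
        {"tenantId": tenant_id, "subscriptionId": subscription_id},
        indent=2,
    )


extras_json = build_aci_extras(TENANT_ID, SUBSCRIPTION_ID)
print(extras_json)
```

Paste the printed JSON into the Extras field of the connection form.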

Step 4: Choose a Docker image

Choose a Docker image that you want to run. The AzureContainerInstancesOperator will run any Docker image in a container with your specifications. If you don't have an image, you can use a pre-built one such as Docker's hello-world:latest image. You can search for other available images in Docker's container image repository.

Step 5: Create your DAG

In your Astro project dags/ folder, create a new file called aci-pipeline.py. Paste the following code into the file:

from airflow import DAG
from airflow.providers.microsoft.azure.operators.azure_container_instances import AzureContainerInstancesOperator
from datetime import datetime, timedelta


with DAG(
    'azure_container_instances',
    start_date=datetime(2020, 12, 1),
    max_active_runs=1,
    schedule_interval='@daily',
    default_args={
        'retries': 1,
        'retry_delay': timedelta(minutes=1),
    },
    catchup=False,
) as dag:

    # Run the chosen Docker image in an ACI container group.
    opr_run_container = AzureContainerInstancesOperator(
        task_id='run_container',
        ci_conn_id='azure_container_conn_id',
        registry_conn_id=None,
        resource_group='<your-resource-group>',
        name='azure-tutorial-container',
        image='hello-world:latest',
        region='East US',
        cpu=1,
        memory_in_gb=1.5,
        fail_if_exists=False,
    )

Update the resource_group parameter to the name of the resource group you created in Step 1. You may wish to update some of the other parameters in your operator, particularly the image and registry_conn_id if you chose a different Docker image. The following parameters are defined in this example:

  • ci_conn_id: The connection ID for the Airflow connection you created in Step 3.
  • registry_conn_id: The connection ID to connect to a registry. This tutorial uses Docker Hub, which is public and does not require credentials, so we pass in None.
  • resource_group: The Azure resource group you created in Step 1.
  • name: The name you want to give your ACI. Note that this must be unique within the resource group.
  • image: The Docker image you chose in Step 4. In this case we use a simple Hello World example from Docker.
  • region: The Azure region you want your ACI deployed to.
  • cpu: The number of CPUs to allocate to your container. In this example we use the default minimum. For more information on allocating CPUs and memory, refer to the Azure documentation.
  • memory_in_gb: The amount of memory to allocate to the container. In this example we use the default minimum.
  • fail_if_exists: Whether you want the operator to raise an exception if the container group already exists (default value is True). If it's set to False and the container group name already exists within the given resource group, the operator will attempt to update the container group based on the other parameters before running and terminating upon completion.

You can also provide the operator with other parameters such as environment variables, volumes, and a command as needed to run the container. For more information on the AzureContainerInstancesOperator, check out the Astronomer Registry.
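
The optional parameters mentioned above can be sketched as follows. This is a minimal illustration, not a complete DAG: the helper build_aci_task_kwargs, the alpine:latest image, and the GREETING variable are hypothetical choices made for this example. The environment_variables and command parameter names are taken from the Azure provider as we understand it, so confirm them against the operator documentation for your version.

```python
def build_aci_task_kwargs(resource_group: str) -> dict:
    """Collect keyword arguments for AzureContainerInstancesOperator.

    Hypothetical helper: it only builds a dictionary and does not import Airflow.
    """
    return {
        "task_id": "run_container_with_command",
        "ci_conn_id": "azure_container_conn_id",
        "registry_conn_id": None,
        "resource_group": resource_group,
        "name": "azure-tutorial-container",
        # Assumed image: any public image that ships a shell would do.
        "image": "alpine:latest",
        "region": "East US",
        "cpu": 1,
        "memory_in_gb": 1.5,
        "fail_if_exists": False,
        # Environment variables made available inside the container.
        "environment_variables": {"GREETING": "Hello from ACI"},
        # Command that overrides the image's default entrypoint.
        "command": ["sh", "-c", "echo $GREETING"],
    }


kwargs = build_aci_task_kwargs("<your-resource-group>")
print(sorted(kwargs))
```

Inside a DAG, the dictionary would be unpacked into the operator, for example AzureContainerInstancesOperator(**kwargs).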

info

This operator can also be used to run existing container instances and make certain updates, including the Docker image, environment variables, or commands. Some updates to existing container groups are not possible with the operator, including CPU, memory, and GPU. Those updates require deleting the existing container group and recreating it, which can be accomplished using the AzureContainerInstanceHook.
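
The delete-and-recreate flow described above might look like the sketch below. It assumes the hook exposes exists, delete, and create_or_update methods that take a resource group and a container group name, which matches AzureContainerInstanceHook as we understand it but should be verified for your provider version. The recreate_container_group helper and the FakeHook stand-in are hypothetical; FakeHook exists only so the example runs without Azure credentials.

```python
from typing import Any


def recreate_container_group(
    hook: Any, resource_group: str, name: str, container_group: Any
) -> None:
    """Delete the container group if it exists, then create it again."""
    if hook.exists(resource_group, name):
        hook.delete(resource_group, name)
    hook.create_or_update(resource_group, name, container_group)


class FakeHook:
    """In-memory stand-in for the real hook (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.groups: dict = {}
        self.calls: list = []

    def exists(self, resource_group: str, name: str) -> bool:
        return (resource_group, name) in self.groups

    def delete(self, resource_group: str, name: str) -> None:
        self.calls.append("delete")
        del self.groups[(resource_group, name)]

    def create_or_update(
        self, resource_group: str, name: str, container_group: Any
    ) -> None:
        self.calls.append("create_or_update")
        self.groups[(resource_group, name)] = container_group


hook = FakeHook()
# Pretend a container group with 1 CPU already exists.
hook.groups[("my-rg", "azure-tutorial-container")] = {"cpu": 1}
recreate_container_group(hook, "my-rg", "azure-tutorial-container", {"cpu": 2})
print(hook.calls)
```

With the real hook, container_group would be a container group model object from the Azure SDK rather than a plain dictionary.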

Step 6: Run the DAG and review the task logs

Go to the Airflow UI, unpause your azure_container_instances DAG, and trigger it to run the image in your ACI. An ACI will spin up, run the container with the Hello World image, and spin down. Go to the Airflow task log, and you should see the printout from the container has propagated to the logs:

ACI Task Log

Additional considerations

There are multiple ways to manage containers with Airflow on Azure. The most flexible and scalable method is to use the KubernetesPodOperator. This lets you run any container as a Kubernetes pod, which means you can pass in resource requests and other native Kubernetes parameters. Using this operator requires an AKS cluster (or a hand-rolled Kubernetes cluster).

If you are not running on AKS, ACI can be a great choice:

  • It's easy to use and requires little setup
  • You can run containers in different regions
  • It's typically the cheapest option: since no virtual machines or higher-level services are required, you only pay for the memory and CPU used by your container group while it is active
  • Unlike the DockerOperator, it does not require running a container on the host machine

With these points in mind, Astronomer recommends using ACI with the AzureContainerInstancesOperator for testing or lightweight tasks that don't require scaling. For heavy production workloads, you should use AKS and the KubernetesPodOperator.