
Enable data lineage for external systems

To generate lineage graphs for your data pipelines, you first need to configure them to emit lineage data. Because lineage data can be generated at every stage of a pipeline, you can configure components outside of Astro, such as dbt or Databricks, to emit lineage data whenever they run a job. Combined with the lineage data emitted by your DAGs, this lets Astro build a lineage graph that provides context for your data before, during, and after it reaches your Deployment.

Lineage architecture

Astro lineage is based on OpenLineage, an open source standard for creating and collecting lineage data. Systems that use an OpenLineage integration send metadata about running jobs and datasets to Astro through the OpenLineage API. Every Astro Organization includes an OpenLineage API key that you can use in your external systems to send lineage data back to your Control Plane.

Diagram showing how lineage data flows to Astro

Configuring a system to send lineage data requires:

  • Installing an OpenLineage backend to emit lineage data from the system.
  • Specifying your Organization's OpenLineage API endpoint and API key so that the system sends lineage data to the Astro Control Plane.
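
The two requirements above can be sketched with the Python standard library alone. The example below builds a minimal OpenLineage `RunEvent` and wraps it in an authenticated HTTP POST request without sending it. It is a sketch under assumptions, not Astro's client: the `/api/v1/lineage` path is the default used by OpenLineage's HTTP transport and might differ for Astro, and the `Bearer` authorization scheme, the spec version in `SCHEMA_URL`, the `PRODUCER` URI, and the function names are all assumptions or hypothetical. In practice, the OpenLineage integration for your system builds and sends these events for you.

```python
import json
import uuid
import urllib.request
from datetime import datetime, timezone
from typing import Iterable, Optional, Tuple

# Assumptions (not confirmed for Astro): endpoint path, spec version, producer URI.
LINEAGE_PATH = "/api/v1/lineage"
SCHEMA_URL = "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent"
PRODUCER = "https://example.com/my-pipeline"  # hypothetical URI identifying your emitter

# Event types defined by the OpenLineage RunEvent schema.
EVENT_TYPES = {"START", "RUNNING", "COMPLETE", "ABORT", "FAIL", "OTHER"}

Dataset = Tuple[str, str]  # (namespace, name)


def build_run_event(
    namespace: str,
    job_name: str,
    event_type: str,
    run_id: Optional[str] = None,
    inputs: Iterable[Dataset] = (),
    outputs: Iterable[Dataset] = (),
) -> dict:
    """Return a minimal OpenLineage RunEvent as a JSON-serializable dict."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown eventType: {event_type!r}")
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": [{"namespace": ns, "name": name} for ns, name in outputs],
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


def build_lineage_request(base_url: str, api_key: str, event: dict) -> urllib.request.Request:
    """Wrap an event in a POST request that authenticates with the API key."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + LINEAGE_PATH,
        data=json.dumps(event).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


if __name__ == "__main__":
    event = build_run_event(
        "my-namespace",         # hypothetical job namespace
        "nightly_orders_load",  # hypothetical job name
        "COMPLETE",
        inputs=[("postgres://db:5432", "public.orders")],
        outputs=[("snowflake://my-account", "analytics.orders")],
    )
    request = build_lineage_request("https://lineage.example.com", "my-api-key", event)
    print(request.get_method(), request.full_url)
    # urllib.request.urlopen(request) would send the event; it is not called here.
```

Keeping event construction separate from transport makes it easy to inspect an event before anything leaves your system.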

Tip: You can access this documentation directly from the Lineage tab in the Cloud UI. The embedded version also loads your Organization's configuration values, such as your OpenLineage API key and your Astro base domain, directly into the configuration steps.

Retrieve your OpenLineage API key

To send lineage data from an external system to Astro, you must specify your Organization's OpenLineage API key in the external system's configuration.

  1. In the Cloud UI, click the Lineage tab.

  2. In the left menu, click Integrations:

    Location of the "Integrations" button in the Lineage tab of the Cloud UI

  3. In Getting Started, copy the value below OpenLineage API Key.

For more information about how to configure this API key in an external system, see the integration guide for that system.

Integration guides

Lineage is configured automatically for all Deployments on Astro Runtime 4.2.0 and later. To add lineage to an existing Deployment running an earlier version of Astro Runtime, upgrade to the latest version. For instructions, see Upgrade Astro Runtime.

Note: If lineage features aren't enabled for a Deployment on Runtime 4.2.0 or later, you might need to push code to the Deployment to trigger the automatic configuration process.
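
The version rule above can be expressed as a small check, for example in a script that audits several Deployments. This is a hypothetical helper, not part of any Astronomer tool; it assumes Runtime versions look like `MAJOR.MINOR.PATCH`, optionally followed by a pre-release or build suffix that it ignores. A `True` result means lineage should be configured automatically, although you might still need to push code as described in the note above.

```python
from typing import Tuple

# Per the text above: lineage is configured automatically on Astro Runtime 4.2.0 and later.
MIN_AUTO_LINEAGE_VERSION = (4, 2, 0)


def runtime_version_tuple(version: str) -> Tuple[int, int, int]:
    """Parse a version such as "4.2.0" or "5.0" into a (major, minor, patch) tuple."""
    # Drop any pre-release ("-...") or build ("+...") suffix before parsing (assumed format).
    core = version.strip().split("-", 1)[0].split("+", 1)[0]
    parts = [int(part) for part in core.split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unexpected Runtime version: {version!r}")
    parts += [0] * (3 - len(parts))  # pad "5.0" to (5, 0, 0)
    return (parts[0], parts[1], parts[2])


def lineage_configured_automatically(version: str) -> bool:
    """Return True if the Runtime version is 4.2.0 or later."""
    # Tuple comparison orders versions numerically, so "10.0.0" sorts after "4.2.0".
    return runtime_version_tuple(version) >= MIN_AUTO_LINEAGE_VERSION


if __name__ == "__main__":
    for v in ("4.1.0", "4.2.0", "5.0.1"):
        print(v, lineage_configured_automatically(v))
```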

To configure lineage on an existing Deployment running Astro Runtime earlier than 4.2.0 without upgrading:

  1. In your locally hosted Astro project, update your requirements.txt file to include the following line:

    openlineage-airflow

  2. Push your changes to your Deployment.

  3. In the Cloud UI, set the following environment variables in your Deployment:

    AIRFLOW__LINEAGE__BACKEND=openlineage.lineage_backend.OpenLineageBackend
    OPENLINEAGE_NAMESPACE=<your-deployment-namespace>
    OPENLINEAGE_URL=https://<your-astro-base-domain>
    OPENLINEAGE_API_KEY=<your-lineage-api-key>
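
After setting the variables, you can sanity-check them from inside the Deployment, for example in a short task or script. The sketch below is hypothetical (the function name and messages are illustrative): it only checks that each variable from step 3 is set, that no `<...>` placeholder was left in place, that the backend matches the value shown above, and that the URL uses HTTPS. It does not verify that the API key is valid.

```python
import os
from typing import List, Mapping, Optional

EXPECTED_BACKEND = "openlineage.lineage_backend.OpenLineageBackend"
REQUIRED_VARS = (
    "AIRFLOW__LINEAGE__BACKEND",
    "OPENLINEAGE_NAMESPACE",
    "OPENLINEAGE_URL",
    "OPENLINEAGE_API_KEY",
)


def check_lineage_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of problems with the lineage variables (empty if none are found)."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name, "").strip()
        if not value:
            problems.append(f"{name} is not set")
        elif "<" in value or ">" in value:
            # e.g. "https://<your-astro-base-domain>" copied without substitution
            problems.append(f"{name} still contains a placeholder")

    backend = env.get("AIRFLOW__LINEAGE__BACKEND", "").strip()
    if backend and backend != EXPECTED_BACKEND:
        problems.append(f"AIRFLOW__LINEAGE__BACKEND should be {EXPECTED_BACKEND}")

    url = env.get("OPENLINEAGE_URL", "").strip()
    if url and not url.startswith("https://"):
        problems.append("OPENLINEAGE_URL should start with https://")
    return problems


if __name__ == "__main__":
    issues = check_lineage_env()
    print("\n".join(issues) if issues else "All lineage variables look set.")
```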

Verify

To view lineage metadata, go to the Organization view of the Cloud UI and open the Lineage tab. Your most recent DAG run should appear as a data lineage graph on the Lineage page.

Note: Lineage information appears only for DAGs that use operators that have extractors defined in the openlineage-airflow library, such as the PostgresOperator and SnowflakeOperator. For a list of supported operators, see Data lineage Support and Compatibility.

Note: If you don't see lineage data for a DAG even after configuring lineage in your Deployment, you might need to run the DAG at least once so that it starts emitting lineage data.

Make source code visible for Airflow operators

Because Workspace permissions are not yet applied to the Lineage tab, viewing source code for supported Airflow operators is off by default. If you want users across Workspaces to be able to view source code for Airflow tasks in a given Deployment, create an environment variable in the Deployment with a key of OPENLINEAGE_AIRFLOW_DISABLE_SOURCE_CODE and a value of False. Astronomer recommends enabling this feature only for Deployments with non-sensitive code and workflows.
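
The visibility rule described above (hidden unless the variable is explicitly set to `False`) can be captured in a small helper, for example when auditing which Deployments expose task source code. This is a hypothetical sketch: it reads the value from a mapping of a Deployment's environment variables and compares it case-insensitively, which is an assumption about how the value is interpreted rather than a description of openlineage-airflow's own parsing.

```python
from typing import Mapping

SOURCE_CODE_FLAG = "OPENLINEAGE_AIRFLOW_DISABLE_SOURCE_CODE"


def source_code_visible(env: Mapping[str, str]) -> bool:
    """Return True only if the Deployment explicitly sets the flag to False."""
    # Unset means the default applies, and source code stays hidden.
    value = env.get(SOURCE_CODE_FLAG)
    # Case-insensitive comparison is an assumption, not documented behavior.
    return value is not None and value.strip().lower() == "false"


if __name__ == "__main__":
    deployments = {  # hypothetical Deployment names and settings
        "analytics-prod": {SOURCE_CODE_FLAG: "False"},
        "finance-prod": {},
    }
    for name, env in deployments.items():
        print(name, "source code visible:", source_code_visible(env))
```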