Connect Astro to GCP data sources

This page explains how to securely connect your Astro data plane to your existing Google Cloud Platform (GCP) instance. A connection to GCP allows Astro to access data stored on your GCP instance and is a necessary step for running pipelines in a production environment.

Connection options

The connection option that you choose depends on the requirements of your organization and your existing infrastructure. You can choose a straightforward implementation, or a more complex implementation that provides enhanced data security. Astronomer recommends that you review all of the available connection options before selecting one for your organization.

Publicly accessible endpoints allow you to quickly connect Astro to GCP. To configure these endpoints, you can use one of the following methods:

When you use publicly accessible endpoints to connect Astro and GCP, traffic moves directly between your Astro data plane and the GCP API endpoint. Data in this traffic never reaches the control plane, which is managed by Astronomer.

Authorization options

Authorization is the process of verifying a user's or service's permissions before allowing them access to organizational applications and resources. Astro clusters must be authorized to access external resources from your cloud. The authorization option that you choose depends on the requirements of your organization and your existing infrastructure. Astronomer recommends that you review all of the available authorization options before selecting one for your organization.

To allow data pipelines running on GCP to access Google Cloud services in a secure and manageable way, Google recommends using Workload Identity. All Astro clusters on GCP have Workload Identity enabled by default. Each Astro Deployment is associated with a Google service account that's created by Astronomer and is bound to an identity from your Google Cloud project's fixed workload identity pool.

To grant a Deployment on Astro access to external data services on GCP, such as BigQuery:

  1. In the Cloud UI, select a Workspace, select a Deployment, and then copy the value in the Namespace field.

  2. Use the Deployment namespace value and the name of your Google Cloud project to identify the Google service account for your Deployment.

    Google service accounts for Astro Deployments are formatted as follows:

    astro-<deployment-namespace>@<gcp-project-name>.iam.gserviceaccount.com

    For example, for a Google Cloud project named astronomer-prod and a Deployment namespace defined as nuclear-science-2730, the service account for the Deployment would be:

    astro-nuclear-science-2730@astronomer-prod.iam.gserviceaccount.com

    Info: GCP has a 30-character limit for service account names. Because the astro- prefix uses 6 of those characters, for Deployment namespaces that are longer than 24 characters, use only the first 24 characters of the namespace when determining your service account name.

    For example, if your Google Cloud project is named astronomer-prod and your Deployment namespace is nuclear-scintillation-2730, the service account name is:

    astro-nuclear-scintillation-27@astronomer-prod.iam.gserviceaccount.com

  3. Grant the Google service account for your Astro Deployment an IAM role that has access to your external data service. With the Google Cloud CLI, run the following command, replacing roles/viewer with the role that your data service requires:

    gcloud projects add-iam-policy-binding $GOOGLE_CLOUD_PROJECT --member=serviceAccount:astro-<deployment-namespace>@<gcp-project-name>.iam.gserviceaccount.com --role=roles/viewer

    For instructions on how to grant your service account an IAM role in the Google Cloud console, see Grant an IAM role.

  4. Optional. Repeat these steps for every Astro Deployment that requires access to external data services on GCP.
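
The naming rule in step 2 and the command in step 3 can be sketched as a small shell helper. This is a minimal sketch, not part of Astro or the Google Cloud CLI: the function names astro_service_account and print_grant_command are hypothetical, and my-data-project is a placeholder for the project that holds your data. The second function only prints the gcloud command so that you can review it before running it yourself.

```shell
#!/bin/sh

# Hypothetical helper: build the Google service account address for a Deployment.
# Usage: astro_service_account <deployment-namespace> <gcp-project-name>
astro_service_account() {
  namespace="$1"
  project="$2"
  # "astro-" uses 6 of the 30 allowed characters, so keep at most
  # the first 24 characters of the Deployment namespace.
  short_namespace=$(printf '%s' "$namespace" | cut -c1-24)
  printf 'astro-%s@%s.iam.gserviceaccount.com\n' "$short_namespace" "$project"
}

# Hypothetical helper: print (do not run) the grant command from step 3.
# Usage: print_grant_command <target-project> <deployment-namespace> <gcp-project-name> <role>
print_grant_command() {
  member="serviceAccount:$(astro_service_account "$2" "$3")"
  printf 'gcloud projects add-iam-policy-binding %s --member=%s --role=%s\n' \
    "$1" "$member" "$4"
}

# Example values from this page; my-data-project is a placeholder.
astro_service_account nuclear-science-2730 astronomer-prod
astro_service_account nuclear-scintillation-2730 astronomer-prod
print_grant_command my-data-project nuclear-science-2730 astronomer-prod roles/viewer
```

To cover step 4, you could call print_grant_command once per Deployment namespace and run each printed command after checking the role.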