Disaster recovery
The Astro Data Plane is designed to withstand in-region Availability Zone (AZ) degradations and outages, as described in Resilience.
To withstand a full region outage and achieve near real-time recovery, Astronomer recommends provisioning at least two Astro clusters in alternate regions, for example one cluster in AWS us-east-1 and another in us-west-2. To keep the primary and secondary clusters in sync, we recommend deploying all changes to both.
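One way to keep two clusters in sync is to push every change to both Deployments from a single script. The following is a minimal sketch, not an Astronomer-provided tool: the Deployment IDs are hypothetical placeholders, and the `astro deploy <deployment-id>` command shape is an assumption that you should check against the CLI version you have installed.

```python
import subprocess
from typing import Callable, List, Sequence

# Assumption: the CLI executable name. Adjust it to match your installed CLI.
CLI_EXECUTABLE = "astro"


def build_deploy_commands(
    deployment_ids: Sequence[str], executable: str = CLI_EXECUTABLE
) -> List[List[str]]:
    """Return one deploy command per Deployment ID."""
    return [[executable, "deploy", deployment_id] for deployment_id in deployment_ids]


def deploy_to_all(
    deployment_ids: Sequence[str],
    runner: Callable = subprocess.run,
    executable: str = CLI_EXECUTABLE,
) -> List[str]:
    """Run the deploy command for every Deployment.

    Returns the IDs whose deploy exited with a non-zero code, so a failure in
    one region does not prevent the attempt in the other.
    """
    failed = []
    commands = build_deploy_commands(deployment_ids, executable)
    for deployment_id, command in zip(deployment_ids, commands):
        result = runner(command, check=False)
        if result.returncode != 0:
            failed.append(deployment_id)
    return failed
```

Calling `deploy_to_all(["<primary-deployment-id>", "<secondary-deployment-id>"])` from a CI job attempts both deploys and returns the IDs that need a retry. An empty list means both clusters received the change.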
To reduce the burden of maintaining two clusters, Astronomer plans to invest in cluster and Deployment syncing strategies in 2022. If you're interested in this functionality, please reach out and share feedback with Astronomer support.
Full region outages
In the case of a full region outage, Astronomer can re-provision your clusters and all Deployments in an alternate region. The re-provisioning includes:
- Cluster, including all nodes and most cluster-level configuration.
- VPC.
- VPC peering. Customers must re-accept the peering request.
- Deployments and data pipelines.
- Environment variables.
- API keys.
- Alert emails.
Astronomer will not be able to restore:
- VPC routes configured by customers via the AWS console.
- VPC security group rules configured by customers via the AWS console.
- DAG history and task logs.
- XComs.
- Airflow configurations (Variables, Connections, Pools) configured via the Airflow UI. Any configurations set via your deployed Astro project image can still be recovered.
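Because configurations created in the Airflow UI cannot be restored, one option is to define Variables and Connections outside the UI. Airflow reads Variables from environment variables named `AIRFLOW_VAR_<NAME>` and Connections from `AIRFLOW_CONN_<CONN_ID>` (with the connection given as a URI). The sketch below renders such lines from plain dictionaries; the function names and sample values are hypothetical, and whether you place the result in your Astro project image or in Deployment environment variables is up to you. Pools have no environment-variable form in this sketch.

```python
from typing import Dict, List


def variable_env_name(key: str) -> str:
    """Environment variable name Airflow checks for a Variable."""
    return f"AIRFLOW_VAR_{key.upper()}"


def connection_env_name(conn_id: str) -> str:
    """Environment variable name Airflow checks for a Connection."""
    return f"AIRFLOW_CONN_{conn_id.upper()}"


def render_env_lines(
    variables: Dict[str, str], connections: Dict[str, str]
) -> List[str]:
    """Render NAME=value lines, Variables first, each group sorted by key.

    `connections` maps a connection ID to its URI form.
    """
    lines = [
        f"{variable_env_name(key)}={value}"
        for key, value in sorted(variables.items())
    ]
    lines += [
        f"{connection_env_name(conn_id)}={uri}"
        for conn_id, uri in sorted(connections.items())
    ]
    return lines
```

For example, `render_env_lines({"s3_bucket": "example-bucket"}, {"my_db": "postgres://user:pass@host:5432/db"})` produces one `AIRFLOW_VAR_S3_BUCKET` line and one `AIRFLOW_CONN_MY_DB` line. Treat connection URIs as secrets and avoid committing them in plain text.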
Organization settings, Workspace settings, and user management configured in Astro's control plane will be unaffected by a region failure in the data plane.
Astronomer plans to introduce self-serve and automation enhancements as part of our 2022 roadmap. Please submit feedback to Astronomer support if you are interested in joining the conversation.