Kubernetes Disaster Recovery & Migration

    Velero Backup

    Velero is a widely used open-source tool for backing up and restoring Kubernetes cluster resources and persistent volumes, performing disaster recovery, and migrating workloads between clusters.

    Core Capabilities

    Disaster Recovery: Reduce time to recovery in case of infrastructure loss, data corruption, or service outage.
    Data Migration: Move cluster resources to other clusters or cloud providers.
    Cluster Replication: Replicate your production cluster to development and testing environments.
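    Migration and replication usually follow a "back up here, restore there" pattern. The sketch below is a minimal illustration, assuming both clusters are configured with the same BackupStorageLocation bucket and that kubeconfig contexts named `source` and `target` exist (both names are hypothetical). It only builds argument lists for the Velero CLI, which could then be passed to `subprocess.run`.

```python
from typing import List


def migration_commands(
    backup_name: str,
    namespace: str,
    source_ctx: str = "source",  # hypothetical kubeconfig context name
    target_ctx: str = "target",  # hypothetical kubeconfig context name
) -> List[List[str]]:
    """Build Velero CLI calls: back up on the source, restore on the target.

    Assumes both clusters share one BackupStorageLocation, so the target
    cluster can see backups written by the source cluster.
    """
    backup = [
        "velero", "backup", "create", backup_name,
        "--include-namespaces", namespace,
        "--wait",  # block until the backup finishes before restoring
        "--kubecontext", source_ctx,
    ]
    restore = [
        "velero", "restore", "create",
        "--from-backup", backup_name,
        "--kubecontext", target_ctx,
    ]
    return [backup, restore]
```

    The target cluster only sees the new backup after Velero's periodic storage-location sync, so checking `velero backup get` on the target before restoring is a sensible precaution.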

    Architecture

    Understanding how Velero components interact is crucial for debugging and configuration. This architecture supports both disaster recovery and stateful application migration.
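    As a compact reference, the sketch below summarizes Velero's standard components and where each kind of backup artifact ends up. It is a summary of commonly deployed pieces, not a reconstruction of the original diagram; the dictionary keys and the `destination` helper are hypothetical names chosen for illustration.

```python
from typing import Dict

# Standard Velero components and their roles (illustrative summary).
COMPONENTS: Dict[str, str] = {
    "Velero server": "Deployment running the Backup, Restore and Schedule controllers",
    "BackupStorageLocation": "object storage bucket holding backup tarballs and metadata",
    "VolumeSnapshotLocation": "provider settings used when taking native volume snapshots",
    "node-agent": "DaemonSet that moves volume data for File System Backup (Kopia/Restic)",
    "plugins": "provider-specific object store and volume snapshotter implementations",
}

# Which location each kind of backup artifact is tied to.
ARTIFACT_DESTINATION: Dict[str, str] = {
    "resource manifests": "BackupStorageLocation",
    # The snapshot itself stays in the cloud provider; the VSL says where.
    "native volume snapshot": "VolumeSnapshotLocation",
    # FSB writes a deduplicated repository into the same object storage.
    "file system backup data": "BackupStorageLocation",
}


def destination(artifact: str) -> str:
    """Return the location responsible for a given artifact type."""
    try:
        return ARTIFACT_DESTINATION[artifact]
    except KeyError:
        raise ValueError(f"unknown artifact: {artifact!r}") from None
```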

    Backup Workflow

    A Velero backup is an orchestrated sequence of API calls, hooks, snapshot triggers, and data uploads. The lifecycle of a Backup custom resource proceeds through the following stages.

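    1. API Validation
       The user runs `velero backup create`. The CLI creates a Backup custom resource, and the Kubernetes API server validates it against the Backup CRD schema.
    2. Controller Detection
       The Velero backup controller notices the new Backup resource and validates its configuration (for example, that the referenced storage location exists).
    3. Pre-Backup Hooks
       Hooks configured on application pods (e.g., `fsfreeze`) execute inside their containers to quiesce writes and ensure application consistency.
    4. Resource Collection
       Velero queries the Kubernetes API for all resources matching the backup's namespace, resource-type, and label-selector filters.
    5. Snapshot & Upload
       Cloud provider plugins trigger native volume snapshots (or the node agent copies volume data when File System Backup is used). Resource metadata is packaged into a tarball and uploaded to object storage.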
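    The `fsfreeze` hook in step 3 is typically configured with pod annotations. The sketch below builds the freeze/unfreeze annotation pair that Velero's hook documentation describes; the container name and mount path are hypothetical, and the container is assumed to ship `fsfreeze` and have the privileges it needs. Note that Velero runs these hooks per pod, when it reaches that pod while backing up resources, rather than once before the entire backup.

```python
import json
from typing import Dict, List


def fsfreeze_hook_annotations(container: str, mount_path: str) -> Dict[str, str]:
    """Build pod annotations for a Velero pre/post backup hook pair.

    The pre hook freezes the filesystem before the pod's volumes are
    backed up; the post hook unfreezes it afterwards. Commands are
    encoded as JSON arrays of arguments.
    """
    freeze: List[str] = ["/sbin/fsfreeze", "--freeze", mount_path]
    unfreeze: List[str] = ["/sbin/fsfreeze", "--unfreeze", mount_path]
    return {
        "pre.hook.backup.velero.io/container": container,
        "pre.hook.backup.velero.io/command": json.dumps(freeze),
        "post.hook.backup.velero.io/container": container,
        "post.hook.backup.velero.io/command": json.dumps(unfreeze),
    }
```

    The resulting dictionary can be placed in a pod template's `metadata.annotations` or applied to a running pod with `kubectl annotate`.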
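    The filtering in step 4 can be illustrated with a simplified model. The function below is not Velero's implementation: it applies only a namespace include list and equality-based `matchLabels` semantics (every listed key/value pair must be present), which is roughly what `--include-namespaces` combined with a simple `--selector key=value` expresses. Resources are plain dicts shaped like Kubernetes object metadata, and the function name is hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def select_resources(
    resources: Iterable[Dict],
    include_namespaces: Optional[List[str]] = None,
    match_labels: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Return "namespace/name" for every resource passing both filters.

    A resource matches when it lives in an included namespace (or no
    namespaces are listed) AND carries every key/value in match_labels.
    """
    selected: List[str] = []
    for res in resources:
        meta = res.get("metadata", {})
        ns = meta.get("namespace", "")
        labels = meta.get("labels") or {}
        if include_namespaces and ns not in include_namespaces:
            continue
        # AND semantics: a single missing or different label excludes it.
        if match_labels and any(labels.get(k) != v for k, v in match_labels.items()):
            continue
        selected.append(f"{ns}/{meta.get('name', '')}")
    return selected
```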

    Performance & Strategy Analysis

    Choosing between native cloud snapshots and File System Backup (FSB, formerly the Restic integration, which now supports both Restic and Kopia as uploaders) affects RTO (Recovery Time Objective) and cost.

    Recovery Time (RTO) Comparison, 100 GB volume (lower is better)

      Native snapshots    ~2 min
      FSB/Kopia           ~20 min

    Insight: Native snapshots (AWS EBS, Google Cloud Persistent Disk) restore large volumes much faster because the cloud provider performs them as block-level operations, so the data never has to be streamed file by file through pods and the cluster network.
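    For copy-based restores such as FSB, a first-order RTO estimate is simply data size divided by effective throughput. The sketch below assumes decimal units and ignores repository index loading, per-file overhead, and pod scheduling; the throughput constant is a hypothetical value back-solved from the "~20 min for 100 GB" figure above (about 83 MB/s), not a measured benchmark. It does not model native snapshot restores.

```python
def restore_minutes(volume_gb: float, throughput_mb_s: float) -> float:
    """Estimate a copy-based (FSB) restore time in minutes.

    Uses decimal units (1 GB = 1000 MB) and assumes a constant
    effective throughput for the whole restore.
    """
    if volume_gb < 0 or throughput_mb_s <= 0:
        raise ValueError("volume must be >= 0 and throughput must be > 0")
    seconds = volume_gb * 1000 / throughput_mb_s
    return seconds / 60


# Hypothetical throughput implied by "~20 min for 100 GB" (about 83 MB/s).
IMPLIED_FSB_MB_S = 100 * 1000 / (20 * 60)
```

    Because the estimate scales linearly, doubling the volume doubles the restore time, which is why the gap against native snapshots widens as volumes grow.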

    Storage Efficiency (Deduplication): FSB (Kopia/Restic) vs. snapshots, 30 days of restore points

      FSB with dedup      ~120 GB total
      Full snapshots      ~400 GB total

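    Insight: FSB deduplicates data in its repository, so each new backup stores only chunks that changed since earlier backups. For data with a low daily change rate, FSB therefore costs less to retain than full snapshots, at the price of slower restores. Many provider snapshots (including EBS and Persistent Disk) are themselves incremental, so the actual difference depends on the provider's storage model.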
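    The retention trade-off can be made concrete with a small model. The sketch below assumes a single volume of fixed size, a constant hypothetical daily change rate, and one restore point per day; "full" stores a complete copy per restore point (the worst case for snapshots), while "dedup" stores one full copy plus only the changed data for each later day. Compression and metadata overhead are ignored, so the numbers are illustrative, not a reproduction of the figures above.

```python
from typing import Dict


def retained_storage_gb(base_gb: float, daily_change: float, days: int) -> Dict[str, float]:
    """Compare retained storage for `days` daily restore points.

    full:  every restore point keeps a complete copy of base_gb.
    dedup: one complete copy, plus base_gb * daily_change for each later day.
    """
    if not 0 <= daily_change <= 1 or days < 1:
        raise ValueError("daily_change must be in [0, 1] and days must be >= 1")
    full = base_gb * days
    dedup = base_gb + base_gb * daily_change * (days - 1)
    return {"full": full, "dedup": dedup}
```

    With hypothetical inputs of 100 GB, a 1% daily change rate, and 30 restore points, the model gives about 129 GB for the deduplicated repository versus 3,000 GB for thirty full copies.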

    Velero Operator Toolkit

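    # Back up the "production" namespace to the "aws-s3-primary" storage location,
    # take native snapshots of its volumes, and expire the backup after 24 hours.
    velero backup create daily-backup-01 \
      --include-namespaces production \
      --ttl 24h0m0s \
      --storage-location aws-s3-primary \
      --snapshot-volumes=true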
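    To run the toolkit command from a script or CI job, it can help to build it as an argument list instead of a shell string, which avoids quoting problems. The helper below is a hypothetical sketch; it emits only the flags used in the original command, in the same order, and its defaults are assumptions.

```python
from typing import List, Optional


def backup_create_argv(
    name: str,
    namespaces: List[str],
    ttl: str = "24h0m0s",
    storage_location: Optional[str] = None,
    snapshot_volumes: bool = True,
) -> List[str]:
    """Build a `velero backup create` invocation as an argv list."""
    argv = [
        "velero", "backup", "create", name,
        # Velero accepts a comma-separated namespace list.
        "--include-namespaces", ",".join(namespaces),
        "--ttl", ttl,
    ]
    if storage_location:
        argv += ["--storage-location", storage_location]
    argv.append(f"--snapshot-volumes={'true' if snapshot_volumes else 'false'}")
    return argv
```

    The list can be passed directly to `subprocess.run(argv, check=True)`. For a recurring daily backup, `velero schedule create` with a cron `--schedule` expression is usually a better fit than re-running the command by hand; also note that with `--ttl 24h0m0s`, a daily cadence retains only about one restore point at a time.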