    Kubernetes
    MariaDB
    Database
    Operators
    Stateful Workloads


    October 31, 2025
    8 min read
# Why MariaDB Operator 25.10 Is a Big Deal for Stateful Workloads on Kubernetes

Running databases in Kubernetes has always felt a bit like trying to fit a square peg in a round hole. Kubernetes was designed for stateless applications, and for the longest time, databases—those pesky stateful, disk-hungry, fail-sensitive creatures—have been treated like second-class citizens in the cloud-native ecosystem. But with the release of **MariaDB Operator 25.10**, things are starting to shift.

This update isn't just another bump in version numbers or a routine security patch. It's a significant leap forward, especially for teams that want to run **stateful workloads like MariaDB** inside Kubernetes without duct-taping together half-baked solutions. This release introduces **asynchronous replication as a fully supported feature**, adds **automated replica recovery**, and bakes in several smart operational improvements that make running production-grade MariaDB clusters inside Kubernetes not just possible—but sane.

## Asynchronous Replication Goes GA—And It's Actually Solid

The headline feature here is the **general availability (GA) of asynchronous replication**. That may not sound thrilling unless you've ever watched a database go sideways at 3 AM.

For most users, asynchronous replication means something pretty straightforward: a primary database server does all the writes, and one or more replicas follow along, pulling changes over as fast as they can. It's not bleeding-edge tech. MySQL and MariaDB have supported this for ages. But what's important is that **MariaDB Operator now understands it deeply**. You define a simple Kubernetes manifest, flip the replication switch, and boom—you've got a primary-replica setup.

```yaml
apiVersion: k8s.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-repl
spec:
  storage:
    size: 1Gi
    storageClassName: rook-ceph
  replicas: 3
  replication:
    enabled: true
```

That's it.
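Once the cluster is reconciled, applications talk to it through operator-managed Services rather than individual pods, so failover stays transparent to clients. As a rough sketch—the `mariadb-repl-primary` and `mariadb-repl-secondary` Service names here follow the operator's naming convention for replication setups, but verify them with `kubectl get svc` in your own cluster—an app's connection config might look like this:

```yaml
# Hypothetical app config: writes go to the primary Service,
# reads can be spread across the replicas via the secondary Service.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-db-config
data:
  DB_WRITE_HOST: mariadb-repl-primary.default.svc.cluster.local
  DB_READ_HOST: mariadb-repl-secondary.default.svc.cluster.local
  DB_PORT: "3306"
```

Keeping reads on the secondary Service also means a primary failover only briefly disturbs the write path, not your read traffic.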
Behind the scenes, the operator sets up users, manages credentials, syncs the binary logs, and monitors replication lag. You don't need to babysit it. You just define the desired state and let the operator handle the dirty work.

## Failover You Can Actually Trust

Another big win? **Automated primary failover**. If your primary pod dies—and it will, eventually—the operator **automatically picks the most up-to-date replica** and promotes it.

This isn't some flaky hack that crosses its fingers and hopes the new primary has all the writes. The operator checks replication lag and relay log application status to ensure the candidate is clean. Here's a sample of what that looks like during a failover:

```
NAME           READY   STATUS                                  PRIMARY          UPDATES
mariadb-repl   False   Switching primary to 'mariadb-repl-1'   mariadb-repl-0   ReplicasFirstPrimaryLast
...
NAME           READY   STATUS    PRIMARY          UPDATES
mariadb-repl   True    Running   mariadb-repl-1   ReplicasFirstPrimaryLast
```

That transition takes seconds. It's the kind of zero-touch recovery most teams wish they had when managing databases manually—except now it's baked in.

You can even control failover behavior with settings like `autoFailoverDelay` for tuning how aggressively the system promotes a new primary. That's huge for high-availability setups where uptime is measured in dollars per second.

## Replica Recovery That Doesn't Suck

Let's talk about the elephant in the cluster: **replica corruption**. Anyone who's dealt with asynchronous replication knows the pain of error code **1236**—the dreaded "replica can't catch up because the primary purged the binary logs" situation. It's a silent killer that leaves your cluster in a weird limbo.

MariaDB Operator 25.10 solves this with **automated replica recovery** using a construct called `PhysicalBackup`. If a replica can't recover normally, the operator triggers a recovery flow that takes a volume-level snapshot from a healthy replica and restores it into the broken one. All without manual intervention.
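The `autoFailoverDelay` knob mentioned above lives under the replication section of the `MariaDB` spec. As a hedged sketch—the exact field placement is my assumption based on the release notes, so check the CRD reference for your operator version—tuning it might look like this:

```yaml
apiVersion: k8s.mariadb.com/v1alpha1
kind: MariaDB
metadata:
  name: mariadb-repl
spec:
  replicas: 3
  replication:
    enabled: true
    primary:
      # Promote a replica automatically when the primary goes unhealthy.
      automaticFailover: true
      # Assumed placement: wait this long before promoting, so a
      # transient pod restart doesn't trigger a needless failover.
      autoFailoverDelay: 30s
```

A short delay trades a few seconds of write downtime for stability: zero promotes immediately, while a higher value rides out brief network blips without flapping the primary.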
And the best part? It actually works:

```
kubectl get mariadb
NAME           READY   STATUS                PRIMARY
mariadb-repl   False   Recovering replicas   mariadb-repl-1
...
kubectl get mariadb
NAME           READY   STATUS    PRIMARY
mariadb-repl   True    Running   mariadb-repl-1
```

Recovery time depends on your storage driver and data size, but it's typically fast enough that you don't need to scramble. For teams dealing with production-grade workloads, this is a godsend. It turns replica recovery from a 30-minute firefight into a non-event.

## Smarter, Safer Scaling and Backups

This release doesn't just stop at failover and recovery. It also offers **flexible strategies for scaling out**, including support for different backup methods. You can use fast, local **VolumeSnapshots** for rapid scaling, or switch to **mariadb-backup** for longer-term durability. This gives teams more control over how they balance performance and reliability.

For example, you can maintain one `PhysicalBackup` spec for nightly S3 backups and another for instant snapshot-based recovery. The operator supports both, and choosing the right one is as easy as plugging in a different template.

## The Community's Fingerprints Are All Over This

It's worth calling out that much of what makes 25.10 so good came from **real-world feedback**. Users in the open-source community reported issues with early replication support, submitted manual recovery runbooks, and pushed the maintainers to refine the operational experience. The maintainers—especially mmontes11, who appears to be spearheading a lot of the development—deserve props for listening and iterating. You can feel the difference between a "built-in-a-bubble" feature and one forged through actual production use.

As one user noted in the release discussion, many features exist today **because people kept breaking their clusters and wanted better recovery paths**. That kind of evolution is rare in projects trying to be everything to everyone.
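To make the "two templates" idea concrete, here's a hedged sketch of what those two `PhysicalBackup` specs could look like side by side. Field names such as `schedule.cron`, `storage.s3`, and `storage.volumeSnapshot` are illustrative assumptions modeled on the operator's CRD conventions—confirm them against the `PhysicalBackup` reference for your version before use:

```yaml
# Assumed shape: nightly physical backup shipped to S3 for durability.
apiVersion: k8s.mariadb.com/v1alpha1
kind: PhysicalBackup
metadata:
  name: mariadb-repl-nightly
spec:
  mariaDbRef:
    name: mariadb-repl
  schedule:
    cron: "0 2 * * *"   # every night at 02:00
  storage:
    s3:
      bucket: mariadb-backups
      endpoint: s3.amazonaws.com
---
# Assumed shape: CSI snapshot-based backup for fast scale-out and recovery.
apiVersion: k8s.mariadb.com/v1alpha1
kind: PhysicalBackup
metadata:
  name: mariadb-repl-snapshot
spec:
  mariaDbRef:
    name: mariadb-repl
  storage:
    volumeSnapshot:
      volumeSnapshotClassName: csi-snapclass
```

The split mirrors the trade-off in the text: the S3 spec survives cluster loss but restores slowly, while the snapshot spec restores in seconds but lives and dies with your storage backend.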
## Not Perfect—But Getting Close

There's still some room to grow. Right now, the operator only supports replication within a **single Kubernetes cluster**, not across clusters or regions. That's a limitation for teams building multi-region failover systems. But given how fast things are moving, cross-cluster support feels more like a "when" than an "if."

There are also the usual caveats about performance. If you're running Kubernetes on-prem, **local storage is a must**. Networked volumes can become a bottleneck, especially with write-heavy workloads. And as the maintainer put it, "Don't make any assumptions—run sysbench."

Still, even with those constraints, MariaDB Operator 25.10 brings a level of confidence that stateful workloads inside Kubernetes have often lacked. It's not a bolt-on experiment anymore. It's production-ready, thoughtfully built, and backed by a community that clearly cares.

## TL;DR

MariaDB Operator 25.10 doesn't just support asynchronous replication. It **makes it work the way you'd want it to**—automatically, intelligently, and resiliently. With features like:

- General availability of async replication
- Automated failover to the most up-to-date replica
- Snapshot-based replica recovery on error code 1236
- Flexible backup strategies for different use cases

…it's a milestone release for anyone looking to move stateful workloads into Kubernetes with minimal drama. If you've been waiting for a sign that running a database in k8s isn't reckless, this is it.