    Ceph
    Proxmox
    High Availability
    SMB
    Storage
    Clustering

    Ceph, HA, and the Minimum Viable Cluster for SMBs

    November 2, 2025
    7 min read
If you're a small or medium business eyeing high availability (HA) with Proxmox and Ceph, the obvious question hits early: what's the smallest possible cluster setup that actually makes sense? Can you get by with just two nodes and a spare Raspberry Pi pretending to be a quorum device, or is that just a crash waiting to happen? That question kicked off a fiery discussion among infrastructure enthusiasts, and the answers revealed a lot more nuance than you might expect. Turns out, "minimum" doesn't just mean "smallest that technically works"; it means "smallest you can sleep soundly with."

## The HA Illusion of the 2-Node Cluster

On paper, you can set up a Proxmox HA cluster with just two nodes and a QDevice for quorum. This config is technically viable. A lot of users do it, especially in labs or very small deployments, and some even run production VMs this way. But there's a reason that many in the community get nervous when they see "2-node HA cluster" and "production" in the same sentence.

As one user bluntly put it: "3 is the bare minimum, but I'd never run a production workload on just 3."

A 2-node cluster with a QDevice is inherently brittle. Sure, it avoids split-brain with an extra vote, but you're still running on a knife's edge. One node goes down and you're already at full capacity, potentially struggling to handle the load. And that QDevice? It's often something janky: a NAS, a VM, or a Pi. It's not doing any heavy lifting, but if it fails, you've got quorum issues. In real-world terms, you don't just want your HA setup to survive a failure; you want it to survive without drama.

## What Ceph Actually Needs to Breathe

If you're planning to run Ceph for storage in a Proxmox cluster, things get heavier fast. Ceph is robust, scalable, and performs well under pressure. But it doesn't like to be cramped. The consensus? You need at least four real Ceph nodes to run it safely, and most people recommend five to start feeling secure. That's not about being fancy; it's about ensuring replication, performance, and durability don't suffer when things get messy.

One experienced admin explained it like this: "Four real Ceph nodes and a quorum vote Proxmox VM—and a very strong network backbone, no joke, like 40Gbit/100Gbit."

That last bit isn't optional. A Ceph cluster on a 10Gbit network is doable, but you're putting a cap on performance. At the very least, you need a tightly designed 10G infrastructure or clever use of DACs and directly connected NICs to cut switch costs and latency. And even that gets dicey as your node count rises.

## Why 3 Nodes Is the Community Sweet Spot (With Caveats)

Ask ten people on a Proxmox forum what the minimum cluster size should be, and the most common answer is this: "Three real nodes."

That's where quorum gets stable without relying on external gadgets. It also gives you a spare box for updates, testing, or failover without hitting panic mode every time you reboot something. And it lets you run Ceph at a minimum level, although you're still toeing the line when it comes to redundancy.

One user noted they use a fourth node without a quorum vote purely for compiling binaries and testing patches, just to avoid risking the cluster. That's how careful real-world deployments get, even at small scale. And while QDevices can help in 2-node scenarios, several users argued that managing two full Proxmox nodes plus a QDevice (on a backup server, Pi, or embedded switch container) is more complex than just adding a third node.
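To put rough numbers on that tradeoff, here's a minimal sketch of the majority-quorum arithmetic that corosync-style voting boils down to. It's plain Python for illustration, not anything Proxmox ships; the node counts and the single QDevice vote are assumptions chosen to mirror the setups discussed above.

```python
# Minimal sketch of majority-quorum arithmetic, as used by corosync-style voting.
# Assumes one vote per node and, optionally, one vote for a QDevice.
# Illustrative only: real clusters have more knobs (wait_for_all, last_man_standing, ...).

def quorum_report(nodes: int, qdevice: bool = False) -> str:
    votes = nodes + (1 if qdevice else 0)
    quorum = votes // 2 + 1        # strict majority of expected votes
    tolerated = votes - quorum     # votes you can lose and stay quorate
    label = f"{nodes} node(s)" + (" + QDevice" if qdevice else "")
    return f"{label:<20} votes={votes}  quorum={quorum}  can lose {tolerated} vote(s)"

for nodes, qdev in [(2, False), (2, True), (3, False), (4, True), (5, False)]:
    print(quorum_report(nodes, qdev))
```

The arithmetic shows why 2 nodes plus a QDevice and 3 real nodes tolerate the same number of lost votes; the difference is what's left standing to actually run the VMs, and how many extra boxes you're babysitting to get there.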
As another contributor put it: "Two plus QDevice is harder to administer than just having three nodes."

## Storage Matters: Ceph vs Alternatives for SMBs

Ceph isn't the only game in town, especially if you're running lightweight VM workloads. Some admins recommend ZFS replication across nodes for HA, with no need for shared block storage. Others go with iSCSI SANs or even newer tech like LINSTOR/DRBD or SeaweedFS, though these come with their own quirks and integration headaches. The appeal of Ceph is that it's deeply integrated with Proxmox and can scale nicely, but you pay for that with both complexity and hardware demand.

One user described skipping Ceph altogether in favor of ZFS replication with just two nodes: "We run a 2-node HA cluster with ZFS replication at work using the two_node corosync option and it works fine."

It's not fancy, but for SMBs that don't need hyper-resilient object storage and don't have hundreds of VMs chugging away, it's enough.

## Is It Production or Just a Lab in Disguise?

This question kept bubbling up in the thread: what actually counts as production? One user answered it best: "Production is defined by the workloads and how you treat them rather than by size."

If you're running a mission-critical service, whether it's a 911 call center or your company's core ERP, and it's on a 2-node cluster with some shaky shared storage, that's production. And that setup might be one hiccup away from a long, painful day.

The takeaway? If you're calling it production, treat it like production. That means redundancy, testing, monitoring, backups, and, maybe, spending on that third or fourth node instead of getting clever with QDevices and edge cases.

## Final Thoughts: Build Small, But Smart

If you're an SMB trying to do Proxmox HA and Ceph on a budget, there are two roads:

**Start lean but solid.** Three Proxmox nodes with enough juice to handle VM failover, paired with ZFS replication and good backups. Add Ceph only if you need distributed storage.

**Go all in.** Four or five nodes, proper Ceph setup, high-speed networking, and a clean HA setup with real quorum. Pricey, but future-proof. (There's a rough sizing sketch at the end of this post.)

Trying to force high availability into a 2-node setup isn't just about saving money. It's often about avoiding upfront complexity. But if that "simpler" setup makes your life harder the minute something fails, is it really simpler?

The smallest cluster that's "worth it" isn't just the smallest one that works; it's the smallest one that keeps you sleeping at night. And in most real-world cases, that starts at three.
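If you're weighing the "go all in" road, it helps to see what four or five Ceph nodes actually buy you in capacity and failure tolerance. The sketch below is plain Python back-of-the-envelope math, not a Ceph tool; the node counts, the 20 TB of raw disk per node, and the common size=3 / min_size=2 replication settings are all illustrative assumptions.

```python
# Back-of-the-envelope sizing for a replicated Ceph pool with a host-level
# failure domain. Illustrative only: ignores nearfull ratios, OSD overhead,
# and mixed drive sizes.

def ceph_estimate(nodes: int, raw_tb_per_node: float,
                  size: int = 3, min_size: int = 2) -> None:
    raw_total = nodes * raw_tb_per_node
    usable = raw_total / size               # every object is stored `size` times
    writable_with_down = size - min_size    # replicas you can lose and keep writing
    can_self_heal = nodes > size            # a spare host to rebuild replicas onto
    print(f"{nodes} nodes x {raw_tb_per_node:.0f} TB raw -> "
          f"~{usable:.0f} TB usable at size={size}, "
          f"writable with {writable_with_down} node(s) down, "
          f"self-heals after a node loss: {can_self_heal}")

for n in (3, 4, 5):
    ceph_estimate(n, raw_tb_per_node=20.0)
```

With exactly three hosts and size=3, a failed node leaves Ceph nowhere to rebuild the third copy until that host comes back, which is exactly why the four-to-five-node advice keeps coming up, and why the surviving hosts also need spare capacity to absorb a rebuild.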