KubeFed Explained: Kubernetes Federation Guide

Running one Kubernetes cluster is complex enough. Running five across AWS, GCP, and an on-prem data center without a unified control plane gets painful fast. Kubernetes Federation v2 (KubeFed) was built to solve this problem: managing federated Kubernetes clusters from a single point of control and distributing workloads across regions and providers without duplicating YAML files for every environment.

This guide breaks down the KubeFed architecture, its core resource types, and what happened after the project was archived in 2023. It also covers which successor tools have taken its place and a critical gap that no Kubernetes cluster federation layer addresses alone: protecting the actual data your federated workloads depend on.

What Is Kubernetes Federation?

Kubernetes federation is a coordination layer that lets you manage multiple clusters as a single logical unit. Rather than logging into each cluster individually to deploy workloads, you define what you want once, and the federation control plane handles distribution across all of them. Here’s how the pieces fit together.

The Core Problem KubeFed Was Built to Solve

Once an organization moves past a single Kubernetes cluster, operational complexity grows fast. You might need workloads running in us-east-1 and eu-west-1 for latency reasons. You might have a separate cluster on-prem for regulated data. Maybe you run production on AWS and keep a failover cluster on GCP. Each of those clusters has its own API server, its own set of YAML manifests, and its own deployment pipeline. Keeping them in sync manually is tedious, error-prone, and doesn’t scale.

KubeFed was the Kubernetes community’s answer to this problem, developed under SIG Multicluster. It gave teams a single control plane to push workloads across clusters, with per-cluster overrides baked right in. One important caveat is that the KubeFed repository was archived in April 2023 and is no longer actively maintained. We’ll cover the full implications of that later, but if you’re evaluating federation tooling right now, keep this in mind from the start.

The KubeFed Architecture: Host Cluster and Member Clusters

KubeFed follows a hub-and-spoke model. One cluster serves as the host, running the federation controller manager and an admission webhook. The remaining clusters are members, each with its own fully independent Kubernetes control plane. The host communicates with member clusters through their API servers using service account credentials.
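To make the host/member relationship concrete, here is a minimal Python sketch of how a host might decide that a member cluster is unavailable. It is not KubeFed's code. The class and field names are hypothetical, loosely inspired by the failure and success thresholds in KubeFed's cluster health-check configuration, and the probe itself (for example, a health request to the member's API server) is passed in as a boolean rather than implemented.

```python
from dataclasses import dataclass


@dataclass
class MemberClusterHealth:
    """Readiness of one member cluster, derived from consecutive probe results.

    Hypothetical helper, not KubeFed code. A cluster is marked not ready only
    after `failure_threshold` consecutive failed probes, and ready again after
    `success_threshold` consecutive successes, so one dropped request does not
    trigger a reschedule.
    """

    failure_threshold: int = 3
    success_threshold: int = 1
    ready: bool = True
    consecutive_failures: int = 0
    consecutive_successes: int = 0

    def record_probe(self, ok: bool) -> bool:
        """Record one probe result and return the updated readiness."""
        if ok:
            self.consecutive_failures = 0
            self.consecutive_successes += 1
            if not self.ready and self.consecutive_successes >= self.success_threshold:
                self.ready = True
        else:
            self.consecutive_successes = 0
            self.consecutive_failures += 1
            if self.ready and self.consecutive_failures >= self.failure_threshold:
                self.ready = False
        return self.ready


health = MemberClusterHealth(failure_threshold=3)
history = [health.record_probe(ok) for ok in [False, False, True, False, False, False, True]]
```

In this run, the first two isolated failures are tolerated; only the third consecutive failure marks the cluster unavailable, and the next successful probe marks it ready again.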

Federation adds a coordination layer on top of each cluster's local management rather than replacing it. If the federation control plane goes offline, member clusters continue operating autonomously with whatever resources they already have.

A deliberate design goal of KubeFed was the ability to federate any Kubernetes resource type, including custom resource definitions (CRDs), not just built-in resources like Deployments or ConfigMaps. This flexibility made it a strong fit for teams running complex, custom workloads across multiple environments.

Core KubeFed Custom Resource Types

Three custom resource types control how KubeFed distributes workloads. Each one handles a distinct part of the federation lifecycle, and understanding the role of each is key to working with KubeFed effectively.

| Custom Resource | Purpose | Example Use |
| --- | --- | --- |
| FederatedTypeConfig | Enables or disables federation for a specific resource type | Enabling federation for Deployments or ConfigMaps |
| FederatedResource (e.g., FederatedDeployment) | Wraps a standard Kubernetes resource with placement and override specs | Deploying nginx to clusters in US and EU with different replica counts |
| ReplicaSchedulingPreference | Distributes replicas across clusters based on weights or capacity | Allocating 70% of replicas to the primary cluster and 30% to the secondary |

Every FederatedResource contains three sections:

  • Template: The base resource spec that gets distributed
  • Placement: Which clusters receive the resource
  • Overrides: Per-cluster modifications like replica counts or environment variables

This structure keeps things clean. You define the baseline once and only specify what differs per cluster, which reduces duplication and drift.
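The template/placement/overrides split can be illustrated with a small Python sketch that computes what each placed cluster would receive. It is a simplification, not KubeFed's sync controller: the function names are hypothetical, only an explicit placement.clusters list is handled, and overrides are modeled on KubeFed's clusterOverrides entries (a JSON-pointer path, an optional op that defaults to replace, and a value).

```python
import copy


def _locate(obj, path):
    """Follow a JSON-pointer-style path such as /spec/replicas.

    Returns the parent container and the final key (an int index for lists).
    """
    parts = [p for p in path.split("/") if p]
    parent = obj
    for part in parts[:-1]:
        parent = parent[int(part)] if isinstance(parent, list) else parent[part]
    last = parts[-1]
    return parent, (int(last) if isinstance(parent, list) else last)


def apply_override(obj, override):
    """Apply one cluster override in place; a missing op means "replace"."""
    op = override.get("op", "replace")
    parent, key = _locate(obj, override["path"])
    if op == "remove":
        del parent[key]
    elif op in ("add", "replace"):
        # Simplification: both ops assign; JSON Patch list insertion is not modeled.
        parent[key] = copy.deepcopy(override["value"])
    else:
        raise ValueError(f"unsupported op: {op}")


def render_per_cluster(federated):
    """Return {cluster: object}, i.e. the template plus that cluster's overrides."""
    spec = federated["spec"]
    placed = [c["name"] for c in spec["placement"]["clusters"]]
    overrides = {o["clusterName"]: o["clusterOverrides"] for o in spec.get("overrides", [])}
    rendered = {}
    for cluster in placed:
        obj = copy.deepcopy(spec["template"])  # never mutate the shared template
        for override in overrides.get(cluster, []):
            apply_override(obj, override)
        rendered[cluster] = obj
    return rendered


# Shaped like a FederatedDeployment: nginx in a US and an EU cluster,
# with the EU copy running more replicas on a different image tag.
federated_nginx = {
    "spec": {
        "template": {
            "spec": {
                "replicas": 3,
                "template": {"spec": {"containers": [{"name": "nginx", "image": "nginx:1.25"}]}},
            }
        },
        "placement": {"clusters": [{"name": "cluster-us"}, {"name": "cluster-eu"}]},
        "overrides": [
            {
                "clusterName": "cluster-eu",
                "clusterOverrides": [
                    {"path": "/spec/replicas", "value": 5},
                    {"path": "/spec/template/spec/containers/0/image", "value": "nginx:1.25-alpine"},
                ],
            }
        ],
    }
}

per_cluster = render_per_cluster(federated_nginx)
```

The baseline lives in one place, and the EU cluster's entry lists only what differs, which is the duplication-avoiding property described above.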

Key Use Cases and Limitations of Federated Kubernetes

Here’s a closer look at the use cases where federation delivers real value, along with the limitations that deserve your attention before you commit.

High Availability and Cross-Region Failover in Federated Kubernetes Clusters

The strongest case for running federated Kubernetes clusters comes down to surviving regional outages. If all your production workloads sit in a single cluster and that region goes dark, everything goes with it. Federation changes that by spreading the same workload across clusters in different geographic regions, so a failure in one location doesn’t take down your entire service.

KubeFed manages this through ReplicaSchedulingPreference (RSP). You assign weights to your clusters: for example, 60% of replicas on your primary cluster in us-east-1 and 40% on a secondary in eu-west-1. When the federation controller detects that a cluster has become unreachable (based on the clusterUnavailableDelay setting), it automatically redistributes replicas to the healthy clusters according to those weights. No one needs to wake up at 3 AM to move workloads between clusters by hand.
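The weighting idea behind RSP can be sketched in a few lines of Python. This is a simplified illustration, not KubeFed's scheduler: the function name is hypothetical, and the real RSP planner also considers per-cluster minReplicas and maxReplicas and rebalancing settings that are omitted here. Replicas are split in proportion to weight among the healthy clusters, and any leftovers go to the clusters with the largest remainders.

```python
def distribute_replicas(total, weights, healthy):
    """Split `total` replicas across healthy clusters in proportion to their weights.

    Uses integer largest-remainder rounding so the shares always sum to `total`;
    ties go to the alphabetically first cluster to keep the result deterministic.
    """
    eligible = {c: w for c, w in weights.items() if c in healthy and w > 0}
    if not eligible:
        return {}
    weight_sum = sum(eligible.values())
    shares = {c: total * w // weight_sum for c, w in eligible.items()}
    remainders = {c: total * w % weight_sum for c, w in eligible.items()}
    leftover = total - sum(shares.values())
    for c in sorted(eligible, key=lambda name: (-remainders[name], name))[:leftover]:
        shares[c] += 1
    return shares


weights = {"us-east-1": 60, "eu-west-1": 40}
normal = distribute_replicas(10, weights, healthy={"us-east-1", "eu-west-1"})
failover = distribute_replicas(10, weights, healthy={"eu-west-1"})
```

With both regions healthy, this yields 6 replicas in us-east-1 and 4 in eu-west-1. With us-east-1 marked unreachable, eu-west-1 receives all 10, which is what "redistributing according to the weights" amounts to in this model.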

There’s a catch, though: Federation alone doesn’t redirect user traffic. You still need a global load balancer or DNS-based routing layer in front of your federated clusters to send requests to whichever cluster is actually healthy. AWS Global Accelerator, Cloudflare Load Balancing, or a service mesh with multi-cluster capabilities can fill that gap. Without that layer, your pods might be running perfectly in eu-west-1 while users keep hitting a dead endpoint in us-east-1.
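To see why routing is a separate concern, consider what that layer has to do: check each region's health itself and send users to one that answers. The hypothetical Python sketch below models a simple priority-failover policy of the kind global load balancers and DNS services offer. The hostnames are placeholders, and the health check is passed in as a function rather than implemented.

```python
from typing import Callable, Optional, Sequence


def choose_endpoint(endpoints: Sequence[str], is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy endpoint in priority order, or None if none are healthy.

    This mirrors a failover routing policy in a global load balancer or DNS
    service. It runs in front of the clusters and relies on its own health
    checks, independently of the federation control plane.
    """
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return None


# Hypothetical regional endpoints, listed in priority order.
endpoints = ["app.us-east-1.example.com", "app.eu-west-1.example.com"]
unreachable = {"app.us-east-1.example.com"}
target = choose_endpoint(endpoints, lambda e: e not in unreachable)
```

Nothing in this function knows about replicas or placement, which is the point: even a perfectly rebalanced federation is invisible to users until a layer like this starts sending them to the surviving region.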

Multi-Cloud and Hybrid Workload Management with Federated Kubernetes Clusters

A federated Kubernetes cluster also makes a lot of sense when your infrastructure is spread across multiple cloud providers or when you’re running a mix of cloud and on-prem environments. Suppose you have most of your compute on AWS, a GCP cluster handling ML pipelines, and an on-prem cluster for sensitive data that can’t leave your network. Without federation, each cluster gets its own manifests, its own deployment tooling, and its own operational burden.

With KubeFed, you define a FederatedDeployment once on the host cluster. The placement spec targets all three environments, and overrides handle provider-specific differences like storage classes or node selectors. This cuts out per-cluster YAML duplication and reduces configuration drift. Data residency requirements become straightforward to enforce: EU customer data stays on EU clusters, US traffic stays on US clusters, all orchestrated from a single control plane. That consistency also lowers vendor lock-in risk because your workload definitions aren’t coupled to any one provider’s tooling.
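Data residency in this model comes down to placing workloads by cluster labels. Here is a minimal Python sketch of label-based placement resolution. It assumes the precedence we understand KubeFed to document (an explicit clusters list takes priority over clusterSelector, and an empty selector matches every cluster), but treat those rules as assumptions to verify against the KubeFed user guide. The function and cluster names are hypothetical.

```python
def select_clusters(placement, cluster_labels):
    """Resolve a placement spec to a sorted list of cluster names.

    Assumed rules (verify against the KubeFed user guide): an explicit
    `clusters` list takes precedence; otherwise every label in
    `clusterSelector.matchLabels` must match; an empty selector matches all
    clusters; a placement with neither field selects nothing.
    """
    if "clusters" in placement:
        return sorted(c["name"] for c in placement["clusters"])
    selector = placement.get("clusterSelector")
    if selector is None:
        return []
    wanted = selector.get("matchLabels", {})
    return sorted(
        name
        for name, labels in cluster_labels.items()
        if all(labels.get(key) == value for key, value in wanted.items())
    )


# Hypothetical clusters labeled by region and provider.
cluster_labels = {
    "aws-us-east-1": {"region": "us", "provider": "aws"},
    "gcp-europe-west1": {"region": "eu", "provider": "gcp"},
    "onprem-frankfurt": {"region": "eu", "provider": "onprem"},
}
eu_only = {"clusterSelector": {"matchLabels": {"region": "eu"}}}
eu_clusters = select_clusters(eu_only, cluster_labels)
```

Selecting on a region label rather than on provider names keeps the EU rule stable when a new EU cluster is added on any provider: it only needs the right label to be picked up.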

Known KubeFed Limitations to Plan For

Kubernetes cluster federation is a powerful pattern, but it has hard boundaries. If you don’t account for these gaps early, they’ll surface at the worst possible time. Here are the most significant limitations to plan around before committing to a federated Kubernetes deployment:

  • Resource definitions only, not runtime state: KubeFed pushes specs to member clusters. If a pod crashes, the local kubelet handles the restart. The federation layer does not manage pod-level health or runtime behavior.
  • No built-in cross-cluster DNS or service discovery: Traffic routing between clusters requires external-dns, a service mesh, or a global load balancer. KubeFed doesn’t provide this functionality.
  • Stateful workloads need storage-level coordination: Federating a Deployment does not replicate persistent volume data between clusters. Data lost to etcd corruption, accidental namespace deletion, or ransomware destroying PV data cannot be recovered through Kubernetes cluster federation alone.
  • Multi-cluster debugging is harder: Incident response slows down significantly when failures span clusters. Centralized logging (Grafana Loki, Elasticsearch) and distributed tracing should be fully operational before you go live with federation.
  • KubeFed is archived: No security patches, no bug fixes, no compatibility updates for newer Kubernetes versions. Teams still running it need a plan, whether that means maintaining a fork or migrating to a supported platform.

Kubernetes Cluster Federation: Deprecation and the Path Forward

KubeFed served as the primary federation tool for years, but the project hit a wall. If you’re building a multi-cluster strategy today, you need to understand what replaced it and how to make the transition without getting stuck on abandoned software.

Why KubeFed Was Archived

The KubeFed repository was officially archived in April 2023. Kubernetes SIG Multicluster made this call after years of struggling with issues that proved difficult to resolve within KubeFed’s existing design. The federated CRD model, which wrapped every resource type in a FederatedResource, added significant operational complexity. Scaling beyond a handful of clusters exposed performance bottlenecks in the controller manager, and enterprises kept running into gaps that required heavy customization just to meet basic production requirements.

The architecture was sound as a proof of concept for Kubernetes federation, but it couldn’t carry the weight of real-world, large-scale deployments. The community decided it was better to let successor projects build on the lessons learned rather than keep patching a foundation that had reached its ceiling.

Successor Projects: Karmada and KubeAdmiral

Karmada is the most direct successor to KubeFed. It inherits the core concepts (host cluster, member clusters, resource distribution) but breaks them into cleaner, standalone APIs: Propagation Policy, Override Policy, and Resource Template. Instead of wrapping every resource in a federated CRD, you apply policies separately. This decoupling makes the system easier to reason about and extend. 
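In practice, the workload stays an ordinary Kubernetes manifest and the policies point at it. The Python sketch below models that decoupling with plain dictionaries whose field names follow Karmada's PropagationPolicy and OverridePolicy specs (resourceSelectors, placement.clusterAffinity.clusterNames, and overrideRules with plaintext overriders). It is not Karmada's implementation: real policy matching also weighs priority and selector specificity, only the replace operator is modeled, and the function names and member cluster names are hypothetical.

```python
import copy


def _matches(selectors, resource):
    """True if any selector matches the resource's apiVersion, kind and (optional) name."""
    for sel in selectors:
        if sel["apiVersion"] != resource["apiVersion"] or sel["kind"] != resource["kind"]:
            continue
        if "name" not in sel or sel["name"] == resource["metadata"]["name"]:
            return True
    return False


def _replace(obj, path, value):
    """Set the value at a JSON-pointer-style path such as /spec/replicas."""
    *parents, last = [p for p in path.split("/") if p]
    for part in parents:
        obj = obj[int(part)] if isinstance(obj, list) else obj[part]
    obj[int(last) if isinstance(obj, list) else last] = copy.deepcopy(value)


def propagate(resource, propagation_policies, override_policies):
    """Return {cluster: object} for a plain manifest, driven by separate policies."""
    clusters = []
    for pp in propagation_policies:  # first match wins in this sketch
        if _matches(pp["spec"]["resourceSelectors"], resource):
            clusters = pp["spec"]["placement"]["clusterAffinity"]["clusterNames"]
            break
    result = {}
    for cluster in clusters:
        obj = copy.deepcopy(resource)
        for policy in override_policies:
            if not _matches(policy["spec"]["resourceSelectors"], resource):
                continue
            for rule in policy["spec"]["overrideRules"]:
                if cluster not in rule["targetCluster"]["clusterNames"]:
                    continue
                for patch in rule["overriders"].get("plaintext", []):
                    if patch["operator"] != "replace":
                        raise ValueError("only the replace operator is modeled here")
                    _replace(obj, patch["path"], patch["value"])
        result[cluster] = obj
    return result


deployment = {  # an ordinary manifest: no federation wrapper
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "nginx"},
    "spec": {"replicas": 2},
}
selector = [{"apiVersion": "apps/v1", "kind": "Deployment", "name": "nginx"}]
propagation = {"spec": {"resourceSelectors": selector,
                        "placement": {"clusterAffinity": {"clusterNames": ["member1", "member2"]}}}}
override = {"spec": {"resourceSelectors": selector, "overrideRules": [{
    "targetCluster": {"clusterNames": ["member2"]},
    "overriders": {"plaintext": [{"path": "/spec/replicas", "operator": "replace", "value": 4}]},
}]}}
placed = propagate(deployment, [propagation], [override])
```

Compared with the FederatedDeployment model, nothing about the Deployment itself changes; removing the propagation policy simply stops distribution without touching the manifest.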

Karmada also introduces both Push and Pull cluster registration modes, whereas KubeFed only supported Push. Pull mode lets clusters behind firewalls or in restricted networks register themselves, which is a significant improvement for hybrid and edge deployments.

Migration from KubeFed to Karmada is well-documented. CLI commands map closely: kubefedctl join becomes karmadactl join, and the conceptual model transfers without a complete rethink of your federation strategy.

KubeAdmiral, originally developed by ByteDance and now a CNCF project, takes a different angle. It builds directly on KubeFed v2’s codebase but adds enhanced scheduling capabilities, topology-aware distribution, and better performance at scale. If your team is already deep into KubeFed’s resource model and wants an evolutionary upgrade rather than a conceptual shift, KubeAdmiral is worth evaluating.

What to Evaluate When Choosing a Federated Kubernetes Tool Today

Here’s a practical evaluation framework that covers the areas most likely to matter once you’re running federation in production:

  1. Assess your cluster registration needs: Determine whether you require Push-only or both Push and Pull modes. Pull is essential if member clusters sit behind NAT or restrictive firewalls.
  2. Evaluate scheduling flexibility: Test whether the tool supports weighted distribution, topology constraints, and affinity rules that match your workload placement requirements.
  3. Verify CRD and native API support: Confirm that your custom resources federate cleanly. Some tools handle this better than others, especially for complex operator-managed workloads.
  4. Check community activity and release cadence: Look at commit frequency, open issue response times, and whether the project has active maintainers. An archived or stagnant project is a liability.
  5. Plan backup and recovery independently: No federation tool handles persistent volume backup or application-consistent snapshots. This must be addressed as a separate concern regardless of which tool you choose.

This evaluation process keeps you from repeating the same mistake: adopting a tool that checks boxes on paper but can’t hold up under production pressure. The underlying components of each cluster, from etcd to persistent volumes, remain independently managed. Your federation tool selection needs to account for that autonomy. That also means your data protection strategy should be designed per-cluster, treating each one as its own failure domain rather than assuming the federation layer will cover it.

Protecting Data Across Federated Kubernetes Clusters

Kubernetes federation handles where your workloads run. It doesn’t handle what happens when the data behind those workloads disappears. That distinction matters more than most teams realize, and it usually becomes obvious at the worst possible time: when you’re staring at an empty database after a restore.

Why Federation Alone Doesn't Cover Disaster Recovery

KubeFed and its successors propagate resource definitions (Deployments, ConfigMaps, Services) across member clusters. What they don’t propagate is persistent volume data, etcd snapshots, or application state. If a StatefulSet running PostgreSQL gets federated to three clusters, each cluster holds its own independent data. Lose one cluster’s storage to corruption, accidental deletion, or a ransomware event, and federation will happily redeploy an empty application to a healthy cluster. The pods come up; the data doesn’t.
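A deliberately tiny Python model makes this failure mode concrete. Nothing here is a real API: `Cluster` and `federate` are hypothetical names, and each volume is reduced to a list of rows. Definitions are copied to every cluster, while writes only ever reach the volume of the cluster that served them.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cluster:
    """Toy model: manifests are what federation propagates; volumes stay local."""
    name: str
    manifests: Dict[str, dict] = field(default_factory=dict)
    volumes: Dict[str, List[str]] = field(default_factory=dict)


def federate(name: str, manifest: dict, clusters: List[Cluster]) -> None:
    """Propagate a definition; each cluster provisions its own volume, empty if new."""
    for cluster in clusters:
        cluster.manifests[name] = dict(manifest)
        cluster.volumes.setdefault(name, [])


primary, standby = Cluster("us-east-1"), Cluster("eu-west-1")
federate("postgres", {"kind": "StatefulSet", "replicas": 1}, [primary, standby])

primary.volumes["postgres"].extend(["order-1001", "order-1002"])  # writes land on the primary only
primary.volumes["postgres"].clear()  # storage corruption, deletion, or ransomware

# Failing over means re-federating the same definition: the manifest is there,
# but nothing was ever copied into the standby's volume.
federate("postgres", {"kind": "StatefulSet", "replicas": 1}, [standby])
```

After the "failover", the standby has a valid StatefulSet definition and an empty volume: the pods come up, the data doesn't.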

Federation distributes workloads. Backup protects the data those workloads depend on. Both are required: neither substitutes for the other.

This gap is easy to miss when your focus is on scheduling and placement policies. A federated Kubernetes cluster without a data protection layer gives you redundancy at the orchestration level while leaving the most valuable asset, your actual data, completely exposed. It’s the kind of blind spot that only shows up during an incident, and at that point, your options are limited.

How Trilio for Kubernetes Fills the Gap

Trilio for Kubernetes takes an application-centric approach to backup and recovery. Rather than snapshotting individual volumes or exporting raw YAML, it captures entire Kubernetes applications as a single consistent unit: persistent volumes, metadata, Helm releases, Operators, and custom resources all included. Pre- and post-backup hooks ensure that databases like MySQL, PostgreSQL, and Redis reach a consistent state before any snapshot is taken, which eliminates the partial-write problems that plague volume-only backup tools. If you’re running stateful workloads like MongoDB, this application-aware consistency is especially important.
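The pre/post hook idea is a general pattern rather than something specific to one product. The Python sketch below shows it with hypothetical callables standing in for real hooks (for example, a command that flushes and blocks database writes before the volume snapshot, and one that resumes writes afterward). It is not Trilio's hook API; the function name and snapshot ID are illustrative only.

```python
from typing import Callable


def consistent_snapshot(
    pre_hook: Callable[[], None],
    take_snapshot: Callable[[], str],
    post_hook: Callable[[], None],
) -> str:
    """Quiesce the app, snapshot its volumes, then always resume it.

    The try/finally ensures the post-hook runs even if the snapshot fails,
    so a failed backup never leaves the application frozen. If the pre-hook
    itself raises, no snapshot is attempted.
    """
    pre_hook()
    try:
        return take_snapshot()
    finally:
        post_hook()


events = []


def snapshot_volumes() -> str:
    events.append("snapshot volumes")
    return "snap-001"  # hypothetical snapshot ID


snapshot_id = consistent_snapshot(
    pre_hook=lambda: events.append("pre-hook: flush and block writes"),
    take_snapshot=snapshot_volumes,
    post_hook=lambda: events.append("post-hook: resume writes"),
)
```

The ordering is what provides consistency: the snapshot only ever sees data the database has already flushed, and the post-hook keeps the freeze window as short as the snapshot itself.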

The following table breaks down exactly what federation tools and Trilio each handle, so you can see where your current setup might have gaps:

| Capability | KubeFed / Karmada | Trilio for Kubernetes |
| --- | --- | --- |
| Cross-cluster workload distribution | Yes | No |
| Persistent volume backup | No | Yes (incremental, application-consistent) |
| Application-consistent snapshots | No | Yes (with pre/post hooks) |
| Point-in-time restore | No | Yes |
| Cross-cluster migration | Definitions only | Yes (full application with data) |
| Ransomware protection | No | Yes (immutable backups) |

Trilio integrates through native Kubernetes APIs with no sidecar agents required. It supports S3-compatible object storage, NFS, and cloud-native storage backends, which means it fits directly into the same hybrid and multi-cloud environments where you’d run federated Kubernetes clusters. Backup schedules and retention rules are defined declaratively, keeping everything consistent with how your team already manages Kubernetes resources.

If you’re operating across multiple clusters, Trilio for Kubernetes ensures that the data layer stays protected regardless of which federation tool handles the orchestration layer. Federation gives you resilience at the scheduling level. Trilio gives you resilience where it actually counts: your data. Schedule a demo to see how Trilio protects your multi-cluster Kubernetes data. 

Conclusion

KubeFed established the foundation for managing multiple Kubernetes clusters from a single control plane. The project is now archived, but the problems it tackled haven’t disappeared. Multi-cluster operations have actually become more common as teams distribute workloads across providers and regions. Tools like Karmada and KubeAdmiral carry the federation concept forward with cleaner APIs and improved scalability, so the pattern remains very much alive.

The key takeaway is that federation and data protection solve different problems, and you need both. One determines where your workloads run, while the other ensures that you can actually recover when something breaks. If you’re building or rethinking a multi-cluster strategy right now, start with a supported federation tool, then layer in application-consistent backup before you go to production. That order matters. You don’t want your first data loss incident to be what finally forces the backup conversation.

FAQs

Can KubeFed still be used in production Kubernetes environments?

Since the KubeFed repository was archived in April 2023, it no longer receives security patches or compatibility updates for newer Kubernetes versions, making it a risky choice for production. Teams still running it should plan a migration to actively maintained alternatives like Karmada or KubeAdmiral.

What is the difference between Karmada and KubeFed for multi-cluster management?

Karmada decouples propagation policies from resource definitions instead of wrapping every resource in a federated CRD, which reduces complexity. It also supports both Push and Pull cluster registration modes, while KubeFed only supported Push, making Karmada a better fit for hybrid and edge environments.

Does Kubernetes federation automatically handle cross-cluster traffic routing?

No. Federation tools only distribute resource definitions to member clusters and do not manage DNS or network traffic between them. You need a separate solution like a global load balancer, external-dns, or a service mesh such as Istio to route user requests to healthy clusters.

How do you protect persistent volume data in a federated Kubernetes setup?

Federation does not replicate or back up persistent volume data across clusters, so you need a dedicated backup tool that captures application state, volumes, and metadata as a consistent unit. Without this layer, a storage failure on one cluster means losing data even if the workload is quickly redeployed elsewhere.

Can KubeFed federate custom resource definitions across clusters?

Yes, KubeFed was designed to federate any Kubernetes resource type, including CRDs, not just built-in resources like Deployments or Services. This capability carries forward into successor tools, though you should verify that your specific custom resources propagate cleanly before relying on them in production.

Author

Kevin Jackson

Copyright © 2026 by Trilio
