Running Home Assistant on Kubernetes: Overkill or Exactly Right?
Yes, I run my smart home on a 4-node Kubernetes cluster built from Orange Pi 5 single-board computers. Yes, Home Assistant runs as a Helm-managed StatefulSet with PostgreSQL, Longhorn-replicated storage, and CiliumNetworkPolicy enforcement. No, I don’t think that’s overkill.
Okay — I think it’s a little overkill. But hear me out.
The Honest Version
I’m not going to pretend I run Home Assistant on Kubernetes because it’s the optimal choice for home automation. If all you want is to control some lights and a thermostat, flash an SD card with Home Assistant OS and you’re done in twenty minutes. It’s genuinely good software, and the standard deployment is fine for most people.
But I already had a 4-node Kubernetes cluster. I built it to learn Kubernetes on real hardware — ARM64 SBCs running actual workloads, not Minikube on my laptop pretending to be a cluster. The cluster runs my AI agent platform, graph databases, security scanning pipelines, and various tool servers. Home Assistant is one more workload on a platform that already exists.
The question was never “should I build a Kubernetes cluster for home automation?” It was “I have a Kubernetes cluster — should I add home automation to it?” And that’s a very different question.
What You Actually Get
Here’s what running HA on Kubernetes buys you, in order of how much I actually care about each one.
PostgreSQL from Day One
This was the non-negotiable. Home Assistant defaults to SQLite for its recorder database — all your entity state history, event logs, everything. SQLite is great software. SQLite on network-attached storage is not great software.
Longhorn provides storage via iSCSI. SQLite on iSCSI means Write-Ahead Logging (WAL) locking issues under concurrent access. The “database is locked” errors aren’t theoretical — they’re the #1 complaint on the Home Assistant forums for anyone running on networked storage. Even on local storage, SQLite starts to struggle once you have enough entities and enough history.
PostgreSQL eliminates the problem entirely. It handles concurrent writes natively, it performs better under load, and it’s a well-understood database that I can monitor, back up, and restore with standard tooling. I chose a plain postgres:16-alpine StatefulSet — no operator, no Bitnami (licensing changed in 2025), just PostgreSQL with a Longhorn PVC and a CronJob for pg_dump.
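A minimal sketch of that pg_dump CronJob — the service name, secret, database name, and PVC here are illustrative placeholders, not my actual manifests:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
  namespace: home-assistant
spec:
  schedule: "0 3 * * *"          # nightly, after the house has gone quiet
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pg-dump
              image: postgres:16-alpine   # same image as the StatefulSet
              env:
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: postgres-credentials
                      key: password
              command: ["/bin/sh", "-c"]
              args:
                # dump and compress to a dedicated backup volume
                - pg_dump -h postgres -U homeassistant homeassistant | gzip > /backup/ha-$(date +%F).sql.gz
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: postgres-backup-pvc
```

The dump lands on its own Longhorn volume, so the database backup is itself replicated and snapshotted like everything else.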
Thirty days of history, 5-second commit interval, noisy domains excluded. The database hums along at about 100m CPU and 256Mi memory. Boring is good.
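Those recorder settings map to a few lines of Home Assistant’s configuration.yaml — a sketch, with the service hostname, credentials, and excluded domains as illustrative stand-ins:

```yaml
recorder:
  # point the recorder at the in-cluster PostgreSQL service
  db_url: postgresql://homeassistant:CHANGE_ME@postgres.home-assistant.svc:5432/homeassistant
  purge_keep_days: 30      # thirty days of history
  commit_interval: 5       # batch writes every 5 seconds
  exclude:
    domains:               # noisy domains that don't need history
      - sun
      - weather
```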
Storage Resilience
Longhorn replicates every volume across two nodes. When I take a node offline for maintenance — kernel updates, thermal paste, whatever — Home Assistant’s config and database remain available. The pod reschedules to another node, picks up the replicated volume, and carries on.
On a standalone Pi, pulling the power means pulling the only copy of your data. Yes, you can set up backups. But “my data is replicated across two nodes with automated snapshots every six hours” is a different posture than “I should probably remember to back this up.”
I also get Longhorn’s snapshot and backup system for free. Snapshots every 6 hours, retained for 10 iterations. Daily backups to an S3-compatible target, retained for 30 days. If I manage to corrupt something, I’m never more than 6 hours from a recovery point.
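Longhorn expresses that schedule as RecurringJob resources. A sketch using the v1beta2 CRD, targeting the default volume group — cron expressions and the backup target configuration are assumptions:

```yaml
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: snapshot-6h
  namespace: longhorn-system
spec:
  cron: "0 */6 * * *"   # every six hours
  task: snapshot
  retain: 10            # keep the last 10 snapshots
  concurrency: 2
  groups:
    - default
---
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: backup-daily
  namespace: longhorn-system
spec:
  cron: "0 2 * * *"     # nightly backup to the S3-compatible target
  task: backup
  retain: 30            # 30 days of backups
  concurrency: 2
  groups:
    - default
```

Any volume in the default group picks up both jobs automatically; new PVCs inherit the backup posture with zero per-volume work.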
Rolling Updates
Section titled “Rolling Updates”Updating Home Assistant is a Helm value change:
```shell
helm upgrade home-assistant pajikos/home-assistant \
  --set image.tag=2026.2.1 \
  --namespace home-assistant
```

The StatefulSet performs a rolling restart. If the new version breaks something — and HA breaking changes are a recurring theme in the community — helm rollback takes me to the previous state in seconds. On HA OS, a bad update means restoring from backup and hoping you had a recent one.
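The rollback path is just as short — with no revision argument, helm rolls back to the previous release:

```shell
# inspect release history to find the last good revision
helm history home-assistant --namespace home-assistant

# roll back to the previous revision (or append a revision number)
helm rollback home-assistant --namespace home-assistant
```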
CiliumNetworkPolicy for IoT Security
IoT devices are, broadly speaking, a security nightmare. They’re chatty, they’re poorly patched, and they’re on your network. Running them on a cluster with Cilium means every component gets explicit network policies.
My PostgreSQL instance accepts connections only from the Home Assistant pod, on port 5432, TCP only. Nothing else in the cluster can reach it. As I add MQTT brokers and Zigbee gateways in Phase 2, each gets similarly scoped policies. A compromised smart device that should only speak MQTT can’t pivot to my database because the network policy enforces that at the kernel level via eBPF.
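The policy that pins PostgreSQL down is short. A sketch — the `app: postgres` and `app: home-assistant` labels are assumptions about how the pods are labeled:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: postgres-ingress
  namespace: home-assistant
spec:
  endpointSelector:
    matchLabels:
      app: postgres          # the policy applies to the PostgreSQL pod
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: home-assistant   # only the HA pod may connect
      toPorts:
        - ports:
            - port: "5432"
              protocol: TCP
```

Once a CiliumNetworkPolicy selects an endpoint for ingress, anything not explicitly allowed is dropped — so this one rule is also the implicit deny for everything else in the cluster.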
There’s a wrinkle: Home Assistant itself runs with hostNetwork: true for mDNS device discovery, which bypasses Cilium’s enforcement for that specific pod. I’ve written about this in detail on the project page — it’s an accepted tradeoff, not an ignored one, mitigated by Tailscale (no public internet exposure), HA’s own auth, and security scanning that flags the hostNetwork usage to keep it visible.
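In pod-spec terms the wrinkle is two fields — standard Kubernetes, shown here outside any particular chart’s values schema:

```yaml
spec:
  hostNetwork: true                   # pod shares the node's network namespace, so mDNS multicast works
  dnsPolicy: ClusterFirstWithHostNet  # keep cluster DNS resolution despite hostNetwork
```

Without the dnsPolicy override, a hostNetwork pod falls back to the node’s resolver and loses in-cluster service names like the PostgreSQL service.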
Unified Operations
One set of tools for everything. kubectl logs to debug Home Assistant. kubectl describe to check pod health. The Longhorn dashboard to verify replication status. The same monitoring and alerting I use for every other workload.
When something goes wrong at 11 PM — and with home automation, it will — I’m not context-switching between “how do I debug HA OS” and “how do I debug my K8s workloads.” It’s all the same workflow.
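The 11 PM debugging loop is the same handful of commands as any other workload — the label selector is an assumption about how the chart labels its pods:

```shell
# tail the Home Assistant logs
kubectl logs -n home-assistant statefulset/home-assistant --tail=100 -f

# check pod health, restarts, and mounted volumes
kubectl describe pod -n home-assistant -l app=home-assistant

# recent events, newest last
kubectl get events -n home-assistant --sort-by=.lastTimestamp
```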
What It Costs
I’d be dishonest if I didn’t talk about the costs. There are real ones.
Complexity
There’s no way around it. A Helm chart with a values file, a separate PostgreSQL StatefulSet, CiliumNetworkPolicies, Longhorn volume configuration — that’s a lot of moving parts compared to “install HA OS on an SD card.” When something breaks, the debugging surface area is larger.
The hostNetwork requirement for mDNS is a good example. On HA OS, device discovery just works. On Kubernetes, you need to understand why pod networking breaks mDNS, evaluate your options (hostNetwork, Multus, Avahi reflectors), make a decision, document the tradeoff, and configure the mitigation. That’s an hour of work and an ADR; on HA OS it’s zero effort.
The Container Mode Gap
Running HA in Container mode means you lose the Supervisor and the Add-ons store. Every integration that would normally be a one-click add-on — Mosquitto, Zigbee2MQTT, ESPHome, Frigate — becomes a separate Kubernetes deployment that you manage yourself.
For someone already running Kubernetes, this is arguably a feature. Each component has its own lifecycle, resource limits, and security policy. But it does mean more YAML, more Helm charts, and more things to keep updated.
Learning Curve
If you don’t already know Kubernetes, adding it to your home automation stack is a terrible idea. You’ll spend more time learning K8s than actually automating your home. The cluster should be the thing you already have and understand, not the thing you’re learning alongside Home Assistant.
Most People Should Not Do This
I want to be explicit: for 95% of people who want home automation, Home Assistant OS on a Raspberry Pi is the right answer. It’s well-supported, the community is enormous, the Add-ons ecosystem is mature, and it works out of the box.
You should consider running HA on Kubernetes if:
- You already have a Kubernetes cluster
- You understand StatefulSets, PVCs, network policies, and Helm
- You enjoy the operational side of things (or need to learn it for your career)
- You want your home automation to be part of a larger platform, not a standalone appliance
If that’s not you, please, genuinely — use HA OS. It’s great.
The Real Reason
Let me be honest about the actual motivation.
The cluster is a learning platform. I’m a data architect who works on enterprise systems. Kubernetes, Cilium, PostgreSQL, Longhorn, Helm — these are technologies I encounter professionally and need to understand deeply. Running them on real hardware with real workloads teaches me things that documentation and tutorials can’t.
Home Assistant happens to be a perfect workload for this. It touches storage (PostgreSQL, config persistence), networking (mDNS, network policies, hostNetwork tradeoffs), security (IoT segmentation, Tailscale), and application lifecycle (Helm upgrades, rolling restarts). It’s a demanding tenant that exercises the full Kubernetes stack.
And it also runs my smart home. My daughter checks off her morning chores on a dashboard that replaced a $60/year DakBoard subscription. The thermostat adjusts based on presence detection. Camera feeds show up on the living room display. It’s genuinely useful — and it happens to run on the same platform that teaches me the skills I use at work.
Is it overkill? For home automation alone, absolutely. But it’s not alone. It’s one workload on a platform that exists for a broader purpose. And in that context, it’s exactly right.
The full architecture, design decisions, and tradeoff analysis are on the project page. If you want to see every hostNetwork tradeoff and CiliumNetworkPolicy rule, it’s all there.