Home Automation on Kubernetes

Home automation gets treated as a hobby project. Flash an SD card, install Home Assistant OS, add devices, hope it doesn’t break. That’s fine until you’re relying on it — until your thermostat automation failing means a $400 energy bill, or your camera feed going dark means you don’t see someone at the door.

I already run a 4-node Kubernetes cluster on Orange Pi 5 SBCs. Rather than dedicate separate hardware to home automation, I run Home Assistant on the same cluster with the same operational discipline I’d apply to any production workload: PostgreSQL instead of SQLite, Longhorn-replicated storage, Cilium network policies for IoT segmentation, and Tailscale for authenticated remote access. No ports exposed to the internet. No single points of failure for storage.

```mermaid
graph TB
    subgraph tailscale["Tailscale Mesh — Authenticated Remote Access"]
        subgraph cluster["K8s Cluster · v1.28.2 · 4x Orange Pi 5"]
            subgraph ha_ns["Namespace: home-assistant"]
                ha["Home Assistant 2026.2\nhostNetwork: true\npajikos Helm v0.3.43"]
                pg["PostgreSQL 16 (alpine)\nStatefulSet · Longhorn PVC 10Gi"]
                ha -->|"recorder:\n  db_url: postgresql://..."| pg
            end
            subgraph net["Cilium CNI (eBPF)"]
                cnp["CiliumNetworkPolicy\nL3-L7 segmentation"]
            end
            subgraph storage["Longhorn v1.10.1"]
                ha_pvc["HA Config PVC\n2x replication"]
                pg_pvc["PostgreSQL PVC\n2x replication"]
                snap["Recurring Snapshots\nEvery 6 hours"]
            end
        end
        phone["iOS Companion App\nPresence Detection"]
        pi4["Raspberry Pi 4\nKiosk Dashboard"]
    end
    phone -->|"GPS zones, WiFi SSID"| ha
    pi4 -->|"Chromium kiosk\nha.example.com"| ha
    ha --> nest["Google Nest SDM API\nThermostat · 3 Cameras · Doorbell"]
    ha --> sonos["Sonos Soundbar\nLocal control · No cloud"]
    cnp -.->|"enforces"| ha_ns
    ha_pvc & pg_pvc --> snap
```

Why Kubernetes for this? The honest answer: I already had the cluster. But there are real advantages beyond convenience.

Storage resilience. Longhorn replicates every volume across two nodes. When I take a node offline for maintenance — kernel updates, thermal paste, whatever — Home Assistant’s config and database remain available. On a standalone Pi, pulling the power means pulling the only copy of your data.

Rolling updates. Updating Home Assistant from 2026.1 to 2026.2 is a Helm value change: update the image tag, helm upgrade, and the StatefulSet performs a rolling restart. If the new version breaks something, helm rollback takes me to the previous state in seconds. On HA OS, a bad update means restoring from backup.
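Under the assumption of a Helm release named `home-assistant` in the `home-assistant` namespace (both names are illustrative), the upgrade-and-rollback cycle looks roughly like:

```shell
# Bump the HA image tag and let Helm drive the rolling restart
helm upgrade home-assistant pajikos/home-assistant \
  --namespace home-assistant \
  --reuse-values \
  --set image.tag=2026.2.1

# Watch the StatefulSet roll
kubectl -n home-assistant rollout status statefulset/home-assistant

# If the new version misbehaves, return to the previous revision
helm rollback home-assistant --namespace home-assistant
```

The `--reuse-values` flag keeps every other setting from the previous release, so the upgrade is exactly one changed value.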

Resource sharing. The Orange Pi 5’s RK3588S (8 cores, 16 GB RAM) is massively overpowered for Home Assistant alone. Running HA alongside my AI agent platform, graph databases, and tool servers means the hardware is actually utilized. The full HA stack — Core plus PostgreSQL — uses roughly 310m CPU and 1 GiB memory, taking total cluster utilization from 13% to 14.3% CPU.

Unified operations. One set of tools for monitoring, logging, and debugging. kubectl logs, kubectl describe, Longhorn dashboard — the same workflow I use for every other workload.

Replacing SQLite with PostgreSQL was non-negotiable. Home Assistant defaults to SQLite for its recorder database, which stores all entity state history. SQLite on local storage is fine. SQLite on network-attached storage — which is what Longhorn provides via iSCSI — causes WAL (Write-Ahead Logging) locking issues under concurrent access. I’ve seen the “database is locked” errors in enough forum posts to know this isn’t theoretical.

PostgreSQL eliminates the problem entirely. It handles concurrent writes natively, performs better under load, and is a first-class citizen on Kubernetes with decades of operational knowledge behind it.

I chose a plain postgres:16-alpine StatefulSet over more complex options:

  • Not Bitnami. Broadcom changed Bitnami’s licensing in August 2025 — free images are no longer available. I’m actively migrating off Bitnami dependencies elsewhere in the cluster (Redis → Valkey).
  • Not CloudNativePG. It’s a solid operator, but running a Kubernetes operator for a single PostgreSQL instance is like hiring a building superintendent for a studio apartment. A StatefulSet with a Longhorn PVC and a CronJob for pg_dump covers my needs.
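The `pg_dump` CronJob mentioned above could be sketched like this — the secret name, backup PVC, and schedule are assumptions, not my exact manifest:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgresql-dump
  namespace: home-assistant
spec:
  schedule: "0 3 * * *"          # nightly at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pg-dump
              image: postgres:16-alpine
              env:
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: postgresql-credentials   # hypothetical secret
                      key: password
              command:
                - sh
                - -c
                - >
                  pg_dump -h postgresql -U homeassistant homeassistant
                  | gzip > /backup/ha-$(date +%F).sql.gz
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: postgresql-backup         # hypothetical PVC
```

A plain SQL dump alongside Longhorn’s block-level snapshots gives two independent restore paths.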

The HA recorder config is straightforward:

```yaml
recorder:
  db_url: postgresql://homeassistant:${PASSWORD}@postgresql.home-assistant.svc.cluster.local/homeassistant
  purge_keep_days: 30
  commit_interval: 5
  exclude:
    domains:
      - automation
      - script
      - scene
```

Thirty days of history, 5-second commit interval, noisy domains excluded to keep the database manageable. The PostgreSQL PVC gets its own 10Gi Longhorn volume with 2x replication.

Home Assistant discovers devices on the local network via mDNS/Bonjour and SSDP. Standard Kubernetes pod networking isolates pods from the LAN broadcast domain — which is exactly the wrong behavior for home automation.

The solution most K8s HA deployments use is hostNetwork: true, which puts the pod directly on the node’s network stack. Combined with dnsPolicy: ClusterFirstWithHostNet (so Kubernetes DNS still works), HA can see every device on the LAN while still resolving cluster-internal service names.

```yaml
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
```

I evaluated Multus CNI (dual-homed pods with both overlay and LAN interfaces) and Avahi reflectors (mDNS bridging between pod and host networks). Both add complexity without proportional benefit for a homelab. The pragmatic choice is hostNetwork, with the security tradeoff explicitly acknowledged and mitigated through other layers.

Here’s the tension: hostNetwork: true bypasses Cilium’s NetworkPolicy enforcement for the HA pod. The pod is on the host’s network stack, not the CNI overlay, so CiliumNetworkPolicy rules that reference pod selectors or namespace labels don’t apply.

This is an accepted tradeoff, not an ignored one. Mitigation:

  • Tailscale is the only external access path. No ports are exposed to the public internet. HA is accessible only from devices on the tailnet, authenticated by Tailscale’s identity layer.
  • HA’s own auth. Home Assistant has its own user authentication with MFA support.
  • IoT segmentation happens at the network level. CiliumNetworkPolicy still governs all other pods in the home-assistant namespace (PostgreSQL, future MQTT broker, future Zigbee2MQTT). The HA pod itself communicates outbound to the Nest SDM API and the Sonos devices on the LAN — both of which require LAN access by nature.
  • Monitoring. The cybersecurity agent runs Trivy k8s config scans that flag hostNetwork usage, keeping the tradeoff visible in security posture reports.

Even with HA on hostNetwork, the rest of the home automation stack benefits from Cilium’s L3-L7 policy enforcement. As I add MQTT brokers, Zigbee gateways, and other IoT infrastructure in Phase 2, each component gets explicit ingress/egress rules:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: postgresql-policy
  namespace: home-assistant
spec:
  endpointSelector:
    matchLabels:
      app: postgresql
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: home-assistant
      toPorts:
        - ports:
            - port: "5432"
              protocol: TCP
  egress:
    - toEntities:
        - kube-apiserver
```
PostgreSQL accepts connections only from the Home Assistant pod, on port 5432, TCP only. No other pod in the cluster can reach it. When Mosquitto and Zigbee2MQTT arrive, they’ll get similarly scoped policies — Mosquitto accepts MQTT traffic (port 1883) only from HA and Zigbee2MQTT, Zigbee2MQTT accepts management traffic only from HA.
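A Phase 2 Mosquitto policy would follow the same pattern — this is a hypothetical sketch, assuming `app: mosquitto` and `app: zigbee2mqtt` labels:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: mosquitto-policy
  namespace: home-assistant
spec:
  endpointSelector:
    matchLabels:
      app: mosquitto
  ingress:
    # Only HA and Zigbee2MQTT may reach the broker, MQTT port only
    - fromEndpoints:
        - matchLabels:
            app: home-assistant
        - matchLabels:
            app: zigbee2mqtt
      toPorts:
        - ports:
            - port: "1883"
              protocol: TCP
```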

L7 policies matter for IoT because smart devices are notoriously chatty and occasionally compromised. A device that should only speak MQTT shouldn’t be able to reach a PostgreSQL port. Cilium enforces this at the kernel level via eBPF, with minimal performance overhead on the resource-constrained nodes.

Remote access is deliberately spare: no ingress controller, no TLS certificate management, no ports exposed to the public internet. Home Assistant is accessible at ha.example.com via Tailscale’s DNS, which resolves only within the tailnet. Authentication happens at the WireGuard tunnel level before HA’s web UI is ever reachable.

This is a deliberate security posture. Home automation systems are high-value targets — they control physical devices, have LAN access to IoT networks, and often run with elevated privileges. Exposing HA to the internet, even behind reverse proxy authentication, increases the attack surface for no benefit. Tailscale gives me access from my phone, laptop, or any device on the tailnet, from anywhere, with zero public exposure.

Phase 1 has no USB device constraint — my current devices (Nest thermostat, Google cameras, doorbell, Sonos) are all WiFi/cloud or local network devices. No Zigbee stick means no node affinity requirement. The HA pod can schedule on any node, and Longhorn handles storage replication transparently.

When I add a Zigbee coordinator in Phase 2, I’ll use a network-based coordinator (SLZB-06, ~$35) that connects via Ethernet rather than USB. This eliminates the USB passthrough problem entirely — no privileged containers, no hostPath device mounts, no node pinning. Zigbee2MQTT connects to the coordinator via TCP (tcp://192.168.1.50:6638), making it fully portable across K8s nodes.
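The Phase 2 Zigbee2MQTT `configuration.yaml` would then point at the coordinator over TCP — a sketch, with the MQTT hostname and adapter type as assumptions:

```yaml
mqtt:
  server: mqtt://mosquitto.home-assistant.svc.cluster.local:1883
serial:
  # Network coordinator: plain TCP, no USB passthrough, no node pinning
  port: tcp://192.168.1.50:6638
  adapter: zstack   # assumption for the SLZB-06's Zigbee chip
```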

We had a Raspberry Pi 4 in the living room running DakBoard — a cloud-hosted dashboard service showing weather, calendar, and our daughter’s daily chore checklists. $5/month, $60/year. It worked, but it was limited: no device control, no camera feeds, no real-time sensor data, and we were paying a subscription for what’s essentially a web page on a screen we already own.

Replacing it with a Home Assistant Lovelace dashboard was one of the most satisfying parts of this project. The Pi 4 now runs Chromium in kiosk mode pointed at a dedicated HA dashboard view:

```shell
# /etc/xdg/lxsession/LXDE-pi/autostart
@xset s off
@xset -dpms
@xset s noblank
@chromium-browser --noerrdialogs --disable-infobars --kiosk https://ha.example.com/lovelace/livingroom
```

The layout mirrors what DakBoard provided, but adds capabilities DakBoard never could:

| Element | Implementation | DakBoard Could Do This? |
| --- | --- | --- |
| Clock + weather forecast | `clock-weather-card` (HACS) | Yes |
| Week calendar (horizontal scroll) | `atomic-calendar-revive` (HACS) + Google Calendar integration | Yes |
| Our daughter’s chore checklists | HA To-Do Lists + Mushroom cards — Wakeup (7 items) + Bedtime (6 items) | Yes |
| Daily dad joke | REST sensor hitting icanhazdadjoke.com + Markdown card | Yes |
| School traffic / commute time | `google_travel_time` integration | Yes |
| Thermostat control | Nest climate card — tap to adjust | No |
| Camera feeds | Nest SDM live streams — porch, backyard, doorbell | No |
| Sonos controls | Media player card — play/pause/volume | No |
| Presence indicators | Person cards — who’s home, who’s away | No |

Our daughter’s chore lists are worth calling out. The DakBoard version was static — just a list of items we had to update via a cloud portal. The HA version uses native To-Do lists that she can check off by tapping the screen, and they auto-reset on schedule. Her wakeup routine (eat breakfast, bathroom routine, get dressed, brush hair, make bed, hug mom/dad) and bedtime routine (allergy meds, pajamas, brush hair, hug mom/dad) are interactive instead of decorative.
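The nightly reset can be done with a scheduled automation that marks each item incomplete via `todo.update_item` — a sketch, with the entity ID and item names as assumptions:

```yaml
automation:
  - alias: "Reset wakeup checklist"
    trigger:
      - platform: time
        at: "04:00:00"
    action:
      - repeat:
          for_each:            # item names are illustrative
            - "Eat breakfast"
            - "Get dressed"
            - "Make bed"
          sequence:
            - service: todo.update_item
              target:
                entity_id: todo.wakeup_routine   # hypothetical entity
              data:
                item: "{{ repeat.item }}"
                status: needs_action
```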

The lovelace-wallpanel HACS integration rotates scenic background images, matching the DakBoard aesthetic. Dark theme, auto-dimming based on time of day. It looks better than what we were paying for.

Savings: $60/year, immediately. The Pi 4 was already owned hardware.

The HA Companion App on my phone reports GPS location, WiFi SSID, and activity type to Home Assistant. HA maps these to zones — home, work, school, grocery stores — and exposes them as person.father and person.mother entities.

This is where home automation intersects with my AI agent platform. Presence data flows from HA to my agent via webhooks:

```yaml
automation:
  - alias: "Presence update to agent"
    trigger:
      - platform: state
        entity_id: person.father
    action:
      - service: rest_command.agent_presence
        data:
          person: "spencer"
          zone: "{{ states('person.father') }}"
```

The agent uses presence context to adjust its behavior: suppress non-urgent alerts when I’m driving, surface the grocery list when I’m at the store, adjust communication style based on whether I’m at work or home. Location data stays entirely local — HA runs on my cluster, not in the cloud, and the agent gets zone names (“home,” “work”), not raw GPS coordinates.
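The `rest_command.agent_presence` side of that automation might be defined like this — the endpoint URL and payload shape are assumptions about the agent’s API, not the actual config:

```yaml
rest_command:
  agent_presence:
    url: "http://agent.agents.svc.cluster.local:8080/presence"  # hypothetical
    method: post
    content_type: "application/json"
    payload: '{"person": "{{ person }}", "zone": "{{ zone }}"}'
```

The `person` and `zone` variables are filled from the `data:` block of the calling automation.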

| Component | Image / Chart | Storage | Resources / Schedule |
| --- | --- | --- | --- |
| Home Assistant | pajikos Helm v0.3.43, HA 2026.2.1 | 10Gi Longhorn PVC (2x repl) | 250m/512Mi req, 2000m/2Gi limit |
| PostgreSQL | postgres:16-alpine StatefulSet | 10Gi Longhorn PVC (2x repl) | 100m/256Mi req |
| Longhorn snapshots | Recurring Job | — | Every 6 hours, retain 10 |
| Longhorn backups | Recurring Job | S3-compatible target | Daily, retain 30 |

Namespace: home-assistant. Helm chart: pajikos/home-assistant — auto-updated with new HA releases, a low issue count (2 open as of Feb 2026), StatefulSet support by default with configurable persistence, init containers (for HACS installation), and a templated configuration.yaml.
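The 6-hour snapshot schedule in the table above maps to a Longhorn RecurringJob resource — a sketch, with the job name and group membership as assumptions:

```yaml
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: snapshot-6h
  namespace: longhorn-system
spec:
  cron: "0 */6 * * *"   # every 6 hours
  task: snapshot
  retain: 10            # keep the 10 most recent snapshots
  concurrency: 2
  groups:
    - default           # applies to volumes in the default group
```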

Phase 1 is intentionally minimal: prove the platform with WiFi/cloud devices (Nest, Sonos), then expand.

| Addition | What It Enables | Key Decision |
| --- | --- | --- |
| Zigbee2MQTT + Mosquitto | Zigbee device support (sensors, switches, lights) | Network coordinator (SLZB-06) over USB — eliminates node pinning |
| Matter Server | Matter/Thread device support | hostNetwork: true for IPv6 multicast |
| ESPHome | Custom ESP32/ESP8266 sensors | hostNetwork: true for mDNS OTA |
| Frigate | Local camera AI (person/vehicle detection) | Orange Pi 5’s RK3588 has 6 TOPS NPU — explore for inference |

Each addition is a separate Kubernetes Deployment with its own PVC, resource limits, and CiliumNetworkPolicy. The HA “Apps” store doesn’t exist in Container mode — every add-on runs as a standalone pod. For someone already running Kubernetes, this is arguably a feature: each component has its own lifecycle, resource bounds, and security policy.

Production operations on constrained hardware. Not “it works on my Pi” — PostgreSQL with proper replication, automated snapshots, network policies, and rolling updates. The kind of operational discipline that transfers directly to cloud or enterprise Kubernetes.

Security-first IoT design. IoT devices are high-risk by nature. Running them behind Tailscale (no public internet exposure), with CiliumNetworkPolicy segmentation (each component scoped to minimum required connectivity), and on a cluster with automated security scanning is a fundamentally different posture than plugging a smart hub into your router and hoping for the best.

Practical tradeoff documentation. Every design decision has an explicit tradeoff. hostNetwork for mDNS breaks NetworkPolicy enforcement — acknowledged, mitigated, monitored. SQLite on Longhorn causes locking — replaced with PostgreSQL from day one. Container mode lacks the Apps store — treated as a feature for K8s-native deployment. The value isn’t in making perfect decisions; it’s in making informed ones and documenting why.