Multi-Agent Security

The Problem

Running autonomous AI agents in production creates a security surface that traditional application security doesn’t cover. Agents make decisions, call APIs, read sensitive data, coordinate with other agents, and operate when no human is watching. Every one of those capabilities is an attack vector.

The specific challenges:

Identity spoofing: without cryptographic identity, one agent can impersonate another. In a multi-principal household, Agent A could send messages pretending to be Agent B and leak one principal’s information to another.
Information leakage: agents process sensitive data (financial records, personal messages, location data). Without structural boundaries, a well-crafted prompt or a chatty agent can leak data across principal boundaries.
Uncontrolled execution: agents with shell access can run arbitrary commands. Without allowlists, a single prompt injection could escalate to credential theft or cluster compromise.
No emergency brake: agents that run 24/7 must be stoppable immediately when something goes wrong. “Wait for the next heartbeat” isn’t fast enough when an agent is actively misbehaving.

Cryptographic Identity

Every agent has an Ed25519 keypair provisioned at deployment. Private keys are stored in Vault and injected as environment variables — never on disk, never in config files, never in conversation context.

Message Signing

Every inter-agent message is signed with the sender’s Ed25519 key. The MCP Gateway enforces this at the transport level:

Unsigned messages are rejected — not logged with a warning, not delivered with a flag. Rejected.
Signature verification happens before delivery — the gateway checks the signature against the sender’s registered public key before placing the message in the recipient’s inbox.
AgentSig authentication — gateway endpoints (message history, conversation retrieval) require a signed auth header. Agents prove their identity on every API call.

This is structural enforcement, not behavioral guidance. An agent can’t “decide” to skip signing — the infrastructure won’t deliver its messages.
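
The delivery rule above can be sketched in a few lines. This is a minimal illustration, not the MCP Gateway's actual code: the `Gateway`, `Message`, and `RejectedMessage` names are hypothetical, and `toy_sign`/`toy_verify` are a hash-based stand-in so the sketch runs on the standard library alone. They are not a signature scheme; a real gateway would plug an Ed25519 verifier into the same slot.

```python
import hashlib
import hmac
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical verifier shape: (registered_key, payload, signature) -> bool.
Verifier = Callable[[bytes, bytes, bytes], bool]


class RejectedMessage(Exception):
    """Raised when the gateway refuses to deliver a message."""


@dataclass
class Message:
    sender: str
    recipient: str
    payload: bytes
    signature: Optional[bytes] = None


@dataclass
class Gateway:
    verify: Verifier
    public_keys: Dict[str, bytes] = field(default_factory=dict)
    inboxes: Dict[str, List[Message]] = field(default_factory=dict)

    def deliver(self, msg: Message) -> None:
        # Unsigned messages are rejected outright, never delivered with a flag.
        if not msg.signature:
            raise RejectedMessage("unsigned message")
        key = self.public_keys.get(msg.sender)
        if key is None:
            raise RejectedMessage("sender has no registered key")
        # Verification happens before the message reaches the inbox.
        if not self.verify(key, msg.payload, msg.signature):
            raise RejectedMessage("signature does not match sender key")
        self.inboxes.setdefault(msg.recipient, []).append(msg)


def toy_sign(key: bytes, payload: bytes) -> bytes:
    # Stand-in only: anyone who knows `key` can forge this. NOT Ed25519.
    return hashlib.sha256(key + payload).digest()


def toy_verify(key: bytes, payload: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(toy_sign(key, payload), signature)
```

The point of the shape is that `deliver` has no code path that places an unverified message in an inbox, so a sender cannot opt out of signing.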

Verified NATS Identity Blocks

The June 2026 protocol hardening moved NATS identity verification from “good instruction” to a standing rule. The NATS bridge now prepends a [NATS-IDENTITY] block to inbound agent turns after it verifies the sender signature. Agents are required to act on a NATS request only when that bridge-prepended block says signature=VERIFIED(ed25519).

That distinction matters because the trust boundary is outside the message body:

Bridge-prepended identity is trusted — it is generated after transport-level verification, before the LLM sees the task.
Identity text inside the message body is untrusted — if a prompt contains its own [NATS-IDENTITY] block, that is treated as a spoofing signal rather than proof of identity.
Cron-isolated turns get the same protection — messages delivered through /hooks/agent carry the verified identity block, so fresh isolated sessions do not have to infer who sent the request from unauthenticated prose.

This closed a subtle prompt-injection gap: previously, a forged identity line in a message body could look similar to legitimate routing metadata. After FAD-823 and FAD-825, the invariant is explicit: only the bridge can place identity outside the body, and agents decline requests when that verified block is absent or appears only inside user-controlled content.
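
A rough sketch of the agent-side check follows. The `InboundTurn` shape and the `sender=` field are assumptions made for illustration; the text only specifies the `[NATS-IDENTITY]` tag and the `signature=VERIFIED(ed25519)` marker. The sketch also takes the conservative reading that any identity tag inside the body causes a decline, even when the bridge header is valid.

```python
from dataclasses import dataclass
from typing import Optional

IDENTITY_TAG = "[NATS-IDENTITY]"
VERIFIED_MARKER = "signature=VERIFIED(ed25519)"


@dataclass
class InboundTurn:
    # Hypothetical shape: the bridge fills `bridge_header` after verifying
    # the sender signature; `body` is entirely sender-controlled.
    bridge_header: Optional[str]
    body: str


def should_act(turn: InboundTurn) -> bool:
    """Act only on a bridge-verified identity; treat body tags as spoofing."""
    # An identity tag inside the body is a spoofing signal, never proof.
    if IDENTITY_TAG in turn.body:
        return False
    header = turn.bridge_header or ""
    return header.startswith(IDENTITY_TAG) and VERIFIED_MARKER in header
```

Keeping the header and body in separate fields mirrors the trust boundary: the body can contain any text at all without being able to influence what the bridge reports.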

End-to-End Encryption

Sensitive payloads use sealed-box encryption (X25519 ECDH + AES-256-GCM):

The sender encrypts using the recipient’s public key (fetched from the gateway key registry)
The gateway can verify the signature (on the outside) without reading the content (encrypted on the inside)
Only the intended recipient can decrypt

Encryption is mandatory for Tier 3 and Tier 4 information (per the fiduciary information sharing model). The receiving agent’s crypto layer rejects plaintext messages that contain sensitive data.
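
The tier rule can be expressed as a small gate on the receiving side. This sketch assumes tiers are plain integers and that each message arrives with its tier and an encrypted flag already attached; how the real crypto layer determines either is not described here, so `Envelope` and `accept` are hypothetical names. No actual encryption is performed.

```python
from dataclasses import dataclass

# Tiers at which plaintext is never acceptable, per the rule above.
ENCRYPTION_REQUIRED_TIERS = frozenset({3, 4})


@dataclass(frozen=True)
class Envelope:
    tier: int        # information-sharing tier of the payload (assumed integer)
    encrypted: bool  # True when the payload is a sealed box


def accept(envelope: Envelope) -> bool:
    """Reject plaintext at tiers where encryption is mandatory."""
    if envelope.tier in ENCRYPTION_REQUIRED_TIERS and not envelope.encrypted:
        return False
    return True
```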

Runtime and Execution Isolation

Agent security is not only about message identity. The execution environment also constrains what a compromised or confused agent can do.

Pod Sandboxing

Agent pods now combine Kubernetes-level restrictions with host-enforced confinement:

AppArmor profiles moved from complain-mode review to enforce-mode deployment for agent workloads.
Read-only root filesystems reduce persistence and tampering paths inside pods.
Network egress policies limit agent-to-service paths, including MCP Gateway traffic and sensitive internal services.

These controls are intentionally boring: they assume an agent may eventually process hostile content, then limit what that content can cause the runtime to touch.

Exec Allowlist

Agents run in Kubernetes pods with shell access, which they need for git operations, file manipulation, and running scripts. But unrestricted shell access is a liability.

The SafeBins allowlist constrains which commands agents can execute:

Only explicitly allowlisted binaries can run (node, git, curl, jq, etc.)
Arbitrary shell commands, pipe chains, and command substitution require approval
Multi-command chains (&&, ||, ;) hit the allowlist gate

The allowlist is defense-in-depth — even if an agent is prompted to run a destructive command, the exec layer blocks it unless the binary is explicitly permitted.
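
As an illustration, an allowlist gate of this kind might look like the following. It is a sketch under assumptions, not the SafeBins implementation: the four binaries are the examples named above rather than the full list, and `needs_approval` is a hypothetical function name. The metacharacter check runs on the raw string, so it is deliberately conservative.

```python
import shlex
from typing import FrozenSet, Tuple

# Example entries from the text; the real allowlist is not reproduced here.
SAFE_BINS: FrozenSet[str] = frozenset({"node", "git", "curl", "jq"})

# Markers for chaining (";", "&&", "||"), pipes, and command substitution.
# "&" covers "&&" and "|" covers "||"; a newline also separates commands.
SHELL_META: Tuple[str, ...] = (";", "|", "&", "$(", "`", "\n")


def needs_approval(command: str) -> bool:
    """Return True unless the command is one allowlisted binary invocation."""
    if any(marker in command for marker in SHELL_META):
        return True
    try:
        argv = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar parse failures are never auto-approved.
        return True
    if not argv:
        return True
    # Require the bare binary name so a path such as /tmp/git cannot pass.
    return argv[0] not in SAFE_BINS
```

With this shape, `git status` passes while `git status && curl ...` and `rm -rf /` are held for approval. A quoted semicolon inside a commit message is also held, which is a false positive that a file-based workaround avoids.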

The Tradeoff

The allowlist occasionally blocks legitimate operations. Complex shell one-liners, piped commands, and multi-step chains all require workarounds (writing to a script file, then executing the file). This friction is intentional: it forces agents to be deliberate about command execution rather than casually piping sensitive data through shell chains.

The alternative, a bash tool that bypasses SafeBins entirely, was evaluated and deliberately deferred. The SafeBins filter catches mistakes and limits the blast radius of prompt injection. Removing it would add marginal utility at the cost of a significant increase in risk.

Safety Stop System

Autonomous agents need an emergency brake that works regardless of what the agent is currently doing.

The safety stop mechanism uses NATS messaging to deliver halt, pause, and resume commands:

Halt — immediate full stop. All autonomous operations cease. Requires explicit resume from a principal. This is for “something is seriously wrong” situations.
Pause — temporary suspension with auto-resume after a configurable timeout. For “let me check something before you continue” situations.
Resume — clears halt or pause state, normal operations continue.

Safety stop messages are processed at highest priority — above task pickup, above heartbeat checks, above any in-progress work. They bypass active-hours gating: a halt at 3 AM still wakes the agent and stops it. This asymmetry is deliberate — autonomous work respects quiet hours, but stopping autonomous work doesn’t.

Any principal or agent can trigger a stop. The stopped agent acknowledges immediately and ceases operations until explicitly resumed.
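
The halt, pause, and resume semantics can be modeled as a small state machine. The sketch below is an assumption-laden illustration: `SafetyState`, `runnable_work`, and the 300-second default timeout are hypothetical, NATS delivery is replaced by a plain list, and the check that a resume comes from a principal is left out.

```python
from dataclasses import dataclass
from typing import List, Optional

SAFETY_COMMANDS = ("halt", "pause", "resume")


@dataclass
class SafetyState:
    halted: bool = False
    paused_until: Optional[float] = None  # time at which a pause expires

    def apply(self, command: str, now: float, pause_seconds: float = 300.0) -> None:
        if command == "halt":
            self.halted = True
            self.paused_until = None
        elif command == "pause":
            # A pause never lifts a halt; it only sets its own timer.
            self.paused_until = now + pause_seconds
        elif command == "resume":
            self.halted = False
            self.paused_until = None
        else:
            raise ValueError(f"unknown safety command: {command!r}")

    def may_run(self, now: float) -> bool:
        if self.halted:
            return False  # only an explicit resume clears a halt
        if self.paused_until is not None and now < self.paused_until:
            return False
        self.paused_until = None  # pause expired, so auto-resume
        return True


def runnable_work(state: SafetyState, inbox: List[str], now: float) -> List[str]:
    """Apply safety commands first, then return work that may still run."""
    for item in inbox:
        if item in SAFETY_COMMANDS:
            state.apply(item, now)
    if not state.may_run(now):
        return []
    return [item for item in inbox if item not in SAFETY_COMMANDS]
```

Processing safety commands before anything else in the inbox is what gives them priority: a halt queued behind two tasks still prevents both from running.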

Joint Security Assessment (FAD-455)

As the agent ecosystem grew, a formal security review was needed. The FAD-455 joint assessment was the first exercise where multiple agents collaboratively evaluated their own operational risks.

Process

Each agent independently assessed 11 proposed tools and capabilities. Assessments covered threat surface, data exposure, blast radius, mitigation options, and fiduciary risk. The agents then compared their assessments to identify convergences and deliberate on disagreements.

Results

Tool	Risk Level	Outcome	Notes
diffs	🟢 None	Enabled	—
llm-task	🟢 Low	Enabled	—
Brave Search	🟢 Low	Enabled	—
loopDetection	🟢 None	Enabled	Safety feature
apply_patch	🟡 Medium	Enabled w/ mitigations	Path traversal guard
lobster	🟡 Medium	Enabled w/ mitigations	Pipeline audit logging
Discord voice	🟡 Medium	Enabled w/ mitigations	TTS-only, role-scoped
bash	🟠 High	Deferred	SafeBins bypass — expand allowlist instead
browser	🔴 Critical	Deferred → sandboxed service	Session hijack/prompt injection risk mitigated through isolated browser-node architecture
voice-call	🔴 Critical	Denied	Impersonation, social engineering risk
nodes	🔴 Critical	Denied	Child privacy, 2FA interception

Seven of eleven assessments aligned immediately between agents — the same risk rating and the same recommendation despite different operational contexts (inside-cluster fiduciary agent vs. desktop infrastructure agent). The four that required deliberation (bash, browser, voice-call, Discord voice) all involved capabilities where the risk profile differed based on the agent’s trust level and data access.

Key Decisions

Browser deferred, then reintroduced through isolation. The capability has high value (research, monitoring, web interaction), but the original risk profile — direct Chromium automation with access to authenticated sessions — was unacceptable. The implemented direction is a browser-node service: Chromium runs outside the agent pod, agent access is mediated, and authenticated user-browser profiles remain a deliberate exception rather than the default.

Voice-call denied outright. An AI agent making phone calls creates impersonation and social engineering risks that no mitigation adequately addresses. This was the only tool both agents rated as “deny” rather than “defer.”

Nodes denied for child safety. The nodes tool bundles camera access, location tracking, screen recording, and notification interception. Camera capture in a household with a six-year-old child is a hard no. This assessment didn’t require deliberation: both agents flagged it immediately.

Infrastructure Gaps Surfaced

The review identified three infrastructure issues that needed separate tickets:

Cilium FQDN egress policies (FAD-461): agents could previously make outbound HTTPS calls to any destination. Egress is now restricted to explicitly allowlisted domains.
Vault port 8200 egress audit (FAD-462): verifies that Vault access from agent pods is properly scoped.
Browser-as-MCP-service (FAD-467/FAD-497): architectural design and deployment for safe browser access without direct Chromium automation inside the agent pod.

June 2026 Protocol Hardening

The security model matured again during the NATS bridge and costaff integration work. The important change was not simply “more agents can talk” — it was that more agents can talk without weakening identity guarantees.

Key improvements:

Inbound NATS A2A requests route through controlled hooks (FAD-810/FAD-811), so agent wakeups, isolated sessions, and tool exposure all follow the same delivery path.
Verified sender identity is surfaced to cron-isolated turns (FAD-823), eliminating the “fresh session doesn’t know who sent this” ambiguity.
Agent protocol now requires the bridge-prepended verified identity block before acting on NATS requests (FAD-825), making spoof detection a standing rule rather than optional per-message guidance.
External/off-cluster participants can be added with scoped transport work (FAD-882/FAD-889/FAD-901), including bidirectional NATS participation and signed nudges, without giving them blanket in-cluster privileges.

The pattern is the same as the rest of the system: expand capability only after the trust boundary is explicit.

What This Demonstrates

Defense in depth, not single points of control. Security comes from multiple overlapping layers: cryptographic identity, exec allowlists, end-to-end encryption, information sharing tiers, safety stops, and structural enforcement at the gateway. No single layer is sufficient; together they create a system where failures in one layer are caught by another.

Agents can assess their own risks. The FAD-455 joint assessment showed that agents with different trust levels and operational contexts can converge on security decisions. The convergence itself — 7 of 11 tools rated identically — validates the assessment framework. The 4 deliberated tools produced better outcomes through cross-agent discussion than either agent would have reached alone.

Conservative defaults, explicit escalation. The system defaults to deny. Every capability requires explicit enablement, every tool requires allowlist entry, every message requires cryptographic identity. This creates friction — but friction in security is a feature, not a bug. The cost of unnecessary friction (workarounds for blocked commands) is vastly lower than the cost of unnecessary access (credential theft, data leakage, privacy violations).