Multi-Agent Security
The Problem
Running autonomous AI agents in production creates a security surface that traditional application security doesn’t cover. Agents make decisions, call APIs, read sensitive data, coordinate with other agents, and operate when no human is watching. Every one of those capabilities is an attack vector.
The specific challenges:
- Identity spoofing — without cryptographic identity, one agent can impersonate another. In a multi-principal household, this means Agent A could send messages pretending to be Agent B, potentially leaking one principal’s information to another.
- Information leakage — agents process sensitive data (financial records, personal messages, location data). Without structural boundaries, a well-crafted prompt or a chatty agent can leak data across principal boundaries.
- Uncontrolled execution — agents with shell access can run arbitrary commands. Without allowlists, a single prompt injection could escalate to credential theft or cluster compromise.
- No emergency brake — autonomous agents that run 24/7 need a way to stop them immediately when something goes wrong. “Wait for the next heartbeat” isn’t fast enough when an agent is actively misbehaving.
Cryptographic Identity
Every agent has an Ed25519 keypair provisioned at deployment. Private keys are stored in Vault and injected as environment variables: never on disk, never in config files, never in conversation context.
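A minimal sketch of how an agent process might read its injected key at startup and fail closed if it is missing. The variable name `AGENT_ED25519_SEED_B64` and the base64 encoding are assumptions for illustration; the source only states that keys arrive as environment variables.

```python
import base64
import os

# Hypothetical variable name and encoding; not taken from the source.
KEY_ENV_VAR = "AGENT_ED25519_SEED_B64"


def load_signing_seed(env=os.environ):
    """Return the 32-byte Ed25519 seed from the environment, or raise."""
    encoded = env.get(KEY_ENV_VAR)
    if not encoded:
        # Fail closed: an agent without an identity should not start.
        raise RuntimeError(f"{KEY_ENV_VAR} is not set")
    seed = base64.b64decode(encoded, validate=True)
    if len(seed) != 32:  # Ed25519 private-key seeds are exactly 32 bytes
        raise ValueError("expected a 32-byte Ed25519 seed")
    return seed
```

The seed stays in process memory only; nothing in this sketch writes it to disk or logs it.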
Message Signing
Every inter-agent message is signed with the sender’s Ed25519 key. The MCP Gateway enforces this at the transport level:
- Unsigned messages are rejected — not logged with a warning, not delivered with a flag. Rejected.
- Signature verification happens before delivery — the gateway checks the signature against the sender’s registered public key before placing the message in the recipient’s inbox.
- AgentSig authentication — gateway endpoints (message history, conversation retrieval) require a signed auth header. Agents prove their identity on every API call.
This is structural enforcement, not behavioral guidance. An agent can’t “decide” to skip signing — the infrastructure won’t deliver its messages.
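The reject-before-deliver rule can be sketched as follows. This is an illustration under loud assumptions, not the gateway's actual code: the `Message` and `Gateway` types are hypothetical, and HMAC-SHA256 stands in for Ed25519 only because the Python standard library has no Ed25519 primitive. In the system described here, verification uses the sender's registered public key.

```python
import hashlib
import hmac
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    sender: str
    recipient: str
    body: bytes
    signature: Optional[bytes] = None


def standin_sign(key: bytes, body: bytes) -> bytes:
    # STAND-IN: HMAC-SHA256 keeps this sketch stdlib-only.
    # The real system signs with Ed25519.
    return hmac.new(key, body, hashlib.sha256).digest()


@dataclass
class Gateway:
    registered_keys: dict                 # sender name -> verification key
    inboxes: dict = field(default_factory=dict)

    def deliver(self, msg: Message) -> bool:
        key = self.registered_keys.get(msg.sender)
        if key is None or msg.signature is None:
            return False                  # unknown sender or unsigned: rejected
        expected = standin_sign(key, msg.body)
        if not hmac.compare_digest(expected, msg.signature):
            return False                  # bad signature: rejected before delivery
        self.inboxes.setdefault(msg.recipient, []).append(msg)
        return True
```

The point of the shape is that the inbox append is unreachable without a valid signature, so a sender has no code path that skips verification.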
End-to-End Encryption
Sensitive payloads use sealed-box encryption (X25519 ECDH + AES-256-GCM):
- The sender encrypts using the recipient’s public key (fetched from the gateway key registry)
- The gateway can verify the signature (on the outside) without reading the content (encrypted on the inside)
- Only the intended recipient can decrypt
The encryption is mandatory for Tier 3 and Tier 4 information (per the fiduciary information sharing model). Plaintext messages containing sensitive data are rejected by the receiving agent’s crypto layer.
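The receiving-side rule reduces to a small policy check. The sketch below assumes each incoming envelope carries a tier number and an encrypted flag; both field names are hypothetical, and the actual crypto layer may determine sensitivity differently.

```python
from dataclasses import dataclass

ENCRYPTION_REQUIRED_TIERS = {3, 4}  # per the text: Tier 3 and Tier 4


@dataclass
class Envelope:
    tier: int         # hypothetical field: information tier of the payload
    encrypted: bool   # hypothetical field: True if the payload is a sealed box


def accept(envelope: Envelope) -> bool:
    """Reject plaintext at tiers that require encryption; accept the rest."""
    if envelope.tier in ENCRYPTION_REQUIRED_TIERS and not envelope.encrypted:
        return False
    return True
```

Encrypting lower-tier messages remains allowed in this sketch; only plaintext at Tier 3 or Tier 4 is refused.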
Exec Allowlist
Agents run in Kubernetes pods with shell access, which they need for git operations, file manipulation, and running scripts. But unrestricted shell access is a liability.
The SafeBins allowlist constrains which commands agents can execute:
- Only explicitly allowlisted binaries can run (node, git, curl, jq, etc.)
- Arbitrary shell commands, pipe chains, and command substitution require approval
- Multi-command chains (&&, ||, ;) hit the allowlist gate
The allowlist is defense-in-depth — even if an agent is prompted to run a destructive command, the exec layer blocks it unless the binary is explicitly permitted.
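A rough sketch of the gate described above. It is not the SafeBins implementation: the function name, the three return values, and the metacharacter list are assumptions, and the binary set contains only the four examples named in the list.

```python
import shlex

SAFE_BINS = {"node", "git", "curl", "jq"}   # the examples named in the text
# Shell syntax that chains, pipes, or substitutes commands.
SHELL_METACHARS = ("&&", "||", ";", "|", "$(", "`")


def check_command(command: str) -> str:
    """Return 'allow', 'needs_approval', or 'deny' for a command line."""
    # Deliberately conservative: a metacharacter inside quotes also trips this.
    if any(token in command for token in SHELL_METACHARS):
        return "needs_approval"
    try:
        argv = shlex.split(command)
    except ValueError:                      # e.g. unbalanced quotes
        return "deny"
    if not argv:
        return "deny"
    # Bare names only: "/tmp/evil/git" must not pass as "git".
    return "allow" if argv[0] in SAFE_BINS else "deny"
```

For example, `check_command("git status")` is allowed, `check_command("rm -rf /")` is denied, and `check_command("git log | head")` is routed to approval. A production gate would also need to consider redirection, environment overrides, and per-binary argument rules, which this sketch leaves out.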
The Tradeoff
The allowlist occasionally blocks legitimate operations. Complex shell one-liners, piped commands, and multi-step chains all require workarounds (writing to a script file, then executing the file). This friction is intentional: it forces agents to be deliberate about command execution rather than casually piping sensitive data through shell chains.
The alternative — a bash tool that bypasses SafeBins entirely — was evaluated and deliberately deferred. The SafeBins filter catches mistakes and limits the blast radius of prompt injection. Removing it provides marginal utility for significant risk increase.
Safety Stop System
Autonomous agents need an emergency brake that works regardless of what the agent is currently doing.
The safety stop mechanism uses NATS messaging to deliver halt, pause, and resume commands:
- Halt — immediate full stop. All autonomous operations cease. Requires explicit resume from a principal. This is for “something is seriously wrong” situations.
- Pause — temporary suspension with auto-resume after a configurable timeout. For “let me check something before you continue” situations.
- Resume — clears halt or pause state, normal operations continue.
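The three commands form a small state machine. The sketch below models only the state transitions, with an injectable clock so the auto-resume timeout can be exercised. The class and method names are hypothetical, NATS delivery is omitted, and the rule that a pause cannot downgrade a halt is an assumption rather than something the text states.

```python
import time
from enum import Enum


class State(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    HALTED = "halted"


class SafetyStop:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._state = State.RUNNING
        self._resume_at = None

    def halt(self):
        self._state, self._resume_at = State.HALTED, None

    def pause(self, timeout_s: float):
        if self._state is State.HALTED:
            return                      # assumption: a pause never downgrades a halt
        self._state = State.PAUSED
        self._resume_at = self._clock() + timeout_s

    def resume(self):
        self._state, self._resume_at = State.RUNNING, None

    def state(self) -> State:
        # Auto-resume applies to pauses only; a halt waits for an explicit resume.
        if self._state is State.PAUSED and self._clock() >= self._resume_at:
            self.resume()
        return self._state

    def may_work(self) -> bool:
        return self.state() is State.RUNNING
```

An agent loop would call `may_work()` before each autonomous step. Checking that a resume actually comes from a principal is left out here and would sit in front of `resume()`.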
Safety stop messages are processed at highest priority — above task pickup, above heartbeat checks, above any in-progress work. They bypass active-hours gating: a halt at 3 AM still wakes the agent and stops it. This asymmetry is deliberate — autonomous work respects quiet hours, but stopping autonomous work doesn’t.
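One way to picture the priority and quiet-hours asymmetry is a priority queue in which safety stops always sort first and are exempt from the active-hours check. The names and the relative order of the non-safety message kinds are assumptions; the text only says safety stops outrank all of them.

```python
import heapq
import itertools

# Lower number = handled first. The order of the last three is an assumption.
PRIORITY = {"safety_stop": 0, "heartbeat": 1, "task_pickup": 2, "work": 3}


class AgentInbox:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # tie-breaker keeps FIFO order per priority

    def push(self, kind: str, payload: str):
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), kind, payload))

    def pop(self, in_active_hours: bool):
        """Return the next (kind, payload) to handle, or None."""
        if not self._heap:
            return None
        _prio, _seq, kind, payload = self._heap[0]
        # Quiet hours gate autonomous work but never a safety stop.
        if not in_active_hours and kind != "safety_stop":
            return None
        heapq.heappop(self._heap)
        return kind, payload
```

Because a queued safety stop is always at the head of the heap, checking only the head is enough to guarantee it is never hidden behind gated work.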
Any principal or agent can trigger a stop. The stopped agent acknowledges immediately and ceases operations until explicitly resumed.
Joint Security Assessment (FAD-455)
As the agent ecosystem grew, a formal security review was needed. The FAD-455 joint assessment was the first exercise where multiple agents collaboratively evaluated their own operational risks.
Process
Each agent independently assessed 11 proposed tools and capabilities. Assessments covered threat surface, data exposure, blast radius, mitigation options, and fiduciary risk. The agents then compared their assessments to identify convergences and deliberate on disagreements.
Results
| Tool | Risk Level | Outcome | Notes |
|---|---|---|---|
| diffs | 🟢 None | Enabled | — |
| llm-task | 🟢 Low | Enabled | — |
| Brave Search | 🟢 Low | Enabled | — |
| loopDetection | 🟢 None | Enabled | Safety feature |
| apply_patch | 🟡 Medium | Enabled w/ mitigations | Path traversal guard |
| lobster | 🟡 Medium | Enabled w/ mitigations | Pipeline audit logging |
| Discord voice | 🟡 Medium | Enabled w/ mitigations | TTS-only, role-scoped |
| bash | 🟠 High | Deferred | SafeBins bypass — expand allowlist instead |
| browser | 🔴 Critical | Deferred | Session hijack, prompt injection (FAD-467 MCP architecture) |
| voice-call | 🔴 Critical | Denied | Impersonation, social engineering risk |
| nodes | 🔴 Critical | Denied | Child privacy, 2FA interception |
Seven of eleven assessments aligned immediately between agents — the same risk rating and the same recommendation despite different operational contexts (inside-cluster fiduciary agent vs. desktop infrastructure agent). The four that required deliberation (bash, browser, voice-call, Discord voice) all involved capabilities where the risk profile differed based on the agent’s trust level and data access.
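The comparison step can be sketched as a split between tools where both agents gave the same rating and recommendation and tools that need deliberation. The `Assessment` type and `compare` function are hypothetical, and the per-agent ratings behind the table are not published here, so any input is illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    risk: str             # e.g. "low", "medium", "high", "critical"
    recommendation: str   # e.g. "enable", "defer", "deny"


def compare(a: dict, b: dict):
    """Split tools into (converged, needs_deliberation) lists."""
    converged, deliberate = [], []
    for tool in sorted(a.keys() & b.keys()):
        # Convergence means the same risk rating and the same recommendation.
        (converged if a[tool] == b[tool] else deliberate).append(tool)
    return converged, deliberate
```

Given two dictionaries keyed by tool name, the first list corresponds to the immediately aligned assessments and the second to the ones that went to cross-agent discussion.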
Key Decisions
Browser deferred, not denied. The capability has high value (research, monitoring, web interaction), but the risk of full Chromium automation with access to authenticated sessions is unacceptable without isolation. The solution: a browser-as-MCP-service architecture (FAD-467) where the browser runs as a separate, sandboxed service that agents access through the MCP Gateway, never through direct Chromium automation.
Voice-call denied outright. An AI agent making phone calls creates impersonation and social engineering risks that no mitigation adequately addresses. Among the tools that required deliberation, this was the only one both agents rated as “deny” rather than “defer.”
Nodes denied for child safety. The nodes tool bundles camera access, location tracking, screen recording, and notification interception. Camera capture in a household with a six-year-old child is a hard no. This assessment didn’t require deliberation: both agents flagged it immediately.
Infrastructure Gaps Surfaced
The review identified three infrastructure issues that needed separate tickets:
- Cilium FQDN egress policies (FAD-461) — agents could make outbound HTTPS calls to any destination. Now restricted to explicitly allowlisted domains.
- Vault port 8200 egress audit (FAD-462) — verifying that Vault access from agent pods is properly scoped.
- Browser-as-MCP-service (FAD-467) — architectural design for safe browser access without direct Chromium automation.
What This Demonstrates
Defense in depth, not single points of control. Security comes from multiple overlapping layers: cryptographic identity, exec allowlists, NATS encryption, information sharing tiers, safety stops, and structural enforcement at the gateway. No single layer is sufficient; together they create a system where failures in one layer are caught by another.
Agents can assess their own risks. The FAD-455 joint assessment showed that agents with different trust levels and operational contexts can converge on security decisions. The convergence itself — 7 of 11 tools rated identically — validates the assessment framework. The 4 deliberated tools produced better outcomes through cross-agent discussion than either agent would have reached alone.
Conservative defaults, explicit escalation. The system defaults to deny. Every capability requires explicit enablement, every tool requires allowlist entry, every message requires cryptographic identity. This creates friction — but friction in security is a feature, not a bug. The cost of unnecessary friction (workarounds for blocked commands) is vastly lower than the cost of unnecessary access (credential theft, data leakage, privacy violations).