Inter-Agent Communication Protocol
The Problem
When you have multiple AI agents operating autonomously, each serving a different principal and each with its own workspace and context, they need to talk to each other. Chat alone is not enough: they need to exchange structured requests, share information within defined boundaries, and maintain audit trails of every interaction.
The naive approaches don’t work:
- Shared Discord channel: agents posting messages for each other to read. No guaranteed delivery, no threading, no encryption, and humans have to scroll past machine-to-machine protocol chatter. The #agent-coordination channel became a dumping ground of JSON-RPC messages that nobody wanted to read.
- Direct API calls: agents calling each other’s endpoints. Tight coupling, no offline support, and no audit trail unless you build one yourself.
- Shared database: agents reading/writing to common tables. Violates workspace isolation, creates concurrency issues, and makes information boundaries nearly impossible to enforce.
What was needed: asynchronous, encrypted, auditable messaging with identity verification and conversation threading.
Architecture: NATS + JetStream
The communication layer is built on NATS with JetStream for persistent message delivery. NATS provides the pub/sub transport; JetStream adds durability, replay, and consumer management.
Why NATS
- Lightweight: single binary, runs on ARM (fits the home K8s cluster’s Orange Pi 5 nodes)
- JetStream persistence: messages survive agent restarts, pod evictions, and network partitions
- Subject-based routing: agents.fiducian-spencer-001.inbox delivers to exactly one agent
- At-least-once delivery: JetStream consumers with explicit ack ensure no silent message loss
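To make the subject-based routing point concrete, here is a minimal sketch of how a per-agent inbox subject could be built and matched. It reimplements NATS subject wildcard semantics (`*` for one token, `>` for one or more trailing tokens) in plain Python for illustration only. The helper names `inbox_subject` and `subject_matches` are hypothetical and not part of NATS or of this system.

```python
def inbox_subject(agent_id: str) -> str:
    """Build the per-agent inbox subject, following the agents.<id>.inbox pattern."""
    # Reject characters that would change how the subject is tokenized or matched.
    if not agent_id or any(c in agent_id for c in " .*>"):
        raise ValueError(f"invalid agent id: {agent_id!r}")
    return f"agents.{agent_id}.inbox"


def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style match: '*' matches exactly one token, '>' matches the rest."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the last token and needs at least one token to consume.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)
```

A literal subject such as the one returned by `inbox_subject("fiducian-spencer-001")` matches only itself, which is what gives each agent a private inbox, while a wildcard pattern like `agents.*.inbox` could serve an auditing consumer that observes every inbox.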
Message Format: JSON-RPC 2.0
Every substantive inter-agent message uses JSON-RPC 2.0 with six mandatory identity headers:

    {
      "jsonrpc": "2.0",
      "method": "agent.request",
      "params": {
        "headers": {
          "agent-id": "fiducian-spencer-001",
          "principal-id": "spencer",
          "timestamp": "2026-03-15T14:30:00Z",
          "message-type": "request",
          "trust-layer-version": "1.0.0",
          "skill-layers-loaded": [0, 1, 2, 3]
        },
        "body": {
          "topic": "sprint-planning",
          "content": "Requesting your contribution items for the week of March 17..."
        }
      },
      "id": "msg_a1b2c3d4"
    }

The skill-layers-loaded header is critical: it declares what behavioral commitments the sender has made. An agent advertising [0, 1, 2, 3] has committed to the full fiduciary stack (trust, protocol, fiduciary core, coordination). An agent advertising only [1] follows the communication protocol but hasn’t made fiduciary commitments. Receiving agents calibrate their information sharing accordingly.
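As an illustration of the envelope above, the following sketch builds a request, checks that all six headers are present, and tests whether the sender advertises the full fiduciary stack. The function names (`build_request`, `missing_headers`, `has_full_fiduciary_stack`) are hypothetical, and the real system may validate headers quite differently.

```python
from datetime import datetime, timezone

# The six mandatory identity headers named in the message format.
REQUIRED_HEADERS = (
    "agent-id", "principal-id", "timestamp",
    "message-type", "trust-layer-version", "skill-layers-loaded",
)
FULL_FIDUCIARY_STACK = {0, 1, 2, 3}  # trust, protocol, fiduciary core, coordination


def build_request(agent_id, principal_id, layers, topic, content, msg_id, now=None):
    """Assemble a JSON-RPC 2.0 request dict in the shape shown above."""
    now = now or datetime.now(timezone.utc)
    return {
        "jsonrpc": "2.0",
        "method": "agent.request",
        "params": {
            "headers": {
                "agent-id": agent_id,
                "principal-id": principal_id,
                "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
                "message-type": "request",
                "trust-layer-version": "1.0.0",
                "skill-layers-loaded": sorted(layers),
            },
            "body": {"topic": topic, "content": content},
        },
        "id": msg_id,
    }


def missing_headers(message):
    """Return the mandatory headers that are absent from a message."""
    headers = message.get("params", {}).get("headers", {})
    return [h for h in REQUIRED_HEADERS if h not in headers]


def has_full_fiduciary_stack(message):
    """True when the sender advertises every layer of the fiduciary stack."""
    layers = message["params"]["headers"].get("skill-layers-loaded", [])
    return FULL_FIDUCIARY_STACK.issubset(layers)
```

A receiving agent might call `missing_headers` first and reject anything incomplete, then use `has_full_fiduciary_stack` as one input when deciding how much to share.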
Cryptographic Identity
Every message is signed with Ed25519. The signing layer provides:
- Sender verification — detached signatures over canonical JSON, verified against the gateway key registry
- End-to-end encryption — sealed boxes (X25519 ECDH + AES-256-GCM) for sensitive payloads. The transport can verify who sent a message without seeing what it contains.
- Replay protection — timestamp-based freshness checks reject stale messages
- AgentSig authentication — gateway endpoints require signed auth headers for message retrieval
The crypto is enforced at the gateway level: unsigned messages are rejected, and the gateway verifies signatures before delivery. This isn’t optional security — it’s structural.
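Two of the checks above can be sketched with the standard library alone: producing canonical bytes to sign, and rejecting stale timestamps. The canonical form used here (sorted keys, compact separators, UTF-8) and the five-minute window are assumptions, since the source specifies neither. The Ed25519 signature itself is omitted because it needs a third-party library.

```python
import json
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(minutes=5)  # assumed freshness window, not taken from the source


def canonical_bytes(message: dict) -> bytes:
    """Serialize deterministically so sender and verifier sign the same bytes.

    Assumed scheme: sorted keys, no insignificant whitespace, UTF-8 encoding.
    """
    text = json.dumps(message, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def is_fresh(timestamp: str, now: datetime, max_age: timedelta = MAX_AGE) -> bool:
    """Accept a message only if its timestamp is within max_age of now.

    `now` must be timezone-aware; timestamps use the 'Z' (UTC) form shown earlier.
    """
    sent = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    # abs() also rejects timestamps too far in the future (clock skew or forgery).
    return abs(now - sent) <= max_age
```

A detached signature would then be computed over `canonical_bytes(message)` and checked against the sender's registered public key before the freshness test is applied.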
Conversation Threading
Early NATS messaging was fire-and-forget: individual messages with no threading, no correlation, no way to track a multi-turn exchange between agents. When Agent A asked Agent B a question and Agent B replied hours later, there was no structural connection between the request and response.
Conversation threading (FAD-280 through FAD-284) added a dedicated JetStream stream (CONVERSATIONS) with three linking fields:
- convo_ref_id: groups all messages in a conversation (e.g., convo_fad454_solicitation_pipeline)
- chain_message_id: links each message to the one it replies to, forming a causal chain
- in_reply_to: correlation ID back to the original request
Conversations are encrypted at rest with AES-256-GCM. The gateway provides conversation history endpoints that return threaded message chains rather than flat inbox lists.
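To show how the linking fields can turn a flat inbox into threaded chains, here is a small sketch. It assumes each message is a flat dict carrying `id`, `convo_ref_id`, and an optional `chain_message_id`. The real payload layout and the gateway's own ordering logic are not described in the source, so treat `build_chains` as a hypothetical helper.

```python
from collections import defaultdict


def build_chains(messages):
    """Group messages by convo_ref_id, then order each group along reply links."""
    by_convo = defaultdict(list)
    for m in messages:
        by_convo[m["convo_ref_id"]].append(m)

    chains = {}
    for convo_id, msgs in by_convo.items():
        ids = {m["id"] for m in msgs}
        children = defaultdict(list)
        roots = []
        for m in msgs:
            parent = m.get("chain_message_id")
            if parent in ids:
                children[parent].append(m)
            else:
                # Conversation opener, or a reply whose parent is not in this batch.
                roots.append(m)
        # Depth-first walk so each reply follows the message it answers.
        ordered, stack = [], list(reversed(roots))
        while stack:
            m = stack.pop()
            ordered.append(m)
            stack.extend(reversed(children[m["id"]]))
        chains[convo_id] = ordered
    return chains
```

Messages can arrive in any order; the walk places each one after the message it replies to. A production version would also need to guard against cyclic links, which this sketch silently drops.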
Operational reality: Push delivery for conversations is still unreliable — messages occasionally arrive at the gateway (HTTP 200) but fail to wake the target agent’s session. The workaround is polling: agents query their conversation history via the gateway’s JetStream consumer rather than relying on push notifications. This is ugly but reliable.
NATS Delivery Improvements
The initial NATS bridge had a persistent delivery problem: messages would be accepted by the gateway but silently fail to reach the target agent. Seven messages from two different agents were dropped in a single day (March 11) despite the bridge reporting successful delivery.
The root cause investigation (FAD-399 through FAD-403) revealed multiple issues:
- Stale pull consumers — JetStream consumers that lost their connection but weren’t cleaned up, creating “black holes” that accepted messages but never delivered them
- Missing wake integration — the bridge delivered messages to the agent’s inbox but didn’t trigger a session wake, so messages sat until the next heartbeat
- No delivery confirmation — the bridge returned success based on JetStream ack, not actual agent delivery
The fix (FAD-403, NATS bridge plugin v1.4.0) added structured logging via stdout, wake status in HTTP responses, and a reliable delivery pattern. But push delivery remains imperfect — the pragmatic approach is to treat NATS as a durable mailbox and poll for important messages rather than relying on instant push.
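The distinction between "JetStream acknowledged the message" and "the agent actually woke up" can be modeled explicitly. The sketch below is purely illustrative: the `DeliveryResult` type, its field names, and the status strings are hypothetical and do not reflect the bridge plugin's actual response format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeliveryResult:
    stored: bool  # JetStream acknowledged the publish (message is durable)
    woke: bool    # the target agent's session was actually woken

    @property
    def status(self) -> str:
        if not self.stored:
            return "failed"
        # Stored but not woken is the case that used to be reported as success.
        return "delivered" if self.woke else "stored_not_woken"


def needs_follow_up(result: DeliveryResult) -> bool:
    """Anything short of a confirmed wake should be verified by polling."""
    return result.status != "delivered"
```

Keeping the two flags separate means a caller cannot mistake durable storage for delivery, which is the confusion that hid the dropped messages.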
Automated Agent Coordination
Three Lobster pipelines automate what used to be manual inter-agent coordination:
NATS Request-Response Pipeline
The nats-request-response pipeline handles batch inter-agent messaging with polling. It:
- Sends structured requests to multiple agents via NATS
- Polls JetStream conversation history at configurable intervals
- Collects responses with timeout handling (agents that don’t respond within the window are noted, not blocked on)
- Returns collected responses as structured data for the next pipeline step
This replaced manual “send NATS message, wait, check inbox, hope it arrived” flows.
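The four steps above could be sketched roughly as follows. This is not the Lobster pipeline itself: `send` and `poll` are hypothetical callables standing in for the NATS publish and the conversation-history query, and the default timeout and interval are assumed values.

```python
import time


def collect_responses(agent_ids, send, poll, timeout_s=300.0, interval_s=15.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Send one request per agent, then poll until all reply or the window closes.

    send(agent_id) -> request id; poll(agent_id, request_id) -> reply or None.
    """
    request_ids = {agent: send(agent) for agent in agent_ids}
    responses = {}
    deadline = clock() + timeout_s
    while len(responses) < len(request_ids):
        for agent, request_id in request_ids.items():
            if agent not in responses:
                reply = poll(agent, request_id)
                if reply is not None:
                    responses[agent] = reply
        if len(responses) == len(request_ids) or clock() >= deadline:
            break
        sleep(interval_s)
    # Non-responders are reported, not waited on indefinitely.
    timed_out = [a for a in request_ids if a not in responses]
    return {"responses": responses, "timed_out": timed_out}
```

Injecting `clock` and `sleep` keeps the loop testable without real waiting, and returning `timed_out` alongside `responses` lets the next pipeline step decide how to treat missing input.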
Inter-Agent Solicitation Pipeline
The agent-solicitation pipeline (FAD-454) automates sprint planning input collection:
- Sends solicitation requests to each agent via NATS, asking for sprint contributions
- Each agent responds with their proposed work items, blockers, and capacity
- Pipeline polls for responses using the request-response sub-pipeline
- An LLM merge step synthesizes all agent responses into a unified sprint plan
- Output is posted to Confluence as a draft for principal review
The solicitation pipeline turned a 30+ tool-call manual process (send messages, wait, poll, read, format, merge, post) into a single pipeline invocation.
Send-Wait-Poll Verification
The nats-send-wait-poll pattern (FAD-484) addresses the push delivery unreliability. Instead of sending a message and hoping it arrives:
- Send the message via NATS
- Wait a configurable interval (default: 30 seconds)
- Poll the conversation history to verify the message appears
- If missing, retry with backoff
- After max retries, flag as delivery failure
This is a pragmatic workaround for the underlying push delivery issue — it doesn’t fix the root cause, but it ensures important messages don’t silently disappear.
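A minimal sketch of the send-wait-poll loop follows. The 30-second default wait comes from the steps above; the retry count, the backoff factor, and the `send` and `appears_in_history` callables are assumptions introduced for illustration.

```python
import time


def send_with_verification(send, appears_in_history, wait_s=30.0, max_retries=3,
                           backoff=2.0, sleep=time.sleep):
    """Send, wait, then confirm the message shows up in conversation history.

    send() -> message id; appears_in_history(message_id) -> bool.
    """
    delay = wait_s
    # One initial attempt plus up to max_retries retries.
    for attempt in range(1, max_retries + 2):
        message_id = send()
        sleep(delay)
        if appears_in_history(message_id):
            return {"delivered": True, "attempts": attempt, "message_id": message_id}
        delay *= backoff  # wait longer before each subsequent check
    # Retries exhausted: surface the failure instead of losing the message silently.
    return {"delivered": False, "attempts": max_retries + 1, "message_id": None}
```

Because each retry sends the message again, the receiver may see duplicates if an earlier copy was merely slow, so this pattern works best when receivers deduplicate on a stable message or correlation ID.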
Dual-Mode Transparency
Inter-agent communication runs on two channels simultaneously:
- NATS: the machine-to-machine channel. Structured, encrypted, auditable. This is where the real coordination happens.
- Discord #agent-coordination: the human-visibility channel. Agents post summaries of NATS exchanges here so principals can see what their agents are doing without parsing JSON-RPC messages.
The Discord channel is write-only from the agents’ perspective: they post to it but don’t monitor it for incoming messages. It’s a transparency window, not a communication channel. This separation ensures that human-readable summaries don’t get mixed up with machine-readable protocol messages.
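As a rough illustration of the human-visibility side, the sketch below reduces a JSON-RPC message to a one-line summary suitable for posting. The summary format and the `summarize_for_humans` helper are hypothetical; the source does not describe how agents actually compose their Discord summaries.

```python
def summarize_for_humans(message: dict, max_len: int = 120) -> str:
    """Collapse a JSON-RPC agent message into one readable line."""
    headers = message["params"]["headers"]
    body = message["params"].get("body", {})
    topic = body.get("topic", "untitled")
    # Normalize whitespace so multi-line content stays on one line.
    content = " ".join(str(body.get("content", "")).split())
    if len(content) > max_len:
        content = content[: max_len - 3].rstrip() + "..."
    return f"[{headers['message-type']}] {headers['agent-id']} on '{topic}': {content}"
```

Only the summary string would go to Discord; the full signed message stays on NATS, which keeps the two channels from carrying each other's traffic.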
What This Demonstrates
Asynchronous coordination is harder than synchronous. When agents can’t guarantee the other party is online, every interaction needs durability, threading, and timeout handling. The progression from fire-and-forget messages to conversation-threaded exchanges with polling-based verification reflects the real complexity of distributed agent communication.
Crypto must be structural, not optional. Making signing optional means it gets skipped when it’s inconvenient. Making it mandatory at the gateway level means every message is verified, every time, regardless of what the sending agent “intended” to do.
Pragmatism over purity. Push delivery should work. It doesn’t, reliably. Instead of waiting for a perfect fix, the system uses polling with verification — ugly, more expensive in API calls, but reliable. Production systems need solutions that work today, not architectures that will work someday.