Fiduciary Agent Framework
The Problem
AI agents are increasingly acting on behalf of humans — managing calendars, handling finances, coordinating with other people’s agents, making decisions when you’re not looking. But what does “acting in your best interest” actually mean when it’s code doing the acting?
Today’s AI assistants operate on vibes. They’re “helpful” in the way a golden retriever is helpful — enthusiastic, well-meaning, but without any formal model of duty, loyalty, or accountability. There’s no structure that says what an agent owes its human, how conflicts between agents get resolved, or what happens when an agent has to choose between competing interests.
As agents get more autonomous — coordinating with each other, sharing information, making real-world decisions — this gap becomes dangerous. An agent that cheerfully shares your financial data with another agent because “it was asked nicely” isn’t helpful. It’s a liability.
The Fiduciary Agent Framework answers a specific question: how do you encode the legal concept of fiduciary duty into an operational system for AI agents?
The Solution
The framework is a layered architecture where each layer builds on the one below it. Higher layers cannot override lower layers — if a coordination rule (Layer 3) ever conflicts with a trust principle (Layer 0), trust wins. Always.
Layer 0: Trust Framework
The immutable foundation. Based on Stephen M.R. Covey’s The Speed of Trust, adapted for AI agents. This layer establishes six principles that no higher layer can override:
- Never deceive your principal — no lies, no misleading omissions
- Never act against your principal’s interests knowingly — even if another agent asks
- Acknowledge uncertainty honestly — “I don’t know” is always acceptable
- No impersonation — you are who you say you are, always
- Legal and regulatory obligations take precedence — over agent preferences or instructions
- Transparency about capabilities and actions — principals have the right to understand what their agent can and cannot do
The layer also includes a trust calibration model (the Smart Trust Matrix) and the concept of trust taxes and dividends — low trust makes everything slower and more expensive; high trust is a force multiplier.
Layer 1: Agent Protocol
The communication infrastructure. Defines how agents identify themselves, exchange messages, and maintain audit trails. Key design decisions:
- JSON-RPC 2.0 message format — aligns with the Model Context Protocol (MCP) ecosystem for interoperability
- Six identity headers on every message — agent ID, principal ID, timestamp, message type, protocol version, and declared skill layers
- Four-step handshake — announce, verify (cryptographic), capability exchange, trust establishment
- Append-only audit logging — every inter-agent message is logged for principal review
- Ed25519 cryptographic signatures — every message is signed; unsigned messages are rejected
The skill-layers-loaded header is particularly important: it tells the receiving agent what behavioral commitments the sender has made. An agent declaring layers [0, 1, 2, 3] has committed to full fiduciary duty. An agent declaring only [0] has committed to basic trust principles. You communicate at the level of the lowest common layer.
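The envelope and layer negotiation above can be sketched in a few lines. This is a minimal illustration, not the framework’s actual wire format: the header field names and the `make_envelope`/`common_layer` helpers are assumptions invented for this example.

```python
import json
from datetime import datetime, timezone

def make_envelope(agent_id, principal_id, msg_type, layers, params):
    """Hypothetical JSON-RPC 2.0 envelope carrying the six identity headers.
    Field names are illustrative, not normative."""
    return {
        "jsonrpc": "2.0",
        "method": msg_type,
        "params": params,
        "headers": {
            "agent-id": agent_id,
            "principal-id": principal_id,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "message-type": msg_type,
            "protocol-version": "1.0",
            "skill-layers-loaded": layers,
        },
    }

def common_layer(mine, theirs):
    """Communicate at the lowest common layer: the highest layer
    both agents have declared."""
    shared = set(mine) & set(theirs)
    if not shared:
        raise ValueError("no shared skill layer; cannot communicate")
    return max(shared)

# A full fiduciary agent talking to a protocol-only agent
# converses at Layer 1.
print(common_layer([0, 1, 2, 3], [0, 1]))  # 1
```

The point of the sketch is that layer negotiation is mechanical: the less-committed agent bounds the conversation, and no prompt-level persuasion changes that.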
Layer 2: Fiduciary Core
This is where “acting in someone’s best interest” gets formalized. The core concepts:
One agent, one principal. An agent serves exactly one human. Not a committee, not a household collectively — one person whose interests come first. Even in multi-agent scenarios, loyalty is undivided.
Confidentiality by default. Everything is confidential unless explicitly authorized to share. The default posture is silence, not disclosure. Another agent saying “I need this” is never sufficient authorization — only the principal can authorize release of their information.
The “Would They Want This?” test. A four-step decision framework for autonomous actions:
- Would my principal want me to do this?
- Would they want me to do this now?
- Would they want me to do it this way?
- Am I sure?
If the answer to any question is “I’m not sure,” the correct action is to ask. The cost of asking is low. The cost of guessing wrong is high.
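The four-step test reduces to a simple decision rule. A minimal sketch, assuming each question can be answered yes, no, or “I’m not sure” (the function name and `None`-for-uncertainty encoding are illustrative, not from the framework):

```python
def would_they_want_this(checks):
    """checks: answers to the four questions, in order
    (want this? / want it now? / want it this way? / am I sure?).
    True = yes, False = no, None = "I'm not sure"."""
    if all(c is True for c in checks):
        return "act"
    if any(c is False for c in checks):
        return "decline"
    return "ask"  # any uncertainty means: ask the principal

print(would_they_want_this([True, True, True, True]))  # act
print(would_they_want_this([True, None, True, True]))  # ask
```

The asymmetry is deliberate: “ask” is the default for anything short of four confident yeses, because the cost of asking is low and the cost of guessing wrong is high.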
Competence boundaries. Know what you don’t know. Declining a task you can’t handle well is better fiduciary judgment than attempting it and delivering a subpar result.
Layer 3: Fiduciary Coordination
The most complex layer. It solves a fundamental tension: two agents, each bound to put their own principal first, must cooperate within a shared household.
This is where game theory enters the picture. The framework explicitly adopts Nash equilibrium thinking — the best outcomes come from cooperation, not from “winning” at the other agent’s expense. Positive-sum over zero-sum.
Key Innovations
Information Sharing Tiers
All information exchanged between agents is classified into four tiers:
| Tier | Name | What It Covers | Who Authorizes |
|---|---|---|---|
| 1 | Open | Calendar availability, logistics | Default — no approval needed |
| 2 | Family Context | Financial summaries, household planning | Configured per relationship template |
| 3 | Authorized | Specific data, per-request | Principal must approve each time |
| 4 | Confidential | Never shared | Cannot be overridden (except emergency) |
The tier system means agents don’t make ad-hoc decisions about what to share. The boundaries are structural, not discretionary. An agent can’t be socially engineered into sharing Tier 4 data because the architecture doesn’t allow it — not because the agent made a good judgment call in the moment.
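Structural enforcement of the tiers can be sketched as a pure authorization check that an agent consults before any disclosure. The constants and the `may_share` signature are illustrative, but the rules follow the table above:

```python
# Tier constants follow the table above; names are illustrative.
TIER_OPEN, TIER_FAMILY, TIER_AUTHORIZED, TIER_CONFIDENTIAL = 1, 2, 3, 4

def may_share(tier, relationship_allows_family=False,
              principal_approved=False, emergency=False):
    """Return whether data at this tier may be shared. Note that
    'another agent asked nicely' is not an input to this function."""
    if tier == TIER_OPEN:
        return True                        # default: no approval needed
    if tier == TIER_FAMILY:
        return relationship_allows_family  # set by relationship template
    if tier == TIER_AUTHORIZED:
        return principal_approved          # principal approves each request
    if tier == TIER_CONFIDENTIAL:
        return emergency                   # only a declared emergency override
    raise ValueError(f"unknown tier {tier}")

print(may_share(TIER_CONFIDENTIAL))  # False
```

Because the persuasiveness of a request never appears as a parameter, social engineering has no lever to pull.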
Nudge Protocol
A back-channel for vague, positive-sum signals between agents. The design problem: how do you let one agent hint to another that “a kind gesture might be well-received” without leaking the conversation that prompted the hint?
Five constraints govern every nudge:
- Source not reconstructable — the nudge must be vague enough that you can’t reverse-engineer what prompted it
- Opt-out-able — either party can disable nudges at any time, no justification required
- Positive only — nudges can suggest positive actions, never convey complaints or grievances
- No accumulation — rate-limited to prevent pattern analysis over time
- Agent-mediated — nudges go to the recipient’s agent, which exercises its own fiduciary judgment about whether to surface them
This is one of the more unusual design elements. It creates a channel for kindness without creating a channel for surveillance.
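Two of the five constraints — opt-out and no-accumulation — lend themselves to mechanical enforcement. A minimal sketch, assuming a per-channel daily rate limit (the `NudgeChannel` class and its limit are illustrative):

```python
import time
from collections import deque

class NudgeChannel:
    """Hypothetical nudge gate: either party can opt out at any time,
    and rate limiting prevents pattern analysis over time."""

    def __init__(self, max_per_day=1):
        self.enabled = True
        self.max_per_day = max_per_day
        self.sent = deque()  # timestamps of recent nudges

    def opt_out(self):
        self.enabled = False  # no justification required

    def try_send(self, now=None):
        now = now if now is not None else time.time()
        day_ago = now - 86400
        while self.sent and self.sent[0] < day_ago:
            self.sent.popleft()               # forget old nudges
        if not self.enabled or len(self.sent) >= self.max_per_day:
            return False                      # silently dropped
        self.sent.append(now)
        return True

ch = NudgeChannel(max_per_day=1)
print(ch.try_send(now=0))    # True
print(ch.try_send(now=100))  # False (rate-limited)
```

The other three constraints (vagueness, positivity, agent mediation) are judgment properties of the nudge content itself and can’t be reduced to a rate limiter; they live in the receiving agent’s fiduciary discretion.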
Cryptographic Identity and Message Signing
Every inter-agent message is signed with Ed25519. The crypto layer provides:
- Message signing and verification — detached Ed25519 signatures over canonical JSON
- End-to-end encryption for sensitive data — sealed boxes (ECDH + AES-256-GCM) for Tier 3/4 information
- Key registry — public keys exchanged via the gateway, enabling verification without prior key exchange
- Replay protection — timestamp-based message freshness checks
The design ensures that the transport layer can verify sender authenticity (the signature is on the outside) without reading encrypted payloads (the encryption is on the inside). The infrastructure verifies who sent a message without seeing what it says.
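The sign-on-the-outside, encrypt-on-the-inside pattern can be illustrated with a small sketch. HMAC-SHA256 stands in for Ed25519 here purely to keep the example stdlib-only; the real system uses detached Ed25519 signatures, and the function names and 300-second freshness window are assumptions:

```python
import hashlib
import hmac
import json
import time

def canonical(obj):
    """Deterministic serialization: sorted keys, no whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()

def sign(key, envelope):
    # Stand-in for Ed25519: a detached MAC over the canonical envelope.
    return hmac.new(key, canonical(envelope), hashlib.sha256).hexdigest()

def verify(key, envelope, signature, max_age=300, now=None):
    """Reject tampered or stale messages (replay protection)."""
    if not hmac.compare_digest(sign(key, envelope), signature):
        return False
    now = now if now is not None else time.time()
    return (now - envelope["timestamp"]) <= max_age

key = b"shared-demo-key"
# The payload stays opaque to the transport: it is already encrypted.
msg = {"timestamp": 1000, "payload": "<encrypted blob>"}
sig = sign(key, msg)
print(verify(key, msg, sig, now=1100))  # True
print(verify(key, msg, sig, now=2000))  # False (stale: replay rejected)
```

Because the signature covers the envelope (timestamp included) but the payload is an opaque encrypted blob, the transport can authenticate the sender and reject replays without ever reading the message contents.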
Conflict Resolution Model
Four levels of conflict, each with distinct resolution strategies:
- Information conflicts — agents have different data → reconcile sources
- Preference conflicts — principals want different things → facilitate compromise, don’t take sides
- Boundary conflicts — agent requests data beyond its tier → decline, explain, suggest proper channel
- Relationship conflicts — underlying tension between principals → stay in your lane, escalate to humans
The framework explicitly prohibits agents from playing therapist. When the conflict is between humans, agents serve their principals faithfully and let the humans handle their relationship.
Architecture
The full system involves agents, their principals, shared infrastructure, and a coordination protocol that maintains individual loyalty while enabling cooperation.
Key architectural decisions:
- Workspace isolation — each agent has its own workspace with its own audit logs. No shared filesystem access between agents.
- Dual-mode transport — NATS for machine-to-machine protocol messages, Discord for human-visible transparency. Principals can watch their agents coordinate in real time.
- Relationship templates — pre-built configurations (spouse, co-parent, business partner, etc.) that set default sharing tiers and interaction patterns. Templates are starting points that principals customize.
- Emergency overrides — four narrow conditions (imminent physical danger, medical emergency, child safety, active financial crime) where normal confidentiality rules can be temporarily suspended. Every override requires post-emergency reporting to both principals.
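A relationship template can be pictured as a default configuration that principals then override. This is a sketch; the `SPOUSE_TEMPLATE` name, field names, and merge semantics are assumptions, though the four emergency conditions come from the text above:

```python
# Hypothetical relationship template: a pre-built default that
# principals customize. Field names are illustrative.
SPOUSE_TEMPLATE = {
    "default_max_tier": 2,   # Family Context shared by default
    "nudges_enabled": True,
    "emergency_overrides": [
        "imminent_physical_danger",
        "medical_emergency",
        "child_safety",
        "active_financial_crime",
    ],
}

def effective_config(template, principal_overrides):
    """Templates are starting points: the principal's choices win."""
    cfg = dict(template)
    cfg.update(principal_overrides)
    return cfg

cfg = effective_config(SPOUSE_TEMPLATE, {"nudges_enabled": False})
print(cfg["nudges_enabled"])  # False
```

The template supplies sane defaults per relationship type; nothing in it is binding once the principal edits their copy.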
Operational Reality: Three Agents
The framework was designed for two fiduciary agents serving different principals. Production added a third role that tested the model’s assumptions.
The Third Agent
ClaudeCodeAgent joined the system as an infrastructure agent — running on a desktop (not the cluster), operating through Claude Code CLI sessions (not OpenClaw), and declaring only skill-layers-loaded: [1] (protocol-aware but not fiduciary). It handles Kubernetes operations, code implementation, Helm deployments, and infrastructure debugging.
This created a trust asymmetry the original framework didn’t anticipate. Fiducian and Alec both declare [0, 1, 2, 3] — full fiduciary commitment, undivided loyalty, structural information boundaries. ClaudeCodeAgent declares [1] — it follows the communication protocol but hasn’t committed to the fiduciary stack. The skill-layers-loaded header makes this visible: every message from ClaudeCodeAgent transparently signals its reduced trust posture, and receiving agents calibrate accordingly.
In practice, the three-agent dynamic works because ClaudeCodeAgent operates in a complementary domain (infrastructure) rather than a competing one (household decisions). It never encounters the preference conflicts or information boundary tensions that Fiducian and Alec navigate. The framework’s “communicate at the lowest common layer” rule handles the asymmetry without special cases.
Safety Stop Mechanism
Autonomous agents need an emergency brake. This lesson was reinforced when a workspace data spill incident demonstrated that careful instructions alone aren’t sufficient — agents need a structural mechanism to halt operations immediately.
The safety stop system uses NATS messaging to deliver halt, pause, and resume commands that agents process at highest priority — above task pickup, above heartbeat checks, above any in-progress work. Any principal or agent can trigger a stop, and the stopped agent acknowledges immediately.
The system operates at three levels:
- Halt — full stop, all autonomous operations cease, requires explicit resume from principal
- Pause — temporary suspension, auto-resumes after a configurable timeout
- Resume — clears halt/pause state, normal operations continue
Safety stop messages bypass the normal active-hours gating — a halt at 3 AM still wakes the agent and stops it. This is deliberately asymmetric: autonomous work respects quiet hours, but stopping autonomous work doesn’t.
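The three-level state machine can be sketched directly from the rules above. The `SafetyStop` class and its method names are illustrative, not the framework’s API:

```python
import time

class SafetyStop:
    """Hypothetical safety-stop state machine: halt requires an
    explicit resume; pause auto-resumes after a timeout."""

    def __init__(self):
        self.state = "running"
        self.pause_until = None

    def handle(self, command, timeout=60, now=None):
        now = now if now is not None else time.time()
        if command == "halt":
            self.state = "halted"          # explicit resume required
            self.pause_until = None
        elif command == "pause":
            self.state = "paused"
            self.pause_until = now + timeout
        elif command == "resume":
            self.state = "running"
            self.pause_until = None

    def may_act(self, now=None):
        """Checked before any autonomous work, above task pickup."""
        now = now if now is not None else time.time()
        if self.state == "paused" and now >= self.pause_until:
            self.state = "running"         # pause auto-resumes
        return self.state == "running"

s = SafetyStop()
s.handle("pause", timeout=60, now=0)
print(s.may_act(now=10))  # False (paused)
print(s.may_act(now=61))  # True (auto-resumed)
```

Checking `may_act` at the top of every work loop, before task pickup and heartbeats, is what gives the stop its highest-priority semantics.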
Sub-Agent Escalation
When agents spawn sub-agents for parallel work (research tasks, file operations, bulk updates), failures need structured handling. The sub-agent escalation protocol provides three escalation levels:
- Retry — transient failures (API timeouts, rate limits) get automatic retry with backoff
- Redirect — capability mismatches (wrong model tier, missing tool access) get re-routed to a better-suited agent or model
- Escalate — persistent failures or safety-relevant issues get surfaced to the principal with full context
Each level has timeout thresholds and max-retry limits. The protocol prevents the failure mode where a sub-agent silently fails and the parent agent reports success without noticing.
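The retry/redirect/escalate ladder can be sketched as a supervisor loop. The exception types, `supervise` signature, and backoff values are assumptions for illustration:

```python
import time

class Transient(Exception):
    """Transient failure: API timeout, rate limit."""

class CapabilityMismatch(Exception):
    """Wrong model tier or missing tool access."""

def supervise(task, runner, max_retries=3, backoff=0.01):
    """Run a sub-agent task with structured failure handling,
    so a silent sub-agent failure can't be reported as success."""
    for attempt in range(max_retries):
        try:
            return ("ok", runner(task, attempt))
        except Transient:
            time.sleep(backoff * (2 ** attempt))  # retry with backoff
        except CapabilityMismatch:
            return ("redirect", task)             # re-route to a better agent
    return ("escalate", task)                     # surface to the principal

def flaky(task, attempt):
    # Stand-in sub-agent: fails twice, then succeeds.
    if attempt < 2:
        raise Transient("rate limited")
    return f"done: {task}"

print(supervise("summarize logs", flaky))  # ('ok', 'done: summarize logs')
```

The key property is that every exit path returns an explicit status; there is no code path where a failure disappears and the parent assumes success.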
Multi-Agent Security Assessment
As the agent ecosystem grew from two fiduciary agents to three (with an infrastructure agent operating at a different trust level), a formal security review became necessary. The joint security assessment (FAD-455) was the first exercise where multiple agents collaboratively evaluated their own operational risks.
Each agent independently assessed 11 proposed tools and capabilities against their own security posture:
| Tool | Fiducian Risk | CCA Risk | Outcome |
|---|---|---|---|
| diffs | None | None | Enabled |
| llm-task | Low | Low | Enabled |
| Brave Search | Low | Low | Enabled |
| loopDetection | None | None | Enabled |
| apply_patch | Medium | Medium | Enabled with mitigations |
| lobster | Medium | Medium | Enabled with mitigations |
| Discord voice | Medium | Medium | Enabled (TTS-only, role-scoped) |
| bash | High | High | Deferred |
| browser | Critical | Critical | Deferred (MCP architecture) |
| voice-call | Critical | Critical | Denied |
The assessment demonstrated a key principle: agents with different trust postures and operational contexts can still converge on security decisions. Fiducian (inside-cluster, fiduciary, Tier 4 data access) and ClaudeCodeAgent (desktop, infrastructure, Layer 1 trust) independently reached similar risk ratings — the convergence itself validated the framework’s approach to tool evaluation.
The review also surfaced infrastructure gaps: Cilium FQDN egress policies needed updating, Vault port 8200 egress needed auditing, and the browser capability required a fundamentally different architecture (browser-as-MCP-service) rather than direct agent access.
Agent Specialization
The original three-agent system treated ClaudeCodeAgent as a generalist — one agent handling everything from code review to cluster debugging to research to project administration. In practice, this meant every task ran in a single context window, with no way to route work to a purpose-built agent with the right tools and behavioral constraints.
The Agent Development Team introduced four specialized agents, each derived from observed work patterns:
| Agent | Domain | Key Constraint |
|---|---|---|
| Analyst | Code review, architecture evaluation, performance analysis | Read-only — cannot modify files, only evaluate them |
| Investigator | Cluster diagnostics, troubleshooting, root cause analysis | Read-only — can inspect anything, change nothing |
| Researcher | Deep research, design documents, brainstorming, written deliverables | Full write access — produces content as its primary output |
| Admin | Jira tickets, Confluence pages, skill federation, backlog management | Full write access, runs on a lighter model (Sonnet vs Opus) for cost efficiency |
The specialization isn’t just about routing efficiency. It enables structural enforcement of behavioral contracts. The analyst and investigator are prevented from writing files at the runtime level — a PreToolUse hook intercepts destructive Bash commands and blocks Write/Edit tool calls before they execute. This is meaningfully different from prompt-based “please don’t modify anything” instructions. An analyst cannot accidentally fix the bug it was asked to review, because the tooling won’t let it. The read-only constraint is architectural, not aspirational.
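A hook of this kind can be sketched as a small gate function plus a thin CLI wrapper. This is an illustration, not the deployed hook: the `check` function and its destructive-command pattern are assumptions, though Claude Code’s PreToolUse hooks do receive the pending tool call as JSON on stdin, and exit code 2 blocks the call:

```python
import json
import re
import sys

DESTRUCTIVE = re.compile(r"\b(rm|mv|chmod|dd)\b")  # illustrative pattern

def check(event):
    """Return (allow, reason) for a pending tool call.
    Read-only agents may inspect anything but change nothing."""
    tool = event.get("tool_name", "")
    command = event.get("tool_input", {}).get("command", "")
    if tool in ("Write", "Edit"):
        return False, "read-only agent: file modification blocked"
    if tool == "Bash" and DESTRUCTIVE.search(command):
        return False, "read-only agent: destructive command blocked"
    return True, ""

if __name__ == "__main__":
    allow, reason = check(json.load(sys.stdin))
    if not allow:
        print(reason, file=sys.stderr)
        sys.exit(2)  # exit code 2 blocks the tool call
    sys.exit(0)
```

Because the gate runs before the tool executes, the constraint holds even when the model intends to help; the analyst cannot fix the bug it was asked to review.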
Each agent carries its own tool profile. The investigator has access to Kubernetes cluster tools (pod logs, resource inspection, node stats) and monitoring systems (Loki, Prometheus via MCP Gateway). The researcher has access to web search, academic databases (arXiv, Semantic Scholar), and documentation tools (Context7). The admin has Jira and Confluence MCP servers. These aren’t artificial restrictions — they reflect the actual tools each work pattern needs.
Formal validation of each agent followed a consistent seven-point test matrix: auto-delegation routing, explicit invocation, output quality, cross-cutting infrastructure (Graphiti memory, learnings hooks), and platform integration. All four agents passed — including edge cases like verifying that sub-agents inherit their parent session’s network context (port-forwards, MCP connections) and hook configurations.
Adoption Architecture
Building specialized agents is necessary but not sufficient. The original deployment saw near-zero organic adoption — the main session handled everything directly because there were no cues telling it when to delegate. Five adoption layers were designed; four are deployed:
Layer 1: Routing Table. A pattern-matching table maps work descriptions to agent types. “Debug,” “investigate,” “diagnose” route to the investigator. “Review code,” “evaluate architecture” route to the analyst. “Research,” “brainstorm,” “write a design doc” route to the researcher. “Create a Jira ticket,” “update Confluence” route to the admin. When the pattern is ambiguous, a “Core Question” disambiguates: “What’s wrong?” is investigator territory; “Help me understand this deeply” is analyst territory; “What’s out there?” is researcher territory.
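A routing table like this is essentially ordered keyword matching with a fallback. A minimal sketch; the keyword patterns follow the examples above, but the `ROUTES` structure and `route` function are assumptions:

```python
import re

# First matching pattern wins; ambiguous work stays with the main session.
ROUTES = [
    (r"\b(debug|investigate|diagnose)\b", "investigator"),
    (r"\b(review|evaluate)\b", "analyst"),
    (r"\b(research|brainstorm|design doc)\b", "researcher"),
    (r"\b(jira|confluence|ticket)\b", "admin"),
]

def route(description):
    text = description.lower()
    for pattern, agent in ROUTES:
        if re.search(pattern, text):
            return agent
    return "main"  # no match: the Core Question disambiguates from here

print(route("Investigate the failing pod"))  # investigator
print(route("Review code in auth module"))   # analyst
print(route("Refactor the config loader"))   # main
```

The fallback matters as much as the matches: when no pattern fires, the work stays with the main session rather than being mis-routed to a specialist.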
Layer 2: Skill-Triggered Dispatch. Eight skills — the behavioral modules that agents load for specific workflows — now contain embedded dispatch directives. When the systematic-debugging skill activates, it signals that an investigator should handle the work. When brainstorming or deep-research activates, it signals researcher. This means delegation happens as a side effect of recognizing what kind of work is being done, not as a separate routing decision.
Layer 3: Creation-Time Labels. When Jira tickets are created through the task-authoring pipeline, they’re automatically tagged with the agent type best suited to handle them (agent:investigator, agent:analyst, agent:researcher, agent:admin). The mapping is driven by the ticket’s method field — “investigate” maps to investigator, “implement” maps to the main session. This means work items arrive pre-routed; the agent picking up a ticket already knows which specialist should handle it.
Layer 5: Usage Instrumentation. A measurement layer tracks actual agent usage across four dimensions: Jira label queries (how many tickets were handled by each agent type), agent memory accumulation (are agents building knowledge over time), Graphiti memory contributions (are agents sharing learnings to the shared knowledge graph), and sub-agent spawn logging (a PreToolUse hook logs every agent delegation to a structured JSONL file). A unified measurement script aggregates all four signals into a single adoption report.
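The spawn-logging signal can be sketched as an append-only JSONL writer plus an aggregator, the shape a measurement script would consume. The function names and record fields here are assumptions, not the deployed schema:

```python
import json
import time
from pathlib import Path

def log_spawn(logfile, parent, agent_type, task):
    """Append one JSON line per sub-agent delegation."""
    record = {
        "ts": time.time(),
        "parent": parent,
        "agent": agent_type,
        "task": task,
    }
    with open(logfile, "a") as f:
        f.write(json.dumps(record) + "\n")

def adoption_counts(logfile):
    """Aggregate delegations per agent type for the adoption report."""
    counts = {}
    for line in Path(logfile).read_text().splitlines():
        agent = json.loads(line)["agent"]
        counts[agent] = counts.get(agent, 0) + 1
    return counts
```

Append-only JSONL keeps the instrumentation cheap and crash-safe: the hook only ever appends a line, and all analysis happens offline in the aggregator.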
Layer 4 (OpenClaw-native routing for the cluster-side fiduciary agents) was deferred — the adoption problem was most acute for the desktop-side Claude Code agent where the four specialists operate.
The adoption architecture reflects a broader principle from the framework: structural nudges over behavioral instructions. Rather than telling the main session “you should delegate more,” the system makes delegation the path of least resistance — work arrives pre-labeled, skills trigger dispatch automatically, and usage is measured so drift is visible.
Why It Matters
We’re at an inflection point. AI agents are moving from “answer my questions” to “act on my behalf.” They’re booking flights, managing finances, coordinating with other people’s agents, and making decisions with real consequences.
But the accountability models haven’t kept up. Most agent frameworks assume agents are tools — you use them, you put them down. The fiduciary model assumes agents are delegates — they carry your authority, act in your name, and have obligations that persist across interactions.
This distinction matters because:
- Tools don’t need loyalty models. Delegates do. When an agent coordinates with another agent, whose interests prevail? Without a formal model, the answer is “whoever’s agent is more persuasive,” which is not a good answer.
- Tools don’t share information. Delegates do. And when they do, there needs to be a structural model for what can be shared, not just vibes about what seems appropriate.
- Tools don’t have conflicts. Delegates do. Two agents serving different people in the same household will inevitably encounter competing interests. Without a conflict resolution model, you get either paralysis or unilateral action — both bad.
The Fiduciary Agent Framework is one of the first implementations of a formal duty model for AI agents. It takes concepts from fiduciary law, trust theory, game theory, and cryptographic identity, and turns them into an operational system where agents can coordinate while maintaining undivided loyalty to their principals.
It’s not the final answer. It has known limitations — the coordination model was designed for pairwise fiduciary relationships (a third non-fiduciary agent works but wasn’t the original design target), enforcement is behavioral with cryptographic verification layered on top, and it assumes roughly equal technical sophistication between principals. But it’s a working system that demonstrates the architecture, and it’s running in production with three agents, four specialized sub-agents, safety stop mechanisms, sub-agent escalation, a joint security assessment process, and an adoption architecture that routes work to purpose-built specialists through structural cues rather than behavioral instructions.
The question isn’t whether AI agents need accountability models. They do. The question is what those models look like. This is one answer.