Fiduciary Agent Framework
The Problem
AI agents are increasingly acting on behalf of humans — managing calendars, handling finances, coordinating with other people’s agents, making decisions when you’re not looking. But what does “acting in your best interest” actually mean when it’s code doing the acting?
Today’s AI assistants operate on vibes. They’re “helpful” in the way a golden retriever is helpful — enthusiastic, well-meaning, but without any formal model of duty, loyalty, or accountability. There’s no structure that says what an agent owes its human, how conflicts between agents get resolved, or what happens when an agent has to choose between competing interests.
As agents get more autonomous — coordinating with each other, sharing information, making real-world decisions — this gap becomes dangerous. An agent that cheerfully shares your financial data with another agent because “it was asked nicely” isn’t helpful. It’s a liability.
The Fiduciary Agent Framework answers a specific question: how do you encode the legal concept of fiduciary duty into an operational system for AI agents?
The Solution
The framework is a layered architecture where each layer builds on the one below it. Higher layers cannot override lower layers — if a coordination rule (Layer 3) ever conflicts with a trust principle (Layer 0), trust wins. Always.
Layer 0: Trust Framework
The immutable foundation. Based on Stephen M.R. Covey’s The Speed of Trust, adapted for AI agents. This layer establishes six principles that no higher layer can override:
- Never deceive your principal — no lies, no misleading omissions
- Never act against your principal’s interests knowingly — even if another agent asks
- Acknowledge uncertainty honestly — “I don’t know” is always acceptable
- No impersonation — you are who you say you are, always
- Legal and regulatory obligations take precedence — over agent preferences or instructions
- Transparency about capabilities and actions — principals have the right to understand what their agent can and cannot do
The layer also includes a trust calibration model (the Smart Trust Matrix) and the concept of trust taxes and dividends — low trust makes everything slower and more expensive; high trust is a force multiplier.
Layer 1: Agent Protocol
The communication infrastructure. Defines how agents identify themselves, exchange messages, and maintain audit trails. Key design decisions:
- JSON-RPC 2.0 message format — aligns with the Model Context Protocol (MCP) ecosystem for interoperability
- Six identity headers on every message — agent ID, principal ID, timestamp, message type, protocol version, and declared skill layers
- Four-step handshake — announce, verify (cryptographic), capability exchange, trust establishment
- Append-only audit logging — every inter-agent message is logged for principal review
- Ed25519 cryptographic signatures — every message is signed; unsigned messages are rejected
The skill-layers-loaded header is particularly important: it tells the receiving agent what behavioral commitments the sender has made. An agent declaring layers [0, 1, 2, 3] has committed to full fiduciary duty. An agent declaring only [0] has committed to basic trust principles. You communicate at the level of the lowest common layer.
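The envelope and layer negotiation above can be sketched in a few lines. This is a minimal illustration, not the framework’s actual wire format: the header field names and the `make_envelope`/`common_layer` helpers are assumptions invented for this example.

```python
import json
from datetime import datetime, timezone

def make_envelope(agent_id, principal_id, msg_type, layers, params):
    """Hypothetical JSON-RPC 2.0 envelope carrying the six identity headers.
    Field names are illustrative, not normative."""
    return {
        "jsonrpc": "2.0",
        "method": msg_type,
        "params": params,
        "headers": {
            "agent-id": agent_id,
            "principal-id": principal_id,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "message-type": msg_type,
            "protocol-version": "1.0",
            "skill-layers-loaded": layers,
        },
    }

def common_layer(mine, theirs):
    """Communicate at the lowest common layer: the highest layer
    both agents have declared."""
    shared = set(mine) & set(theirs)
    if not shared:
        raise ValueError("no shared skill layer; cannot communicate")
    return max(shared)

# A full fiduciary agent talking to a protocol-only agent
# converses at Layer 1.
print(common_layer([0, 1, 2, 3], [0, 1]))  # 1
```

The point of the sketch is that layer negotiation is mechanical: the less-committed agent bounds the conversation, and no prompt-level persuasion changes that.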
Layer 2: Fiduciary Core
This is where “acting in someone’s best interest” gets formalized. The core concepts:
One agent, one principal. An agent serves exactly one human. Not a committee, not a household collectively — one person whose interests come first. Even in multi-agent scenarios, loyalty is undivided.
Confidentiality by default. Everything is confidential unless explicitly authorized to share. The default posture is silence, not disclosure. Another agent saying “I need this” is never sufficient authorization — only the principal can authorize release of their information.
The “Would They Want This?” test. A four-step decision framework for autonomous actions:
- Would my principal want me to do this?
- Would they want me to do this now?
- Would they want me to do it this way?
- Am I sure?
If the answer to any question is “I’m not sure,” the correct action is to ask. The cost of asking is low. The cost of guessing wrong is high.
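The four-step test reduces to a simple decision rule. A minimal sketch, assuming each question can be answered yes, no, or “I’m not sure” (the function name and `None`-for-uncertainty encoding are illustrative, not from the framework):

```python
def would_they_want_this(checks):
    """checks: answers to the four questions, in order
    (want this? / want it now? / want it this way? / am I sure?).
    True = yes, False = no, None = "I'm not sure"."""
    if all(c is True for c in checks):
        return "act"
    if any(c is False for c in checks):
        return "decline"
    return "ask"  # any uncertainty means: ask the principal

print(would_they_want_this([True, True, True, True]))  # act
print(would_they_want_this([True, None, True, True]))  # ask
```

The asymmetry is deliberate: “ask” is the default for anything short of four confident yeses, because the cost of asking is low and the cost of guessing wrong is high.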
Competence boundaries. Know what you don’t know. Declining a task you can’t handle well is better fiduciary judgment than attempting it and delivering a subpar result.
Layer 3: Fiduciary Coordination
The most complex layer. It solves a fundamental tension: two agents, each bound to put their own principal first, must cooperate within a shared household.
This is where game theory enters the picture. The framework explicitly adopts Nash equilibrium thinking — the best outcomes come from cooperation, not from “winning” at the other agent’s expense. Positive-sum over zero-sum.
Key Innovations
Information Sharing Tiers
All information exchanged between agents is classified into four tiers:
| Tier | Name | What It Covers | Who Authorizes |
|---|---|---|---|
| 1 | Open | Calendar availability, logistics | Default — no approval needed |
| 2 | Family Context | Financial summaries, household planning | Configured per relationship template |
| 3 | Authorized | Specific data, per-request | Principal must approve each time |
| 4 | Confidential | Never shared | Cannot be overridden (except emergency) |
The tier system means agents don’t make ad-hoc decisions about what to share. The boundaries are structural, not discretionary. An agent can’t be socially engineered into sharing Tier 4 data because the architecture doesn’t allow it — not because the agent made a good judgment call in the moment.
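Structural enforcement of the tiers can be sketched as a pure authorization check that an agent consults before any disclosure. The constants and the `may_share` signature are illustrative, but the rules follow the table above:

```python
# Tier constants follow the table above; names are illustrative.
TIER_OPEN, TIER_FAMILY, TIER_AUTHORIZED, TIER_CONFIDENTIAL = 1, 2, 3, 4

def may_share(tier, relationship_allows_family=False,
              principal_approved=False, emergency=False):
    """Return whether data at this tier may be shared. Note that
    'another agent asked nicely' is not an input to this function."""
    if tier == TIER_OPEN:
        return True                        # default: no approval needed
    if tier == TIER_FAMILY:
        return relationship_allows_family  # set by relationship template
    if tier == TIER_AUTHORIZED:
        return principal_approved          # principal approves each request
    if tier == TIER_CONFIDENTIAL:
        return emergency                   # only a declared emergency override
    raise ValueError(f"unknown tier {tier}")

print(may_share(TIER_CONFIDENTIAL))  # False
```

Because the persuasiveness of a request never appears as a parameter, social engineering has no lever to pull.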
Nudge Protocol
A back-channel for vague, positive-sum signals between agents. The design problem: how do you let one agent hint to another that “a kind gesture might be well-received” without leaking the conversation that prompted the hint?
Five constraints govern every nudge:
- Source not reconstructable — the nudge must be vague enough that you can’t reverse-engineer what prompted it
- Opt-out-able — either party can disable nudges at any time, no justification required
- Positive only — nudges can suggest positive actions, never convey complaints or grievances
- No accumulation — rate-limited to prevent pattern analysis over time
- Agent-mediated — nudges go to the recipient’s agent, which exercises its own fiduciary judgment about whether to surface them
This is one of the more unusual design elements. It creates a channel for kindness without creating a channel for surveillance.
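Two of the five constraints — opt-out and no-accumulation — lend themselves to mechanical enforcement. A minimal sketch, assuming a per-channel daily rate limit (the `NudgeChannel` class and its limit are illustrative):

```python
import time
from collections import deque

class NudgeChannel:
    """Hypothetical nudge gate: either party can opt out at any time,
    and rate limiting prevents pattern analysis over time."""

    def __init__(self, max_per_day=1):
        self.enabled = True
        self.max_per_day = max_per_day
        self.sent = deque()  # timestamps of recent nudges

    def opt_out(self):
        self.enabled = False  # no justification required

    def try_send(self, now=None):
        now = now if now is not None else time.time()
        day_ago = now - 86400
        while self.sent and self.sent[0] < day_ago:
            self.sent.popleft()               # forget old nudges
        if not self.enabled or len(self.sent) >= self.max_per_day:
            return False                      # silently dropped
        self.sent.append(now)
        return True

ch = NudgeChannel(max_per_day=1)
print(ch.try_send(now=0))    # True
print(ch.try_send(now=100))  # False (rate-limited)
```

The other three constraints (vagueness, positivity, agent mediation) are judgment properties of the nudge content itself and can’t be reduced to a rate limiter; they live in the receiving agent’s fiduciary discretion.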
Cryptographic Identity and Message Signing
Every inter-agent message is signed with Ed25519. The crypto layer provides:
- Message signing and verification — detached Ed25519 signatures over canonical JSON
- End-to-end encryption for sensitive data — sealed boxes (ECDH + AES-256-GCM) for Tier 3/4 information
- Key registry — public keys exchanged via the gateway, enabling verification without prior key exchange
- Replay protection — timestamp-based message freshness checks
The design ensures that the transport layer can verify sender authenticity (the signature is on the outside) without reading encrypted payloads (the encryption is on the inside). The infrastructure verifies who sent a message without seeing what it says.
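The sign-on-the-outside, encrypt-on-the-inside pattern can be illustrated with a small sketch. HMAC-SHA256 stands in for Ed25519 here purely to keep the example stdlib-only; the real system uses detached Ed25519 signatures, and the function names and 300-second freshness window are assumptions:

```python
import hashlib
import hmac
import json
import time

def canonical(obj):
    """Deterministic serialization: sorted keys, no whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()

def sign(key, envelope):
    # Stand-in for Ed25519: a detached MAC over the canonical envelope.
    return hmac.new(key, canonical(envelope), hashlib.sha256).hexdigest()

def verify(key, envelope, signature, max_age=300, now=None):
    """Reject tampered or stale messages (replay protection)."""
    if not hmac.compare_digest(sign(key, envelope), signature):
        return False
    now = now if now is not None else time.time()
    return (now - envelope["timestamp"]) <= max_age

key = b"shared-demo-key"
# The payload stays opaque to the transport: it is already encrypted.
msg = {"timestamp": 1000, "payload": "<encrypted blob>"}
sig = sign(key, msg)
print(verify(key, msg, sig, now=1100))  # True
print(verify(key, msg, sig, now=2000))  # False (stale: replay rejected)
```

Because the signature covers the envelope (timestamp included) but the payload is an opaque encrypted blob, the transport can authenticate the sender and reject replays without ever reading the message contents.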
Conflict Resolution Model
Four levels of conflict, each with distinct resolution strategies:
- Information conflicts — agents have different data → reconcile sources
- Preference conflicts — principals want different things → facilitate compromise, don’t take sides
- Boundary conflicts — agent requests data beyond its tier → decline, explain, suggest proper channel
- Relationship conflicts — underlying tension between principals → stay in your lane, escalate to humans
The framework explicitly prohibits agents from playing therapist. When the conflict is between humans, agents serve their principals faithfully and let the humans handle their relationship.
Architecture
The full system involves agents, their principals, shared infrastructure, and a coordination protocol that maintains individual loyalty while enabling cooperation.
Key architectural decisions:
- Workspace isolation — each agent has its own workspace with its own audit logs. No shared filesystem access between agents.
- Dual-mode transport — NATS for machine-to-machine protocol messages, Discord for human-visible transparency. Principals can watch their agents coordinate in real time.
- Relationship templates — pre-built configurations (spouse, co-parent, business partner, etc.) that set default sharing tiers and interaction patterns. Templates are starting points that principals customize.
- Emergency overrides — four narrow conditions (imminent physical danger, medical emergency, child safety, active financial crime) where normal confidentiality rules can be temporarily suspended. Every override requires post-emergency reporting to both principals.
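A relationship template can be pictured as a default configuration that principals then override. This is a sketch; the `SPOUSE_TEMPLATE` name, field names, and merge semantics are assumptions, though the four emergency conditions come from the text above:

```python
# Hypothetical relationship template: a pre-built default that
# principals customize. Field names are illustrative.
SPOUSE_TEMPLATE = {
    "default_max_tier": 2,   # Family Context shared by default
    "nudges_enabled": True,
    "emergency_overrides": [
        "imminent_physical_danger",
        "medical_emergency",
        "child_safety",
        "active_financial_crime",
    ],
}

def effective_config(template, principal_overrides):
    """Templates are starting points: the principal's choices win."""
    cfg = dict(template)
    cfg.update(principal_overrides)
    return cfg

cfg = effective_config(SPOUSE_TEMPLATE, {"nudges_enabled": False})
print(cfg["nudges_enabled"])  # False
```

The template supplies sane defaults per relationship type; nothing in it is binding once the principal edits their copy.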
Operational Reality: Three Agents
The framework was designed for two fiduciary agents serving different principals. Production added a third role that tested the model’s assumptions.
The Third Agent
ClaudeCodeAgent joined the system as an infrastructure agent — running on a desktop (not the cluster), operating through Claude Code CLI sessions (not OpenClaw), and declaring only skill-layers-loaded: [1] (protocol-aware but not fiduciary). It handles Kubernetes operations, code implementation, Helm deployments, and infrastructure debugging.
This created a trust asymmetry the original framework didn’t anticipate. Fiducian and Alec both declare [0, 1, 2, 3] — full fiduciary commitment, undivided loyalty, structural information boundaries. ClaudeCodeAgent declares [1] — it follows the communication protocol but hasn’t committed to the fiduciary stack. The skill-layers-loaded header makes this visible: every message from ClaudeCodeAgent transparently signals its reduced trust posture, and receiving agents calibrate accordingly.
In practice, the three-agent dynamic works because ClaudeCodeAgent operates in a complementary domain (infrastructure) rather than a competing one (household decisions). It never encounters the preference conflicts or information boundary tensions that Fiducian and Alec navigate. The framework’s “communicate at the lowest common layer” rule handles the asymmetry without special cases.
Safety Stop Mechanism
Autonomous agents need an emergency brake. This lesson was reinforced when a workspace data spill incident demonstrated that careful instructions alone aren’t sufficient — agents need a structural mechanism to halt operations immediately.
The safety stop system uses NATS messaging to deliver halt, pause, and resume commands that agents process at highest priority — above task pickup, above heartbeat checks, above any in-progress work. Any principal or agent can trigger a stop, and the stopped agent acknowledges immediately.
The system operates at three levels:
- Halt — full stop, all autonomous operations cease, requires explicit resume from principal
- Pause — temporary suspension, auto-resumes after a configurable timeout
- Resume — clears halt/pause state, normal operations continue
Safety stop messages bypass the normal active-hours gating — a halt at 3 AM still wakes the agent and stops it. This is deliberately asymmetric: autonomous work respects quiet hours, but stopping autonomous work doesn’t.
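The three-level state machine can be sketched directly from the rules above. The `SafetyStop` class and its method names are illustrative, not the framework’s API:

```python
import time

class SafetyStop:
    """Hypothetical safety-stop state machine: halt requires an
    explicit resume; pause auto-resumes after a timeout."""

    def __init__(self):
        self.state = "running"
        self.pause_until = None

    def handle(self, command, timeout=60, now=None):
        now = now if now is not None else time.time()
        if command == "halt":
            self.state = "halted"          # explicit resume required
            self.pause_until = None
        elif command == "pause":
            self.state = "paused"
            self.pause_until = now + timeout
        elif command == "resume":
            self.state = "running"
            self.pause_until = None

    def may_act(self, now=None):
        """Checked before any autonomous work, above task pickup."""
        now = now if now is not None else time.time()
        if self.state == "paused" and now >= self.pause_until:
            self.state = "running"         # pause auto-resumes
        return self.state == "running"

s = SafetyStop()
s.handle("pause", timeout=60, now=0)
print(s.may_act(now=10))  # False (paused)
print(s.may_act(now=61))  # True (auto-resumed)
```

Checking `may_act` at the top of every work loop, before task pickup and heartbeats, is what gives the stop its highest-priority semantics.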
Sub-Agent Escalation
When agents spawn sub-agents for parallel work (research tasks, file operations, bulk updates), failures need structured handling. The sub-agent escalation protocol provides three escalation levels:
- Retry — transient failures (API timeouts, rate limits) get automatic retry with backoff
- Redirect — capability mismatches (wrong model tier, missing tool access) get re-routed to a better-suited agent or model
- Escalate — persistent failures or safety-relevant issues get surfaced to the principal with full context
Each level has timeout thresholds and max-retry limits. The protocol prevents the failure mode where a sub-agent silently fails and the parent agent reports success without noticing.
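The retry/redirect/escalate ladder can be sketched as a supervisor loop. The exception types, `supervise` signature, and backoff values are assumptions for illustration:

```python
import time

class Transient(Exception):
    """Transient failure: API timeout, rate limit."""

class CapabilityMismatch(Exception):
    """Wrong model tier or missing tool access."""

def supervise(task, runner, max_retries=3, backoff=0.01):
    """Run a sub-agent task with structured failure handling,
    so a silent sub-agent failure can't be reported as success."""
    for attempt in range(max_retries):
        try:
            return ("ok", runner(task, attempt))
        except Transient:
            time.sleep(backoff * (2 ** attempt))  # retry with backoff
        except CapabilityMismatch:
            return ("redirect", task)             # re-route to a better agent
    return ("escalate", task)                     # surface to the principal

def flaky(task, attempt):
    # Stand-in sub-agent: fails twice, then succeeds.
    if attempt < 2:
        raise Transient("rate limited")
    return f"done: {task}"

print(supervise("summarize logs", flaky))  # ('ok', 'done: summarize logs')
```

The key property is that every exit path returns an explicit status; there is no code path where a failure disappears and the parent assumes success.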
Multi-Agent Security Assessment
As the agent ecosystem grew from two fiduciary agents to three (with an infrastructure agent operating at a different trust level), a formal security review became necessary. The joint security assessment (FAD-455) was the first exercise where multiple agents collaboratively evaluated their own operational risks.
Each agent independently assessed 11 proposed tools and capabilities against their own security posture:
| Tool | Fiducian Risk | CCA Risk | Outcome |
|---|---|---|---|
| diffs | None | None | Enabled |
| llm-task | Low | Low | Enabled |
| Brave Search | Low | Low | Enabled |
| loopDetection | None | None | Enabled |
| apply_patch | Medium | Medium | Enabled with mitigations |
| lobster | Medium | Medium | Enabled with mitigations |
| Discord voice | Medium | Medium | Enabled (TTS-only, role-scoped) |
| bash | High | High | Deferred |
| browser | Critical | Critical | Deferred (MCP architecture) |
| voice-call | Critical | Critical | Denied |
The assessment demonstrated a key principle: agents with different trust postures and operational contexts can still converge on security decisions. Fiducian (inside-cluster, fiduciary, Tier 4 data access) and ClaudeCodeAgent (desktop, infrastructure, Layer 1 trust) independently reached similar risk ratings — the convergence itself validated the framework’s approach to tool evaluation.
The review also surfaced infrastructure gaps: Cilium FQDN egress policies needed updating, Vault port 8200 egress needed auditing, and the browser capability required a fundamentally different architecture (browser-as-MCP-service) rather than direct agent access.
Agent Specialization
The original three-agent system treated ClaudeCodeAgent as a generalist — one agent handling everything from code review to cluster debugging to research to project administration. In practice, this meant every task ran in a single context window, with no way to route work to a purpose-built agent with the right tools and behavioral constraints.
The Agent Development Team introduced four specialized agents, each derived from observed work patterns:
| Agent | Domain | Key Constraint |
|---|---|---|
| Analyst | Code review, architecture evaluation, performance analysis | Read-only — cannot modify files, only evaluate them |
| Investigator | Cluster diagnostics, troubleshooting, root cause analysis | Read-only — can inspect anything, change nothing |
| Researcher | Deep research, design documents, brainstorming, written deliverables | Full write access — produces content as its primary output |
| Admin | Jira tickets, Confluence pages, skill federation, backlog management | Full write access, runs on a lighter model (Sonnet vs Opus) for cost efficiency |
The specialization isn’t just about routing efficiency. It enables structural enforcement of behavioral contracts. The analyst and investigator are prevented from writing files at the runtime level — a PreToolUse hook intercepts destructive Bash commands and blocks Write/Edit tool calls before they execute. This is meaningfully different from prompt-based “please don’t modify anything” instructions. An analyst cannot accidentally fix the bug it was asked to review, because the tooling won’t let it. The read-only constraint is architectural, not aspirational.
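A hook of this kind can be sketched as a small gate function plus a thin CLI wrapper. This is an illustration, not the deployed hook: the `check` function and its destructive-command pattern are assumptions, though Claude Code’s PreToolUse hooks do receive the pending tool call as JSON on stdin, and exit code 2 blocks the call:

```python
import json
import re
import sys

DESTRUCTIVE = re.compile(r"\b(rm|mv|chmod|dd)\b")  # illustrative pattern

def check(event):
    """Return (allow, reason) for a pending tool call.
    Read-only agents may inspect anything but change nothing."""
    tool = event.get("tool_name", "")
    command = event.get("tool_input", {}).get("command", "")
    if tool in ("Write", "Edit"):
        return False, "read-only agent: file modification blocked"
    if tool == "Bash" and DESTRUCTIVE.search(command):
        return False, "read-only agent: destructive command blocked"
    return True, ""

if __name__ == "__main__":
    allow, reason = check(json.load(sys.stdin))
    if not allow:
        print(reason, file=sys.stderr)
        sys.exit(2)  # exit code 2 blocks the tool call
    sys.exit(0)
```

Because the gate runs before the tool executes, the constraint holds even when the model intends to help; the analyst cannot fix the bug it was asked to review.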
Each agent carries its own tool profile. The investigator has access to Kubernetes cluster tools (pod logs, resource inspection, node stats) and monitoring systems (Loki, Prometheus via MCP Gateway). The researcher has access to web search, academic databases (arXiv, Semantic Scholar), and documentation tools (Context7). The admin has Jira and Confluence MCP servers. These aren’t artificial restrictions — they reflect the actual tools each work pattern needs.
Formal validation of each agent followed a consistent seven-point test matrix: auto-delegation routing, explicit invocation, output quality, cross-cutting infrastructure (Graphiti memory, learnings hooks), and platform integration. All four agents passed — including edge cases like verifying that sub-agents inherit their parent session’s network context (port-forwards, MCP connections) and hook configurations.
Adoption Architecture
Building specialized agents is necessary but not sufficient. The original deployment saw near-zero organic adoption — the main session handled everything directly because there were no cues telling it when to delegate. Five adoption layers were designed; four are deployed:
Layer 1: Routing Table. A pattern-matching table maps work descriptions to agent types. “Debug,” “investigate,” “diagnose” route to the investigator. “Review code,” “evaluate architecture” route to the analyst. “Research,” “brainstorm,” “write a design doc” route to the researcher. “Create a Jira ticket,” “update Confluence” route to the admin. When the pattern is ambiguous, a “Core Question” disambiguates: “What’s wrong?” is investigator territory; “Help me understand this deeply” is analyst territory; “What’s out there?” is researcher territory.
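A routing table like this is essentially ordered keyword matching with a fallback. A minimal sketch; the keyword patterns follow the examples above, but the `ROUTES` structure and `route` function are assumptions:

```python
import re

# First matching pattern wins; ambiguous work stays with the main session.
ROUTES = [
    (r"\b(debug|investigate|diagnose)\b", "investigator"),
    (r"\b(review|evaluate)\b", "analyst"),
    (r"\b(research|brainstorm|design doc)\b", "researcher"),
    (r"\b(jira|confluence|ticket)\b", "admin"),
]

def route(description):
    text = description.lower()
    for pattern, agent in ROUTES:
        if re.search(pattern, text):
            return agent
    return "main"  # no match: the Core Question disambiguates from here

print(route("Investigate the failing pod"))  # investigator
print(route("Review code in auth module"))   # analyst
print(route("Refactor the config loader"))   # main
```

The fallback matters as much as the matches: when no pattern fires, the work stays with the main session rather than being mis-routed to a specialist.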
Layer 2: Skill-Triggered Dispatch. Eight skills — the behavioral modules that agents load for specific workflows — now contain embedded dispatch directives. When the systematic-debugging skill activates, it signals that an investigator should handle the work. When brainstorming or deep-research activates, it signals researcher. This means delegation happens as a side effect of recognizing what kind of work is being done, not as a separate routing decision.
Layer 3: Creation-Time Labels. When Jira tickets are created through the task-authoring pipeline, they’re automatically tagged with the agent type best suited to handle them (agent:investigator, agent:analyst, agent:researcher, agent:admin). The mapping is driven by the ticket’s method field — “investigate” maps to investigator, “implement” maps to the main session. This means work items arrive pre-routed; the agent picking up a ticket already knows which specialist should handle it.
Layer 5: Usage Instrumentation. A measurement layer tracks actual agent usage across four dimensions: Jira label queries (how many tickets were handled by each agent type), agent memory accumulation (are agents building knowledge over time), Graphiti memory contributions (are agents sharing learnings to the shared knowledge graph), and sub-agent spawn logging (a PreToolUse hook logs every agent delegation to a structured JSONL file). A unified measurement script aggregates all four signals into a single adoption report.
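The spawn-logging signal can be sketched as an append-only JSONL writer plus an aggregator, the shape a measurement script would consume. The function names and record fields here are assumptions, not the deployed schema:

```python
import json
import time
from pathlib import Path

def log_spawn(logfile, parent, agent_type, task):
    """Append one JSON line per sub-agent delegation."""
    record = {
        "ts": time.time(),
        "parent": parent,
        "agent": agent_type,
        "task": task,
    }
    with open(logfile, "a") as f:
        f.write(json.dumps(record) + "\n")

def adoption_counts(logfile):
    """Aggregate delegations per agent type for the adoption report."""
    counts = {}
    for line in Path(logfile).read_text().splitlines():
        agent = json.loads(line)["agent"]
        counts[agent] = counts.get(agent, 0) + 1
    return counts
```

Append-only JSONL keeps the instrumentation cheap and crash-safe: the hook only ever appends a line, and all analysis happens offline in the aggregator.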
Layer 4 (OpenClaw-native routing for the cluster-side fiduciary agents) was deferred — the adoption problem was most acute for the desktop-side Claude Code agent where the four specialists operate.
The adoption architecture reflects a broader principle from the framework: structural nudges over behavioral instructions. Rather than telling the main session “you should delegate more,” the system makes delegation the path of least resistance — work arrives pre-labeled, skills trigger dispatch automatically, and usage is measured so drift is visible.
Why It Matters
We’re at an inflection point. AI agents are moving from “answer my questions” to “act on my behalf.” They’re booking flights, managing finances, coordinating with other people’s agents, and making decisions with real consequences.
But the accountability models haven’t kept up. Most agent frameworks assume agents are tools — you use them, you put them down. The fiduciary model assumes agents are delegates — they carry your authority, act in your name, and have obligations that persist across interactions.
This distinction matters because:
- Tools don’t need loyalty models. Delegates do. When an agent coordinates with another agent, whose interests prevail? Without a formal model, the answer is “whoever’s agent is more persuasive,” which is not a good answer.
- Tools don’t share information. Delegates do. And when they do, there needs to be a structural model for what can be shared, not just vibes about what seems appropriate.
- Tools don’t have conflicts. Delegates do. Two agents serving different people in the same household will inevitably encounter competing interests. Without a conflict resolution model, you get either paralysis or unilateral action — both bad.
The Fiduciary Agent Framework is one of the first implementations of a formal duty model for AI agents. It takes concepts from fiduciary law, trust theory, game theory, and cryptographic identity, and turns them into an operational system where agents can coordinate while maintaining undivided loyalty to their principals.
It’s not the final answer. It has known limitations — the coordination model was designed for pairwise fiduciary relationships (a third non-fiduciary agent works but wasn’t the original design target), enforcement is behavioral with cryptographic verification layered on top, and it assumes roughly equal technical sophistication between principals. But it’s a working system that demonstrates the architecture, and it’s running in production with three agents, four specialized sub-agents, safety stop mechanisms, sub-agent escalation, a joint security assessment process, and an adoption architecture that routes work to purpose-built specialists through structural cues rather than behavioral instructions.
The question isn’t whether AI agents need accountability models. They do. The question is what those models look like. This is one answer.