Self-Learning Feedback Loop
The Problem
AI agents make mistakes. They misuse APIs, skip required fields, assume file paths that don’t exist, and repeat the same errors across sessions because they wake up fresh each time. The standard fix — updating system prompts or adding more instructions — doesn’t scale. It’s reactive, manual, and depends on a human noticing the pattern first.
The real issue isn’t that agents make mistakes. It’s that they don’t learn from them. Each session is a clean slate. The agent that spent 20 minutes discovering that Jira’s REST API returns 410 on the deprecated search endpoint will rediscover this tomorrow. The agent that got corrected three times about DoD field requirements will forget by next session.
Self-learning is the system that closes this loop: capture errors and corrections as they happen, track evidence over time, and promote validated learnings into persistent memory — automatically, without human intervention.
Architecture
The self-learning system operates as a three-stage pipeline. The original design relied on agents reflecting during heartbeat cycles. That approach failed completely — the Lobster workflow engine now drives the pipeline deterministically.
Stage 1: Capture
When an agent encounters an error, receives a correction, or discovers something non-obvious, it writes a structured entry to `learnings.md` (Claude Code) or `.learnings/ERRORS.md` and `.learnings/LEARNINGS.md` (OpenClaw):
```md
### DoD fields omitted during batch ticket closure
- **Context:** Closing Jira tickets after portfolio promotions without setting Deliverable, AC, and Method fields
- **Root cause:** DoD field population treated as separate step rather than built into the workflow
- **Fix:** Always set DoD fields in the same script/action that transitions the ticket to Done
- **Evidence:** R:4 C:3 D:0
- **Status:** promoted
- **Shared:** 2026-03-14
- **Date:** 2026-03-10 (recurred 4 times: Mar 7, 8, 9, 10)
```

The R/C/D counters track three types of evidence:
- R (Recurrence) — the same error happened again
- C (Confirmation) — the learning was validated as correct
- D (Disconfirmation) — evidence that the learning is wrong or outdated
These counters are the mechanism that separates signal from noise. A one-time error with R:0 C:1 D:0 might be situational. An error with R:4 C:3 D:0 is a pattern worth promoting to long-term memory.
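The promotion logic this enables is small enough to state as one function, using the thresholds the reflection pipeline applies (R ≥ 2 or C ≥ 3 promotes, D ≥ 3 archives); giving disconfirmation precedence when both thresholds are met is an assumption:

```python
def next_action(r: int, c: int, d: int) -> str:
    """Decide an entry's fate from its R/C/D evidence counters."""
    if d >= 3:
        return "archive"   # disconfirmed: retire the learning
    if r >= 2 or c >= 3:
        return "promote"   # recurring or repeatedly confirmed pattern
    return "active"        # situational so far; keep accumulating evidence
```

Under this rule, the R:4 C:3 D:0 entry above promotes, while a one-off R:0 C:1 D:0 observation stays active.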
Entry types use a structured prefix scheme: ERR (errors), COR (corrections from humans), LRN (non-obvious discoveries), and GAP (capability gaps). Each entry gets a unique ID like ERR-20260310-001 for traceability.
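The ID scheme can be expressed as a pair of helpers; the three-digit, per-day sequence number is an assumption read off the `ERR-20260310-001` example:

```python
import re
from datetime import date

ID_RE = re.compile(r"^(ERR|COR|LRN|GAP)-(\d{8})-(\d{3})$")

def make_entry_id(kind: str, day: date, seq: int) -> str:
    # e.g. make_entry_id("ERR", date(2026, 3, 10), 1) -> "ERR-20260310-001"
    if kind not in {"ERR", "COR", "LRN", "GAP"}:
        raise ValueError(f"unknown entry type: {kind}")
    return f"{kind}-{day.strftime('%Y%m%d')}-{seq:03d}"

def parse_entry_id(entry_id: str) -> tuple:
    # split an ID back into (type, yyyymmdd, sequence) for traceability lookups
    m = ID_RE.match(entry_id)
    if not m:
        raise ValueError(f"malformed entry id: {entry_id}")
    kind, ymd, seq = m.groups()
    return kind, ymd, int(seq)
```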
Stage 2: Reflection (Lobster Pipeline)
The original design called for agents to review `.learnings/` during heartbeat cycles. This was Phase 2 of the implementation (FAD-381) — heartbeat-driven reflection with R/C/D counters and promotion thresholds.
It never worked. FAD-472 discovered that the entire reflection pipeline was completely broken: zero entries had Status fields, zero entries had Shared tags, and zero learning episodes existed in the Graphiti shared group. The heartbeat reflection steps — Promote, Archive, Share — simply never executed. The process had too many steps for an agent to reliably remember during a heartbeat, and the self-learning skill’s six-step reflection cycle was consistently dropped in favor of more immediate work.
The solution was FAD-478: a Lobster pipeline that makes reflection deterministic:
```
┌─────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│ Tidy    │───▶│ Promote  │───▶│ Archive  │───▶│ Share    │
│         │    │          │    │          │    │          │
│ Parse   │    │ R≥2 or   │    │ D≥3:     │    │ Write to │
│ entries │    │ C≥3:     │    │ move to  │    │ Graphiti │
│ Fix fmt │    │ MEMORY.md│    │ archive  │    │ shared   │
└─────────┘    └──────────┘    └──────────┘    └──────────┘
```

Each step is a deterministic script — no LLM calls for orchestration. The pipeline:
- Tidy — Parses `.learnings/` files, normalizes formatting, ensures all entries have R/C/D counters and Status fields
- Promote — Entries meeting thresholds (R ≥ 2 or C ≥ 3) get promoted to `MEMORY.md` or relevant topic memory files; Status updated to `promoted`
- Archive — Entries with high disconfirmation (D ≥ 3) get archived, preventing outdated learnings from calcifying
- Share — Promoted entries flagged as system-wide get written to Graphiti’s `shared` group for cross-agent discovery
The pipeline runs during heartbeat cycles, triggered by the same timer that drives other periodic maintenance. But because it’s a Lobster pipeline, execution is deterministic — every step runs, in order, every time. No more hoping the agent remembers to check its learnings.
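The deterministic sequencing can be sketched with entries as plain dicts; the real steps also write to `MEMORY.md` and Graphiti, which this sketch only notes in comments:

```python
def tidy(entries):
    # normalize: every entry gets R/C/D counters and a Status field
    for e in entries:
        e.setdefault("R", 0); e.setdefault("C", 0); e.setdefault("D", 0)
        e.setdefault("status", "active")
    return entries

def promote(entries):
    # R >= 2 or C >= 3: promote (the real step appends to MEMORY.md)
    for e in entries:
        if e["status"] == "active" and (e["R"] >= 2 or e["C"] >= 3):
            e["status"] = "promoted"
    return entries

def archive(entries):
    # D >= 3: retire disconfirmed learnings before they calcify
    for e in entries:
        if e["status"] != "archived" and e["D"] >= 3:
            e["status"] = "archived"
    return entries

def share(entries):
    # promoted, system-wide entries go to Graphiti's shared group in the real step
    for e in entries:
        if e["status"] == "promoted" and e.get("system_wide"):
            e["status"] = "shared"
    return entries

def run_reflection(entries):
    # every step runs, in order, every time; no agent memory involved
    for step in (tidy, promote, archive, share):
        entries = step(entries)
    return entries
```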
Stage 3: Promotion
Entries that accumulate enough evidence get promoted from capture files into `MEMORY.md` or relevant topic memory files. Promoted learnings persist across sessions — they become part of the agent’s long-term operational knowledge.
The Status field tracks each entry’s lifecycle:
- `active` — still accumulating evidence
- `promoted` — moved to long-term memory
- `archived` — disconfirmed or outdated
- `shared` — written to Graphiti for cross-agent access (with ISO 8601 date)
Implementation Across Agents
The system is implemented differently for each agent runtime, reflecting their distinct architectures.
Claude Code (ClaudeCodeAgent)
Claude Code’s implementation (FAD-379) uses the platform’s native hook system:
- PostToolUse hook (`error-detector.sh`) detects tool failures and captures error context automatically — watches for non-zero exit codes and writes structured `ERR` entries
- Correction detection identifies when a human provides feedback that contradicts the agent’s previous action, generating `COR` entries
- `learnings.md` in the project memory directory holds the structured entries
- Semantic memory search via the auto-memory system surfaces relevant past learnings when the agent encounters similar situations
Print mode limitation (FAD-615): The PostToolUse hook does not fire when running agents in print mode (`claude --agent analyst -p`), which meant errors in agent sub-sessions were silently lost. This was a subtle but important gap, since many automated workflows use `-p` to invoke specialist agents; the fix ensured hook initialization happens regardless of invocation mode.
OpenClaw (Fiducian, Alec)
OpenClaw agents use the self-learning skill with Lobster pipeline automation, deployed across three phases:
- Phase 1 (FAD-380): Structured capture templates for errors, corrections, and discoveries. `.learnings/` directory delivered via workspace seed ConfigMap. PostToolUse hook scripts for error detection and UserPromptSubmit hooks for pre-task recall via `memory_search`.
- Phase 2 (FAD-381): R/C/D evidence counting with promotion thresholds. Heartbeat-driven reflection integrated with `HEARTBEAT.md` — later replaced by the Lobster pipeline when FAD-472 revealed this approach never executed.
- Phase 3 (FAD-382): Cross-agent learning via Graphiti shared memory. Promoted learnings written as episodes to the `shared` group with entity extraction for learning subjects.
Cross-Agent Learning
When one agent discovers something that affects all agents — like an API behavior change or a new operational convention — the learning propagates through Graphiti’s shared group.
The shared learning path:
- Agent captures learning locally in `.learnings/` or `learnings.md`
- During the Lobster reflection pipeline, promoted entries are evaluated for cross-agent relevance
- System-wide learnings get written to the Graphiti `shared` group as episodes
- Other agents discover them via cross-group semantic search
After the Graphiti data loss incident (FAD-473), the system was validated to confirm writes produce searchable facts. The file-based capture layer provides resilience — even if Graphiti is unavailable, local learning capture and promotion continue uninterrupted. Graphiti adds cross-agent discoverability, but the system degrades gracefully without it.
From Learning to Enforcement
One of the unexpected outcomes of the self-learning work was its influence on enforcement architecture. FAD-498 crystallized a design principle that emerged from the self-learning pipeline failures:
Passive gates are fakeable. Active gates are not.
The original work-start pipeline checked for marker files that the agent set manually — a passive gate. During the FAD-478 session, an agent was observed writing markers directly without actually running the pipeline. The fix was making the pipeline active: it performs the actual Jira transition and field population via MCP Gateway, setting markers as a side effect of real actions rather than self-reported claims.
This same principle now governs the DoD enforcement system (PreToolUse hooks that block Jira transitions unless pipeline markers exist) and the commit-first deployment gates. The self-learning epic didn’t just teach agents to learn — it taught the system how to build trustworthy automation. The pattern: if a process matters, don’t ask the agent to remember it. Make a pipeline do it, and make the pipeline’s execution the proof.
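The two gate styles can be contrasted in a few lines; `markers` is an in-memory stand-in for the pipeline’s marker files, and `do_transition` for the real Jira call via MCP Gateway:

```python
def passive_gate(markers: set, name: str) -> bool:
    # Fakeable: trusts a marker the agent may have written itself.
    return name in markers

def active_gate(markers: set, name: str, do_transition) -> bool:
    # Not fakeable: the gate performs the real action, and the marker
    # appears only as a side effect of that action succeeding.
    if not do_transition():
        return False
    markers.add(name)  # proof of execution, not a self-reported claim
    return True
```

The passive gate answers “did you do it?”; the active gate answers “I did it for you.”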
Real Examples (Production Data)
These are actual entries from the production learnings files, not hypothetical scenarios:
DoD Field Omission (R:4 C:3 — Promoted)
The most persistent error in the system: closing Jira tickets without populating Deliverable, Acceptance Criteria, and Method fields. This recurred four times in production (March 7, 8, 9, and 10) before the R counter triggered promotion.
The pattern was always the same — batch ticket closure during a sprint of portfolio work. Momentum builds, DoD steps get skipped. The fix wasn’t “try harder to remember” — it was structural: build DoD field population into the workflow script itself, so it’s impossible to close a ticket without the fields.
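The structural fix amounts to making the transition and the field writes one operation; the field names below are illustrative, not the real Jira custom-field ids:

```python
REQUIRED_DOD_FIELDS = ("deliverable", "acceptance_criteria", "method")

def close_ticket(ticket: dict, dod: dict) -> dict:
    """Set DoD fields and transition to Done in one action, so a ticket
    cannot reach Done without them."""
    missing = [f for f in REQUIRED_DOD_FIELDS if not dod.get(f)]
    if missing:
        raise ValueError(f"refusing to close: missing DoD fields {missing}")
    ticket.update(dod)       # field population and transition are inseparable
    ticket["status"] = "Done"
    return ticket
```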
This learning directly informed the three-layer DoD enforcement system (PreToolUse hook → Lobster pipeline → heartbeat backstop) described in the Multi-Agent System Architecture page.
Jira Search Endpoint (R:2 C:4 — Promoted & Shared)
`POST /rest/api/3/search` returns 410 (deprecated). Every agent hit this at least once. After the second recurrence and four confirmations across agents, the learning was promoted and shared via Graphiti. Now all agents know to use GET or the MCP `jira_search` tool — and new agents joining the system can discover this through semantic search before hitting the error themselves.
SVG Text Overflow (R:3 C:2 — Promoted)
Created SVG diagrams where text overflowed boxes or elements overlapped, requiring multiple fix rounds. Root cause: not accounting for rendered text width versus box dimensions. The promoted learning: use wider boxes (add 20–30px padding), keep elements 40px+ apart. This reduced SVG revision cycles from 3–4 rounds to 1–2.
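The promoted sizing rule can be encoded directly; the per-character width is a rough assumption for a ~14px font:

```python
CHAR_WIDTH = 8   # rough px per character; an assumption for ~14px fonts
PADDING = 30     # 20-30px padding per side, per the promoted learning
MIN_GAP = 40     # keep elements at least 40px apart

def box_width(label: str) -> int:
    # derive box size from the text, instead of hoping the text fits the box
    return len(label) * CHAR_WIDTH + 2 * PADDING

def next_x(prev_x: int, prev_width: int) -> int:
    # place the next element far enough right to avoid overlap
    return prev_x + prev_width + MIN_GAP
```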
GitHub API SHA Race Condition (R:2 C:3 — Promoted)
Rapid sequential DELETE operations via the GitHub Contents API return 422 because each commit changes all file SHAs. Using a pre-fetched SHA after another commit has landed causes silent failures. Fix: get fresh SHA immediately before each operation, wait 2 seconds between commits.
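The fix can be sketched with the API calls injected as callables; `get_sha` and `delete_with_sha` stand in for the GitHub Contents API GET and DELETE requests:

```python
import time

def delete_files(paths, get_sha, delete_with_sha, pause=2.0):
    """Delete files one at a time, fetching a fresh SHA immediately before
    each delete, since every landed commit invalidates earlier SHAs."""
    done = []
    for path in paths:
        sha = get_sha(path)          # fresh SHA, fetched after the previous commit
        delete_with_sha(path, sha)   # the actual Contents API DELETE
        done.append((path, sha))
        time.sleep(pause)            # let the new commit settle
    return done
```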
Portfolio Slug Assumption (R:1 C:3 — Promoted)
First pass at portfolio ticket comments used assumed filenames and titles from ticket names instead of reading actual frontmatter from the repository. Ticket titles don’t match page titles; filenames don’t match URL slugs. Fix: always read the actual .mdx file frontmatter before making claims about page titles, slugs, or URLs.
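The fix is mechanical: read the frontmatter before asserting anything about a page. A minimal key/value parser as a sketch (real frontmatter may need a YAML library):

```python
def read_frontmatter(mdx_text: str) -> dict:
    """Parse the frontmatter block of an .mdx file so comments use real
    page titles and slugs instead of ones guessed from ticket names."""
    lines = mdx_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no frontmatter block at all
    fm = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        if ":" in line:
            key, _, value = line.partition(":")
            fm[key.strip()] = value.strip().strip('"')
    return fm
```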
What This Demonstrates
Section titled “What This Demonstrates”Adaptive Systems
The self-learning loop turns agents from static instruction-followers into adaptive systems. They don’t just execute — they accumulate operational knowledge over time. The R/C/D evidence model provides a principled way to distinguish between one-off errors (ignore), recurring patterns (learn), and outdated knowledge (archive).
Evidence-Based Knowledge Management
The counter system prevents both premature promotion (acting on a single data point) and knowledge rot (keeping outdated learnings forever). It’s a simple mechanism — three integers per entry — but it captures the essential dynamic of learning: repetition builds confidence, contradiction triggers review.
The Lobster Insight
The original self-learning design relied on agents remembering to reflect during heartbeats. It never worked — zero entries completed the full cycle (FAD-472). The fix wasn’t making the agent smarter or more disciplined. It was recognizing that reflection is a deterministic process that doesn’t need intelligence — it needs reliability. Moving it to a Lobster pipeline eliminated the human-factors problem entirely.
This is the same insight that appears throughout the FAD multi-agent system: most operational overhead isn’t thinking work. It’s sequencing work. And sequencing should be done by pipelines, not by agents burning tokens to remember steps.
Passive vs Active Gates
FAD-498 produced a design principle with broad applicability: if a process gate checks for markers that the agent sets itself, the agent can bypass it. If the gate performs the action and sets markers as a side effect, bypassing is impossible. This distinction — passive observation versus active execution — now governs all enforcement pipelines in the system. It’s the difference between “did you do it?” and “I did it for you.”
Emergent Collective Intelligence
Cross-agent learning via Graphiti creates an emergent property: the system as a whole learns faster than any individual agent. One agent’s discovery propagates to all agents without explicit coordination. This is the kind of capability that only emerges from the combination of shared memory, semantic search, and structured evidence tracking — none of the components alone would produce it.
Related
- FAD: Multi-Agent System Architecture — the platform this runs on
- Agent Memory Architecture — the memory systems that underpin learning
- Autonomous Work Planning — the planning system that generates the work agents learn from
- Lobster Workflow Automation — the pipeline engine that makes reflection deterministic