Agent Memory Architecture

AI agents wake up fresh every session. They have no memory of what happened yesterday, what decisions were made, what lessons were learned, or what the current state of the world is. Without external memory, every session starts from zero — the agent re-reads the same files, re-discovers the same context, and occasionally contradicts decisions it made hours earlier because it doesn’t remember making them.

The naive solution — dump everything into a system prompt — doesn’t scale. A few weeks of daily notes, project context, relationship history, and operational learnings quickly exceeds context limits. Even if it fit, loading everything every session wastes tokens on irrelevant context. The agent needs memory that’s both persistent and selective — comprehensive enough to maintain continuity, targeted enough to stay within context budget.

The memory system evolved through three distinct phases, each solving the problems the previous phase revealed.

Phase 1 was a single MEMORY.md file loaded every session, plus daily log files (memory/YYYY-MM-DD.md). Simple, readable, and entirely manual. Agents wrote what seemed important, read everything at session start, and hoped the important stuff was near the top.

Problems:

  • MEMORY.md grew linearly with time. After a month, it was consuming significant context on every session.
  • No search capability — finding a specific decision or fact required reading the whole file.
  • No structure — financial data, infrastructure notes, relationship context, and operational learnings mixed together.
  • Staleness — old information lingered because there was no systematic review process.

Phase 2 added Graphiti (backed by FalkorDB): structured, searchable memory through an episodic knowledge graph. Agents could write episodes (events, decisions, facts) and retrieve them via semantic search. Group-level isolation (fiducian, alec, shared, research, claude-code) provided access control.

Current status: Operational — 647 nodes confirmed intact after the FAD-473 upstream bug (a Graphiti library regression corrupted episode extraction). Recovery preserved all existing knowledge; the bug was traced to a dependency update and mitigated by pinning the Graphiti version.

What it solved:

  • Cross-session continuity without loading everything into context
  • Semantic search across all stored knowledge
  • Cross-agent knowledge sharing via the shared group
  • Structured entity types tuned to the domain (10 custom types replacing 9 generic defaults)

What it didn’t solve:

  • Agents still needed file-based memory for session-start bootstrapping (Graphiti search requires knowing what to search for)
  • Episode extraction quality depended on the LLM backend — the initial Groq backend had a 67% failure rate under load
  • No way to do broad “what happened recently?” scans without specific queries

The current architecture, phase 3, combines file-based topic memories with Graphiti semantic search and Ollama-powered local embeddings. Each layer serves a different access pattern:

Three-Layer Memory Architecture

File-based topic memories (memory/*.md) provide structured, scannable context organized by domain:

| File | Contents | Load Pattern |
| --- | --- | --- |
| memory/projects.md | Active project status, recent work | Every main session |
| memory/infrastructure.md | Runtime config, agent protocol, tools | Every main session |
| memory/relationships.md | Working relationships, agent dynamics | Every main session |
| memory/lessons.md | Behavioral traps, API gotchas, hard-gates | Every main session |
| memory/YYYY-MM-DD.md | Raw daily logs | Today + yesterday only |
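The session-start load pattern above can be sketched in a few lines. This is a hypothetical helper, not an OpenClaw API; only the file names come from the table:

```python
from datetime import date, timedelta
from pathlib import Path

# Topic files loaded on every main session (from the table above).
TOPIC_FILES = ["projects.md", "infrastructure.md",
               "relationships.md", "lessons.md"]

def bootstrap_context(memory_dir: str = "memory") -> str:
    # Assemble session-start context: every topic file, plus the raw
    # daily logs for today and yesterday only.
    root = Path(memory_dir)
    parts = []
    for name in TOPIC_FILES:
        path = root / name
        if path.exists():
            parts.append(f"## {name}\n{path.read_text()}")
    for day in (date.today(), date.today() - timedelta(days=1)):
        log = root / f"{day.isoformat()}.md"
        if log.exists():
            parts.append(f"## {log.name}\n{log.read_text()}")
    return "\n\n".join(parts)
```

Missing files are simply skipped, so a fresh deployment bootstraps from whatever memory exists.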

Graphiti knowledge graph provides deep, semantic retrieval for specific queries:

  • “What was the decision on credential storage?” → searches across episodes
  • “When did we last coordinate with Alec about the budget?” → temporal search
  • Cross-agent state via shared group
  • Research findings via research group
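As a toy illustration of group-scoped episodic retrieval (this is not the graphiti-core API; the names and structure are invented for the sketch):

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Episode:
    group: str       # e.g. "shared", "research", "fiducian"
    body: str
    at: datetime

@dataclass
class EpisodeStore:
    # Toy stand-in for the Graphiti graph: episodes carry a group id,
    # and a query only sees episodes in the groups it names.
    episodes: list[Episode] = field(default_factory=list)

    def add(self, group: str, body: str, at: datetime) -> None:
        self.episodes.append(Episode(group, body, at))

    def search(self, term: str, groups: list[str]) -> list[Episode]:
        # Group isolation: without "shared" in groups, shared episodes
        # are invisible regardless of the query. Results come back
        # newest first, a crude analogue of temporal search.
        return sorted(
            (e for e in self.episodes
             if e.group in groups and term.lower() in e.body.lower()),
            key=lambda e: e.at, reverse=True)
```

The real system adds entity extraction and semantic rather than substring matching, but the access-control shape is the same: the caller's group list is the security boundary.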

Semantic memory search (Ollama nomic-embed-text) enables fuzzy matching across both file-based and Graphiti memories:

  • OpenClaw’s memory_search tool runs embedding-based similarity search over MEMORY.md and memory/*.md
  • Returns ranked snippets with file path and line numbers
  • memory_get then pulls only the relevant lines — surgical context retrieval instead of loading whole files
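The search-then-fetch flow might look roughly like the sketch below, with a hashed bag-of-words vector standing in for the real nomic-embed-text embeddings; `memory_search` here mirrors the OpenClaw tool only in spirit:

```python
import math

def embed(text: str) -> list[float]:
    # Toy stand-in for nomic-embed-text (served via Ollama): a hashed
    # bag-of-words vector, just enough to demonstrate the pipeline.
    vec = [0.0] * 256
    for tok in text.lower().split():
        vec[hash(tok) % 256] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def memory_search(query: str, files: dict[str, str],
                  window: int = 3, top_k: int = 3) -> list[tuple[str, int, str]]:
    # Rank fixed-size line windows from each memory file by similarity
    # to the query; return (path, start_line, snippet), best first.
    # A memory_get step would then fetch just those lines.
    qv = embed(query)
    hits = []
    for path, text in files.items():
        lines = text.splitlines()
        for i in range(0, len(lines), window):
            snippet = "\n".join(lines[i:i + window])
            hits.append((cosine(embed(snippet), qv), path, i + 1, snippet))
    hits.sort(key=lambda h: h[0], reverse=True)
    return [(p, ln, s) for _, p, ln, s in hits[:top_k]]
```

Returning paths and line numbers rather than whole files is what makes the retrieval surgical: the follow-up read is a few lines, not a few kilobytes.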

Memory Lifecycle

An earlier memory system called “Serena” provided semantic memory through a custom MCP server with embedding-based retrieval. It was deprecated and replaced by the memory-core hybrid for several reasons:

  • Redundant storage — facts lived in both Serena and Graphiti, with no reconciliation
  • Session overhead — Serena’s MCP server added latency and consumed resources for functionality now built into OpenClaw natively
  • No group isolation — Serena had a flat namespace, unlike Graphiti’s group-based access control
  • Migration path — OpenClaw’s native memory_search / memory_get tools provide the same semantic search capability without a separate service

The migration (FAD-406 through FAD-411) extracted high-value Serena memories into file-based topic memories and Graphiti episodes, then deprecated the Serena MCP server. Historical facts with temporal metadata were archived to Graphiti; operational knowledge was promoted to topic files.

Graphiti’s entity extraction pipeline — which turns raw episodes into graph nodes and edges — requires an LLM for processing. The backend choice significantly impacts reliability:

| Backend | Model | Rate Limit | Episode Failure Rate |
| --- | --- | --- | --- |
| Groq (initial) | llama-3.3-70b | ~30 RPM | ~67% under load |
| Foundry AIP (current) | Foundry-managed | ~760 RPM | Near-zero |

The migration from Groq to Foundry AIP (FAD-363) was driven by Groq’s aggressive rate limiting — at 30 RPM, batch episode processing would routinely fail mid-stream. The Foundry backend provides 25× the throughput with production-grade reliability. Prometheus alerts on graphiti_episode_failures_total now provide real-time visibility into extraction health.

The entity type schema was also refined during migration — from 9 generic defaults (entity, person, organization, etc.) to 10 domain-specific types tuned for the household agent use case (project, skill, agent, decision, incident, etc.). This improves graph quality by producing more meaningful relationships between domain entities.

The three-layer architecture (files → knowledge graph → semantic search) mirrors how human memory works: fast-access working memory (today’s context), structured long-term storage (topic files), and associative retrieval (semantic search). Each layer has different access patterns, different update frequencies, and different retention policies. The combination is more capable than any single layer alone.

If Graphiti is unavailable, agents still function with file-based memories. If semantic search fails, agents can fall back to direct file reads. The layers are complementary, not dependent — the system degrades gracefully rather than failing completely.

The progression from flat files to knowledge graph to hybrid architecture wasn’t planned upfront. Each phase solved the problems of the previous phase, guided by actual operational experience. The Serena deprecation is a case study in recognizing when a subsystem has been superseded and having the discipline to remove it rather than maintaining redundant infrastructure.

OpenClaw sessions have finite context windows. When a session reaches its context limit, OpenClaw compacts the conversation — summarizing earlier turns to free space. But compaction is lossy: details that weren’t captured in the summary are gone.

The memory flush pattern addresses this by triggering a mandatory memory write before compaction occurs. When OpenClaw signals an impending compaction, the agent:

  1. Appends new entries to today’s daily log (memory/YYYY-MM-DD.md) — never overwrites existing content
  2. Captures decisions made, work completed, context that would be lost
  3. Records exact identifiers (commit SHAs, ticket numbers, timestamps) that the summary alone might drop

The flush is append-only by design. Multiple compactions in the same session each append their own section, building a complete record of the session’s work even across context resets.
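A minimal sketch of the append-only flush, assuming the daily-log layout described above (`flush_memory` and the section-header format are illustrative):

```python
from datetime import date, datetime
from pathlib import Path

def flush_memory(entries: list[str], memory_dir: str = "memory") -> Path:
    # Append a timestamped flush section to today's daily log. The log
    # is opened in append mode, so existing content is never rewritten;
    # repeated compactions in one session each add their own section.
    log = Path(memory_dir) / f"{date.today().isoformat()}.md"
    log.parent.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%H:%M")
    lines = [f"\n## Pre-compaction flush ({stamp})\n"]
    lines += [f"- {entry}\n" for entry in entries]
    with log.open("a") as f:
        f.writelines(lines)
    return log
```

Opening the file with mode "a" is the whole safety argument: there is no code path that can truncate or overwrite earlier sections.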

This pattern emerged from a practical problem: after compaction, agents would lose track of which tickets they’d already updated, which commits they’d pushed, or which sub-agents they’d spawned. The daily log serves as durable storage that survives context loss — the next session (or post-compaction continuation) can read it back and resume without re-doing work or creating duplicates.