# Agent Memory Architecture
## The Problem

AI agents wake up fresh every session. They have no memory of what happened yesterday, what decisions were made, what lessons were learned, or what the current state of the world is. Without external memory, every session starts from zero — the agent re-reads the same files, re-discovers the same context, and occasionally contradicts decisions it made hours earlier because it doesn’t remember making them.
The naive solution — dump everything into a system prompt — doesn’t scale. A few weeks of daily notes, project context, relationship history, and operational learnings quickly exceeds context limits. Even if it fit, loading everything every session wastes tokens on irrelevant context. The agent needs memory that’s both persistent and selective — comprehensive enough to maintain continuity, targeted enough to stay within context budget.
## Architecture Evolution

The memory system evolved through three distinct phases, each solving the problems the previous phase revealed.
### Phase 1: Flat Files

The initial approach: a single `MEMORY.md` file loaded every session, plus daily log files (`memory/YYYY-MM-DD.md`). Simple, readable, and entirely manual. Agents wrote what seemed important, read everything at session start, and hoped the important stuff was near the top.
Problems:

- `MEMORY.md` grew linearly with time. After a month, it was consuming significant context on every session.
- No search capability — finding a specific decision or fact required reading the whole file.
- No structure — financial data, infrastructure notes, relationship context, and operational learnings mixed together.
- Staleness — old information lingered because there was no systematic review process.
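The Phase 1 bootstrap can be sketched in a few lines (a minimal illustration; the `load_session_context` helper and directory layout are assumptions based on the file names above):

```python
from datetime import date
from pathlib import Path

def load_session_context(root: Path) -> str:
    """Phase 1 bootstrap: read MEMORY.md plus today's daily log in full.

    Everything is loaded every session, so context cost grows with
    the size of MEMORY.md rather than with what the session needs.
    """
    parts = []
    memory = root / "MEMORY.md"
    if memory.exists():
        parts.append(memory.read_text())
    daily = root / "memory" / f"{date.today():%Y-%m-%d}.md"
    if daily.exists():
        parts.append(daily.read_text())
    return "\n\n".join(parts)
```

Because both files are read in full, the token cost tracks the size of `MEMORY.md`, not the needs of the session — the linear-growth problem described above.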
### Phase 2: Graphiti Knowledge Graph

Graphiti (backed by FalkorDB) added structured, searchable memory through an episodic knowledge graph. Agents could write episodes (events, decisions, facts) and retrieve them via semantic search. Group-level isolation (`fiducian`, `alec`, `shared`, `research`, `claude-code`) provided access control.
Current status: Operational — 647 nodes confirmed intact after the FAD-473 upstream bug (a Graphiti library regression corrupted episode extraction). Recovery preserved all existing knowledge; the bug was traced to a dependency update and mitigated by pinning the Graphiti version.
What it solved:

- Cross-session continuity without loading everything into context
- Semantic search across all stored knowledge
- Cross-agent knowledge sharing via the `shared` group
- Structured entity types tuned to the domain (10 custom types replacing 9 generic defaults)
What it didn’t solve:
- Agents still needed file-based memory for session-start bootstrapping (Graphiti search requires knowing what to search for)
- Episode extraction quality depended on the LLM backend — the initial Groq backend had a 67% failure rate under load
- No way to do broad “what happened recently?” scans without specific queries
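Group-level isolation can be illustrated with a toy in-memory store (this is not Graphiti’s actual API — the group names come from the text above, and substring matching stands in for semantic search):

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    group_id: str
    body: str

@dataclass
class EpisodeStore:
    """Toy episodic store illustrating group-level isolation:
    a query only sees episodes from the groups it is scoped to."""
    episodes: list[Episode] = field(default_factory=list)

    def add(self, group_id: str, body: str) -> None:
        self.episodes.append(Episode(group_id, body))

    def search(self, query: str, group_ids: list[str]) -> list[str]:
        # Real Graphiti does semantic search; substring match stands in here.
        return [e.body for e in self.episodes
                if e.group_id in group_ids and query.lower() in e.body.lower()]

store = EpisodeStore()
store.add("fiducian", "Decision: rotate credentials quarterly")
store.add("shared", "Budget review scheduled with Alec")
# A query scoped to "shared" cannot see "fiducian" episodes:
store.search("decision", ["shared"])  # → []
store.search("budget", ["shared"])    # → ["Budget review scheduled with Alec"]
```

The scoping in `search` is the key property: an agent without the `fiducian` group in its allowed list simply never retrieves those episodes.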
### Phase 3: Memory-Core Hybrid (Current)

The current architecture combines file-based topic memories with Graphiti semantic search and Ollama-powered local embeddings. Each layer serves a different access pattern:
File-based topic memories (`memory/*.md`) provide structured, scannable context organized by domain:
| File | Contents | Load Pattern |
|---|---|---|
| `memory/projects.md` | Active project status, recent work | Every main session |
| `memory/infrastructure.md` | Runtime config, agent protocol, tools | Every main session |
| `memory/relationships.md` | Working relationships, agent dynamics | Every main session |
| `memory/lessons.md` | Behavioral traps, API gotchas, hard-gates | Every main session |
| `memory/YYYY-MM-DD.md` | Raw daily logs | Today + yesterday only |
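The load pattern in the table reduces to a small selector (a sketch; the function name and list structure are illustrative):

```python
from datetime import date, timedelta

# Topic files loaded on every main session, per the table above.
ALWAYS_LOAD = [
    "memory/projects.md",
    "memory/infrastructure.md",
    "memory/relationships.md",
    "memory/lessons.md",
]

def files_to_load(today: date) -> list[str]:
    """Topic files every session; raw daily logs only for today and yesterday."""
    yesterday = today - timedelta(days=1)
    return ALWAYS_LOAD + [
        f"memory/{today:%Y-%m-%d}.md",
        f"memory/{yesterday:%Y-%m-%d}.md",
    ]

files_to_load(date(2025, 3, 2))  # topic files plus the two most recent daily logs
```

Bounding the daily-log window to two days is what keeps session-start context cost flat while `memory/` grows.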
Graphiti knowledge graph provides deep, semantic retrieval for specific queries:
- “What was the decision on credential storage?” → searches across episodes
- “When did we last coordinate with Alec about the budget?” → temporal search
- Cross-agent state via the `shared` group
- Research findings via the `research` group
Semantic memory search (Ollama `nomic-embed-text`) enables fuzzy matching across both file-based and Graphiti memories:
- OpenClaw’s `memory_search` tool runs embedding-based similarity search over `MEMORY.md` and `memory/*.md`
- Returns ranked snippets with file path and line numbers
- `memory_get` then pulls only the relevant lines — surgical context retrieval instead of loading whole files
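The search-then-get flow can be sketched with a stand-in scorer (the real tools use `nomic-embed-text` embeddings; word overlap substitutes here, and both function signatures are hypothetical):

```python
def score(query: str, line: str) -> float:
    """Stand-in for embedding cosine similarity: fraction of query words present."""
    q, words = set(query.lower().split()), set(line.lower().split())
    return len(q & words) / len(q) if q else 0.0

def memory_search(files: dict[str, str], query: str, top_k: int = 3):
    """Return (path, line_number, snippet) tuples ranked by similarity."""
    hits = []
    for path, text in files.items():
        for n, line in enumerate(text.splitlines(), start=1):
            s = score(query, line)
            if s > 0:
                hits.append((s, path, n, line))
    hits.sort(reverse=True)  # best matches first
    return [(path, n, line) for _, path, n, line in hits[:top_k]]

def memory_get(files: dict[str, str], path: str, start: int, end: int) -> str:
    """Pull only the relevant lines instead of loading the whole file."""
    lines = files[path].splitlines()
    return "\n".join(lines[start - 1:end])
```

The two-step shape is the point: search returns pointers (path plus line numbers), and only the lines actually needed are pulled into context.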
## Memory Lifecycle

### Serena Deprecation

An earlier memory system called “Serena” provided semantic memory through a custom MCP server with embedding-based retrieval. It was deprecated and replaced by the memory-core hybrid for several reasons:
- Redundant storage — facts lived in both Serena and Graphiti, with no reconciliation
- Session overhead — Serena’s MCP server added latency and consumed resources for functionality now built into OpenClaw natively
- No group isolation — Serena had a flat namespace, unlike Graphiti’s group-based access control
- Migration path — OpenClaw’s native `memory_search` / `memory_get` tools provide the same semantic search capability without a separate service
The migration (FAD-406 through FAD-411) extracted high-value Serena memories into file-based topic memories and Graphiti episodes, then deprecated the Serena MCP server. Historical facts with temporal metadata were archived to Graphiti; operational knowledge was promoted to topic files.
### LLM Backend for Graphiti

Graphiti’s entity extraction pipeline — which turns raw episodes into graph nodes and edges — requires an LLM for processing. The backend choice significantly impacts reliability:
| Backend | Model | Rate Limit | Episode Failure Rate |
|---|---|---|---|
| Groq (initial) | llama-3.3-70b | ~30 RPM | ~67% under load |
| Foundry AIP (current) | Foundry-managed | ~760 RPM | Near-zero |
The migration from Groq to Foundry AIP (FAD-363) was driven by Groq’s aggressive rate limiting — at 30 RPM, batch episode processing would routinely fail mid-stream. The Foundry backend provides 25× the throughput with production-grade reliability. Prometheus alerts on `graphiti_episode_failures_total` now provide real-time visibility into extraction health.
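The throughput arithmetic behind the migration can be made concrete with a client-side pacer (illustrative only; neither backend is assumed to expose this interface):

```python
class RpmPacer:
    """Client-side pacing for a rate-limited LLM backend (illustrative).

    At ~30 RPM a batch of 100 episode extractions needs 100 / 30 ≈ 3.3
    minutes of pure pacing; at ~760 RPM the same batch paces in seconds.
    """
    def __init__(self, rpm: int):
        self.interval = 60.0 / rpm  # seconds between request slots
        self.next_slot = 0.0

    def wait_time(self, now: float) -> float:
        """Seconds to sleep before the next request may be sent."""
        delay = max(0.0, self.next_slot - now)
        self.next_slot = max(self.next_slot, now) + self.interval
        return delay
```

Without pacing like this, a batch fired at full speed against a 30 RPM limit starts failing mid-stream — the behavior described above.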
The entity type schema was also refined during migration — from 9 generic defaults (entity, person, organization, etc.) to 10 domain-specific types tuned for the household agent use case (project, skill, agent, decision, incident, etc.). This improves graph quality by producing more meaningful relationships between domain entities.
## What This Demonstrates

### Layered Memory Design

The three-layer architecture (files → knowledge graph → semantic search) mirrors how human memory works: fast-access working memory (today’s context), structured long-term storage (topic files), and associative retrieval (semantic search). Each layer has different access patterns, different update frequencies, and different retention policies. The system is better than any single layer alone.
### Graceful Degradation

If Graphiti is unavailable, agents still function with file-based memories. If semantic search fails, agents can fall back to direct file reads. The layers are complementary, not dependent — the system degrades gracefully rather than failing completely.
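The fallback order can be sketched as an ordered chain of layers (the layer functions below are hypothetical stand-ins):

```python
def retrieve(query: str, layers: list) -> list[str]:
    """Try each memory layer in order; fall through on failure or empty result.

    Typical order: Graphiti semantic search → embedding search over
    files → direct file reads. Layers are complementary, not dependent.
    """
    for layer in layers:
        try:
            results = layer(query)
        except Exception:
            continue  # layer unavailable: degrade to the next one
        if results:
            return results
    return []

def graphiti_down(query):  # stand-in: Graphiti/FalkorDB unreachable
    raise ConnectionError("FalkorDB unavailable")

def file_search(query):    # stand-in: embedding search over topic files
    return ["memory/lessons.md:12  always pin versions"]

retrieve("pin versions", [graphiti_down, file_search])
# → ["memory/lessons.md:12  always pin versions"]
```

The chain makes failure of any single layer a latency/quality cost rather than an outage.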
### Evolution Under Load

The progression from flat files to knowledge graph to hybrid architecture wasn’t planned upfront. Each phase solved the problems of the previous phase, guided by actual operational experience. The Serena deprecation is a case study in recognizing when a subsystem has been superseded and having the discipline to remove it rather than maintaining redundant infrastructure.
## Compaction Memory Flush

OpenClaw sessions have finite context windows. When a session reaches its context limit, OpenClaw compacts the conversation — summarizing earlier turns to free space. But compaction is lossy: details that weren’t captured in the summary are gone.
The memory flush pattern addresses this by triggering a mandatory memory write before compaction occurs. When OpenClaw signals an impending compaction, the agent:
- Appends new entries to today’s daily log (`memory/YYYY-MM-DD.md`) — never overwrites existing content
- Captures decisions made, work completed, context that would be lost
- Records exact identifiers (commit SHAs, ticket numbers, timestamps) that the summary alone might drop
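The append-only flush can be sketched as follows (the daily-log path follows the convention above; the section-header format is an assumption):

```python
from datetime import datetime
from pathlib import Path

def flush_before_compaction(root: Path, notes: list[str], now: datetime) -> Path:
    """Append a pre-compaction section to today's daily log.

    Append-only by design: each compaction in a session adds its own
    section, never overwriting what earlier flushes recorded.
    """
    log = root / "memory" / f"{now:%Y-%m-%d}.md"
    log.parent.mkdir(parents=True, exist_ok=True)
    section = (
        f"\n## Pre-compaction flush ({now:%H:%M})\n"
        + "".join(f"- {note}\n" for note in notes)
    )
    with log.open("a") as f:  # "a": append, never truncate
        f.write(section)
    return log
```

Opening the file in append mode is the whole trick: two compactions in one session produce two sections in the same daily log, and nothing is lost between them.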
The flush is append-only by design. Multiple compactions in the same session each append their own section, building a complete record of the session’s work even across context resets.
This pattern emerged from a practical problem: after compaction, agents would lose track of which tickets they’d already updated, which commits they’d pushed, or which sub-agents they’d spawned. The daily log serves as durable storage that survives context loss — the next session (or post-compaction continuation) can read it back and resume without re-doing work or creating duplicates.
## Related

- FAD: Multi-Agent System Architecture — the platform these memory systems serve
- Self-Learning Feedback Loop — the learning system built on top of memory
- Confluence Templates & Agent Documentation Patterns — documentation patterns that complement memory systems