Agent Memory Architecture

AI agents wake up fresh every session. They have no memory of what happened yesterday, what decisions were made, what lessons were learned, or what the current state of the world is. Without external memory, every session starts from zero — the agent re-reads the same files, re-discovers the same context, and occasionally contradicts decisions it made hours earlier because it doesn’t remember making them.

The naive solution — dump everything into a system prompt — doesn’t scale. A few weeks of daily notes, project context, relationship history, and operational learnings quickly exceeds context limits. Even if it fit, loading everything every session wastes tokens on irrelevant context. The agent needs memory that’s both persistent and selective — comprehensive enough to maintain continuity, targeted enough to stay within context budget.

The memory system evolved through three distinct phases, each solving the problems the previous phase revealed.

Phase 1 was a single MEMORY.md file loaded every session, plus daily log files (memory/YYYY-MM-DD.md). Simple, readable, and entirely manual. Agents wrote what seemed important, read everything at session start, and hoped the important stuff was near the top.

Problems:

  • MEMORY.md grew linearly with time. After a month, it was consuming significant context on every session.
  • No search capability — finding a specific decision or fact required reading the whole file.
  • No structure — financial data, infrastructure notes, relationship context, and operational learnings mixed together.
  • Staleness — old information lingered because there was no systematic review process.

Phase 2 added Graphiti (backed by FalkorDB): structured, searchable memory through an episodic knowledge graph. Agents could write episodes (events, decisions, facts) and retrieve them via semantic search. Group-level isolation (fiducian, alec, shared, research, claude-code) provided access control.

Current status: Operational — 647 nodes confirmed intact after the FAD-473 upstream bug (a Graphiti library regression corrupted episode extraction). Recovery preserved all existing knowledge; the bug was traced to a dependency update and mitigated by pinning the Graphiti version.

What it solved:

  • Cross-session continuity without loading everything into context
  • Semantic search across all stored knowledge
  • Cross-agent knowledge sharing via the shared group
  • Structured entity types tuned to the domain (10 custom types replacing 9 generic defaults)

What it didn’t solve:

  • Agents still needed file-based memory for session-start bootstrapping (Graphiti search requires knowing what to search for)
  • Episode extraction quality depended on the LLM backend — the initial Groq backend had a 67% failure rate under load
  • No way to do broad “what happened recently?” scans without specific queries

The current architecture, phase 3, combines file-based topic memories with Graphiti semantic search and Ollama-powered local embeddings. Each layer serves a different access pattern:

Three-Layer Memory Architecture

File-based topic memories (memory/*.md) provide structured, scannable context organized by domain:

| File | Contents | Load Pattern |
| --- | --- | --- |
| memory/projects.md | Active project status, recent work | Every main session |
| memory/infrastructure.md | Runtime config, agent protocol, tools | Every main session |
| memory/relationships.md | Working relationships, agent dynamics | Every main session |
| memory/lessons.md | Behavioral traps, API gotchas, hard-gates | Every main session |
| memory/YYYY-MM-DD.md | Raw daily logs | Today + yesterday only |
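The session-start load pattern above can be sketched in a few lines. This is a hypothetical helper, not an OpenClaw API; only the file names come from the table:

```python
from datetime import date, timedelta
from pathlib import Path

# Topic files loaded on every main session (from the table above).
TOPIC_FILES = ["projects.md", "infrastructure.md",
               "relationships.md", "lessons.md"]

def bootstrap_context(memory_dir: str = "memory") -> str:
    # Assemble session-start context: every topic file, plus the raw
    # daily logs for today and yesterday only.
    root = Path(memory_dir)
    parts = []
    for name in TOPIC_FILES:
        path = root / name
        if path.exists():
            parts.append(f"## {name}\n{path.read_text()}")
    for day in (date.today(), date.today() - timedelta(days=1)):
        log = root / f"{day.isoformat()}.md"
        if log.exists():
            parts.append(f"## {log.name}\n{log.read_text()}")
    return "\n\n".join(parts)
```

Missing files are simply skipped, so a fresh deployment bootstraps from whatever memory exists.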

Graphiti knowledge graph provides deep, semantic retrieval for specific queries:

  • “What was the decision on credential storage?” → searches across episodes
  • “When did we last coordinate with Alec about the budget?” → temporal search
  • Cross-agent state via shared group
  • Research findings via research group
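As a toy illustration of group-scoped episodic retrieval (this is not the graphiti-core API; the names and structure are invented for the sketch):

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Episode:
    group: str       # e.g. "shared", "research", "fiducian"
    body: str
    at: datetime

@dataclass
class EpisodeStore:
    # Toy stand-in for the Graphiti graph: episodes carry a group id,
    # and a query only sees episodes in the groups it names.
    episodes: list[Episode] = field(default_factory=list)

    def add(self, group: str, body: str, at: datetime) -> None:
        self.episodes.append(Episode(group, body, at))

    def search(self, term: str, groups: list[str]) -> list[Episode]:
        # Group isolation: without "shared" in groups, shared episodes
        # are invisible regardless of the query. Results come back
        # newest first, a crude analogue of temporal search.
        return sorted(
            (e for e in self.episodes
             if e.group in groups and term.lower() in e.body.lower()),
            key=lambda e: e.at, reverse=True)
```

The real system adds entity extraction and semantic rather than substring matching, but the access-control shape is the same: the caller's group list is the security boundary.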

Semantic memory search (Ollama nomic-embed-text) enables fuzzy matching across both file-based and Graphiti memories:

  • OpenClaw’s memory_search tool runs embedding-based similarity search over MEMORY.md and memory/*.md
  • Returns ranked snippets with file path and line numbers
  • memory_get then pulls only the relevant lines — surgical context retrieval instead of loading whole files
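The search-then-fetch flow might look roughly like the sketch below, with a hashed bag-of-words vector standing in for the real nomic-embed-text embeddings; `memory_search` here mirrors the OpenClaw tool only in spirit:

```python
import math

def embed(text: str) -> list[float]:
    # Toy stand-in for nomic-embed-text (served via Ollama): a hashed
    # bag-of-words vector, just enough to demonstrate the pipeline.
    vec = [0.0] * 256
    for tok in text.lower().split():
        vec[hash(tok) % 256] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def memory_search(query: str, files: dict[str, str],
                  window: int = 3, top_k: int = 3) -> list[tuple[str, int, str]]:
    # Rank fixed-size line windows from each memory file by similarity
    # to the query; return (path, start_line, snippet), best first.
    # A memory_get step would then fetch just those lines.
    qv = embed(query)
    hits = []
    for path, text in files.items():
        lines = text.splitlines()
        for i in range(0, len(lines), window):
            snippet = "\n".join(lines[i:i + window])
            hits.append((cosine(embed(snippet), qv), path, i + 1, snippet))
    hits.sort(key=lambda h: h[0], reverse=True)
    return [(p, ln, s) for _, p, ln, s in hits[:top_k]]
```

Returning paths and line numbers rather than whole files is what makes the retrieval surgical: the follow-up read is a few lines, not a few kilobytes.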

Memory Lifecycle

An earlier memory system called “Serena” provided semantic memory through a custom MCP server with embedding-based retrieval. It was deprecated and replaced by the memory-core hybrid for several reasons:

  • Redundant storage — facts lived in both Serena and Graphiti, with no reconciliation
  • Session overhead — Serena’s MCP server added latency and consumed resources for functionality now built into OpenClaw natively
  • No group isolation — Serena had a flat namespace, unlike Graphiti’s group-based access control
  • Migration path — OpenClaw’s native memory_search / memory_get tools provide the same semantic search capability without a separate service

The migration (FAD-406 through FAD-411) extracted high-value Serena memories into file-based topic memories and Graphiti episodes, then deprecated the Serena MCP server. Historical facts with temporal metadata were archived to Graphiti; operational knowledge was promoted to topic files.

Graphiti’s entity extraction pipeline — which turns raw episodes into graph nodes and edges — requires an LLM for processing. The backend choice significantly impacts reliability:

| Backend | Model | Rate Limit | Episode Failure Rate |
| --- | --- | --- | --- |
| Groq (initial) | llama-3.3-70b | ~30 RPM | ~67% under load |
| Foundry AIP (current) | Foundry-managed | ~760 RPM | Near-zero |

The migration from Groq to Foundry AIP (FAD-363) was driven by Groq’s aggressive rate limiting — at 30 RPM, batch episode processing would routinely fail mid-stream. The Foundry backend provides 25× the throughput with production-grade reliability. Prometheus alerts on graphiti_episode_failures_total now provide real-time visibility into extraction health.

The entity type schema was also refined during migration — from 9 generic defaults (entity, person, organization, etc.) to 10 domain-specific types tuned for the household agent use case (project, skill, agent, decision, incident, etc.). This improves graph quality by producing more meaningful relationships between domain entities.

The three-layer architecture (files → knowledge graph → semantic search) mirrors how human memory works: fast-access working memory (today’s context), structured long-term storage (topic files), and associative retrieval (semantic search). Each layer has different access patterns, different update frequencies, and different retention policies. The combination is more capable than any single layer alone.

If Graphiti is unavailable, agents still function with file-based memories. If semantic search fails, agents can fall back to direct file reads. The layers are complementary, not dependent — the system degrades gracefully rather than failing completely.

The progression from flat files to knowledge graph to hybrid architecture wasn’t planned upfront. Each phase solved the problems of the previous phase, guided by actual operational experience. The Serena deprecation is a case study in recognizing when a subsystem has been superseded and having the discipline to remove it rather than maintaining redundant infrastructure.

OpenClaw sessions have finite context windows. When a session reaches its context limit, OpenClaw compacts the conversation — summarizing earlier turns to free space. But compaction is lossy: details that weren’t captured in the summary are gone.

The memory flush pattern addresses this by triggering a mandatory memory write before compaction occurs. When OpenClaw signals an impending compaction, the agent:

  1. Appends new entries to today’s daily log (memory/YYYY-MM-DD.md) — never overwrites existing content
  2. Captures decisions made, work completed, context that would be lost
  3. Records exact identifiers (commit SHAs, ticket numbers, timestamps) that the summary alone might drop

The flush is append-only by design. Multiple compactions in the same session each append their own section, building a complete record of the session’s work even across context resets.
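A minimal sketch of the append-only flush, assuming the daily-log layout described above (`flush_memory` and the section-header format are illustrative):

```python
from datetime import date, datetime
from pathlib import Path

def flush_memory(entries: list[str], memory_dir: str = "memory") -> Path:
    # Append a timestamped flush section to today's daily log. The log
    # is opened in append mode, so existing content is never rewritten;
    # repeated compactions in one session each add their own section.
    log = Path(memory_dir) / f"{date.today().isoformat()}.md"
    log.parent.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%H:%M")
    lines = [f"\n## Pre-compaction flush ({stamp})\n"]
    lines += [f"- {entry}\n" for entry in entries]
    with log.open("a") as f:
        f.writelines(lines)
    return log
```

Opening the file with mode "a" is the whole safety argument: there is no code path that can truncate or overwrite earlier sections.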

This pattern emerged from a practical problem: after compaction, agents would lose track of which tickets they’d already updated, which commits they’d pushed, or which sub-agents they’d spawned. The daily log serves as durable storage that survives context loss — the next session (or post-compaction continuation) can read it back and resume without re-doing work or creating duplicates.