agentlens
Guide 04 of 9  ·  How does it remember?

Agent Memory
& State

How agents remember, persist, and reason across time — the architecture of agent cognition, from context windows to vector stores.

✓ No prior knowledge needed · 15 concepts explained · 5 memory phases
Part 1
The Memory Problem
Phase 1 of 5 · The Memory Problem

The Stateless Agent

Every LLM call starts fresh. The model has no persistent memory by default — you rebuild its entire world from scratch on every invocation. The scaffolding you write around those calls is the agent's memory.

Most early agent bugs are actually memory bugs: wrong context, stale data, missing history. A user explains their preferences in turn 1, and by turn 8 the agent has forgotten them entirely.

The core insight: The model isn't broken — it just has no continuity. Memory is an engineering problem, not a model capability problem. You design it; you own it.
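A minimal sketch of that scaffolding, assuming a generic chat-completion-style message format; `call_llm` is a hypothetical stand-in, not a real client:

```python
# The model call is stateless; the scaffold around it owns all memory.
# `call_llm` is a hypothetical stand-in for any chat-completion API.

def call_llm(messages: list[dict]) -> str:
    # A real call would hit a model; the point is it sees ONLY `messages`.
    return f"(reply based on {len(messages)} messages)"

class Agent:
    def __init__(self, system_prompt: str):
        self.system_prompt = system_prompt
        self.history: list[dict] = []  # the scaffold's memory, not the model's

    def ask(self, user_msg: str) -> str:
        self.history.append({"role": "user", "content": user_msg})
        # Rebuild the model's entire world from scratch on every invocation.
        messages = [{"role": "system", "content": self.system_prompt}] + self.history
        reply = call_llm(messages)
        self.history.append({"role": "assistant", "content": reply})
        return reply

agent = Agent("You are a helpful assistant.")
agent.ask("I prefer concise answers.")     # turn 1
agent.ask("Summarize quantum computing.")  # turn 2 still sees turn 1
```

Drop the `history` list and the agent exhibits exactly the turn-8 amnesia described above.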

cold start · stale context · persistent memory

The Architecture

The Four Memory Rings

Memory architecture has four concentric rings, each more persistent and more expensive than the one inside it. In-context is fastest but ephemeral — gone when the session ends. Semantic/vector stores enable fuzzy retrieval from large knowledge bases. Episodic captures specific past interactions. Procedural is knowledge baked into the system prompt itself — most durable, slowest to update.

The trade-off: Moving outward through the rings increases persistence but adds latency, cost, and complexity. Design with the center first; move outward only when you hit a real limit.
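One way to sketch the rings and the start-at-the-center decision rule; the persistence and retrieval labels are illustrative, not benchmarks:

```python
# The four rings, ordered inside-out; retrieval cost grows as you move outward.
RINGS = [
    {"ring": "in-context", "survives": "the session",  "retrieval": "free, already in prompt"},
    {"ring": "semantic",   "survives": "indefinitely", "retrieval": "embed + similarity search"},
    {"ring": "episodic",   "survives": "sessions",     "retrieval": "per-user fact lookup"},
    {"ring": "procedural", "survives": "deployments",  "retrieval": "free, in system prompt"},
]

def pick_ring(cross_session: bool, fuzzy_search: bool, systematic: bool = False) -> str:
    # Start at the center; move outward only when a real limit forces it.
    if systematic:
        return "procedural"  # learned rules belong in the prompt itself
    if not cross_session:
        return "in-context"
    return "semantic" if fuzzy_search else "episodic"
```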

in-context · semantic · episodic · procedural

Memory by Design

The Memory Mindset

Memory is architecture, not an afterthought. Design your memory stack before writing instructions. Ask three questions: What does the agent need to remember within a session? Across sessions? Across users?

The answers map directly to which rings you need. Most agents only need in-context plus one outer ring. Don't over-engineer: a well-managed context window solves more than most teams expect.

Decision rule: If it fits in the context window and you only need it for this session → in-context. If it needs to survive session boundaries → go one ring out.

within session? · across sessions? · across users?

Part 2
In-Context Memory
Phase 2 of 5 · In-Context Memory

Managing the Context Window

The context window is your most valuable real estate. Key decisions: how much conversation history to keep, what system prompt content to include, how to structure injected data. XML beats prose for structured state — the model reads it more reliably.

The window fills fast. If you don't manage it deliberately, the oldest context (often the most important, like early user preferences) gets evicted first. Design eviction policies before you hit the limit, not after.

Fill order matters: System prompt → retrieved docs → recent history → current turn. Structure it this way so high-priority context survives window compression. Never let history push the system prompt out.
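A sketch of that priority-ordered filling under a token budget; the 4-characters-per-token heuristic stands in for a real tokenizer:

```python
# Fill the window in priority order under a fixed token budget.
# count_tokens is a crude heuristic; swap in a real tokenizer in practice.

def count_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def build_context(system: str, docs: list[str], history: list[str],
                  current: str, budget: int) -> list[str]:
    # Reserve the system prompt and current turn first: never evicted.
    parts = [system]
    remaining = budget - count_tokens(system) - count_tokens(current)
    for doc in docs:  # retrieved docs next
        cost = count_tokens(doc)
        if cost <= remaining:
            parts.append(doc)
            remaining -= cost
    kept = []
    for turn in reversed(history):  # newest history has priority
        cost = count_tokens(turn)
        if cost > remaining:
            break  # oldest turns fall off first
        kept.append(turn)
        remaining -= cost
    parts.extend(reversed(kept))   # restore chronological order
    parts.append(current)
    return parts
```

Because the system prompt and current turn are reserved up front, history can never push them out.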

system prompt → retrieved docs → history → current turn

History Strategies

Conversation History Strategies

Four strategies for multi-turn history: Full (keep everything — simple, hits the limit fast), Windowed (keep last N turns — predictable, loses early context), Summarized (compress old turns into a rolling summary — requires an extra LLM call but preserves key facts), Semantic (retrieve only the most relevant past turns — complex but powerful).

Rule of thumb: Start with windowed. Graduate to summarized when users complain the agent "forgot" something important. Add semantic retrieval only for very long sessions or large-scale deployments.
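The windowed and summarized strategies side by side; `summarize` is a stub where a real implementation would make the extra LLM call:

```python
# Two history strategies. summarize() is a stand-in for an LLM call.

def summarize(turns: list[str]) -> str:
    return "Summary: " + "; ".join(t[:20] for t in turns)  # stub

def windowed(history: list[str], n: int) -> list[str]:
    return history[-n:]  # keep last N turns only; early context is lost

def summarized(history: list[str], n: int) -> list[str]:
    if len(history) <= n:
        return history
    # Compress everything older than the window into one rolling summary.
    return [summarize(history[:-n])] + history[-n:]

history = [f"turn {i}" for i in range(10)]
windowed(history, 3)  # -> ["turn 7", "turn 8", "turn 9"]
```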

full · windowed · summarized · semantic

State as Document

Structured State in Context

For agents that maintain complex state — task lists, user preferences, workflow progress — embed structured data directly in context using XML or JSON. The model reads it faithfully and updates it predictably.

Treat in-context structured state like a document the model is editing collaboratively with you. Define a schema, inject it on every turn, parse it back out. Pitfall: state that grows unbounded eventually overflows. Design eviction policies from the start.

Pattern: <state> block in system prompt → model reads + updates → you parse response → persist externally → re-inject next turn.
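A minimal version of that loop, using a JSON payload inside a `<state>` block and stdlib-only parsing; the tag name and schema here are illustrative:

```python
# inject -> model reads/updates -> parse -> persist -> re-inject next turn.
import json
import re

def inject(system_prompt: str, state: dict) -> str:
    # Embed the state document in the system prompt as a <state> block.
    return f"{system_prompt}\n<state>\n{json.dumps(state)}\n</state>"

def parse_state(model_response: str) -> dict:
    # The model is instructed (elsewhere) to echo the updated block back.
    m = re.search(r"<state>\s*(\{.*?\})\s*</state>", model_response, re.S)
    return json.loads(m.group(1)) if m else {}

state = {"task": "draft email", "step": 1}
prompt = inject("You track workflow state.", state)
fake_response = 'Done. <state>{"task": "draft email", "step": 2}</state>'
new_state = parse_state(fake_response)  # persist externally, re-inject next turn
```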

inject → read → update → parse → persist

Part 3
External Memory
Phase 3 of 5 · External Memory

Vector Stores & Retrieval

When your agent needs to know things that don't fit in the context window — internal docs, past conversations, knowledge bases — add a vector store. The pattern: embed the user's message, retrieve the top-K most semantically similar chunks, inject them into context.

Three quality levers: embedding model choice, chunk size, and how many chunks to retrieve. Common mistake: chunking too coarsely (whole documents) or too finely (individual sentences). 200-400 token chunks with overlap usually win.

Don't add a vector store prematurely. Exhaust what you can do with a well-managed context window first. A vector store adds operational complexity — only add it when you've genuinely hit the limit.
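The retrieval pattern end to end, with a toy letter-frequency embedding standing in for a real embedding model:

```python
# embed -> similarity search -> top-K. The embedding is a deliberately
# crude stand-in; only the shape of the pattern is the point.
import math
from collections import Counter

def embed(text: str) -> dict[str, float]:
    counts = Counter(text.lower())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {ch: c / norm for ch, c in counts.items()}

def cosine(a: dict, b: dict) -> float:
    return sum(a[k] * b.get(k, 0.0) for k in a)

def top_k(query: str, chunks: list[str], k: int) -> list[str]:
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = ["billing and invoices", "password reset steps", "refund policy details"]
top_k("how do I reset my password", chunks, k=1)
```

Swap `embed` for a real embedding model and `chunks` for your indexed store; the inject step is the context assembly shown earlier.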

embed → similarity search → top-K chunks → inject

Retrieval Quality

Writing Good Retrievals

Retrieval quality determines answer quality. Three techniques that compound: Query rewriting (reformulate the user's question for retrieval before embedding), HyDE (generate a hypothetical ideal answer, embed that to retrieve), Reranking (after top-K, use a cross-encoder to reorder by relevance).

Each step costs a bit more latency. Each one meaningfully improves precision. Stack them incrementally — start with rewriting, add reranking when precision matters most.

HyDE insight: The embedding of "what a good answer looks like" is often closer to the right documents than the embedding of the raw question. Especially effective for technical queries.
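The three techniques stacked as a pipeline; all three functions are stubs sketching the shape, with word overlap standing in for a cross-encoder score:

```python
# Query rewriting, HyDE, and reranking as composable pipeline stages.

def rewrite_query(q: str) -> str:
    # Stand-in: a real rewrite would call an LLM to add missing context.
    return q + " (in the context of the deploy pipeline)"

def hyde_answer(q: str) -> str:
    # HyDE: you embed a hypothetical ideal ANSWER, not the question itself.
    return f"A good answer to '{q}' would walk through each step involved."

def rerank(query: str, candidates: list[str]) -> list[str]:
    # Stand-in for a cross-encoder: score by word overlap with the query.
    words = set(query.lower().split())
    return sorted(candidates,
                  key=lambda c: len(words & set(c.lower().split())),
                  reverse=True)

docs = ["deploy pipeline steps", "office lunch menu"]
ordered = rerank("how does the deploy pipeline work", docs)
```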

raw query → rewritten → HyDE → reranked

Interaction Memory

Episodic Memory

Episodic memory stores specific past interactions and retrieves them when relevant. "User said they prefer concise responses." "This customer's previous complaint was about billing." Retrieved and injected into the next session's context.

Implementation: at session end, extract key facts with an LLM → embed and store → at next session start, retrieve top relevant facts → inject. TTL everything — a memory from 6 months ago may no longer reflect reality.

Give users control. Users should be able to view, edit, and delete their episodic memories. An agent that "remembers" wrong things is worse than one that forgets. Memory is a trust feature.
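A sketch of the store-and-retrieve half of that loop, with per-user isolation and a TTL; the extraction step (an LLM call at session end) is assumed, not shown:

```python
# Episodic memory: store extracted facts per user, expire them via TTL.
import time

class EpisodicStore:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.facts: list[tuple[float, str, str]] = []  # (stored_at, user_id, fact)

    def store(self, user_id: str, fact: str) -> None:
        # In practice the fact comes from an LLM extraction pass at session end.
        self.facts.append((time.time(), user_id, fact))

    def retrieve(self, user_id: str) -> list[str]:
        # TTL everything: facts older than the cutoff are silently dropped.
        now = time.time()
        return [f for (t, u, f) in self.facts
                if u == user_id and now - t < self.ttl]

store = EpisodicStore(ttl_seconds=30 * 24 * 3600)  # 30-day TTL
store.store("u1", "prefers concise responses")
store.retrieve("u1")  # -> ["prefers concise responses"]
store.retrieve("u2")  # -> []  (per-user isolation)
```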

extract facts → embed + store → retrieve → inject

Part 4
Procedural & Shared Memory
Phase 4 of 5 · Procedural & Shared

Procedural Memory

Procedural memory is knowledge encoded in the system prompt itself. When the agent repeatedly makes the same mistake, the fix isn't episodic — it's procedural: update the system prompt to embed the corrected behavior permanently.

This is the most durable form of memory and the cheapest to retrieve (zero cost — it's already in context). Pattern: monitor evals for systematic failure modes → diagnose root cause → encode the fix as a rule → re-run evals. Agent learning without fine-tuning.

Procedural beats episodic for systematic behaviors. If an agent always makes the same mistake, fix it in the prompt. Episodic memory can't override a consistent failure mode — procedural memory can.
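A sketch of the encode-a-rule step; in practice each rule is added only after evals confirm both the failure mode and the fix:

```python
# Procedural memory: learned rules live in the system prompt itself,
# so retrieval is free and the behavior applies on every call.

BASE_PROMPT = "You are a support agent."
LEARNED_RULES: list[str] = []

def encode_rule(rule: str) -> None:
    # Called after a systematic failure is diagnosed and verified in evals.
    LEARNED_RULES.append(rule)

def system_prompt() -> str:
    rules = "\n".join(f"- {r}" for r in LEARNED_RULES)
    return BASE_PROMPT + ("\n\nRules:\n" + rules if rules else "")

encode_rule("Always quote ticket IDs exactly; never paraphrase them.")
```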

failure trace → diagnose → encode rule → eval improvement

Multi-Agent Memory

Shared Memory in Multi-Agent Systems

When multiple agents collaborate, they need shared state. Two patterns: Blackboard (shared read-write store all agents can access — simple, requires locking) and Message passing (agents communicate via structured messages — more complex, naturally ordered).

Key risks: race conditions (two agents write conflicting state simultaneously), staleness (agent reads superseded state), visibility (agents don't know what others have done). Treat shared memory like a database: define schemas, use locking, log all writes.

Design rule: If agents run in parallel, use optimistic locking — read the current version, apply your write only if version hasn't changed since your read. Prevents silent state corruption.
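Optimistic locking in miniature, on a blackboard-style shared store:

```python
# A write succeeds only if the version is unchanged since the read.

class Blackboard:
    def __init__(self):
        self.version = 0
        self.state: dict = {}

    def read(self) -> tuple[int, dict]:
        return self.version, dict(self.state)

    def write(self, expected_version: int, updates: dict) -> bool:
        if self.version != expected_version:
            return False  # someone wrote since our read: re-read and retry
        self.state.update(updates)
        self.version += 1
        return True

bb = Blackboard()
v, _ = bb.read()
bb.write(v, {"task": "claimed by agent A"})  # succeeds, version -> 1
bb.write(v, {"task": "claimed by agent B"})  # stale version: rejected
```

Agent B's conflicting write is rejected instead of silently clobbering A's, which is exactly the state corruption this rule prevents.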

blackboard · message passing · race condition · locking

Stale Memory

Memory Conflicts & Staleness

Memories go stale. A user's preferences change. A document gets updated. An episodic memory from three months ago contradicts the current system state. Stale memories cause confident incorrect behavior — often worse than no memory at all.

Strategies: TTL-based expiry, version-based invalidation (memories linked to document versions), confidence decay (memories become less trusted over time), user-facing controls (let users view and delete).

TTL everything. Set conservative TTLs by default — 30 days for preferences, 7 days for task state. Let users opt into longer retention. Expired memories that get re-confirmed are fine. Stale memories that go unquestioned cause real failures.
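TTL and confidence decay combined in one function; the 14-day half-life is an illustrative default, not a recommendation:

```python
# Trust halves every `half_life` days; past the TTL, the memory expires.

def confidence(age_days: float, ttl_days: float, half_life: float = 14.0) -> float:
    if age_days >= ttl_days:
        return 0.0  # expired: TTL wins regardless of decay
    return 0.5 ** (age_days / half_life)

confidence(0, ttl_days=30)   # -> 1.0  (fresh memory, full trust)
confidence(14, ttl_days=30)  # -> 0.5  (one half-life old)
confidence(45, ttl_days=30)  # -> 0.0  (past TTL, expired)
```

A low-confidence memory is a natural candidate for re-confirmation with the user rather than silent use.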

stale · TTL expiry · version invalidation · user controls

Part 5
Memory Engineering
Phase 5 of 5 · Memory Engineering

The Memory Stack in Practice

Decision guide — which ring to use for which problem: In-context for current session, structured state, recent history. Vector store for large knowledge bases, past conversations needing semantic search. Episodic for user preferences, project context, customer history. Procedural for systematic agent behaviors and learned rules.

Start at the center, move outward only when you hit a real limit. The most common over-engineering mistake: jumping to vector stores before exhausting what a well-managed context window can do.

Anti-pattern: Adding a vector store "just in case." Operational complexity is real — indexing pipelines, embedding costs, retrieval latency, stale index management. Add it when you've actually hit the limit, not as a precaution.
Privacy by Design

Privacy & Memory

Memory creates privacy obligations. You're storing user behavior, preferences, and content — often across sessions. GDPR and similar regulations give users rights to access, correct, and delete their data. Design for deletion from day one.

Never store sensitive information (health data, financial data, passwords) in agent memory without explicit consent. Separate memories by user ID. Keep audit logs of what was stored and when. Make per-user purge a first-class operation, not a hotfix.

Memory is a trust feature. Users who know exactly what the agent remembers — and can delete it — trust the agent more, not less. Transparency about memory is a feature. Opacity about memory is a liability.
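Per-user isolation with purge as a first-class operation, plus an audit log of every write; a sketch of the shape, not a compliance implementation:

```python
# Memories are keyed by user; deletion is one call, and every
# store/purge is logged so deletions are verifiable later.
import time

class MemoryStore:
    def __init__(self):
        self.by_user: dict[str, list[str]] = {}
        self.audit_log: list[tuple[float, str, str]] = []  # (ts, user_id, action)

    def store(self, user_id: str, fact: str) -> None:
        self.by_user.setdefault(user_id, []).append(fact)
        self.audit_log.append((time.time(), user_id, f"store: {fact}"))

    def purge_user(self, user_id: str) -> int:
        # Right-to-delete as a first-class operation, not a hotfix.
        removed = len(self.by_user.pop(user_id, []))
        self.audit_log.append((time.time(), user_id, f"purge: {removed} facts"))
        return removed

mem = MemoryStore()
mem.store("u1", "prefers email contact")
mem.purge_user("u1")  # -> 1
```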

GDPR · per-user isolation · right to delete · audit log

Principles

8 Memory Rules

Eight rules that memory-mature teams follow. Violate them and you'll rediscover why they exist.

① Start in-context — Don't add a vector store until you've hit the window limit.
② Design eviction policies first — Unbounded memory always overflows.
③ TTL everything — Stale memories are worse than no memories.
④ Give users control — View and delete is a trust feature.
⑤ Separate by user — Never let one user's memory leak to another's context.
⑥ Procedural beats episodic — Systematic failures go in the prompt, not a memory store.
⑦ Log every write — You need an audit trail when memory misbehaves.
⑧ Test memory paths — Write evals that specifically test retrieval and injection.