The Claude Agent SDK makes it easy to build autonomous agents that reason, plan, and execute. But without persistent memory, every conversation starts from zero. Your agent forgets what it learned, who it talked to, and what worked last time. It becomes reactive instead of truly autonomous.
We built a live production agent system for Like One that handles grant applications, funding research, and donor communication. It needs to remember: grant eligibility rules, past donor preferences, proposal templates that worked, and what failed. That's not just better UX — it's the difference between a toy chatbot and a system that gets smarter over time.
This guide shows five memory patterns we use in production. Pick the one that fits your agent's job, or layer several. All examples call the Claude API through the `anthropic` Python SDK and use only SQLite from the standard library, so no external services are required (though we show how to scale each one).
Pattern 1: Session Context (Simplest)
Store the entire conversation history in your agent's context window. This works for short sessions (under ~50K tokens) and requires zero infrastructure.
from anthropic import Anthropic

client = Anthropic()


class SessionAgent:
    def __init__(self, system_prompt: str):
        self.system = system_prompt
        self.messages = []

    def chat(self, user_input: str) -> str:
        self.messages.append({"role": "user", "content": user_input})
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=1024,
            system=self.system,
            messages=self.messages
        )
        assistant_message = response.content[0].text
        self.messages.append({"role": "assistant", "content": assistant_message})
        return assistant_message


agent = SessionAgent(
    system_prompt="You are a grant research assistant. Help find funding opportunities."
)
print(agent.chat("What grants exist for AI nonprofits?"))
print(agent.chat("Show me the ones that fund open source."))
When to use: Debugging, short workflows, internal tools, and any session that fits the context budget above (as a rule of thumb, under about two hours). No infrastructure cost.
When NOT to use: Multi-day workflows, user-facing products, anything requiring conversation history after restart.
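Session context only works while the history fits the budget, so it helps to trim before each call. The following is a minimal sketch, not part of the agent above: `estimate_tokens` and `trim_history` are hypothetical helpers, and the 4-characters-per-token ratio is a rough assumption rather than the tokenizer's real count.

```python
from typing import Dict, List

# Assumption: ~4 characters per token is a coarse heuristic for English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(messages: List[Dict[str, str]]) -> int:
    """Rough token estimate for a list of {"role", "content"} messages."""
    return sum(len(m["content"]) for m in messages) // CHARS_PER_TOKEN


def trim_history(messages: List[Dict[str, str]], budget: int = 50_000) -> List[Dict[str, str]]:
    """Return a copy of the history with the oldest turns dropped to fit the budget."""
    trimmed = list(messages)  # never mutate the caller's list
    # Drop oldest messages first; always keep at least the newest one.
    while len(trimmed) > 1 and estimate_tokens(trimmed) > budget:
        trimmed.pop(0)
    # Drop a leading assistant turn so the conversation still opens with a user turn.
    while len(trimmed) > 1 and trimmed[0]["role"] != "user":
        trimmed.pop(0)
    return trimmed
```

In `SessionAgent.chat`, this could be applied as `messages=trim_history(self.messages)`. Dropping old turns loses information, which is the gap the later patterns are meant to fill.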
Pattern 2: Episodic Memory (Practical)
Store summaries of past conversations in a database. Before each new session, retrieve relevant past interactions. This is what we use in production.
import json
import sqlite3
from datetime import datetime
from typing import Optional

from anthropic import Anthropic

client = Anthropic()


class EpisodicAgent:
    def __init__(self, system_prompt: str, db_path: str = "agent_memory.db"):
        self.system = system_prompt
        self.db_path = db_path
        self.messages = []
        self._init_db()

    def _init_db(self):
        conn = sqlite3.connect(self.db_path)
        c = conn.cursor()
        c.execute('''
            CREATE TABLE IF NOT EXISTS episodes (
                id INTEGER PRIMARY KEY,
                user_id TEXT,
                timestamp TEXT,
                summary TEXT,
                outcome TEXT,
                metadata TEXT
            )
        ''')
        conn.commit()
        conn.close()

    def recall(self, user_id: str, query: str, limit: int = 3) -> str:
        """Retrieve this user's most recent episodes.

        Note: `query` is accepted but not used for filtering here; recency
        stands in for relevance in this simple version.
        """
        conn = sqlite3.connect(self.db_path)
        c = conn.cursor()
        c.execute('''
            SELECT summary, outcome FROM episodes
            WHERE user_id = ?
            ORDER BY timestamp DESC
            LIMIT ?
        ''', (user_id, limit))
        episodes = c.fetchall()
        conn.close()
        if not episodes:
            return "No prior interactions found."
        recall_text = "Past interactions:\n"
        for summary, outcome in episodes:
            recall_text += f"- {summary}\n  Outcome: {outcome}\n"
        return recall_text

    def store_episode(self, user_id: str, summary: str, outcome: str,
                      metadata: Optional[dict] = None):
        """Save this session's summary to memory."""
        conn = sqlite3.connect(self.db_path)
        c = conn.cursor()
        c.execute('''
            INSERT INTO episodes (user_id, timestamp, summary, outcome, metadata)
            VALUES (?, ?, ?, ?, ?)
        ''', (user_id, datetime.now().isoformat(), summary, outcome,
              json.dumps(metadata or {})))
        conn.commit()
        conn.close()

    def chat(self, user_id: str, user_input: str) -> str:
        # Recall past episodes for this user
        memory = self.recall(user_id, user_input)
        # Add recall context to the system prompt
        augmented_system = f"{self.system}\n\n{memory}"
        self.messages.append({"role": "user", "content": user_input})
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=1024,
            system=augmented_system,
            messages=self.messages
        )
        assistant_message = response.content[0].text
        self.messages.append({"role": "assistant", "content": assistant_message})
        return assistant_message


# Usage
agent = EpisodicAgent(
    system_prompt="You are a grant research assistant. You have access to past research sessions."
)
user_id = "user_42"
result = agent.chat(user_id, "What grants are available for climate tech?")
print(result)

# At end of session, save what happened
agent.store_episode(
    user_id=user_id,
    summary="Researched climate tech grants",
    outcome="Found 5 relevant opportunities in renewable energy",
    metadata={"category": "climate", "results": 5}
)
When to use: Production agents, multi-day workflows, user-facing apps, anything that needs to learn from past interactions. SQLite is an embedded, single-file database that ships with Python, so there are no external services to run.
Cost: ~$0. Retrieval is a local SQL query on your own infrastructure. If you want semantic retrieval, you will need an embedding model from a separate provider; the Claude API generates text, not embedding vectors.
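The episodic store returns the newest episodes regardless of what the user asked. One cheap way to make recall query-aware, sketched here under stated assumptions, is to rank episodes by how many words they share with the query. `tokenize` and `rank_episodes` are hypothetical helpers, and the function assumes an `episodes` table with the schema used in this pattern.

```python
import re
import sqlite3
from typing import List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase alphanumeric word set; a deliberately crude tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_episodes(conn: sqlite3.Connection, user_id: str, query: str,
                  limit: int = 3) -> List[Tuple[str, str]]:
    """Return (summary, outcome) pairs ordered by word overlap with the query."""
    query_terms = tokenize(query)
    rows = conn.execute(
        "SELECT summary, outcome, timestamp FROM episodes WHERE user_id = ?",
        (user_id,),
    ).fetchall()
    scored = []
    for summary, outcome, timestamp in rows:
        overlap = len(query_terms & tokenize(f"{summary} {outcome}"))
        if overlap:  # skip episodes that share no words with the query
            scored.append((overlap, timestamp, summary, outcome))
    # Highest overlap first; ISO timestamps sort chronologically, so newer wins ties.
    scored.sort(key=lambda row: (row[0], row[1]), reverse=True)
    return [(summary, outcome) for _, _, summary, outcome in scored[:limit]]
```

Word overlap misses synonyms and counts filler words, so treat it as a stopgap before moving to the retrieval approaches in the later patterns.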
Pattern 3: Semantic Memory (Smart Retrieval)
Embed past conversations and retrieve by meaning, not keyword matching. This solves the "I know I learned this but can't find it" problem.
import json
import sqlite3
from datetime import datetime

from anthropic import Anthropic

client = Anthropic()


class SemanticAgent:
    def __init__(self, system_prompt: str, db_path: str = "semantic_memory.db"):
        self.system = system_prompt
        self.db_path = db_path
        self.messages = []
        self._init_db()

    def _init_db(self):
        conn = sqlite3.connect(self.db_path)
        c = conn.cursor()
        # Note: In production, use sqlite-vec or pgvector for native vector storage
        c.execute('''
            CREATE TABLE IF NOT EXISTS memories (
                id INTEGER PRIMARY KEY,
                user_id TEXT,
                content TEXT,
                embedding TEXT,
                timestamp TEXT
            )
        ''')
        conn.commit()
        conn.close()

    def embed(self, text: str) -> list:
        """Placeholder "embedding": a list of concept words extracted by Claude.

        This is NOT a numeric vector. In real production, call a dedicated
        embedding model (e.g. Voyage or Cohere) and store its float vector.
        """
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=1024,
            system="Extract the main concepts and themes from this text, returning them as a simple list.",
            messages=[{"role": "user", "content": f"Text: {text}\n\nMain concepts:"}]
        )
        return response.content[0].text.split()

    def recall(self, user_id: str, query: str, limit: int = 3) -> str:
        """Retrieve memories for this user.

        Simplification: this returns the most recent memories and does not yet
        compare `query` against the stored embeddings. With real vectors,
        rank by cosine similarity instead of timestamp.
        """
        conn = sqlite3.connect(self.db_path)
        c = conn.cursor()
        c.execute('''
            SELECT content FROM memories
            WHERE user_id = ?
            ORDER BY timestamp DESC
            LIMIT ?
        ''', (user_id, limit))
        memories = c.fetchall()
        conn.close()
        if not memories:
            return "No relevant memories found."
        recall_text = "Relevant memories:\n"
        for (content,) in memories:
            recall_text += f"- {content}\n"
        return recall_text

    def store_memory(self, user_id: str, content: str):
        # Compute the embedding before opening the connection (it is a network call)
        embedding = json.dumps(self.embed(content))
        conn = sqlite3.connect(self.db_path)
        c = conn.cursor()
        c.execute('''
            INSERT INTO memories (user_id, content, embedding, timestamp)
            VALUES (?, ?, ?, ?)
        ''', (user_id, content, embedding, datetime.now().isoformat()))
        conn.commit()
        conn.close()

    def chat(self, user_id: str, user_input: str) -> str:
        memory = self.recall(user_id, user_input)
        augmented_system = f"{self.system}\n\n{memory}"
        self.messages.append({"role": "user", "content": user_input})
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=1024,
            system=augmented_system,
            messages=self.messages
        )
        assistant_message = response.content[0].text
        self.messages.append({"role": "assistant", "content": assistant_message})
        # Store this exchange
        self.store_memory(user_id, f"User: {user_input}\nAssistant: {assistant_message}")
        return assistant_message
When to use: When you need "fuzzy" memory retrieval. Agent should remember similar past problems, not just exact keyword matches.
Scaling: Replace manual embeddings with sqlite-vec (a small SQLite extension for vector search) or move to PostgreSQL + pgvector for millions of memories.
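Retrieving "by meaning" comes down to comparing a query vector against stored vectors. The sketch below shows that ranking step with cosine similarity over JSON-encoded vectors, matching how the `embedding` column is stored as text. It is illustrative only: `toy_embed` is a hypothetical hashed bag-of-words stand-in that captures word overlap, not meaning, and a real embedding model would replace it.

```python
import hashlib
import json
import math
import re
from typing import List, Sequence, Tuple

DIM = 64  # illustrative vector size; real embedding models define their own


def toy_embed(text: str) -> List[float]:
    """Hashed bag-of-words vector. A stand-in, NOT a real semantic embedding."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    return vec


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], stored: List[Tuple[str, str]],
          k: int = 3) -> List[Tuple[float, str]]:
    """Rank (content, embedding_json) rows by similarity to the query vector."""
    scored = [(cosine(query_vec, json.loads(emb)), content) for content, emb in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

A brute-force scan like this is fine for a few thousand memories per user; beyond that, a vector index does the same ranking without loading every row.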
Pattern 4: Hybrid Retrieval (Production-Grade)
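Combine keyword search (fast, exact) with semantic search (slower, fuzzy) to get the best of both worlds. The example below covers the keyword side, storing Claude-extracted keywords alongside each memory; a semantic ranker like the one in Pattern 3 can be merged in on top.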
import re
import sqlite3
from datetime import datetime

from anthropic import Anthropic

client = Anthropic()


class HybridAgent:
    def __init__(self, system_prompt: str, db_path: str = "hybrid_memory.db"):
        self.system = system_prompt
        self.db_path = db_path
        self.messages = []
        self._init_db()

    def _init_db(self):
        conn = sqlite3.connect(self.db_path)
        c = conn.cursor()
        c.execute('''
            CREATE TABLE IF NOT EXISTS memories (
                id INTEGER PRIMARY KEY,
                user_id TEXT,
                content TEXT,
                summary TEXT,
                keywords TEXT,
                timestamp TEXT
            )
        ''')
        # Full-text search index (requires an SQLite build with FTS5 enabled)
        c.execute('''
            CREATE VIRTUAL TABLE IF NOT EXISTS memories_fts USING fts5(
                content, keywords
            )
        ''')
        conn.commit()
        conn.close()

    def extract_keywords(self, text: str) -> str:
        """Use Claude to extract keywords from memory."""
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=100,
            messages=[{
                "role": "user",
                "content": f"Extract 5 key words from this: {text}. Return comma-separated."
            }]
        )
        return response.content[0].text

    def recall(self, user_id: str, query: str, limit: int = 3) -> str:
        # Quote each word so punctuation in the query cannot break FTS5 syntax
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return "No relevant memories."
        match_expr = " OR ".join(f'"{t}"' for t in terms)
        conn = sqlite3.connect(self.db_path)
        c = conn.cursor()
        # Keyword search (fast): match any query word, best BM25 score first
        c.execute('''
            SELECT m.content
            FROM memories_fts
            JOIN memories m ON m.id = memories_fts.rowid
            WHERE memories_fts MATCH ? AND m.user_id = ?
            ORDER BY bm25(memories_fts)
            LIMIT ?
        ''', (match_expr, user_id, limit))
        results = c.fetchall()
        conn.close()
        if not results:
            return "No relevant memories."
        recall_text = "Relevant memories:\n"
        for (content,) in results:
            recall_text += f"- {content}\n"
        return recall_text

    def store_memory(self, user_id: str, content: str):
        keywords = self.extract_keywords(content)
        conn = sqlite3.connect(self.db_path)
        c = conn.cursor()
        c.execute('''
            INSERT INTO memories (user_id, content, keywords, timestamp)
            VALUES (?, ?, ?, ?)
        ''', (user_id, content, keywords, datetime.now().isoformat()))
        # Mirror the row into the FTS index under the same rowid
        c.execute('''
            INSERT INTO memories_fts (rowid, content, keywords)
            VALUES (?, ?, ?)
        ''', (c.lastrowid, content, keywords))
        conn.commit()
        conn.close()

    def chat(self, user_id: str, user_input: str) -> str:
        memory = self.recall(user_id, user_input)
        augmented_system = f"{self.system}\n\n{memory}"
        self.messages.append({"role": "user", "content": user_input})
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=1024,
            system=augmented_system,
            messages=self.messages
        )
        assistant_message = response.content[0].text
        self.messages.append({"role": "assistant", "content": assistant_message})
        self.store_memory(user_id, f"Q: {user_input}\nA: {assistant_message}")
        return assistant_message
Cost & Speed: Keyword search ~5ms. Claude keyword extraction ~$0.001 per memory. Trade compute for accuracy.
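When a keyword ranker and a semantic ranker each return their own ordered list of memory IDs, the two lists still have to be merged. One common option is reciprocal rank fusion, sketched below. This is a swapped-in technique for illustration, not necessarily how the production system merges results; `reciprocal_rank_fusion` is a hypothetical helper and `k=60` is the constant conventionally used with this method.

```python
from collections import defaultdict
from typing import Dict, Hashable, List


def reciprocal_rank_fusion(rankings: List[List[Hashable]], k: int = 60,
                           limit: int = 3) -> List[Hashable]:
    """Merge several ranked ID lists; items ranked high in many lists win."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) to the item's fused score.
            scores[memory_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda memory_id: scores[memory_id], reverse=True)[:limit]


keyword_hits = [3, 1, 7]   # memory IDs from full-text search, best first
semantic_hits = [1, 9, 3]  # memory IDs from vector search, best first
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

Because it works on ranks rather than raw scores, this approach avoids having to normalize BM25 scores against cosine similarities.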
Pattern 5: Preference Learning (Advanced)
Track what the agent learns about user preferences, then use that to guide future decisions.
import json
import sqlite3
from datetime import datetime

from anthropic import Anthropic

client = Anthropic()


class PreferenceAgent:
    def __init__(self, system_prompt: str, db_path: str = "preferences.db"):
        self.system = system_prompt
        self.db_path = db_path
        self.messages = []
        self._init_db()

    def _init_db(self):
        conn = sqlite3.connect(self.db_path)
        c = conn.cursor()
        c.execute('''
            CREATE TABLE IF NOT EXISTS preferences (
                id INTEGER PRIMARY KEY,
                user_id TEXT,
                preference_key TEXT,
                value TEXT,
                confidence REAL,
                timestamp TEXT
            )
        ''')
        conn.commit()
        conn.close()

    def extract_preferences(self, user_id: str, conversation: str) -> dict:
        """Use Claude to extract preferences from conversation."""
        # The format line is a plain string (not an f-string), so its literal
        # braces are sent to the model as-is.
        prompt = (
            "From this conversation, extract user preferences as JSON.\n"
            'Return format: {"<preference_key>": {"value": "<value>", "confidence": <0.0-1.0>}}\n'
            f"Conversation:\n{conversation}\n"
            "Preferences JSON:"
        )
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=500,
            messages=[{"role": "user", "content": prompt}]
        )
        try:
            parsed = json.loads(response.content[0].text)
        except (json.JSONDecodeError, TypeError):
            return {}
        # Anything other than a JSON object is treated as "nothing found"
        return parsed if isinstance(parsed, dict) else {}

    def get_user_preferences(self, user_id: str) -> dict:
        conn = sqlite3.connect(self.db_path)
        c = conn.cursor()
        c.execute('''
            SELECT preference_key, value, confidence
            FROM preferences
            WHERE user_id = ?
            ORDER BY timestamp DESC
        ''', (user_id,))
        prefs = {}
        for key, value, conf in c.fetchall():
            # Rows arrive newest first: keep the first high-confidence value
            # per key so older entries do not overwrite newer ones.
            if conf > 0.6 and key not in prefs:
                prefs[key] = value
        conn.close()
        return prefs

    def chat(self, user_id: str, user_input: str) -> str:
        prefs = self.get_user_preferences(user_id)
        pref_text = ""
        if prefs:
            pref_text = f"\n\nUser preferences to keep in mind: {json.dumps(prefs)}"
        augmented_system = self.system + pref_text
        self.messages.append({"role": "user", "content": user_input})
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=1024,
            system=augmented_system,
            messages=self.messages
        )
        assistant_message = response.content[0].text
        self.messages.append({"role": "assistant", "content": assistant_message})

        # Extract and store preferences from this exchange
        conversation = f"User: {user_input}\nAssistant: {assistant_message}"
        prefs_found = self.extract_preferences(user_id, conversation)
        if prefs_found:
            conn = sqlite3.connect(self.db_path)
            c = conn.cursor()
            for key, value in prefs_found.items():
                # Accept both {"value": ..., "confidence": ...} and bare values
                conf = value.get("confidence", 0.5) if isinstance(value, dict) else 0.5
                actual_value = value.get("value", str(value)) if isinstance(value, dict) else str(value)
                c.execute('''
                    INSERT INTO preferences (user_id, preference_key, value, confidence, timestamp)
                    VALUES (?, ?, ?, ?, ?)
                ''', (user_id, key, str(actual_value), conf, datetime.now().isoformat()))
            conn.commit()
            conn.close()
        return assistant_message
When to use: Long-term user agents. Personalization engines. Recommendation systems. The agent gets smarter the longer it knows you.
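Because preferences accumulate over time, the same key will eventually hold contradictory values. The sketch below shows one possible resolution rule, assuming rows shaped like the `preferences` table: the highest confidence wins, and the newer row breaks ties. `resolve_preferences` is a hypothetical helper, and the 0.6 threshold mirrors the cutoff used in `get_user_preferences`.

```python
from typing import Dict, Iterable, Tuple

# Each row: (preference_key, value, confidence, ISO-8601 timestamp)
PreferenceRow = Tuple[str, str, float, str]


def resolve_preferences(rows: Iterable[PreferenceRow],
                        min_confidence: float = 0.6) -> Dict[str, str]:
    """Pick one value per key: highest confidence, then most recent."""
    best: Dict[str, Tuple[float, str, str]] = {}
    for key, value, confidence, timestamp in rows:
        if confidence <= min_confidence:
            continue  # ignore low-confidence guesses entirely
        candidate = (confidence, timestamp, value)
        # Compare on (confidence, timestamp); ISO timestamps sort chronologically.
        if key not in best or candidate[:2] > best[key][:2]:
            best[key] = candidate
    return {key: chosen[2] for key, chosen in best.items()}
```

Whether confidence or recency should dominate depends on the product: a user who changes their mind is better served by recency, while noisy extraction favors confidence.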
Comparison & Picking Your Pattern
| Pattern | Storage | Retrieval Speed | Recall Quality | Cost | Best For |
|---|---|---|---|---|---|
| Session Context | RAM | Instant | Perfect (full history) | $0 | Debugging, short sessions (<2h) |
| Episodic | SQLite | 5ms | Good (summaries) | ~$0 | Production multi-day workflows |
| Semantic | SQLite+vec | 50ms | Excellent (fuzzy match) | $0.001/memory | When keyword search isn't enough |
| Hybrid | SQLite+FTS | 10ms | Excellent (best of both) | $0.001/memory | Production agents at scale |
| Preferences | SQLite | 5ms | Improving over time | $0.001/preference | Personalization, long-term agents |
Production Deployment Checklist
Before you ship:
- Retention policy: How long do you keep memories? Delete old ones regularly to avoid database bloat.
- Privacy: Memories contain user data. Encrypt at rest. Set data retention limits in your privacy policy.
- Conflict resolution: What if preferences contradict? Use confidence scores to pick winners.
- Scaling: Start with SQLite. When you hit >100K memories per user, move to PostgreSQL + pgvector.
- Testing: Test with stale memories (1 month old) to ensure your agent doesn't hallucinate from outdated info.
- Monitoring: Log which memories get recalled. Low recall = schema is wrong. High false positives = tweak retrieval thresholds.
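The retention item in the checklist can start as a scheduled delete. The following is a minimal sketch, assuming a `memories` table whose `timestamp` column holds `datetime.isoformat()` strings as in the patterns above. `prune_old_memories` is a hypothetical helper and the 180-day default is an arbitrary placeholder, not a recommendation.

```python
import sqlite3
from datetime import datetime, timedelta
from typing import Optional


def prune_old_memories(conn: sqlite3.Connection, max_age_days: int = 180,
                       now: Optional[datetime] = None) -> int:
    """Delete memories older than the cutoff and return how many were removed."""
    now = now or datetime.now()
    cutoff = (now - timedelta(days=max_age_days)).isoformat()
    # ISO-8601 strings in the same format compare chronologically as text.
    cursor = conn.execute("DELETE FROM memories WHERE timestamp < ?", (cutoff,))
    conn.commit()
    return cursor.rowcount
```

Running this from a daily job keeps the database bounded and gives the privacy policy a concrete retention window to cite.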
The Next Step: Agentic Memory Evolution
These five patterns work today. But the future is agents that consolidate memories ("I've learned these 50 grant eligibility rules can collapse into 3 principles"), reason about their own knowledge ("This memory is stale"), and even teach other agents.
We're already experimenting with multi-agent systems where one agent teaches another using these memory patterns. More on that in the next post.
For now: pick a pattern, implement it, measure what your agent recalls vs. forgets, and iterate. Memory is the leverage point that turns agents from impressive demos into systems that compound over time.