How much does persistent memory cost?

SQLite storage is free. If you use Claude to extract keywords, concepts, or preferences, expect ~$0.001-0.005 per memory operation. For a high-volume agent processing 1000 interactions/day, that works out to $1-5/day, so budget roughly $30-150/month for extraction alone.
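
The arithmetic behind that budget is easy to check. A minimal sketch, assuming one LLM-backed memory operation per interaction and a 30-day month (`monthly_cost` is a hypothetical helper, not part of any SDK):

```python
def monthly_cost(ops_per_day: int, cost_per_op: float, days: int = 30) -> float:
    """Estimate monthly spend for LLM-backed memory operations."""
    return ops_per_day * cost_per_op * days


# Per-operation prices taken from the range quoted above
low = monthly_cost(1000, 0.001)
high = monthly_cost(1000, 0.005)
print(f"${low:.0f}-${high:.0f} per month")
```

Plug in your own volume and measured per-call cost; the per-operation figure depends heavily on model choice and prompt length.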

Can I use this with the Agent SDK or just the API?

All patterns work with both. With tool use, you expose memory lookup as a custom tool, for example `retrieve_memory(query)`, and the agent calls it whenever it needs context. The API examples in this guide adapt directly: register memory retrieval as a tool instead of injecting recalled text into the system prompt.
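
A minimal sketch of what that tool could look like with the Messages API `tools` parameter. The tool name comes from the example above; the description, the `handle_tool_use` helper, and the `recall` callback are assumptions for illustration. Blocks are shown as plain dicts here, whereas the Python SDK returns objects with `.id`, `.name`, and `.input` attributes. The Agent SDK has its own mechanism for registering custom tools, so check its documentation for the exact form.

```python
# Tool definition in the shape the Messages API expects in `tools=[...]`
RETRIEVE_MEMORY_TOOL = {
    "name": "retrieve_memory",
    "description": "Look up stored memories for the current user that relate to a query.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "What to search memory for"}
        },
        "required": ["query"],
    },
}


def handle_tool_use(block: dict, user_id: str, recall) -> dict:
    """Turn a tool_use content block into the matching tool_result block.

    `recall` is any callable taking (user_id, query) and returning text,
    such as the recall() methods in the patterns in this guide.
    """
    if block["name"] != "retrieve_memory":
        raise ValueError(f"Unknown tool: {block['name']}")
    text = recall(user_id, block["input"]["query"])
    # The tool_result goes back to the model inside the next user message
    return {"type": "tool_result", "tool_use_id": block["id"], "content": text}
```

The design choice here is that the model decides when to look something up, rather than paying for recalled context on every turn.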

What if my memories get too large?

Implement a retention policy: delete memories older than N days, or keep only the top K by relevance. In production, we delete memories >90 days old and keep high-confidence preferences indefinitely. Test with old memories to ensure your agent doesn't hallucinate from stale data.
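
A retention job along those lines can be a single DELETE. This is a sketch under assumptions: it presumes a `memories` table with ISO-8601 `timestamp` strings and a `confidence` column (the pattern schemas in this guide do not all have one), and the 0.8 cut-off for "high-confidence" is a placeholder.

```python
import sqlite3
from datetime import datetime, timedelta


def prune_memories(conn, max_age_days=90, keep_confidence=0.8, now=None):
    """Delete memories older than max_age_days unless they are high-confidence.

    Returns the number of rows removed.
    """
    now = now or datetime.now()
    cutoff = (now - timedelta(days=max_age_days)).isoformat()
    # ISO-8601 strings in one consistent format sort chronologically,
    # so a plain string comparison is enough here
    cur = conn.execute(
        "DELETE FROM memories WHERE timestamp < ? AND confidence < ?",
        (cutoff, keep_confidence),
    )
    conn.commit()
    return cur.rowcount
```

Run it from a scheduled job rather than inside the chat path, so pruning never adds latency to a user request.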

How do I know which pattern to start with?

Start with Episodic (Pattern 2). It's simple, requires no external services, and covers 80% of production use cases. Add Semantic retrieval (Pattern 3) only if keyword search isn't finding what you need. Layer Preferences (Pattern 5) once you have multi-week user data.

Can multiple agents share the same memory database?

Yes. Store memories with `agent_id` in addition to `user_id`. One user might have separate agents for research, writing, and task management — each agent recalls memories relevant to its domain. They can also cross-reference when needed (e.g., writing agent asks research agent for past findings).
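
One way to lay that out, as a sketch: the table mirrors the episodic schema from Pattern 2 with an added `agent_id` column. The function names, the index name, and the agent labels are illustrative assumptions.

```python
import sqlite3
from datetime import datetime


def init_shared_db(conn):
    """Create an episodes table keyed by both user and agent."""
    conn.execute("""
        CREATE TABLE IF NOT EXISTS episodes (
            id INTEGER PRIMARY KEY,
            user_id TEXT NOT NULL,
            agent_id TEXT NOT NULL,
            timestamp TEXT,
            summary TEXT
        )
    """)
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_episodes_user_agent ON episodes (user_id, agent_id)"
    )


def store(conn, user_id, agent_id, summary, timestamp=None):
    """Save one episode on behalf of a specific agent."""
    timestamp = timestamp or datetime.now().isoformat()
    conn.execute(
        "INSERT INTO episodes (user_id, agent_id, timestamp, summary) VALUES (?, ?, ?, ?)",
        (user_id, agent_id, timestamp, summary),
    )
    conn.commit()


def recall(conn, user_id, agent_id=None, limit=3):
    """Return (agent_id, summary) rows, newest first.

    Pass agent_id to stay within one agent's domain, or leave it as None
    to cross-reference everything stored for the user.
    """
    if agent_id is None:
        cur = conn.execute(
            "SELECT agent_id, summary FROM episodes WHERE user_id = ? "
            "ORDER BY timestamp DESC LIMIT ?",
            (user_id, limit),
        )
    else:
        cur = conn.execute(
            "SELECT agent_id, summary FROM episodes WHERE user_id = ? AND agent_id = ? "
            "ORDER BY timestamp DESC LIMIT ?",
            (user_id, agent_id, limit),
        )
    return cur.fetchall()
```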

Persistent Memory for Claude Agents: 5 Patterns That Work

Build stateful agents that remember across sessions. Memory patterns: sessions, episodic storage, semantic embeddings, hybrid retrieval, and preference learning.

The Claude Agent SDK makes it easy to build autonomous agents that reason, plan, and execute. But without persistent memory, every conversation starts from zero. Your agent forgets what it learned, who it talked to, and what worked last time. It becomes reactive instead of truly autonomous.

We built a live production agent system for Like One that handles grant applications, funding research, and donor communication. It needs to remember: grant eligibility rules, past donor preferences, proposal templates that worked, and what failed. That's not just better UX — it's the difference between a toy chatbot and a system that gets smarter over time.

This guide shows five memory patterns we use in production. Pick the one that fits your agent's job, or layer multiple patterns. All examples run on the Claude API with no external services required (though we show how to scale each one).

Pattern 1: Session Context (Simplest)

Store the entire conversation history in your agent's context window. This works for short sessions (under ~50K tokens) and requires zero infrastructure.

from anthropic import Anthropic

client = Anthropic()

class SessionAgent:
    def __init__(self, system_prompt: str):
        self.system = system_prompt
        self.messages = []
    
    def chat(self, user_input: str) -> str:
        self.messages.append({"role": "user", "content": user_input})
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=1024,
            system=self.system,
            messages=self.messages
        )
        assistant_message = response.content[0].text
        self.messages.append({"role": "assistant", "content": assistant_message})
        return assistant_message

agent = SessionAgent(
    system_prompt="You are a grant research assistant. Help find funding opportunities."
)
print(agent.chat("What grants exist for AI nonprofits?"))
print(agent.chat("Show me the ones that fund open source."))

When to use: Debugging, short workflows, internal tools, anything under 2-hour session length. No infrastructure cost.

When NOT to use: Multi-day workflows, user-facing products, anything requiring conversation history after restart.
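
Because the history grows with every turn, a session agent eventually needs a guard against overrunning its token budget. A minimal sketch that drops the oldest turns first, assuming string message content and a rough 4-characters-per-token estimate (both are simplifications; `trim_history` is a hypothetical helper):

```python
def trim_history(messages, max_tokens=50_000, chars_per_token=4):
    """Keep the most recent messages that fit an approximate token budget."""
    budget = max_tokens * chars_per_token
    kept = []
    used = 0
    # Walk backwards so the newest messages are kept first
    for message in reversed(messages):
        used += len(message["content"])
        if used > budget and kept:
            break
        kept.append(message)
    kept.reverse()
    # The Messages API expects the conversation to start with a user turn
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept
```

In `SessionAgent.chat`, this could be applied as `messages=trim_history(self.messages)` just before the API call. Dropping turns loses information, which is the point at which the database-backed patterns start to pay off.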

Pattern 2: Episodic Memory (Practical)

Store summaries of past conversations in a database. Before each new session, retrieve relevant past interactions. This is what we use in production.

import json
import sqlite3
from datetime import datetime
from anthropic import Anthropic

client = Anthropic()

class EpisodicAgent:
    def __init__(self, system_prompt: str, db_path: str = "agent_memory.db"):
        self.system = system_prompt
        self.db_path = db_path
        self.messages = []
        self._init_db()
    
    def _init_db(self):
        conn = sqlite3.connect(self.db_path)
        c = conn.cursor()
        c.execute('''
            CREATE TABLE IF NOT EXISTS episodes (
                id INTEGER PRIMARY KEY,
                user_id TEXT,
                timestamp TEXT,
                summary TEXT,
                outcome TEXT,
                metadata TEXT
            )
        ''')
        conn.commit()
        conn.close()
    
    def recall(self, user_id: str, query: str, limit: int = 3) -> str:
        """Retrieve the user's most recent episodes (query is not yet used for filtering)."""
        conn = sqlite3.connect(self.db_path)
        c = conn.cursor()
        c.execute('''
            SELECT summary, outcome FROM episodes 
            WHERE user_id = ? 
            ORDER BY timestamp DESC 
            LIMIT ?
        ''', (user_id, limit))
        episodes = c.fetchall()
        conn.close()
        
        if not episodes:
            return "No prior interactions found."
        
        recall_text = "Past interactions:\n"
        for summary, outcome in episodes:
            recall_text += f"- {summary}\n  Outcome: {outcome}\n"
        return recall_text
    
    def store_episode(self, user_id: str, summary: str, outcome: str, metadata: dict = None):
        """Save this session's summary to memory."""
        conn = sqlite3.connect(self.db_path)
        c = conn.cursor()
        c.execute('''
            INSERT INTO episodes (user_id, timestamp, summary, outcome, metadata)
            VALUES (?, ?, ?, ?, ?)
        ''', (user_id, datetime.now().isoformat(), summary, outcome, json.dumps(metadata or {})))
        conn.commit()
        conn.close()
    
    def chat(self, user_id: str, user_input: str) -> str:
        # Recall relevant past episodes
        memory = self.recall(user_id, user_input)
        
        # Add recall context to system
        augmented_system = f"{self.system}\n\n{memory}"
        
        self.messages.append({"role": "user", "content": user_input})
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=1024,
            system=augmented_system,
            messages=self.messages
        )
        assistant_message = response.content[0].text
        self.messages.append({"role": "assistant", "content": assistant_message})
        return assistant_message

# Usage
agent = EpisodicAgent(
    system_prompt="You are a grant research assistant. You have access to past research sessions."
)

user_id = "user_42"
result = agent.chat(user_id, "What grants are available for climate tech?")
print(result)

# At end of session, save what happened
agent.store_episode(
    user_id=user_id,
    summary="Researched climate tech grants",
    outcome="Found 5 relevant opportunities in renewable energy",
    metadata={"category": "climate", "results": 5}
)

When to use: Production agents, multi-day workflows, user-facing apps, anything that needs to learn from past interactions. SQLite ships with Python's standard library, so there is nothing to install and no external service to run.

Cost: ~$0. Storage and retrieval run on your own infrastructure. You only pay for API calls if you ask Claude to write the session summaries or later add semantic retrieval.

Pattern 3: Semantic Memory (Smart Retrieval)

Embed past conversations and retrieve by meaning, not keyword matching. This solves the "I know I learned this but can't find it" problem.

import json
import sqlite3
from datetime import datetime
from anthropic import Anthropic

client = Anthropic()

class SemanticAgent:
    def __init__(self, system_prompt: str, db_path: str = "semantic_memory.db"):
        self.system = system_prompt
        self.db_path = db_path
        self.messages = []
        self._init_db()
    
    def _init_db(self):
        conn = sqlite3.connect(self.db_path)
        c = conn.cursor()
        # Note: In production, use sqlite-vec or pgvector for native vector storage
        c.execute('''
            CREATE TABLE IF NOT EXISTS memories (
                id INTEGER PRIMARY KEY,
                user_id TEXT,
                content TEXT,
                embedding TEXT,
                timestamp TEXT
            )
        ''')
        conn.commit()
        conn.close()
    
    def embed(self, text: str) -> list:
        """Approximate an embedding with a list of concept words extracted by Claude."""
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=1024,
            system="Extract the main concepts and themes from this text, returning them as a simple list.",
            messages=[{"role": "user", "content": f"Text: {text}\n\nMain concepts:"}]
        )
        # This is a stand-in, not a vector: the Claude API does not return embeddings.
        # In production, use a dedicated embedding model (for example Voyage or Cohere).
        return response.content[0].text.split()
    
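    def recall(self, user_id: str, query: str, limit: int = 3) -> str:
        """Retrieve memories whose stored concepts overlap most with the query."""
        conn = sqlite3.connect(self.db_path)
        c = conn.cursor()
        c.execute('''
            SELECT content, embedding FROM memories 
            WHERE user_id = ? 
            ORDER BY timestamp DESC
        ''', (user_id,))
        rows = c.fetchall()
        conn.close()
        
        # Crude stand-in for vector similarity: count concept words shared with the query
        query_terms = {w.lower().strip(".,?!") for w in query.split()}
        scored = []
        for content, embedding in rows:
            concepts = {w.lower().strip(".,-*") for w in json.loads(embedding)}
            scored.append((len(query_terms & concepts), content))
        # The sort is stable, so ties keep their newest-first order
        scored.sort(key=lambda pair: pair[0], reverse=True)
        memories = [(content,) for score, content in scored[:limit] if score > 0]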
        
        if not memories:
            return "No relevant memories found."
        
        recall_text = "Relevant memories:\n"
        for (content,) in memories:
            recall_text += f"- {content}\n"
        return recall_text
    
    def store_memory(self, user_id: str, content: str):
        conn = sqlite3.connect(self.db_path)
        c = conn.cursor()
        embedding = json.dumps(self.embed(content))
        c.execute('''
            INSERT INTO memories (user_id, content, embedding, timestamp)
            VALUES (?, ?, ?, ?)
        ''', (user_id, content, embedding, datetime.now().isoformat()))
        conn.commit()
        conn.close()
    
    def chat(self, user_id: str, user_input: str) -> str:
        memory = self.recall(user_id, user_input)
        augmented_system = f"{self.system}\n\n{memory}"
        
        self.messages.append({"role": "user", "content": user_input})
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=1024,
            system=augmented_system,
            messages=self.messages
        )
        assistant_message = response.content[0].text
        self.messages.append({"role": "assistant", "content": assistant_message})
        
        # Store this exchange
        self.store_memory(user_id, f"User: {user_input}\nAssistant: {assistant_message}")
        return assistant_message

When to use: When you need "fuzzy" memory retrieval. Agent should remember similar past problems, not just exact keyword matches.

Scaling: Replace manual embeddings with sqlite-vec (a small SQLite extension for vector search) or move to PostgreSQL + pgvector for millions of memories.
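
Once you have real embedding vectors, retrieval by meaning comes down to ranking stored vectors by cosine similarity to the query vector. A minimal pure-Python sketch, assuming vectors are stored as JSON arrays of floats in the `embedding` column and that the query vector comes from the same embedding model (`cosine_similarity` and `top_k` are illustrative names):

```python
import json
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, rows, k=3):
    """Rank (content, embedding_json) rows by similarity to query_vec."""
    scored = [
        (cosine_similarity(query_vec, json.loads(embedding)), content)
        for content, embedding in rows
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [content for _, content in scored[:k]]
```

This scans every row, which is fine for thousands of memories per user. A vector index such as sqlite-vec or pgvector becomes worthwhile when the scan itself shows up in your latency.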

Pattern 4: Hybrid Retrieval (Production-Grade)

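Combine keyword search (fast, exact) with semantic search (slower, fuzzy) to get the best of both. The example below covers the keyword half, using Claude to extract keywords at write time so that related terms are searchable later.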

import sqlite3
import json
from datetime import datetime
from anthropic import Anthropic

client = Anthropic()

class HybridAgent:
    def __init__(self, system_prompt: str, db_path: str = "hybrid_memory.db"):
        self.system = system_prompt
        self.db_path = db_path
        self.messages = []
        self._init_db()
    
    def _init_db(self):
        conn = sqlite3.connect(self.db_path)
        c = conn.cursor()
        c.execute('''
            CREATE TABLE IF NOT EXISTS memories (
                id INTEGER PRIMARY KEY,
                user_id TEXT,
                content TEXT,
                summary TEXT,
                keywords TEXT,
                timestamp TEXT
            )
        ''')
        # Add full-text search index
        c.execute('''
            CREATE VIRTUAL TABLE IF NOT EXISTS memories_fts USING fts5(
                content, keywords
            )
        ''')
        conn.commit()
        conn.close()
    
    def extract_keywords(self, text: str) -> str:
        """Use Claude to extract keywords from memory."""
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=100,
            messages=[{
                "role": "user",
                "content": f"Extract 5 key words from this: {text}. Return comma-separated."
            }]
        )
        return response.content[0].text
    
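    def recall(self, user_id: str, query: str, limit: int = 3) -> str:
        # Match individual terms; a whole-sentence LIKE pattern almost never matches.
        # Very short words are skipped as a rough stop-word filter.
        terms = [t.strip(".,?!") for t in query.split() if len(t.strip(".,?!")) > 3]
        if not terms:
            return "No relevant memories."
        
        conn = sqlite3.connect(self.db_path)
        c = conn.cursor()
        
        # Keyword search (fast): any term may appear in the keywords or the content
        clause = " OR ".join(["m.keywords LIKE ? OR m.content LIKE ?"] * len(terms))
        params = [user_id]
        for term in terms:
            params += [f"%{term}%", f"%{term}%"]
        c.execute(f'''
            SELECT DISTINCT m.content 
            FROM memories m
            WHERE m.user_id = ? AND ({clause})
            ORDER BY m.timestamp DESC
            LIMIT ?
        ''', params + [limit])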
        
        results = c.fetchall()
        conn.close()
        
        if not results:
            return "No relevant memories."
        
        recall_text = "Relevant memories:\n"
        for (content,) in results:
            recall_text += f"- {content}\n"
        return recall_text
    
    def store_memory(self, user_id: str, content: str):
        keywords = self.extract_keywords(content)
        conn = sqlite3.connect(self.db_path)
        c = conn.cursor()
        c.execute('''
            INSERT INTO memories (user_id, content, keywords, timestamp)
            VALUES (?, ?, ?, ?)
        ''', (user_id, content, keywords, datetime.now().isoformat()))
        conn.commit()
        conn.close()
    
    def chat(self, user_id: str, user_input: str) -> str:
        memory = self.recall(user_id, user_input)
        augmented_system = f"{self.system}\n\n{memory}"
        
        self.messages.append({"role": "user", "content": user_input})
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=1024,
            system=augmented_system,
            messages=self.messages
        )
        assistant_message = response.content[0].text
        self.messages.append({"role": "assistant", "content": assistant_message})
        
        self.store_memory(user_id, f"Q: {user_input}\nA: {assistant_message}")
        return assistant_message

Cost & Speed: Keyword search ~5ms. Claude keyword extraction ~$0.001 per memory. Trade compute for accuracy.
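
The schema above creates an FTS5 table, and the sketch below shows one way to actually use it for ranked keyword search. Assumptions: your SQLite build includes FTS5 (most Python distributions do, but it is not guaranteed), and a `user_id UNINDEXED` column is added so results can be filtered per user. The helper names are hypothetical.

```python
import re
import sqlite3


def init_fts(conn):
    """Create a full-text index; user_id is stored but not tokenized."""
    conn.execute("""
        CREATE VIRTUAL TABLE IF NOT EXISTS memories_fts USING fts5(
            content, keywords, user_id UNINDEXED
        )
    """)


def index_memory(conn, user_id, content, keywords):
    """Add one memory to the full-text index."""
    conn.execute(
        "INSERT INTO memories_fts (content, keywords, user_id) VALUES (?, ?, ?)",
        (content, keywords, user_id),
    )
    conn.commit()


def to_match_query(text):
    """Turn free text into a safe FTS5 query: quoted terms joined by OR."""
    terms = re.findall(r"[A-Za-z0-9]+", text)
    return " OR ".join(f'"{term}"' for term in terms)


def search(conn, user_id, query, limit=3):
    """Return the best-matching memories for a user, ranked by FTS5's bm25."""
    match = to_match_query(query)
    if not match:
        return []
    rows = conn.execute(
        "SELECT content FROM memories_fts "
        "WHERE memories_fts MATCH ? AND user_id = ? "
        "ORDER BY rank LIMIT ?",
        (match, user_id, limit),
    ).fetchall()
    return [row[0] for row in rows]
```

Quoting each term keeps punctuation in user input (question marks, hyphens, colons) from being parsed as FTS5 query syntax.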

Pattern 5: Preference Learning (Advanced)

Track what the agent learns about user preferences, then use that to guide future decisions.

import json
import sqlite3
from datetime import datetime
from anthropic import Anthropic

client = Anthropic()

class PreferenceAgent:
    def __init__(self, system_prompt: str, db_path: str = "preferences.db"):
        self.system = system_prompt
        self.db_path = db_path
        self.messages = []
        self._init_db()
    
    def _init_db(self):
        conn = sqlite3.connect(self.db_path)
        c = conn.cursor()
        c.execute('''
            CREATE TABLE IF NOT EXISTS preferences (
                id INTEGER PRIMARY KEY,
                user_id TEXT,
                preference_key TEXT,
                value TEXT,
                confidence REAL,
                timestamp TEXT
            )
        ''')
        conn.commit()
        conn.close()
    
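    def extract_preferences(self, user_id: str, conversation: str) -> dict:
        """Use Claude to extract preferences from conversation."""
        # Literal braces are doubled so the f-string does not treat them as placeholders
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=500,
            messages=[{
                "role": "user",
                "content": f"""From this conversation, extract user preferences as JSON.
                Return format: {{"<preference_key>": {{"value": "<value>", "confidence": <0.0-1.0>}}}}
                
                Conversation:
                {conversation}
                
                Preferences JSON:"""
            }]
        )
        try:
            parsed = json.loads(response.content[0].text)
        except json.JSONDecodeError:
            return {}
        # Anything other than a JSON object cannot be iterated as key/value pairs
        return parsed if isinstance(parsed, dict) else {}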
    
    def get_user_preferences(self, user_id: str) -> dict:
        conn = sqlite3.connect(self.db_path)
        c = conn.cursor()
        c.execute('''
            SELECT preference_key, value, confidence 
            FROM preferences 
            WHERE user_id = ? 
            ORDER BY timestamp ASC
        ''', (user_id,))
        prefs = {}
        for key, value, conf in c.fetchall():
            if conf > 0.6:  # Only high-confidence prefs
                prefs[key] = value  # Rows are oldest-first, so the newest value wins
        conn.close()
        return prefs
    
    def chat(self, user_id: str, user_input: str) -> str:
        prefs = self.get_user_preferences(user_id)
        
        pref_text = ""
        if prefs:
            pref_text = f"\n\nUser preferences to keep in mind: {json.dumps(prefs)}"
        
        augmented_system = self.system + pref_text
        
        self.messages.append({"role": "user", "content": user_input})
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=1024,
            system=augmented_system,
            messages=self.messages
        )
        assistant_message = response.content[0].text
        self.messages.append({"role": "assistant", "content": assistant_message})
        
        # Extract and store preferences from this exchange
        conversation = f"User: {user_input}\nAssistant: {assistant_message}"
        prefs_found = self.extract_preferences(user_id, conversation)
        
        if prefs_found:
            conn = sqlite3.connect(self.db_path)
            c = conn.cursor()
            for key, value in prefs_found.items():
                conf = value.get("confidence", 0.5) if isinstance(value, dict) else 0.5
                actual_value = value.get("value", str(value)) if isinstance(value, dict) else str(value)
                c.execute('''
                    INSERT INTO preferences (user_id, preference_key, value, confidence, timestamp)
                    VALUES (?, ?, ?, ?, ?)
                ''', (user_id, key, actual_value, conf, datetime.now().isoformat()))
            conn.commit()
            conn.close()
        
        return assistant_message

When to use: Long-term user agents. Personalization engines. Recommendation systems. The agent gets smarter the longer it knows you.
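
One practical wrinkle with preference extraction: a model's reply may wrap the JSON in a code fence or add a sentence around it, in which case a bare `json.loads` fails and the preference is silently dropped. A tolerant parser is a small safeguard. This is a sketch with a hypothetical helper name; it assumes the reply contains at most one top-level JSON object.

```python
import json


def parse_json_object(text: str) -> dict:
    """Extract the first-to-last brace span from text and parse it as a JSON object.

    Returns an empty dict when nothing parseable is found.
    """
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        return {}
    try:
        parsed = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return {}
    return parsed if isinstance(parsed, dict) else {}
```

In `extract_preferences`, this could stand in for the direct `json.loads` call on the response text.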

Comparison & Picking Your Pattern

Pattern	Storage	Retrieval Speed	Recall Quality	Cost	Best For
Session Context	RAM	Instant	Perfect (full history)	$0	Debugging, short sessions (<2h)
Episodic	SQLite	5ms	Good (summaries)	~$0	Production multi-day workflows
Semantic	SQLite+vec	50ms	Excellent (fuzzy match)	$0.001/memory	When keyword search isn't enough
Hybrid	SQLite+FTS	10ms	Excellent (best of both)	$0.001/memory	Production agents at scale
Preferences	SQLite	5ms	Improving over time	$0.001/preference	Personalization, long-term agents

Production Deployment Checklist

Before you ship:

Retention policy: How long do you keep memories? Delete old ones regularly to avoid database bloat.
Privacy: Memories contain user data. Encrypt at rest. Set data retention limits in your privacy policy.
Conflict resolution: What if preferences contradict? Use confidence scores to pick winners.
Scaling: Start with SQLite. When you hit >100K memories per user, move to PostgreSQL + pgvector.
Testing: Test with stale memories (1 month old) to ensure your agent doesn't hallucinate from outdated info.
Monitoring: Log which memories get recalled. Low recall = schema is wrong. High false positives = tweak retrieval thresholds.
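
For the conflict-resolution item, one possible rule is "highest confidence wins, and ties go to the newest entry". A minimal sketch, assuming rows shaped like the `preferences` table from Pattern 5 with ISO-8601 timestamps (`resolve_preferences` is a hypothetical helper, and the 0.6 floor matches the threshold used in that pattern):

```python
def resolve_preferences(rows, min_confidence=0.6):
    """Pick one value per preference key.

    rows: iterable of (key, value, confidence, timestamp) tuples.
    Entries at or below min_confidence are ignored.
    """
    best = {}
    for key, value, confidence, timestamp in rows:
        if confidence <= min_confidence:
            continue
        current = best.get(key)
        # Compare by confidence first, then by recency
        if current is None or (confidence, timestamp) > (current[1], current[2]):
            best[key] = (value, confidence, timestamp)
    return {key: entry[0] for key, entry in best.items()}
```

Whether confidence or recency should dominate is a product decision: a user who changed their mind last week arguably outranks a confident inference from three months ago.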

The Next Step: Agentic Memory Evolution

These five patterns work today. But the future is agents that consolidate memories ("I've learned these 50 grant eligibility rules can collapse into 3 principles"), reason about their own knowledge ("This memory is stale"), and even teach other agents.

We're already experimenting with multi-agent systems where one agent teaches another using these memory patterns. More on that in the next post.

For now: pick a pattern, implement it, measure what your agent recalls vs. forgets, and iterate. Memory is the leverage point that turns agents from impressive demos into systems that compound over time.