Give Your AI Coding Agent Persistent Memory with cavemem

April 21, 2026

Every time you start a new Claude Code or Cursor session, your agent starts from zero. It has no idea what you fixed yesterday, which architectural decisions you made last week, or why you abandoned that refactoring branch. You end up re-explaining context in every session, and the agent repeats mistakes it has already made.

This tutorial walks through installing and configuring cavemem (v0.1.3): a local, cross-agent persistent memory layer built on SQLite, MCP, and a deterministic compression grammar. By the end, your agent will automatically record what happened in each session and query that history on demand, with no cloud services involved.

Requirements: Node.js 18+ and one of Claude Code, Cursor, Gemini CLI, OpenCode, or Codex.

The Problem: Agents Are Stateless by Design

LLM context windows are ephemeral. When a session ends, everything in it disappears. For short tasks this is fine, but for ongoing projects it creates real friction: time lost to re-explaining context, duplicate debugging, and agents that confidently repeat approaches you already tried and rejected.

The typical workaround is manually maintaining a CONTEXT.md or AGENTS.md file and pasting relevant sections into each session. That is brittle, easy to forget, and does not scale as a project grows.

The Common Pitfall: Giant Context Files

Most developers reach for a catch-all notes file. The problem is that agents are not great at filtering signal from noise inside a large unstructured document, and you still have to remember to update it manually after every session. The file drifts out of date and becomes less useful over time, not more.

What you actually want is a system that captures observations automatically at session boundaries, compresses them efficiently for storage, and lets the agent retrieve only the relevant slice on demand. That is exactly what cavemem does.

The Solution: cavemem

cavemem hooks into your IDE's session lifecycle events. When a session ends, a hook fires, compresses the agent's output using the caveman grammar (roughly 75% fewer prose tokens, with code, paths, and identifiers preserved byte-for-byte), and writes the result to a local SQLite database with FTS5 full-text search and an optional vector index. Four MCP tools then expose that store back to the agent at query time.

session end  →  redact <private>  →  compress  →  SQLite + FTS5
                                                         ↑
                                              MCP queries on demand

The compression is deterministic and round-trip-safe for technical content: code, paths, and identifiers come back unchanged, and humans can always read the stored text in expanded form via the web viewer. Hook handlers complete in under 150ms. Nothing leaves your machine by default.
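To make the flow concrete, here is a minimal TypeScript sketch of a session-end write path. Every name in it (onSessionEnd, redactPrivate, compressProse, db, termIndex) is hypothetical: the array stands in for the SQLite table, the term map stands in for the FTS5 index, and the compressor is a whitespace-collapsing placeholder, not the caveman grammar. Because each stage is a pure function, the same transcript always yields the same stored body.

```typescript
// Hypothetical session-end write path: redact -> compress -> store.
// `db` stands in for the SQLite table and `termIndex` for the FTS5 index;
// none of these names come from cavemem.

interface StoredObservation {
  id: number;
  sessionId: string;
  body: string;
}

const db: StoredObservation[] = [];
// term -> ids of observations containing it (null prototype avoids
// collisions with keys such as "constructor")
const termIndex: Record<string, number[]> = Object.create(null);

function redactPrivate(raw: string): string {
  // Drop every <private>...</private> span, tags included.
  return raw.replace(/<private>[\s\S]*?<\/private>/g, "");
}

function compressProse(raw: string): string {
  // Placeholder: only collapses whitespace. The caveman grammar does far more.
  return raw.replace(/\s+/g, " ").trim();
}

function onSessionEnd(sessionId: string, transcript: string): number {
  const body = compressProse(redactPrivate(transcript));
  const id = db.length + 1;
  db.push({ id, sessionId, body });
  // Index lowercase terms; "/", ".", "_" and "-" stay inside terms so paths survive.
  for (const term of body.toLowerCase().split(/[^a-z0-9_.\/-]+/)) {
    if (term === "") continue;
    const ids = termIndex[term] || (termIndex[term] = []);
    if (ids.indexOf(id) === -1) ids.push(id);
  }
  return id;
}
```

Redaction runs before anything else touches the text, so a private span never reaches either the stored body or the index.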

Step 1: Install cavemem

Install the package globally with npm (or your preferred package manager):

npm install -g cavemem

Verify the install:

cavemem --version

Then run the installer for your IDE. The installer registers the session hooks and wires the MCP server config in one command:

# Claude Code
cavemem install

# Cursor
cavemem install --ide cursor

# OpenCode
cavemem install --ide opencode

# Gemini CLI
cavemem install --ide gemini-cli

# Codex
cavemem install --ide codex

No daemon to start manually. The background worker that builds vector embeddings auto-spawns on the first hook write and self-exits when idle.
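The lazy-spawn, idle-exit lifecycle can be sketched as a small state machine. This is an illustration under assumptions, not cavemem's worker: the class name, the queue, and the injected clock (nowMs) are hypothetical, and a real worker would run as a separate process rather than an in-memory object.

```typescript
// Hypothetical lazy-spawn / idle-exit lifecycle for an embedding worker.
// Time is injected (nowMs) so the behavior is deterministic to test.

class EmbeddingWorker {
  private queue: string[] = [];
  private running = false;
  private lastActiveMs = 0;
  private readonly idleTimeoutMs: number;

  constructor(idleTimeoutMs: number) {
    this.idleTimeoutMs = idleTimeoutMs;
  }

  isRunning(): boolean {
    return this.running;
  }

  // Called by a hook after each write; the first call "spawns" the worker.
  enqueue(observationId: string, nowMs: number): void {
    this.running = true;
    this.queue.push(observationId);
    this.lastActiveMs = nowMs;
  }

  // Called periodically: embed one queued item, or exit once idle long enough.
  tick(nowMs: number, embed: (observationId: string) => void): void {
    if (!this.running) return;
    const next = this.queue.shift();
    if (next !== undefined) {
      embed(next);
      this.lastActiveMs = nowMs;
      return;
    }
    if (nowMs - this.lastActiveMs >= this.idleTimeoutMs) {
      this.running = false; // self-exit; the next enqueue respawns it
    }
  }
}
```

Passing time in explicitly keeps the idle check testable; a real implementation would typically drive tick from a timer.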

Step 2: Verify the Wiring

Run status to confirm hooks are registered, the database exists, and embedding backfill (if any) is progressing:

cavemem status

You should see output similar to:

IDE:        claude-code
Hooks:      ✓ registered
DB:         ~/.cavemem/memory.db  (0 observations)
Embeddings: local  ·  worker idle

If hooks are not registered, re-run cavemem install and check that the IDE config file was written correctly. Use cavemem doctor for a detailed diagnostic:

cavemem doctor

Step 3: Run Your First Session

Start a normal coding session in your IDE. Work on something concrete: fix a bug, refactor a module, make an architectural decision. When the session ends, the hook fires automatically.

Confirm the observation was stored:

cavemem search "refactor"

Or open the web viewer to browse all sessions in human-readable form:

cavemem viewer
# Opens http://127.0.0.1:37777

The viewer expands compressed observations back to natural language so you can audit exactly what was recorded.

Step 4: How the MCP Tools Work

cavemem exposes four MCP tools that your agent can call during a session. They were registered automatically when you ran cavemem install.

| Tool | What it returns |
| ------------------------------------------ | ------------------------------------------------------------------------- |
| search(query, limit?) | [{id, score, snippet, session_id, ts}] - BM25 + optional cosine re-rank |
| timeline(session_id, around_id?, limit?) | [{id, kind, ts}] - ordered events for a session |
| get_observations(ids[], expand?) | Full observation bodies, expanded to natural language by default |
| list_sessions(limit?) | [{id, ide, cwd, started_at, ended_at}] |

The retrieval is progressive: search and timeline return compact results cheaply, and get_observations fetches full bodies only for the IDs you actually need. This keeps token usage low even as your memory store grows.

In practice, you can instruct your agent to call search at the start of each session, using the current project directory or task description as the query. It will surface relevant past observations without you having to remember to copy-paste anything.
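The search-then-expand pattern can be sketched against an in-memory store. The function shapes loosely follow the tool table above, but store, recall, the naive term-overlap score, and the 24-character snippets are assumptions for illustration; a real agent would call the MCP tools rather than local functions.

```typescript
// Hypothetical progressive retrieval: rank cheaply, then expand only top hits.

interface SearchHit {
  id: number;
  score: number;
  snippet: string;
  session_id: string;
  ts: number;
}

interface StoredObs {
  id: number;
  session_id: string;
  ts: number;
  body: string;
}

const store: StoredObs[] = [
  { id: 1, session_id: "a", ts: 100, body: "refactor auth middleware; kept src/auth.ts API stable" },
  { id: 2, session_id: "a", ts: 200, body: "abandoned refactor of billing module: circular imports" },
  { id: 3, session_id: "b", ts: 300, body: "fixed flaky test in tests/queue.spec.ts" },
];

// Stand-in for search: counts matching query terms, returns short snippets only.
function search(query: string, limit: number = 5): SearchHit[] {
  const terms = query.toLowerCase().split(/\s+/).filter(t => t.length > 0);
  return store
    .map(o => ({
      id: o.id,
      score: terms.filter(t => o.body.toLowerCase().indexOf(t) !== -1).length,
      snippet: o.body.slice(0, 24),
      session_id: o.session_id,
      ts: o.ts,
    }))
    .filter(h => h.score > 0)
    .sort((x, y) => y.score - x.score || y.ts - x.ts) // ties: newest first
    .slice(0, limit);
}

// Stand-in for get_observations: full bodies for the requested ids only.
function getObservations(ids: number[]): StoredObs[] {
  return store.filter(o => ids.indexOf(o.id) !== -1);
}

// Agent-side pattern: cheap ranking call, then one fetch for the top hits.
function recall(query: string, topK: number): string[] {
  const hits = search(query, topK);
  return getObservations(hits.map(h => h.id)).map(o => o.body);
}
```

Only the bodies of the top topK hits ever enter the context window, which is the point of separating a cheap ranking call from the full fetch.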

Step 5: Configure Privacy and Compression

cavemem's settings live at ~/.cavemem/settings.json. View the current config:

cavemem config show
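The config set commands in this tutorial use dotted keys such as search.alpha and privacy.excludePatterns. Assuming those keys map onto nested objects in settings.json (an assumption; the file's actual schema is not documented here), a setter could look like the hypothetical setByPath helper below, which parses JSON values and falls back to a plain string for bare words like full.

```typescript
// Hypothetical: apply a dotted key ("search.alpha") to nested settings JSON.
// The nested layout of settings.json is an assumption for this sketch.

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function setByPath(settings: { [key: string]: Json }, dottedKey: string, raw: string): void {
  const parts = dottedKey.split(".");
  let node: { [key: string]: Json } = settings;
  // Walk (or create) intermediate objects for every segment but the last.
  for (let i = 0; i < parts.length - 1; i++) {
    const child = node[parts[i]];
    if (child === null || typeof child !== "object" || Array.isArray(child)) {
      node[parts[i]] = {};
    }
    node = node[parts[i]] as { [key: string]: Json };
  }
  // '["**/secrets/**"]' and 0.7 parse as JSON; bare words like full do not.
  let value: Json;
  try {
    value = JSON.parse(raw) as Json;
  } catch {
    value = raw;
  }
  node[parts[parts.length - 1]] = value;
}
```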

Redacting sensitive content is built into the write path. Wrap any text you never want stored in <private> tags inside your prompts or agent instructions:

<private>My API key is sk-...</private>

The content between those tags is stripped before the observation ever reaches the database.
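A minimal sketch of that redaction step, assuming non-nested tags. The handling of an unterminated <private> tag is a design choice made for this sketch, not documented cavemem behavior: rather than risk storing a secret, it fails closed and drops everything after a dangling opening tag.

```typescript
// Sketch of <private> redaction; non-nested tags assumed.
function redactPrivate(input: string): string {
  // Remove every complete <private>...</private> span, tags included.
  // Lazy *? stops at the nearest close tag; [\s\S] lets a span cross newlines.
  const closed = input.replace(/<private>[\s\S]*?<\/private>/g, "");
  // Fail closed: an opening tag with no matching close hides everything after it.
  const dangling = closed.indexOf("<private>");
  return dangling === -1 ? closed : closed.slice(0, dangling);
}
```

The lazy quantifier matters: a greedy match would treat the first opening tag and the last closing tag as one span and swallow the public text between two separate private sections.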

Exclude patterns keep cavemem from capturing observations about files in sensitive paths:

cavemem config set privacy.excludePatterns '["**/secrets/**", "**/.env*"]'
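Here is a minimal sketch of how patterns like these could be matched against file paths. It supports only **, *, and ?; cavemem's actual glob library and its exact semantics (dotfiles, case sensitivity, brace expansion) are assumptions not covered here.

```typescript
// Minimal glob -> RegExp conversion ("**", "*", "?" only); illustrative only.
function globToRegExp(glob: string): RegExp {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob[i];
    if (c === "*") {
      if (glob[i + 1] === "*") {
        if (glob[i + 2] === "/") {
          re += "(?:[^/]*/)*"; // "**/" = zero or more whole directories
          i += 2;
        } else {
          re += ".*"; // trailing "**" = anything, across segments
          i += 1;
        }
      } else {
        re += "[^/]*"; // "*" stays within one path segment
      }
    } else if (c === "?") {
      re += "[^/]"; // exactly one non-separator character
    } else {
      re += c.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp("^" + re + "$");
}

function isExcluded(path: string, patterns: string[]): boolean {
  return patterns.some(p => globToRegExp(p).test(path));
}
```

With the two patterns above, config/secrets/prod.json and app/.env.local are excluded, while src/secretsauce.ts and src/env.ts are not.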

Compression intensity has three levels:

cavemem config set compression.intensity lite    # light compression
cavemem config set compression.intensity full    # default, ~75% token reduction
cavemem config set compression.intensity ultra   # maximum compression

full is the right default for most projects. Use lite if you find the stored observations too terse for your taste.
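The idea of graded intensity can be illustrated with a toy word-dropping compressor. The word lists below are invented for this sketch and bear no relation to the caveman grammar's real rules; the point is only that each level removes a superset of what the level below removes.

```typescript
// Toy graded compressor; word lists are invented, not caveman's rules.
type Intensity = "lite" | "full" | "ultra";

const LITE_DROP = ["really", "basically", "just", "actually"];
const FULL_DROP = LITE_DROP.concat(["the", "a", "an", "that", "is", "was", "we"]);
const ULTRA_DROP = FULL_DROP.concat(["to", "of", "and", "it", "so"]);

const DROP_BY_LEVEL: Record<Intensity, string[]> = {
  lite: LITE_DROP,
  full: FULL_DROP,
  ultra: ULTRA_DROP,
};

// Drops listed words (case-insensitive); words with attached punctuation
// such as "the," are not matched by this simple version.
function compressAt(text: string, level: Intensity): string {
  const drop = DROP_BY_LEVEL[level];
  return text
    .split(/\s+/)
    .filter(word => word.length > 0 && drop.indexOf(word.toLowerCase()) === -1)
    .join(" ");
}
```

Running the same sentence through all three levels shows the trade-off: lite keeps the text readable, ultra keeps only the content words.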

Step 6: Enable Semantic Search (Optional)

By default, cavemem uses SQLite FTS5 keyword search (BM25). For better recall on paraphrased queries, you can enable a local vector index backed by a small embedding model:

# Check embedding status
cavemem status

# Reindex with vectors after enabling
cavemem reindex

For OpenAI embeddings (useful if you want higher-quality vectors):

cavemem config set embedding.provider openai
# Set OPENAI_API_KEY in your environment

The search.alpha setting controls the blend between BM25 and cosine similarity scores, defaulting to 0.5. Tuning it toward 1.0 weights vector results more heavily:

cavemem config set search.alpha 0.7
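A sketch of how such a blend might be computed, assuming the final score is alpha * cosine + (1 - alpha) * bm25Norm and that raw BM25 scores are min-max normalized to [0, 1] within the candidate set. Both the formula and the normalization are assumptions; cavemem's actual scoring may differ. hybridRank and Candidate are hypothetical names.

```typescript
// Hypothetical alpha-blended hybrid ranking.
interface Candidate {
  id: number;
  bm25: number; // raw BM25, higher is better in this sketch
  vector: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Returns candidate ids ordered by blended score, best first.
function hybridRank(cands: Candidate[], queryVec: number[], alpha: number): number[] {
  const lo = cands.reduce((m, c) => Math.min(m, c.bm25), Infinity);
  const hi = cands.reduce((m, c) => Math.max(m, c.bm25), -Infinity);
  const span = hi - lo;
  return cands
    .map(c => {
      // Min-max normalize BM25 so both signals live on a 0..1 scale.
      const bm25Norm = span > 0 ? (c.bm25 - lo) / span : 0;
      return { id: c.id, score: alpha * cosine(c.vector, queryVec) + (1 - alpha) * bm25Norm };
    })
    .sort((x, y) => y.score - x.score)
    .map(s => s.id);
}
```

With a low alpha the strongest keyword match wins; pushing alpha toward 1.0 lets a candidate whose embedding points the same way as the query overtake it.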

Why It Works: The Architecture

The core design decision is storing observations as compressed text in SQLite rather than raw embeddings in a vector database. This gives you several properties that matter for a long-lived development tool:

Durability. SQLite is a single file. It survives IDE reinstalls, machine migrations, and version upgrades with zero effort. There is no service to keep running.

Auditability. You can always read what is stored. The web viewer at port 37777 expands every observation. You are never locked into an opaque embedding store.

Hybrid retrieval. FTS5 BM25 search handles exact term matches (error messages, function names, file paths) quickly and precisely. The optional vector index handles semantic similarity for cases where you remember the concept but not the exact words. Combining them via a tunable alpha gives you the best of both.
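To see why exact identifiers rank well under keyword search, here is a compact Okapi BM25 scorer. The parameters k1 = 1.2 and b = 0.75 are common textbook defaults rather than cavemem or FTS5 settings, and the tokenizer is a simplification. Note that SQLite's built-in FTS5 bm25() function reports better matches as numerically lower values, whereas this sketch uses the conventional higher-is-better sign.

```typescript
// Compact Okapi BM25 (higher = better); illustrative, not FTS5's implementation.
function tokenize(s: string): string[] {
  return s
    .toLowerCase()
    .split(/[^a-z0-9_.]+/)
    .map(t => t.replace(/^\.+|\.+$/g, "")) // strip sentence dots, keep "client.ts"
    .filter(t => t.length > 0);
}

function bm25Scores(docs: string[], query: string, k1 = 1.2, b = 0.75): number[] {
  const toks = docs.map(tokenize);
  const n = toks.length;
  const avgLen = toks.reduce((sum, t) => sum + t.length, 0) / n;
  const qTerms = tokenize(query);
  return toks.map(doc => {
    let score = 0;
    for (const term of qTerms) {
      const df = toks.filter(d => d.indexOf(term) !== -1).length; // docs containing term
      if (df === 0) continue;
      const tf = doc.filter(t => t === term).length; // occurrences in this doc
      const idf = Math.log(1 + (n - df + 0.5) / (df + 0.5));
      score += (idf * (tf * (k1 + 1))) / (tf + k1 * (1 - b + (b * doc.length) / avgLen));
    }
    return score;
  });
}
```

A query for a rare token such as ECONNRESET scores zero on every observation that lacks it, which is what you want for error messages and function names; paraphrased queries are where the vector index helps.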

Token efficiency. The caveman grammar operates on prose only. Code blocks, URLs, file paths, version numbers, and identifiers are passed through byte-for-byte. The compression is lossy for filler words and verbose prose, but lossless for everything a developer actually searches for. A 150-token observation becomes roughly 38 tokens at rest, so a large memory store does not explode your retrieval costs.
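The prose-only rule can be sketched by splitting text into protected spans and prose, then compressing only the prose. The protection regex (inline code in backticks, URLs, slash-separated paths, version numbers) and the filler-word list are assumptions for illustration, not the caveman grammar.

```typescript
// Sketch: protected spans pass through byte-for-byte; only prose is squeezed.
// Regex and filler list are illustrative assumptions.
const PROTECTED = /`[^`]*`|https?:\/\/\S+|(?:[\w.-]+\/)+[\w.-]+|\bv?\d+\.\d+(?:\.\d+)*\b/g;
const FILLER = ["the", "a", "an", "really", "basically", "just", "that"];

function squeeze(prose: string): string {
  return prose
    .split(/\s+/)
    .filter(w => w.length > 0 && FILLER.indexOf(w.toLowerCase()) === -1)
    .join(" ");
}

function compressPreserving(text: string): string {
  const out: string[] = [];
  let last = 0;
  let m: RegExpExecArray | null;
  PROTECTED.lastIndex = 0; // global regex: reset before scanning
  while ((m = PROTECTED.exec(text)) !== null) {
    out.push(squeeze(text.slice(last, m.index))); // prose before the span
    out.push(m[0]); // protected span copied unchanged
    last = m.index + m[0].length;
  }
  out.push(squeeze(text.slice(last))); // trailing prose
  return out.filter(s => s.length > 0).join(" ");
}
```

In this sketch a version like v0.1.3, a backticked call, and a path such as src/queue/worker.ts come out exactly as written, while filler words in the surrounding prose are dropped.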

The search → timeline → get_observations progression mirrors how you would use a search engine: get ranked results, understand temporal context, then fetch full content only for the hits worth reading. This keeps the agent's context window usage proportional to relevance, not to the total size of your history.

Exporting and Maintaining the Store

Export all observations to JSONL for backup or offline analysis:

cavemem export ~/cavemem-backup.jsonl
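Because JSONL stores one JSON object per line, the export is easy to post-process. The sketch below assumes each line carries id and session_id fields, borrowed from the MCP tool table; that mapping is an assumption, so check a real export before relying on it. parseJsonl and countBySession are hypothetical helpers.

```typescript
// Sketch: parse a JSONL export and count observations per session.
// Field names are assumed, not confirmed from a real cavemem export.
interface ExportedObservation {
  id: number | string;
  session_id: string;
  [key: string]: unknown;
}

function parseJsonl(contents: string): ExportedObservation[] {
  return contents
    .split(/\r?\n/)
    .filter(line => line.trim().length > 0) // skip blank lines, e.g. a trailing newline
    .map(line => JSON.parse(line) as ExportedObservation);
}

function countBySession(rows: ExportedObservation[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const row of rows) {
    counts[row.session_id] = (counts[row.session_id] || 0) + 1;
  }
  return counts;
}
```

In Node you could feed it the backup file with readFileSync(path, "utf8") from the node:fs module.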

If you upgrade cavemem or change embedding providers, rebuild the index:

cavemem reindex

To remove cavemem from an IDE without deleting the database:

cavemem uninstall --ide cursor

The SQLite file at ~/.cavemem/memory.db is never touched by uninstall. Your history persists.
