Memory Hierarchy

Five tiers of persistent memory – from MEMORY.md project files to server-backed team sync – and the cache-hierarchy analogy that explains the design

memory
context-engineering
persistence
team-sync

Introduction: Beyond the Context Window

Posts 03 and 04 dissected how Claude Code manages the ephemeral state inside a single conversation: prompt assembly fills the context window, compaction evicts stale messages when the window overflows. But ephemeral state has a fundamental limitation – it dies with the session. Close the terminal, and the agent forgets everything: who the user is, what approaches failed, which external systems matter, and what conventions the team agreed on last week.

Production agents need persistent memory – state that survives across sessions, accumulates knowledge over time, and, ideally, shares that knowledge across a team. Claude Code implements this through a five-tier memory hierarchy spread across 1,736 lines of code in src/memdir/ (8 files) and four dedicated service modules. The hierarchy mirrors, with striking fidelity, the CPU cache hierarchy that every computer architecture course teaches: each tier trades scope for persistence, and latency for sharing.

This post maps the full hierarchy from the fastest, most local tier (project memory files on disk) to the slowest, most broadly shared tier (server-backed team memory), examining the data structures, safety constraints, and scheduling gates at each level.

Source files covered in this post:

| File | Purpose | Size |
|------|---------|------|
| src/memdir/memdir.ts | Memory constraints, caps, and truncation | ~300 LOC |
| src/memdir/memoryTypes.ts | Four-type taxonomy (user, feedback, project, reference) | ~100 LOC |
| src/memdir/memoryScan.ts | Memory file discovery and scanning | ~200 LOC |
| src/memdir/paths.ts | Memory directory path resolution | ~100 LOC |
| src/memdir/findRelevantMemories.ts | Relevance-based memory retrieval | ~200 LOC |
| src/services/extractMemories/extractMemories.ts | End-of-turn memory extraction agent | ~300 LOC |
| src/services/extractMemories/prompts.ts | Extraction prompt templates (10 sections) | ~200 LOC |
| src/services/autoDream/autoDream.ts | Cross-session memory consolidation | ~300 LOC |
| src/services/autoDream/consolidationPrompt.ts | Consolidation phase instructions | ~150 LOC |
| src/services/autoDream/consolidationLock.ts | Consolidation lock management | ~100 LOC |
| src/services/teamMemorySync/index.ts | Team memory sync protocol (ETags, checksums) | ~400 LOC |
| src/services/teamMemorySync/secretScanner.ts | Secret scanning before team upload | ~200 LOC |
| src/services/MagicDocs/ | Automatic documentation maintenance (internal-only) | 2 files |

The Five-Tier Hierarchy at a Glance

The memory system comprises five distinct tiers, each with a different trigger, scope, writer mechanism, and persistence model. The analogy to hardware caches is intentional and illuminating:

| Tier | Name | Analogy | Scope | Writer | Trigger |
|------|------|---------|-------|--------|---------|
| T1 | Project Memory (MEMORY.md) | L1 cache | Per-project | Main agent | Explicit user request |
| T2 | Session Memory | L2 cache | Per-session | Background subagent | Token/tool-call thresholds |
| T3 | End-of-Turn Extraction | Write-back buffer | Per-project (durable) | Restricted forked agent | End of query loop |
| T4 | Dream Consolidation | Nightly batch archival | Per-project (archival) | Restricted forked agent | Time + session-count gates |
| T5 | Team Memory Sync | Shared L3 / distributed cache | Per-repo (team-wide) | Server-backed sync | File-watcher + pull/push |

%%{init: {'theme': 'neutral', 'flowchart': {'useMaxWidth': false, 'htmlLabels': true, 'padding': 20, 'nodeSpacing': 30, 'rankSpacing': 40}, 'themeVariables': {'primaryColor': '#8B9DAF', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#6E7F91', 'secondaryColor': '#9CAF88', 'secondaryTextColor': '#ffffff', 'secondaryBorderColor': '#7A8D68', 'tertiaryColor': '#C2856E', 'tertiaryTextColor': '#ffffff', 'tertiaryBorderColor': '#A06A54', 'lineColor': '#B5A99A', 'textColor': '#4A4A4A', 'mainBkg': '#8B9DAF', 'nodeBorder': '#6E7F91', 'clusterBkg': 'rgba(139,157,175,0.12)', 'clusterBorder': '#B5A99A', 'edgeLabelBackground': 'transparent'}}}%%
flowchart TD
  T1["<b>T1: Project Memory</b> (MEMORY.md)<br><i>Per-project -- Explicit save -- 200 lines / 25KB cap</i>"]
  T2["<b>T2: Session Memory</b><br><i>Per-session -- Background extraction -- 12K token cap</i>"]
  T3["<b>T3: End-of-Turn Extraction</b><br><i>Per-project durable -- Restricted subagent -- Max 5 turns</i>"]
  T4["<b>T4: Dream Consolidation</b><br><i>Per-project archival -- 24h + 5 sessions gate -- Lock-protected</i>"]
  T5["<b>T5: Team Memory Sync</b><br><i>Per-repo team-wide -- Server-backed -- Delta upload -- Secret scanning</i>"]

  T1 --> T2 --> T3 --> T4 --> T5

  style T1 fill:#8B9DAF,color:#fff,stroke:#6E7F91
  style T2 fill:#9CAF88,color:#fff,stroke:#7A8D68
  style T3 fill:#C2856E,color:#fff,stroke:#A06A54
  style T4 fill:#B39EB5,color:#fff,stroke:#8E7A93
  style T5 fill:#C4A882,color:#fff,stroke:#A08562
Figure 1: The five-tier memory hierarchy arranged top-to-bottom from fastest/most-local to slowest/most-shared. T1: Project Memory (MEMORY.md, per-project, explicit save, 200-line/25KB cap). T2: Session Memory (per-session, background extraction, 12K token cap). T3: End-of-Turn Extraction (per-project durable, restricted subagent, max 5 turns). T4: Dream Consolidation (per-project archival, 24h + 5 sessions gate, lock-protected). T5: Team Memory Sync (per-repo team-wide, server-backed, delta upload, secret scanning – highlighted in terracotta). Arrows indicate primary data flow: knowledge migrates outward from T1 to T5 as it is consolidated and shared.

How to read this diagram. Read top to bottom, from the fastest and most local tier to the slowest and most shared. T1 (Project Memory) at the top is loaded into every API call; T5 (Team Memory Sync) at the bottom is server-backed and shared across an entire team. The downward arrows represent the primary data flow: knowledge migrates outward from local project memory through session memory, end-of-turn extraction, and dream consolidation, ultimately reaching team-wide sync. Each tier’s label includes its scope, writer mechanism, and size constraints.


Tier 1: Project Memory – The L1 Cache

The fastest, most local tier is the MEMORY.md file and its associated topic files, stored in the auto-memory directory at ~/.claude/projects/<sanitized-git-root>/memory/. This tier is the agent’s equivalent of an L1 cache: small, fast to read, always loaded into context.

Structure and Constraints

MEMORY.md is not a freeform notebook. It is an index – a line-per-entry manifest that points to topic files:

- [User role](user_role.md) -- senior backend engineer, Go specialist
- [Testing policy](feedback_testing.md) -- always use real DB, never mock
- [Auth rewrite](project_auth.md) -- driven by compliance, not tech debt

The index is subject to hard caps defined in memdir.ts:

export const MAX_ENTRYPOINT_LINES = 200
export const MAX_ENTRYPOINT_BYTES = 25_000  // ~25KB

When either cap is exceeded, the truncateEntrypointContent function truncates at the cap boundary and appends a warning instructing the agent to keep entries concise. The byte cap targets a specific failure mode: long-line indexes that slip past the line cap (observed at p100: 197KB under 200 lines).
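A minimal sketch of that dual-cap truncation, assuming the cap constants shown above; the warning text and the exact cut behavior of the real truncateEntrypointContent are not shown in the source, so both are illustrative:

```typescript
const MAX_ENTRYPOINT_LINES = 200
const MAX_ENTRYPOINT_BYTES = 25_000

// Illustrative warning string; the real one instructs the agent to keep
// index entries concise.
const TRUNCATION_WARNING =
  '\n<!-- MEMORY.md truncated: keep index entries concise -->'

function truncateEntrypointContent(content: string): string {
  let lines = content.split('\n')
  let truncated = false

  // Line cap: keep only the first MAX_ENTRYPOINT_LINES lines.
  if (lines.length > MAX_ENTRYPOINT_LINES) {
    lines = lines.slice(0, MAX_ENTRYPOINT_LINES)
    truncated = true
  }

  // Byte cap: catches the long-line failure mode (a file under 200 lines
  // but far over 25KB). ASCII-safe slice for the sketch.
  let result = lines.join('\n')
  if (Buffer.byteLength(result, 'utf8') > MAX_ENTRYPOINT_BYTES) {
    result = result.slice(0, MAX_ENTRYPOINT_BYTES)
    truncated = true
  }

  return truncated ? result + TRUNCATION_WARNING : result
}
```

Note that the byte check runs after the line check: a 197KB file under 200 lines sails through the first guard and is caught by the second.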

Four-Type Taxonomy

Memories are constrained to a closed four-type taxonomy defined in memoryTypes.ts:

  1. user – the user’s role, goals, preferences, expertise level
  2. feedback – guidance on approach, both corrections and confirmations
  3. project – ongoing work, deadlines, decisions not derivable from code
  4. reference – pointers to external systems (Linear projects, Grafana dashboards)

The taxonomy is deliberately exclusionary. Information that is derivable from the current project state – code patterns, architecture, git history, file structure – is explicitly banned. The rationale is that derived information decays: a memory claiming “the auth module is in src/auth/” becomes dangerously stale after a refactor, while grep and git log remain authoritative. Each topic file carries typed frontmatter:


---
name: Testing policy
description: Integration tests must hit a real database, not mocks
type: feedback
---

Integration tests must use a real database connection.

**Why:** Prior incident where mock/prod divergence masked a broken migration.
**How to apply:** Any test file touching DB access must connect to the test database.

Recall-Side Safety

A dedicated section in the memory prompt (TRUSTING_RECALL_SECTION in memoryTypes.ts) guards against a subtle failure mode: the agent recommending something from memory that no longer exists. The instruction is precise – if a memory names a file path, check the file exists; if it names a function, grep for it. The section was eval-validated: without it, the agent went 0/3 on cases where a remembered function had been renamed; with it, 3/3.


Tier 2: Session Memory – The L2 Cache

Session memory is a within-session scratchpad maintained by a background subagent. While T1 project memory persists across sessions, session memory captures the state of the current conversation: what is being worked on, what commands were run, what errors were encountered. It feeds directly into the compaction system (Post 04): when context is compacted, session memory provides the structured summary that replaces the lost messages.

Threshold-Based Triggering

Session memory extraction does not run on every turn. It is gated by a dual-threshold system defined in sessionMemoryUtils.ts:

export const DEFAULT_SESSION_MEMORY_CONFIG: SessionMemoryConfig = {
  minimumMessageTokensToInit: 10000,   // ~10K tokens before first extraction
  minimumTokensBetweenUpdate: 5000,    // ~5K token growth between updates
  toolCallsBetweenUpdates: 3,          // at least 3 tool calls between updates
}

The shouldExtractMemory function implements a conjunction-with-fallback policy: extraction triggers when both thresholds are met (tokens AND tool calls), OR when the token threshold is met and the last assistant turn has no tool calls (a natural conversation break). The token threshold is always required – even if the tool-call count is met, extraction does not fire until enough new context has accumulated. This prevents excessive extractions during rapid tool-use sequences.
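The policy can be sketched as follows, using the threshold values from the config above; the parameter shape is an assumption, not the real function signature:

```typescript
// State accumulated since the last session-memory extraction.
interface ExtractionState {
  newTokens: number            // tokens added since the last extraction
  newToolCalls: number         // tool calls since the last extraction
  lastTurnHadToolCalls: boolean
}

const MIN_TOKENS_BETWEEN_UPDATES = 5_000
const MIN_TOOL_CALLS_BETWEEN_UPDATES = 3

function shouldExtractMemory(s: ExtractionState): boolean {
  // The token threshold is always required.
  if (s.newTokens < MIN_TOKENS_BETWEEN_UPDATES) return false
  // Either enough tool calls have accumulated...
  if (s.newToolCalls >= MIN_TOOL_CALLS_BETWEEN_UPDATES) return true
  // ...or the last assistant turn used no tools (a natural break).
  return !s.lastTurnHadToolCalls
}
```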

Structured Template

The session memory file is written to a fixed path at ~/.claude/sessions/{session}/session_memory.md and follows a fixed template with ten sections (defined in prompts.ts):

| Section | Purpose |
|---------|---------|
| Session Title | 5–10 word distinctive title |
| Current State | What is actively being worked on |
| Task Specification | What the user asked to build |
| Files and Functions | Important files and their roles |
| Workflow | Bash commands and their interpretation |
| Errors & Corrections | Failures and fixes |
| Codebase and System Documentation | System components and architecture |
| Learnings | Insights and patterns discovered during the session |
| Key Results | Important outputs and deliverables produced |
| Worklog | Step-by-step terse summary |

Each section has a per-section cap of 2,000 tokens and the total file is capped at 12,000 tokens. When a section exceeds its cap, the extraction prompt includes explicit instructions to condense. The subagent is restricted to a single tool: FileEditTool targeting only the session memory file path. It cannot read arbitrary files, run commands, or write anywhere else.


Tier 3: End-of-Turn Extraction – The Write-Back Buffer

The most architecturally interesting tier is the end-of-turn extraction system in extractMemories.ts. It runs once at the end of each complete query loop – when the model produces a final response with no tool calls – and writes durable memories to the auto-memory directory. Think of it as a write-back buffer: the agent accumulates observations during the conversation (in its ephemeral context), and the extraction agent periodically flushes the worth-remembering subset to persistent storage.

The Forked Agent Pattern

Extraction uses the runForkedAgent pattern – a perfect fork of the main conversation that shares the parent’s prompt cache. This is a cost optimization: the forked agent inherits the system prompt and tool definitions from the parent’s cached prefix, so it pays only for the extraction prompt and any new tokens it generates. The fork is configured with:

  • skipTranscript: true – the extraction agent’s messages are not recorded to the session transcript (avoiding race conditions with the main thread)
  • maxTurns: 5 – a hard cap preventing verification rabbit-holes from burning turns

Restricted Tool Permissions

The extraction agent operates under a strict permission sandbox defined by createAutoMemCanUseTool:

%%{init: {'theme': 'neutral', 'flowchart': {'useMaxWidth': false, 'htmlLabels': true, 'padding': 20, 'nodeSpacing': 30, 'rankSpacing': 40}, 'themeVariables': {'primaryColor': '#8B9DAF', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#6E7F91', 'secondaryColor': '#9CAF88', 'secondaryTextColor': '#ffffff', 'secondaryBorderColor': '#7A8D68', 'tertiaryColor': '#C2856E', 'tertiaryTextColor': '#ffffff', 'tertiaryBorderColor': '#A06A54', 'lineColor': '#B5A99A', 'textColor': '#4A4A4A', 'mainBkg': '#8B9DAF', 'nodeBorder': '#6E7F91', 'clusterBkg': 'rgba(139,157,175,0.12)', 'clusterBorder': '#B5A99A', 'edgeLabelBackground': 'transparent'}}}%%
flowchart TD
  subgraph UNRESTRICTED["Unrestricted (read-only)"]
    READ["<b>Read</b> -- unrestricted"]
    GREP["<b>Grep</b> -- unrestricted"]
    GLOB["<b>Glob</b> -- unrestricted"]
  end

  subgraph SCOPED["Scoped (memory dir only)"]
    EDIT["<b>Edit</b> -- memory dir only"]
    WRITE["<b>Write</b> -- memory dir only"]
    BASH_RO["<b>Bash</b> -- read-only only"]
  end

  subgraph DENIED["Denied"]
    BASH_W["<b>Bash (write)</b> -- denied"]
    OTHER["<b>All other tools</b> -- denied"]
  end

  style READ fill:#8B9DAF,color:#fff,stroke:#6E7F91
  style GREP fill:#9CAF88,color:#fff,stroke:#7A8D68
  style GLOB fill:#C2856E,color:#fff,stroke:#A06A54
  style EDIT fill:#B39EB5,color:#fff,stroke:#8E7A93
  style WRITE fill:#C4A882,color:#fff,stroke:#A08562
  style BASH_RO fill:#8E9B7A,color:#fff,stroke:#6E7B5A
  style BASH_W fill:#8B9DAF,color:#fff,stroke:#6E7F91
  style OTHER fill:#9CAF88,color:#fff,stroke:#7A8D68
Figure 2: Tool permissions for the end-of-turn extraction subagent, organized into three tiers. Unrestricted (read-only): Read, Grep, and Glob operate without restrictions. Scoped (memory directory only): Edit and Write are limited to the auto-memory directory via isAutoMemPath() validation; Bash is restricted to read-only commands (ls, find, grep, cat, stat, wc, head, tail). Denied (terracotta): write-capable Bash and all other tools are blocked entirely. This same permission sandbox is shared between T3 extraction and T4 dream consolidation agents.

How to read this diagram. The three subgroups represent three permission tiers, arranged from most permissive (top) to most restrictive (bottom). The “Unrestricted” group grants full read-only access to Read, Grep, and Glob. The “Scoped” group allows Edit, Write, and Bash only within narrow bounds – the memory directory for Edit/Write, and read-only commands for Bash. The “Denied” group blocks write-capable Bash and all other tools entirely. This same three-tier sandbox applies to both T3 extraction and T4 dream consolidation agents.

The permission function (createAutoMemCanUseTool) is shared between T3 extraction and T4 dream consolidation. Read/Grep/Glob are unrestricted (inherently read-only). Bash is allowed only when tool.isReadOnly(parsed.data) returns true – limiting it to commands like ls, find, grep, cat, stat, wc, head, tail. Edit and Write are allowed only when the target file_path passes isAutoMemPath(), which verifies (after normalization) that the path starts with the auto-memory directory prefix.
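A condensed sketch of that three-tier check, assuming a simplified tool-call shape (the real function receives full tool objects and parsed Zod input):

```typescript
import * as path from 'node:path'

type ToolCall = { name: string; filePath?: string; isReadOnly?: boolean }

const READ_ONLY_TOOLS = new Set(['Read', 'Grep', 'Glob'])
const SCOPED_WRITE_TOOLS = new Set(['Edit', 'Write'])

function createAutoMemCanUseTool(autoMemDir: string) {
  return (call: ToolCall): boolean => {
    // Tier 1: inherently read-only tools are unrestricted.
    if (READ_ONLY_TOOLS.has(call.name)) return true
    // Tier 2: Edit/Write only inside the auto-memory directory;
    // Bash only when the parsed command is read-only.
    if (SCOPED_WRITE_TOOLS.has(call.name)) {
      return call.filePath !== undefined &&
        isAutoMemPath(call.filePath, autoMemDir)
    }
    if (call.name === 'Bash') return call.isReadOnly === true
    // Tier 3: everything else is denied.
    return false
  }
}

// Normalize before prefix-checking so '..' segments cannot escape the dir.
function isAutoMemPath(filePath: string, autoMemDir: string): boolean {
  const resolved = path.resolve(filePath)
  const dir = path.resolve(autoMemDir)
  return resolved === dir || resolved.startsWith(dir + path.sep)
}
```

The default-deny final branch is what makes the sandbox robust: any tool added to the agent later is blocked until someone deliberately carves out an exception.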

Mutual Exclusion with the Main Agent

A subtle but important design choice: when the main agent writes memories directly (because the user explicitly said “remember this”), the extraction agent skips that range entirely. The hasMemoryWritesSince function scans assistant messages after the cursor for Edit/Write tool-use blocks targeting auto-memory paths. If found, the extraction cursor advances past those messages without spawning a forked agent. This prevents duplicate memories and wasted API calls.
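The scan can be sketched as below; the message shape is an assumption (real assistant messages carry tool_use content blocks rather than a flat array):

```typescript
type ToolUse = { name: string; input: { file_path?: string } }
type AssistantMessage = { toolUses: ToolUse[] }

// Returns true if any assistant message at or after the cursor already
// wrote to an auto-memory path, in which case extraction is skipped.
function hasMemoryWritesSince(
  messages: AssistantMessage[],
  cursor: number,
  isAutoMemPath: (p: string) => boolean,
): boolean {
  return messages.slice(cursor).some(msg =>
    msg.toolUses.some(
      t =>
        (t.name === 'Edit' || t.name === 'Write') &&
        t.input.file_path !== undefined &&
        isAutoMemPath(t.input.file_path),
    ),
  )
}
```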

Coalescing and Trailing Runs

If a new extraction request arrives while one is already in progress, the system does not queue it. Instead, it stashes the latest context and runs a single trailing extraction after the current one finishes. The trailing run computes newMessageCount relative to the cursor the first run advanced, so it processes only the delta. This is a classic coalescing optimization – the same pattern used in I/O schedulers to merge adjacent write requests.
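A minimal sketch of the coalescing pattern, with illustrative names (the real code also recomputes the cursor-relative delta for the trailing run):

```typescript
class CoalescingRunner<T> {
  private running = false
  private pending: T | null = null

  constructor(private run: (ctx: T) => Promise<void>) {}

  async request(ctx: T): Promise<void> {
    if (this.running) {
      // Don't queue: overwrite, so only the latest context survives.
      this.pending = ctx
      return
    }
    this.running = true
    try {
      await this.run(ctx)
      // One trailing run covers everything that arrived while we were busy.
      while (this.pending !== null) {
        const next = this.pending
        this.pending = null
        await this.run(next)
      }
    } finally {
      this.running = false
    }
  }
}
```

Three requests arriving in quick succession produce two runs, not three: the first, plus a single trailing run for the latest stashed context.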


Tier 4: Dream Consolidation – Nightly Batch Archival

The fourth tier is autoDream – a background consolidation pass that reviews memories accumulated across multiple sessions and reorganizes them. The name is apt: like sleep consolidation in neuroscience, it operates when the agent is not actively processing tasks, synthesizing fragmented observations into coherent long-term knowledge.

Note

Rollout status: T4 Dream Consolidation is fully implemented in v2.1.88 and available via the /dream slash command. Auto-dream (the automatic background trigger) is gated by a server-side GrowthBook flag (tengu_onyx_plover) that defaults to off. Users can opt in via the autoDreamEnabled setting in settings.json.

Triple Gate

AutoDream fires only when all three gates pass, evaluated in cheapest-first order to minimize per-turn cost:

  1. Time gate: hours since last consolidation >= minHours (default: 24 hours). This requires only one stat call on the lock file.
  2. Session gate: number of transcript files with mtime > lastConsolidatedAt >= minSessions (default: 5 sessions). This requires a directory scan, so it is gated behind a 10-minute scan throttle to avoid repeated scanning when the time gate passes but the session gate does not.
  3. Lock gate: no other process is mid-consolidation. The lock file (~/.claude/projects/<slug>/memory/.consolidate-lock) stores the holder’s PID. A stale-PID reclamation mechanism handles crashes: if the PID is dead or the lock is older than 1 hour, the next process reclaims it.

const DEFAULTS: AutoDreamConfig = {
  minHours: 24,
  minSessions: 5,
}
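The cheapest-first evaluation order can be sketched with the gate checks injected as callbacks (the helper names are illustrative; the real code stats the lock file and scans the transcript directory directly):

```typescript
interface GateDeps {
  hoursSinceLastConsolidation: () => number    // one stat() on the lock file
  sessionsSinceLastConsolidation: () => number // directory scan (throttled)
  tryAcquireLock: () => boolean                // lock file with holder PID
}

const DEFAULTS = { minHours: 24, minSessions: 5 }

function shouldDream(deps: GateDeps, cfg = DEFAULTS): boolean {
  // Time gate first: a single stat, so it runs every turn cheaply.
  if (deps.hoursSinceLastConsolidation() < cfg.minHours) return false
  // Session gate second: the directory scan only happens once the time
  // gate has already passed.
  if (deps.sessionsSinceLastConsolidation() < cfg.minSessions) return false
  // Lock gate last: acquire only when consolidation will actually run.
  return deps.tryAcquireLock()
}
```

Short-circuiting is the point: when the time gate fails, the directory scan never runs at all.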

Consolidation Prompt

The consolidation prompt (consolidationPrompt.ts) instructs the forked agent through four phases:

  1. Orient – ls the memory directory, read MEMORY.md, skim existing topic files
  2. Gather recent signal – review daily logs, check for drifted memories, grep transcripts narrowly
  3. Consolidate – merge new signal into existing topic files (not create duplicates), convert relative dates to absolute, delete contradicted facts
  4. Prune and index – keep MEMORY.md under 200 lines and 25KB, demote verbose entries, resolve contradictions

The agent operates under the same createAutoMemCanUseTool permission sandbox as T3 extraction. Bash is restricted to read-only commands – the prompt states this restriction explicitly so the agent does not waste turns probing for write access it will never be granted.

Rollback on Failure

If the forked agent fails, the lock file’s mtime is rolled back to its pre-acquisition value via rollbackConsolidationLock. This ensures the time gate passes again on the next turn, rather than the failure appearing as a successful consolidation that delays the next attempt by 24 hours.


Tier 5: Team Memory Sync – The Distributed Cache

The outermost tier is team memory sync (teamMemorySync/index.ts), a server-backed system that shares memories across all authenticated organization members working on the same repository. This is the distributed shared cache of the hierarchy – the slowest, most broadly scoped, and most complex tier.

Note

Rollout status: T5 Team Memory Sync is fully implemented in v2.1.88 but not yet publicly announced or documented. It is double-gated: a compile-time feature('TEAMMEM') flag (enabled in this build) and a server-side GrowthBook flag (tengu_herring_clock) that defaults to off. It requires OAuth authentication and a GitHub remote. Production bug fixes in the code (e.g., a device that emitted 167K push events over 2.5 days) suggest it has been active for internal or early access testing.

Sync Semantics

The API contract is built around a single endpoint:

GET  /api/claude_code/team_memory?repo={owner/repo}              → pull
GET  /api/claude_code/team_memory?repo={owner/repo}&view=hashes  → checksums only
PUT  /api/claude_code/team_memory?repo={owner/repo}              → push (upsert)

The semantics are deliberately asymmetric:

  • Pull is server-wins: remote entries overwrite local files. If a teammate pushed a correction, the next pull replaces the local version unconditionally.
  • Push is local-wins-on-conflict: if a 412 (Precondition Failed) occurs during push, the system probes server checksums, recomputes the delta excluding keys where the teammate’s push matches ours, and retries. This preserves the local user’s active edit.
  • Deletions do not propagate: deleting a local file does not remove it from the server; the next pull restores it locally.

Delta Upload and Batching

Push does not upload all local files. It computes a delta by comparing sha256:<hex> content hashes of local files against serverChecksums (populated from the server’s entryChecksums response). Only keys whose hash differs are included in the PUT. This is analogous to a cache write-back policy: only dirty lines are flushed.

Large deltas are split into batches under a 200KB body-size cap (MAX_PUT_BODY_BYTES) to stay under the API gateway’s limit. Each batch is an independent PUT with upsert semantics – if batch N fails, batches 1..N-1 are already committed. The serverChecksums map is updated after each successful batch, so a retry naturally resumes from the uncommitted tail.
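The delta computation and greedy batching can be sketched as follows; the hash format and the MAX_PUT_BODY_BYTES cap come from the source, while the map-based shapes are assumptions:

```typescript
import { createHash } from 'node:crypto'

const MAX_PUT_BODY_BYTES = 200_000

function contentHash(content: string): string {
  return 'sha256:' + createHash('sha256').update(content, 'utf8').digest('hex')
}

// Only keys whose local hash differs from the server checksum are dirty.
function computeDelta(
  local: Map<string, string>,           // key -> file content
  serverChecksums: Map<string, string>, // key -> 'sha256:<hex>'
): Map<string, string> {
  const delta = new Map<string, string>()
  for (const [key, content] of local) {
    if (serverChecksums.get(key) !== contentHash(content)) {
      delta.set(key, content)
    }
  }
  return delta
}

// Greedy split into batches that each stay under the body-size cap.
function batchDelta(delta: Map<string, string>): Array<Map<string, string>> {
  const batches: Array<Map<string, string>> = []
  let current = new Map<string, string>()
  let size = 0
  for (const [key, content] of delta) {
    const entryBytes = Buffer.byteLength(content, 'utf8')
    if (size + entryBytes > MAX_PUT_BODY_BYTES && current.size > 0) {
      batches.push(current)
      current = new Map()
      size = 0
    }
    current.set(key, content)
    size += entryBytes
  }
  if (current.size > 0) batches.push(current)
  return batches
}
```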

Upload Format and Secret Scanning

The upload format is JSON, not the local markdown files directly. Memory entries are serialized into a JSON payload before transmission, which decouples the on-disk representation from the wire protocol and allows the server to store and index entries in a structured format.

Before any file is uploaded, it passes through a secret scanner (secretScanner.ts) using patterns derived from gitleaks. Files containing detected secrets are silently excluded from the upload – they never leave the machine. Only the rule ID (not the secret value or file path) is logged for analytics.

const secretMatches = scanForSecrets(content)
if (secretMatches.length > 0) {
  const firstMatch = secretMatches[0]
  skippedSecrets.push({
    path: relPath,
    ruleId: firstMatch.ruleId,
    label: firstMatch.label,
  })
  return  // file excluded from upload
}

Conflict Resolution

On a 412 conflict, the push logic executes a lightweight resolution cycle:

  1. Probe GET ?view=hashes to refresh per-key checksums (no bodies – saves bandwidth)
  2. Recompute the delta against the refreshed checksums (keys where a teammate’s concurrent push matches ours are naturally excluded)
  3. Retry the PUT with the tighter delta

This cycle repeats up to MAX_CONFLICT_RETRIES = 2 times. The probe-and-recompute approach avoids full content downloads during conflict resolution – a significant optimization when the team memory contains hundreds of kilobytes of content.
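The retry loop can be sketched with the HTTP layer abstracted into callbacks; MAX_CONFLICT_RETRIES comes from the source, while the status-code handling shown is an assumption:

```typescript
const MAX_CONFLICT_RETRIES = 2

interface SyncOps {
  put: (delta: Map<string, string>) => Promise<number>  // HTTP status
  probeHashes: () => Promise<Map<string, string>>       // GET ?view=hashes
  computeDelta: (checksums: Map<string, string>) => Map<string, string>
}

async function pushWithConflictRetry(
  ops: SyncOps,
  initialDelta: Map<string, string>,
): Promise<boolean> {
  let delta = initialDelta
  for (let attempt = 0; attempt <= MAX_CONFLICT_RETRIES; attempt++) {
    // An empty delta means a teammate's push already matches ours.
    if (delta.size === 0) return true
    const status = await ops.put(delta)
    if (status !== 412) return status >= 200 && status < 300
    // 412: refresh per-key checksums (no bodies) and retry with a
    // tighter delta.
    delta = ops.computeDelta(await ops.probeHashes())
  }
  return false
}
```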


How Memory Feeds Into Context

The memory hierarchy does not exist in isolation – it feeds directly into the prompt assembly pipeline described in Post 03. The integration points are:

  1. T1 (MEMORY.md): loaded into the system prompt via loadMemoryPrompt() during prompt assembly. The truncated index content is included verbatim in every API call. This is the “always-hot” cache line.

  2. T2 (Session Memory): consumed by the compaction system (Post 04). When auto-compact triggers, it reads the session memory file and includes its content as a structured summary in the compacted conversation prefix. This is how compacted sessions retain continuity despite losing the original messages.

  3. T3 (Extracted Memories): written to disk as topic files. On the next session, these files are discoverable via scanMemoryFiles and can be recalled by the agent when relevant. The formatMemoryManifest function formats the directory listing as a text manifest (one line per file with type tag, filename, timestamp, and description) that the agent can search.

  4. T4 (Dream Consolidation): operates on the same files as T3 but reorganizes them. Its output feeds into T1 (updated MEMORY.md index) and T3 (merged topic files).

  5. T5 (Team Memory): pulled to disk before prompt assembly. Team memory files are loaded alongside individual memory files via the combined prompt builder (buildCombinedMemoryPrompt), which presents both directories with scope guidance (private vs. team).


Safety Across Tiers

Memory writers at every tier operate under principle-of-least-privilege constraints:

| Tier | Writer | Constraint |
|------|--------|------------|
| T1 | Main agent | Full tool access but explicit-save-only policy; even explicit requests are filtered (no activity logs, no derivable facts) |
| T2 | Session memory subagent | Single tool: FileEditTool on exactly one file path |
| T3 | Extraction forked agent | Read/Grep/Glob unrestricted; Bash read-only; Edit/Write scoped to memory dir |
| T4 | Dream forked agent | Same as T3; additionally lock-gated and abort-controllable |
| T5 | Team sync service | Secret scanning before upload; path traversal validation on pull; 250KB per-file cap; server-side entry-count limits |

The path validation in isAutoMemPath deserves emphasis. It normalizes the candidate path before comparison, preventing .. traversal attacks. The settings-level override (autoMemoryDirectory) is restricted to trusted sources – policy, local, and user settings – explicitly excluding projectSettings (committed .claude/settings.json), because a malicious repository could otherwise set autoMemoryDirectory: "~/.ssh" and gain write access to sensitive directories through the auto-memory file-write carve-out.
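The trusted-source restriction can be sketched as a resolver that consults only non-project settings layers; the source names follow the post, while the lookup shape is an assumption:

```typescript
type SettingsSource = 'policy' | 'user' | 'local' | 'project'

// projectSettings (committed .claude/settings.json) is deliberately
// excluded: a malicious repo must not redirect memory writes.
const TRUSTED_MEMORY_DIR_SOURCES: ReadonlySet<SettingsSource> =
  new Set(['policy', 'user', 'local'])

function resolveAutoMemoryDirectory(
  overrides: Array<{ source: SettingsSource; dir: string }>,
  defaultDir: string,
): string {
  for (const { source, dir } of overrides) {
    if (TRUSTED_MEMORY_DIR_SOURCES.has(source)) return dir
  }
  return defaultDir
}
```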


The Cache Hierarchy Analogy Revisited

The hardware cache analogy is not just pedagogical – it reveals structural properties of the design:

| Cache Property | Memory Hierarchy Analogue |
|----------------|---------------------------|
| L1 is always consulted first | MEMORY.md is loaded into every API call |
| L2 is filled from L1 misses | Session memory captures context that overflows the window |
| Write-back coalesces writes | Extraction coalesces adjacent turns before flushing |
| Nightly compaction/defrag | Dream consolidation merges and prunes across sessions |
| Cache coherence protocol | Team sync’s server-wins pull + delta push maintain consistency |
| Inclusion property | MEMORY.md index is a strict subset of topic file knowledge |
| Cold start penalty | First session in a new project has empty memory; subsequent sessions benefit from accumulated tiers |

The hierarchy also exhibits the characteristic miss penalty asymmetry of hardware caches. A T1 miss (memory not in MEMORY.md) is cheap – the agent can grep the memory directory. A T5 miss (knowledge not shared with the team) is expensive – someone else must discover it independently. The system is designed so that each tier’s background processes progressively push knowledge outward, reducing the probability and cost of misses at outer levels.


Summary

Claude Code’s memory hierarchy is a case study in applying classical systems design to LLM agents. The five tiers – project memory, session memory, end-of-turn extraction, dream consolidation, and team sync – form a coherent system where each level trades scope for persistence, mirroring the cache hierarchies that have organized computer memory for decades. The safety model is equally deliberate: restricted tool permissions, path validation, secret scanning, and size caps ensure that background memory writers cannot escape their sandbox. Together with the compaction system (Post 04) and prompt assembly pipeline (Post 03), the memory hierarchy completes the context engineering story – managing not just what the model sees in this conversation, but what it remembers across all conversations.


Appendix: MagicDocs – Automatic Documentation Maintenance

MagicDocs is an internal-only (USER_TYPE === 'ant') background agent that automatically keeps markdown files up-to-date with information from conversations. It is a specialized memory writer that targets documentation rather than the MEMORY.md system.

Detection and Trigger

When FileReadTool reads a file, MagicDocs checks for a header pattern: # MAGIC DOC: <title>. If detected, the file is registered for tracking. An optional italicized line immediately below the header provides custom instructions for the update agent.
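The detection step can be sketched with a pair of regexes; the exact patterns are assumptions matching the header format described above:

```typescript
// Matches '# MAGIC DOC: <title>' anywhere in the file.
const MAGIC_DOC_HEADER = /^# MAGIC DOC: (.+)$/m
// Optional italicized instruction line immediately below the header.
const CUSTOM_INSTRUCTIONS = /^# MAGIC DOC: .+\n_([^_\n]+)_$/m

function detectMagicDoc(
  content: string,
): { title: string; instructions?: string } | null {
  const header = content.match(MAGIC_DOC_HEADER)
  if (!header) return null
  const instructions = content.match(CUSTOM_INSTRUCTIONS)
  return { title: header[1], instructions: instructions?.[1] }
}
```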

Updates are triggered via a post-sampling hook – they run after the model produces output, never during active conversation. Updates only fire when the conversation is idle (no pending tool calls in the last assistant turn) and the query originates from the main REPL thread (not subagents).

Update Process

For each tracked document, MagicDocs:

  1. Clones the FileStateCache to isolate its reads from the main session
  2. Re-reads the document to verify the header still exists (removes tracking if not)
  3. Builds an update prompt using a template (overridable at ~/.claude/magic-docs/prompt.md)
  4. Runs a forked Sonnet subagent with access to only FileEditTool, restricted to editing only the target file

The update prompt enforces a current-state documentation philosophy: update information in-place, remove outdated content rather than appending “Previously…” notes, fix typos and formatting, and never reference “documentation updates” or “magic docs” in the output. A sequential() wrapper prevents race conditions when multiple magic docs exist.

Source files: src/services/MagicDocs/magicDocs.ts (detection, tracking, update lifecycle), src/services/MagicDocs/prompts.ts (update prompt template with {variable} substitution).