```mermaid
%%{init: {'theme': 'neutral', 'flowchart': {'useMaxWidth': false, 'htmlLabels': true, 'padding': 20, 'nodeSpacing': 30, 'rankSpacing': 40}, 'themeVariables': {'primaryColor': '#8B9DAF', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#6E7F91', 'secondaryColor': '#9CAF88', 'secondaryTextColor': '#ffffff', 'secondaryBorderColor': '#7A8D68', 'tertiaryColor': '#C2856E', 'tertiaryTextColor': '#ffffff', 'tertiaryBorderColor': '#A06A54', 'lineColor': '#B5A99A', 'textColor': '#4A4A4A', 'mainBkg': '#8B9DAF', 'nodeBorder': '#6E7F91', 'clusterBkg': 'rgba(139,157,175,0.12)', 'clusterBorder': '#B5A99A', 'edgeLabelBackground': 'transparent'}}}%%
flowchart TD
subgraph VIS["<b>Visible to User</b>"]
V1["Your message"]
V2["Conversation history"]
end
subgraph HID["<b>Invisible to User</b>"]
H1["System prompt (4-6K tokens)"]
H2["Tool descriptions (73 fragments)"]
H3["CLAUDE.md files (varies)"]
H4["System reminders (50 types)"]
H5["MEMORY.md (200 lines)"]
H6["MCP instructions (volatile)"]
end
API["<b>Actual API Call</b><br>(all of the above)"]
V1 --> API
V2 --> API
H1 --> API
H2 --> API
H3 --> API
H4 --> API
H5 --> API
H6 --> API
style V1 fill:#8B9DAF,color:#fff,stroke:#6E7F91
style V2 fill:#9CAF88,color:#fff,stroke:#7A8D68
style H1 fill:#C2856E,color:#fff,stroke:#A06A54
style H2 fill:#B39EB5,color:#fff,stroke:#8E7A93
style H3 fill:#C4A882,color:#fff,stroke:#A08562
style H4 fill:#8E9B7A,color:#fff,stroke:#6E7B5A
style H5 fill:#8B9DAF,color:#fff,stroke:#6E7F91
style H6 fill:#9CAF88,color:#fff,stroke:#7A8D68
style API fill:#C2856E,color:#fff,stroke:#A06A54
style VIS fill:#9CAF8822,stroke:#7A8D68
style HID fill:#C2856E22,stroke:#A06A54
```
# Transparency & Trust
*Invisible tokens, remote feature flags, and opt-out telemetry*
## Introduction: The Invisible Tax
Every time you press Enter in Claude Code, you pay for tokens you never typed. Before the model reads your message, a middleware pipeline has already injected 4,000–6,000 tokens of system prompt, appended `<system-reminder>` XML tags to your conversation, loaded CLAUDE.md files you may not know exist, and fetched feature flags from a remote CDN that can silently change the agent’s behavior. You see none of this. You pay for all of it.
This post documents the mechanisms by which Anthropic manipulates the context window behind the scenes – not as a conspiracy theory, but as an engineering analysis. Most of these mechanisms exist for legitimate reasons (safety, quality, cost optimization). But the cumulative effect is that users pay for a significant volume of invisible tokens, and Anthropic retains remote control over agent behavior through feature flags, telemetry, and auto-updating. Understanding these mechanisms is prerequisite to making informed decisions about cost, privacy, and trust.
**How to read this diagram.** Start with the two boxes on the left: the green “Visible to User” group contains only your typed message and conversation history. The terracotta “Invisible to User” group shows six additional components – system prompt, tool descriptions, CLAUDE.md files, system reminders, MEMORY.md, and MCP instructions – that are silently concatenated. All eight boxes feed into the single “Actual API Call” node at the bottom, illustrating that the request sent to the API is far larger than what the user typed.
Source files covered in this post:
| File | Purpose | Size |
|---|---|---|
| `src/constants/prompts.ts` | 403 prompt string templates (~728 KB) | ~18,000 LOC |
| `src/services/analytics/growthbook.ts` | GrowthBook feature flag client (757 flag names) | ~300 LOC |
| `src/services/analytics/` | Telemetry pipeline (Datadog, event logging, sinks) | 9 files |
| `src/memdir/` | Auto-memory system (extraction, consolidation, scanning) | 8 files |
| `src/services/autoDream/` | Cross-session memory consolidation agent | 4 files |
| `src/utils/autoUpdater.ts` | Binary auto-update mechanism | ~300 LOC |
| `src/services/remoteManagedSettings/` | Remote enterprise settings overrides | 5 files, ~1,000 LOC |
| `src/services/policyLimits/` | Organization-level feature restrictions | 2 files, ~700 LOC |
| `src/services/settingsSync/` | Cross-environment settings replication | 2 files, ~600 LOC |
## The System Prompt: 15–25 KB You Never See
The main agent’s system prompt is assembled by the `DX()` function, which concatenates 17 sections into a single prompt of approximately 15–25 KB depending on configuration. At roughly 4 characters per token, that is 4,000–6,000 tokens injected on every API call.
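A minimal sketch of what a section-concatenating assembler like `DX()` might look like. The section IDs follow the inventory below, but the types, helper names, and the ~4-characters-per-token heuristic are illustrative assumptions, not the actual implementation:

```typescript
// Hypothetical sketch of a DX()-style prompt assembler.
type PromptSection = { id: string; render: () => string };

const sections: PromptSection[] = [
  { id: "S1:identity", render: () => "You are Claude Code, Anthropic's official CLI." },
  { id: "D2:environment", render: () => "cwd=/home/user/project platform=linux" },
  // ...15 more static and dynamic sections in the real assembler
];

function assembleSystemPrompt(parts: PromptSection[]): string {
  // Drop empty (disabled) sections, join the rest with blank lines.
  return parts.map(s => s.render()).filter(s => s.length > 0).join("\n\n");
}

// Rough estimate at ~4 characters per token, as the post assumes.
function estimateTokens(prompt: string): number {
  return Math.ceil(prompt.length / 4);
}

console.log(estimateTokens(assembleSystemPrompt(sections)), "tokens (estimated)");
```

Static sections render the same bytes every session; dynamic ones depend on runtime state, which is what makes caching them tricky.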
The full extraction from v2.1.88 yielded 403 prompt strings totaling 728 KB across 18,000 lines. Of these, approximately 171 are active prompts (~321 KB), 143 are SDK reference data (~343 KB loaded on-demand), and 84 are assembly glue (~46 KB).
What is in those 17 sections?
| Section | Type | Content | Token Impact |
|---|---|---|---|
| S1: Identity | Static | “You are Claude Code, Anthropic’s official CLI” | ~200 tokens |
| S2: Tool Policy | Static | 10 rules (use Read not cat, Edit not sed, etc.) | ~400 tokens |
| S3: Anti-Patterns | Static | 6 explicit anti-patterns + 40 “NEVER” rules | ~600 tokens |
| S4: Reversibility | Static | Blast-radius framework for destructive operations | ~300 tokens |
| S5: Tool Rules | Static | Per-tool guidance, subagent delegation, skills | ~800 tokens |
| S6: Efficiency | Static | Output optimization rules | ~200 tokens |
| S7: Tone | Static | No emojis, file:line references, concise | ~150 tokens |
| S8: Cache Control | Static | Prompt cache optimization (feature-flagged) | ~100 tokens |
| D1: Memory | Dynamic | MEMORY.md (first 200 lines) + topic files | 0–2,000 tokens |
| D2: Environment | Dynamic | CWD, git status, platform, shell, OS, model | ~300 tokens |
| D3: Language | Dynamic | User language preference | ~50 tokens |
| D4: Output Style | Dynamic | Custom output configuration | ~50 tokens |
| D5: MCP | Dynamic | MCP server instructions (now migrated to reminders) | 0–5,000 tokens |
| D6: Scratchpad | Dynamic | Temp directory pointer | ~50 tokens |
| D7: Fork Config | Dynamic | A/B flags (`frc`) | ~100 tokens |
| D8: Summarize | Dynamic | Tool result hints | ~100 tokens |
| D9: Brief Mode | Dynamic | Terse mode flag | ~50 tokens |
The static sections (S1–S8) total approximately 2,750 tokens. With prompt caching, these are processed once and served at a 90% cost reduction on subsequent turns. The dynamic sections add 700–7,700 tokens depending on configuration.
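The ~2,750-token figure can be checked directly by summing the per-section estimates from the table above:

```typescript
// Consistency check: sum of the S1–S8 per-section token estimates.
const staticSectionTokens = [200, 400, 600, 300, 800, 200, 150, 100]; // S1..S8
const staticTotal = staticSectionTokens.reduce((a, b) => a + b, 0);
console.log(staticTotal); // 2750
```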
The D5: MCP section was historically the most significant cache cost driver — and the reason is specific. Regular tool responses (the output of Bash, Read, Grep, etc.) live in the conversation message stream, which comes after the cached prefix. A Bash command returning different output every turn has zero effect on the system prompt cache. But MCP server instructions were placed inside the system prompt itself — part of the cached prefix. Every other system prompt section was stable: identity text is compiled into the binary, built-in tool schemas are static, CLAUDE.md changes only on user edits, and the date uses month-level granularity. The MCP section was the only system prompt section whose bytes depended on the runtime state of external processes — servers that can crash, restart, or be added mid-session. A change to D5 invalidated the entire prefix from that point forward.
Anthropic has since mitigated this by moving MCP instructions into system reminder attachments (`mcp_instructions_delta`) rather than rebuilding the system prompt — so D5 may now be empty or stable. However, MCP tool schemas still enter the tool definition array, and each new MCP server adds tool descriptions that can change the tool block. The placement of MCP-related content at the end of both the system prompt and the tool array minimizes the blast radius of these changes.
In a 50-turn session with no MCP servers, prompt caching saves ~675,000 tokens. With MCP servers connected, the tool schema block grows (each MCP tool adds ~200–500 tokens of schema), and MCP server connect/disconnect events trigger `mcp_instructions_delta` reminders that add tokens to the conversation stream. The difference can be 1.5–3x higher input token cost depending on the number and chattiness of connected MCP servers.
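A back-of-envelope check of the ~675,000-token figure. The decomposition is an assumption, not a measurement: a ~15,000-token cached prefix (system prompt plus tool definitions), 50 turns, and a 90% discount applied to cache reads:

```typescript
// 15,000-token prefix x 50 turns x 90% discount ~= the quoted saving.
const cachedPrefixTokens = 15_000; // assumption, not a measured value
const turns = 50;
const cacheDiscount = 0.9;

// Treating every turn's prefix read as discounted, for simplicity:
const savedTokenEquivalents = cachedPrefixTokens * turns * cacheDiscount;
console.log(savedTokenEquivalents); // 675000
```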
## System Reminders: 50 Invisible Injections
Beyond the system prompt, Claude Code injects 50 distinct system-reminder fragments into conversation messages as XML tags. These are appended to user and assistant messages without the user seeing them:
```xml
<system-reminder>
Plan mode step 3 of 7: implementing the auth module
</system-reminder>
```
The injection function is minimal: ``function vv(text) { return `<system-reminder>\n${text}\n</system-reminder>` }`` (a template literal; `${text}` would not interpolate inside ordinary quotes). But the cumulative effect is significant: reminders are injected into existing messages, meaning they inflate the conversation history on every subsequent API call.
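As a runnable sketch of this injection pattern (the function name `vv` is from the source; the message-augmentation step is an assumed illustration):

```typescript
// Reconstructed reminder wrapper, using a template literal.
function vv(text: string): string {
  return `<system-reminder>\n${text}\n</system-reminder>`;
}

// Reminders are appended to existing message content, so once injected they
// ride along in the history on every subsequent API call.
const original = "Refactor the auth module";
const augmented = `${original}\n${vv("Plan mode step 3 of 7: implementing the auth module")}`;
console.log(augmented);
```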
The 50 reminder types cluster into ten categories, seven of which are shown below (see the full inventory in Post 4):
| Category | Reminders | Token Impact Per Injection |
|---|---|---|
| Plan mode | 3 variants (active, step, replanning) | ~100–300 tokens |
| File changes | Modified files, git status | ~50–200 tokens |
| Token budget | Usage warning, USD budget | ~50–100 tokens |
| Hook results | Success, blocking error, stopped, context | ~50–500 tokens |
| Memory | MEMORY.md contents, recalled facts | ~100–2,000 tokens |
| IDE context | Open file, cursor position, diagnostics | ~100–500 tokens |
| Team coordination | Teammate messages, shutdown, task status | ~100–500 tokens |
In a typical 20-turn session with plan mode active, system reminders add an estimated 2,000–5,000 cumulative tokens to the conversation history. Unlike the system prompt (which is cached), reminders are scattered throughout messages and cannot be cached separately. They are invisible to the user but visible on the API bill.
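Why the cumulative number is larger than the per-injection figures suggest: a fragment injected into the history at turn t is re-sent on every later API call. A simple cost model under that assumption (the numbers are illustrative):

```typescript
// Cumulative cost of a reminder over an n-turn session: tokens * (n - t + 1),
// because the fragment rides along in every API call from turn t onward.
type Injection = { turn: number; tokens: number };

function cumulativeReminderCost(injections: Injection[], totalTurns: number): number {
  return injections.reduce((sum, r) => sum + r.tokens * (totalTurns - r.turn + 1), 0);
}

// One 150-token plan-mode reminder injected at turn 2 of a 20-turn session:
console.log(cumulativeReminderCost([{ turn: 2, tokens: 150 }], 20)); // 2850
```

A handful of such injections easily lands in the 2,000–5,000 token range the post estimates.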
## Remote Control: 757 Feature Flags and A/B Tests
A feature flag (also called a feature toggle) is a runtime boolean or configuration value fetched from a remote server that enables or disables functionality without deploying new code. The technique is standard in web services – Netflix, GitHub, and Google use feature flags to roll out features gradually, run A/B experiments, and kill misbehaving code instantly. What is unusual here is that the same mechanism is applied to a local CLI tool running on the user’s machine: Anthropic’s servers can change how your locally-installed Claude Code behaves, without shipping an update.
Claude Code fetches feature flags from GrowthBook (`cdn.growthbook.io`) on startup. The report identifies 757 distinct `tengu_*` event and flag names. These flags control:
- **Agent behavior:** `tengu_system_prompt_global_cache` (caching strategy), `tengu_tight_weave` (subagent response format), `tengu_ultrathink` (extended thinking)
- **Feature gating:** `tengu_experimental_agent_teams` (teammate multi-agent), `tengu_structured_output_enabled` (structured output)
- **Safety overrides:** `tengu_disable_bypass_permissions_mode` (can remotely disable YOLO mode)
- **Compaction strategy:** feature flags can silently switch between three different compaction analysis prompts
- **Plan mode:** `tengu_plan_mode_interview_phase` changes how plan mode works
Many flags use obscured codenames (`tengu_marble_anvil`, `tengu_pewter_ledger`, `tengu_coral_fern`, `tengu_cobalt_lantern`) that the analysis identifies as likely A/B tests or unreleased features. The `frc` (feature-related context) section of the system prompt is explicitly labeled as “A/B flags” in the implementation.
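A sketch of how a fail-open flag lookup over a cached GrowthBook payload might work. The flag names are real (from the post); `DEFAULTS`, the payload shape, and `isEnabled` are illustrative assumptions rather than the actual client code:

```typescript
// Fail-open flag evaluation over a remotely fetched payload.
type FlagPayload = Record<string, boolean>;

const DEFAULTS: FlagPayload = {
  tengu_system_prompt_global_cache: true,
  tengu_disable_bypass_permissions_mode: false,
};

function isEnabled(name: string, remote: FlagPayload | null): boolean {
  // Prefer the remotely fetched value; fall back to compiled-in defaults
  // when the CDN fetch failed or the flag is unknown.
  if (remote && name in remote) return remote[name];
  return DEFAULTS[name] ?? false;
}

// A remote payload flip changes behavior with no new binary shipped:
console.log(isEnabled("tengu_disable_bypass_permissions_mode",
  { tengu_disable_bypass_permissions_mode: true })); // true
```

The key property is that the server-side payload, not the installed binary, decides the branch taken at runtime.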
```mermaid
%%{init: {'theme': 'neutral', 'flowchart': {'useMaxWidth': false, 'htmlLabels': true, 'padding': 20, 'nodeSpacing': 30, 'rankSpacing': 40}, 'themeVariables': {'primaryColor': '#8B9DAF', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#6E7F91', 'secondaryColor': '#9CAF88', 'secondaryTextColor': '#ffffff', 'secondaryBorderColor': '#7A8D68', 'tertiaryColor': '#C2856E', 'tertiaryTextColor': '#ffffff', 'tertiaryBorderColor': '#A06A54', 'lineColor': '#B5A99A', 'textColor': '#4A4A4A', 'mainBkg': '#8B9DAF', 'nodeBorder': '#6E7F91', 'clusterBkg': 'rgba(139,157,175,0.12)', 'clusterBorder': '#B5A99A', 'edgeLabelBackground': 'transparent'}}}%%
flowchart LR
GB["<b>GrowthBook CDN</b><br><i>cdn.growthbook.io</i>"]
F1["Prompt content"]
F2["Compaction strategy"]
F3["Feature gating"]
F4["Safety overrides"]
F5["A/B experiments"]
AGENT["<b>Your Claude Code<br>Agent</b>"]
GB --> F1
GB --> F2
GB --> F3
GB --> F4
GB --> F5
F1 --> AGENT
F2 --> AGENT
F3 --> AGENT
F4 --> AGENT
F5 --> AGENT
style GB fill:#8B9DAF,color:#fff,stroke:#6E7F91
style F1 fill:#9CAF88,color:#fff,stroke:#7A8D68
style F2 fill:#C2856E,color:#fff,stroke:#A06A54
style F3 fill:#B39EB5,color:#fff,stroke:#8E7A93
style F4 fill:#C4A882,color:#fff,stroke:#A08562
style F5 fill:#8E9B7A,color:#fff,stroke:#6E7B5A
style AGENT fill:#8B9DAF,color:#fff,stroke:#6E7F91
```
**How to read this diagram.** Follow the flow left to right: the GrowthBook CDN on the left distributes 757 feature flags that fan out into five categories – prompt content, compaction strategy, feature gating, safety overrides, and A/B experiments. All five converge on the “Your Claude Code Agent” node on the right, showing that a single remote source can modify agent behavior across multiple dimensions without shipping a new version.
The implications are significant:
- **Anthropic can change your agent’s behavior without shipping a new version.** A flag flip on `cdn.growthbook.io` takes effect the next time Claude Code starts.
- **Users may be in A/B tests without knowing it.** The obscured codenames suggest experimental treatments. Two users running the same version of Claude Code may get different behavior.
- **Safety settings can be overridden remotely.** The `tengu_disable_bypass_permissions_mode` flag means Anthropic can remotely disable YOLO mode: a safety feature, but also a demonstration of remote control over user-configured behavior.
## Telemetry: 757 Events to Six Services
Claude Code reports telemetry to six separate services:
| Service | What It Collects | Protocol |
|---|---|---|
| Segment | 757 event types: tool calls, API calls, permissions, UI interactions, errors | HTTPS |
| Datadog | Operational logs, performance metrics | HTTPS |
| Anthropic Metrics | Usage metrics (`api.anthropic.com/api/claude_code/metrics`) | HTTPS |
| Anthropic Feedback | User feedback (`api.anthropic.com/api/claude_cli_feedback`) | HTTPS |
| Session Sharing | Full conversation transcripts (`api.anthropic.com/api/claude_code_shared_session_transcripts`) | HTTPS |
| OpenTelemetry | Optional distributed tracing | OTLP |
The 757 `tengu_*` events track everything: every tool call (name, duration, success/failure), every API call (model, tokens, cost, latency), every permission decision, session lifecycle events, UI interactions (mode cycling, copy, paste, keyboard shortcuts), and all errors.
Telemetry does not inflate the conversation context; these are out-of-band HTTP calls. But they consume bandwidth, add startup latency (the GrowthBook fetch), and transmit detailed usage patterns. The `CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1` environment variable (referenced 14 times in the codebase) disables non-essential telemetry, but discovering this requires reading documentation or source code.
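A sketch of how an event sink might be gated on the documented variables. The env var names are real; `shouldSendSegmentTelemetry` is a hypothetical helper, and per the tables below either variable suffices to silence Segment:

```typescript
// Hypothetical gate for Segment analytics events, keyed on the documented
// opt-out environment variables.
type Env = Record<string, string | undefined>;

function shouldSendSegmentTelemetry(env: Env): boolean {
  return env.CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC !== "1" &&
         env.DISABLE_TELEMETRY !== "1";
}

console.log(shouldSendSegmentTelemetry({})); // true: telemetry is on by default
console.log(shouldSendSegmentTelemetry({ DISABLE_TELEMETRY: "1" })); // false
```

The empty-environment case is the point: with nothing set, the gate is open.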
## Auto-Compact: Silent History Rewriting
When the conversation approaches the context window limit, Claude Code triggers auto-compaction, a process that silently rewrites the conversation history. The compaction system:
1. Takes the entire message history.
2. Strips trailing incomplete `tool_use` blocks.
3. Sends all messages plus a summary prompt to a cheaper model (Haiku) for summarization.
4. Replaces the entire conversation history with a single user message containing the summary.
The API call uses a special header: `x-stainless-helper: compaction`. After compaction, the original conversation is gone from the active context. A `system-reminder-compact-file-reference` is injected pointing to the raw transcript file.
Users receive a `tengu_post_compact_survey_event` feedback request after compaction, but the rewrite itself happens without approval. The model that summarizes your conversation is Haiku, the cheapest model, meaning nuances, specific code snippets, and detailed reasoning may be lost in the compression.
Three different compaction strategies exist, controlled by feature flags: full-conversation analysis, minimal analysis (feature-flag-controlled), and recent-messages-only. The active strategy can change based on GrowthBook flags without user notification.
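The four-step flow can be sketched as follows. The `Message` shape and the injectable `summarize` callback stand in for the real message format and Haiku API call, which are assumptions here:

```typescript
// Sketch of the compaction flow: strip trailing incomplete tool_use blocks,
// summarize, then replace the entire history with one user message.
type Message = { role: "user" | "assistant"; content: string; pendingToolUse?: boolean };

function compact(history: Message[], summarize: (msgs: Message[]) => string): Message[] {
  // Steps 1-2: take the history and drop trailing incomplete tool_use blocks.
  const trimmed = [...history];
  while (trimmed.length > 0 && trimmed[trimmed.length - 1].pendingToolUse) trimmed.pop();
  // Steps 3-4: summarize, then replace everything with a single user message.
  return [{ role: "user", content: summarize(trimmed) }];
}

const compacted = compact(
  [
    { role: "user", content: "long session..." },
    { role: "assistant", content: "partial", pendingToolUse: true },
  ],
  msgs => `Summary of ${msgs.length} message(s)`,
);
console.log(compacted[0].content); // "Summary of 1 message(s)"
```

Note the destructive step: the return value contains only the summary, which is why the original conversation survives only in the transcript file on disk.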
## The Privacy Controls That Exist
To be fair, Claude Code does provide opt-out mechanisms:
| Control | What It Disables |
|---|---|
| `CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1` | GrowthBook flags, Segment telemetry, auto-updates |
| `DISABLE_TELEMETRY=1` | Segment analytics events |
| `CLAUDE_CODE_DISABLE_AUTO_MEMORY=1` | Automatic memory writes |
| `/privacy-settings` slash command | View and toggle data sharing preferences |
| `autoUpdates: false` in settings | Automatic binary replacement |
The `/privacy-settings` command provides a UI for viewing and updating data sharing and telemetry controls. But every default favors collection: telemetry, auto-updates, memory writes, and GrowthBook flag fetching are all active unless explicitly disabled, making these opt-out rather than opt-in controls.
## Summary
Most of these mechanisms exist for defensible reasons:
- System prompts define safety behavior and prevent the model from generating harmful output.
- System reminders counteract instruction decay, keeping the model aligned across long sessions.
- Feature flags enable continuous deployment without breaking changes.
- Telemetry helps Anthropic improve the product.
- Auto-compaction prevents sessions from crashing when the context window fills.
The concern is not that these mechanisms exist – it is that they operate without informed consent. Users do not see the system prompt, are not told about A/B tests, cannot easily audit which feature flags are active, and may not realize that their conversation history is being silently rewritten by a cheaper model during compaction.
The privacy controls exist (`CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC`, `DISABLE_TELEMETRY`, `/privacy-settings`), but they are opt-out rather than opt-in, and discovering them requires reading documentation or source code. A more transparent approach would surface the hidden token overhead in the UI (how many tokens are system prompt vs. user content), make feature flag assignments visible, and require explicit consent before rewriting conversation history.
- **Every API call carries 4,000–6,000 tokens of invisible system prompt.** Prompt caching reduces the cost by 90%, but the tokens are still there.
- **50 system reminders inject context into your messages without appearing in the conversation UI.** They accumulate across turns and cannot be cached.
- **757 feature flags fetched from GrowthBook allow Anthropic to change agent behavior remotely,** including disabling user-configured safety settings and switching A/B test treatments.
- **Auto-compaction silently rewrites your conversation using a cheaper model.** The original history is only preserved in local transcript files.
- **The privacy controls are opt-out, not opt-in.** Telemetry, auto-updates, memory writes, and flag fetching are all active by default.
- **MCP servers remain a significant cost driver, though mitigated.** Instructions have moved to system reminders (preserving the prompt cache), but each connected server still adds tool schema tokens to every turn, and connect/disconnect events inject delta reminders into the conversation stream.
## Appendix: The Enterprise Remote Control Plane
Beyond GrowthBook feature flags, Claude Code implements a three-subsystem enterprise control plane that allows organizations to centrally manage agent behavior across their deployments. These subsystems are decoupled by design but share common patterns: eligibility checks, ETag-based HTTP caching, file persistence, hourly background polling, and fail-open resilience.
### Remote Managed Settings
**What it does.** Any setting from `~/.claude/settings.json` can be remotely overridden by enterprise admins. Settings are fetched from `/api/claude_code/settings` on startup and applied with highest priority, above user, project, and local settings. Changes trigger hot-reload of environment variables and feature gates without restart.
**Control flow.** On startup, `loadRemoteManagedSettings()` checks eligibility (API key users are always eligible; OAuth requires an Enterprise/Team subscription; 3P providers are excluded), loads cached settings from `~/.claude/remote-settings.json`, fetches from the API with ETag validation and up to 5 retries, then starts hourly background polling for mid-session changes.
**Security check.** When settings contain potentially dangerous overrides (e.g., redirecting `ANTHROPIC_BASE_URL`), a blocking UI dialog (`ManagedSettingsSecurityDialog`) requires explicit user approval before applying. Rejection triggers graceful shutdown.
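The ETag-conditional fetch-with-retries pattern can be sketched as below. The retry count and 304/200 handling follow the post's description; the fetcher is injectable (an assumption) so the control flow is testable without network access:

```typescript
// Sketch of ETag-validated settings polling with retries and fail-open
// fallback to the on-disk cache.
type FetchResult = { status: number; etag?: string; body?: string };
type Fetcher = (cachedEtag?: string) => Promise<FetchResult>;

async function fetchManagedSettings(doFetch: Fetcher, cachedEtag?: string, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const res = await doFetch(cachedEtag);
      if (res.status === 304) return { changed: false, etag: cachedEtag }; // cache still valid
      if (res.status === 200) return { changed: true, etag: res.etag, body: res.body };
    } catch {
      // Network error: fall through to the next retry, and ultimately fail
      // open to the cached settings on disk.
    }
  }
  return { changed: false, etag: cachedEtag };
}
```

In the real client this would run once at startup and then again on the hourly background poll.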
### Policy Limits
**What it does.** Organization-level feature restrictions via a policy matrix. Unlike settings (which configure behavior), policies enable or disable entire features. Three known policies:
| Policy | Controls |
|---|---|
| `allow_remote_sessions` | Remote session spawning, teleport, bridge initialization |
| `allow_remote_control` | Bridge mode REPL initialization |
| `allow_product_feedback` | Feedback surveys, `/feedback` command |
**Fail-closed for HIPAA.** Most policies fail open (unknown policy = allowed). But `allow_product_feedback` is in a special `ESSENTIAL_TRAFFIC_DENY_ON_MISS` set: if the policy cache is unavailable and essential-traffic-only mode is active, the feature is denied by default to prevent accidental data exfiltration in healthcare/compliance environments.
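The fail-open vs. fail-closed distinction can be sketched as follows. The policy names and the `ESSENTIAL_TRAFFIC_DENY_ON_MISS` set are from the post; the function shape is an assumption:

```typescript
// Policy lookup: fail open on cache miss, except HIPAA-sensitive policies
// in essential-traffic-only mode, which fail closed.
const ESSENTIAL_TRAFFIC_DENY_ON_MISS = new Set(["allow_product_feedback"]);

function isPolicyAllowed(
  policy: string,
  cache: Record<string, boolean> | null, // null = policy cache unavailable
  essentialTrafficOnly: boolean,
): boolean {
  if (cache && policy in cache) return cache[policy];
  // Cache miss: deny the sensitive policies when essential-traffic-only
  // mode is active; allow everything else.
  if (essentialTrafficOnly && ESSENTIAL_TRAFFIC_DENY_ON_MISS.has(policy)) return false;
  return true;
}

console.log(isPolicyAllowed("allow_product_feedback", null, true)); // false (fail closed)
console.log(isPolicyAllowed("allow_remote_sessions", null, true));  // true (fail open)
```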
### Settings Sync
**What it does.** Cross-environment settings replication between the interactive CLI and the Cloud Control Room (CCR). The interactive CLI uploads local settings in the background; CCR downloads them before plugin installation. Four file types are synced: user `settings.json`, user CLAUDE.md, and project-scoped variants of each.
**Eligibility.** OAuth only (API key users cannot sync). Files are capped at 500 KB. Project isolation uses a hash of the git remote URL.
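The eligibility rules and project isolation can be sketched as below. The 500 KB cap and git-remote hashing are from the post; the hash algorithm, key length, and function names are assumptions:

```typescript
// Sketch of sync eligibility and project-scoped key derivation.
import { createHash } from "node:crypto";

const MAX_SYNC_BYTES = 500 * 1024; // 500 KB cap from the post

function eligibleForSync(auth: "oauth" | "apikey", fileBytes: number): boolean {
  return auth === "oauth" && fileBytes <= MAX_SYNC_BYTES;
}

// Project-scoped files are keyed by a hash of the git remote URL, so two
// checkouts of the same repo resolve to the same sync bucket.
function projectSyncKey(gitRemoteUrl: string): string {
  return createHash("sha256").update(gitRemoteUrl).digest("hex").slice(0, 16);
}

console.log(eligibleForSync("apikey", 1024)); // false: API key users cannot sync
console.log(projectSyncKey("git@github.com:org/repo.git"));
```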
### Eligibility Matrix
| User Type | Remote Settings | Policy Limits | Settings Sync |
|---|---|---|---|
| API key (Console) | Yes | Yes | No |
| OAuth Enterprise/Team | Yes | Yes | Yes |
| OAuth Paid/Plus | Yes | No | Yes |
| 3P Provider (Bedrock/Vertex/Foundry) | No | No | No |
| Custom `ANTHROPIC_BASE_URL` | No | No | No |
Source files: `src/services/remoteManagedSettings/` (7 files, ~1,000 LOC), `src/services/policyLimits/` (2 files, ~700 LOC), `src/services/settingsSync/` (2 files, ~600 LOC).
For the prompt assembly pipeline that generates these hidden fragments, see Part III.1: Prompt Assembly Pipeline. For the compaction system that rewrites conversation history, see Part III.2: Context Compaction. For the hook system that intercepts notifications, see Part III.4: Hooks & Lifecycle Events.
All analysis based on source extracted from the v2.1.88 source map. Numbers are estimates based on static analysis; actual token counts vary by session configuration.