```mermaid
%%{init: {'theme': 'neutral', 'flowchart': {'useMaxWidth': false, 'htmlLabels': true, 'padding': 20, 'nodeSpacing': 30, 'rankSpacing': 40}, 'themeVariables': {'primaryColor': '#8B9DAF', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#6E7F91', 'secondaryColor': '#9CAF88', 'secondaryTextColor': '#ffffff', 'secondaryBorderColor': '#7A8D68', 'tertiaryColor': '#C2856E', 'tertiaryTextColor': '#ffffff', 'tertiaryBorderColor': '#A06A54', 'lineColor': '#B5A99A', 'textColor': '#4A4A4A', 'mainBkg': '#8B9DAF', 'nodeBorder': '#6E7F91', 'clusterBkg': 'rgba(139,157,175,0.12)', 'clusterBorder': '#B5A99A', 'edgeLabelBackground': 'transparent'}}}%%
flowchart TD
subgraph VIS["<b>Visible to User</b>"]
V1["Your message"]
V2["Conversation history"]
end
subgraph HID["<b>Invisible to User</b>"]
H1["System prompt (4-6K tokens)"]
H2["Tool descriptions (73 fragments)"]
H3["CLAUDE.md files (varies)"]
H4["System reminders (50 types)"]
H5["MEMORY.md (200 lines)"]
H6["MCP instructions (volatile)"]
end
API["<b>Actual API Call</b><br>(all of the above)"]
V1 --> API
V2 --> API
H1 --> API
H2 --> API
H3 --> API
H4 --> API
H5 --> API
H6 --> API
style V1 fill:#8B9DAF,color:#fff,stroke:#6E7F91
style V2 fill:#9CAF88,color:#fff,stroke:#7A8D68
style H1 fill:#C2856E,color:#fff,stroke:#A06A54
style H2 fill:#B39EB5,color:#fff,stroke:#8E7A93
style H3 fill:#C4A882,color:#fff,stroke:#A08562
style H4 fill:#8E9B7A,color:#fff,stroke:#6E7B5A
style H5 fill:#8B9DAF,color:#fff,stroke:#6E7F91
style H6 fill:#9CAF88,color:#fff,stroke:#7A8D68
style API fill:#C2856E,color:#fff,stroke:#A06A54
style VIS fill:#9CAF8822,stroke:#7A8D68
style HID fill:#C2856E22,stroke:#A06A54
```
# Transparency & Trust
*Invisible tokens, remote feature flags, and opt-out telemetry*
## Introduction: The Invisible Tax
Every time you press Enter in Claude Code, you pay for tokens you never typed. Before the model reads your message, a middleware pipeline has already injected 4,000–6,000 tokens of system prompt, appended `<system-reminder>` XML tags to your conversation, loaded CLAUDE.md files you may not know exist, and fetched feature flags from a remote CDN that can silently change the agent’s behavior. You see none of this. You pay for all of it.
This post documents the mechanisms by which Anthropic manipulates the context window behind the scenes – not as a conspiracy theory, but as an engineering analysis. Most of these mechanisms exist for legitimate reasons (safety, quality, cost optimization). But the cumulative effect is that users pay for a significant volume of invisible tokens, and Anthropic retains remote control over agent behavior through feature flags, telemetry, and auto-updating. Understanding these mechanisms is prerequisite to making informed decisions about cost, privacy, and trust.
**How to read this diagram.** Start with the two boxes on the left: the green “Visible to User” group contains only your typed message and conversation history. The terracotta “Invisible to User” group shows six additional components – system prompt, tool descriptions, CLAUDE.md files, system reminders, MEMORY.md, and MCP instructions – that are silently concatenated. All eight boxes feed into the single “Actual API Call” node at the bottom, illustrating that the request sent to the API is far larger than what the user typed.
Source files covered in this post:
| File | Purpose | Size |
|---|---|---|
| `src/constants/prompts.ts` | 403 prompt string templates (~728 KB) | ~18,000 LOC |
| `src/services/analytics/growthbook.ts` | GrowthBook feature flag client (757 flag names) | ~300 LOC |
| `src/services/analytics/` | Telemetry pipeline (Datadog, event logging, sinks) | 9 files |
| `src/memdir/` | Auto-memory system (extraction, consolidation, scanning) | 8 files |
| `src/services/autoDream/` | Cross-session memory consolidation agent | 4 files |
| `src/utils/autoUpdater.ts` | Binary auto-update mechanism | ~300 LOC |
| `src/services/remoteManagedSettings/` | Remote enterprise settings overrides | 5 files, ~1,000 LOC |
| `src/services/policyLimits/` | Organization-level feature restrictions | 2 files, ~700 LOC |
| `src/services/settingsSync/` | Cross-environment settings replication | 2 files, ~600 LOC |
## The System Prompt: 15–25 KB You Never See
The main agent’s system prompt is assembled by the `DX()` function, which concatenates 17 sections into a single prompt of approximately 15–25 KB depending on configuration. At roughly 4 characters per token, that is 4,000–6,000 tokens injected on every API call.
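A minimal sketch of what a section-concatenating assembler like `DX()` might look like. The section IDs follow the inventory below, but the types, helper names, and the ~4-characters-per-token heuristic are illustrative assumptions, not the actual implementation:

```typescript
// Hypothetical sketch of a DX()-style prompt assembler.
type PromptSection = { id: string; render: () => string };

const sections: PromptSection[] = [
  { id: "S1:identity", render: () => "You are Claude Code, Anthropic's official CLI." },
  { id: "D2:environment", render: () => "cwd=/home/user/project platform=linux" },
  // ...15 more static and dynamic sections in the real assembler
];

function assembleSystemPrompt(parts: PromptSection[]): string {
  // Drop empty (disabled) sections, join the rest with blank lines.
  return parts.map(s => s.render()).filter(s => s.length > 0).join("\n\n");
}

// Rough estimate at ~4 characters per token, as the post assumes.
function estimateTokens(prompt: string): number {
  return Math.ceil(prompt.length / 4);
}

console.log(estimateTokens(assembleSystemPrompt(sections)), "tokens (estimated)");
```

Static sections render the same bytes every session; dynamic ones depend on runtime state, which is what makes caching them tricky.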
The full extraction from v2.1.88 yielded 403 prompt strings totaling 728 KB across 18,000 lines. Of these, approximately 171 are active prompts (~321 KB), 143 are SDK reference data (~343 KB loaded on-demand), and 84 are assembly glue (~46 KB).
What is in those 17 sections?
| Section | Type | Content | Token Impact |
|---|---|---|---|
| S1: Identity | Static | “You are Claude Code, Anthropic’s official CLI” | ~200 tokens |
| S2: Tool Policy | Static | 10 rules (use Read not cat, Edit not sed, etc.) | ~400 tokens |
| S3: Anti-Patterns | Static | 6 explicit anti-patterns + 40 “NEVER” rules | ~600 tokens |
| S4: Reversibility | Static | Blast-radius framework for destructive operations | ~300 tokens |
| S5: Tool Rules | Static | Per-tool guidance, subagent delegation, skills | ~800 tokens |
| S6: Efficiency | Static | Output optimization rules | ~200 tokens |
| S7: Tone | Static | No emojis, file:line references, concise | ~150 tokens |
| S8: Cache Control | Static | Prompt cache optimization (feature-flagged) | ~100 tokens |
| D1: Memory | Dynamic | MEMORY.md (first 200 lines) + topic files | 0–2,000 tokens |
| D2: Environment | Dynamic | CWD, git status, platform, shell, OS, model | ~300 tokens |
| D3: Language | Dynamic | User language preference | ~50 tokens |
| D4: Output Style | Dynamic | Custom output configuration | ~50 tokens |
| D5: MCP | Dynamic | MCP server instructions (now migrated to reminders) | 0–5,000 tokens |
| D6: Scratchpad | Dynamic | Temp directory pointer | ~50 tokens |
| D7: Fork Config | Dynamic | A/B flags (`frc`) | ~100 tokens |
| D8: Summarize | Dynamic | Tool result hints | ~100 tokens |
| D9: Brief Mode | Dynamic | Terse mode flag | ~50 tokens |
The static sections (S1–S8) total approximately 2,750 tokens. With prompt caching, these are processed once and served at a 90% cost reduction on subsequent turns. The dynamic sections add 700–7,700 tokens depending on configuration.
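The ~2,750-token figure can be checked directly by summing the per-section estimates from the table above:

```typescript
// Consistency check: sum of the S1–S8 per-section token estimates.
const staticSectionTokens = [200, 400, 600, 300, 800, 200, 150, 100]; // S1..S8
const staticTotal = staticSectionTokens.reduce((a, b) => a + b, 0);
console.log(staticTotal); // 2750
```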
The D5: MCP section was historically the most significant cache cost driver — and the reason is specific. Regular tool responses (the output of Bash, Read, Grep, etc.) live in the conversation message stream, which comes after the cached prefix. A Bash command returning different output every turn has zero effect on the system prompt cache. But MCP server instructions were placed inside the system prompt itself — part of the cached prefix. Every other system prompt section was stable: identity text is compiled into the binary, built-in tool schemas are static, CLAUDE.md changes only on user edits, and the date uses month-level granularity. The MCP section was the only system prompt section whose bytes depended on the runtime state of external processes — servers that can crash, restart, or be added mid-session. A change to D5 invalidated the entire prefix from that point forward.
Anthropic has since mitigated this by moving MCP instructions into system reminder attachments (`mcp_instructions_delta`) rather than rebuilding the system prompt — so D5 may now be empty or stable. However, MCP tool schemas still enter the tool definition array, and each new MCP server adds tool descriptions that can change the tool block. The placement of MCP-related content at the end of both the system prompt and the tool array minimizes the blast radius of these changes.
In a 50-turn session with no MCP servers, prompt caching saves ~675,000 tokens. With MCP servers connected, the tool schema block grows (each MCP tool adds ~200–500 tokens of schema), and MCP server connect/disconnect events trigger `mcp_instructions_delta` reminders that add tokens to the conversation stream. The difference can be 1.5–3x higher input token cost depending on the number and chattiness of connected MCP servers.
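A back-of-envelope check of the ~675,000-token figure. The decomposition is an assumption, not a measurement: a ~15,000-token cached prefix (system prompt plus tool definitions), 50 turns, and a 90% discount applied to cache reads:

```typescript
// 15,000-token prefix x 50 turns x 90% discount ~= the quoted saving.
const cachedPrefixTokens = 15_000; // assumption, not a measured value
const turns = 50;
const cacheDiscount = 0.9;

// Treating every turn's prefix read as discounted, for simplicity:
const savedTokenEquivalents = cachedPrefixTokens * turns * cacheDiscount;
console.log(savedTokenEquivalents); // 675000
```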
## System Reminders: 50 Invisible Injections
Beyond the system prompt, Claude Code injects 50 distinct system-reminder fragments into conversation messages as XML tags. These are appended to user and assistant messages without the user seeing them:
```xml
<system-reminder>
Plan mode step 3 of 7: implementing the auth module
</system-reminder>
```
The injection function is minimal: ``function vv(text) { return `<system-reminder>\n${text}\n</system-reminder>` }`` (a template literal; `${text}` would not interpolate inside ordinary quotes). But the cumulative effect is significant: reminders are injected into existing messages, meaning they inflate the conversation history on every subsequent API call.
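As a runnable sketch of this injection pattern (the function name `vv` is from the source; the message-augmentation step is an assumed illustration):

```typescript
// Reconstructed reminder wrapper, using a template literal.
function vv(text: string): string {
  return `<system-reminder>\n${text}\n</system-reminder>`;
}

// Reminders are appended to existing message content, so once injected they
// ride along in the history on every subsequent API call.
const original = "Refactor the auth module";
const augmented = `${original}\n${vv("Plan mode step 3 of 7: implementing the auth module")}`;
console.log(augmented);
```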
The 50 reminder types cluster into ten categories, seven of which are shown below (see the full inventory in Post 4):
| Category | Reminders | Token Impact Per Injection |
|---|---|---|
| Plan mode | 3 variants (active, step, replanning) | ~100–300 tokens |
| File changes | Modified files, git status | ~50–200 tokens |
| Token budget | Usage warning, USD budget | ~50–100 tokens |
| Hook results | Success, blocking error, stopped, context | ~50–500 tokens |
| Memory | MEMORY.md contents, recalled facts | ~100–2,000 tokens |
| IDE context | Open file, cursor position, diagnostics | ~100–500 tokens |
| Team coordination | Teammate messages, shutdown, task status | ~100–500 tokens |
In a typical 20-turn session with plan mode active, system reminders add an estimated 2,000–5,000 cumulative tokens to the conversation history. Unlike the system prompt (which is cached), reminders are scattered throughout messages and cannot be cached separately. They are invisible to the user but visible on the API bill.
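Why the cumulative number is larger than the per-injection figures suggest: a fragment injected into the history at turn t is re-sent on every later API call. A simple cost model under that assumption (the numbers are illustrative):

```typescript
// Cumulative cost of a reminder over an n-turn session: tokens * (n - t + 1),
// because the fragment rides along in every API call from turn t onward.
type Injection = { turn: number; tokens: number };

function cumulativeReminderCost(injections: Injection[], totalTurns: number): number {
  return injections.reduce((sum, r) => sum + r.tokens * (totalTurns - r.turn + 1), 0);
}

// One 150-token plan-mode reminder injected at turn 2 of a 20-turn session:
console.log(cumulativeReminderCost([{ turn: 2, tokens: 150 }], 20)); // 2850
```

A handful of such injections easily lands in the 2,000–5,000 token range the post estimates.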
## Remote Control: 757 Feature Flags and A/B Tests
A feature flag (also called a feature toggle) is a runtime boolean or configuration value fetched from a remote server that enables or disables functionality without deploying new code. The technique is standard in web services – Netflix, GitHub, and Google use feature flags to roll out features gradually, run A/B experiments, and kill misbehaving code instantly. What is unusual here is that the same mechanism is applied to a local CLI tool running on the user’s machine: Anthropic’s servers can change how your locally-installed Claude Code behaves, without shipping an update.
Claude Code fetches feature flags from GrowthBook (`cdn.growthbook.io`) on startup. The report identifies 757 distinct `tengu_*` event and flag names. These flags control:
- **Agent behavior:** `tengu_system_prompt_global_cache` (caching strategy), `tengu_tight_weave` (subagent response format), `tengu_ultrathink` (extended thinking)
- **Feature gating:** `tengu_experimental_agent_teams` (teammate multi-agent), `tengu_structured_output_enabled` (structured output)
- **Safety overrides:** `tengu_disable_bypass_permissions_mode` (can remotely disable YOLO mode)
- **Compaction strategy:** feature flags can silently switch between three different compaction analysis prompts
- **Plan mode:** `tengu_plan_mode_interview_phase` changes how plan mode works
Many flags use obscured codenames (`tengu_marble_anvil`, `tengu_pewter_ledger`, `tengu_coral_fern`, `tengu_cobalt_lantern`) that the analysis identifies as likely A/B tests or unreleased features. The `frc` (feature-related context) section of the system prompt is explicitly labeled as “A/B flags” in the implementation.
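A sketch of how a fail-open flag lookup over a cached GrowthBook payload might work. The flag names are real (from the post); `DEFAULTS`, the payload shape, and `isEnabled` are illustrative assumptions rather than the actual client code:

```typescript
// Fail-open flag evaluation over a remotely fetched payload.
type FlagPayload = Record<string, boolean>;

const DEFAULTS: FlagPayload = {
  tengu_system_prompt_global_cache: true,
  tengu_disable_bypass_permissions_mode: false,
};

function isEnabled(name: string, remote: FlagPayload | null): boolean {
  // Prefer the remotely fetched value; fall back to compiled-in defaults
  // when the CDN fetch failed or the flag is unknown.
  if (remote && name in remote) return remote[name];
  return DEFAULTS[name] ?? false;
}

// A remote payload flip changes behavior with no new binary shipped:
console.log(isEnabled("tengu_disable_bypass_permissions_mode",
  { tengu_disable_bypass_permissions_mode: true })); // true
```

The key property is that the server-side payload, not the installed binary, decides the branch taken at runtime.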
```mermaid
%%{init: {'theme': 'neutral', 'flowchart': {'useMaxWidth': false, 'htmlLabels': true, 'padding': 20, 'nodeSpacing': 30, 'rankSpacing': 40}, 'themeVariables': {'primaryColor': '#8B9DAF', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#6E7F91', 'secondaryColor': '#9CAF88', 'secondaryTextColor': '#ffffff', 'secondaryBorderColor': '#7A8D68', 'tertiaryColor': '#C2856E', 'tertiaryTextColor': '#ffffff', 'tertiaryBorderColor': '#A06A54', 'lineColor': '#B5A99A', 'textColor': '#4A4A4A', 'mainBkg': '#8B9DAF', 'nodeBorder': '#6E7F91', 'clusterBkg': 'rgba(139,157,175,0.12)', 'clusterBorder': '#B5A99A', 'edgeLabelBackground': 'transparent'}}}%%
flowchart LR
GB["<b>GrowthBook CDN</b><br><i>cdn.growthbook.io</i>"]
F1["Prompt content"]
F2["Compaction strategy"]
F3["Feature gating"]
F4["Safety overrides"]
F5["A/B experiments"]
AGENT["<b>Your Claude Code<br>Agent</b>"]
GB --> F1
GB --> F2
GB --> F3
GB --> F4
GB --> F5
F1 --> AGENT
F2 --> AGENT
F3 --> AGENT
F4 --> AGENT
F5 --> AGENT
style GB fill:#8B9DAF,color:#fff,stroke:#6E7F91
style F1 fill:#9CAF88,color:#fff,stroke:#7A8D68
style F2 fill:#C2856E,color:#fff,stroke:#A06A54
style F3 fill:#B39EB5,color:#fff,stroke:#8E7A93
style F4 fill:#C4A882,color:#fff,stroke:#A08562
style F5 fill:#8E9B7A,color:#fff,stroke:#6E7B5A
style AGENT fill:#8B9DAF,color:#fff,stroke:#6E7F91
```
**How to read this diagram.** Follow the flow left to right: the GrowthBook CDN on the left distributes 757 feature flags that fan out into five categories – prompt content, compaction strategy, feature gating, safety overrides, and A/B experiments. All five converge on the “Your Claude Code Agent” node on the right, showing that a single remote source can modify agent behavior across multiple dimensions without shipping a new version.
The implications are significant:
- **Anthropic can change your agent’s behavior without shipping a new version.** A flag flip on `cdn.growthbook.io` takes effect the next time Claude Code starts.
- **Users may be in A/B tests without knowing it.** The obscured codenames suggest experimental treatments. Two users running the same version of Claude Code may get different behavior.
- **Safety settings can be overridden remotely.** The `tengu_disable_bypass_permissions_mode` flag means Anthropic can remotely disable YOLO mode: a safety feature, but also a demonstration of remote control over user-configured behavior.
## Telemetry: 757 Events to Six Services
Claude Code reports telemetry to six separate services:
| Service | What It Collects | Protocol |
|---|---|---|
| Segment | 757 event types: tool calls, API calls, permissions, UI interactions, errors | HTTPS |
| Datadog | Operational logs, performance metrics | HTTPS |
| Anthropic Metrics | Usage metrics (`api.anthropic.com/api/claude_code/metrics`) | HTTPS |
| Anthropic Feedback | User feedback (`api.anthropic.com/api/claude_cli_feedback`) | HTTPS |
| Session Sharing | Full conversation transcripts (`api.anthropic.com/api/claude_code_shared_session_transcripts`) | HTTPS |
| OpenTelemetry | Optional distributed tracing | OTLP |
The 757 `tengu_*` events track everything: every tool call (name, duration, success/failure), every API call (model, tokens, cost, latency), every permission decision, session lifecycle events, UI interactions (mode cycling, copy, paste, keyboard shortcuts), and all errors.
Telemetry does not inflate the conversation context; these are out-of-band HTTP calls. But they consume bandwidth, add startup latency (the GrowthBook fetch), and transmit detailed usage patterns. The `CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1` environment variable (referenced 14 times in the codebase) disables non-essential telemetry, but discovering this requires reading documentation or source code.
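A sketch of how an event sink might be gated on the documented variables. The env var names are real; `shouldSendSegmentTelemetry` is a hypothetical helper, and per the tables below either variable suffices to silence Segment:

```typescript
// Hypothetical gate for Segment analytics events, keyed on the documented
// opt-out environment variables.
type Env = Record<string, string | undefined>;

function shouldSendSegmentTelemetry(env: Env): boolean {
  return env.CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC !== "1" &&
         env.DISABLE_TELEMETRY !== "1";
}

console.log(shouldSendSegmentTelemetry({})); // true: telemetry is on by default
console.log(shouldSendSegmentTelemetry({ DISABLE_TELEMETRY: "1" })); // false
```

The empty-environment case is the point: with nothing set, the gate is open.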
## Auto-Compact: Silent History Rewriting
When the conversation approaches the context window limit, Claude Code triggers auto-compaction, a process that silently rewrites the conversation history. The compaction system:
1. Takes the entire message history.
2. Strips trailing incomplete `tool_use` blocks.
3. Sends all messages plus a summary prompt to a cheaper model (Haiku) for summarization.
4. Replaces the entire conversation history with a single user message containing the summary.
The API call uses a special header: `x-stainless-helper: compaction`. After compaction, the original conversation is gone from the active context. A `system-reminder-compact-file-reference` is injected pointing to the raw transcript file.
Users receive a `tengu_post_compact_survey_event` feedback request after compaction, but the rewrite itself happens without approval. The model that summarizes your conversation is Haiku, the cheapest model, meaning nuances, specific code snippets, and detailed reasoning may be lost in the compression.
Three different compaction strategies exist, controlled by feature flags: full-conversation analysis, minimal analysis (feature-flag-controlled), and recent-messages-only. The active strategy can change based on GrowthBook flags without user notification.
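The four-step flow can be sketched as follows. The `Message` shape and the injectable `summarize` callback stand in for the real message format and Haiku API call, which are assumptions here:

```typescript
// Sketch of the compaction flow: strip trailing incomplete tool_use blocks,
// summarize, then replace the entire history with one user message.
type Message = { role: "user" | "assistant"; content: string; pendingToolUse?: boolean };

function compact(history: Message[], summarize: (msgs: Message[]) => string): Message[] {
  // Steps 1-2: take the history and drop trailing incomplete tool_use blocks.
  const trimmed = [...history];
  while (trimmed.length > 0 && trimmed[trimmed.length - 1].pendingToolUse) trimmed.pop();
  // Steps 3-4: summarize, then replace everything with a single user message.
  return [{ role: "user", content: summarize(trimmed) }];
}

const compacted = compact(
  [
    { role: "user", content: "long session..." },
    { role: "assistant", content: "partial", pendingToolUse: true },
  ],
  msgs => `Summary of ${msgs.length} message(s)`,
);
console.log(compacted[0].content); // "Summary of 1 message(s)"
```

Note the destructive step: the return value contains only the summary, which is why the original conversation survives only in the transcript file on disk.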
## The Privacy Controls That Exist
To be fair, Claude Code does provide opt-out mechanisms:
| Control | What It Disables |
|---|---|
| `CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1` | GrowthBook flags, Segment telemetry, auto-updates |
| `DISABLE_TELEMETRY=1` | Segment analytics events |
| `CLAUDE_CODE_DISABLE_AUTO_MEMORY=1` | Automatic memory writes |
| `/privacy-settings` slash command | View and toggle data sharing preferences |
| `autoUpdates: false` in settings | Automatic binary replacement |
The `/privacy-settings` command provides a UI for viewing and updating data sharing and telemetry controls. But every default favors collection: telemetry, auto-updates, memory writes, and GrowthBook flag fetching are all active unless explicitly disabled, making these opt-out rather than opt-in controls.
## Summary
Most of these mechanisms exist for defensible reasons:
- System prompts define safety behavior and prevent the model from generating harmful output.
- System reminders counteract instruction decay, keeping the model aligned across long sessions.
- Feature flags enable continuous deployment without breaking changes.
- Telemetry helps Anthropic improve the product.
- Auto-compaction prevents sessions from crashing when the context window fills.
The concern is not that these mechanisms exist – it is that they operate without informed consent. Users do not see the system prompt, are not told about A/B tests, cannot easily audit which feature flags are active, and may not realize that their conversation history is being silently rewritten by a cheaper model during compaction.
The privacy controls exist (`CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC`, `DISABLE_TELEMETRY`, `/privacy-settings`), but they are opt-out rather than opt-in, and discovering them requires reading documentation or source code. A more transparent approach would surface the hidden token overhead in the UI (how many tokens are system prompt vs. user content), make feature flag assignments visible, and require explicit consent before rewriting conversation history.
- **Every API call carries 4,000–6,000 tokens of invisible system prompt.** Prompt caching reduces the cost by 90%, but the tokens are still there.
- **50 system reminders inject context into your messages without appearing in the conversation UI.** They accumulate across turns and cannot be cached.
- **757 feature flags fetched from GrowthBook allow Anthropic to change agent behavior remotely,** including disabling user-configured safety settings and switching A/B test treatments.
- **Auto-compaction silently rewrites your conversation using a cheaper model.** The original history is only preserved in local transcript files.
- **The privacy controls are opt-out, not opt-in.** Telemetry, auto-updates, memory writes, and flag fetching are all active by default.
- **MCP servers remain a significant cost driver, though mitigated.** Instructions have moved to system reminders (preserving the prompt cache), but each connected server still adds tool schema tokens to every turn, and connect/disconnect events inject delta reminders into the conversation stream.
## Appendix: The Enterprise Remote Control Plane
Beyond GrowthBook feature flags, Claude Code implements a three-subsystem enterprise control plane that allows organizations to centrally manage agent behavior across their deployments. These subsystems are decoupled by design but share common patterns: eligibility checks, ETag-based HTTP caching, file persistence, hourly background polling, and fail-open resilience.
### Remote Managed Settings
**What it does.** Any setting from `~/.claude/settings.json` can be remotely overridden by enterprise admins. Settings are fetched from `/api/claude_code/settings` on startup and applied with highest priority, above user, project, and local settings. Changes trigger hot-reload of environment variables and feature gates without restart.
**Control flow.** On startup, `loadRemoteManagedSettings()` checks eligibility (API key users are always eligible; OAuth requires an Enterprise/Team subscription; 3P providers are excluded), loads cached settings from `~/.claude/remote-settings.json`, fetches from the API with ETag validation and up to 5 retries, then starts hourly background polling for mid-session changes.
**Security check.** When settings contain potentially dangerous overrides (e.g., redirecting `ANTHROPIC_BASE_URL`), a blocking UI dialog (`ManagedSettingsSecurityDialog`) requires explicit user approval before applying. Rejection triggers graceful shutdown.
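The ETag-conditional fetch-with-retries pattern can be sketched as below. The retry count and 304/200 handling follow the post's description; the fetcher is injectable (an assumption) so the control flow is testable without network access:

```typescript
// Sketch of ETag-validated settings polling with retries and fail-open
// fallback to the on-disk cache.
type FetchResult = { status: number; etag?: string; body?: string };
type Fetcher = (cachedEtag?: string) => Promise<FetchResult>;

async function fetchManagedSettings(doFetch: Fetcher, cachedEtag?: string, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const res = await doFetch(cachedEtag);
      if (res.status === 304) return { changed: false, etag: cachedEtag }; // cache still valid
      if (res.status === 200) return { changed: true, etag: res.etag, body: res.body };
    } catch {
      // Network error: fall through to the next retry, and ultimately fail
      // open to the cached settings on disk.
    }
  }
  return { changed: false, etag: cachedEtag };
}
```

In the real client this would run once at startup and then again on the hourly background poll.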
### Policy Limits
**What it does.** Organization-level feature restrictions via a policy matrix. Unlike settings (which configure behavior), policies enable or disable entire features. Three known policies:
| Policy | Controls |
|---|---|
| `allow_remote_sessions` | Remote session spawning, teleport, bridge initialization |
| `allow_remote_control` | Bridge mode REPL initialization |
| `allow_product_feedback` | Feedback surveys, `/feedback` command |
**Fail-closed for HIPAA.** Most policies fail open (unknown policy = allowed). But `allow_product_feedback` is in a special `ESSENTIAL_TRAFFIC_DENY_ON_MISS` set: if the policy cache is unavailable and essential-traffic-only mode is active, the feature is denied by default to prevent accidental data exfiltration in healthcare/compliance environments.
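The fail-open vs. fail-closed distinction can be sketched as follows. The policy names and the `ESSENTIAL_TRAFFIC_DENY_ON_MISS` set are from the post; the function shape is an assumption:

```typescript
// Policy lookup: fail open on cache miss, except HIPAA-sensitive policies
// in essential-traffic-only mode, which fail closed.
const ESSENTIAL_TRAFFIC_DENY_ON_MISS = new Set(["allow_product_feedback"]);

function isPolicyAllowed(
  policy: string,
  cache: Record<string, boolean> | null, // null = policy cache unavailable
  essentialTrafficOnly: boolean,
): boolean {
  if (cache && policy in cache) return cache[policy];
  // Cache miss: deny the sensitive policies when essential-traffic-only
  // mode is active; allow everything else.
  if (essentialTrafficOnly && ESSENTIAL_TRAFFIC_DENY_ON_MISS.has(policy)) return false;
  return true;
}

console.log(isPolicyAllowed("allow_product_feedback", null, true)); // false (fail closed)
console.log(isPolicyAllowed("allow_remote_sessions", null, true));  // true (fail open)
```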
### Settings Sync
**What it does.** Cross-environment settings replication between the interactive CLI and the Cloud Control Room (CCR). The interactive CLI uploads local settings in the background; CCR downloads them before plugin installation. Four file types are synced: user `settings.json`, user CLAUDE.md, and project-scoped variants of each.
**Eligibility.** OAuth only (API key users cannot sync). Files are capped at 500 KB. Project isolation uses a hash of the git remote URL.
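The eligibility rules and project isolation can be sketched as below. The 500 KB cap and git-remote hashing are from the post; the hash algorithm, key length, and function names are assumptions:

```typescript
// Sketch of sync eligibility and project-scoped key derivation.
import { createHash } from "node:crypto";

const MAX_SYNC_BYTES = 500 * 1024; // 500 KB cap from the post

function eligibleForSync(auth: "oauth" | "apikey", fileBytes: number): boolean {
  return auth === "oauth" && fileBytes <= MAX_SYNC_BYTES;
}

// Project-scoped files are keyed by a hash of the git remote URL, so two
// checkouts of the same repo resolve to the same sync bucket.
function projectSyncKey(gitRemoteUrl: string): string {
  return createHash("sha256").update(gitRemoteUrl).digest("hex").slice(0, 16);
}

console.log(eligibleForSync("apikey", 1024)); // false: API key users cannot sync
console.log(projectSyncKey("git@github.com:org/repo.git"));
```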
### Eligibility Matrix
| User Type | Remote Settings | Policy Limits | Settings Sync |
|---|---|---|---|
| API key (Console) | Yes | Yes | No |
| OAuth Enterprise/Team | Yes | Yes | Yes |
| OAuth Paid/Plus | Yes | No | Yes |
| 3P Provider (Bedrock/Vertex/Foundry) | No | No | No |
| Custom `ANTHROPIC_BASE_URL` | No | No | No |
Source files: `src/services/remoteManagedSettings/` (7 files, ~1,000 LOC), `src/services/policyLimits/` (2 files, ~700 LOC), `src/services/settingsSync/` (2 files, ~600 LOC).
For the prompt assembly pipeline that generates these hidden fragments, see Part III.1: Prompt Assembly Pipeline. For the compaction system that rewrites conversation history, see Part III.2: Context Compaction. For the hook system that intercepts notifications, see Part III.4: Hooks & Lifecycle Events.
All analysis based on source extracted from the v2.1.88 source map. Numbers are estimates based on static analysis; actual token counts vary by session configuration.