%%{init: {'theme': 'neutral', 'flowchart': {'useMaxWidth': false, 'htmlLabels': true, 'padding': 20, 'nodeSpacing': 30, 'rankSpacing': 40}, 'themeVariables': {'primaryColor': '#8B9DAF', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#6E7F91', 'secondaryColor': '#9CAF88', 'secondaryTextColor': '#ffffff', 'secondaryBorderColor': '#7A8D68', 'tertiaryColor': '#C2856E', 'tertiaryTextColor': '#ffffff', 'tertiaryBorderColor': '#A06A54', 'lineColor': '#B5A99A', 'textColor': '#4A4A4A', 'mainBkg': '#8B9DAF', 'nodeBorder': '#6E7F91', 'clusterBkg': 'rgba(139,157,175,0.12)', 'clusterBorder': '#B5A99A', 'edgeLabelBackground': 'transparent'}}}%%
flowchart TD
OAuth["<b>OAuth and Auth Flows</b>"]
Prov["<b>Multi-Provider Adapters</b>"]
Flags["<b>Feature Flags</b><br><i>88+ build-time, 50+ runtime</i>"]
Unified["<b>Unified API Call</b>"]
OAuth --> Unified
Prov --> Unified
Flags --> Unified
style OAuth fill:#8B9DAF,color:#fff,stroke:#6E7F91
style Prov fill:#9CAF88,color:#fff,stroke:#7A8D68
style Flags fill:#C2856E,color:#fff,stroke:#A06A54
style Unified fill:#B39EB5,color:#fff,stroke:#8E7A93
# Auth, Providers & Feature Flags

*OAuth, Adapters, and Continuous Delivery Infrastructure*
How do you log a user into a terminal application that has no browser? How does one codebase talk to four different cloud providers, each with its own authentication scheme, model ID format, and feature timeline? And how do you ship 88 experimental features inside a CLI tool without breaking production? These three problems – authentication, multi-provider support, and feature flags – form the invisible infrastructure layer that separates a demo from a product. None of them involve clever prompts or agent loops, but all of them are essential to making Claude Code work at enterprise scale.
This post covers the three pillars of Claude Code’s infrastructure: OAuth flows adapted for terminal environments, the Adapter pattern applied to LLM providers, and a two-tier feature flag system that enables continuous delivery in a CLI tool.
How to read this diagram. Three boxes at the top represent the three infrastructure pillars – OAuth and Auth Flows, Multi-Provider Adapters, and Feature Flags – each solving a distinct enterprise requirement. All three arrows converge on the single Unified API Call box at the bottom, showing that every API request must pass through all three systems. Removing any one pillar would break the production chain.
Source files covered in this post:
| File | Purpose | Size |
|---|---|---|
| `src/utils/auth.ts` | Authentication utilities (token management, keyring) | ~800 LOC |
| `src/services/oauth/` | OAuth 2.0 PKCE + device code flows | 5 files |
| `src/utils/model/model.ts` | Model selection and routing logic | ~400 LOC |
| `src/utils/model/providers.ts` | Multi-provider support (Anthropic, Bedrock, Vertex, Azure) | ~300 LOC |
| `src/utils/model/bedrock.ts` | AWS Bedrock-specific adapter | ~200 LOC |
| `src/services/api/getModel.ts` | Runtime model selection (`getRuntimeMainLoopModel()`) | ~200 LOC |
| `src/services/analytics/growthbook.ts` | GrowthBook feature flag client | ~300 LOC |
| `src/services/analytics/` | Telemetry, Datadog metrics, event logging | 9 files |
| `src/utils/settings/settings.ts` | Settings management (user/project/local/managed) | ~500 LOC |
## OAuth for Terminal Apps – When There Is No Browser
CLI applications face a unique authentication challenge: the standard “redirect to browser, get a callback” flow assumes a GUI that terminal apps do not have.
Think about how you log into a web application. You click “Sign in with Google,” a browser opens, you approve, and the browser redirects back to the application. This works because the web app has a URL the authorization server can redirect to. A terminal application has no URL. It has stdin and stdout. This mismatch between OAuth’s browser-centric design and the terminal’s text-only interface is the core UX challenge Claude Code must solve.
Claude Code implements two OAuth flows, each targeting a different environment. The choice is not a preference – it is dictated by whether a browser is physically available.
%%{init: {'theme': 'neutral', 'flowchart': {'useMaxWidth': false, 'htmlLabels': true, 'padding': 20, 'nodeSpacing': 30, 'rankSpacing': 40}, 'themeVariables': {'primaryColor': '#8B9DAF', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#6E7F91', 'secondaryColor': '#9CAF88', 'secondaryTextColor': '#ffffff', 'secondaryBorderColor': '#7A8D68', 'tertiaryColor': '#C2856E', 'tertiaryTextColor': '#ffffff', 'tertiaryBorderColor': '#A06A54', 'lineColor': '#B5A99A', 'textColor': '#4A4A4A', 'mainBkg': '#8B9DAF', 'nodeBorder': '#6E7F91', 'clusterBkg': 'rgba(139,157,175,0.12)', 'clusterBorder': '#B5A99A', 'edgeLabelBackground': 'transparent'}}}%%
flowchart TD
Start["User runs claude<br><i>no valid API key</i>"]
Check{"Browser<br>available?"}
PKCE["PKCE Flow<br><i>Browser opens locally</i>"]
Device["Device Code Flow<br><i>Any device with a browser</i>"]
Start --> Check
Check -- "Yes" --> PKCE
Check -- "No" --> Device
style Start fill:#8B9DAF,color:#fff,stroke:#6E7F91
style Check fill:#9CAF88,color:#fff,stroke:#7A8D68
style PKCE fill:#C2856E,color:#fff,stroke:#A06A54
style Device fill:#B39EB5,color:#fff,stroke:#8E7A93
How to read this diagram. Start at the top where the user runs claude without a valid API key. The diamond decision node asks a single question: is a browser available? The “Yes” branch leads to the PKCE flow (browser opens locally on the same machine), while the “No” branch leads to the Device Code flow (user enters a short code on any device with a browser). This binary decision ensures every environment – from GUI laptops to headless SSH sessions – has a viable authentication path.
### PKCE: The Localhost Trick
PKCE (Proof Key for Code Exchange) – a protocol that lets public clients (applications that cannot store secrets, like CLI tools) authenticate securely – is the primary flow for developers at their laptops.
The trick is simple but clever: Claude Code spins up a temporary HTTP server on localhost, then opens the user’s browser to the authorization URL. When the user approves, the authorization server redirects back to http://localhost:{PORT}/callback, where Claude Code’s ephemeral server is listening. The CLI captures the authorization code, shuts down the server, and exchanges the code for tokens.
The “proof key” part adds a critical security layer. Before opening the browser, Claude Code generates a random code_verifier and sends only its SHA-256 hash (the code_challenge) to the authorization server. When exchanging the authorization code for tokens, Claude Code proves it originated the request by presenting the original verifier. Even if an attacker intercepts the authorization code, they cannot exchange it without the verifier.
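The verifier/challenge pair can be sketched in a few lines with Node's built-in crypto module. The helper names here are mine, not Claude Code's; only the mechanics (random verifier, base64url-encoded SHA-256 challenge, per RFC 7636) are what the flow requires:

```typescript
import { randomBytes, createHash } from "node:crypto";

// Base64url encoding without padding, as RFC 7636 requires.
function base64url(buf: Buffer): string {
  return buf
    .toString("base64")
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

// Generate the PKCE pair: a high-entropy verifier kept locally,
// and its SHA-256 challenge sent with the authorization request.
function generatePkcePair(): { verifier: string; challenge: string } {
  const verifier = base64url(randomBytes(32)); // 43 URL-safe chars
  const challenge = base64url(createHash("sha256").update(verifier).digest());
  return { verifier, challenge };
}
```

Only the challenge ever travels with the browser redirect; the verifier stays in process memory until the token exchange.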
%%{init: {'theme': 'neutral', 'flowchart': {'useMaxWidth': false, 'htmlLabels': true, 'padding': 20, 'nodeSpacing': 30, 'rankSpacing': 40}, 'themeVariables': {'primaryColor': '#8B9DAF', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#6E7F91', 'secondaryColor': '#9CAF88', 'secondaryTextColor': '#ffffff', 'secondaryBorderColor': '#7A8D68', 'tertiaryColor': '#C2856E', 'tertiaryTextColor': '#ffffff', 'tertiaryBorderColor': '#A06A54', 'lineColor': '#B5A99A', 'textColor': '#4A4A4A', 'mainBkg': '#8B9DAF', 'nodeBorder': '#6E7F91', 'clusterBkg': 'rgba(139,157,175,0.12)', 'clusterBorder': '#B5A99A', 'edgeLabelBackground': 'transparent'}}}%%
sequenceDiagram
participant CC as Claude Code
participant BR as Browser
participant AUTH as Auth Server
CC->>CC: Generate verifier + SHA-256 challenge
CC->>BR: Open browser (+ challenge)
BR->>AUTH: GET /authorize
AUTH->>BR: Consent screen
BR->>AUTH: User approves
AUTH->>BR: Redirect (code + state)
BR->>CC: localhost callback (auth code)
CC->>AUTH: POST /token + code verifier
AUTH->>CC: access token + refresh token
CC->>CC: Store in ~/.claude/oauth/
How to read this diagram. Time flows downward in this sequence diagram. The three participants (Claude Code, Browser, Auth Server) exchange messages in a handshake. Start at the top where Claude Code generates the verifier and challenge, then follow the arrows down through the browser redirect flow. The critical security property is visible in the asymmetry: the SHA-256 challenge is sent early (when the browser opens), but the original verifier is only revealed at the final token exchange, preventing interception attacks. The flow ends with tokens stored locally for silent re-authentication.

### Device Code Flow: For Headless Environments
The PKCE flow requires a local browser. For SSH sessions, remote servers, CI pipelines, and Docker containers, Claude Code falls back to the Device Code flow – a protocol designed for devices with limited input, like smart TVs, that maps perfectly to “I am SSHed into a server with no GUI.”
The flow decouples the two parties entirely. Claude Code requests a device code from the server, then displays a URL and a short code in the terminal:
```
Visit:      https://claude.ai/device
Enter code: ABCD-1234
```
The user opens that URL on any device with a browser – their phone, another laptop, a tablet – and enters the code. Meanwhile, Claude Code polls the token endpoint at regular intervals, waiting for approval. Once the user approves on their phone, the next poll returns the tokens.
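The polling side can be sketched as a small loop with the token request injected, so the transport stays out of the picture. The `slow_down` handling follows RFC 8628's device flow conventions; the names and types here are illustrative, not Claude Code's actual implementation:

```typescript
// Possible outcomes of one poll against the token endpoint (RFC 8628 style).
type PollResult =
  | { status: "pending" }
  | { status: "slow_down" }
  | { status: "ok"; accessToken: string };

// Poll until the user approves on their other device, backing off
// when the server asks, and giving up after maxAttempts.
async function pollForToken(
  requestToken: () => Promise<PollResult>,
  intervalMs = 5000,
  maxAttempts = 60,
): Promise<string> {
  let delay = intervalMs;
  for (let i = 0; i < maxAttempts; i++) {
    const res = await requestToken();
    if (res.status === "ok") return res.accessToken;
    if (res.status === "slow_down") delay += 5000; // server-requested backoff
    await new Promise((r) => setTimeout(r, delay));
  }
  throw new Error("Device authorization timed out");
}
```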
This is the same pattern Netflix uses when you sign in on a new TV. The insight is that authentication does not require the authenticating device and the authorizing device to be the same machine.
### Credential Storage
Both flows store credentials identically: `~/.claude/oauth/credentials.json` with atomic writes and file permission checks. A refresh token enables silent re-authentication when the access token expires. For containerized environments where mounting files is awkward, `CLAUDE_CODE_OAUTH_TOKEN_FILE_DESCRIPTOR` passes the token via a file descriptor – a Unix-native pattern that avoids touching the filesystem entirely.
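The file-descriptor handoff can be sketched like this. The environment variable name comes from the post; the validation and parsing around it are my assumptions about how such a reader might look:

```typescript
import { readFileSync } from "node:fs";

// Sketch: read an OAuth token from an inherited file descriptor, as a
// container orchestrator might pass it. Returns null when the variable
// is absent or malformed, so callers can fall back to file-based storage.
function readTokenFromFd(): string | null {
  const fdVar = process.env.CLAUDE_CODE_OAUTH_TOKEN_FILE_DESCRIPTOR;
  if (!fdVar) return null;
  const fd = Number(fdVar);
  if (!Number.isInteger(fd) || fd < 0) return null;
  // readFileSync accepts a numeric fd, so the token never touches a path.
  return readFileSync(fd, "utf8").trim();
}
```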
## Multi-Provider Support – The Adapter Pattern in Action
Claude Code supports four API providers through a single internal interface. This is the Adapter pattern from your design patterns textbook, applied at the scale of cloud infrastructure.
The motivation is competitive, not just technical. Enterprise customers do not want to create new vendor relationships when they already have AWS or Google Cloud contracts with negotiated pricing, compliance certifications, and existing billing. Multi-provider support lets Claude Code meet customers where their infrastructure already lives.
%%{init: {'theme': 'neutral', 'flowchart': {'useMaxWidth': false, 'htmlLabels': true, 'padding': 20, 'nodeSpacing': 30, 'rankSpacing': 40}, 'themeVariables': {'primaryColor': '#8B9DAF', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#6E7F91', 'secondaryColor': '#9CAF88', 'secondaryTextColor': '#ffffff', 'secondaryBorderColor': '#7A8D68', 'tertiaryColor': '#C2856E', 'tertiaryTextColor': '#ffffff', 'tertiaryBorderColor': '#A06A54', 'lineColor': '#B5A99A', 'textColor': '#4A4A4A', 'mainBkg': '#8B9DAF', 'nodeBorder': '#6E7F91', 'clusterBkg': 'rgba(139,157,175,0.12)', 'clusterBorder': '#B5A99A', 'edgeLabelBackground': 'transparent'}}}%%
flowchart TD
Core["<b>Claude Code Core</b><br><i>model, messages, tools</i>"]
Iface["<b>Unified Interface: createMessage()</b>"]
Core --> Iface
Anth["<b>Anthropic API</b><br>API key / OAuth<br>claude-opus-4-6"]
Bed["<b>AWS Bedrock</b><br>IAM / SSO<br>anthropic.claude-opus-4-6-v1:0"]
Vert["<b>Google Vertex</b><br>Google OAuth<br>claude-opus-4-6@20251001"]
Azure["<b>Azure Foundry</b><br>Azure AD<br>claude-opus-4-6"]
Iface --> Anth
Iface --> Bed
Iface --> Vert
Iface --> Azure
style Core fill:#8B9DAF,color:#fff,stroke:#6E7F91
style Iface fill:#9CAF88,color:#fff,stroke:#7A8D68
style Anth fill:#C2856E,color:#fff,stroke:#A06A54
style Bed fill:#B39EB5,color:#fff,stroke:#8E7A93
style Vert fill:#C4A882,color:#fff,stroke:#A08562
style Azure fill:#8E9B7A,color:#fff,stroke:#6E7B5A
How to read this diagram. Start at the top with Claude Code Core, which produces a generic request (model, messages, tools). The arrow passes through the Unified Interface (createMessage()), which is the single point of abstraction. From there, four arrows fan out to the four provider backends, each showing its distinct authentication scheme and model ID format. The core engine never sees provider-specific differences – all translation happens at this adapter boundary.
### Provider Selection: A Priority Chain
The selection logic is a simple priority chain, implemented in getAPIProvider():
```typescript
function getAPIProvider(): 'anthropic' | 'bedrock' | 'vertex' | '3p' {
  if (process.env.CLAUDE_CODE_USE_BEDROCK) return 'bedrock'
  if (process.env.CLAUDE_CODE_USE_VERTEX) return 'vertex'
  if (process.env.ANTHROPIC_BASE_URL) return '3p'
  return 'anthropic' // default
}
```

The order matters. If both `CLAUDE_CODE_USE_BEDROCK` and `CLAUDE_CODE_USE_VERTEX` are set (a misconfiguration), Bedrock wins. The `3p` (third-party) provider is a catch-all for any Anthropic-compatible API – local proxies, compliance gateways, or alternative deployments.
This is the Chain of Responsibility pattern. Each provider check is a handler in a chain. The first handler that matches takes ownership. Compare this to how Express middleware resolves routes, or how Java exception handlers walk up the catch chain.
### Model ID Normalization: The Translation Layer
Each provider uses a different format for model identifiers. The internal name claude-opus-4-6 must be translated per-provider before every API call:
| Internal ID | Anthropic | Bedrock | Vertex |
|---|---|---|---|
| `claude-opus-4-6` | `claude-opus-4-6` | `anthropic.claude-opus-4-6-v1:0` | `claude-opus-4-6@20251001` |
| `claude-sonnet-4-6` | `claude-sonnet-4-6` | `anthropic.claude-sonnet-4-6-v1:0` | `claude-sonnet-4-6@20251001` |
The normalization layer also handles fallback chains: Opus falls back to Sonnet, Sonnet to Haiku. A configuration that works on the Anthropic API still functions on Bedrock even if the exact model version is not yet available.
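The per-provider translation in the table above can be sketched as a switch at the adapter boundary. The suffix formats match the table; the function name and structure here are mine, and the real implementation surely handles more models and fallback cases:

```typescript
type Provider = "anthropic" | "bedrock" | "vertex";

// Translate a canonical internal model ID into the provider's wire format.
function normalizeModelId(internal: string, provider: Provider): string {
  switch (provider) {
    case "anthropic":
      return internal;                      // canonical form IS the API form
    case "bedrock":
      return `anthropic.${internal}-v1:0`;  // Bedrock prefixes the vendor
    case "vertex":
      return `${internal}@20251001`;        // Vertex pins a snapshot date
  }
}

console.log(normalizeModelId("claude-opus-4-6", "bedrock"));
// → anthropic.claude-opus-4-6-v1:0
```

Keeping the switch in one place is the point: every other module speaks only the canonical ID.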
This normalization function (`normalizeModelStringForAPI()`) is the core of the Adapter pattern. Claude Code’s internal code never thinks about provider-specific formats. It uses canonical model names everywhere, and the adapter layer translates at the boundary.

### Intelligent Model Selection
Claude Code does not use a single model for all operations. The getRuntimeMainLoopModel() function implements cost-aware routing:
%%{init: {'theme': 'neutral', 'flowchart': {'useMaxWidth': false, 'htmlLabels': true, 'padding': 20, 'nodeSpacing': 30, 'rankSpacing': 40}, 'themeVariables': {'primaryColor': '#8B9DAF', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#6E7F91', 'secondaryColor': '#9CAF88', 'secondaryTextColor': '#ffffff', 'secondaryBorderColor': '#7A8D68', 'tertiaryColor': '#C2856E', 'tertiaryTextColor': '#ffffff', 'tertiaryBorderColor': '#A06A54', 'lineColor': '#B5A99A', 'textColor': '#4A4A4A', 'mainBkg': '#8B9DAF', 'nodeBorder': '#6E7F91', 'clusterBkg': 'rgba(139,157,175,0.12)', 'clusterBorder': '#B5A99A', 'edgeLabelBackground': 'transparent'}}}%%
flowchart TD
Start["<b>Model Selection</b>"]
Q1{"Is this<br>compaction?"}
Haiku["Haiku<br><i>cheapest</i>"]
Q2{"Subagent model<br>override?"}
Override["Agent's specified model"]
Q3{"Plan mode +<br>over 200K tokens?"}
Sonnet["Sonnet<br><i>demoted</i>"]
Default["User's configured model<br><i>default: Sonnet</i>"]
Start --> Q1
Q1 -- "Yes" --> Haiku
Q1 -- "No" --> Q2
Q2 -- "Yes" --> Override
Q2 -- "No" --> Q3
Q3 -- "Yes" --> Sonnet
Q3 -- "No" --> Default
style Start fill:#8B9DAF,color:#fff,stroke:#6E7F91
style Q1 fill:#9CAF88,color:#fff,stroke:#7A8D68
style Haiku fill:#C2856E,color:#fff,stroke:#A06A54
style Q2 fill:#B39EB5,color:#fff,stroke:#8E7A93
style Override fill:#C4A882,color:#fff,stroke:#A08562
style Q3 fill:#8E9B7A,color:#fff,stroke:#6E7B5A
style Sonnet fill:#8B9DAF,color:#fff,stroke:#6E7F91
style Default fill:#9CAF88,color:#fff,stroke:#7A8D68
How to read this diagram. Start at the top “Model Selection” node and follow the decision tree downward through three diamond-shaped decision points. At each branch, the “Yes” path exits to a specific model (Haiku for compaction, the agent’s override model, Sonnet for long planning sessions), while the “No” path continues to the next check. If all three checks fail, the flow reaches the default: the user’s configured model. The key takeaway is that cost optimization happens automatically – cheaper models are selected whenever the task does not require frontier reasoning.
The plan-mode demotion is a pragmatic cost decision. Long planning sessions accumulate hundreds of thousands of tokens, and paying Opus pricing for every turn would be prohibitively expensive. Sonnet handles plan reasoning at a fraction of the cost. Compaction always uses Haiku – summarizing conversation history is a well-structured task that does not require deep reasoning.
This tiered model routing trades optimal quality for cost predictability. A user on Opus still gets Haiku for compaction and Sonnet for long planning sessions. The system optimizes globally (minimize total session cost) rather than locally (use the best model for every turn).
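The decision tree above reduces to a few lines of straight-line code. The field names and the 200K threshold framing follow the diagram; the types themselves are illustrative, not Claude Code's actual signatures:

```typescript
interface RoutingContext {
  isCompaction: boolean;
  subagentModel?: string;   // per-agent override, if any
  planMode: boolean;
  contextTokens: number;
  configuredModel: string;  // the user's chosen model, e.g. "opus"
}

// Mirror of the routing diagram: checked top to bottom, first match wins.
function selectModel(ctx: RoutingContext): string {
  if (ctx.isCompaction) return "haiku";              // cheapest; well-structured task
  if (ctx.subagentModel) return ctx.subagentModel;   // explicit override wins
  if (ctx.planMode && ctx.contextTokens > 200_000) {
    return "sonnet";                                 // demote long planning sessions
  }
  return ctx.configuredModel;                        // user's default
}
```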
## Feature Flags as Deployment Infrastructure
Claude Code ships 88+ build-time feature flags and 50+ runtime flags. This is not technical debt – it is the continuous delivery infrastructure that lets a small team ship weekly to millions of users without breaking production.
Feature flags are standard in web applications – Netflix showing you a redesigned homepage while your neighbor sees the old one. What is unusual about Claude Code is the scale of feature flagging inside a CLI tool and the two-tier architecture that makes it work.
### Tier 1: Build-Time Flags – Dead Code Elimination
Build-time flags are evaluated by the Bun bundler at compilation time. They are not just conditional checks – they are tree-shaking boundaries. When a flag evaluates to false, the bundler eliminates the entire code path, including all imports, string literals, and side effects:
```typescript
if (feature('VOICE_MODE')) {
  // When VOICE_MODE is false, this block AND the
  // ./voice module are eliminated from the bundle
  const voice = await import('./voice')
  voice.startStreaming()
}
```

This is more aggressive than runtime feature flags. A runtime flag keeps the code in the bundle and skips it at execution time. A build-time flag removes the code entirely, reducing bundle size and ensuring that unreleased features cannot be reverse-engineered from the shipped binary.
%%{init: {'theme': 'neutral', 'flowchart': {'useMaxWidth': false, 'htmlLabels': true, 'padding': 20, 'nodeSpacing': 30, 'rankSpacing': 40}, 'themeVariables': {'primaryColor': '#8B9DAF', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#6E7F91', 'secondaryColor': '#9CAF88', 'secondaryTextColor': '#ffffff', 'secondaryBorderColor': '#7A8D68', 'tertiaryColor': '#C2856E', 'tertiaryTextColor': '#ffffff', 'tertiaryBorderColor': '#A06A54', 'lineColor': '#B5A99A', 'textColor': '#4A4A4A', 'mainBkg': '#8B9DAF', 'nodeBorder': '#6E7F91', 'clusterBkg': 'rgba(139,157,175,0.12)', 'clusterBorder': '#B5A99A', 'edgeLabelBackground': 'transparent'}}}%%
flowchart TD
subgraph T1["TIER 1: BUILD-TIME (bundler)"]
Src["Source Code"]
BF{"flag = true?"}
InB["Code IN the bundle"]
Rem["Code REMOVED entirely"]
Src --> BF
BF -- "Yes" --> InB
BF -- "No" --> Rem
end
subgraph T2["TIER 2: RUNTIME (feature service)"]
App["Running App"]
RF{"User in rollout?"}
Act["Feature ACTIVE<br>+ metrics collected"]
Inact["Feature INACTIVE<br>default value"]
App --> RF
RF -- "Yes" --> Act
RF -- "No" --> Inact
end
style Src fill:#8B9DAF,color:#fff,stroke:#6E7F91
style BF fill:#9CAF88,color:#fff,stroke:#7A8D68
style InB fill:#C2856E,color:#fff,stroke:#A06A54
style Rem fill:#B39EB5,color:#fff,stroke:#8E7A93
style App fill:#C4A882,color:#fff,stroke:#A08562
style RF fill:#8E9B7A,color:#fff,stroke:#6E7B5A
style Act fill:#8B9DAF,color:#fff,stroke:#6E7F91
style Inact fill:#9CAF88,color:#fff,stroke:#7A8D68
How to read this diagram. The two subgraphs represent the two tiers operating at different stages of the software lifecycle. In Tier 1 (left, build-time), source code passes through a bundler flag check: “true” includes the code in the binary, “false” removes it entirely via tree-shaking. In Tier 2 (right, runtime), the running app checks whether the user is in the rollout: “yes” activates the feature with metrics, “no” falls back to the default. The two tiers are complementary – Tier 1 controls what ships in the binary, Tier 2 controls what activates for each user.
| Tier 1: Build-Time | Tier 2: Runtime |
|---|---|
| Smaller bundle | Per-user targeting |
| No dead code paths | Gradual rollout (5% to 50%) |
| Unreleased features invisible in the binary | A/B testing |
| 88+ flags | Instant rollback; 50+ flags |
The most heavily gated features (in a v2.1.88 snapshot; counts shift between releases) reveal Claude Code’s roadmap:
| Flag | Approx. References | What It Gates |
|---|---|---|
| KAIROS | ~154 | Asynchronous background agent work |
| TRANSCRIPT_CLASSIFIER | ~107 | ML-based auto-mode decision making |
| TEAMMEM | ~51 | Team memory synchronization |
| VOICE_MODE | ~46 | Speech-to-text streaming input |
| PROACTIVE | ~37 | Agent suggests actions unprompted |
| COORDINATOR_MODE | ~32 | Multi-agent swarm orchestration |
KAIROS, at roughly 154 references in that snapshot, touches the agent loop, the UI, session management, and the SDK. Its deep integration suggests a major unreleased capability: agents that work in the background while the developer does something else.
### Tier 2: Runtime Flags – Gradual Rollout
Runtime flags complement build-time flags by controlling behavior after the code has shipped. They are evaluated against user identity, organization membership, and rollout percentages:
```typescript
getFeatureValue_CACHED_MAY_BE_STALE('tengu_fast_mode', false)
```

The function name is deliberately verbose. `CACHED_MAY_BE_STALE` warns callers that the returned value might be slightly outdated. Flag values are fetched from the feature service and cached locally with a staleness tolerance. This prioritizes latency (no network call on every flag check) over strict consistency (a rollout change might take a few minutes to propagate).
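A stale-tolerant flag getter with this contract can be sketched as follows. The cache semantics (return the possibly-stale value immediately, refresh in the background) match the description above; all names, the TTL, and the injectable clock are my own:

```typescript
interface CacheEntry { value: boolean; fetchedAt: number }

// Build a synchronous flag getter over an async feature service.
// Callers never wait on the network: they get the cached (possibly
// stale) value, or the fallback if nothing is cached yet.
function makeFlagGetter(
  fetchFlag: (name: string) => Promise<boolean>,
  ttlMs = 5 * 60_000,
  now: () => number = Date.now,
) {
  const cache = new Map<string, CacheEntry>();
  return (name: string, fallback: boolean): boolean => {
    const entry = cache.get(name);
    if (!entry || now() - entry.fetchedAt > ttlMs) {
      // Refresh asynchronously; this call still returns the old value.
      fetchFlag(name)
        .then((value) => cache.set(name, { value, fetchedAt: now() }))
        .catch(() => {}); // network failure: keep serving what we have
    }
    return entry ? entry.value : fallback; // MAY_BE_STALE: no awaiting here
  };
}
```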
Runtime flags enable capabilities that build-time flags cannot: gradual rollout (5% of users, then 25%, then 100%), A/B testing, per-organization targeting, and instant rollback without a new deployment. ### The Interaction Between Tiers
A feature might live behind both tiers simultaneously. The build-time flag ensures the code is not shipped to users who should never see it. The runtime flag controls gradual rollout among users who have the code. This layered gating is how Anthropic can safely experiment with major features like voice input or coordinator mode without risking the stability of the core product.
## Cost Tracking – One Interface, Four Pricing Models
Every LLM provider charges differently, but users need a single, consistent view of what they are spending. The cost model abstraction normalizes diverse pricing into one interface.
Each provider has its own pricing for input tokens, output tokens, and cached tokens. The cost tracking system must handle all of these transparently. Every API response includes input_tokens and output_tokens counts. The client tracks these per-request and aggregates them per-session, enabling the cost display in the terminal UI and the token budget enforcement system.
The abstraction looks simple from the outside – a single cost number in the status bar – but behind it is a normalization layer that maps provider-specific usage data to a uniform cost model. Different providers may also report usage differently (some include thinking tokens separately, some bundle them with output), so the normalization is not just about pricing but about what counts as a “token” in the first place.
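A normalization layer of this shape can be sketched as a pricing table plus one aggregation function. The prices below are illustrative placeholders (USD per million tokens), not real rates for any provider, and the types are my own:

```typescript
interface Usage { inputTokens: number; outputTokens: number; cachedTokens?: number }
interface Pricing { inputPerM: number; outputPerM: number; cachedPerM: number }

// Placeholder rates, USD per million tokens. Real rates differ per
// provider and model; the point is the uniform shape.
const PRICING: Record<string, Pricing> = {
  anthropic: { inputPerM: 3, outputPerM: 15, cachedPerM: 0.3 },
  bedrock:   { inputPerM: 3, outputPerM: 15, cachedPerM: 0.3 },
};

// Map provider-specific usage counts onto one cost number.
function costUsd(provider: string, u: Usage): number {
  const p = PRICING[provider];
  if (!p) throw new Error(`Unknown provider: ${provider}`);
  return (
    (u.inputTokens * p.inputPerM +
      u.outputTokens * p.outputPerM +
      (u.cachedTokens ?? 0) * p.cachedPerM) / 1_000_000
  );
}
```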
## The Configuration Hierarchy – Five Levels of Override
Claude Code’s configuration is a five-level priority chain that balances team conventions, personal preferences, and deployment requirements.
The model follows the same precedence pattern as CSS specificity cascades, DNS resolution, or Git config (repository, then global, then system). Each level can override the level below it:
%%{init: {'theme': 'neutral', 'flowchart': {'useMaxWidth': false, 'htmlLabels': true, 'padding': 20, 'nodeSpacing': 30, 'rankSpacing': 40}, 'themeVariables': {'primaryColor': '#8B9DAF', 'primaryTextColor': '#ffffff', 'primaryBorderColor': '#6E7F91', 'secondaryColor': '#9CAF88', 'secondaryTextColor': '#ffffff', 'secondaryBorderColor': '#7A8D68', 'tertiaryColor': '#C2856E', 'tertiaryTextColor': '#ffffff', 'tertiaryBorderColor': '#A06A54', 'lineColor': '#B5A99A', 'textColor': '#4A4A4A', 'mainBkg': '#8B9DAF', 'nodeBorder': '#6E7F91', 'clusterBkg': 'rgba(139,157,175,0.12)', 'clusterBorder': '#B5A99A', 'edgeLabelBackground': 'transparent'}}}%%
flowchart TD
Env["<b>1. Environment Variables (447 total)</b><br><i>CI/CD, Docker, debugging</i>"]
Proj["<b>2. Project Settings</b><br><i>.claude/settings.json -- version-controlled, team-shared</i>"]
Local["<b>3. Project Local Settings</b><br><i>.claude/settings.local.json -- gitignored, personal</i>"]
User["<b>4. User Settings</b><br><i>~/.claude/settings.json -- global preferences</i>"]
Def["<b>5. Built-in Defaults</b><br><i>Hardcoded sensible fallbacks</i>"]
Env --> Proj --> Local --> User --> Def
style Env fill:#8B9DAF,color:#fff,stroke:#6E7F91
style Proj fill:#9CAF88,color:#fff,stroke:#7A8D68
style Local fill:#C2856E,color:#fff,stroke:#A06A54
style User fill:#B39EB5,color:#fff,stroke:#8E7A93
style Def fill:#C4A882,color:#fff,stroke:#A08562
How to read this diagram. Read top to bottom as a priority chain: the highest-priority source (Environment Variables, level 1) is at the top, and the lowest (Built-in Defaults, level 5) is at the bottom. Arrows indicate fallback order – the system checks each level in sequence and uses the first defined value it finds. Levels 1-2 are typically set by teams and CI systems, level 3 is a personal gitignored escape hatch, level 4 holds global user preferences, and level 5 provides hardcoded fallbacks when nothing else is configured.
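The "first defined value wins" resolution can be sketched in a few lines. The layer names mirror the diagram; the merge logic and example settings are illustrative, not Claude Code's actual schema:

```typescript
type Settings = Record<string, unknown>;

// Walk the layers from highest priority (env vars) to lowest
// (built-in defaults) and return the first defined value.
function resolveSetting(key: string, layers: Settings[]): unknown {
  for (const layer of layers) {
    if (key in layer) return layer[key];
  }
  return undefined;
}

const envVars  = { model: "opus" };                                // level 1
const project  = { model: "sonnet", hooks: true };                 // level 2
const defaults = { model: "sonnet", hooks: false, theme: "dark" }; // level 5

console.log(resolveSetting("model", [envVars, project, {}, {}, defaults])); // → opus
console.log(resolveSetting("theme", [envVars, project, {}, {}, defaults])); // → dark
```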
The gitignored settings.local.json is a small but important design detail. It acknowledges that developers need escape hatches – personal MCP servers, relaxed permissions during debugging, alternative API keys for testing – without polluting the team configuration.
CLAUDE.md files follow a separate discovery mechanism, walking up the directory tree from the current working directory. This supports monorepo architectures where instructions cascade from the repository root through workspace directories to individual packages. External includes from directories outside the project require explicit user approval, a security measure against malicious dependencies injecting instructions into the agent’s system prompt.
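The walk-up discovery can be sketched with an injected existence check, so the traversal logic stands alone. This is a minimal sketch of the pattern; the real discovery handles includes, approvals, and more:

```typescript
import { dirname, join } from "node:path";

// Collect CLAUDE.md candidates from cwd up to the filesystem root.
// `exists` is injected so the walk is testable without touching disk.
function discoverMemoryFiles(cwd: string, exists: (p: string) => boolean): string[] {
  const found: string[] = [];
  let dir = cwd;
  while (true) {
    const candidate = join(dir, "CLAUDE.md");
    if (exists(candidate)) found.push(candidate);
    const parent = dirname(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }
  return found.reverse(); // root-level instructions first, most specific last
}
```

Returning root-first matches the cascade described above: repository-wide instructions load before package-specific ones.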
## Retry and Error Recovery – Not All Failures Are Equal
The retry system distinguishes between errors that might resolve on their own and errors that require a fundamentally different strategy.
This distinction is critical. A 529 (Overloaded) error is transient – wait and retry with exponential backoff. A 413 (Prompt Too Long) error will never succeed on retry – the request itself must change.
| Error Type | Strategy | Analogy |
|---|---|---|
| 529 Overloaded | Exponential backoff with jitter | Traffic jam: wait and retry |
| Network errors | Quick retry (often resolves in seconds) | Dropped call: redial |
| 413 Prompt Too Long | Trigger reactive compaction, then retry | Suitcase too full: repack |
| 401/403 Auth error | Attempt token refresh, else re-authenticate | Expired badge: get a new one |
| 400 Bad Request | Do not retry (bug in request construction) | Wrong address: retrying won’t help |
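The 529 row – exponential backoff with jitter – is a standard pattern that can be sketched in one function. The base and cap constants here are illustrative, not Claude Code's actual values:

```typescript
// Exponential backoff with "full jitter": the exponential curve sets the
// ceiling, and the actual delay is drawn uniformly below it, spreading
// retries so clients don't stampede the recovering server together.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 32_000): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt); // 500, 1000, 2000, ...
  return Math.random() * ceiling;
}
```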
The 413 recovery path is elegant. When the API reports the prompt is too long, the retry handler invokes reactive compaction (covered in Part III.2), which summarizes older messages to reduce token count. The request is reconstructed with the compacted history and retried. This creates a self-healing loop where Claude Code automatically manages its own context window rather than failing and asking the user to trim manually.
The streaming fallback is also notable. When a streaming response fails mid-way, the system can switch to a fallbackModel rather than retrying the same model. Crucially, this fallback is non-recursive: if the fallback model also fails, the error propagates to the user. This prevents cascading retries that could consume API credits while producing nothing useful.
## Summary
The infrastructure layer of Claude Code reveals principles that apply far beyond AI agents:
OAuth for CLI tools is a solved problem with two complementary flows. PKCE works when a browser is available (the localhost server trick). Device Code works everywhere else (decouple the authenticating device from the authorizing device). Together, they cover every developer environment from laptops to SSH sessions to CI containers.
Multi-provider support is the Adapter pattern at cloud scale. One canonical internal representation, translation at the boundaries. The same principle that drives character encoding normalization and database abstraction layers applies to LLM API providers. The key insight: normalize not just the API but the metrics, model IDs, and error codes too.
Two-tier feature flags combine safety with flexibility. Build-time flags eliminate unreleased code from the binary (safe). Runtime flags enable gradual rollout and instant rollback (flexible). Neither tier alone is sufficient. Together, they let a team ship weekly to millions of users without breaking production.
Cost-aware model routing is resource scheduling in disguise. Using Haiku for compaction, Sonnet for long planning sessions, and Opus for complex reasoning is the same resource allocation problem as scheduling CPU-bound tasks on fast cores and I/O-bound tasks on efficient cores.
Configuration hierarchies should respect both teams and individuals. The five-level priority chain gives teams enforcement (version-controlled project settings) while giving individuals escape hatches (gitignored local settings). Environment variables serve as the ultimate override for automation.
The invisible infrastructure is what makes the visible agent experience possible. Every token Claude Code generates has been authenticated, routed to the correct provider, shaped by active feature flags, and configured through a five-level priority chain. When it all works – which is nearly always – nobody thinks about it. That is the highest compliment an infrastructure layer can receive.
Next: Part VI.1: Model Context Protocol – where Claude Code connects to external tools and services through a universal protocol, extending the agent’s capabilities beyond its built-in tool set.