Context, not memory
When people talk about AI "remembering" things, they're using the wrong word. AI language models don't have memory the way you do. They don't store information between sessions. They don't have a filing cabinet somewhere with notes about past conversations.
What they have is a context window: a block of text that they can see at any given moment. Everything the AI "knows" during a session is in that block — your messages, its responses, and any files you've shared. When the session ends, that block is gone.
What fits in a context window
Context windows are measured in tokens: small chunks of text, each roughly three-quarters of an English word. A typical long-form article runs about 2,000–3,000 tokens. Current Claude models have context windows large enough to hold hundreds of pages of text.
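A common rule of thumb (an approximation, not an exact count: real tokenizers vary by model and language) is about four characters of English text per token. A quick sketch:

```python
def estimate_tokens(text: str) -> int:
    """Ballpark token count using the common ~4-characters-per-token
    heuristic for English. Real tokenizers differ; use this only to
    gauge what might fit in a context window."""
    return max(1, len(text) // 4)

article = "word " * 2000          # about 10,000 characters of filler
print(estimate_tokens(article))  # roughly 2,500 tokens
```

For anything where the count actually matters, use the model provider's own tokenizer rather than a heuristic like this.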
The practical limit isn't the window size; it's how deliberately you fill it. A context window loaded with irrelevant background, repeated instructions, and filler gives the AI more noise to work through.
The quality of the AI's output depends heavily on the quality of what's in the context window.
A filing analogy
Imagine you're briefing a researcher on a complex investigation. You could hand them a folder with your most relevant documents on top — or you could hand them an unsorted box of everything you've ever written, including grocery lists.
They're equally capable either way. But one setup produces better research faster. The context window is the desk they're working at. You decide what's on it.
How the cache works — and why order matters
When you send a message to Claude, it processes everything in the context window. For long sessions, this processing has a cost — both in time and, for API users, in money. To reduce that cost, Claude caches parts of the context that haven't changed.
The cache works by prefix: it stores the beginning of the context window and reuses it for subsequent messages. If the beginning of your context doesn't change between messages, Claude doesn't have to reprocess it — it reads from cache.
This means order matters. Put stable content — your context file, system instructions, background documents — at the top. Put dynamic content — today's task, this specific question — at the bottom. A timestamp or session-specific note at the top busts the cache on every message, which slows things down.
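To see why placement matters, here is a minimal sketch (hypothetical helper names, plain string assembly rather than any real API) contrasting a timestamp at the top with the same timestamp at the bottom:

```python
import datetime

BRIEFING = "You are a research assistant. Background documents follow."

def bad_prompt(task: str) -> str:
    # Session-specific text first: the prefix changes on every message,
    # so a prefix cache can reuse nothing.
    stamp = datetime.datetime.now().isoformat()
    return f"Session: {stamp}\n\n{BRIEFING}\n\nTASK: {task}"

def good_prompt(task: str) -> str:
    # Stable briefing first; the changing parts sit at the end,
    # leaving the cached prefix intact.
    stamp = datetime.datetime.now().isoformat()
    return f"{BRIEFING}\n\nSession: {stamp}\nTASK: {task}"

# Every good_prompt shares the briefing as a common prefix:
assert good_prompt("Summarize doc 1").startswith(BRIEFING)
assert good_prompt("Summarize doc 2").startswith(BRIEFING)
# bad_prompt diverges at the very first character:
assert not bad_prompt("Summarize doc 1").startswith(BRIEFING)
```

The same principle applies whether you're pasting text into a chat or assembling prompts in a script: anything that changes per message belongs after everything that doesn't.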
Four rules for efficient sessions
1. Stable context first, current task last. Your briefing document goes at the top. The specific question you're asking right now goes at the bottom.
2. Don't switch models mid-session. Each model has its own cache. Switching from Claude Sonnet to Claude Opus mid-session discards the cached context and starts fresh.
3. Add new context by appending, not rewriting. Adding a note at the end of your context file preserves the cached prefix. Rewriting the existing briefing forces Claude to reprocess everything from the edit point down.
4. Plan your session setup before starting. Five minutes spent thinking about what context you need saves a scattered session where you keep adding background mid-conversation.
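The append-versus-rewrite rule can be sketched with a toy prefix check (`shared_prefix_len` is a hypothetical stand-in for what a real cache measures, not how Claude's cache is implemented):

```python
def shared_prefix_len(old: str, new: str) -> int:
    """Length of the common prefix of two contexts: a stand-in for
    how much of the old context a prefix cache could reuse."""
    n = 0
    for a, b in zip(old, new):
        if a != b:
            break
        n += 1
    return n

context = "BRIEFING: cover the hearing.\nNOTES: witness list attached.\n"

# Appending keeps the whole original as a reusable prefix.
appended = context + "UPDATE: hearing moved to Tuesday.\n"
assert shared_prefix_len(context, appended) == len(context)

# Rewriting early text invalidates everything from the edit point on.
rewritten = context.replace("hearing", "Tuesday hearing", 1)
assert shared_prefix_len(context, rewritten) < len(context)
```

An edit near the top of a long briefing therefore costs far more than the same edit appended at the bottom, even though the final text is almost identical.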
What this means for journalism workflows
If you're building an automated pipeline — a script that processes new documents daily, or a scheduled task that checks sources — session setup matters a lot. A pipeline that starts each run with a stable briefing document is faster and cheaper than one that reconstructs context each time.
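A sketch of that setup (hypothetical function and variable names; in a real pipeline the briefing would be loaded from a file such as CLAUDE.md):

```python
def build_daily_run(briefing: str, todays_docs: list[str], task: str) -> str:
    """One pipeline run: stable briefing first, then the day's documents,
    then the task. Runs that share the same briefing share a cacheable
    prefix; only the tail changes between runs."""
    return "\n\n---\n\n".join([briefing, *todays_docs, "TASK: " + task])

briefing = "Newsroom briefing: beat, sources, style rules."
monday = build_daily_run(briefing, ["Doc: council agenda."], "Flag new agenda items.")
tuesday = build_daily_run(briefing, ["Doc: vote record."], "Summarize the votes.")

# Both runs open with the identical, cache-friendly briefing:
assert monday.startswith(briefing) and tuesday.startswith(briefing)
```

Because the briefing never moves or changes, every run pays the processing cost for it at most once; only the day's documents and task are new work.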
For one-off research sessions, the practical rule is simpler: start with context, end with task. Your CLAUDE.md file handles the first part automatically.
SOURCE: The cache behavior described here was documented by @trq212 (Anthropic), based on analysis of Claude Code's internals.
NEXT: Head to Module 4 to build your first CLI workflow pipeline.