Intermediate Module 5 of 12

Context Engineering

Giving AI the right information at the right time

28 min read

From Prompts to Context

Prompt engineering is about what you say. Context engineering is about what you give.

This distinction matters more than most people realize. You can write the most perfectly crafted prompt in the world, but if the AI does not have the right information to work with, the output will be limited. The shift from "how do I ask better questions?" to "how do I architect better information?" is what separates competent AI users from truly effective ones.

Think of it this way: a brilliant consultant who shows up to a meeting with no background on your company, no access to your data, and no understanding of your industry will still give generic advice -- no matter how good your question is. Context engineering is about making sure the AI has what it needs before you ask.

Understanding Context Windows

Every AI model has a context window -- the total amount of text it can process in a single conversation. Think of it as the model's working memory: everything it can "see" at once, including your instructions, any documents you provide, the conversation history, and its own responses.

Context windows are measured in tokens -- small chunks of text, often fragments of words. In English, a token averages about four characters, or roughly three-quarters of a word, so a page of text is roughly 500-750 tokens.
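For budgeting context, a rough rule of thumb is enough. Here is a minimal Python sketch of the common four-characters-per-token estimate -- real tokenizers vary by model, so treat this as a planning tool, not exact accounting:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4-characters-per-token rule of thumb.

    Real tokenizers (BPE variants) differ per model; this is only for
    budgeting context, not exact accounting.
    """
    return max(1, round(len(text) / 4))


def estimate_pages(token_count: int, tokens_per_page: int = 625) -> float:
    """Convert tokens to an approximate page count (500-750 tokens/page)."""
    return token_count / tokens_per_page
```

With this estimate, a 200K-token window works out to a few hundred pages -- enough to check whether a document will fit before you paste it in.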

Current Context Window Sizes (2026)

  • Claude (Anthropic) -- 200K tokens standard, up to 1M tokens with Opus 4.6 and Sonnet 4.6. That is roughly 150,000-750,000 words, or approximately 500-2,500 pages of text.
  • GPT-5 (OpenAI) -- 196K tokens in thinking mode, up to 400K with GPT-5.2. GPT-5.3-Codex combines frontier coding and reasoning in a single model. The earlier GPT-4.1 series supports up to 1M tokens for specialized tasks.
  • Gemini (Google) -- Up to 1M tokens with Gemini 3.1 Pro, 2.5 Pro, and Flash models. Gemini 3.1 Pro also supports up to 64K-token output. Google has consistently led on large context windows; 1M tokens is roughly the equivalent of 700,000 words.

What Happens When You Exceed the Window

When a conversation exceeds the context window, the model must drop older information. In most chat interfaces, this happens silently -- the AI simply "forgets" the earliest parts of the conversation. This is why long conversations can feel like the AI loses track of things you discussed earlier. It literally has.

Some systems handle this with compaction -- automatically summarizing older context to fit more conversation within the window. Others use sliding windows that drop the oldest messages. Understanding which approach your tool uses is important for managing long sessions effectively.
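The sliding-window approach can be sketched in a few lines. This toy version drops the oldest messages once a token budget is exceeded, using the crude characters-to-tokens estimate; real chat systems use exact tokenizers, and compaction-based systems summarize instead of dropping:

```python
def fit_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages that fit the budget, dropping the oldest."""
    kept: list[str] = []
    used = 0
    for msg in reversed(messages):       # walk newest -> oldest
        cost = max(1, len(msg) // 4)     # rough token estimate
        if used + cost > max_tokens:
            break                        # everything older is silently dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))          # restore chronological order
```

This makes the "silent forgetting" concrete: anything that falls outside the budget never reaches the model at all.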

The Information Architecture

Context engineering is fundamentally about information architecture -- deciding what to include, what to leave out, and how to organize what you provide. The goal is maximum relevance with minimum noise.

The Newspaper Editor Approach

Think like a newspaper editor writing a front page story. The most important information goes first. Details come later. Background fills in gaps. This is the inverted pyramid, and it works exceptionally well for AI context:

  1. Lead with the goal -- what you need the AI to accomplish
  2. Essential context -- the specific information required to do this task well
  3. Supporting details -- background, constraints, preferences
  4. Reference material -- documents, examples, data that the AI may need to consult
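If you assemble context programmatically, the inverted pyramid is easy to enforce. Here is a hypothetical helper that builds a prompt in that order, skipping any empty layer -- the section names are illustrative, not a standard format:

```python
# Hypothetical helper that assembles context in inverted-pyramid order.
# Section names and layout are illustrative, not a standard.

def build_context(goal, essentials, details=(), references=()):
    """Return a prompt with the goal first and reference material last,
    omitting any layer that is empty."""
    layers = [
        ("TASK", [goal]),
        ("ESSENTIAL CONTEXT", list(essentials)),
        ("SUPPORTING DETAILS", list(details)),
        ("REFERENCE MATERIAL", list(references)),
    ]
    parts = []
    for title, items in layers:
        if items:
            parts.append(title + ":\n" + "\n".join(f"- {item}" for item in items))
    return "\n\n".join(parts)
```

For example, build_context("Draft a return-request reply", ["Policy: 30 days standard"]) yields a TASK section followed by an ESSENTIAL CONTEXT section, with the empty layers left out entirely.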

Poorly Structured Context

Here's our company wiki page about returns... [2000 words]

And here's our customer service manual chapter 4... [3000 words]

And here's the email from the customer... [200 words]

Oh and we recently changed our policy... [500 words]

What should we tell this customer?

Well-Structured Context

TASK: Draft a reply to a customer requesting a return outside our normal window. Be empathetic but follow current policy.

Current Policy (updated Jan 2026)
  • Standard returns: 30 days, no questions
  • Extended returns: 31-60 days with manager approval
  • Beyond 60 days: store credit only, case-by-case
Customer Situation
  • Purchased: Nov 15, 2025 (67 days ago)
  • Product: Wireless headphones, $89
  • Issue: Left earphone stopped working
  • Tone: Frustrated but polite
  • Previous interactions: None
Guidelines
  • Our brand voice is warm and solution-oriented
  • For defective products, we're more flexible on timing
  • Offer the best resolution we can within policy

The second version contains less text but far more useful information. Every detail is relevant, clearly labeled, and organized by importance. The AI does not have to wade through thousands of words to find the three facts that actually matter.

RAG -- Retrieval-Augmented Generation

RAG is one of the most important concepts in practical AI today, and it is simpler than it sounds. Here is the core idea: instead of relying only on what the AI learned during training, you give it access to specific documents and data at the time of your conversation.

Think of the difference between asking someone a question from memory versus asking them the same question while they have the relevant files open on their desk. That is the difference between a plain AI query and a RAG-powered one.

How RAG Works (Conceptually)

  1. You ask a question -- "What's our refund policy for enterprise clients?"
  2. The system searches -- it looks through your company's documents, finds the relevant policy pages
  3. Relevant content is retrieved -- the specific paragraphs about enterprise refunds are pulled out
  4. The AI reads and answers -- it generates a response based on the actual policy documents, not its general training
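The retrieval step above can be sketched in deliberately tiny form: rank document chunks by word overlap with the question and keep the best matches. Production RAG uses embedding similarity and a vector index instead, and the policy snippets below are invented for illustration:

```python
import re

# Minimal sketch of RAG retrieval: rank chunks by word overlap with the
# question. Real systems use embeddings and a vector index; the documents
# here are invented for illustration.

def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, chunks, top_k=2):
    q = words(question)
    ranked = sorted(chunks, key=lambda c: len(q & words(c)), reverse=True)
    return ranked[:top_k]

docs = [
    "Enterprise clients: refunds within 45 days require account manager sign-off.",
    "Office hours are 9am-5pm Monday through Friday.",
    "Standard refund policy: 30 days, receipt required.",
]

top = retrieve("What is our refund policy for enterprise clients?", docs)
prompt = "Answer using only this context:\n\n" + "\n".join(top)
```

Note what the toy version already gets right: the office-hours chunk never reaches the model, so it cannot distract from the answer.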

The critical benefit is accuracy. Without RAG, the AI answers from its general knowledge -- which might be outdated, incomplete, or just wrong for your specific situation. With RAG, it answers from your actual source material.

Why RAG Matters

  • Current information -- AI training data has a cutoff date. RAG lets the model access information from today.
  • Your specific data -- the AI was not trained on your company's internal documents, but RAG lets it use them.
  • Reduced hallucination -- when the AI has the source material right in front of it, it is far less likely to make things up.
  • Verifiability -- you can check the AI's answer against the source documents it referenced.

Long-Form Context Strategies

As context windows grow larger, you increasingly need strategies for working with substantial amounts of information -- long documents, multiple files, extended conversation histories. Here are the most effective approaches.

Chunking

When working with long documents, break them into logical sections rather than feeding everything at once. This is especially important when different parts of the document are relevant to different questions.
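A simple chunker can respect those logical boundaries while capping chunk size. This sketch splits on blank lines and packs paragraphs up to a character budget -- an assumption for illustration; many systems chunk by headings or exact token counts instead:

```python
def chunk_by_paragraphs(text: str, max_chars: int = 2000) -> list[str]:
    """Split on blank lines and pack paragraphs into chunks up to max_chars,
    so no paragraph is ever cut in half."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)   # current chunk is full; start a new one
            current = para
        else:
            current = current + "\n\n" + para if current else para
    if current:
        chunks.append(current)
    return chunks
```

Because splits only happen between paragraphs, each chunk stays a coherent unit you can feed to the AI on its own.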

The Summary-Then-Detail Approach
Setup

I'm going to share a 50-page contract for review. Here's my approach:

Summary Layer

First, here's a one-page executive summary of the key terms:

[summary]

Detail Layer

The specific sections I need you to focus on are:

  • Section 4.2 (Liability): [paste relevant section]
  • Section 7.1 (Termination): [paste relevant section]
  • Section 9 (IP Rights): [paste relevant section]
Questions
  1. Are there any unusual liability clauses compared to standard SaaS agreements?
  2. What are the termination notice requirements?
  3. Who owns derivative works created using the platform?

This approach gives the AI the big picture (the summary) and the specific details (the relevant sections) without overwhelming it with 50 pages of boilerplate that is irrelevant to your questions.

Multi-Conversation Strategies

Sometimes the best approach is not one long conversation but several shorter, focused ones:

  • Analysis first, synthesis second -- use one conversation to analyze a document, save the key findings, then start a new conversation for synthesis or recommendations
  • Parallel tracks -- analyze different aspects (financial, legal, technical) in separate conversations, then combine findings
  • Progressive refinement -- start broad, identify the most important areas, then dive deep in focused follow-up conversations

Dynamic Context and Tool Use

Everything we have discussed so far involves static context -- information you manually provide to the AI. But the frontier of context engineering is dynamic context: systems where the AI can pull in information on its own, in real time, as it needs it.

How Dynamic Context Works

Instead of you anticipating every piece of information the AI might need and pasting it into the prompt, dynamic context systems let the AI:

  • Search -- query databases, knowledge bases, or the web
  • Read files -- access documents on disk or in cloud storage
  • Call APIs -- pull live data from external services
  • Execute code -- run calculations or data analysis on the fly
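The loop behind tool use can be sketched in a few lines: the model emits a tool call, a harness executes it, and the result flows back into the context. Everything here -- the tool name and its data -- is invented for illustration:

```python
# Toy harness for dynamic context via tool use. The tool name and its
# data are invented for illustration.

TOOLS = {
    "search_orders": lambda customer: (
        [{"id": "A-1", "status": "shipped"}] if customer == "acme" else []
    ),
}

def run_tool_call(name, **kwargs):
    """Execute a named tool and return its result for insertion into context."""
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)

# A real harness would serialize this result back into the model's context.
result = run_tool_call("search_orders", customer="acme")
```

The key design point is that the model never executes anything itself -- it only requests; the harness runs the tool and decides what goes back into the window.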

This is the difference between handing someone a stack of documents and giving them a library card. With dynamic context, the AI can find what it needs when it needs it.

MCP: The Emerging Standard

The Model Context Protocol (MCP) is an open standard -- originally introduced by Anthropic and now maintained by the Linux Foundation -- that standardizes how AI models connect to external data sources and tools. Think of it as a universal adapter: instead of building custom integrations for every tool an AI might use, MCP provides a common interface.

MCP has been adopted by major AI providers including OpenAI and Google, making it the closest thing to a universal standard for AI tool use. It defines how models can:

  • Discover available tools and data sources
  • Request specific information or actions
  • Receive structured responses
  • Chain multiple tool calls together
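Concretely, MCP messages are JSON-RPC 2.0. The sketch below builds a tool-discovery request and a tool-call request -- the method names follow the MCP specification, while the tool name and arguments are invented for illustration:

```python
import json

# Sketch of the JSON-RPC 2.0 messages MCP uses for tool discovery and
# invocation. Method names follow the MCP spec; the tool name and
# arguments are invented for illustration.

list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "search_docs", "arguments": {"query": "refund policy"}},
}

wire = json.dumps(call_request)  # what actually goes over the transport
```

Because every MCP server speaks this same shape of message, a client written once can discover and call tools from any of them -- that is the universal-adapter idea in practice.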

Putting It All Together

Context engineering is a mindset shift. Instead of asking "how do I phrase this question better?", you ask "what information does the AI need to give me an excellent answer?" The question itself is often the easiest part. The hard -- and valuable -- work is curating, structuring, and delivering the right context.

As you work with AI more, you will develop an intuition for this. You will start noticing when output quality drops because of missing context rather than poor prompting. You will learn which types of information improve results the most for your specific use cases. And you will start thinking of AI not as a question-answering machine but as a reasoning engine that is only as good as the information you feed it.

Key Takeaways
  • Context engineering is about architecting the information you give AI, not just the questions you ask
  • Context windows are the AI's working memory -- organize for relevance, not volume
  • Lead with the most important information (inverted pyramid) and cut anything that fails the relevance test
  • RAG gives AI access to your specific, current data -- you're likely already using it when you upload documents
  • For long documents, use summary-then-detail and chunking rather than dumping everything in at once
  • Dynamic context (tools, MCP, APIs) is shifting AI from static question-answering to live information access
  • The quality of AI output is directly proportional to the quality of context you provide