Beyond the Chat Window
Chat interfaces are training wheels. They are great for learning and exploration, but they are fundamentally limited: you type, the AI responds, you copy-paste the result somewhere else. That workflow tops out fast.
APIs are where AI becomes a building block for whatever you can imagine. Instead of a conversation, you send a structured request and get a structured response -- programmatically, at scale, integrated into your own applications.
Want to automatically summarize every support ticket that comes in? That is an API call. Want to generate product descriptions for 10,000 items in your catalog? That is a loop with an API call inside it. Want to build an internal tool that answers employee questions using your company's documentation? That is a few hundred lines of code and an API.
The jump from chat to API is smaller than you think. If you can write a prompt in a chat window, you already know the hard part. The rest is plumbing.
Module 7 taught you to configure AI behavior through system prompts and parameters. APIs let you apply those same skills programmatically, at scale, inside your own applications.
Why APIs Matter
The limitations of chat interfaces become obvious once you try to use AI for anything beyond one-off tasks:
- No automation. You cannot schedule a chat conversation to run at 3 AM every night.
- No integration. Chat output lives in the chat window. Getting it into your database, spreadsheet, or application requires manual copy-paste.
- No scale. Processing 100 documents means 100 manual conversations.
- No customization. Chat interfaces give you the provider's UI. APIs let you build your own.
APIs solve all of these. They turn AI from a tool you use into a capability you build with -- the difference between using a calculator and embedding a calculation engine into your software.
API calls are simpler than most people expect. If you can follow a recipe, you can make an API call. This module shows you both curl (command-line) and Python examples. You can start with curl to understand the concept, then move to Python when you want to build something more substantial. AI itself can help you write the code.
Getting Started with the Claude API
The Claude API uses a messages format. You send a list of messages (with optional system instructions), and Claude responds with a message of its own. Here is the complete flow:
- Sign up at console.anthropic.com and create an API key
- Install the Python SDK (pip install anthropic) or use curl directly
- Send a request with your model, messages, and parameters
- Process the response in your application
Your First API Call with curl
curl is a command-line tool that sends HTTP requests. It is the fastest way to test an API without writing any code:
curl https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "system": "You are a concise technical writer. Explain concepts in plain English.",
    "messages": [
      {
        "role": "user",
        "content": "What is an API?"
      }
    ]
  }'

Never hardcode API keys in your source code. Use environment variables (export ANTHROPIC_API_KEY=sk-ant-...) or a secrets manager. If a key ends up in a git commit, rotate it immediately. This is the single most common security mistake beginners make with APIs.
Your First API Call with Python
Python is the most popular language for working with AI APIs. The official Anthropic SDK makes it clean and straightforward:
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from environment

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a concise technical writer. Explain concepts in plain English.",
    messages=[
        {"role": "user", "content": "What is an API?"}
    ]
)

print(message.content[0].text)

Understanding the Request
Every API call has the same core components:
- model -- which AI model to use. Claude Opus 4.6 (claude-opus-4-6) for the most capable reasoning, Sonnet 4.6 (claude-sonnet-4-6) for the best balance of speed and capability, Haiku 4.5 (claude-haiku-4-5-20251001) for fast, cheap tasks.
- max_tokens -- the maximum length of the response. One token is roughly 3/4 of a word.
- system -- your system prompt (the instructions from Module 7).
- messages -- the conversation history. Each message has a role ("user" or "assistant") and content.
The response comes back as a structured object with the model's reply, token usage statistics, and metadata about why the model stopped generating (hit the token limit, finished naturally, or used a tool).
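The metadata on the response object is worth inspecting from your very first call. A minimal sketch, assuming the Anthropic SDK's field names (`usage.input_tokens`, `usage.output_tokens`, `stop_reason`) -- the helper function itself is illustrative, not part of any SDK:

```python
def describe_response(text, input_tokens, output_tokens, stop_reason):
    # Translate the stop_reason values the API reports into plain English.
    note = {
        "end_turn": "finished naturally",
        "max_tokens": "hit the token limit",
        "tool_use": "paused to call a tool",
    }.get(stop_reason, stop_reason)
    return f"{len(text)} chars, {input_tokens}+{output_tokens} tokens, {note}"

# With a real SDK response object:
# print(describe_response(message.content[0].text,
#                         message.usage.input_tokens,
#                         message.usage.output_tokens,
#                         message.stop_reason))

# The helper works standalone with sample values:
print(describe_response("An API is...", 25, 180, "end_turn"))
```

Checking `stop_reason` matters in practice: a response that hit `max_tokens` is truncated, and you may want to retry with a higher limit.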
The OpenAI API
The OpenAI API follows the same concept with slightly different syntax. If you understand one AI API, you understand them all -- the patterns are nearly identical.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from environment

response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "system", "content": "You are a concise technical writer."},
        {"role": "user", "content": "What is an API?"}
    ]
)

print(response.choices[0].message.content)

The key differences from Claude's API:
- The system prompt is a message in the messages array (with role: "system") rather than a separate parameter
- The response structure uses choices[0].message.content instead of content[0].text
- OpenAI calls this endpoint "Chat Completions" while Anthropic calls theirs "Messages"
The conceptual model is the same: send messages, get a response, process the result. Once you learn one, switching to another takes minutes.
Start with whichever provider you use most in chat. If you use Claude daily, start with the Claude API. If you use ChatGPT, start with OpenAI's. The skills transfer directly. Many production applications use multiple providers -- choosing the right model for each task based on capability, speed, and cost.
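How little changes between providers can be made concrete with a thin adapter. This is a minimal sketch, not a production abstraction: it reuses the model names and SDK calls from the examples above, imports each SDK lazily so you only need the one you call, and omits error handling:

```python
def ask(provider, prompt, system="You are a concise technical writer."):
    # Minimal adapter: same prompt in, same plain-text answer out,
    # regardless of which provider handles the request.
    if provider == "anthropic":
        import anthropic
        msg = anthropic.Anthropic().messages.create(
            model="claude-sonnet-4-6",
            max_tokens=512,
            system=system,
            messages=[{"role": "user", "content": prompt}],
        )
        return msg.content[0].text
    elif provider == "openai":
        from openai import OpenAI
        resp = OpenAI().chat.completions.create(
            model="gpt-5",
            messages=[
                {"role": "system", "content": system},
                {"role": "user", "content": prompt},
            ],
        )
        return resp.choices[0].message.content
    raise ValueError(f"unknown provider: {provider}")
```

The two branches differ only in where the system prompt lives and how the reply is extracted -- exactly the differences listed above.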
Key Concepts
Tokens and Pricing
AI APIs charge by the token. A token is a chunk of text -- roughly 3/4 of a word in English. The sentence "The quick brown fox" is about 5 tokens. Pricing is split between input tokens (what you send) and output tokens (what the model generates). Output tokens typically cost 3-5x more than input tokens.
For estimation: 1,000 tokens is approximately 750 words. A typical API call that sends a short prompt and gets a paragraph back might use 200-500 tokens total, costing fractions of a cent with most models.
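The arithmetic is simple enough to sketch. Note the per-million-token rates below are placeholders for illustration -- check your provider's pricing page for real numbers:

```python
def estimate_cost(input_tokens, output_tokens, in_per_mtok, out_per_mtok):
    # Prices are quoted per million tokens (MTok); input and output
    # are billed at different rates.
    return (input_tokens / 1e6) * in_per_mtok + (output_tokens / 1e6) * out_per_mtok

# A call sending 300 tokens and receiving 200, at hypothetical
# rates of $3/MTok input and $15/MTok output:
print(f"${estimate_cost(300, 200, 3.00, 15.00):.4f}")  # $0.0039
```

Run a few of these before launching a batch job: 10,000 documents at half a cent each is $50, which you want to know in advance.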
Rate Limits
Every API has rate limits -- caps on how many requests you can make per minute or how many tokens you can process per day. These exist to prevent abuse and ensure fair access. When you hit a rate limit, the API returns a 429 error ("Too Many Requests"). The standard fix is exponential backoff: wait 1 second, then 2, then 4, then 8, doubling each time until the request succeeds.
Streaming
By default, API calls wait for the complete response before returning anything. Streaming sends the response back token-by-token as it is generated -- the same experience you see in chat interfaces where text appears word by word. Use streaming when you want to display results to a user in real time. Use non-streaming when you are processing results programmatically and just need the final output.
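A sketch of the streaming pattern, assuming the Anthropic SDK's streaming helper (`client.messages.stream` with its `text_stream` iterator); the `render_stream` helper is illustrative, not part of the SDK:

```python
def render_stream(chunks):
    # Print streamed text chunks as they arrive, then return the full reply.
    parts = []
    for text in chunks:
        print(text, end="", flush=True)
        parts.append(text)
    print()
    return "".join(parts)

# With the Anthropic SDK (requires an API key):
# with client.messages.stream(
#     model="claude-sonnet-4-6",
#     max_tokens=512,
#     messages=[{"role": "user", "content": "What is an API?"}],
# ) as stream:
#     reply = render_stream(stream.text_stream)

# The helper works with any iterable of text chunks:
reply = render_stream(iter(["Streaming ", "sends tokens ", "as they arrive."]))
```

Separating "display chunks" from "accumulate the full reply" is the usual shape: the user sees text immediately, and your code still gets the complete string at the end.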
Temperature
Temperature controls randomness. At temperature=0, the model gives the most predictable response. At temperature=1, it gets more creative and varied. For factual tasks (summarization, extraction, analysis), use low temperature. For creative tasks (brainstorming, writing fiction), use higher temperature. Most use cases work well at the default (usually 1.0).
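One way to apply that advice is a small lookup keyed by task type. The categories and values here are an assumed heuristic, not an official mapping:

```python
def pick_temperature(task_type):
    # Assumed heuristic: deterministic output for factual work,
    # default randomness for creative work.
    creative = {"brainstorm", "fiction", "naming"}
    return 1.0 if task_type in creative else 0.0

# The value passes straight through the API call, e.g. with the Anthropic SDK:
# client.messages.create(model="claude-sonnet-4-6", max_tokens=512,
#                        temperature=pick_temperature("extraction"),
#                        messages=[...])
print(pick_temperature("extraction"))  # 0.0
```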
In code, the exponential backoff pattern from the Rate Limits section looks like this:

import anthropic
import time

client = anthropic.Anthropic()

def call_with_retry(messages, max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-haiku-4-5-20251001",
                max_tokens=512,
                messages=messages
            )
        except anthropic.RateLimitError:
            wait_time = 2 ** attempt  # 1, 2, 4, 8, 16 seconds
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)
    raise Exception("Max retries exceeded")

Building Your First AI-Powered Tool
Let's put this together by building something useful: a command-line tool that summarizes documents. You give it a file, it gives you a summary. Simple, practical, and a pattern that scales to much more complex applications.
import anthropic
import sys

def summarize(file_path):
    # Read the document
    with open(file_path, "r") as f:
        content = f.read()

    # Initialize the client
    client = anthropic.Anthropic()

    # Send to Claude for summarization
    message = client.messages.create(
        model="claude-haiku-4-5-20251001",  # fast and cheap for summarization
        max_tokens=1024,
        system="Summarize the following document in 3-5 bullet points. "
               "Focus on the key facts and conclusions. Be concise.",
        messages=[
            {"role": "user", "content": content}
        ]
    )
    return message.content[0].text

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python summarize.py <file_path>")
        sys.exit(1)
    result = summarize(sys.argv[1])
    print(result)

That is the entire tool -- under 30 lines of code. Let's break down what is happening:
- Read the file passed as a command-line argument
- Initialize the client (it picks up your API key from the environment)
- Send the content to Claude with a system prompt that defines the summarization task
- Print the result
This pattern -- read input, send to AI, process output -- is the foundation of every AI-powered tool. The only things that change are the input source (files, databases, APIs, user input), the system prompt (what you want the AI to do), and the output destination (terminal, file, database, another API).
Notice the example uses Haiku 4.5, not Opus 4.6. For straightforward tasks like summarization, smaller models work just as well and cost a fraction of the price. Save the more capable (and expensive) models for tasks that genuinely require deep reasoning, complex analysis, or long-form generation. Your API bill will thank you.
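The read-send-process pattern extends naturally from one file to many. A minimal sketch: the summarizer is passed in as a function, so the loop itself needs no API key, and a stub stands in for a real call to `client.messages.create`:

```python
import pathlib
import tempfile

def batch_summarize(paths, summarize):
    # Apply a summarizer to many files; the summarizer is injected so the
    # loop is independent of any particular provider or model.
    results = {}
    for path in paths:
        text = pathlib.Path(path).read_text()
        results[path] = summarize(text)
    return results

# Demo with a stub summarizer (a real one would wrap client.messages.create):
with tempfile.TemporaryDirectory() as tmp:
    doc = pathlib.Path(tmp) / "note.txt"
    doc.write_text("Revenue grew 12% this quarter. Churn fell to 2%.")
    print(batch_summarize([doc], lambda text: text.split(".")[0] + "."))
```

Swap the stub for the `summarize` function above and this is the "10,000 product descriptions" loop from the introduction.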
API Best Practices
Security
- Never expose API keys in client-side code. Browser JavaScript, mobile apps, and public repositories are all visible to anyone. API calls with your key should happen on your server, never in the user's browser.
- Use environment variables for all secrets. Never hardcode keys in source files.
- Rotate keys if you suspect they have been exposed. Most providers let you create multiple keys and revoke them individually.
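Reading the key from the environment is also a good place to fail fast. A small sketch -- the `load_api_key` helper is illustrative (the official SDKs read the environment variable for you automatically):

```python
import os

def load_api_key(var="ANTHROPIC_API_KEY"):
    # Fail with a clear message instead of sending a request with no key.
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(
            f"{var} is not set. Run: export {var}=sk-ant-... "
            "(or use a secrets manager)"
        )
    return key
```

A deliberate error up front beats a confusing 401 response three layers deep in your application.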
Cost Management
- Choose the right model. Use Haiku for simple tasks, Sonnet for most work, Opus only when you need maximum capability. The cost difference between tiers can be 10-50x.
- Set max_tokens appropriately. If you only need a one-sentence answer, do not set max_tokens to 4096.
- Cache responses when possible. If the same prompt with the same input will always produce a similar result, store it instead of calling the API again.
- Monitor usage. Set up billing alerts on your API provider dashboard. Runaway scripts with bugs can burn through budget quickly.
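The caching advice can be sketched as a small memoizing wrapper. This is a minimal in-memory version for illustration; production systems would persist the cache and handle expiry:

```python
import hashlib

def cached(call):
    # Memoize by prompt hash so identical prompts never hit the API twice.
    cache = {}
    def wrapper(prompt):
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in cache:
            cache[key] = call(prompt)
        return cache[key]
    return wrapper

# Demo with a counter standing in for a real API call:
calls = {"n": 0}
def fake_api(prompt):
    calls["n"] += 1
    return prompt.upper()

ask = cached(fake_api)
ask("hello")
ask("hello")
print(calls["n"])  # the underlying "API" was called once
```

Hashing the prompt keeps the cache keys small and works unchanged if you later move the cache to Redis or a database.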
Structured Outputs
When you need machine-readable responses (not human-readable text), ask the model to respond in JSON. This makes it easy to parse the response and use it in your application:
import anthropic
import json

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="Extract structured data from the text. Respond with valid JSON only, no other text.",
    messages=[
        {"role": "user", "content": "John Smith, age 34, works at Acme Corp as a senior engineer. He started in 2019 and manages a team of 5."}
    ]
)

data = json.loads(message.content[0].text)
print(data)
# {"name": "John Smith", "age": 34, "company": "Acme Corp", ...}

Sign up for an API key at console.anthropic.com (Anthropic) or platform.openai.com (OpenAI). Both offer free trial credits. Then run this curl command from your terminal (replace the API key placeholder with your actual key):
curl https://api.anthropic.com/v1/messages -H "content-type: application/json" -H "x-api-key: YOUR_KEY" -H "anthropic-version: 2023-06-01" -d '...'
When you see the JSON response come back, you have just made your first AI API call. Everything from here builds on this exact pattern.
- APIs turn AI from a chat tool into a building block you can integrate into any application, workflow, or automation
- All AI APIs follow the same pattern: send messages in, get a response out -- learn one and the rest are variations
- Store API keys in environment variables, never in source code or client-side applications
- Choose the right model for the job: use cheaper, faster models for simple tasks and reserve powerful models for complex reasoning
- Tokens are the unit of measurement -- roughly 3/4 of a word -- and you pay for both input and output tokens
- Handle rate limits with exponential backoff and monitor your usage to avoid surprise bills
- The jump from chat to API is smaller than it looks -- if you can write a prompt, you can build an AI-powered tool