Advanced Module 8 of 12

AI Tools & APIs

Building with AI programmatically

35 min read

Beyond the Chat Window

Chat interfaces are training wheels. They are great for learning and exploration, but they are fundamentally limited: you type, the AI responds, you copy-paste the result somewhere else. That workflow tops out fast.

APIs are where AI becomes a building block for whatever you can imagine. Instead of a conversation, you send a structured request and get a structured response -- programmatically, at scale, integrated into your own applications.

Want to automatically summarize every support ticket that comes in? That is an API call. Want to generate product descriptions for 10,000 items in your catalog? That is a loop with an API call inside it. Want to build an internal tool that answers employee questions using your company's documentation? That is a few hundred lines of code and an API.

The jump from chat to API is smaller than you think. If you can write a prompt in a chat window, you already know the hard part. The rest is plumbing.

Why APIs Matter

The limitations of chat interfaces become obvious once you try to use AI for anything beyond one-off tasks:

  • No automation. You cannot schedule a chat conversation to run at 3 AM every night.
  • No integration. Chat output lives in the chat window. Getting it into your database, spreadsheet, or application requires manual copy-paste.
  • No scale. Processing 100 documents means 100 manual conversations.
  • No customization. Chat interfaces give you the provider's UI. APIs let you build your own.

APIs solve all of these. They turn AI from a tool you use into a capability you build with. It is the difference between using a calculator and embedding a calculation engine in your software.

Getting Started with the Claude API

The Claude API uses a messages format. You send a list of messages (with optional system instructions), and Claude responds with a message of its own. Here is the complete flow:

  1. Sign up at console.anthropic.com and create an API key
  2. Install the Python SDK (pip install anthropic) or use curl directly
  3. Send a request with your model, messages, and parameters
  4. Process the response in your application

Your First API Call with curl

curl is a command-line tool that sends HTTP requests. It is the fastest way to test an API without writing any code:

Claude API Call (curl) bash
curl https://api.anthropic.com/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "system": "You are a concise technical writer. Explain concepts in plain English.",
    "messages": [
      {
        "role": "user",
        "content": "What is an API?"
      }
    ]
  }'

Your First API Call with Python

Python is the most popular language for working with AI APIs. The official Anthropic SDK makes it clean and straightforward:

Claude API Call (Python) python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from environment

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a concise technical writer. Explain concepts in plain English.",
    messages=[
        {"role": "user", "content": "What is an API?"}
    ]
)

print(message.content[0].text)

Understanding the Request

Every API call has the same core components:

  • model -- which AI model to use. Claude Opus 4.6 (claude-opus-4-6) for the most capable reasoning, Sonnet 4.6 (claude-sonnet-4-6) for the best balance of speed and capability, Haiku 4.5 (claude-haiku-4-5-20251001) for fast, cheap tasks.
  • max_tokens -- the maximum length of the response. One token is roughly 3/4 of a word.
  • system -- your system prompt (the instructions from Module 7).
  • messages -- the conversation history. Each message has a role ("user" or "assistant") and content.

The response comes back as a structured object with the model's reply, token usage statistics, and metadata about why the model stopped generating (hit the token limit, finished naturally, or used a tool).
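To make that concrete, here is an illustrative response body shaped like what the Messages API returns (the field values here are invented for the example), and how you would pull out the reply text, token usage, and stop reason:

```python
import json

# An illustrative response body, shaped like the Messages API's JSON
# (values are made up for this example).
raw = '''
{
  "id": "msg_example",
  "role": "assistant",
  "content": [{"type": "text", "text": "An API is a contract between programs."}],
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 21, "output_tokens": 9}
}
'''

response = json.loads(raw)

# The reply text lives in the first content block
print(response["content"][0]["text"])

# Token usage and the stop reason travel alongside it
print(response["usage"]["input_tokens"], response["usage"]["output_tokens"])
print(response["stop_reason"])  # "end_turn", "max_tokens", or "tool_use"
```

The same fields are available as attributes on the SDK's response object (`message.usage`, `message.stop_reason`).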

The OpenAI API

The OpenAI API follows the same concept with slightly different syntax. If you understand one AI API, you understand them all -- the patterns are nearly identical.

OpenAI Chat Completions (Python) python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from environment

response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "system", "content": "You are a concise technical writer."},
        {"role": "user", "content": "What is an API?"}
    ]
)

print(response.choices[0].message.content)

The key differences from Claude's API:

  • The system prompt is a message in the messages array (with role: "system") rather than a separate parameter
  • The response structure uses choices[0].message.content instead of content[0].text
  • OpenAI calls this endpoint "Chat Completions" while Anthropic calls theirs "Messages"

The conceptual model is the same: send messages, get a response, process the result. Once you learn one, switching to another takes minutes.
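The gap is small enough to bridge with a few lines. Here is a hypothetical helper (neither SDK ships it) that converts a Claude-style request, with its separate system parameter, into the single messages array OpenAI expects:

```python
def to_openai_messages(system, messages):
    """Convert a Claude-style request (separate system parameter plus a
    messages list) into OpenAI's single messages array.
    A hypothetical helper -- not part of either SDK."""
    converted = []
    if system:
        converted.append({"role": "system", "content": system})
    converted.extend(messages)
    return converted

openai_style = to_openai_messages(
    "You are a concise technical writer.",
    [{"role": "user", "content": "What is an API?"}],
)
print(openai_style)  # system message first, then the user turn
```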

Key Concepts

Tokens and Pricing

AI APIs charge by the token. A token is a chunk of text -- roughly 3/4 of a word in English. The sentence "The quick brown fox" is about 5 tokens. Pricing is split between input tokens (what you send) and output tokens (what the model generates). Output tokens typically cost 3-5x more than input tokens.

For estimation: 1,000 tokens is approximately 750 words. A typical API call that sends a short prompt and gets a paragraph back might use 200-500 tokens total, costing fractions of a cent with most models.
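You can sanity-check a budget with a few lines of arithmetic. The prices below are placeholders for illustration -- check your provider's pricing page for real numbers:

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_mtok, output_price_per_mtok):
    """Rough cost estimate. Prices are in dollars per million tokens;
    the figures used below are placeholders, not real pricing."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000

# A short prompt in, a paragraph out, at illustrative prices of
# $3 per million input tokens and $15 per million output tokens:
cost = estimate_cost(300, 200, 3.00, 15.00)
print(f"${cost:.4f}")  # $0.0039
```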

Rate Limits

Every API has rate limits -- caps on how many requests you can make per minute or how many tokens you can process per day. These exist to prevent abuse and ensure fair access. When you hit a rate limit, the API returns a 429 error ("Too Many Requests"). The standard fix is exponential backoff: wait 1 second, then 2, then 4, then 8, doubling each time until the request succeeds.

Streaming

By default, API calls wait for the complete response before returning anything. Streaming sends the response back token-by-token as it is generated -- the same experience you see in chat interfaces where text appears word by word. Use streaming when you want to display results to a user in real time. Use non-streaming when you are processing results programmatically and just need the final output.

Temperature

Temperature controls randomness. At temperature=0, the model gives the most predictable response. At temperature=1, it gets more creative and varied. For factual tasks (summarization, extraction, analysis), use low temperature. For creative tasks (brainstorming, writing fiction), use higher temperature. Most use cases work well at the default (usually 1.0).

Handling Rate Limits with Exponential Backoff python
import anthropic
import time

client = anthropic.Anthropic()

def call_with_retry(messages, max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-haiku-4-5-20251001",
                max_tokens=512,
                messages=messages
            )
        except anthropic.RateLimitError:
            wait_time = 2 ** attempt  # 1, 2, 4, 8, 16 seconds
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)
    raise Exception("Max retries exceeded")

Building Your First AI-Powered Tool

Let's put this together by building something useful: a command-line tool that summarizes documents. You give it a file, it gives you a summary. Simple, practical, and a pattern that scales to much more complex applications.

Document Summarizer (Python) python
import anthropic
import sys

def summarize(file_path):
    # Read the document
    with open(file_path, "r", encoding="utf-8") as f:
        content = f.read()

    # Initialize the client
    client = anthropic.Anthropic()

    # Send to Claude for summarization
    message = client.messages.create(
        model="claude-haiku-4-5-20251001",  # fast and cheap for summarization
        max_tokens=1024,
        system="Summarize the following document in 3-5 bullet points. "
               "Focus on the key facts and conclusions. Be concise.",
        messages=[
            {"role": "user", "content": content}
        ]
    )

    return message.content[0].text

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python summarize.py <file_path>")
        sys.exit(1)

    result = summarize(sys.argv[1])
    print(result)

That is the entire tool -- under 30 lines of code. Let's break down what is happening:

  1. Read the file passed as a command-line argument
  2. Initialize the client (it picks up your API key from the environment)
  3. Send the content to Claude with a system prompt that defines the summarization task
  4. Print the result

This pattern -- read input, send to AI, process output -- is the foundation of every AI-powered tool. The only things that change are the input source (files, databases, APIs, user input), the system prompt (what you want the AI to do), and the output destination (terminal, file, database, another API).
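That pattern can be sketched as three pluggable stages. The call_model below is a stub standing in for the API call, so the shape is visible without an API key:

```python
def run_pipeline(read_input, call_model, write_output):
    """The generic shape of an AI-powered tool: read, send, process.
    Each stage is a plain callable, so changing the input source or
    output destination means swapping one function."""
    content = read_input()
    result = call_model(content)
    write_output(result)

# Stub stages -- no network needed to see the structure.
results = []
run_pipeline(
    read_input=lambda: "Quarterly revenue grew 12% on strong cloud sales.",
    call_model=lambda text: f"Summary: {text.split('.')[0]}.",  # stand-in for the API call
    write_output=results.append,
)
print(results[0])
```

Swap `read_input` for a database query and `write_output` for an insert statement, and the same skeleton becomes a batch-processing job.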

API Best Practices

Security

  • Never expose API keys in client-side code. Browser JavaScript, mobile apps, and public repositories are all visible to anyone. API calls with your key should happen on your server, never in the user's browser.
  • Use environment variables for all secrets. Never hardcode keys in source files.
  • Rotate keys if you suspect they have been exposed. Most providers let you create multiple keys and revoke them individually.

Cost Management

  • Choose the right model. Use Haiku for simple tasks, Sonnet for most work, Opus only when you need maximum capability. The cost difference between tiers can be 10-50x.
  • Set max_tokens appropriately. If you only need a one-sentence answer, do not set max_tokens to 4096.
  • Cache responses when possible. If the same prompt with the same input will always produce a similar result, store it instead of calling the API again.
  • Monitor usage. Set up billing alerts on your API provider dashboard. Runaway scripts with bugs can burn through budget quickly.
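Caching can be as simple as an in-memory dictionary keyed by a hash of the prompt. This is a minimal sketch; a production tool would persist the cache (SQLite, Redis) instead:

```python
import hashlib

_cache = {}

def cached_call(prompt, call_model):
    """Reuse a stored result when the exact same prompt comes in again.
    In-memory sketch only -- entries vanish when the process exits."""
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(prompt)  # only pay for the first call
    return _cache[key]

# A fake model that records how often it is actually invoked.
calls = []
def fake_model(prompt):
    calls.append(prompt)
    return f"response to: {prompt}"

first = cached_call("What is an API?", fake_model)
second = cached_call("What is an API?", fake_model)  # served from cache
print(len(calls))  # 1 -- the second call never reached the model
```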

Structured Outputs

When you need machine-readable responses (not human-readable text), ask the model to respond in JSON. This makes it easy to parse the response and use it in your application:

Getting JSON Responses python
import anthropic
import json

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="Extract structured data from the text. Respond with valid JSON only, no other text.",
    messages=[
        {"role": "user", "content": "John Smith, age 34, works at Acme Corp as a senior engineer. He started in 2019 and manages a team of 5."}
    ]
)

data = json.loads(message.content[0].text)
print(data)
# {"name": "John Smith", "age": 34, "company": "Acme Corp", ...}
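One caveat: models sometimes wrap the JSON in markdown code fences despite the "JSON only" instruction, which makes a bare json.loads fail. A defensive parser (a sketch, not part of any SDK) strips the fences first:

```python
import json

def parse_json_response(text):
    """Strip markdown code fences the model sometimes adds despite
    'JSON only' instructions, then parse. A defensive sketch."""
    cleaned = text.strip()
    if cleaned.startswith("```"):
        # Drop the opening fence line (``` or ```json) and the closing fence.
        cleaned = cleaned.split("\n", 1)[1]
        cleaned = cleaned.rsplit("```", 1)[0]
    return json.loads(cleaned)

fenced = '```json\n{"name": "John Smith", "age": 34}\n```'
data = parse_json_response(fenced)
print(data["name"], data["age"])  # John Smith 34
```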

Key Takeaways

  • APIs turn AI from a chat tool into a building block you can integrate into any application, workflow, or automation
  • All AI APIs follow the same pattern: send messages in, get a response out -- learn one and the rest are variations
  • Store API keys in environment variables, never in source code or client-side applications
  • Choose the right model for the job: use cheaper, faster models for simple tasks and reserve powerful models for complex reasoning
  • Tokens are the unit of measurement -- roughly 3/4 of a word -- and you pay for both input and output tokens
  • Handle rate limits with exponential backoff and monitor your usage to avoid surprise bills
  • The jump from chat to API is smaller than it looks -- if you can write a prompt, you can build an AI-powered tool