Claude Workflows: When to Build an Agent, and When Not To

A support-triage agent is the canonical first project: hand Claude the inbox, a few tools, and the instruction to sort tickets and draft replies. It demos beautifully. In production it does things the demo never showed you: it re-reads a ticket it already read, calls the refund API on a billing question because the API was in scope, and spends nine model round-trips on what a switch statement does in a single step.

None of that is a malfunction. The agent is doing exactly what you built it to do: given a goal and a set of tools, it decides its own steps from scratch, every time. The trouble is that triage doesn't need decided steps — it has known ones. You reached for an agent when the task was a workflow.

What separates a workflow from an agent

The difference is not size or sophistication. It is who decides what happens next.

In a workflow, you decide. The steps live in your code — fetch, classify, route, draft — and the model supplies the fuzzy judgment inside a step (which category is this ticket?) while the control flow stays yours. In an agent, the model decides. You give it tools and a goal; it chooses each step at runtime and loops until it judges the task complete.

Triage has a fixed shape: read, categorize, route, reply. You know those steps before a single ticket arrives. Building it as an agent asks the model to rediscover that shape on every ticket, and it will, slightly differently each time. That variability is the price of flexibility you did not need.

🧠 Mental Model: A workflow is a railway and an agent is a taxi. The railway runs a fixed route, cheaply and predictably; the taxi will reach anywhere, but it costs more and varies from trip to trip. Take the railway when it already goes where you need to go.

Why an agent's loop is expensive

An agent's capability and its risk come from the same mechanism: the loop. The model reasons, calls a tool, reads the result, and reasons again, repeating until it decides to stop.

For open-ended work, that loop is the point. But every turn is another chance to go wrong, and the errors accumulate: a small misreading on the third turn becomes a confident mistake on the seventh. Token cost rises with each turn, and because the path differs between runs, a failure you saw once can be hard to reproduce.
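A back-of-the-envelope sketch makes the compounding concrete. It assumes, purely for illustration, that every turn succeeds independently with the same probability and that each turn re-sends the whole history plus a fixed number of new tokens. Real agents deviate from both assumptions, and the function names and figures here are hypothetical.

```python
def run_success_probability(per_turn_success: float, turns: int) -> float:
    """Chance that every turn is correct, assuming independent turns."""
    return per_turn_success ** turns


def cumulative_input_tokens(base_prompt: int, added_per_turn: int, turns: int) -> int:
    """Total input tokens read if each turn re-sends the history so far."""
    total = 0
    context = base_prompt
    for _ in range(turns):
        total += context           # this turn reads everything accumulated so far
        context += added_per_turn  # the tool call and its result join the history
    return total


if __name__ == "__main__":
    for turns in (1, 3, 9):
        p = run_success_probability(0.95, turns)
        tokens = cumulative_input_tokens(1_000, 500, turns)
        print(f"{turns} turns: success ~{p:.2f}, input tokens {tokens}")
```

Under these assumptions, nine turns read 27 times the input tokens of one turn and finish without error only about 63% of the time, even though each individual turn is right 95% of the time.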

🏢 Enterprise Angle: An agent's bill scales with the number of turns it takes, and turns are exactly what you stop controlling once you hand over the loop, which is a second reason to keep autonomy scoped. Note too that from June 15, 2026, Agent SDK and headless (claude -p) usage on subscription plans draws from a separate monthly Agent SDK credit; budget it apart from interactive use, and authenticate production with an API key rather than a personal login.

So the question to settle before writing any code is concrete: can you list the steps in advance? For triage you can. When you can, you do not need an agent — you need one of five workflow patterns.

Figure 1 — The five workflow patterns. Each is a fixed path; the autonomous agent is a separate category, not a sixth pattern.

The five workflow patterns

These cover most multi-step work. They are ordered by how much judgment each hands to the model, so you take the first one that fits and stop.

Prompt chaining — steps in sequence, each feeding the next, with an optional check between them. Draft a reply, then verify it before sending.
Routing — classify the input, then send it to a handler built for that case. This is the shape triage needs: one model call to label the ticket, then your code selects the path.
Parallelization — split a task into independent parts, or ask the same question several ways and combine the answers. Score a ticket for urgency, sentiment, and topic at once.
Orchestrator-workers — a coordinator decides at runtime how to divide the work. The first pattern with real dynamism, for when the subtasks are not known ahead of time.
Evaluator-optimizer — generate, check against criteria, and revise until it passes. Use it when quality must be reliable rather than hoped for.
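The two simplest shapes from the list, parallelization and prompt chaining, can be sketched in a few lines. This is a minimal illustration, not a reference implementation: call_model is a hypothetical stand-in that returns canned text so the example runs without a network call, and the prompt prefixes are invented for the demo.

```python
import asyncio
from typing import Optional


async def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; returns canned text."""
    await asyncio.sleep(0)  # yield control, as a network call would
    if prompt.startswith("urgency:"):
        return "high" if "down" in prompt.lower() else "low"
    if prompt.startswith("sentiment:"):
        return "negative" if "!" in prompt else "neutral"
    if prompt.startswith("topic:"):
        return "outage" if "down" in prompt.lower() else "general"
    if prompt.startswith("draft:"):
        return "Thanks for reporting this. We are looking into it."
    if prompt.startswith("verify:"):
        return "ok" if "Thanks" in prompt else "reject"
    return ""


async def score_ticket(ticket: str) -> dict[str, str]:
    """Parallelization: three independent questions asked at once."""
    urgency, sentiment, topic = await asyncio.gather(
        call_model(f"urgency: {ticket}"),
        call_model(f"sentiment: {ticket}"),
        call_model(f"topic: {ticket}"),
    )
    return {"urgency": urgency, "sentiment": sentiment, "topic": topic}


async def draft_then_verify(ticket: str) -> Optional[str]:
    """Prompt chaining: draft a reply, then let a check gate the send."""
    draft = await call_model(f"draft: {ticket}")
    verdict = await call_model(f"verify: {draft}")
    return draft if verdict == "ok" else None  # None means: do not send


if __name__ == "__main__":
    print(asyncio.run(score_ticket("Site is down!")))
    print(asyncio.run(draft_then_verify("Site is down!")))
```

In both functions the order of calls is fixed in your code; the model only fills in the answers.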

Triage is a routing problem. That is the whole diagnosis: an agent was doing a router's job.

Building it

Here is the triage bot rebuilt as the router it should have been. Note where each decision is made:

# A workflow: the control flow is yours; the model handles only the fuzzy step.
# Assumes classify() and the ROUTES dict of handlers are defined elsewhere.
async def triage(ticket):
    category = await classify(ticket)     # one scoped model call -> "billing" | "bug" | "how-to"
    handler  = ROUTES[category]           # your code chooses the path, not the model
    return await handler(ticket)          # a handler written and tested for that case

The model never touches the routing decision and never sees the refund tool. It does the one thing it is good at — turning fuzzy language into a label — inside a path you control and can test.
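To run the router end to end, the missing pieces can be stubbed. The sketch below is one possible arrangement under stated assumptions: classify uses keyword rules in place of the real model call, the handler names and reply strings are hypothetical, and the fallback to a human for an unexpected label is an addition that the snippet above does not include.

```python
import asyncio
from typing import Awaitable, Callable

Handler = Callable[[str], Awaitable[str]]


async def classify(ticket: str) -> str:
    """Hypothetical stub for the one scoped model call."""
    text = ticket.lower()
    if "invoice" in text or "charge" in text:
        return "billing"
    if "error" in text or "crash" in text:
        return "bug"
    return "how-to"


async def handle_billing(ticket: str) -> str:
    return "billing: forwarded to the billing queue"


async def handle_bug(ticket: str) -> str:
    return "bug: filed with engineering"


async def handle_how_to(ticket: str) -> str:
    return "how-to: sent the relevant help article"


async def handle_unknown(ticket: str) -> str:
    return "unknown: escalated to a human"


# The routing table is plain data: easy to read, test, and extend.
ROUTES: dict[str, Handler] = {
    "billing": handle_billing,
    "bug": handle_bug,
    "how-to": handle_how_to,
}


async def triage(ticket: str) -> str:
    category = await classify(ticket)
    handler = ROUTES.get(category, handle_unknown)  # unexpected label gets a safe default
    return await handler(ticket)


if __name__ == "__main__":
    print(asyncio.run(triage("I was charged twice on my invoice")))
```

Because the routing table is ordinary code, each handler can be unit-tested without calling a model at all.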

When a task genuinely has no fixed shape, you still do not write the loop by hand. The Claude Agent SDK (Python and TypeScript; the same loop that runs Claude Code) runs reason-act-observe for you. You supply configuration, and the configuration is where your control lives:

from claude_agent_sdk import query, ClaudeAgentOptions   # pip install claude-agent-sdk (Python 3.10+)
# allowed_tools is a boundary, not a default: a tool that is not listed cannot be called.
options = ClaudeAgentOptions(allowed_tools=["Read"])      # the refund API is simply absent

🏢 In Practice: The same router ships four ways, trading control for convenience. (1) A workflow you own: the router above; you run everything. (2) The Agent SDK in your process: for the genuinely agentic steps, the loop runs on your infrastructure. (3) A workflow engine in front: Step Functions, Temporal, or Airflow drives the steps and calls the model per step, giving you retries and observability from tooling you already operate. (4) Managed Agents: Anthropic runs the loop and the sandbox when you would rather not. The axis is ownership: how much of the runtime do you want to run yourself?

⚠️ Gotcha: The production failures above came from an unbounded loop holding a tool it should never have had. Two settings prevent most of them: scope allowed_tools to exactly what the step needs, and bound the number of turns. An agent will use every tool and every turn you leave available to it.
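What those two bounds enforce can be shown in miniature. The loop below is a hypothetical teaching stand-in, not the Agent SDK's implementation and not something to ship in place of it: decide plays the role of the model choosing a step, and Step, run_bounded, and both exception classes are names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class ToolNotAllowed(Exception):
    """The policy asked for a tool outside the allowlist."""


class TurnLimitReached(Exception):
    """The loop used every permitted turn without finishing."""


@dataclass
class Step:
    tool: Optional[str] = None    # tool to call, or None when finished
    arg: str = ""                 # argument passed to the tool
    answer: Optional[str] = None  # final answer when tool is None


def run_bounded(
    decide: Callable[[Optional[str]], Step],
    tools: dict[str, Callable[[str], str]],
    allowed_tools: set[str],
    max_turns: int,
) -> str:
    observation: Optional[str] = None
    for _ in range(max_turns):
        step = decide(observation)          # stand-in for the model choosing a step
        if step.tool is None:
            return step.answer or ""
        if step.tool not in allowed_tools:  # the boundary: unlisted means uncallable
            raise ToolNotAllowed(step.tool)
        observation = tools[step.tool](step.arg)
    raise TurnLimitReached(f"no answer after {max_turns} turns")
```

A policy that requests the refund tool fails immediately with ToolNotAllowed, and a policy that never finishes stops at max_turns instead of running up the bill.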

The decision

The whole article reduces to one question.

Figure 2 — Measure first, stay on a fixed pattern while the steps are known, and use an agent only when they are not.

Start with a single model call and measure; a surprising amount of "agentic" work is one good prompt. If that falls short and you can list the steps, choose the workflow pattern that fits — it is cheaper, faster, and reproducible. Only when the steps genuinely cannot be known in advance does an agent earn its cost, and even then it runs inside guardrails: scoped tools, bounded loops, explicit permissions.

Applied to triage: the steps are listable, so it is a routing workflow, not an agent.

What it comes down to

Teams that run LLM systems reliably are not the ones with the most autonomous agents. They are the ones who use a workflow wherever the steps are known and an agent only where they are not, treating autonomy as a cost to justify rather than a default to assume. The triage bot did not need to be smarter. It needed to be a router.

System design scenarios

Triage at 50,000 tickets a day

Front the pipeline with a routing workflow: one classify call, then handlers your code selects — most tickets never touch an agent. Run the enrichment lookups as parallel steps, not as an agent deciding what to fetch. Reserve a scoped agent (read-only tools, a small turn cap) for the long tail that fits no known category, and route its output back through the same validated handlers. The design goal is to keep the autonomous fraction small and bounded, because that fraction is where cost and unpredictability concentrate.
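A rough sketch of that layout, under loud assumptions: every function here is a hypothetical stub (keyword rules instead of a model, canned lookups instead of real services), and scoped_agent merely stands in for a read-only, turn-capped agent that proposes a category. The point is the shape: parallel lookups, a routing decision in your code, and agent output validated before any handler runs.

```python
import asyncio

KNOWN_CATEGORIES = {"billing", "bug", "how-to"}


async def classify(ticket: str) -> str:
    """Hypothetical stub for the single classify call."""
    text = ticket.lower()
    if "charge" in text:
        return "billing"
    if "crash" in text:
        return "bug"
    if "how do i" in text:
        return "how-to"
    return "unknown"


async def lookup_account(ticket: str) -> str:
    return "account: pro plan"          # canned enrichment result


async def lookup_history(ticket: str) -> str:
    return "history: 2 prior tickets"   # canned enrichment result


async def scoped_agent(ticket: str) -> str:
    """Stand-in for a read-only, turn-capped agent; it only proposes a category."""
    return "billing" if "refund" in ticket.lower() else "none"


async def handle(category: str, ticket: str, context: list[str]) -> str:
    return f"{category} reply using {len(context)} lookups"


async def process(ticket: str) -> tuple[str, bool]:
    """Returns (reply, used_agent) so the autonomous fraction can be measured."""
    category, account, history = await asyncio.gather(
        classify(ticket), lookup_account(ticket), lookup_history(ticket)
    )
    used_agent = False
    if category not in KNOWN_CATEGORIES:
        used_agent = True
        category = await scoped_agent(ticket)
        if category not in KNOWN_CATEGORIES:  # validate agent output before routing
            return "escalated to a human", True
    return await handle(category, ticket, [account, history]), used_agent
```

Returning the used_agent flag alongside the reply lets you track the autonomous fraction per day and alert when it grows.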

A task with four uncertain stages: research, plan, draft, verify

Resist one agent that does all four in a single loop — errors compound across stages and failures won't reproduce. Compose four workflow steps with a check between each (an evaluator-optimizer loop on the draft, for instance). Each stage uses the model freely inside its boundary; the seams between stages stay yours — testable and re-runnable. You keep flexibility where it's needed and reproducibility everywhere else.
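One way to picture the seams is a pipeline whose stages are plain functions with a check after each, and a bounded evaluator-optimizer loop around the draft. Everything below is a hypothetical sketch: the stage functions return canned values in place of model calls, and the check criteria are invented for the example.

```python
from typing import Callable, Optional


def evaluate_and_revise(
    generate: Callable[[Optional[str]], str],
    evaluate: Callable[[str], Optional[str]],
    max_rounds: int,
) -> Optional[str]:
    """Evaluator-optimizer: regenerate with feedback until the check passes."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        draft = generate(feedback)
        feedback = evaluate(draft)  # None means the draft passed
        if feedback is None:
            return draft
    return None                     # never passed within the budget


def research(topic: str) -> list[str]:
    return [f"note about {topic}"]  # stub for a model-backed research step


def plan(notes: list[str]) -> list[str]:
    return ["intro", "details"]     # stub for a model-backed planning step


def write_draft(outline: list[str], feedback: Optional[str]) -> str:
    text = " / ".join(outline)
    return f"{text} (with sources)" if feedback else text


def review(draft: str) -> Optional[str]:
    return None if "sources" in draft else "cite sources"


def run_pipeline(topic: str, max_rounds: int = 3) -> str:
    notes = research(topic)
    if not notes:                   # seam 1: a check you own
        raise ValueError("research produced no notes")
    outline = plan(notes)
    if not outline:                 # seam 2
        raise ValueError("plan is empty")
    draft = evaluate_and_revise(
        lambda fb: write_draft(outline, fb), review, max_rounds
    )
    if draft is None:               # seam 3: verification failed within budget
        raise ValueError("draft never passed review")
    return draft
```

Each stage can be re-run on its own with saved inputs, which is what makes a failure reproducible.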

Orchestrator-workers, or a fixed pipeline?

Stay on a fixed pipeline unless the set of subtasks cannot be known before runtime, as in a research task where which sources to read depends on what earlier sources reveal. That is the signal for orchestrator-workers: a coordinator that decides the split dynamically. If you can enumerate the subtasks in advance, a parallelization step is cheaper and more predictable. The question is always whether the shape is known before the work starts.
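The research example can be sketched as a coordinator whose work list grows from what the workers return. This is a toy under stated assumptions: SOURCES is a hypothetical in-memory map standing in for real documents, read_source stands in for a model-backed worker, and the coordinator is simple queue logic where a real one would be a model call. The max_sources cap keeps the dynamic part bounded.

```python
from collections import deque

# Hypothetical corpus: each source reveals the follow-up leads it mentions.
SOURCES: dict[str, list[str]] = {
    "overview": ["pricing", "limits"],
    "pricing": ["limits"],
    "limits": [],
}


def read_source(name: str) -> list[str]:
    """Stand-in worker: reads one source and returns the leads it found."""
    return SOURCES.get(name, [])


def orchestrate(start: str, max_sources: int) -> list[str]:
    """Coordinator: the set of subtasks is discovered at runtime, within a cap."""
    queue = deque([start])
    seen: list[str] = []
    while queue and len(seen) < max_sources:
        name = queue.popleft()
        if name in seen:
            continue                  # already read, skip the duplicate lead
        seen.append(name)
        for lead in read_source(name):
            if lead not in seen:
                queue.append(lead)    # new work that was unknown before this read
    return seen
```

If the list of sources were known up front, the queue would be unnecessary and a single parallel fan-out over that list would do the same job.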

Interview questions

The questions an interviewer actually probes on this topic, and what a strong answer covers.

Explain the difference between a workflow and an agent.

Who decides the steps. In a workflow you do — the control flow lives in your code and the model only fills the fuzzy parts of a step. In an agent the model does — given tools and a goal, it chooses each step at runtime and loops until it judges the task done. Size and sophistication are not the distinction; control is.

When would you choose not to build an agent?

Whenever you can list the steps in advance — which is most tasks. A fixed workflow pattern is cheaper, faster, and reproducible, while an agent adds token cost, latency, and compounding errors for flexibility you don't need. Reaching for the simpler option when it fits is the senior move.

What does autonomy cost, and how do you bound it?

Spend scales with the number of turns, which you stop controlling once you hand over the loop; errors accumulate across turns; and because the path varies between runs, failures are hard to reproduce. You bound it by scoping allowed_tools to exactly what the step needs, capping turns, setting an explicit permission mode, and quarantining autonomy to the smallest step that requires it.

Name the five workflow patterns and when each fits.

Prompt chaining (sequential steps with a gate), routing (classify then dispatch), parallelization (fan out independent parts or vote), orchestrator-workers (a coordinator splits the work at runtime), and evaluator-optimizer (generate, check, revise until it passes). They are ordered by how much judgment each hands to the model — take the first that fits.

How do you decide workflow vs agent for a given task?

Start with a single model call and measure — a lot of "agentic" work is one good prompt. If that falls short, ask whether you can list the steps in advance: if yes, pick the workflow pattern that fits; if no, use an agent, wrapped in scoped tools, bounded loops, and permission gates. The decision hinges on whether the shape is known before the work begins.