AI Strategy · April 16, 2026 · 9 min read

AI Agents Aren't What You Think

57% of companies say they have AI agents in production, but only 23% are actually scaling them past pilot.

Here's the gap nobody's talking about: 57% of companies told G2 in August 2025 that they have AI agents in production. But McKinsey's November 2025 survey found only 23% are scaling those agents beyond a single use case. No more than 10% have scaled in any single business function.

The other 34% have something running, sure. But "running" and "producing value at scale" are very different things.

Most of what companies call an "AI agent" is a scheduled workflow with one LLM node inside Make or n8n. That's fine. It's useful. But calling it an agent is like calling a calculator a mathematician.

What's the real difference between agents and automations?

An automation follows a predetermined path. Input triggers step one, step one triggers step two, and so on. A workflow in Make.com (250K+ active businesses) or n8n (230K+ users, valued at $1.5B) does exactly what you tell it to do, every time.

An agent makes decisions. It receives a goal, evaluates available tools, decides which ones to use in what order, and adapts when something goes wrong. Anthropic's Claude Managed Agents, launched April 8, 2026, is one example. The agent gets a task, decides how to break it down, calls APIs, reads results, and adjusts its approach.
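The structural difference can be sketched in a few lines. This is an illustrative contrast, not a real implementation: the tools are stubs, and `choose_next_action()` stands in for the LLM call that picks each step.

```python
# Automation: fixed path. Agent: a loop where a policy picks the next tool.
# All names here are illustrative stand-ins, not a real framework.

def run_automation(lead: dict) -> dict:
    # Fixed path: same steps, same order, every time.
    lead["enriched"] = True
    lead["score"] = 80 if lead.get("company") else 20
    lead["routed_to"] = "sales" if lead["score"] >= 50 else "nurture"
    return lead

def choose_next_action(goal, history, tools):
    # Stand-in for an LLM deciding what to do next; here it just
    # runs each tool once, then declares the goal met.
    done = {action for action, _ in history}
    for name in tools:
        if name not in done:
            return name
    return None

def run_agent(goal: str, tools: dict, max_steps: int = 10) -> list:
    # Dynamic path: inspect state, pick a tool, adapt, stop when done.
    history = []
    for _ in range(max_steps):
        action = choose_next_action(goal, history, tools)
        if action is None:  # the policy judges the goal is met
            break
        history.append((action, tools[action]()))
    return history

print(run_automation({"company": "Acme"})["routed_to"])  # sales
print(run_agent("research prospect",
                {"lookup_crm": lambda: "found",
                 "draft_email": lambda: "drafted"}))
```

The point of the sketch: the automation's control flow is written by the developer; the agent's control flow is chosen at runtime, which is exactly what makes it powerful and harder to make reliable.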

[Image: Automation vs Agent Paths]

Here's the practical framework:

| | Automation (workflow) | Agent |
|---|---|---|
| Decision-making | None. Follows fixed path. | Dynamic. Chooses tools and order. |
| Error handling | Breaks or follows pre-built fallback | Attempts to recover or reroute |
| Setup complexity | Low to medium | High |
| Cost | $500-$5K/month (off-the-shelf) | $40K-$150K initial build; $1.4M-$1.6M fully loaded annual |
| Reliability at 20 steps | Depends on API uptime | 36% success rate at 95% per-step reliability |
| Best for | Repetitive, predictable tasks | Tasks requiring judgment across variable inputs |

That reliability number deserves a closer look. At 95% per-step reliability (which sounds great), a 20-step workflow succeeds end-to-end only 36% of the time. That's 0.95 to the 20th power. Tool-calling fails 3-15% of the time in production, and those failures are usually silent.
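The compounding is easy to verify: on a linear chain, every step must succeed, so per-step probabilities multiply.

```python
# End-to-end success probability of a linear multi-step workflow.
def end_to_end_success(per_step: float, steps: int) -> float:
    return per_step ** steps

print(round(end_to_end_success(0.95, 20), 2))  # 0.36
print(round(end_to_end_success(0.99, 20), 2))  # 0.82
```

Even pushing per-step reliability to 99% only gets a 20-step chain to roughly 82% end-to-end, which is why shortening chains and adding human checkpoints matters more than squeezing out another point of per-step accuracy.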

Why do most AI pilots fail?

Most pilot failures aren't caused by the technology being bad. They trace back to two things: poor use-case selection and bad data quality.

[Image: Structuring Chaos into Clean Data]

Teams pick their most complicated, highest-stakes process for the AI pilot. Then they feed it messy, inconsistent data. When it doesn't work perfectly, they declare AI "not ready" and move on.

The Stanford Digital Economy Lab's Enterprise AI Playbook, published March 2026 after studying 51 successful deployments, found the same pattern in reverse. The companies that succeeded picked boring, repetitive tasks with clean data. They didn't start with "automate our entire sales process." They started with "sort these support tickets into three categories."

McKinsey's research estimates that 85% of operational marketing tasks are automatable with a roughly 32% productivity gain. Agentic AI specifically will power 60%+ of incremental marketing and sales AI value, delivering a 3-5% annual productivity lift. Those numbers are real, but only if the implementation is right.

How should a marketing team start with AI automation?

Start with the workflows, not the agents.

Pick the three tasks your team does most often that require the least judgment. Email list segmentation based on clear rules. Report generation from existing data. Social post scheduling based on a pre-approved content calendar.

Build those as automations in Zapier (8,000+ integrations, largest connector library), Make, or n8n. Get them running reliably for 60 days. Then look at what breaks, what needs human intervention, and where an LLM node could reduce that intervention.

A practical automation stack for a mid-size marketing team:

  1. Lead scoring and routing: Form submission triggers CRM update, Slack notification to sales, and email sequence enrollment. No LLM needed.
  2. Content repurposing: Blog post published triggers summary generation (one LLM node), creates social post drafts, schedules distribution. One LLM call, surrounded by deterministic steps.
  3. Reporting: End of week triggers data pulls from Google Analytics, ad platforms, CRM. LLM writes a summary paragraph. Sends to Slack.
  4. Competitive monitoring: Daily RSS check of competitor blogs. LLM flags relevant updates. Sends digest to team.

None of these are agents. They're workflows with one smart node. And they'll handle 70-80% of the busywork.
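The "one smart node" pattern looks like this in miniature. Everything here is a stand-in: `summarize()` represents whatever single LLM call the team wires in, and the surrounding steps are plain deterministic code.

```python
# One LLM node surrounded by deterministic steps (the content-repurposing
# pattern from item 2 above). summarize() is a hypothetical stand-in for
# a single LLM API call.

def summarize(text: str) -> str:
    # Placeholder for the one LLM call in the pipeline.
    return text[:80] + "..."

def repurpose_blog_post(post: dict) -> list:
    summary = summarize(post["body"])  # the only non-deterministic step
    # Deterministic templating around the LLM output:
    return [
        f"New on the blog: {post['title']} - {summary}",
        f"{summary} Read more: {post['url']}",
    ]

drafts = repurpose_blog_post({
    "title": "Q3 results",
    "url": "https://example.com/q3",
    "body": "Revenue grew 40% this quarter.",
})
print(len(drafts))  # 2
```

Because only one step can misbehave, the failure surface stays small: if the summary looks wrong, a human fixes one sentence rather than debugging a chain of autonomous decisions.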

When do you actually need an agent?

You need an agent when the task involves branching decisions that can't be predetermined. Consider an SDR research workflow where the agent looks up a prospect, decides which of six data sources to check, evaluates what it finds, and drafts an outreach email customized to what it learned. That's an agent use case.

But go in with realistic expectations. Gartner reported a 1,445% surge in multi-agent system inquiries between Q1 2024 and Q2 2025. In the same research, they predicted 40%+ of agentic AI projects will be canceled by the end of 2027. The hype curve is steep on both sides.

IBM's CIO.com analysis put it well: "Agentic AI systems don't fail suddenly. They drift over time." The agent works great in week one. By week eight, the data it's pulling has changed format, the API it's calling has updated, and the edge cases it's encountering have shifted. Without monitoring, it silently degrades.
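One cheap defense against silent drift is validating upstream data against the schema the workflow was built for, so a format change fails loudly instead of quietly corrupting outputs. The field names below are illustrative.

```python
# Minimal drift guard: check each incoming record against the schema the
# workflow assumed at build time. Field names are hypothetical examples.

EXPECTED_FIELDS = {"ticket_id": str, "subject": str, "created_at": str}

def validate(record: dict) -> list:
    problems = []
    for field, ftype in EXPECTED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], ftype):
            problems.append(f"wrong type for {field}: {type(record[field]).__name__}")
    return problems

ok = {"ticket_id": "T-1", "subject": "Login issue", "created_at": "2026-04-01"}
drifted = {"ticket_id": 17, "subject": "Login issue"}  # upstream API changed

print(validate(ok))       # []
print(validate(drifted))  # ['wrong type for ticket_id: int', 'missing field: created_at']
```

A check like this, plus an alert when `problems` is non-empty, catches the "data changed format in week eight" failure mode before it degrades the agent's outputs.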

What does a custom agent actually cost?

The cost gap between off-the-shelf and custom is enormous.

Off-the-shelf (workflow + LLM nodes):

  • Platform: $500-$5K/month depending on volume
  • Setup: 10-40 hours of configuration
  • Maintenance: 2-5 hours/month
  • Time to value: 1-2 weeks

Custom in-house agent:

  • Initial build: $40K-$150K
  • Annual fully loaded cost (engineers, infrastructure, monitoring): $1.4M-$1.6M
  • Setup: 3-6 months to production-ready
  • Maintenance: Continuous (dedicated team)
  • Time to value: 4-8 months
[Image: Cost Comparison at AI Scale]

Editor's Note: For most marketing teams under $10M in revenue, a custom agent build doesn't make financial sense. A $500/month Make.com workflow covers 80% of the task at 99% per-run reliability, while the custom agent covers 95% of cases but, on long multi-step chains, succeeds end-to-end only about 36% of the time.
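The gap is stark even in rough numbers. Taking the midpoints of the ranges above:

```python
# First-year cost comparison using midpoints of the figures quoted above.
platform_annual = 2_750 * 12      # midpoint of $500-$5K/month
custom_build = 95_000             # midpoint of $40K-$150K one-time build
custom_annual = 1_500_000         # midpoint of $1.4M-$1.6M fully loaded

year_one_off_shelf = platform_annual
year_one_custom = custom_build + custom_annual

print(year_one_off_shelf)                            # 33000
print(year_one_custom)                               # 1595000
print(round(year_one_custom / year_one_off_shelf))   # 48
```

Roughly a 48x first-year difference at the midpoints, which is why the custom route only pencils out when the agent addresses a problem worth seven figures a year.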

What's the right evaluation framework?

Before greenlighting any AI automation project, run it through these five questions:

  1. Is the data clean? If you can't describe the input format in one sentence, stop. Fix the data first.
  2. Can a human do this task in under 5 minutes? If yes, it's a good automation candidate. If it takes 45 minutes of research and judgment, you're in agent territory and should proceed carefully.
  3. What's the cost of a wrong output? For social post scheduling, low. For financial reporting, high. Match your reliability requirements to the stakes.
  4. How often does the process change? If the steps change quarterly, a rigid automation breaks constantly. If they're stable, a workflow is perfect.
  5. What's the monitoring plan? If you don't have someone checking outputs weekly, don't deploy it. Drift is real.
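The five questions reduce to a simple go/no-go gate. The thresholds and field names below are illustrative, not a formal scoring model:

```python
# The evaluation framework as a checklist gate. All fields and
# thresholds are illustrative stand-ins for a team's own criteria.

def evaluate_candidate(task: dict) -> str:
    if not task["data_describable_in_one_sentence"]:
        return "stop: fix the data first"                      # question 1
    if task["wrong_output_cost"] == "high" and not task["weekly_monitoring"]:
        return "stop: no monitoring plan for high stakes"      # questions 3 & 5
    if task["human_minutes"] <= 5 and task["steps_stable"]:
        return "go: automation candidate"                      # questions 2 & 4
    return "caution: agent territory, scope tightly"

print(evaluate_candidate({
    "data_describable_in_one_sentence": True,
    "wrong_output_cost": "low",
    "weekly_monitoring": True,
    "human_minutes": 3,
    "steps_stable": True,
}))  # go: automation candidate
```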

Where this goes next

Anthropic's Managed Agents, OpenAI's agent APIs, and the growing ecosystem of agent orchestration tools are all pushing toward multi-agent systems where specialized agents hand off to each other. The technology is moving fast.

But the implementation maturity isn't keeping up. Gartner's prediction about 40% cancellation rates isn't pessimism. It's pattern recognition from every previous enterprise technology wave.

The teams that win over the next 18 months won't be the ones with the most sophisticated agent architecture. They'll be the ones with boring, reliable automation workflows that actually run, plus one or two well-scoped agent projects with proper monitoring and clear ROI targets.

Start with the workflow. Earn the agent.

FAQ

What's the real difference between an automation and an agent?

Automation follows a fixed, predetermined path (if this, then that). An agent receives a goal and dynamically decides which tools to use, in what order, adapting when it encounters unexpected results. Most "agents" in production today are actually automations with one LLM node, which is perfectly fine for most use cases.

What does AI automation actually cost?

Off-the-shelf platforms like Make.com, n8n, or Zapier cost $500-$5K per month depending on volume. Setup takes 10-40 hours. A custom in-house agent is dramatically more expensive: $40K-$150K to build, with $1.4M-$1.6M in annual fully loaded costs for the engineering team to maintain it.

Do 95% of AI pilots really fail?

MIT's 95% failure headline is misleading. The failures trace back to poor use-case selection and bad data quality, not technology limitations. Stanford's March 2026 study of 51 successful deployments found that starting with simple, data-clean tasks and expanding gradually was the common pattern.

Which automation platform should a team pick?

Zapier leads on integration count (8,000+) and is best for teams that need broad connectivity. Make.com (250K+ active businesses) offers more visual workflow design and better pricing at higher volumes. n8n (230K+ users) is open-source and gives full control but requires more technical setup.

How reliable are multi-step agent workflows?

Tool-calling in production fails 3-15% of the time, usually silently. At 95% per-step reliability, a 20-step agent workflow succeeds end-to-end only 36% of the time. This is why scoping agents to fewer steps with human checkpoints dramatically improves real-world performance.

Will AI automate most marketing work?

McKinsey estimates 85% of operational marketing tasks are automatable, with a 32% productivity gain. But "automatable" means the repetitive execution, not the strategy, creative judgment, or relationship management. The better frame is that AI handles the busywork so the team can focus on work that requires thinking.