Drupal AI Agents: A Practical Guide to What Works in Production
The Drupal AI Agents module is one of the most ambitious pieces of the Drupal AI ecosystem. Here's what it can do, where it falls short, and how to build production agents that actually ship.
Drupal AI Agents — the ai_agents module — is designed to let you define AI-powered agents that can call Drupal tools, make decisions, and take actions on behalf of users or editors. The framework is real. The developer experience is still rough. And whether it’s ready for your production use case depends heavily on what you’re trying to build.
This guide walks through what the module actually does, what we’ve shipped using it, and where we’ve had to work around limitations.
For the broader Drupal AI ecosystem context, see our Drupal AI Module guide.
What a “Drupal AI agent” actually is
In the ai_agents module, an “agent” is a structured configuration that combines:
- A system prompt that tells the LLM what role it’s playing and what it should or shouldn’t do.
- A set of tools — functions the agent can call to interact with Drupal. Tools are Drupal plugins that expose things like “search nodes,” “create a webform submission,” “look up a taxonomy term,” or anything else you define.
- A runtime loop that takes a user input, asks the LLM what to do, executes any tool calls the LLM requests, feeds the results back, and repeats until the agent returns a final answer.
- Conversation state so the agent can remember what’s been said in this session.
If you’ve used OpenAI’s function calling, Anthropic’s tool use, or the agentic loop in LangChain/LangGraph, the shape is familiar. ai_agents is Drupal’s take on that pattern.
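The runtime loop described above is provider-agnostic, and it helps to see its shape stripped of framework machinery. Here is a minimal sketch in Python with a stubbed model standing in for the LLM — the message format, function names, and tool names are illustrative assumptions, not part of the ai_agents API:

```python
def run_agent(model, tools, user_input, max_turns=10):
    """Minimal agentic loop: ask the model what to do, execute any tool
    call it requests, feed the result back, repeat until it answers."""
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_turns):
        reply = model(messages)  # LLM decides: tool call or final answer
        if reply["type"] == "tool_call":
            result = tools[reply["name"]](**reply["args"])
            messages.append({"role": "tool", "name": reply["name"],
                             "content": result})
        else:
            return reply["content"]  # final answer ends the loop
    raise RuntimeError("agent exceeded max_turns without answering")

# A scripted stand-in for the LLM: first it requests a search, then answers.
script = iter([
    {"type": "tool_call", "name": "search_nodes", "args": {"query": "drupal"}},
    {"type": "answer", "content": "Found 2 matching nodes."},
])
answer = run_agent(lambda msgs: next(script),
                   {"search_nodes": lambda query: f"2 nodes match '{query}'"},
                   "find articles about drupal")
```

The `max_turns` cap matters in production: a confused model can request tools forever, and every extra turn is an API bill you pay.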
What works ✅
Tool definitions are clean. Writing a custom agent tool in Drupal is a matter of defining a plugin with input/output schemas and an execute() method. The plugin system routes inputs from the LLM into your code and returns the results to the model. This part is well-designed.
Provider swapping. Because ai_agents is built on top of the ai core module, you can point agents at any LLM provider you have configured — OpenAI, Anthropic, Gemini, Ollama — and swap between them without touching agent code. That’s valuable for cost tuning and vendor risk.
Observability hooks. The module fires events for agent decisions, tool calls, and LLM responses. You can wire these into your own logging pipeline so every agent interaction is recorded with its full context. We do this by default on every production deployment.
Drupal integration. The big advantage over using a JavaScript agent framework (LangGraph, Mastra, etc.) is that your agent has native access to Drupal’s entity API, permissions system, and data model. An agent tool that searches your site content is a 20-line plugin, not a custom API wrapper.
What falls short ⚠️
Frontend is up to you. The module provides the backend agent loop but not a chat UI. You can use Drupal’s ai_assistant_api for basic chat flows, but for anything production-ready (streaming responses, conversation history, handoff to humans, analytics) you’re building the frontend from scratch or integrating a third-party chat widget.
Evaluation is missing. Agent frameworks live or die by their eval harness. If you can’t measure whether your agent is getting better or worse with each prompt change, you’re flying blind. ai_agents doesn’t ship with an eval system — you’ll roll your own. We typically build a Drush command that runs a set of test inputs against an agent and compares outputs to a baseline.
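The core of such an eval harness is small. A language-agnostic sketch of the compare-to-baseline step (the case format and the trivial stand-in agent are assumptions for illustration; in practice the agent call would go through ai_agents and the runner would be a Drush command):

```python
def run_evals(agent, cases):
    """Run each test input through the agent and compare against the
    expected baseline output. Returns (pass_rate, failures)."""
    failures = []
    for case in cases:
        got = agent(case["input"])
        if got.strip() != case["expected"].strip():
            failures.append({"input": case["input"],
                             "expected": case["expected"], "got": got})
    pass_rate = 1 - len(failures) / len(cases)
    return pass_rate, failures

cases = [
    {"input": "summarize node 1", "expected": "Summary of node 1"},
    {"input": "summarize node 2", "expected": "Summary of node 2"},
]
# Stand-in agent so the harness itself can be tested deterministically.
rate, fails = run_evals(lambda text: text.replace("summarize", "Summary of"),
                        cases)
```

Exact string matching is the crudest scoring function; real suites usually add fuzzy matching or an LLM-as-judge step, but the run-and-diff skeleton stays the same.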
Prompt versioning is missing. Agents have a system prompt that you’ll tune constantly. The module stores prompts in config, but there’s no built-in versioning or A/B testing. You’ll either add that yourself or move prompt management outside Drupal (we usually use a simple config-based approach with git as the version control).
Safety guardrails are DIY. The module doesn’t include prompt injection detection, output filtering, or rate limiting per user. For anything public-facing, you need to add these — and they’re not optional. We’ve seen production Drupal AI deployments get prompt-injected within hours of launch.
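Per-user rate limiting is the easiest of these guardrails to get right. A minimal in-memory fixed-window sketch (the class and its interface are ours, not the module’s; in Drupal you would typically back this with the core flood service or Redis rather than process memory):

```python
import time
from collections import defaultdict, deque

class PerUserRateLimiter:
    """Allow at most `limit` agent calls per user per `window` seconds.
    In-memory only: a sketch of the logic, not a production store."""

    def __init__(self, limit=5, window=60.0):
        self.limit, self.window = limit, window
        self.calls = defaultdict(deque)  # user_id -> call timestamps

    def allow(self, user_id, now=None):
        now = time.monotonic() if now is None else now
        q = self.calls[user_id]
        while q and now - q[0] >= self.window:  # drop expired timestamps
            q.popleft()
        if len(q) >= self.limit:
            return False  # over quota: reject without recording the call
        q.append(now)
        return True
```

Injecting `now` keeps the limiter testable; the same shape works for per-user token budgets if you append token counts instead of bare timestamps.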
What doesn’t work yet ❌
Multi-agent orchestration. If you want one agent to call another agent, the module has hooks for it but the ergonomics aren’t there. We usually implement multi-agent patterns at the application level (custom service wrapping multiple ai_agents calls) rather than trying to express them through the module’s config.
Tool chaining with complex state. Simple tool sequences work. Complex workflows where the agent needs to remember state across multiple tool calls (e.g., “search for articles, then summarize the top 3, then create a webform submission with the summary”) are brittle in the current implementation. You’ll fight the framework.
How we build production agents on Drupal
Here’s the playbook we actually use on client projects:
1. Start with a narrow, high-value use case. “Summarize all articles in a taxonomy term and email the digest to editors” is a good first agent. “General customer support chatbot” is not. Narrow scope = fewer failure modes = easier to measure.
2. Write the tools first, not the prompt. Define the 3-5 Drupal functions the agent needs to call. Ship them as plugins, test them directly (not through an LLM), and make sure they work. Only then wire them into an agent.
3. Hard-code the happy path as a baseline. Before letting the LLM drive, build the same feature as a normal Drupal action that runs the tools in a fixed sequence. This gives you a working fallback, and it becomes your eval baseline — “the agent should produce the same output as the hardcoded version on these 10 test inputs.”
4. Use the smallest model that works. Default to Haiku, GPT-4o-mini, or Gemini Flash. Only reach for larger models (Sonnet, GPT-4, Gemini Pro) when evaluation data shows the smaller model isn’t good enough. This cuts costs by 10-20x without meaningful quality loss for most agent use cases.
5. Log everything, evaluate constantly. Every agent call gets logged with input, intermediate tool calls, LLM responses, and final output. Once a week, spot-check recent production traces. Once a month, run the full eval suite.
6. Put a kill switch in front. The agent should be gated behind a Drupal permission that admins can revoke instantly. When (not if) you discover a prompt injection or a hallucination, you want to turn it off in seconds, not hours.
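Step 3’s hardcoded baseline is just the agent’s tools run in a fixed order with no LLM in the loop. A sketch with stub tools (all names here are illustrative, not ai_agents plugin IDs):

```python
def hardcoded_digest(tools):
    """Happy-path baseline: the same tools the agent would call,
    in a fixed sequence. This is the fallback and the eval baseline."""
    articles = tools["search_articles"](term="news")
    summary = tools["summarize"](articles[:3])  # top 3 results only
    return tools["email_editors"](summary)

# Stub tools so the sequence can be exercised without Drupal or an LLM.
tools = {
    "search_articles": lambda term: [f"{term}-{i}" for i in range(5)],
    "summarize": lambda items: " + ".join(items),
    "email_editors": lambda body: f"sent: {body}",
}
result = hardcoded_digest(tools)
```

Because the baseline and the agent share the same tool implementations, any output difference on a test input is attributable to the LLM’s decisions, which is exactly what you want to measure.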
Real cost and timeline expectations
For a narrow-scope production agent on Drupal:
- Setup time: 2-4 weeks for a first agent, including tools, prompt tuning, eval harness, logging, and a minimal chat UI.
- Ongoing tuning: 4-8 hours a week for the first 2-3 months as you discover edge cases in production. Less after that, but never zero — model updates from providers will shift your agent’s behavior silently.
- API costs: Low for editorial agents (editors use them occasionally, small token counts). Higher for public-facing agents (you pay for every visitor interaction, including the ones that bounce). Plan $50-500/month for editorial, $500-5000/month for public chatbots with meaningful traffic.
- Engineering cost dominates. The API bill is usually 10-20% of the total cost of running a production agent. The rest is engineering time maintaining it.
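The API figures above are easy to sanity-check with back-of-envelope arithmetic. A sketch, where the traffic numbers and per-million-token prices are hypothetical placeholders (not current rates for any provider):

```python
def monthly_api_cost(interactions, tokens_in, tokens_out,
                     price_in_per_m, price_out_per_m):
    """Rough monthly API bill: interactions/month times tokens per
    interaction, priced per million tokens. Inputs are assumptions."""
    return interactions * (tokens_in * price_in_per_m
                           + tokens_out * price_out_per_m) / 1_000_000

# Hypothetical public chatbot: 50,000 interactions/month, 4,000 input
# tokens and 800 output tokens each, at placeholder prices of $3/M in
# and $15/M out for a larger model.
cost = monthly_api_cost(50_000, 4_000, 800, 3.0, 15.0)  # → 1200.0 ($/month)
```

Rerunning the same arithmetic with a small model’s prices is how you quantify the 10-20x savings claimed in step 4 of the playbook above.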
Should you use ai_agents on your Drupal site?
Yes, if:
- You have a specific, narrow editorial or operational use case the agent can automate.
- You have engineering capacity to own the integration (not just set it up — maintain it).
- You’re willing to invest in evals and observability upfront.
- You already run Drupal and want to leverage its entity API for tools.
No, if:
- You want “an AI chatbot on our site” without a specific use case. You’ll end up with a demo that nobody uses.
- You don’t have engineering capacity to babysit it. Unattended agents rot.
- You’re running a static site with no interactive needs. You don’t need an agent framework; you need a content-suggestion module.
- You’re on a deadline and haven’t built LLM features before. Pick something simpler for your first Drupal AI project — content suggestions, summarization, or alt text generation.
TL;DR
The ai_agents module is a real framework for building LLM-powered agents that live inside Drupal. It’s well-designed at the tool-plugin layer, rough at the frontend and evals layer, and not a drop-in “install and have a chatbot” solution. If you go in expecting to build a custom thing on top of the framework — tools, evals, observability, kill switches — it’ll serve you well. If you go in expecting a turnkey solution, you’ll be frustrated.
We build production AI agents on Drupal. If you’re planning an agent integration, we’ll give you an honest read on whether your use case is the right fit for ai_agents or if there’s a simpler approach.
Related reading: