Drupal AI Chatbot: From API to Production
How to build a production-grade AI chatbot on a Drupal site — grounded in real content, connected to a CRM, and safe to leave running. A practical playbook, not a demo.
Drupal AI chatbots are among the “AI features” we get asked to build most often. And most of the chatbot demos you see online are exactly that — demos. They look great in a screen recording, ship to production, and then quietly get turned off three weeks later because they hallucinate, they don’t know the site’s content, or nobody’s logging what users ask them.
This guide is the playbook we actually use to ship production Drupal chatbots. It’s opinionated and specific. For the broader Drupal AI context, see our Drupal AI Module guide and Drupal AI Agents guide.
What a production Drupal chatbot actually needs
Before we talk about Drupal modules, let’s name the pieces. A production chatbot on a Drupal site needs:
- Grounding in real content. The chatbot can’t just be ChatGPT with a Drupal skin. It needs to answer questions using your content — articles, product pages, FAQs, service descriptions. This usually means retrieval-augmented generation (RAG).
- A frontend. A chat widget that lives on your pages, opens on click, maintains conversation state across pages, and doesn’t look broken on mobile.
- Session handling. Two users talking to the chatbot at the same time should get isolated conversations, not accidentally see each other’s messages.
- Lead capture. When the chatbot identifies a sales opportunity, it needs to hand off — either to a form, a CRM, or a human. Otherwise it’s generating fascinating conversations that never turn into revenue.
- Conversation logging. Every message in and out, with user session, page URL, and LLM response. You’ll need this data for quality evaluation, prompt tuning, and legal compliance.
- Guardrails. Prompt injection detection, output filtering, and a kill switch that can turn the bot off instantly when something goes wrong.
- Cost controls. Rate limiting per session (and per IP, for public bots) so one bad actor can’t drain your LLM budget.
Most “Drupal chatbot” demos cover grounding (sometimes) and ignore the rest. That’s why they get turned off.
The stack we actually ship
Backend
- Drupal 10/11 with the `ai` module installed.
- `ai_provider_openai` (or `ai_provider_anthropic` / `ai_provider_gemini`) for LLM access. We default to Claude Sonnet for quality-sensitive production bots, or GPT-4o-mini / Haiku for cost-sensitive ones.
- `ai_search` for RAG — embeds your Drupal content into vectors and lets the chatbot retrieve relevant chunks for each query. Backed by pgvector in Postgres (simpler than standing up a separate vector database for most sites).
- `ai_assistant_api` for the conversation state and session management.
- Custom module for the tool layer — typically 3-5 functions: search content, look up a specific page, submit a contact form, create a CRM lead, hand off to a human.
- `ai_agents` (sometimes) if the bot needs to make decisions and take actions, not just answer questions. For a pure Q&A chatbot, skip `ai_agents` and use a simpler RAG pipeline.
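The custom tool layer above — a handful of named functions the model is allowed to invoke — boils down to a registry plus a dispatcher that refuses anything not on the list. A minimal sketch (in Python for brevity; the real thing would live in a custom Drupal PHP module, and the function bodies here are placeholders):

```python
# Hypothetical tool layer: a registry mapping tool names the LLM may call
# to handler functions. Unknown tool names are refused, never improvised.

def search_content(query: str) -> list[str]:
    # Placeholder: would query the ai_search vector index.
    return [f"result for: {query}"]

def create_crm_lead(name: str, email: str) -> dict:
    # Placeholder: would POST to Airtable / HubSpot / Salesforce.
    return {"status": "created", "name": name, "email": email}

TOOLS = {
    "search_content": search_content,
    "create_crm_lead": create_crm_lead,
}

def dispatch(tool_name: str, **kwargs):
    """Route an LLM tool call to its handler; unknown tools get an error."""
    if tool_name not in TOOLS:
        return {"error": f"unknown tool: {tool_name}"}
    return TOOLS[tool_name](**kwargs)
```

The allow-list is the point: the model can only reach the 3-5 functions you explicitly registered, which keeps the blast radius small.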
Frontend
- Custom React chat widget — not the stock `ai_assistant_api` UI. The stock UI is fine for internal demos; for public-facing sites you need responsive design, smooth streaming, message history, typing indicators, and branded styling. We usually build the chat widget as a standalone React app and mount it via a Drupal block.
- Server-Sent Events (SSE) for streaming. Streaming responses make the bot feel 5x faster even when total latency is identical. Non-streaming chatbots feel dead.
- Session storage — conversation state cached in Drupal for the duration of the session so users can reload the page without losing context.
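The SSE wire format the widget consumes is just text frames: an `event:` line, a `data:` line, and a blank line. A minimal server-side sketch (Python here for illustration; the shape is identical in PHP or Node):

```python
import json

def sse_frame(event: str, data: dict) -> str:
    """Format one Server-Sent Events frame: event line, data line, blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

def stream_tokens(tokens):
    """Yield each model token as an SSE frame, then a terminal 'done' frame
    so the widget knows when to stop rendering the typing indicator."""
    for tok in tokens:
        yield sse_frame("token", {"text": tok})
    yield sse_frame("done", {})
```

On the frontend, the browser’s built-in `EventSource` (or a fetch-based reader, if you need POST bodies) consumes these frames and appends each `token` event to the current message.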
Integrations
- Airtable, HubSpot, or Salesforce for lead capture. We’ve shipped all three. Airtable is the easiest for small teams; HubSpot integrates with sales workflows for mid-market; Salesforce is for enterprise with existing Salesforce pipelines.
- Slack webhook for internal notifications when a high-intent lead comes in, so sales gets alerted in real time.
How grounding actually works
The hardest part of a production chatbot is getting it to actually answer using your content instead of hallucinating plausible-sounding nonsense. Here’s the flow:
- Indexing (offline). `ai_search` walks your Drupal content, breaks each node into chunks of ~500 tokens, computes embeddings with OpenAI or a local model, and stores them in pgvector. This runs on content save and on a cron schedule for batch re-indexing.
- Retrieval (per query). When a user asks a question, the chatbot embeds the query, does a vector similarity search against your content, and retrieves the top 5-10 most relevant chunks.
- Augmentation. The retrieved chunks are prepended to the LLM prompt as context: “Here are some relevant excerpts from our content. Answer the user’s question using only this information. If the answer isn’t here, say you don’t know.”
- Generation. The LLM responds using the retrieved context. If it doesn’t find an answer in the chunks, it says so instead of making something up.
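The retrieval and augmentation steps can be sketched in a few lines. This is a toy version — the scoring, the 0.75 threshold, and the prompt wording are illustrative assumptions, not the `ai_search` internals — but it shows the one detail that matters most in production: when nothing scores above the threshold, the prompt tells the model to refuse rather than guess.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def retrieve(query_vec, index, k=5, min_score=0.75):
    """index: list of (chunk_text, embedding). Top-k chunks above the threshold."""
    scored = sorted(((cosine(query_vec, vec), text) for text, vec in index), reverse=True)
    return [text for score, text in scored[:k] if score >= min_score]

def build_prompt(question, chunks):
    """Prepend retrieved chunks as context; refuse explicitly when retrieval is empty."""
    if not chunks:
        return f"No relevant content was found. Tell the user you don't know.\n\nQuestion: {question}"
    context = "\n---\n".join(chunks)
    return (
        "Here are relevant excerpts from our content. Answer the user's question "
        "using only this information. If the answer isn't here, say you don't know.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```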
This works well for factual questions where the answer is somewhere in your content. It breaks down when the user asks something vague (“tell me about your services”) and the retrieval returns too much or too little. For those cases, we add a preprocessing step that rewrites vague questions into more specific ones before retrieval.
The safety layer
For public-facing chatbots, these guardrails are not optional:
- Prompt injection detection. Filter user messages for common injection patterns (“ignore previous instructions,” “you are now…,” etc.). Not bulletproof but catches the obvious attempts. Use a small model or a simple regex — both work.
- Output filtering. Before returning the LLM’s response to the user, scan for common bad outputs: the system prompt leaking, fabricated phone numbers or addresses, claims about pricing or policy that the bot shouldn’t make up. We maintain a list of disallowed content patterns per client.
- Rate limiting. By session and by IP. Start with 10 messages/minute/session and 30 messages/minute/IP. Adjust based on your actual usage.
- Kill switch. An admin toggle in Drupal that disables the chatbot instantly. Wire it to a permission so only specific admin roles can use it. When the chatbot is off, the widget hides itself or shows a “chat is currently unavailable” message.
- Content policy. Decide what the bot will and won’t answer. Medical advice, legal advice, financial advice — all things public-facing bots should explicitly decline. Bake the refusals into the system prompt and test them.
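The first two guardrails in the list above are small amounts of code. Here is a sketch of a regex-based injection filter and a sliding-window rate limiter — the patterns are a hypothetical starter list (deliberately not bulletproof, per the caveat above), and the limits match the starting numbers suggested earlier:

```python
import re
import time
from collections import deque

# Starter patterns only — catches the obvious attempts, not a determined attacker.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"you are now", re.I),
    re.compile(r"reveal (your )?(system )?prompt", re.I),
]

def looks_like_injection(message: str) -> bool:
    return any(p.search(message) for p in INJECTION_PATTERNS)

class RateLimiter:
    """Sliding window: allow at most `limit` messages per `window` seconds per key."""
    def __init__(self, limit=10, window=60.0):
        self.limit, self.window = limit, window
        self.hits = {}  # key (session ID or IP) -> deque of timestamps

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits.setdefault(key, deque())
        while q and now - q[0] > self.window:
            q.popleft()          # drop timestamps outside the window
        if len(q) >= self.limit:
            return False         # over the limit: reject this message
        q.append(now)
        return True
```

In production you would run two limiters — one keyed by session (10/min) and one keyed by IP (30/min) — and back them with Drupal’s cache or flood service rather than an in-memory dict.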
What to measure
Once the chatbot is live, these are the metrics that actually matter:
- Resolution rate: What percentage of conversations ended without a handoff? High rate = bot is handling things. Low rate = either the bot is bad or your content doesn’t cover the questions users are asking.
- Handoff conversion: Of the conversations that did hand off, how many became qualified leads? This is your chatbot ROI.
- Average conversation length: Too short = bot isn’t engaging. Too long = bot is confused and wandering.
- Common topics: Cluster conversations by topic to find content gaps. If 15% of users are asking about something your site doesn’t document, write that page.
- Cost per conversation: Total LLM spend divided by total conversations. Watch this trend; it’ll tell you when you need to switch to a cheaper model or shorten prompts.
Build dashboards for these on day one, not month three.
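All of the metrics above fall out of the conversation log with simple arithmetic. A sketch, assuming a hypothetical per-conversation log record (adapt the field names to your own logging schema):

```python
def chatbot_metrics(conversations):
    """conversations: list of dicts like
    {"messages": 6, "handed_off": False, "qualified_lead": False, "llm_cost_usd": 0.012}
    (hypothetical log shape). Returns the dashboard numbers."""
    total = len(conversations)
    handoffs = [c for c in conversations if c["handed_off"]]
    return {
        # share of conversations the bot resolved without a handoff
        "resolution_rate": (total - len(handoffs)) / total,
        # of the handoffs, share that became qualified leads (the ROI number)
        "handoff_conversion": (
            sum(c["qualified_lead"] for c in handoffs) / len(handoffs) if handoffs else 0.0
        ),
        "avg_length": sum(c["messages"] for c in conversations) / total,
        "cost_per_conversation": sum(c["llm_cost_usd"] for c in conversations) / total,
    }
```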
Common failures (from real production deployments)
1. The bot claims capabilities it doesn’t have. User: “Can you schedule a consultation for Tuesday?” Bot (confidently): “Yes, I’ve scheduled you for Tuesday at 2pm.” The bot can’t actually schedule anything — it just said it could. Fix: be explicit in the system prompt about what the bot cannot do.
2. Hallucinated pricing or features. Bot makes up a price for a service your company doesn’t offer. Fix: retrieve pricing from a structured data source (a config entity or a Drupal view), and pass it to the bot as context with an explicit instruction to use only those numbers.
3. Vector search returns nothing, bot still answers. When retrieval returns zero relevant chunks, the bot should refuse to answer. But if the retrieval threshold is too loose, it returns irrelevant chunks and the bot uses them anyway. Fix: set a minimum similarity score and explicitly tell the bot to say “I don’t know” when retrieval is below threshold.
4. Conversation drift. User has a long conversation; bot slowly forgets the original topic. Fix: summarize older conversation turns instead of keeping the full history in context.
5. The bot is too polite. Canned “I’d be happy to help!” openings and closings eat tokens and don’t add value. Fix: system prompt explicitly says “respond concisely, no filler, no apologies.”
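The conversation-drift fix (#4) is mechanical: keep the most recent turns verbatim and collapse everything older into one summary turn. A sketch, where `summarize` would normally be an LLM call (the trivial fallback here is just a placeholder):

```python
def compact_history(turns, keep_recent=6, summarize=None):
    """Keep the last `keep_recent` turns verbatim; collapse older turns into
    a single summary turn so long conversations don't lose the original topic.
    `summarize` is a callable over the old turns (normally an LLM call)."""
    if len(turns) <= keep_recent:
        return turns
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    if summarize is None:
        summary = f"[summary of {len(old)} earlier turns]"  # placeholder fallback
    else:
        summary = summarize(old)
    return [{"role": "system", "content": summary}] + recent
```

Run this before each LLM call once the history crosses the threshold; the context stays bounded and the original topic survives in the summary turn.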
What it costs
For a mid-sized public-facing chatbot on Drupal:
- Build: $15k-50k depending on scope (tools, CRM integration, analytics dashboard, safety layer). A well-scoped first version is closer to $15k; full production with RAG, CRM handoff, and dashboards is closer to $50k.
- LLM spend: $100-2000/month depending on traffic. For a site doing 1000 chatbot conversations a day on Claude Haiku or GPT-4o-mini, plan around $300-500/month.
- Ongoing maintenance: 5-15 hours/month for prompt tuning, content gap fixes, and model version bumps.
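As a sanity check on the LLM-spend estimate above: 1,000 conversations a day at roughly a cent each (an assumed per-conversation cost on Haiku or GPT-4o-mini) lands inside the quoted range.

```python
# Back-of-envelope check of the monthly-spend figure quoted above.
conversations_per_day = 1000
days = 30
cost_per_conversation = 0.012  # assumed ~$0.01-0.015 per conversation

monthly = conversations_per_day * days * cost_per_conversation  # ≈ $360/month
```

Watching cost-per-conversation (from your metrics dashboard) tells you when this assumption drifts — longer prompts or a model upgrade move it fast.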
TL;DR
A production Drupal AI chatbot is more engineering than most teams expect. The Drupal AI module ecosystem gives you the backend plumbing, but the frontend, safety layer, RAG pipeline, CRM integration, and observability are all work you have to do. The demos you see online skip all of that — which is why most of them don’t survive a month in production.
If you want a chatbot on your Drupal site that’s actually going to work, plan for the full stack, not just the LLM call. We build chatbots that ship and stay shipped — tell us what you’re trying to do and we’ll give you a realistic scope.
Related reading: