AI · CRO

Four failure modes when brands bolt AI chat onto the homepage.

From March 2024 to March 2025 we observed fourteen B2B and DTC brands ship a homepage AI chat surface. Eleven regressed on at least one metric they were trying to improve. The patterns are consistent enough to name. We thought there were three, and then the fourth one started showing up in 2025.

Failure mode 1. The chat replaces the nav people actually used.

What happens. Product teams, convinced chat will become the primary surface, shrink or hide the pricing page link, the solutions menu, or the comparison pages. Session data three weeks in shows users typing things they would have clicked. Time to first meaningful action goes up. Bounce goes up. Sign-up rate drops.

The fix. Ship the chat alongside navigation for 90 days and measure. In the four cases where we have clean before-and-after data, the chat took 9 to 14% of query volume and the rest stayed on nav. Hiding nav was a regression every time.
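For the measurement itself, a minimal sketch: log which surface each intent-bearing interaction went through and compute chat's share over the trial window. The event shape and function names are illustrative, not tied to any particular analytics stack.

```ts
type Surface = "chat" | "nav";

interface InteractionEvent {
  surface: Surface;    // which surface carried the intent
  intent: string;      // e.g. "pricing", "integrations", "comparison"
  timestamp: number;   // epoch ms
}

// Share of intent-bearing interactions that went through chat rather than nav
// over the measurement window; in our clean-data cases this settled at 9-14%.
function chatShare(events: InteractionEvent[], windowStartMs: number): number {
  const inWindow = events.filter((e) => e.timestamp >= windowStartMs);
  if (inWindow.length === 0) return 0;
  const viaChat = inWindow.filter((e) => e.surface === "chat").length;
  return viaChat / inWindow.length;
}
```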

The mistake the fix does not solve. Even with nav intact, the homepage chat surfaces the same two or three questions over and over (pricing, integrations, comparison). If the answers to those questions are not on the main marketing site, users who came via chat are a different cohort than users who came via nav, and your funnel metrics need to account for that.

Failure mode 2. The chat answers questions the brand did not want answered.

What happens. A RAG chat grounded on the marketing site will cheerfully quote competitor comparisons, old pricing, beta feature names, roadmap items, and anything else in the corpus. One client discovered theirs was directing leads to a 2022 pricing page that had been deprecated but not removed from the crawl index. Another was surfacing a "limitations" doc that had been internal-only for six months but had been indexed by the chat ingestion pipeline.

The fix. Source-control the corpus. Treat what the chat can read as a product surface, not as "everything on the site". Monthly audit of top queries and top responses. Explicit allowlist, not blocklist. Review process for any new source document.
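A minimal sketch of what the allowlist looks like at ingestion time, assuming a simple source-document record; the paths and the isIngestable name are illustrative, not part of any specific RAG framework.

```ts
// Every path the chat is allowed to read, checked in and reviewed like code.
const CORPUS_ALLOWLIST = new Set<string>([
  "/pricing",
  "/product/integrations",
  "/docs/getting-started",
]);

interface SourceDoc {
  path: string;
  lastReviewed: string; // ISO date of the last human review
  body: string;
}

// Allowlist, not blocklist: anything not explicitly approved stays out, which
// is what keeps deprecated pricing pages and internal-only docs out of the corpus.
function isIngestable(doc: SourceDoc): boolean {
  return CORPUS_ALLOWLIST.has(doc.path);
}
```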

Failure mode 3. The chat answers but does not convert.

What happens. Users get their question answered and leave. The brand has replaced a CTA-oriented nav with a Q&A interface that has no follow-up. In three cases we measured, sign-up rate from users who interacted with chat was 30 to 60% lower than from users who did not touch it. This is the failure mode that gets blamed on the chat's answer quality when the real problem is flow design.

The fix. Build explicit conversion moments into the response flow. Not "would you like to book a demo?" after every answer (worst of both worlds), but contextual: pricing answers end with a starter-plan CTA, comparison answers end with a trial link, technical answers end with a docs link that captures workspace sign-ups. Some of the answers that look like "information" are actually "consideration", and the transition needs to be explicit.
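A sketch of what a contextual conversion moment can look like in code, assuming the pipeline already classifies each question into a rough intent; the intent labels and URLs are placeholders.

```ts
type Intent = "pricing" | "comparison" | "technical" | "other";

// Conversion moment keyed on what was asked, not bolted on after every answer.
const CTA_BY_INTENT: Record<Intent, string | null> = {
  pricing: "Start on the Starter plan → /signup?plan=starter",
  comparison: "Run it on your own data → /trial",
  technical: "Full reference in the docs → /docs",
  other: null, // no CTA beats a generic "would you like to book a demo?"
};

function withConversionMoment(answer: string, intent: Intent): string {
  const cta = CTA_BY_INTENT[intent];
  return cta ? `${answer}\n\n${cta}` : answer;
}
```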

Failure mode 4. The chat hallucinates a commitment that is not yours to make.

What happens. The chat, grounded on old or imprecise marketing copy, invents a guarantee. A "30-day money-back" on a product that no longer offers one. A "free forever" tier that was discontinued. A specific integration that is on the roadmap but not shipped. In the cases we have seen, users took screenshots and escalated to support when the promised thing did not materialise.

This is the failure mode with serious legal and reputational downside, not just metric downside. The February 2024 ruling against Air Canada over its chatbot, from British Columbia's Civil Resolution Tribunal, made it clear: a brand is responsible for what its chat says, whether the commitment was in the underlying copy or hallucinated.

The fix. Two layers. First, tight grounding: refuse to answer outside the allowlisted corpus, even if the model could. Second, explicit guardrails around commitment-shaped language (refunds, guarantees, pricing, SLAs, free tiers). The cleanest implementation we have seen uses a classifier on the model output that hard-blocks commitment language unless it appears verbatim in the source document.
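A minimal sketch of the output-side guardrail, assuming the retrieved source chunks are available next to the draft answer. The regex list is a simplified stand-in for the classifier; the verbatim check against the source is the part that matters.

```ts
// Commitment-shaped phrases that must not reach the user unless the source says them.
const COMMITMENT_PATTERNS: RegExp[] = [
  /money[- ]back/i,
  /free forever/i,
  /guarantee[ds]?/i,
  /refund/i,
  /\bSLA\b/,
];

// Returns the draft if it is safe, or null so the caller can fall back to a
// non-committal response ("talk to sales for current terms").
function blockUnsupportedCommitments(
  draft: string,
  sourceChunks: string[],
): string | null {
  const corpus = sourceChunks.join("\n").toLowerCase();
  for (const pattern of COMMITMENT_PATTERNS) {
    const hit = draft.match(pattern);
    if (hit && !corpus.includes(hit[0].toLowerCase())) {
      return null; // commitment language with no verbatim support: hard block
    }
  }
  return draft;
}
```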

What we are not seeing

We have not yet seen a homepage chat meaningfully move top-line demo or sign-up volume. The cases where chat paid for itself were all on doc-sites and support surfaces, not marketing sites. If the goal is pipeline, the evidence says put the budget into answer-shaped content (which gets cited in AIO and LLM answers, lifts organic, and improves the base-rate of informed buyers) rather than a chat widget on the homepage.

What a working deployment looks like

The two deployments that produced measurable value in our sample were both doc-site, not homepage. Both had: (a) a narrow, well-maintained corpus with an explicit review process, (b) a conversion moment at the end of each response, relevant to what was asked, and (c) a dashboard of top queries owned by a product-marketing person, not an engineering manager, reviewed weekly and fed back into the content roadmap. The chat became, in effect, a live content gap-finder. That was the actual payoff; the support deflection was secondary.
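As an illustration of that gap-finder loop, a hypothetical weekly aggregation: flag the queries the corpus covered poorly and rank the topics by how often they came up. Field names and the retrieval-score threshold are assumptions, not details from either deployment.

```ts
interface LoggedQuery {
  text: string;
  topic: string;          // normalised topic label, e.g. "pricing", "sso"
  retrievalScore: number; // 0-1, how well the corpus covered the question
}

// Topics users keep asking about that the corpus answers badly, most frequent
// first; each one is a candidate brief for the content roadmap.
function weeklyContentGaps(
  queries: LoggedQuery[],
  threshold = 0.5,
): Array<[string, number]> {
  const gaps = new Map<string, number>();
  for (const q of queries) {
    if (q.retrievalScore < threshold) {
      gaps.set(q.topic, (gaps.get(q.topic) ?? 0) + 1);
    }
  }
  return [...gaps.entries()].sort((a, b) => b[1] - a[1]);
}
```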

Basis: fourteen homepage chat deployments observed, March 2024 to March 2025. Metric data available for seven; qualitative observation for the rest. Includes three deployments using Intercom Fin, four on custom OpenAI-backed RAG, three on Vercel AI SDK stacks, and the rest mixed.
