A consulting assistant that knows your organisation before the call starts
This whitepaper describes methodology and patterns from PeopleAnalytics.AI's engagement work on embedded conversational assistants for prospective-client demo environments. Engagement details are anonymised by design; no specific client outcomes are claimed. Any numbers drawn from the demo environment are labelled as demo measurements against synthetic data.
Summary
Every buyer who visits a demo environment comes in with the same question, phrased differently: how would this apply to my workforce? A generic walkthrough doesn't answer it. The pattern here is to ship a conversational assistant with every demo — briefed on the prospective client's organisation, industry, and workforce challenges — that answers the question directly. The assistant is grounded in a versioned knowledge base, rate-limited against abuse, and wired to route exhausted sessions to a human conversation rather than to a dead end.
The problem
The sales motion for a people-analytics engagement is unforgiving. A prospective buyer has thirty minutes between meetings to evaluate whether your capability applies to her organisation. The demo environment is live; her questions are specific: would this work for our 30,000-person, unionised, mostly-frontline workforce? How does the attrition model handle our three-tier performance system? What does the BigQuery bill look like at our scale? Most demo sites give her a slide deck and a contact form. The gap between what she needs and what she gets is why most sales cycles stall in the evaluation stage.
The first-principles problem: the salesperson isn't in the demo. The buyer is alone with a set of screenshots and a form. If she had the salesperson in the room, he'd anticipate her questions, translate the demo to her context, and name the two or three things she should focus on given her stated challenges. That translation is most of what makes a sales call valuable. Without it, the demo is underpowered.
Hiring enough salespeople to be in every demo in real time doesn't scale. Producing a personalised demo for every prospect doesn't scale either. The buyer knows it, the seller knows it, and the result is a lot of demos that produce no signal because the buyer never got past the generic layer.
Why this is hard
A "contextual chatbot" sounds easy and isn't. The usual failure modes:
Ungrounded answers. An assistant that confidently invents platform capabilities the platform doesn't have is worse than no assistant. It creates commitments the engineering team will have to unmake. Every answer has to be tied to retrieved, authored content — not emergent from the base model's training data.
One-size-fits-all tone. The question "how would this apply to my organisation" sounds different from a recruiter, a technical buyer, and an executive. A system that answers all three with the same voice loses two of them. Persona is not a UI toggle; it's a routing decision that affects retrieval scope and response style.
Rate limits as dead-ends. Any public-facing LLM application needs rate limits to prevent abuse. The lazy version returns an error and tells the visitor to come back tomorrow. A good visitor who hits the limit is often the one most likely to convert; treating her like an attacker is a conversion failure, not a security win.
Conversation as content that rots. Most platform assistants are built once and become stale as the product evolves. If the assistant's knowledge isn't in the same version control as the code, it will be wrong by the next release. Wrong, in an LLM application, is fast, fluent, and confident. That's the worst failure mode.
The approach
The assistant ships with every demo via src/components/ChatBot.tsx and src/components/ChatBotWrapper.tsx, backed by /api/chat. It's not a separate product; it's a site-level capability that knows which demo the visitor is currently looking at.
The knowledge base lives in /knowledge/ as markdown files committed to the same Git repository as the application code. That choice is deliberate. A CMS-backed knowledge base drifts from code; a Git-backed one can't. Pull requests update both at once, and the fact that marketing and engineering review each other's commits is a feature, not friction. The folder holds architecture documentation (chatbot-rag-architecture.md, ai-provider-abstraction.md, supabase-schema.md, analytics-system.md), per-demo walkthroughs (tour-chatbot/02-demo-attrition.md and siblings), and core background on qualifications and positioning.
Retrieval is Supabase pgvector. Documents are chunked and embedded into a 1,536-dimension vector table with an IVFFlat index. On each question, top-K chunks are retrieved and passed to the synthesis model with citation requirements. Supabase pgvector rather than a dedicated vector store is the right choice at this scale — embeddings sit in the same PostgreSQL as application data, with one RLS policy, one backup, one query layer. A dedicated vector store pays off in the tens-of-millions-of-vectors range, which is not where a consulting practice's knowledge base sits.
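The ranking step can be sketched in plain TypeScript. This is an illustrative in-memory version of what the pgvector query does; in production the similarity search runs inside PostgreSQL via the IVFFlat index, and the names (`Chunk`, `retrieveTopK`) are assumptions, not the real schema.

```typescript
// Illustrative sketch of top-K retrieval. In production the similarity
// search runs inside PostgreSQL via pgvector's IVFFlat index; this
// in-memory version shows the same ranking logic.
type Chunk = { id: string; source: string; embedding: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank chunks by similarity to the query embedding and keep the top K;
// the survivors go to the synthesis model with citation requirements.
function retrieveTopK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((c) => ({ chunk: c, score: cosineSimilarity(query, c.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((s) => s.chunk);
}
```

The real table holds 1,536-dimension vectors; the two-dimension vectors above are only to keep the sketch readable.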
Synthesis is AWS Bedrock with Claude Haiku 4.5. The trade-off is worth being explicit about. In RAG, the quality ceiling is set by retrieval, not by model capability. A larger model synthesising well-retrieved content produces much the same answer as a smaller one, at higher cost and higher latency. Haiku generates its first token fast enough that the conversational UX feels instant rather than sluggish. The model choice is centralised in src/lib/models.ts, so swapping it is a one-line change if the economics shift.
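A hypothetical shape for src/lib/models.ts is below. The identifiers are placeholders rather than real Bedrock model IDs; the point is only that every caller resolves its model through this one module, so a provider or tier change touches a single line.

```typescript
// Hypothetical sketch of src/lib/models.ts. The IDs are placeholders,
// not real Bedrock model identifiers.
export type ModelRole = "synthesis" | "embedding";

const MODEL_IDS: Record<ModelRole, string> = {
  synthesis: "PLACEHOLDER_HAIKU_MODEL_ID", // swap here if the economics shift
  embedding: "PLACEHOLDER_EMBEDDING_MODEL_ID",
};

// Every caller asks for a role, never a hard-coded model string.
export function modelFor(role: ModelRole): string {
  return MODEL_IDS[role];
}
```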
Personas are wired at three layers. The visitor can select or be auto-routed to recruiter, technical, or executive, each with a different suggested-question set and a different retrieval scope. Recruiters get questions about skills and career experience; technical buyers get architecture and implementation; executives get ROI and prioritisation. The persona persists in sessionStorage and affects which knowledge chunks are eligible for retrieval.
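Persona-scoped retrieval can be sketched as a filter over chunk sources. The folder names below are illustrative assumptions, not the real knowledge-base layout; the mechanism is what matters: the persona (persisted in sessionStorage on the client) restricts which chunks are eligible before ranking.

```typescript
// Sketch of persona routing with assumed folder names. The persona
// selection restricts which knowledge chunks are eligible for retrieval.
type Persona = "recruiter" | "technical" | "executive";

const PERSONA_SCOPE: Record<Persona, string[]> = {
  recruiter: ["knowledge/core", "knowledge/tour-chatbot"],
  technical: ["knowledge/architecture", "knowledge/tour-chatbot"],
  executive: ["knowledge/core", "knowledge/positioning"],
};

// Keep only chunks whose source path falls inside the persona's scope.
function scopeChunks<T extends { source: string }>(
  persona: Persona,
  chunks: T[]
): T[] {
  const prefixes = PERSONA_SCOPE[persona];
  return chunks.filter((c) => prefixes.some((p) => c.source.startsWith(p)));
}
```

Scoping before retrieval (rather than filtering answers afterwards) is what lets the same knowledge base serve three different voices without cross-contamination.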
Rate limiting is designed for conversion. Sessions cap at 40 turns; IPs cap at 100 new sessions per day; conversation depth is bounded to prevent context explosion. IP addresses are SHA-256 hashed before storage, per GDPR data minimisation. A visitor who hits the session limit is routed to the contact form with a specific message: you've explored the full demo; let's talk directly — I'd love to discuss your use case. That's not a dead end. It's a handoff, and the handoff is usually the point of the exercise.
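The bookkeeping can be sketched as below, with the limits from the text (40 turns per session, 100 new sessions per IP per day). The function names are assumptions; the two design points are real: IPs are SHA-256 hashed before storage, and an exhausted session resolves to a handoff, not an error.

```typescript
import { createHash } from "node:crypto";

// Sketch of rate-limit bookkeeping with assumed names. Limits match the
// text: 40 turns per session, 100 new sessions per IP per day.
const MAX_TURNS_PER_SESSION = 40;
const MAX_SESSIONS_PER_IP_PER_DAY = 100;

// IPs are SHA-256 hashed before they touch storage (GDPR data minimisation).
function hashIp(ip: string): string {
  return createHash("sha256").update(ip).digest("hex");
}

type LimitResult = "ok" | "handoff";

// An exhausted session routes to the contact form, never an error page.
function checkSession(turns: number, sessionsTodayForIp: number): LimitResult {
  if (turns >= MAX_TURNS_PER_SESSION) return "handoff";
  if (sessionsTodayForIp >= MAX_SESSIONS_PER_IP_PER_DAY) return "handoff";
  return "ok";
}
```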
Streaming runs token-by-token through the Vercel AI SDK. Streaming isn't cosmetic — it reduces perceived wait time to near-zero and dramatically increases the likelihood the visitor stays engaged past the first answer.
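The consumption side of streaming can be illustrated without the SDK. The real implementation uses the Vercel AI SDK's streaming helpers; this sketch only shows why streaming matters: the UI renders each token as it arrives instead of waiting for the full completion.

```typescript
// Illustrative token-streaming consumer, not the actual SDK wiring.
// fakeTokenStream stands in for the model's token stream.
async function* fakeTokenStream(tokens: string[]): AsyncGenerator<string> {
  for (const t of tokens) yield t;
}

// Append each token to the chat bubble as it arrives, so perceived
// latency is the time-to-first-token, not the full generation time.
async function renderStream(
  stream: AsyncGenerator<string>,
  onToken: (partial: string) => void
): Promise<string> {
  let text = "";
  for await (const token of stream) {
    text += token;
    onToken(text); // incremental UI update
  }
  return text;
}
```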
Guardrails are real. Bedrock Guardrails block off-topic, personal-advice, and entertainment requests at the model layer. PII entities — names, emails, phones, addresses, SSNs — are anonymised in both directions. Grounding threshold is set so the model must anchor answers in retrieved content. A visitor who tries to redirect the assistant to general advice gets a response explaining what the assistant does cover, with a pointer to the relevant demo.
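PII anonymisation in both directions can be illustrated with a simplified masking pass. In the real system this happens at the Bedrock Guardrails layer; the regexes below are a hand-rolled stand-in covering emails and phone numbers only, and would not be sufficient on their own.

```typescript
// Illustrative PII masking pass, a simplified stand-in for the
// Bedrock Guardrails anonymisation that runs in both directions.
const PII_PATTERNS: Array<[RegExp, string]> = [
  [/[\w.+-]+@[\w-]+\.[\w.]+/g, "{EMAIL}"],
  [/\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b/g, "{PHONE}"],
];

// Apply every pattern in turn; matched entities are replaced with
// placeholder tokens before the text is stored or sent onward.
function maskPii(text: string): string {
  return PII_PATTERNS.reduce((t, [re, token]) => t.replace(re, token), text);
}
```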
What the system produces
- A contextual conversation during every demo visit, with persona-scoped retrieval and authored answers.
- Rate-limited session handling that converts exhausted sessions into human handoffs rather than dead ends.
- An audit trail of every interaction for post-visit review and knowledge-base improvement.
- A versioned knowledge base that travels with the code, so content and capability can't drift.
What the system does not produce:
- A replacement for the sales call. The assistant gets the buyer to the specific questions she wants to ask; the call is where commitment happens.
- Answers about competitors. Comparative claims in a chatbot are a legal liability; they belong in a human conversation. The assistant is wired to route competitor-comparison questions to a generic "we don't comment on competitors here; happy to walk through our approach in a call" response.
- Guaranteed conversion. The assistant is a conversion aid; outcomes still depend on fit, timing, and follow-up.
Patterns from engagement work
Ship analytics from day one. Session counts and message counts are cheap. Topic distribution — which knowledge-base articles are being cited — is the feedback loop that keeps the knowledge folder healthy. Without it, content rot is invisible until it's expensive.
Design the exhausted-session handoff before you need it. A frustrated visitor who hits a rate limit and gets an error is a conversion you just lost. The handoff experience — a graceful message routing to the contact form or a scheduled call — should exist on day one, not be added after the first incident.
Rewrite suggested questions in buyer-voice, not vendor-voice. Vendor-voiced questions ("What's your platform's approach to...") get low engagement. Buyer-voiced questions ("How would this handle our...") get high engagement. The assistant's suggested questions are a UI decision, not a content decision, and they need to be iterated on.
Per-chunk freshness dating is cheap insurance. Stamp every knowledge chunk with the date it was written, and flag any chunk whose underlying system has changed since. Content rot is the long-term risk; freshness dating is the cheap way to catch it before a confident wrong answer ships.
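The freshness check itself is a few lines. Field names here are assumptions; the mechanism is a simple date comparison between when a chunk was written and when the system it describes last changed.

```typescript
// Sketch of per-chunk freshness checking, with assumed field names.
type KnowledgeChunk = { source: string; writtenAt: Date };

// A chunk is stale if the system it describes changed after it was written.
function isStale(chunk: KnowledgeChunk, systemLastChanged: Date): boolean {
  return chunk.writtenAt < systemLastChanged;
}

// List chunks that need review before they can back a confident answer.
function staleChunks(
  chunks: KnowledgeChunk[],
  systemLastChanged: Date
): KnowledgeChunk[] {
  return chunks.filter((c) => isStale(c, systemLastChanged));
}
```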
Where this applies
This pattern works for any vendor with a complex, technical product and a knowledgeable buyer base — people analytics, data platforms, developer tools, security products, regulated SaaS. It works especially well when the sales motion has a meaningful technical-evaluation stage, because that's the stage the assistant most directly supports.
It does not work for simple products where the buyer's question is mostly about price and logistics — that conversation is better served by a straight contact form. It also does not work without a genuine commitment to keeping the knowledge base current; stale RAG is actively worse than no RAG, because it produces confident wrong answers.