Giving HR its week back

This whitepaper describes methodology and patterns from PeopleAnalytics.AI's engagement work on conversational HR assistants. Engagement details are anonymised by design; no specific client outcomes are claimed. Published figures are cited where load-bearing. Numbers drawn from the demo environment are labelled as demo measurements against synthetic data.


Summary

A well-run HR services team spends a surprising share of its week answering the same ten questions. PTO accrual, remote-work eligibility, performance-review timelines, tuition reimbursement, benefits rules — senior HRBPs whose time is worth a great deal burn it on Tier-1 policy lookup because there's no better system. This whitepaper describes the methodology for a conversational HR assistant with persona-aware routing, grounded retrieval against the client's actual policies, confidence-gated answers, and escalation routing for the questions that should reach a human.


The problem

The head of HR services typically has a quarter's worth of time-tracking data showing where her team's billable hours go. A substantial share — often well into double-digit percentages — goes to answering questions whose answers are already in the employee handbook. PTO accrual tends to top the list. Remote-work eligibility, performance-review timing, benefits enrollment, tuition reimbursement, leave policies, compensation bands — the familiar long tail that every HR organisation processes by hand because nobody has built a better system.

The team's best people often spend the most time on the easiest questions. The senior HRBP — years in the function, SPHR-certified, the person managers call for real problems — is the same person giving PTO answers. That person is also usually the one running exit interviews for the highest-regrettable-attrition population. The math doesn't work.

There's a second problem under the first. Employees increasingly expect instant answers. When they don't get them, they ask their managers. Managers, who also don't know, ask HRBPs. The original ten-minute question becomes a thirty-minute chain across three people, two of whom never had the answer in the first place.

Why this is hard

Most conversational HR assistants fail for three reasons.

They answer every question, including the ones that should escalate. An employee asks about a harassment policy; the bot produces a helpful paragraph about where to find the handbook, when the interaction should have gone straight to a human, urgently. The sensitive-topic set has to be explicit, and the escalation rule has to fire when in doubt.

They treat every asker as the same. "What's my remote-work policy?" has a different answer depending on whether the asker is an employee, a manager, or a senior executive. An executive asking about harassment policy needs a different path — probably to Employee Relations directly — than an employee asking the same question. A bot that treats everyone as an anonymous user produces answers that are sometimes right, sometimes wrong, and frequently inappropriate to the asker.

They hallucinate. An HR bot that invents policy specifics is worse than no bot at all. If the PTO accrual rule is three lines in the handbook, the bot must quote those three lines with a citation, not generate its own paraphrase. "Probably" is not a legally useful adverb.

The approach

The system is implemented in src/app/demos/hr-policy-assistant/. Four personas are wired through src/app/demos/hr-policy-assistant/lib/personas.ts: Employee, Manager, Executive, and HR Admin. Each persona sees a different self-service dashboard (time off, benefits, pay, performance, training, team data where applicable) and has different access to policy content. The chat interface (HRChatWindow.tsx, routed through /api/demo2/chat) is persona-aware end to end.
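The persona wiring can be sketched as a scoped configuration. This is a hypothetical shape for illustration, not the contents of the actual personas.ts; the field names and dashboard lists are assumptions drawn from the description above.

```typescript
// Hypothetical persona configuration; names and fields are assumptions,
// not the demo's actual personas.ts.
type Persona = "employee" | "manager" | "executive" | "hr_admin";

interface PersonaConfig {
  dashboards: string[];      // self-service panels this persona sees
  retrievalScope: Persona[]; // roleAccess tags this persona may retrieve
}

const PERSONAS: Record<Persona, PersonaConfig> = {
  employee: {
    dashboards: ["time_off", "benefits", "pay", "performance", "training"],
    retrievalScope: ["employee"],
  },
  manager: {
    dashboards: ["time_off", "benefits", "pay", "performance", "training", "team"],
    retrievalScope: ["employee", "manager"],
  },
  executive: {
    dashboards: ["time_off", "benefits", "pay", "performance", "training", "team"],
    retrievalScope: ["employee", "manager", "executive"],
  },
  hr_admin: {
    dashboards: ["time_off", "benefits", "pay", "performance", "training", "team"],
    retrievalScope: ["employee", "manager", "executive", "hr_admin"],
  },
};

function scopeFor(persona: Persona): Persona[] {
  return PERSONAS[persona].retrievalScope;
}
```

The point of the shape: the retrieval scope is data attached to the persona, so the chat route never decides access on the fly.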

Retrieval is the hard part. Policy documents are chunked in src/app/demos/hr-policy-assistant/lib/chunking.ts with role-access tags. A chunk marked roleAccess: ['hr_admin'] is never returned to an employee, regardless of how the query is phrased. A chunk classified as containing PHI is filtered at the retrieval layer, not the generation layer. Classification is intentional, not emergent from the model. The production version of this pattern moves to Supabase pgvector with the same access policies; the demo uses hardcoded chunks against a representative policy library (PTO, remote work, performance reviews, harassment, compensation, FMLA, tuition, ADA accommodation) to keep the behaviour inspectable.
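The retrieval-layer filter described above can be sketched in a few lines. Field names are illustrative, not the actual chunking.ts types; the behaviour shown is the one the text specifies: access control happens before any model sees the text.

```typescript
// Minimal sketch of retrieval-layer access filtering; field names are
// illustrative, not the actual chunking.ts types.
interface PolicyChunk {
  id: string;
  text: string;
  roleAccess: string[]; // personas permitted to retrieve this chunk
  containsPhi: boolean; // PHI-classified chunks never leave retrieval
}

function retrievable(chunks: PolicyChunk[], persona: string): PolicyChunk[] {
  // A chunk the persona may not read is never a candidate,
  // regardless of how the query is phrased.
  return chunks.filter((c) => c.roleAccess.includes(persona) && !c.containsPhi);
}

// Illustrative chunks in the spirit of the demo's policy library.
const chunks: PolicyChunk[] = [
  { id: "pto-1", text: "PTO accrues at ...", roleAccess: ["employee", "manager", "hr_admin"], containsPhi: false },
  { id: "comp-1", text: "Compensation bands ...", roleAccess: ["hr_admin"], containsPhi: false },
  { id: "fmla-case-1", text: "Case notes ...", roleAccess: ["hr_admin"], containsPhi: true },
];
```

Note that the PHI filter applies even to hr_admin in this sketch: PHI is excluded at retrieval for everyone, which is what "filtered at the retrieval layer, not the generation layer" means in practice.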

Synthesis is AWS Bedrock with Claude Haiku 4.5 (src/lib/models.ts). Sonnet and GPT-4 were considered; two reasons favour Haiku for this pattern. First, when retrieval is scoped well, synthesis doesn't need reasoning depth — it needs to faithfully paraphrase the retrieved chunks and cite them, which Haiku does reliably. Second, latency: Haiku's time-to-first-token is low enough that the conversational UX is perceived as instant rather than slow, and perceived latency is what kills adoption.

PII protection is enforced at two layers. Bedrock Guardrails (scripts/setup_guardrail.py) anonymise names, emails, phones, addresses, and SSNs in both input and output. The grounding threshold is set so the model must cite retrieved content; off-topic and personal-advice requests are blocked upstream. A separate escalationEngine.ts watches for sensitive-topic signals — harassment, medical, termination, legal, formal complaint — and routes those interactions to a human via /api/demo2/incident, with the full conversation logged.
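The sensitive-topic gate can be sketched as follows. This is an illustrative decision in the spirit of escalationEngine.ts; the signal list comes from the text, but the matching logic is an assumption, not the shipped implementation.

```typescript
// Illustrative sensitive-topic gate; the signal list is from the text,
// the matching logic is an assumption, not the actual escalationEngine.ts.
const SENSITIVE_SIGNALS = ["harassment", "medical", "termination", "legal", "complaint"];

interface EscalationDecision {
  escalate: boolean;
  matched: string[]; // which signals fired, for the incident log
}

function checkEscalation(message: string): EscalationDecision {
  const lower = message.toLowerCase();
  const matched = SENSITIVE_SIGNALS.filter((s) => lower.includes(s));
  return { escalate: matched.length > 0, matched };
}
```

The sketch covers only the decision; in the system described above, a positive decision routes the interaction to /api/demo2/incident with the full conversation attached.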

The confidence engine (confidenceEngine.ts) produces a badge on every answer: high, medium, or low. Low-confidence answers include a prompt to escalate. That design means the bot gets credit for "I don't know" as well as for "here's the answer," which is the right incentive for a system touching policy.
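The badge mapping is a small function. The thresholds below are assumptions for illustration, not the values in confidenceEngine.ts.

```typescript
// Sketch of a confidence-to-badge mapping; the 0.8 and 0.5 thresholds
// are assumptions, not the values in confidenceEngine.ts.
type Badge = "high" | "medium" | "low";

function badgeFor(retrievalScore: number): Badge {
  if (retrievalScore >= 0.8) return "high";
  if (retrievalScore >= 0.5) return "medium";
  return "low"; // low-confidence answers carry an escalation prompt
}
```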

What the system produces

  • Persona-routed conversations, where the retrieval scope and response style match the asker.
  • Grounded answers citing the policy source, with sensitive topics routed to a human by design.
  • A confidence badge on every answer, with explicit escalation prompts below the threshold.
  • An auditable trail of every conversation, stored long enough to meet regulatory retention requirements.

What the system does not produce:

  • Legal advice. The bot paraphrases policy; it doesn't interpret it for a specific situation. The escalation path is the correct place for interpretation.
  • Answers the handbook can't support. If the policy library doesn't contain the answer, the bot says so and escalates. Confident wrong answers are the worst failure mode in this category.

Patterns from engagement work

Tone matters more than accuracy, after a baseline. Once retrieval is grounded and the bot is accurate, the remaining variable in adoption is voice. An accurate-but-bureaucratic bot gets ignored. Rewriting the system prompt to mirror the HR team's actual voice — warmer, shorter, more willing to name what it doesn't know — consistently moves usage more than a percentage-point improvement in retrieval quality does.

Ship the audit trail first, not the chat. The audit trail is what makes Legal comfortable. Until it exists, every conversation with Legal is theoretical. The peopleanalytics-demo2-audit table was designed before the chat UI, deliberately.
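A hypothetical shape for one audit-trail row, to make "audit trail first" concrete. The actual peopleanalytics-demo2-audit schema is not published; every field name here is an assumption.

```typescript
// Hypothetical shape of one audit-trail row; the actual
// peopleanalytics-demo2-audit schema is not published.
interface AuditRecord {
  conversationId: string;
  persona: string;
  question: string;      // already anonymised upstream by guardrails
  answer: string;
  citations: string[];   // policy chunk ids that grounded the answer
  confidence: "high" | "medium" | "low";
  escalated: boolean;
  timestamp: string;     // ISO 8601, for retention-window queries
}

function auditRow(partial: Omit<AuditRecord, "timestamp">): AuditRecord {
  return { ...partial, timestamp: new Date().toISOString() };
}
```

The useful property is that the row captures grounding (citations), the confidence badge, and the escalation decision together, so Legal can audit any single answer without replaying the model.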

Onboard HRBPs at launch, not after. The team that understands the questions is the team that needs to trust the answers. Pairing the rollout with the HRBPs from day one changes how quickly the bot gets adopted; bolting them on after the fact creates resistance that's hard to undo.

Widen the escalation trigger set conservatively, and add a confidence-based fallback. Keyword-only escalation misses questions that don't use the trigger words. A low-confidence + health-adjacent (or legal-adjacent) fallback covers the gap. The right test: the bot should err on the side of escalating an ambiguous case, not answering it.
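The widened rule composes the two signals. The topic labels and the 0.5 threshold below are illustrative assumptions; the shape of the rule is the one described above.

```typescript
// Sketch of the widened escalation rule: keyword triggers plus a
// low-confidence fallback for health- or legal-adjacent questions.
// Topic labels and the 0.5 threshold are illustrative assumptions.
function shouldEscalate(opts: {
  keywordHit: boolean; // any sensitive-topic signal matched
  confidence: number;  // 0..1 from the confidence engine
  topic: string;       // coarse classifier label, e.g. "health"
}): boolean {
  if (opts.keywordHit) return true;
  const adjacent = opts.topic === "health" || opts.topic === "legal";
  return adjacent && opts.confidence < 0.5; // err toward escalating
}
```

This is the "err on the side of escalating" test in code: an ambiguous health-adjacent question with no trigger words still reaches a human.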

Where this applies

This pattern works for HR services teams above roughly 500 employees, where the repeat-question volume is high enough that the auto-answer economics clear the investment. It works especially well in organisations with a well-structured policy library — a maintained employee handbook, not a scattered set of SharePoint documents nobody can find.

It does not work for organisations whose policies are mostly in people's heads, or whose HR services function is so small that the repeat-question volume doesn't justify the build. It also does not work without leadership alignment on escalation policy — the sensitive-topic rules are only as good as the definition of "sensitive," and that is a human decision, not a model decision.