Predicting resignations three to six months before they happen

This whitepaper describes methodology and patterns from PeopleAnalytics.AI's engagement work on individual-level attrition prediction in regulated environments. Engagement details are anonymised by design; no specific client outcomes are claimed. Published figures (SHRM, ADP, BLS) are cited where load-bearing. Numbers drawn from the demo environment are labelled as demo measurements against synthetic data.


Summary

An attrition prediction system earns its keep by doing three things at once: scoring individuals (not departments), explaining each score with features a manager can act on, and monitoring outputs for demographic fairness. Any system that doesn't clear all three bars has no business being deployed in an HR context. This whitepaper describes the methodology, the trade-offs, and where the pattern applies.


The problem

The retention signals that would have predicted an individual resignation usually live in separate systems: compensation history in the HRIS, goal completion in the performance module, engagement scores in a survey tool, manager effectiveness in a fourth. No human is stitching those threads together by hand, and nothing automated is either. When a resignation is announced, the post-mortem sounds the same every time: the signs were there; nobody connected them.

Department-level dashboards compound the problem. Quarterly attrition reports show the trend a quarter after the economic cost is already locked in. Worse, a heatmap that says "Engineering's risk is elevated" isn't actionable — no HRBP walks into a forty-person team and says "one of you is thinking about leaving." Retention conversations happen with names, not with departments.

The economics make avoidable exits expensive. Replacement cost depends on role, seniority, and market; published benchmarks — SHRM's Talent Acquisition Benchmarking Report, ADP Research Institute reports, the BLS JOLTS Quits Rate series — place average cost-per-hire in the low-to-mid thousands of dollars for most industries and meaningfully higher in specialised or regulated functions. Once ramp time and lost productivity are added, a preventable resignation in a critical role carries real cost before the replacement is even at desk. The exact number is engagement-specific; the direction of the argument is not.

Why this is hard

Three traps recur when building individual-level flight-risk scoring.

Opacity. A black-box model says "this employee is high risk, score 84" and stops. The manager has no idea why. She can't coach around a number. If Legal flags the model for adverse impact, nobody can defend its decisions. Explainability isn't cosmetic — it's what determines whether the system ever gets deployed.

Bias inherited from history. A flight-risk model trained on voluntary exits learns the patterns of the people who already left. If those exits are demographically skewed — and in most organisations they are, for reasons that aren't the current employee's fault — the model reproduces the bias and adds a scientific veneer to it. A model that systematically over-predicts risk for one group is a legal problem waiting to become an incident.

Score without action. A model that predicts resignations at an organisation that won't act on the predictions is an expensive way to confirm something HR leadership already suspected. The score is upstream of the retention decision, not a substitute for one.

Any system that doesn't avoid all three traps — by scoring individuals, explaining each score, monitoring fairness, and pairing predictions with a credible intervention path — doesn't clear the bar.

The approach

The pipeline is documented in src/app/demos/attrition/components/, with the risk engine in src/app/demos/attrition/lib/riskScore.ts. Data flows from DynamoDB (peopleanalytics-employees, peopleanalytics-performance-reviews, peopleanalytics-compensation-history, peopleanalytics-goal-completions, peopleanalytics-terminated-employees) through the server-side scoring layer to an executive view and an analyst view.

We use an interpretable rule-based risk score rather than a gradient-boosted tree or a neural network. The trade-off is real and worth naming: a GBT or a trained network would typically score a few AUC points higher on a held-out test set. What we give up in raw accuracy buys explainability, auditability, and Legal sign-off. Every point in the 0–100 risk score is traceable to a specific input — base rate, department factor, tenure, engagement, overtime, time since promotion, manager rating, compensation band, and optional enrichment from performance ratings, compensation stagnation, and goal completion. A hiring manager can open an employee's record and see the line items that made up the score. That's the conversation that produces a retention intervention.
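The shape of such a score can be sketched as follows. The factor names come from the text; the weights, bands, and base rate below are invented for illustration and are not the production riskScore.ts:

```typescript
// Illustrative only: factor names come from this whitepaper; the weights and
// thresholds are invented for the sketch, not taken from riskScore.ts.
type Employee = {
  tenureYears: number;           // time at the organisation
  engagement: number;            // survey score, 0–1
  overtimeHoursPerWeek: number;
  monthsSincePromotion: number;
  managerRating: number;         // 1–5
  compRatio: number;             // salary / band midpoint
};

type LineItem = { factor: string; points: number };

// Each rule contributes a signed, named number of points; the clamped sum is
// the 0–100 score, and the line items are the explanation a manager sees.
function riskScore(e: Employee, baseRate = 20): { score: number; items: LineItem[] } {
  const items: LineItem[] = [
    { factor: "base rate", points: baseRate },
    { factor: "tenure", points: e.tenureYears < 2 ? 10 : 0 },
    { factor: "engagement", points: e.engagement < 0.5 ? 15 : 0 },
    { factor: "overtime", points: e.overtimeHoursPerWeek > 10 ? 10 : 0 },
    { factor: "time since promotion", points: e.monthsSincePromotion > 36 ? 10 : 0 },
    { factor: "manager rating", points: e.managerRating <= 2 ? 10 : 0 },
    { factor: "compensation band", points: e.compRatio < 0.9 ? 15 : 0 },
  ];
  const raw = items.reduce((sum, i) => sum + i.points, 0);
  return { score: Math.max(0, Math.min(100, raw)), items };
}
```

The point of the structure is that the caller gets `items`, not just `score` — the explanation is a by-product of computing the number, not a separate model bolted on afterwards.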

For explainability, SHAP (SHapley Additive exPlanations) values render as per-employee waterfall charts in ShapWaterfall.tsx. For a given employee, the chart shows how each feature pushed the prediction up or down. Managers can see that the score is driven by compensation stagnation first, declining goal completion second, and manager rating third. That's a coaching conversation. "Your department is at risk" is not.

For fairness, a dedicated panel (FairnessPanel.tsx) decomposes predicted risk by demographic dimension — gender, ethnicity, job level, tenure band. The panel flags any dimension where predicted risk is materially higher than the population base rate, so the model is monitored continuously rather than audited once.
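The decomposition reduces to a small computation: group mean predicted risk against the population mean, with a flag when the gap is material. The 1.2× flag threshold below is invented for the sketch; the production FairnessPanel.tsx sets its own policy:

```typescript
// Sketch of the fairness decomposition: mean predicted risk per group vs.
// the population mean. The 1.2x flag ratio is an illustrative assumption.
type Scored = { risk: number; group: string };

function fairnessFlags(
  pop: Scored[],
  ratio = 1.2,
): { group: string; mean: number; flagged: boolean }[] {
  const popMean = pop.reduce((s, p) => s + p.risk, 0) / pop.length;
  const byGroup = new Map<string, number[]>();
  for (const p of pop) {
    byGroup.set(p.group, [...(byGroup.get(p.group) ?? []), p.risk]);
  }
  return [...byGroup.entries()].map(([group, risks]) => {
    const mean = risks.reduce((s, r) => s + r, 0) / risks.length;
    return { group, mean, flagged: mean > ratio * popMean };
  });
}
```

Running this on every scoring pass, across every configured dimension, is what "monitored continuously rather than audited once" means in practice.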

The scenario engine (ScenarioEngine.tsx) isn't a slider that changes a chart. It's a genuine recomputation. Leaders can model "what if Engineering gets a 5% raise band adjustment" and see the full population re-scored, with department-level expected-attrition updated. That turns the model from a report into a planning tool.
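The recomputation can be sketched in a few lines. The toy scoring rule and function names below are illustrative assumptions, not the ScenarioEngine.tsx API — the point is that the intervention is applied to every employee record and the whole population is re-scored, rather than a chart being rescaled:

```typescript
// Sketch of scenario recomputation: apply an intervention, re-score the full
// population, compare expected attrition. Names and the toy scoring rule are
// illustrative, not the ScenarioEngine.tsx API.
type Emp = { dept: string; compRatio: number };

// Toy stand-in for the real risk engine: below-band pay adds risk.
const score = (e: Emp): number => 20 + (e.compRatio < 0.9 ? 15 : 0);

// Expected attrition = sum of per-employee risk treated as a probability.
function expectedAttrition(pop: Emp[], scoreFn: (e: Emp) => number): number {
  return pop.reduce((s, e) => s + scoreFn(e) / 100, 0);
}

// "What if Engineering gets a 5% raise band adjustment?"
function scenario(pop: Emp[], dept: string, raise: number): { before: number; after: number } {
  const adjusted = pop.map(e =>
    e.dept === dept ? { ...e, compRatio: e.compRatio * (1 + raise) } : e,
  );
  return { before: expectedAttrition(pop, score), after: expectedAttrition(adjusted, score) };
}
```

Because the scenario path runs the same scoring function as the live view, the before/after numbers are comparable by construction.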

Outputs sit next to BLS Quits Rate series (publicly available via the FRED API) for the relevant sector, so every internal number has an external reference point. Without that benchmark, an internal attrition number is just a fact; with it, the gap between the organisation and its sector is an input to a decision.

What the system produces

  • Per-employee risk scores on a 0–100 scale, recomputed on every data refresh.
  • SHAP-based ranked drivers for each score, rendered as waterfall charts.
  • A fairness decomposition across configurable demographic dimensions, with continuous flagging when a dimension's predicted risk diverges from the population base rate.
  • Scenario outputs showing how an intervention (compensation adjustment, promotion schedule change, workload reallocation) would change population-level expected attrition.

What the system does not produce: a guaranteed retention outcome. The link from a risk score to a retention intervention to a retained employee depends on interventions the organisation chooses to run. An attrition model is upstream of the retention decision, not a substitute for one. The companion question — which interventions actually work? — is a matched-cohort analysis addressed in the L&D ROI whitepaper.

Typical deployment shape:

  • Data unification is where most of the work lives. Features are straightforward once the data is joined; joining the data is the work.
  • Calibration against the organisation's own exit history, including the false-positive tolerance — which is the client's call, not a technical default. A manager checking in with someone who's fine is a different failure mode than missing someone who isn't, and different organisations weight those differently.
  • Fairness baseline, Legal review, pilot rollout. A narrow pilot (one or two departments) validates the model on the organisation's own data before broader deployment.

Patterns from engagement work

Involve Legal in week one, not week eight. The fairness panel exists because Legal asks for it. Bringing Legal in at the start shapes the model design more cleanly than discovering their requirements in a late review. Adverse-impact monitoring is a design constraint, not a feature to bolt on.

Threshold tuning is a policy decision, not a technical one. A model tuned aggressively produces too many false positives and burns out the HRBPs who follow up. A model tuned conservatively misses the cases it was supposed to catch. There is no universally correct threshold; there is only an articulated policy decision — and the fairness panel must monitor either choice. The right test: what failure mode does this organisation prefer, and has HR leadership said so explicitly?
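Making that policy decision explicit starts with counting the two failure modes at each candidate threshold. A minimal sketch, with invented names and data:

```typescript
// Counting the two failure modes at a candidate threshold, so the policy
// trade-off in the text can be stated in numbers. Purely illustrative.
type HistoricalCase = { score: number; left: boolean }; // past exits as labels

function failureModes(cases: HistoricalCase[], threshold: number) {
  let falsePositives = 0; // flagged, but stayed: wasted HRBP follow-up
  let falseNegatives = 0; // not flagged, but left: the miss
  for (const c of cases) {
    if (c.score >= threshold && !c.left) falsePositives++;
    if (c.score < threshold && c.left) falseNegatives++;
  }
  return { falsePositives, falseNegatives };
}
```

Sweeping the threshold over the organisation's own exit history and putting the resulting table in front of HR leadership is what turns "aggressive vs. conservative" from a modelling default into the articulated policy decision the text calls for.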

Intervention conflation is the attribution trap. It is tempting to attribute retained employees to the risk model. It is more honest to attribute them to the intervention the risk model surfaced. The model's job is to change the priors on which interventions run where; measuring intervention effectiveness is a separate analysis. The engagements that get this right run matched-cohort evaluation on the interventions, using the same methodology as the L&D ROI whitepaper.

Drift is real and quarterly re-fitting is the floor. An attrition model trained on one labour-market regime decays when conditions change. Quarterly re-fitting against the organisation's own termination data is the baseline cadence; when FRED quit-rate series move sharply, re-fit immediately rather than waiting for the schedule.

Buy the intervention A/B framework; don't build it. A lightweight version is fine for a pilot; for ongoing operation, use something off-the-shelf. The custom version was never the thing worth building.

Where this applies

This pattern works for organisations with enough signal density to score individuals: at least several hundred employees, an HRIS whose data can be queried, and a performance-management system that captures ratings at least annually. It works especially well in regulated industries where the fairness and auditability scaffolding is a prerequisite for deployment.

It does not work at sub-100-headcount organisations (not enough signal, not enough base rate), at organisations that don't capture compensation or performance history in a queryable system, or at organisations whose HR leadership isn't prepared to have direct conversations about compensation and performance based on what the model surfaces.