01 · Problem Statement
Why is accuracy the wrong metric for churn?
An 81% accurate churn model sounds good — until you realise it catches only 19% of actual churners. The model is biased toward the majority class and misses the customers who are about to leave. This project reframes the question: instead of "how accurate is the model?", we ask "what does each wrong prediction cost the business?"
💸
False Negative · $1,000
A missed churner — customer leaves, revenue lost permanently. The costly error we most want to avoid.
📬
False Positive · $100
Unnecessary retention offer sent to someone who wouldn't have churned. Annoying but manageable.
🎯
Objective
Find the model + decision threshold combination that minimises total business cost across the test set.
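The cost framing above can be made concrete with a small sketch. The two unit costs come from the problem statement; the confusion-matrix counts are hypothetical, invented only so that a 1,000-customer sample reproduces the 81% accuracy and 19% recall quoted earlier. They are not the project's actual counts, and the function name `summarise` is illustrative.

```python
FN_COST = 1_000  # cost of a missed churner (from the problem statement)
FP_COST = 100    # cost of an unnecessary retention offer (from the problem statement)


def summarise(tp, fp, tn, fn):
    """Return (accuracy, recall, total business cost) for one confusion matrix."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn)          # share of actual churners that were caught
    cost = fn * FN_COST + fp * FP_COST
    return accuracy, recall, cost


# HYPOTHETICAL counts for 1,000 customers, chosen to match 81% accuracy / 19% recall.
accuracy, recall, cost = summarise(tp=38, fp=28, tn=772, fn=162)
print(f"accuracy={accuracy:.2f} recall={recall:.2f} cost=${cost:,}")
```

With these assumed counts, almost all of the cost comes from the 162 missed churners, which is why accuracy alone hides the business impact.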
02 · Exploratory Data Analysis
Who churns, and why?
10,000 bank customers across France, Germany, and Spain. EDA revealed that churn is not random — it clusters around geography, engagement level, age, and product usage.
Germany has a 32.4% churn rate, double France's (16.2%) and nearly double Spain's (16.7%). Inactive members churn at 26.9% vs just 14.3% for active members.
Churned customers average 44.8 years vs 37.4 for retained — a 7.4-year gap. Older, higher-balance customers are the most at-risk segment.
Customers with 3 products churn at 82.7% — a sharp non-linear cliff invisible to Logistic Regression. 4-product churn is 100% but n=60, so interpret cautiously.
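Segment-level churn rates like the ones above can be computed with a simple group-by. The following is a minimal sketch, not the project's actual EDA code: the column names `Geography` and `Exited` are assumptions, and the rows are toy data for illustration only. Returning the segment size next to the rate makes small segments (such as the n=60 four-product group) easy to spot.

```python
from collections import defaultdict


def churn_rate_by(rows, key, target="Exited"):
    """Return {segment: (churn_rate, n)} for the column named by `key`."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        churned[row[key]] += row[target]  # target is 1 for churned, 0 for retained
    return {seg: (churned[seg] / totals[seg], totals[seg]) for seg in totals}


# TOY rows for illustration only; these are not the bank dataset.
rows = [
    {"Geography": "Germany", "Exited": 1},
    {"Geography": "Germany", "Exited": 0},
    {"Geography": "France", "Exited": 0},
    {"Geography": "France", "Exited": 0},
    {"Geography": "France", "Exited": 1},
    {"Geography": "Spain", "Exited": 0},
]

rates = churn_rate_by(rows, "Geography")
for segment, (rate, n) in sorted(rates.items()):
    print(f"{segment}: {rate:.1%} churn (n={n})")
```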
03 · Model Development
Three models, one cost framework
Three tree-based models and a Logistic Regression baseline were trained and evaluated using a threshold optimisation loop: every decision threshold from 0.01 to 0.50 was tested, and the one that minimises total business cost was selected. The results on the test set are shown below.
| Rank | Model | Best Threshold | Minimum Cost | Saving vs LR |
|---|---|---|---|---|
| 4 | Logistic Regression (Balanced) | 0.36 | $132,200 | n/a (baseline) |
| 3 | Random Forest | 0.07 | $113,700 | $18,500 |
| 2 | Gradient Boosting | 0.09 | $104,600 | $27,600 |
| 1 ✓ | XGBoost | 0.23 | $103,500 | $28,700 |
Each curve shows how total cost changes across decision thresholds. XGBoost finds the lowest minimum at threshold 0.23, costing $103,500. Gradient Boosting is close at $104,600 (threshold 0.09). Random Forest: $113,700 (threshold 0.07). Logistic Regression plateaus at $132,200.
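The threshold optimisation loop described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the 0.01 step size, the function names `cost_at` and `best_threshold`, and the toy labels and probabilities are all hypothetical. Only the two unit costs and the 0.01 to 0.50 range come from the text.

```python
FN_COST = 1_000  # missed churner
FP_COST = 100    # unnecessary retention offer


def cost_at(y_true, y_prob, threshold):
    """Total business cost when customers with p >= threshold are flagged as churners."""
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    return fn * FN_COST + fp * FP_COST


def best_threshold(y_true, y_prob):
    """Sweep thresholds 0.01..0.50 (assumed step 0.01) and return (threshold, cost)."""
    thresholds = [t / 100 for t in range(1, 51)]
    costs = [cost_at(y_true, y_prob, t) for t in thresholds]
    i = min(range(len(costs)), key=costs.__getitem__)  # first minimum wins on ties
    return thresholds[i], costs[i]


# TOY labels and predicted churn probabilities, for illustration only.
y_true = [1, 0, 1, 0, 0, 1]
y_prob = [0.40, 0.05, 0.20, 0.30, 0.10, 0.60]

threshold, cost = best_threshold(y_true, y_prob)
print(f"best threshold={threshold:.2f}, cost=${cost:,}")
```

Because a false negative costs ten times as much as a false positive, the cost-minimising threshold tends to sit well below 0.50, which matches the low thresholds reported in the table.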
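XGBoost and Gradient Boosting achieve the highest AUC scores (0.867 and 0.871), slightly above Random Forest (0.856) and well above Logistic Regression (0.777). AUC summarises ranking quality across all decision thresholds, so the higher scores indicate that the boosted models separate churners from non-churners better overall. Gradient Boosting has the marginally higher AUC while XGBoost has the lower minimum cost, which is why the final choice rests on the cost curve rather than on AUC alone.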
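One way to read an AUC score is as the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. The sketch below computes it directly from that definition. It is an illustrative pairwise implementation on toy data, not how the reported scores were produced, and the function name `pairwise_auc` is hypothetical.

```python
def pairwise_auc(y_true, y_score):
    """AUC as the share of (churner, non-churner) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# TOY labels and scores, for illustration only.
y_true = [1, 0, 1, 0, 0, 1]
y_score = [0.40, 0.05, 0.20, 0.30, 0.10, 0.60]

auc = pairwise_auc(y_true, y_score)
print(f"AUC={auc:.3f}")
```

Since AUC depends only on how customers are ranked, two models with similar AUC can still differ in cost at their best thresholds.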
04 · Key Insights
What does the data tell the business?
🇩🇪
Germany is the Priority
32.4% churn rate vs ~16% elsewhere. Localised retention campaigns should start here — the gap is too large to be demographic noise.
😴
Inactivity = Top Signal
Inactive members churn at 26.9% vs 14.3% for active. Re-engagement programmes targeting dormant accounts are likely to offer a strong ROI.
📦
3-Product Cliff
82.7% churn rate for 3-product customers. This non-linear cliff — invisible to Logistic Regression — is the strongest predictor the tree-based models capture.
👴
Age Gap: 7.4 Years
Churned customers average 44.8 years vs 37.4. Older customers who hold high balances are leaving — making early detection especially high-value.
05 · Recommendations
From model output to business action
| Priority | Action | Supporting Insight |
|---|---|---|
| High | Deploy XGBoost at threshold 0.23 | This configuration minimises total cost at $103,500, vs $132,200 for the Logistic Regression baseline: a direct saving of $28,700 per scoring cycle on this test set. |
| High | Re-engage inactive customers immediately | Inactive members churn at 26.9%, nearly double the 14.3% rate of active members. Targeted notifications, offers, or check-in campaigns for dormant accounts address one of the largest churn segments. |
| High | Deploy premium retention in Germany | Germany's 32.4% churn rate is structurally different from France (16.2%) and Spain (16.7%), and affects a far larger customer base. Localised retention campaigns, offers, and regional root-cause analysis are likely to deliver the highest addressable ROI of any single intervention. |
| Medium | Investigate 3-product customers urgently | The 82.7% churn rate among 3-product customers (n=266) likely signals a product design or service delivery issue rather than a retention problem. The segment is small but the signal is extreme. Understand the root cause before deploying retention spend here. |
| Medium | Prioritise high-balance, older customers | Churned customers hold higher average balances and are ~7 years older. Dedicated relationship management or exclusive retention incentives tied to account value protect the highest-revenue cohort. |