Machine Learning · Python · Scikit-learn · XGBoost · Plotly

Bank Customer Churn Prediction & Cost Optimisation

Most churn models optimise for accuracy. This one optimises for money. Four models compared on actual business cost — every wrong prediction has a price tag, and the model that minimises it wins.

$103.5k
Best Model Cost
XGBoost at threshold 0.23
$28.7k
Cost Reduction
vs Logistic Regression baseline
20.4%
Dataset Churn Rate
2,037 of 10,000 customers
0.867
AUC · XGBoost (lowest-cost model)
vs 0.777 Logistic Regression
01 · Problem Statement

Why is accuracy the wrong metric for churn?

An 81% accurate churn model sounds good — until you realise it catches only 19% of actual churners. The model is biased toward the majority class and misses the customers who are about to leave. This project reframes the question: instead of "how accurate is the model?", we ask "what does each wrong prediction cost the business?"
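To make the point concrete, here is a minimal sketch that uses only the class counts reported for this dataset (2,037 churners out of 10,000). It scores a hypothetical model that predicts "stays" for every customer. The helper name `accuracy_and_recall` is illustrative and not taken from the project code.

```python
def accuracy_and_recall(tp, fp, tn, fn):
    """Return (accuracy, recall) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Recall: share of actual churners that the model flags.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Hypothetical "always predict stays" model on the full dataset:
# all 7,963 retained customers are right, all 2,037 churners are missed.
acc, rec = accuracy_and_recall(tp=0, fp=0, tn=7963, fn=2037)
print(f"accuracy={acc:.1%}, recall={rec:.1%}")  # accuracy=79.6%, recall=0.0%
```

A model that never identifies a single churner still scores close to 80% accuracy, which is why accuracy alone says little about churn detection here.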

💸
False Negative · $1,000
A missed churner — customer leaves, revenue lost permanently. The costly error we most want to avoid.
📬
False Positive · $100
Unnecessary retention offer sent to someone who wouldn't have churned. Annoying but manageable.
🎯
Objective
Find the model + decision threshold combination that minimises total business cost across the test set.
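The cost framing above can be sketched as a small function. This is an illustration under stated assumptions, not the project's actual code: it applies the two unit costs from the cards ($1,000 per false negative, $100 per false positive), treats correct predictions as costing nothing, and the names `business_cost`, `COST_FALSE_NEGATIVE`, and `COST_FALSE_POSITIVE` are hypothetical.

```python
COST_FALSE_NEGATIVE = 1_000  # missed churner
COST_FALSE_POSITIVE = 100    # unnecessary retention offer


def business_cost(y_true, y_pred):
    """Total cost of a set of binary predictions (1 = churn, 0 = stays)."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return fn * COST_FALSE_NEGATIVE + fp * COST_FALSE_POSITIVE


# Toy labels, invented purely for illustration.
y_true = [1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 0]
print(business_cost(y_true, y_pred))  # 2 missed churners + 1 wasted offer = 2100
```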
02 · Exploratory Data Analysis

Who churns, and why?

10,000 bank customers across France, Germany, and Spain. EDA revealed that churn is not random — it clusters around geography, engagement level, age, and product usage.

Germany has a 32.4% churn rate, double France's (16.2%) and nearly double Spain's (16.7%). Inactive members churn at 26.9% vs just 14.3% for active members.
Churned customers average 44.8 years vs 37.4 for retained — a 7.4-year gap. Older, higher-balance customers are the most at-risk segment.
Customers with 3 products churn at 82.7% — a sharp non-linear cliff invisible to Logistic Regression. 4-product churn is 100% but n=60, so interpret cautiously.
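Segment-level churn rates like the ones above come from a simple group-and-average. The sketch below shows the idea in plain Python on a handful of invented rows. The column names `Geography`, `IsActiveMember`, and `Exited` are assumptions about the dataset schema, and `churn_rate_by` is a hypothetical helper.

```python
from collections import defaultdict


def churn_rate_by(rows, key):
    """Churn rate per distinct value of `key`; each row is a dict with an 'Exited' flag."""
    counts = defaultdict(lambda: [0, 0])  # value -> [churned, total]
    for row in rows:
        counts[row[key]][0] += row["Exited"]
        counts[row[key]][1] += 1
    return {value: churned / total for value, (churned, total) in counts.items()}


# Toy rows, invented for illustration; they do not reproduce the real rates.
rows = [
    {"Geography": "Germany", "IsActiveMember": 0, "Exited": 1},
    {"Geography": "Germany", "IsActiveMember": 1, "Exited": 0},
    {"Geography": "France", "IsActiveMember": 1, "Exited": 0},
    {"Geography": "France", "IsActiveMember": 0, "Exited": 1},
    {"Geography": "France", "IsActiveMember": 1, "Exited": 0},
    {"Geography": "Spain", "IsActiveMember": 1, "Exited": 0},
]
rates = churn_rate_by(rows, "Geography")
for country, rate in sorted(rates.items()):
    print(f"{country}: {rate:.1%}")
```

Passing a different key, for example `"IsActiveMember"`, gives the active vs inactive split in the same way.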
03 · Model Development

Four models, one cost framework

Four models were trained and evaluated using a threshold optimisation loop that tests every decision threshold from 0.01 to 0.50 and selects the one that minimises total business cost. The results below come from the project's test set.
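A threshold optimisation loop of this kind can be sketched as follows. The step size of 0.01, the rule that a customer is flagged when the predicted probability is at or above the threshold, and the function names are all assumptions made for illustration. The probabilities are toy values, not model output.

```python
def cost_at_threshold(y_true, y_prob, threshold, fn_cost=1_000, fp_cost=100):
    """Business cost when customers with probability >= threshold are flagged."""
    cost = 0
    for t, p in zip(y_true, y_prob):
        flagged = p >= threshold
        if t == 1 and not flagged:
            cost += fn_cost  # missed churner
        elif t == 0 and flagged:
            cost += fp_cost  # unnecessary retention offer
    return cost


def best_threshold(y_true, y_prob):
    """Sweep thresholds 0.01..0.50 (step 0.01) and return (threshold, cost)."""
    thresholds = [i / 100 for i in range(1, 51)]
    candidates = ((th, cost_at_threshold(y_true, y_prob, th)) for th in thresholds)
    # min() keeps the first (lowest) threshold among ties on cost.
    return min(candidates, key=lambda pair: pair[1])


# Toy labels and predicted churn probabilities, invented for illustration.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.62, 0.05, 0.30, 0.12, 0.40, 0.18, 0.02, 0.08]
threshold, cost = best_threshold(y_true, y_prob)
print(threshold, cost)  # 0.13 100
```

Because a missed churner costs ten times as much as a wasted offer, the cheapest threshold tends to sit well below the default 0.5, which matches the low thresholds in the results.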

Rank | Model | Best Threshold | Minimum Cost | Cost Change vs LR
4 | Logistic Regression (Balanced) | 0.36 | $132,200 | baseline
3 | Random Forest | 0.07 | $113,700 | −$18,500
2 | Gradient Boosting | 0.09 | $104,600 | −$27,600
1 ✓ | XGBoost | 0.23 | $103,500 | −$28,700
Each curve shows how total cost changes across decision thresholds. XGBoost finds the lowest minimum at threshold 0.23, costing $103,500. Gradient Boosting is close at $104,600 (threshold 0.09). Random Forest: $113,700 (threshold 0.07). Logistic Regression plateaus at $132,200.
Gradient Boosting and XGBoost achieve the highest AUC scores (0.871 and 0.867), slightly above Random Forest (0.856) and well above Logistic Regression (0.777). A higher AUC means the model ranks churners above non-churners more reliably, independent of any single decision threshold.
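AUC can be read as the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. The sketch below computes it directly from that definition in plain Python. In practice scikit-learn's `roc_auc_score` would normally be used; the function `roc_auc` and the toy scores here are illustrative only.

```python
def roc_auc(y_true, y_score):
    """AUC as the chance that a random churner outscores a random non-churner."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy labels and scores, invented for illustration.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_score = [0.62, 0.05, 0.30, 0.12, 0.40, 0.18, 0.02, 0.08]
auc = roc_auc(y_true, y_score)
print(round(auc, 3))  # 13 of 15 churner/non-churner pairs ranked correctly -> 0.867
```

Since only the ordering of scores matters, AUC does not say which threshold to deploy. That choice still comes from the cost sweep.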
04 · Key Insights

What does the data tell the business?

🇩🇪
Germany is the Priority
32.4% churn rate vs ~16% elsewhere. Localised retention campaigns should start here — the gap is too large to be demographic noise.
😴
Inactivity = Top Signal
Inactive members churn at 26.9% vs 14.3% for active. Re-engagement programmes targeting dormant accounts have the highest ROI.
📦
3-Product Cliff
82.7% churn rate for 3-product customers. This non-linear cliff — invisible to Logistic Regression — is the strongest predictor the tree-based models capture.
👴
Age Gap: 7.4 Years
Churned customers average 44.8 years vs 37.4. Older customers who hold high balances are leaving — making early detection especially high-value.
05 · Recommendations

From model output to business action

Action | Supporting Insight
High Priority
Deploy XGBoost at threshold 0.23
This configuration minimises total cost at $103,500 vs $132,200 with the Logistic Regression baseline — a direct saving of $28,700 per scoring cycle on this test set.
High Priority
Re-engage inactive customers immediately
Inactive members churn at 26.9% — nearly double active members. Targeted notifications, offers, or check-in campaigns for dormant accounts address the single largest churn segment.
High Priority
Deploy premium retention in Germany
Germany's 32.4% churn rate is structurally different from France (16.2%) and Spain (16.7%) — and affects a far larger customer base. Localised retention campaigns, offers, and regional root-cause analysis deliver the highest addressable ROI of any single intervention.
Medium Priority
Investigate 3-product customers
82.7% churn rate among 3-product customers (n=266) signals a product design or service delivery issue — not a retention problem. The segment is small but the signal is extreme. Understand the root cause before deploying retention spend here.
Medium Priority
Prioritise high-balance, older customers
Churned customers hold higher average balances and are ~7 years older. Dedicated relationship management or exclusive retention incentives tied to account value protect the highest-revenue cohort.
Dataset · Kaggle Bank Customer Churn · 10,000 records · 14 features