🎯 Business Problem
Traditional rule-based fraud detection systems face a fundamental limitation: they can only catch fraud patterns they've been explicitly programmed to recognize. This creates a critical vulnerability to zero-day attacks—novel fraud techniques that bypass existing rule sets.
The Detection Gap
Analysis of historical fraud cases revealed a disturbing pattern:
- 40% of confirmed fraud passed all existing validation rules
- 6-8 week lag between first occurrence and rule creation
- €180K average loss per undetected fraud pattern
- Manual review overload: 85% of flagged transactions were false positives
> "We were always one step behind fraudsters. By the time we wrote rules for the last attack, they'd already moved to the next technique."
> — Head of Risk Management
💡 Solution: Unsupervised Ensemble Detection
Instead of relying on labeled fraud examples, I built an unsupervised anomaly detection system that learns normal transaction patterns and flags statistical outliers—regardless of whether they match known fraud signatures.
Why Unsupervised Learning?
- No Training Labels Required: Fraud is rare (0.17% of transactions), so labeled datasets are too sparse to train a supervised model reliably
- Adaptive to New Patterns: Detects never-before-seen fraud techniques automatically
- Reduced False Positives: Focuses on statistical deviations instead of rigid rules
Ensemble Architecture
I combined two complementary algorithms to maximize detection coverage:
1. Isolation Forest (Primary Detector)
- Identifies anomalies by measuring how easily data points can be "isolated" from normal clusters
- Excels at catching extreme outliers in high-dimensional data
- Fast training: O(n log n) complexity
2. PCA-Based Reconstruction Error (Secondary Validator)
- Reduces transaction data to principal components
- Flags transactions with high reconstruction error (poor fit to normal patterns)
- Catches subtle multi-feature anomalies that Isolation Forest might miss
Implementation
```python
from sklearn.ensemble import IsolationForest
from sklearn.decomposition import PCA
import numpy as np

# Isolation Forest
iso_forest = IsolationForest(
    contamination=0.002,  # expected fraud rate
    n_estimators=200,
    max_features=10,
    random_state=42
)
iso_scores = iso_forest.fit_predict(transaction_features)  # -1 = anomaly, 1 = normal

# PCA reconstruction
pca = PCA(n_components=0.95)  # retain 95% of variance
transformed = pca.fit_transform(transaction_features)
reconstructed = pca.inverse_transform(transformed)
reconstruction_error = np.sum((transaction_features - reconstructed) ** 2, axis=1)

# Ensemble decision: flag only transactions both detectors agree on
anomaly_threshold = np.percentile(reconstruction_error, 99.5)
final_anomalies = (iso_scores == -1) & (reconstruction_error > anomaly_threshold)
```

📊 Results & Impact
Detection Performance
- 0.17% Anomaly Rate: Flagged 850 suspicious transactions from 500,000 total
- 40% Lift Over Rules: Caught 340 anomalies that passed all existing validation checks
- 68% Precision: Manual review confirmed 578 of 850 flags as genuine fraud/errors
- Sub-2-Second Latency: Real-time scoring for incoming transactions
Novel Fraud Patterns Discovered
- Micro-Transaction Probing:
  - Detected automated bots making 50+ small transactions to test stolen card validity
  - Pattern: high transaction velocity + low amounts + sequential merchant IDs
- Geographic Velocity Violations:
  - Same card used in Portugal and Brazil within a 2-hour window (physically impossible)
  - Rules only checked country mismatches, not time-distance feasibility
- Behavioral Deviation:
  - Long-term customers suddenly purchasing high-value electronics (an account-takeover indicator)
  - Model learned typical spending categories per customer segment
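The time-distance feasibility check behind the geographic velocity pattern can be sketched with a haversine distance and an implied-speed cap. The 900 km/h ceiling (roughly airliner speed) and the field names are illustrative assumptions, not the production schema:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def is_velocity_violation(prev_txn, curr_txn, max_speed_kmh=900.0):
    """Flag when the implied travel speed between two card-present
    transactions exceeds a physically plausible maximum."""
    dist = haversine_km(prev_txn["lat"], prev_txn["lon"],
                        curr_txn["lat"], curr_txn["lon"])
    hours = (curr_txn["ts"] - prev_txn["ts"]) / 3600.0
    if hours <= 0:
        return dist > 0  # simultaneous use in two different places
    return dist / hours > max_speed_kmh

# Lisbon, then São Paulo two hours later: ~7,900 km implies ~4,000 km/h
lisbon = {"lat": 38.72, "lon": -9.14, "ts": 0}
sao_paulo = {"lat": -23.55, "lon": -46.63, "ts": 2 * 3600}
print(is_velocity_violation(lisbon, sao_paulo))  # True
```

A simple country-mismatch rule misses this case when both countries are individually plausible for the cardholder; the speed check does not.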
Business Impact
- €420K Prevented Loss: Estimated fraud value blocked in first 6 months
- 72% Faster Investigation: Pre-scored risk levels reduced manual review time
- Compliance Win: Enhanced PCI-DSS audit scores through proactive fraud controls
🔬 Technical Deep Dive
Feature Engineering
Created 23 behavioral features across 4 categories:
Transaction Characteristics (8 features)
- Amount (raw + z-score normalized)
- Transaction hour (cyclical encoding: sin/cos)
- Merchant category code
- Currency + cross-border flag
Velocity Metrics (6 features)
- Transactions in last 1h, 24h, 7 days
- Total spend in last 24h, 7 days
- Unique merchants in last 7 days
Behavioral Patterns (5 features)
- Deviation from user's average transaction amount
- Time since last transaction
- Typical transaction hour consistency score
- Merchant category diversity index
Geographic Features (4 features)
- Distance from user's home location
- Country mismatch with billing address
- IP geolocation consistency
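Two of the features above can be sketched in a few lines: the cyclical hour encoding and the per-user amount deviation. The toy DataFrame and column names are illustrative, not the production schema:

```python
import numpy as np
import pandas as pd

# Toy transaction log (columns are illustrative)
txns = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "amount":  [20.0, 25.0, 480.0, 60.0, 55.0],
    "hour":    [9, 13, 3, 18, 19],
})

# Cyclical encoding: hour 23 and hour 0 end up adjacent, as they should
txns["hour_sin"] = np.sin(2 * np.pi * txns["hour"] / 24)
txns["hour_cos"] = np.cos(2 * np.pi * txns["hour"] / 24)

# Behavioural deviation: z-score of amount against the user's own history
grp = txns.groupby("user_id")["amount"]
txns["amount_dev"] = (txns["amount"] - grp.transform("mean")) / grp.transform("std")
```

The cyclical encoding matters for tree- and distance-based detectors alike: treating the hour as a plain integer would make 23:00 and 00:00 look maximally far apart.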
Model Optimization
Hyperparameter tuning focused on two competing metrics:
- Contamination Rate: Tested 0.001 to 0.005 (0.002 optimal)
- n_estimators: Diminishing returns after 200 trees
- max_samples: 'auto' (min(256, n) in scikit-learn) provided best generalization
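The contamination sweep can be sketched as below, using synthetic data as a stand-in for the real transaction features and reviewer-confirmed labels (the injected-outlier setup is purely illustrative):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Synthetic stand-in: 5,000 "normal" transactions plus 10 injected outliers
normal = rng.normal(0, 1, size=(5000, 10))
outliers = rng.normal(8, 1, size=(10, 10))
X = np.vstack([normal, outliers])
y = np.array([0] * 5000 + [1] * 10)  # stand-in for reviewer-confirmed labels

for contamination in (0.001, 0.002, 0.005):
    model = IsolationForest(contamination=contamination,
                            n_estimators=200, random_state=42)
    flags = model.fit_predict(X) == -1
    precision = y[flags].mean() if flags.any() else 0.0
    print(f"contamination={contamination}: flagged={flags.sum()}, "
          f"precision={precision:.2f}")
```

Raising contamination widens the net (more flags, lower precision); lowering it does the opposite, which is exactly the trade-off the tuning balanced.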
🚧 Challenges & Solutions
Challenge 1: Defining "Normal"
Problem: Legitimate high-value transactions (e.g., luxury purchases) were flagged as anomalies
Solution: Implemented customer segmentation—separate models for retail vs. premium cardholders
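A minimal sketch of that segmentation, assuming a simple dict of segment → fitted detector (segment names, amounts, and the single-feature setup are synthetic):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Hypothetical segments: retail spends are small, premium spends are large
segments = {
    "retail": rng.normal(50, 15, size=(2000, 1)),
    "premium": rng.normal(900, 300, size=(500, 1)),
}

# One model per segment, so a €900 purchase is "normal" for premium cardholders
models = {
    name: IsolationForest(contamination=0.01, random_state=0).fit(X)
    for name, X in segments.items()
}

def score(segment, amount):
    return models[segment].predict([[amount]])[0]  # 1 = normal, -1 = anomaly

print(score("premium", 900))  # 1: normal for premium …
print(score("retail", 900))   # -1: … but anomalous for retail
```

A single pooled model would have to treat €900 as borderline for everyone; splitting by segment lets each model learn its own notion of "normal".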
Challenge 2: Concept Drift
Problem: User behavior changes over time (e.g., summer travel increases geographic diversity)
Solution: Rolling 90-day training window with weekly model retraining
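The rolling-window retraining can be sketched as follows; the schema (a `ts` timestamp column plus numeric feature columns) and the synthetic data are assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

def retrain_on_window(txns: pd.DataFrame, as_of: pd.Timestamp,
                      window_days: int = 90) -> IsolationForest:
    """Refit the detector on the trailing window only, so seasonal shifts
    (e.g. summer travel) age out of the model's notion of 'normal'."""
    cutoff = as_of - pd.Timedelta(days=window_days)
    recent = txns[(txns["ts"] > cutoff) & (txns["ts"] <= as_of)]
    features = recent.drop(columns=["ts"]).to_numpy()
    return IsolationForest(contamination=0.002, random_state=42).fit(features)

# Weekly schedule: refit on the trailing 90 days as of each run date
ts = pd.date_range("2024-01-01", periods=200, freq="D")
txns = pd.DataFrame({"ts": ts,
                     "amount": np.random.default_rng(1).normal(50, 10, 200)})
model = retrain_on_window(txns, as_of=pd.Timestamp("2024-06-01"))
```

The key design choice is that old transactions drop out entirely rather than being down-weighted, which keeps the retraining job simple and bounded in size.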
Challenge 3: Explainability Gap
Problem: Compliance team needed to explain why transactions were flagged
Solution: Added SHAP values to show top contributing features for each anomaly
```python
import shap
import pandas as pd
import numpy as np

# Generate SHAP explanations for a flagged transaction
explainer = shap.TreeExplainer(iso_forest)
shap_values = explainer.shap_values(suspicious_transaction)

# Top 5 anomaly drivers
feature_importance = pd.DataFrame({
    'feature': feature_names,
    'impact': np.abs(shap_values)
}).sort_values('impact', ascending=False).head(5)
```

📈 Monitoring & Continuous Improvement
Production Metrics Dashboard
- Daily Anomaly Rate: Track for sudden spikes (fraud campaigns) or drops (model degradation)
- False Positive Feedback Loop: Manual reviewers label flagged transactions → retrain model monthly
- Feature Drift Detection: Alert when feature distributions shift >2 standard deviations
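The feature-drift alert can be sketched as a mean-shift check against a baseline window. The 2-standard-deviation threshold comes from the bullet above; the data and function name are illustrative:

```python
import numpy as np

def drifted_features(baseline: np.ndarray, current: np.ndarray,
                     threshold: float = 2.0) -> np.ndarray:
    """Return indices of features whose current mean has moved more than
    `threshold` baseline standard deviations from the baseline mean."""
    base_mean = baseline.mean(axis=0)
    base_std = baseline.std(axis=0) + 1e-12  # guard against zero variance
    shift = np.abs(current.mean(axis=0) - base_mean) / base_std
    return np.where(shift > threshold)[0]

rng = np.random.default_rng(7)
baseline = rng.normal(0, 1, size=(10000, 5))
current = rng.normal(0, 1, size=(1000, 5))
current[:, 3] += 3.0  # simulate drift in feature 3 (e.g. a velocity metric)
print(drifted_features(baseline, current))  # → [3]
```

A mean-shift check is deliberately crude; it catches the gross distribution shifts that matter for alerting without requiring a full two-sample test on every feature.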
A/B Testing Results
Controlled experiment: 50% of transactions scored by ensemble, 50% by rules-only
- 28% more fraud caught in ensemble group
- 15% reduction in false positive manual reviews
- Cost-benefit: €4.20 saved per €1 invested in system development
🎓 Key Learnings
What Worked
- Ensemble Approach: Combining Isolation Forest + PCA caught 23% more anomalies than either alone
- Feature Engineering Matters: Velocity metrics were 3x more predictive than transaction amount
- Operational Integration: Embedded in transaction approval flow (not post-hoc analysis)
What I'd Do Differently
- Start with a Simpler Model: The initial Random Forest attempt was overkill; Isolation Forest proved simpler and faster
- Involve Fraud Team Earlier: Their domain knowledge improved feature selection significantly
- Automate Feedback Loop: Manual labeling is bottleneck—should integrate with case management system
🚀 Future Roadmap
- Graph-Based Fraud Networks: Detect organized fraud rings through transaction graph analysis
- Real-Time Feature Streaming: Replace batch ETL with Kafka for sub-100ms scoring
- Active Learning Pipeline: Prioritize human review of highest-uncertainty predictions
- Multi-Channel Integration: Expand beyond card transactions to ACH, wire transfers, cryptocurrency