🎯 Business Problem
Traditional rule-based fraud detection systems face a fundamental limitation: they can only catch fraud patterns they've been explicitly programmed to recognize. This creates a critical vulnerability to zero-day attacks—novel fraud techniques that bypass existing rule sets.
The Detection Gap
Analysis of historical fraud cases revealed a disturbing pattern:
- 40% of confirmed fraud passed all existing validation rules
- 6-8 week lag between first occurrence and rule creation
- €180K average loss per undetected fraud pattern
- Manual review overload: 85% of flagged transactions were false positives
> "We were always one step behind fraudsters. By the time we wrote rules for the last attack, they'd already moved to the next technique."
> — Head of Risk Management
💡 Solution: Unsupervised Ensemble Detection
Instead of relying on labeled fraud examples, I built an unsupervised anomaly detection system that learns normal transaction patterns and flags statistical outliers—regardless of whether they match known fraud signatures.
Why Unsupervised Learning?
- No Training Labels Required: Fraud is rare (0.17% of transactions), so labeled datasets are too sparse to train a supervised model reliably
- Adaptive to New Patterns: Detects never-before-seen fraud techniques automatically
- Reduced False Positives: Focuses on statistical deviations instead of rigid rules
Ensemble Architecture
I combined two complementary algorithms to maximize detection coverage:
1. Isolation Forest (Primary Detector)
- Identifies anomalies by measuring how easily data points can be "isolated" from normal clusters
- Excels at catching extreme outliers in high-dimensional data
- Fast training: O(n log n) complexity
2. PCA-Based Reconstruction Error (Secondary Validator)
- Reduces transaction data to principal components
- Flags transactions with high reconstruction error (poor fit to normal patterns)
- Catches subtle multi-feature anomalies that Isolation Forest might miss
Implementation
```python
from sklearn.ensemble import IsolationForest
from sklearn.decomposition import PCA
import numpy as np

# Isolation Forest
iso_forest = IsolationForest(
    contamination=0.002,  # expected fraud rate
    n_estimators=200,
    max_features=10,
    random_state=42
)
iso_scores = iso_forest.fit_predict(transaction_features)  # -1 = anomaly, 1 = normal

# PCA reconstruction
pca = PCA(n_components=0.95)  # retain 95% of variance
transformed = pca.fit_transform(transaction_features)
reconstructed = pca.inverse_transform(transformed)
reconstruction_error = np.sum((transaction_features - reconstructed) ** 2, axis=1)

# Ensemble decision: flag only transactions both detectors agree on
anomaly_threshold = np.percentile(reconstruction_error, 99.5)
final_anomalies = (iso_scores == -1) & (reconstruction_error > anomaly_threshold)
```

📊 Results & Impact
Detection Performance
- 0.17% Anomaly Rate: Flagged 850 suspicious transactions from 500,000 total
- 40% Lift Over Rules: Caught 340 anomalies that passed all existing validation checks
- 68% Precision: Manual review confirmed 578 of 850 flags as genuine fraud/errors
- Sub-2-Second Latency: Real-time scoring for incoming transactions
Novel Fraud Patterns Discovered
- Micro-Transaction Probing:
  - Detected automated bots making 50+ small transactions to test stolen card validity
  - Pattern: high transaction velocity + low amounts + sequential merchant IDs
- Geographic Velocity Violations:
  - Same card used in Portugal and Brazil within a 2-hour window (physically impossible)
  - Rules only checked country mismatches, not time-distance feasibility
- Behavioral Deviation:
  - Long-term customers suddenly purchasing high-value electronics (an account-takeover indicator)
  - Model learned typical spending categories per customer segment
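The time-distance feasibility check behind the geographic velocity pattern can be sketched with a haversine distance and an implied-speed cap. The 900 km/h ceiling (roughly airliner speed) and the field names are illustrative assumptions, not the production schema:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def is_velocity_violation(prev_txn, curr_txn, max_speed_kmh=900.0):
    """Flag when the implied travel speed between two card-present
    transactions exceeds a physically plausible maximum."""
    dist = haversine_km(prev_txn["lat"], prev_txn["lon"],
                        curr_txn["lat"], curr_txn["lon"])
    hours = (curr_txn["ts"] - prev_txn["ts"]) / 3600.0
    if hours <= 0:
        return dist > 0  # simultaneous use in two different places
    return dist / hours > max_speed_kmh

# Lisbon, then São Paulo two hours later: ~7,900 km implies ~4,000 km/h
lisbon = {"lat": 38.72, "lon": -9.14, "ts": 0}
sao_paulo = {"lat": -23.55, "lon": -46.63, "ts": 2 * 3600}
print(is_velocity_violation(lisbon, sao_paulo))  # True
```

A simple country-mismatch rule misses this case when both countries are individually plausible for the cardholder; the speed check does not.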
Business Impact
- €420K Prevented Loss: Estimated fraud value blocked in first 6 months
- 72% Faster Investigation: Pre-scored risk levels reduced manual review time
- Compliance Win: Enhanced PCI-DSS audit scores through proactive fraud controls
🔬 Technical Deep Dive
Feature Engineering
Created 23 behavioral features across 4 categories:
Transaction Characteristics (8 features)
- Amount (raw + z-score normalized)
- Transaction hour (cyclical encoding: sin/cos)
- Merchant category code
- Currency + cross-border flag
Velocity Metrics (6 features)
- Transactions in last 1h, 24h, 7 days
- Total spend in last 24h, 7 days
- Unique merchants in last 7 days
Behavioral Patterns (5 features)
- Deviation from user's average transaction amount
- Time since last transaction
- Typical transaction hour consistency score
- Merchant category diversity index
Geographic Features (4 features)
- Distance from user's home location
- Country mismatch with billing address
- IP geolocation consistency
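Two of the features above can be sketched in a few lines: the cyclical hour encoding and the per-user amount deviation. The toy DataFrame and column names are illustrative, not the production schema:

```python
import numpy as np
import pandas as pd

# Toy transaction log (columns are illustrative)
txns = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "amount":  [20.0, 25.0, 480.0, 60.0, 55.0],
    "hour":    [9, 13, 3, 18, 19],
})

# Cyclical encoding: hour 23 and hour 0 end up adjacent, as they should
txns["hour_sin"] = np.sin(2 * np.pi * txns["hour"] / 24)
txns["hour_cos"] = np.cos(2 * np.pi * txns["hour"] / 24)

# Behavioural deviation: z-score of amount against the user's own history
grp = txns.groupby("user_id")["amount"]
txns["amount_dev"] = (txns["amount"] - grp.transform("mean")) / grp.transform("std")
```

The cyclical encoding matters for tree- and distance-based detectors alike: treating the hour as a plain integer would make 23:00 and 00:00 look maximally far apart.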
Model Optimization
Hyperparameter tuning focused on two competing metrics:
- Contamination Rate: Tested 0.001 to 0.005 (0.002 optimal)
- n_estimators: Diminishing returns after 200 trees
- max_samples: 'auto' (min(256, n) in scikit-learn) provided best generalization
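The contamination sweep can be sketched as below, using synthetic data as a stand-in for the real transaction features and reviewer-confirmed labels (the injected-outlier setup is purely illustrative):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Synthetic stand-in: 5,000 "normal" transactions plus 10 injected outliers
normal = rng.normal(0, 1, size=(5000, 10))
outliers = rng.normal(8, 1, size=(10, 10))
X = np.vstack([normal, outliers])
y = np.array([0] * 5000 + [1] * 10)  # stand-in for reviewer-confirmed labels

for contamination in (0.001, 0.002, 0.005):
    model = IsolationForest(contamination=contamination,
                            n_estimators=200, random_state=42)
    flags = model.fit_predict(X) == -1
    precision = y[flags].mean() if flags.any() else 0.0
    print(f"contamination={contamination}: flagged={flags.sum()}, "
          f"precision={precision:.2f}")
```

Raising contamination widens the net (more flags, lower precision); lowering it does the opposite, which is exactly the trade-off the tuning balanced.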
🚧 Challenges & Solutions
Challenge 1: Defining "Normal"
Problem: Legitimate high-value transactions (e.g., luxury purchases) were flagged as anomalies
Solution: Implemented customer segmentation—separate models for retail vs. premium cardholders
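A minimal sketch of that segmentation, assuming a simple dict of segment → fitted detector (segment names, amounts, and the single-feature setup are synthetic):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Hypothetical segments: retail spends are small, premium spends are large
segments = {
    "retail": rng.normal(50, 15, size=(2000, 1)),
    "premium": rng.normal(900, 300, size=(500, 1)),
}

# One model per segment, so a €900 purchase is "normal" for premium cardholders
models = {
    name: IsolationForest(contamination=0.01, random_state=0).fit(X)
    for name, X in segments.items()
}

def score(segment, amount):
    return models[segment].predict([[amount]])[0]  # 1 = normal, -1 = anomaly

print(score("premium", 900))  # 1: normal for premium …
print(score("retail", 900))   # -1: … but anomalous for retail
```

A single pooled model would have to treat €900 as borderline for everyone; splitting by segment lets each model learn its own notion of "normal".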
Challenge 2: Concept Drift
Problem: User behavior changes over time (e.g., summer travel increases geographic diversity)
Solution: Rolling 90-day training window with weekly model retraining
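The rolling-window retraining can be sketched as follows; the schema (a `ts` timestamp column plus numeric feature columns) and the synthetic data are assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

def retrain_on_window(txns: pd.DataFrame, as_of: pd.Timestamp,
                      window_days: int = 90) -> IsolationForest:
    """Refit the detector on the trailing window only, so seasonal shifts
    (e.g. summer travel) age out of the model's notion of 'normal'."""
    cutoff = as_of - pd.Timedelta(days=window_days)
    recent = txns[(txns["ts"] > cutoff) & (txns["ts"] <= as_of)]
    features = recent.drop(columns=["ts"]).to_numpy()
    return IsolationForest(contamination=0.002, random_state=42).fit(features)

# Weekly schedule: refit on the trailing 90 days as of each run date
ts = pd.date_range("2024-01-01", periods=200, freq="D")
txns = pd.DataFrame({"ts": ts,
                     "amount": np.random.default_rng(1).normal(50, 10, 200)})
model = retrain_on_window(txns, as_of=pd.Timestamp("2024-06-01"))
```

The key design choice is that old transactions drop out entirely rather than being down-weighted, which keeps the retraining job simple and bounded in size.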
Challenge 3: Explainability Gap
Problem: Compliance team needed to explain why transactions were flagged
Solution: Added SHAP values to show top contributing features for each anomaly
```python
import shap
import pandas as pd
import numpy as np

# Generate SHAP explanations for a flagged transaction
explainer = shap.TreeExplainer(iso_forest)
shap_values = explainer.shap_values(suspicious_transaction)

# Top 5 anomaly drivers
feature_importance = pd.DataFrame({
    'feature': feature_names,
    'impact': np.abs(shap_values)
}).sort_values('impact', ascending=False).head(5)
```

📈 Monitoring & Continuous Improvement
Production Metrics Dashboard
- Daily Anomaly Rate: Track for sudden spikes (fraud campaigns) or drops (model degradation)
- False Positive Feedback Loop: Manual reviewers label flagged transactions → retrain model monthly
- Feature Drift Detection: Alert when feature distributions shift >2 standard deviations
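The feature-drift alert can be sketched as a mean-shift check against a baseline window. The 2-standard-deviation threshold comes from the bullet above; the data and function name are illustrative:

```python
import numpy as np

def drifted_features(baseline: np.ndarray, current: np.ndarray,
                     threshold: float = 2.0) -> np.ndarray:
    """Return indices of features whose current mean has moved more than
    `threshold` baseline standard deviations from the baseline mean."""
    base_mean = baseline.mean(axis=0)
    base_std = baseline.std(axis=0) + 1e-12  # guard against zero variance
    shift = np.abs(current.mean(axis=0) - base_mean) / base_std
    return np.where(shift > threshold)[0]

rng = np.random.default_rng(7)
baseline = rng.normal(0, 1, size=(10000, 5))
current = rng.normal(0, 1, size=(1000, 5))
current[:, 3] += 3.0  # simulate drift in feature 3 (e.g. a velocity metric)
print(drifted_features(baseline, current))  # → [3]
```

A mean-shift check is deliberately crude; it catches the gross distribution shifts that matter for alerting without requiring a full two-sample test on every feature.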
A/B Testing Results
Controlled experiment: 50% of transactions scored by ensemble, 50% by rules-only
- 28% more fraud caught in ensemble group
- 15% reduction in false positive manual reviews
- Cost-benefit: €4.20 saved per €1 invested in system development
🎓 Key Learnings
What Worked
- Ensemble Approach: Combining Isolation Forest + PCA caught 23% more anomalies than either alone
- Feature Engineering Matters: Velocity metrics were 3x more predictive than transaction amount
- Operational Integration: Embedded in transaction approval flow (not post-hoc analysis)
What I'd Do Differently
- Start with a Simpler Model: The initial Random Forest attempt was overkill; Isolation Forest proved simpler and faster
- Involve Fraud Team Earlier: Their domain knowledge improved feature selection significantly
- Automate Feedback Loop: Manual labeling is bottleneck—should integrate with case management system
🚀 Future Roadmap
- Graph-Based Fraud Networks: Detect organized fraud rings through transaction graph analysis
- Real-Time Feature Streaming: Replace batch ETL with Kafka for sub-100ms scoring
- Active Learning Pipeline: Prioritize human review of highest-uncertainty predictions
- Multi-Channel Integration: Expand beyond card transactions to ACH, wire transfers, cryptocurrency