Use Case: Fraud Detection
Detect fraudulent transactions in real time with cost-optimized decision thresholds.
When to Use This
- Payment fraud detection
- Insurance claim fraud
- Account takeover detection
- Credit application fraud
- Any binary classification with high cost asymmetry
Complete Implementation
from featrixsphere.api import FeatrixSphere
featrix = FeatrixSphere()
# 1. Create Foundational Model from transaction data
fm = featrix.create_foundational_model(
    name="fraud_detection_model",
    data_file="transactions.csv",
    ignore_columns=["transaction_id", "timestamp", "user_id"]
)
fm.wait_for_training()
# 2. Create cost-sensitive classifier
# Fraud detection has extreme cost asymmetry:
# - Missing fraud: $5000 average loss
# - False positive: $10 investigation cost + customer friction
predictor = fm.create_binary_classifier(
    target_column="is_fraud",
    name="fraud_detector_v1",
    rare_label_value="fraud",   # Fraud is the rare class
    cost_false_negative=5000,   # Average fraud loss
    cost_false_positive=10      # Investigation + friction cost
)
predictor.wait_for_training()
print(f"Accuracy: {predictor.accuracy:.4f}")
print(f"AUC: {predictor.auc:.4f}")
# 3. Real-time prediction
transaction = {
    "amount": 2500.0,
    "merchant_category": "electronics",
    "distance_from_home": 150.0,
    "time_since_last_transaction": 5,  # minutes
    "transaction_hour": 3,             # 3 AM
    "is_international": True,
    "device_change": True
}
result = predictor.predict(transaction)
if result.predicted_class == "fraud":
    print(f"FRAUD ALERT - Confidence: {result.confidence:.2%}")
    print("Block transaction and review")
else:
    print(f"Transaction approved - Fraud probability: {result.probability:.2%}")
# 4. Batch scoring for historical analysis
import pandas as pd
historical = pd.read_csv("historical_transactions.csv")
results = predictor.batch_predict(historical, show_progress=True)
# Analyze fraud patterns
fraud_predictions = [r for r in results if r.predicted_class == "fraud"]
print(f"Flagged {len(fraud_predictions)} potential frauds out of {len(results)} transactions")
# 5. Production deployment
endpoint = predictor.create_api_endpoint(
    name="fraud_api_v1",
    description="Real-time fraud detection endpoint"
)
# 6. Set up alerts
predictor.configure_webhooks(
    alert_drift="https://your-alert-system.com/fraud-drift",
    alert_error_rate="https://your-alert-system.com/fraud-errors",
    alert_performance_degradation="https://your-alert-system.com/fraud-perf"
)
# 7. Publish
fm.publish(org_id="my_org", name="fraud_model_v1")
Handling Extreme Class Imbalance
Fraud is typically rare (0.1% to 2% of transactions). Featrix handles this automatically, but you can specify the class rates you expect in production:
predictor = fm.create_binary_classifier(
    target_column="is_fraud",
    rare_label_value="fraud",
    class_imbalance={"fraud": 0.005, "legitimate": 0.995}  # 0.5% fraud rate
)
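To give a sense of why the production rate matters, here is a minimal sketch of the textbook prior-shift correction, which rescales a probability learned at one base rate to another. It is an illustration only: the source does not say how Featrix uses `class_imbalance` internally, and `adjust_for_base_rate` is a hypothetical helper, not part of the Featrix API.

```python
def adjust_for_base_rate(p_train: float, train_rate: float, production_rate: float) -> float:
    """Rescale a fraud probability from the training base rate to the production base rate.

    Standard prior-shift correction: each class's probability is reweighted by the
    ratio of its production prior to its training prior, then renormalized.
    """
    for name, rate in (("train_rate", train_rate), ("production_rate", production_rate)):
        if not 0.0 < rate < 1.0:
            raise ValueError(f"{name} must be strictly between 0 and 1")
    fraud_weight = p_train * (production_rate / train_rate)
    legit_weight = (1.0 - p_train) * ((1.0 - production_rate) / (1.0 - train_rate))
    return fraud_weight / (fraud_weight + legit_weight)


# Hypothetical scenario: training data balanced 50/50, production fraud rate 0.5%.
adjusted = adjust_for_base_rate(p_train=0.9, train_rate=0.5, production_rate=0.005)
print(f"Training-scale 0.90 becomes {adjusted:.4f} at the production base rate")
```

The same score implies a much lower fraud probability when fraud is rarer in production than in training, which is why the expected production rates are worth stating when they differ from the training file.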
Cost-Optimal Thresholds
The decision threshold is automatically optimized based on your cost parameters:
result = predictor.predict(transaction)
print(f"Decision threshold: {result.threshold}") # Might be 0.02 instead of 0.5
With a false negative costing $5000 and a false positive costing $10, the optimal threshold is very low: a false alarm is far cheaper than a missed fraud.
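The arithmetic behind a low threshold can be sketched as follows. This assumes well-calibrated probabilities and the simple two-cost model used above. It is the textbook break-even calculation, not a description of how Featrix computes its threshold, and the function names are hypothetical.

```python
def cost_optimal_threshold(cost_false_positive: float, cost_false_negative: float) -> float:
    """Fraud probability at which flagging and approving have equal expected cost.

    Flagging costs (1 - p) * cost_false_positive in expectation; approving costs
    p * cost_false_negative. Setting them equal gives p* = C_fp / (C_fp + C_fn).
    """
    if cost_false_positive <= 0 or cost_false_negative <= 0:
        raise ValueError("costs must be positive")
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def expected_costs(p_fraud: float, cost_false_positive: float, cost_false_negative: float):
    """Return (expected cost of flagging, expected cost of approving) for one transaction."""
    return (1.0 - p_fraud) * cost_false_positive, p_fraud * cost_false_negative


threshold = cost_optimal_threshold(cost_false_positive=10, cost_false_negative=5000)
print(f"Break-even threshold: {threshold:.4f}")  # prints 0.0020

# Even a 1% fraud probability is worth flagging under these costs.
flag_cost, approve_cost = expected_costs(0.01, 10, 5000)
print(f"Flag: ${flag_cost:.2f} expected, approve: ${approve_cost:.2f} expected")
```

A threshold learned from validation data can differ from this idealized value, for example when the model's probabilities are not perfectly calibrated.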
Real-Time Decision Flow
def process_transaction(transaction):
    result = predictor.predict(transaction)
    if result.predicted_class == "fraud":
        if result.confidence > 0.9:
            return "BLOCK"   # High confidence fraud
        elif result.confidence > 0.5:
            return "REVIEW"  # Medium confidence, needs review
        else:
            return "FLAG"    # Low confidence, flag but allow
    else:
        return "APPROVE"
action = process_transaction(transaction)
Feature Importance for Investigation
result = predictor.predict(transaction, feature_importance=True)
if result.predicted_class == "fraud":
    print("Fraud indicators:")
    # Show the five features with the largest positive contribution
    for feature, importance in sorted(
        result.feature_importance.items(),
        key=lambda x: x[1],
        reverse=True
    )[:5]:
        print(f"  {feature}: {importance:+.2f}")
Example output:
Fraud indicators:
transaction_hour: +0.82 # 3 AM transaction
device_change: +0.65 # New device
is_international: +0.45 # International transaction
distance_from_home: +0.38 # Far from home
amount: +0.25 # High amount
Guardrails
Featrix warns about unusual input values:
result = predictor.predict({
    "amount": 999999.0,        # Unusually high
    "transaction_hour": 25,    # Invalid hour
    "merchant_category": "unknown"
})
if result.guardrails:
    print("Input warnings:")
    for column, warning in result.guardrails.items():
        print(f"  {column}: {warning}")
Sending Fraud Investigation Results
After investigating flagged transactions:
# Transaction was confirmed fraud
featrix.prediction_feedback(
    prediction_uuid=result.prediction_uuid,
    ground_truth="fraud"
)
# Transaction was legitimate (false positive)
featrix.prediction_feedback(
    prediction_uuid=result.prediction_uuid,
    ground_truth="legitimate"
)
Production API
Low-Latency Integration
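import requests
def check_fraud(transaction, api_key, endpoint_url):
    """Return True if the endpoint classifies the transaction as fraud."""
    response = requests.post(
        f"{endpoint_url}/predict",
        headers={"X-API-Key": api_key},
        json=transaction,
        timeout=0.5  # 500 ms timeout for real-time decisions
    )
    response.raise_for_status()  # Raise on HTTP errors instead of parsing an error body
    result = response.json()
    return result["predicted_class"] == "fraud"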
Response Format
{
    "predicted_class": "fraud",
    "probability": 0.92,
    "confidence": 0.85,
    "probabilities": {"fraud": 0.92, "legitimate": 0.08},
    "threshold": 0.02,
    "prediction_uuid": "550e8400-e29b-41d4-a716-446655440000",
    "guardrails": {}
}
Best Practices
- Set costs carefully - Average fraud loss vs investigation cost drives threshold
- Handle extreme imbalance - Specify class_imbalance if production differs from training
- Use confidence tiers - High/medium/low confidence → different actions
- Monitor constantly - Fraud patterns change; set up drift alerts
- Send all feedback - Both confirmed fraud AND false positives improve the model
- Log prediction UUIDs - Essential for feedback and investigation audit trails
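As an illustration of the last point, the sketch below appends each prediction to a JSON Lines file and looks a record up again by UUID when an investigation closes. The file layout, field names, and the helpers `log_prediction` and `find_prediction` are assumptions made for this example. A production system would more likely use a database or a log pipeline.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def log_prediction(log_path, prediction_uuid, predicted_class, threshold):
    """Append one prediction record to a JSON Lines audit log and return it."""
    record = {
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "prediction_uuid": prediction_uuid,
        "predicted_class": predicted_class,
        "threshold": threshold,
    }
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


def find_prediction(log_path, prediction_uuid):
    """Return the first logged record with this UUID, or None if it was never logged."""
    with Path(log_path).open(encoding="utf-8") as fh:
        for line in fh:
            record = json.loads(line)
            if record["prediction_uuid"] == prediction_uuid:
                return record
    return None


# Demo against a temporary file so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "predictions.jsonl"
    log_prediction(log_file, "550e8400-e29b-41d4-a716-446655440000", "fraud", 0.02)
    found = find_prediction(log_file, "550e8400-e29b-41d4-a716-446655440000")
    missing = find_prediction(log_file, "no-such-uuid")

print(found["predicted_class"])  # fraud
print(missing)                   # None
```

The stored UUID is what an investigator would later pass as `prediction_uuid` when sending feedback.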