Use Case: Fraud Detection

Detect fraudulent transactions in real-time with cost-optimized decision thresholds.

When to Use This

  • Payment fraud detection
  • Insurance claim fraud
  • Account takeover detection
  • Credit application fraud
  • Any binary classification with high cost asymmetry

Complete Implementation

from featrixsphere.api import FeatrixSphere

featrix = FeatrixSphere()

# 1. Create Foundational Model from transaction data
fm = featrix.create_foundational_model(
    name="fraud_detection_model",
    data_file="transactions.csv",
    ignore_columns=["transaction_id", "timestamp", "user_id"]
)
fm.wait_for_training()

# 2. Create cost-sensitive classifier
#    Fraud detection has extreme cost asymmetry:
#    - Missing fraud: $5000 average loss
#    - False positive: $10 investigation cost + customer friction
predictor = fm.create_binary_classifier(
    target_column="is_fraud",
    name="fraud_detector_v1",
    rare_label_value="fraud",         # Fraud is the rare class
    cost_false_negative=5000,         # Average fraud loss
    cost_false_positive=10            # Investigation + friction cost
)
predictor.wait_for_training()

print(f"Accuracy: {predictor.accuracy:.4f}")
print(f"AUC: {predictor.auc:.4f}")

# 3. Real-time prediction
transaction = {
    "amount": 2500.0,
    "merchant_category": "electronics",
    "distance_from_home": 150.0,
    "time_since_last_transaction": 5,  # minutes
    "transaction_hour": 3,              # 3 AM
    "is_international": True,
    "device_change": True
}

result = predictor.predict(transaction)

if result.predicted_class == "fraud":
    print(f"FRAUD ALERT - Confidence: {result.confidence:.2%}")
    print(f"Block transaction and review")
else:
    print(f"Transaction approved - Fraud probability: {result.probability:.2%}")

# 4. Batch scoring for historical analysis
import pandas as pd

historical = pd.read_csv("historical_transactions.csv")
results = predictor.batch_predict(historical, show_progress=True)

# Analyze fraud patterns
fraud_predictions = [r for r in results if r.predicted_class == "fraud"]
print(f"Flagged {len(fraud_predictions)} potential frauds out of {len(results)} transactions")

# 5. Production deployment
endpoint = predictor.create_api_endpoint(
    name="fraud_api_v1",
    description="Real-time fraud detection endpoint"
)

# 6. Set up alerts
predictor.configure_webhooks(
    alert_drift="https://your-alert-system.com/fraud-drift",
    alert_error_rate="https://your-alert-system.com/fraud-errors",
    alert_performance_degradation="https://your-alert-system.com/fraud-perf"
)

# 7. Publish
fm.publish(org_id="my_org", name="fraud_model_v1")

Handling Extreme Class Imbalance

Fraud is typically rare (0.1%–2% of transactions). Featrix handles this automatically, but you can specify production rates:

predictor = fm.create_binary_classifier(
    target_column="is_fraud",
    rare_label_value="fraud",
    class_imbalance={"fraud": 0.005, "legitimate": 0.995}  # 0.5% fraud rate
)

Cost-Optimal Thresholds

The decision threshold is automatically optimized based on your cost parameters:

result = predictor.predict(transaction)
print(f"Decision threshold: {result.threshold}")  # Might be 0.02 instead of 0.5

With a $5,000 cost for a missed fraud versus $10 for a false positive, the optimal threshold is very low: false alarms are far cheaper than missed fraud.
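
As a sanity check, the textbook cost-ratio rule puts the break-even threshold at roughly cost_false_positive / (cost_false_positive + cost_false_negative); this is only the intuition behind the setting, not necessarily the exact optimization Featrix runs:

cost_fp = 10      # investigation + customer friction
cost_fn = 5000    # average fraud loss

# Flag whenever the expected loss from letting fraud through exceeds the
# expected cost of a review: p * cost_fn > (1 - p) * cost_fp
break_even = cost_fp / (cost_fp + cost_fn)
print(f"Break-even threshold: {break_even:.4f}")  # ~0.002

The threshold Featrix actually selects also reflects the model's calibration on validation data, so it can land somewhat higher (0.02 in the example above), but either way it sits far below the default 0.5.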

Real-Time Decision Flow

def process_transaction(transaction):
    result = predictor.predict(transaction)

    if result.predicted_class == "fraud":
        if result.confidence > 0.9:
            return "BLOCK"      # High confidence fraud
        elif result.confidence > 0.5:
            return "REVIEW"     # Medium confidence, needs review
        else:
            return "FLAG"       # Low confidence, flag but allow
    else:
        return "APPROVE"

action = process_transaction(transaction)

Feature Importance for Investigation

result = predictor.predict(transaction, feature_importance=True)

if result.predicted_class == "fraud":
    print("Fraud indicators:")
    for feature, importance in sorted(
        result.feature_importance.items(),
        key=lambda x: x[1],
        reverse=True
    )[:5]:
        print(f"  {feature}: {importance:+.3f}")

Example output:

Fraud indicators:
  transaction_hour: +0.82        # 3 AM transaction
  device_change: +0.65           # New device
  is_international: +0.45        # International transaction
  distance_from_home: +0.38      # Far from home
  amount: +0.25                  # High amount

Guardrails

Featrix warns about unusual input values:

result = predictor.predict({
    "amount": 999999.0,          # Unusually high
    "transaction_hour": 25,      # Invalid hour
    "merchant_category": "unknown"
})

if result.guardrails:
    print("Input warnings:")
    for column, warning in result.guardrails.items():
        print(f"  {column}: {warning}")

Sending Fraud Investigation Results

After investigating flagged transactions:

# Transaction was confirmed fraud
featrix.prediction_feedback(
    prediction_uuid=result.prediction_uuid,
    ground_truth="fraud"
)

# Transaction was legitimate (false positive)
featrix.prediction_feedback(
    prediction_uuid=result.prediction_uuid,
    ground_truth="legitimate"
)

Production API

Low-Latency Integration

import requests

def check_fraud(transaction, api_key, endpoint_url):
    response = requests.post(
        f"{endpoint_url}/predict",
        headers={"X-API-Key": api_key},
        json=transaction,
        timeout=0.5  # 500ms timeout for real-time
    )
    result = response.json()
    return result["predicted_class"] == "fraud"

Response Format

{
  "predicted_class": "fraud",
  "probability": 0.92,
  "confidence": 0.85,
  "probabilities": {"fraud": 0.92, "legitimate": 0.08},
  "threshold": 0.02,
  "prediction_uuid": "550e8400-e29b-41d4-a716-446655440000",
  "guardrails": {}
}
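
The confidence tiers from the decision flow above can be driven directly off this payload; the field names come from the response shown here, and the cut-offs are the same illustrative values:

def route_from_response(payload):
    """Map a parsed API response to an action string."""
    if payload["predicted_class"] != "fraud":
        return "APPROVE"
    if payload["confidence"] > 0.9:
        return "BLOCK"
    if payload["confidence"] > 0.5:
        return "REVIEW"
    return "FLAG"

# e.g. action = route_from_response(response.json())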

Best Practices

  1. Set costs carefully - Average fraud loss vs. investigation cost drives the threshold
  2. Handle extreme imbalance - Specify class_imbalance if production differs from training
  3. Use confidence tiers - High/medium/low confidence → different actions
  4. Monitor constantly - Fraud patterns change; set up drift alerts
  5. Send all feedback - Both confirmed fraud AND false positives improve the model
  6. Log prediction UUIDs - Essential for feedback and investigation audit trails (see the sketch after this list)
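
A minimal sketch of practice 6, assuming a simple append-only CSV log (your storage and schema will differ); the point is that the prediction_uuid captured at decision time is exactly what prediction_feedback needs later:

import csv
from datetime import datetime, timezone

def log_decision(result, action, path="fraud_decisions.csv"):
    """Append the prediction UUID, action, and score for feedback and audit."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.now(timezone.utc).isoformat(),
            result.prediction_uuid,
            action,
            f"{result.probability:.4f}",
        ])

log_decision(result, action)

# Later, once the investigation closes:
# featrix.prediction_feedback(prediction_uuid=<uuid from the log>, ground_truth="fraud")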