Safety and Quality

Featrix builds safety mechanisms into every stage of the pipeline. This guide explains how to verify that your Foundational Models and predictors are safe and of high quality before deploying to production.

The Safety Philosophy

Featrix follows a core principle: the system must produce useful results on arbitrary data without human intervention, and it must never silently degrade.

Safety is layered throughout the pipeline:

1. Detection: Understands your data before any neural computation
2. Encoding: Prevents pathological values from corrupting training
3. Training: Monitors for gradient problems, collapse, and overfitting
4. Prediction: Warns about out-of-distribution inputs
5. Model Card: Records every decision for auditability

Foundational Model Quality

Training Metrics

After training, check the key metrics:

fm = featrix.foundational_model("session-id")
fm.refresh()

print(f"Status: {fm.status}")
print(f"Final loss: {fm.final_loss}")
print(f"Epochs completed: {fm.epochs}")
print(f"Dimensions: {fm.dimensions}")

Training History

Get detailed training history to spot problems:

metrics = fm.get_training_metrics()

# Loss should decrease over time
print("Loss history:", metrics.get('loss_history'))

# Learning rate schedule
print("LR history:", metrics.get('lr_history'))

What to look for:

  • Loss should decrease and stabilize
  • No sudden spikes (gradient problems)
  • No flat loss (not learning)
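
You can automate these checks against the history returned above. A minimal sketch (the 2x spike threshold is illustrative, not a Featrix default):

metrics = fm.get_training_metrics()
loss = metrics.get('loss_history') or []

if len(loss) >= 2:
    # Loss should end lower than it started
    if loss[-1] >= loss[0]:
        print("WARNING: loss did not decrease over training")
    # Sudden spikes can indicate gradient problems
    for prev, cur in zip(loss, loss[1:]):
        if prev > 0 and cur > 2 * prev:
            print(f"WARNING: loss spiked from {prev:.4f} to {cur:.4f}")
    # A flat curve suggests the model is not learning
    if abs(loss[0] - loss[-1]) < 1e-6:
        print("WARNING: loss curve is flat")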

Model Card

The model card contains everything Featrix decided about your data:

model_card = fm.get_model_card()

# Column analysis
print("Columns:", model_card.get('columns'))

# Training decisions
print("Training config:", model_card.get('training_config'))

# Quality metrics
print("Quality:", model_card.get('quality_metrics'))

Predictor Quality

Performance Metrics

Check predictor performance:

predictor = fm.list_predictors()[0]

print(f"Accuracy: {predictor.accuracy:.4f}")
print(f"AUC: {predictor.auc:.4f}")
print(f"F1: {predictor.f1:.4f}")

Interpretation:

  • Accuracy: Overall correct predictions (can be misleading with imbalanced data)
  • AUC (ROC-AUC): Ability to rank positive cases higher (0.5 = random, 1.0 = perfect)
  • F1: Balance of precision and recall (important for rare classes)
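
To see why accuracy alone can mislead, compare it against the majority-class baseline for your data. A quick sketch (the 0.95 class balance is illustrative):

majority_rate = 0.95  # e.g., 95% of rows belong to the negative class

if predictor.accuracy <= majority_rate:
    print("Accuracy does not beat always guessing the majority class;")
    print("weigh AUC and F1 more heavily for this dataset.")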

Training Suggestions

Get automated suggestions for improvement:

# Human-readable report
report = predictor.training_suggestions(as_text=True)
print(report)

# Structured data
suggestions = predictor.training_suggestions()
print(suggestions)

This analyzes:

  • Class balance issues
  • Training convergence
  • Potential overfitting
  • Feature contribution

Detailed Metrics

metrics = predictor.get_metrics()
print(metrics)

Prediction Guardrails

Every prediction includes guardrails that warn about data quality:

result = predictor.predict(record)

# Check for warnings
if result.guardrails:
    for column, warning in result.guardrails.items():
        print(f"Warning on {column}: {warning}")

Guardrail Types

Guardrail                      Meaning                                          Action
value_outside_training_range   Numeric value is outside training distribution   Prediction may be extrapolating
unknown_category               Categorical value not seen during training       Uses fallback encoding
missing_value                  Expected column is NULL                          Uses learned null representation
type_mismatch                  Value type differs from training                 May be misencoded
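
How you respond can depend on which guardrail fired. A minimal sketch, assuming the warning string contains the guardrail name from the table above:

result = predictor.predict(record)

for column, warning in (result.guardrails or {}).items():
    text = str(warning)
    if "value_outside_training_range" in text:
        print(f"{column}: extrapolating beyond training data - treat with caution")
    elif "unknown_category" in text:
        print(f"{column}: unseen category - check for typos or new upstream values")
    else:
        print(f"{column}: {warning}")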

Handling Unknown Columns

result = predictor.predict(record)

# Columns in your input that weren't in training
if result.ignored_query_columns:
    print(f"Ignored: {result.ignored_query_columns}")

# Columns the model knows about
print(f"Expected: {result.available_query_columns}")

Data Quality During Training

Column Type Detection

Featrix automatically detects column types. Check what it found:

model_card = fm.get_model_card()
columns = model_card.get('columns', {})

for col_name, col_info in columns.items():
    print(f"{col_name}: {col_info.get('detected_type')}")

Excluded Columns

Featrix may auto-exclude problematic columns:

# Columns excluded from training
excluded = model_card.get('excluded_columns', [])
for col in excluded:
    print(f"Excluded: {col['name']} - Reason: {col['reason']}")

Common exclusion reasons:

  • High uniqueness (likely an ID column)
  • All NULL values
  • Single unique value (no information)
  • Structural patterns (UUIDs, hashes)
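
You can anticipate these exclusions with a quick check on your own data before training. A sketch using pandas (the 0.95 uniqueness threshold is illustrative, not Featrix's internal cutoff):

import pandas as pd

df = pd.read_csv("your_data.csv")
n = len(df)

for col in df.columns:
    unique = df[col].nunique(dropna=True)
    if n > 0 and unique / n > 0.95:
        print(f"{col}: {unique}/{n} unique values - looks like an ID, may be excluded")
    if df[col].isna().all():
        print(f"{col}: all NULL - carries no signal")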

Calibration

Featrix calibrates probability predictions so that "0.7 probability" actually means ~70% of such cases are positive.
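
You can sanity-check calibration yourself by comparing predicted probabilities against observed outcomes on labeled data you hold. A sketch (the scored pairs are illustrative):

from collections import defaultdict

# (predicted probability, true outcome) pairs from your own labeled records
scored = [(0.72, 1), (0.68, 1), (0.71, 0), (0.31, 0)]

bins = defaultdict(list)
for prob, outcome in scored:
    bins[round(prob, 1)].append(outcome)

for bucket in sorted(bins):
    outcomes = bins[bucket]
    rate = sum(outcomes) / len(outcomes)
    print(f"predicted ~{bucket:.1f}: observed positive rate {rate:.2f} (n={len(outcomes)})")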

Checking Calibration

The model card includes calibration information:

model_card = fm.get_model_card()
calibration = model_card.get('calibration', {})
print(f"Calibration method: {calibration.get('method')}")

Confidence Interpretation

The confidence field in predictions is normalized:

  • 0.0 = right at the decision boundary (uncertain)
  • 1.0 = maximally certain
  • Unlike probability, confidence accounts for the threshold

result = predictor.predict(record)

if result.confidence < 0.3:
    print("Low confidence - consider manual review")
elif result.confidence > 0.9:
    print("High confidence")

Production Safety Checklist

Before deploying to production, verify:

1. Model Quality

# Foundational Model trained successfully
assert fm.status == "done"
assert fm.final_loss is not None

# Predictor has acceptable performance
assert predictor.auc >= 0.7  # Adjust threshold for your use case
assert predictor.status == "done"

2. No Excluded Critical Columns

model_card = fm.get_model_card()
excluded = [c['name'] for c in model_card.get('excluded_columns', [])]

critical_columns = ['revenue', 'customer_type']  # Your important columns
for col in critical_columns:
    assert col not in excluded, f"Critical column {col} was excluded!"

3. Sample Predictions Work

# Test with representative samples
test_records = [
    {"age": 25, "income": 30000},  # Young, low income
    {"age": 65, "income": 150000}, # Older, high income
    {"age": 40, "income": None},   # Missing income
]

for record in test_records:
    result = predictor.predict(record)
    assert result.predicted_class is not None
    print(f"Input: {record}")
    print(f"Output: {result.predicted_class} (conf: {result.confidence:.2f})")
    print(f"Guardrails: {result.guardrails}")
    print()

4. Guardrails Don't Fire on Normal Data

# Predictions on training-like data should be clean
normal_record = {"age": 35, "income": 50000}
result = predictor.predict(normal_record)

if result.guardrails:
    print(f"WARNING: Guardrails firing on normal data: {result.guardrails}")

Monitoring in Production

Track Prediction UUIDs

Every prediction has a unique ID:

result = predictor.predict(record)
prediction_uuid = result.prediction_uuid

# Store this with your prediction for later feedback
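
One way to do that is to persist the UUID next to your own record key. A minimal sketch using SQLite (the schema and record_id are illustrative):

import sqlite3

conn = sqlite3.connect("predictions.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS predictions (record_id TEXT, prediction_uuid TEXT)"
)
conn.execute(
    "INSERT INTO predictions VALUES (?, ?)",
    ("order-12345", result.prediction_uuid),
)
conn.commit()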

Send Feedback

When you learn the true outcome:

# From a stored prediction result
result.send_feedback(ground_truth="actual_label")

# Or using just the UUID
featrix.prediction_feedback(
    prediction_uuid="stored-uuid",
    ground_truth="actual_label"
)

Watch for Drift

Monitor guardrail frequency in production:

guardrail_counts = {}

for record in production_records:
    result = predictor.predict(record)
    if result.guardrails:
        for col, warning in result.guardrails.items():
            key = f"{col}:{warning}"
            guardrail_counts[key] = guardrail_counts.get(key, 0) + 1

# Alert if guardrails increase
print("Guardrail frequency:", guardrail_counts)

Summary

Safety Layer     What It Does                         How to Check
Type Detection   Understands column types             model_card['columns']
Encoding         Handles missing/invalid values       Check excluded columns
Training         Monitors for problems                training_metrics, final_loss
Calibration      Ensures probabilities are accurate   model_card['calibration']
Guardrails       Warns about OOD inputs               result.guardrails
Model Card       Records all decisions                fm.get_model_card()

Next Steps