Safety and Quality¶
Featrix builds safety mechanisms into every stage of the pipeline. This guide explains how to verify that your Foundational Models and predictors are safe and of high quality before you deploy them to production.
The Safety Philosophy¶
Featrix follows a core principle: the system must produce useful results on arbitrary data without human intervention, and it must never silently degrade.
Safety is layered throughout the pipeline:
1. Detection: Understands your data before any neural computation
2. Encoding: Prevents pathological values from corrupting training
3. Training: Monitors for gradient problems, collapse, and overfitting
4. Prediction: Warns about out-of-distribution inputs
5. Model Card: Records every decision for auditability
Foundational Model Quality¶
Training Metrics¶
After training, check the key metrics:
fm = featrix.foundational_model("session-id")
fm.refresh()
print(f"Status: {fm.status}")
print(f"Final loss: {fm.final_loss}")
print(f"Epochs completed: {fm.epochs}")
print(f"Dimensions: {fm.dimensions}")
Training History¶
Get detailed training history to spot problems:
metrics = fm.get_training_metrics()
# Loss should decrease over time
print("Loss history:", metrics.get('loss_history'))
# Learning rate schedule
print("LR history:", metrics.get('lr_history'))
What to look for:
- Loss should decrease and stabilize
- No sudden spikes (gradient problems)
- No flat loss (not learning)
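To automate these checks, a minimal sketch is shown below. It only assumes that loss_history is a list of floats, as returned by get_training_metrics() above; the 1.5x spike threshold is illustrative, not a Featrix default.
metrics = fm.get_training_metrics()
loss_history = metrics.get('loss_history') or []
if len(loss_history) >= 2:
    # Loss should end well below where it started
    if loss_history[-1] >= loss_history[0]:
        print("WARNING: loss did not decrease - the model may not be learning")
    # Large jumps between consecutive epochs can indicate gradient problems
    spikes = [
        i for i in range(1, len(loss_history))
        if loss_history[i] > 1.5 * loss_history[i - 1]  # illustrative threshold
    ]
    if spikes:
        print(f"WARNING: loss spiked at epochs {spikes}")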
Model Card¶
The model card contains everything Featrix decided about your data:
model_card = fm.get_model_card()
# Column analysis
print("Columns:", model_card.get('columns'))
# Training decisions
print("Training config:", model_card.get('training_config'))
# Quality metrics
print("Quality:", model_card.get('quality_metrics'))
Predictor Quality¶
Performance Metrics¶
Check predictor performance:
predictor = fm.list_predictors()[0]
print(f"Accuracy: {predictor.accuracy:.4f}")
print(f"AUC: {predictor.auc:.4f}")
print(f"F1: {predictor.f1:.4f}")
Interpretation:
- Accuracy: Overall correct predictions (can be misleading with imbalanced data)
- AUC (ROC-AUC): Ability to rank positive cases higher (0.5 = random, 1.0 = perfect)
- F1: Balance of precision and recall (important for rare classes)
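As a rough sanity check on imbalanced data, you can compare the three metrics against each other. The thresholds below are illustrative heuristics, not Featrix rules.
# High accuracy with a much lower F1 often signals class imbalance:
# the model may simply be predicting the majority class.
if predictor.accuracy > 0.9 and predictor.f1 < 0.5:
    print("Accuracy looks strong but F1 is weak - check class balance")
# AUC near 0.5 means the ranking is close to random, regardless of accuracy
if predictor.auc < 0.6:
    print("AUC is close to random - the model may not separate the classes")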
Training Suggestions¶
Get automated suggestions for improvement:
# Human-readable report
report = predictor.training_suggestions(as_text=True)
print(report)
# Structured data
suggestions = predictor.training_suggestions()
print(suggestions)
This analyzes:
- Class balance issues
- Training convergence
- Potential overfitting
- Feature contribution
Prediction Guardrails¶
Every prediction includes guardrails that warn about data quality:
result = predictor.predict(record)
# Check for warnings
if result.guardrails:
    for column, warning in result.guardrails.items():
        print(f"Warning on {column}: {warning}")
Guardrail Types¶
| Guardrail | Meaning | Implication |
|---|---|---|
| value_outside_training_range | Numeric value is outside the training distribution | Prediction may be extrapolating |
| unknown_category | Categorical value not seen during training | Uses fallback encoding |
| missing_value | Expected column is NULL | Uses learned null representation |
| type_mismatch | Value type differs from training | May be misencoded |
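The guardrail names can be used to route predictions differently. The sketch below assumes the warning string contains the guardrail name from the table above; adjust the matching to whatever your warnings actually contain.
result = predictor.predict(record)
needs_review = False
for column, warning in (result.guardrails or {}).items():
    if "value_outside_training_range" in str(warning):
        # Extrapolation: treat the prediction as lower-trust
        needs_review = True
    elif "unknown_category" in str(warning):
        # New category value: worth logging so you can retrain later
        print(f"New category seen in {column}")
if needs_review:
    print("Routing prediction to manual review")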
Handling Unknown Columns¶
result = predictor.predict(record)
# Columns in your input that weren't in training
if result.ignored_query_columns:
    print(f"Ignored: {result.ignored_query_columns}")
# Columns the model knows about
print(f"Expected: {result.available_query_columns}")
Data Quality During Training¶
Column Type Detection¶
Featrix automatically detects column types. Check what it found:
model_card = fm.get_model_card()
columns = model_card.get('columns', {})
for col_name, col_info in columns.items():
    print(f"{col_name}: {col_info.get('detected_type')}")
Excluded Columns¶
Featrix may auto-exclude problematic columns:
# Columns excluded from training
excluded = model_card.get('excluded_columns', [])
for col in excluded:
    print(f"Excluded: {col['name']} - Reason: {col['reason']}")
Common exclusion reasons:
- High uniqueness (likely an ID column)
- All NULL values
- Single unique value (no information)
- Structural patterns (UUIDs, hashes)
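You can run a similar pre-check locally before training to anticipate which columns are likely to be excluded. The sketch below uses pandas and mirrors the reasons above; the file name and the 95% uniqueness threshold are illustrative and do not reflect Featrix's actual detection logic.
import pandas as pd

df = pd.read_csv("your_data.csv")  # hypothetical input file
for col in df.columns:
    non_null = df[col].dropna()
    if non_null.empty:
        print(f"{col}: all NULL - likely excluded")
    elif non_null.nunique() == 1:
        print(f"{col}: single unique value - likely excluded")
    elif non_null.nunique() / len(non_null) > 0.95:
        print(f"{col}: nearly all unique - looks like an ID column")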
Calibration¶
Featrix calibrates probability predictions so that "0.7 probability" actually means ~70% of such cases are positive.
Checking Calibration¶
The model card includes calibration information:
model_card = fm.get_model_card()
calibration = model_card.get('calibration', {})
print(f"Calibration method: {calibration.get('method')}")
Confidence Interpretation¶
The confidence field in predictions is normalized:
- 0.0 = right at the decision boundary (uncertain)
- 1.0 = maximally certain
- Unlike probability, confidence accounts for the threshold
result = predictor.predict(record)
if result.confidence < 0.3:
    print("Low confidence - consider manual review")
elif result.confidence > 0.9:
    print("High confidence")
Production Safety Checklist¶
Before deploying to production, verify:
1. Model Quality¶
# Foundational Model trained successfully
assert fm.status == "done"
assert fm.final_loss is not None
# Predictor has acceptable performance
assert predictor.auc >= 0.7 # Adjust threshold for your use case
assert predictor.status == "done"
2. No Excluded Critical Columns¶
model_card = fm.get_model_card()
excluded = [c['name'] for c in model_card.get('excluded_columns', [])]
critical_columns = ['revenue', 'customer_type'] # Your important columns
for col in critical_columns:
    assert col not in excluded, f"Critical column {col} was excluded!"
3. Sample Predictions Work¶
# Test with representative samples
test_records = [
    {"age": 25, "income": 30000},    # Young, low income
    {"age": 65, "income": 150000},   # Older, high income
    {"age": 40, "income": None},     # Missing income
]
for record in test_records:
    result = predictor.predict(record)
    assert result.predicted_class is not None
    print(f"Input: {record}")
    print(f"Output: {result.predicted_class} (conf: {result.confidence:.2f})")
    print(f"Guardrails: {result.guardrails}")
    print()
4. Guardrails Don't Fire on Normal Data¶
# Predictions on training-like data should be clean
normal_record = {"age": 35, "income": 50000}
result = predictor.predict(normal_record)
if result.guardrails:
    print(f"WARNING: Guardrails firing on normal data: {result.guardrails}")
Monitoring in Production¶
Track Prediction UUIDs¶
Every prediction has a unique ID:
result = predictor.predict(record)
prediction_uuid = result.prediction_uuid
# Store this with your prediction for later feedback
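How you store the UUID is up to you. A minimal sketch that appends each prediction to a local JSONL log is shown below; the file path and logged fields are illustrative, not part of the Featrix SDK.
import json
from datetime import datetime, timezone

result = predictor.predict(record)
with open("predictions.jsonl", "a") as f:  # illustrative log location
    f.write(json.dumps({
        "prediction_uuid": result.prediction_uuid,
        "predicted_class": result.predicted_class,
        "confidence": result.confidence,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }) + "\n")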
Send Feedback¶
When you learn the true outcome:
# From a stored prediction result
feedback = result.send_feedback(ground_truth="actual_label")
feedback.send()
# Or using just the UUID
featrix.prediction_feedback(
    prediction_uuid="stored-uuid",
    ground_truth="actual_label"
)
Watch for Drift¶
Monitor guardrail frequency in production:
guardrail_counts = {}
for record in production_records:
    result = predictor.predict(record)
    if result.guardrails:
        for col, warning in result.guardrails.items():
            key = f"{col}:{warning}"
            guardrail_counts[key] = guardrail_counts.get(key, 0) + 1
# Alert if guardrails increase
print("Guardrail frequency:", guardrail_counts)
Summary¶
| Safety Layer | What It Does | How to Check |
|---|---|---|
| Type Detection | Understands column types | model_card['columns'] |
| Encoding | Handles missing/invalid values | Check excluded columns |
| Training | Monitors for problems | training_metrics, final_loss |
| Calibration | Ensures probabilities are accurate | model_card['calibration'] |
| Guardrails | Warns about OOD inputs | result.guardrails |
| Model Card | Records all decisions | fm.get_model_card() |
Next Steps¶
- Publish and monitor your model in production
- Set up webhooks for drift alerts
- Implement prediction feedback loops