Prediction Results: Built-In Safety¶
Every prediction from Featrix includes safety information. If you pay attention to the warnings and errors, you cannot get silently bad results. The system tells you when something is wrong.
The Problem with Black-Box Predictions¶
Most ML systems return a single number—the prediction—and expect you to trust it blindly:
# Typical ML library
prediction = model.predict(record) # Returns 0.85
# Is this reliable? Who knows!
What if:
- The input data is completely different from training data?
- A critical column has an unexpected value?
- The model is extrapolating into unknown territory?
Traditional systems don't tell you. They just return a number.
Featrix Tells You Everything¶
Every Featrix prediction includes:
result = predictor.predict(record)
# The prediction
result.predicted_class # "will_churn"
result.probability # 0.85
result.confidence # 0.70 (distance from decision boundary)
# Safety information
result.guardrails # Per-column warnings
result.ignored_query_columns # Columns you sent that we don't know
result.available_query_columns # Columns we expected
result.prediction_uuid # Unique ID for tracking
Guardrails: Per-Column Safety Checks¶
Before making any prediction, Featrix analyzes each input column:
For Numeric Columns¶
The system compares your value to the training distribution:
| Zone | Z-Score | What It Means | System Response |
|---|---|---|---|
| Normal | 0-1σ | Close to average | OK |
| In Range | 1-2σ | Normal variation | OK |
| Outlier | 2-3σ | Unusual but seen | OK (flagged) |
| Extreme | 3-4σ | Rare in training | Warning |
| Extrapolation | 4-20σ | Outside training | Warning: "prediction may be less accurate" |
| Severe | 20-100σ | Far from training | Warning: "prediction quality uncertain" |
| Clamped | >100σ | Ridiculous value | Error: "prediction unreliable" |
Example:
# Training data had income from $20K-$200K
result = predictor.predict({"income": 50000000}) # $50M
result.guardrails
# {
# "income": "Error: value is extremely far from training data - prediction unreliable (85.2σ, clamped to 100σ)"
# }
The system won't silently fail. It tells you the prediction is unreliable.
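For intuition, here is a minimal sketch of how the z-score bands in the table above map to a zone label. The thresholds are copied from the table; the function name and the mapping itself are illustrative, not Featrix's internal code, which also produces the per-column messages shown above.
def zone_for_z_score(z):
    # Illustrative only: thresholds copied from the zone table above,
    # not from Featrix's implementation.
    z = abs(z)
    if z <= 1:
        return "Normal"
    if z <= 2:
        return "In Range"
    if z <= 3:
        return "Outlier"
    if z <= 4:
        return "Extreme"
    if z <= 20:
        return "Extrapolation"
    if z <= 100:
        return "Severe"
    return "Clamped"

zone_for_z_score(2.5)   # "Outlier" -- unusual but seen in training
zone_for_z_score(35.0)  # "Severe"  -- prediction quality uncertain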
For Categorical Columns¶
The system checks if it has seen the value before:
| Situation | System Response |
|---|---|
| Known value | OK |
| Null/missing | Warning: "categorical value is (null)" |
| Unknown value | Warning: "categorical value 'X' is UNKNOWN: expected one of [...]" |
Example:
# Training data had countries: ["US", "UK", "Canada", "Mexico"]
result = predictor.predict({"country": "Narnia"})
result.guardrails
# {
# "country": "Warning: categorical value 'Narnia' is UNKNOWN: expected one of ['US', 'UK', 'Canada', 'Mexico']"
# }
The prediction still runs (using BERT semantic similarity to find the closest known value), but you're warned that the input is outside the training distribution.
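The matching model is internal to Featrix, but the general idea can be sketched with any sentence-embedding library. The snippet below uses the open-source sentence-transformers package purely as a stand-in to show what "closest known value by semantic similarity" means; it is not Featrix's pipeline.
from sentence_transformers import SentenceTransformer, util

# Stand-in embedding model; Featrix's internal model and matching logic differ.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
known_values = ["US", "UK", "Canada", "Mexico"]

def closest_known_value(value):
    # Embed the unseen category and every known value, pick the best cosine match.
    query = embedder.encode(value, convert_to_tensor=True)
    candidates = embedder.encode(known_values, convert_to_tensor=True)
    scores = util.cos_sim(query, candidates)[0]
    return known_values[int(scores.argmax())]

closest_known_value("United States")  # likely "US"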
For Unknown Columns¶
If you send columns the model doesn't know about:
result = predictor.predict({
    "age": 35,
    "income": 50000,
    "favorite_color": "blue"  # Model never saw this column
})
result.ignored_query_columns
# ["favorite_color"]
result.available_query_columns
# ["age", "income", "city", "plan_type", ...]
The model ignores unknown columns and tells you which ones.
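If you'd rather not send columns that will be ignored, you can filter a record yourself against available_query_columns from a previous result. A small convenience sketch, not a required step:
def filter_to_known_columns(record, result):
    # Keep only the keys the model reports as usable, based on a prior result.
    known = set(result.available_query_columns)
    return {k: v for k, v in record.items() if k in known}

clean_record = filter_to_known_columns(
    {"age": 35, "income": 50000, "favorite_color": "blue"}, result
)
# {"age": 35, "income": 50000}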
Probability Calibration¶
Raw neural network outputs are often overconfident or underconfident. Featrix calibrates probabilities so they mean what they say.
If the calibrated probability for a class is 0.80, approximately 80% of predictions with that probability are actually correct.
Three calibration methods (auto-selected during training):
| Method | Best For |
|---|---|
| Temperature | Models that are uniformly overconfident |
| Platt Scaling | Binary classification with sigmoid miscalibration |
| Isotonic | Complex non-linear calibration patterns |
The model card records which calibration method was used and its effectiveness.
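For intuition, here is a minimal sketch of temperature scaling, the simplest of the three methods: raw logits are divided by a temperature T (fit on held-out data) before the softmax, which softens overconfident outputs. This illustrates the technique only; it is not Featrix's implementation, and the temperature value here is made up.
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def temperature_scale(logits, T):
    # T > 1 softens overconfident outputs; T < 1 sharpens underconfident ones.
    return softmax(np.asarray(logits, dtype=float) / T)

temperature_scale([4.0, 0.0], T=1.0)  # ~[0.982, 0.018] -- raw, overconfident
temperature_scale([4.0, 0.0], T=2.0)  # ~[0.881, 0.119] -- calibrated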
Confidence vs Probability¶
These are different:
- Probability: Raw softmax output for the predicted class (0.0-1.0)
- Confidence: How far from the decision boundary (0.0 = right at boundary, 1.0 = maximally certain)
# Example: threshold=0.5, probability=0.9
# confidence = (0.9 - 0.5) / (1.0 - 0.5) = 0.8
# Example: threshold=0.5, probability=0.55
# confidence = (0.55 - 0.5) / (1.0 - 0.5) = 0.1 (low confidence!)
A prediction can clear the threshold (0.55 > 0.5) yet still have low confidence, because it sits only 0.05 above the decision boundary.
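As a quick sanity check, this tiny helper reproduces the arithmetic above for probabilities at or above the threshold. It is illustrative only; the API already returns result.confidence.
def confidence_from_probability(probability, threshold=0.5):
    # Mirrors the worked examples above; the API computes this for you.
    return (probability - threshold) / (1.0 - threshold)

confidence_from_probability(0.90)  # 0.8
confidence_from_probability(0.55)  # 0.1 (low confidence)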
Interpreting Confidence Levels¶
| Confidence | What It Means | Recommended Action |
|---|---|---|
| 95%+ | Very high confidence | Trust the prediction |
| 80-95% | Confident | Usually correct, minor uncertainty |
| 60-80% | Moderate | Consider additional review |
| 40-60% | Uncertain | Likely needs human review |
| <40% | Low | Definitely needs review |
When to Worry (and When Not To)¶
Don't Worry About¶
- Missing columns: The model handles them gracefully with learned null embeddings
- Unknown categories with semantic similarity: "Senior Software Engineer" works even if only "Software Engineer" was in training
- Minor extrapolation (4-20σ): Predictions are usually fine, just slightly less reliable
Do Worry About¶
- Errors in guardrails: These indicate predictions are unreliable
- Many ignored columns: The model might be missing critical information
- All predictions same class: Check training metrics for embedding collapse
- All low confidence: Model may not have converged
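The last two items are aggregate symptoms, so they are easiest to spot across a batch of results rather than one prediction at a time. A minimal monitoring sketch, using the field names from the result object shown earlier; the threshold is an arbitrary example:
from collections import Counter

def batch_health(results, low_confidence=0.4):
    # Summarize a batch of prediction results to spot a single dominant
    # class or uniformly low confidence.
    classes = Counter(r.predicted_class for r in results)
    confidences = [r.confidence for r in results]
    return {
        "class_distribution": dict(classes),
        "mean_confidence": sum(confidences) / len(confidences),
        "share_low_confidence": sum(c < low_confidence for c in confidences) / len(confidences),
    }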
Using Safety Information in Production¶
Pattern 1: Reject Unreliable Predictions¶
def safe_predict(predictor, record):
    result = predictor.predict(record)

    # Check for errors in guardrails
    for column, warning in (result.guardrails or {}).items():
        if warning.startswith("Error:"):
            return {
                "prediction": None,
                "rejected": True,
                "reason": f"Column '{column}': {warning}"
            }

    return {
        "prediction": result.predicted_class,
        "confidence": result.confidence,
        "warnings": result.guardrails
    }
Pattern 2: Route by Confidence¶
def route_prediction(predictor, record):
    result = predictor.predict(record)

    if result.guardrails and any(w.startswith("Error:") for w in result.guardrails.values()):
        return "human_review"  # Unreliable prediction

    if result.confidence > 0.95:
        return "auto_approve"
    elif result.confidence > 0.70:
        return "standard_review"
    else:
        return "human_review"
Pattern 3: Log Everything for Analysis¶
import json
import logging

logger = logging.getLogger(__name__)

def predict_with_logging(predictor, record, request_id):
    result = predictor.predict(record)

    log_entry = {
        "request_id": request_id,
        "prediction_uuid": result.prediction_uuid,
        "predicted_class": result.predicted_class,
        "confidence": result.confidence,
        "guardrails": result.guardrails,
        "ignored_columns": result.ignored_query_columns
    }

    # Log for later analysis
    logger.info(json.dumps(log_entry))
    return result
The Model Card: Training Quality Warnings¶
The model card includes warnings from training:
model_card = predictor.get_model_card()

# Check training quality
if model_card.get("training_quality_warning"):
    print(f"Warning: {model_card['training_quality_warning']}")

# Check for known issues
for warning in model_card.get("warnings", []):
    print(f"Training issue: {warning}")
Training warnings might include:
- Class imbalance detected
- Embedding collapse during training
- Validation loss still decreasing (might benefit from more epochs)
- Per-class recall issues (one class has very low recall)
Summary: You Can't Get Silently Bad Results¶
Featrix predictions are transparent:
- Guardrails tell you about input data issues (per column)
- Confidence tells you how certain the model is
- Calibration ensures probabilities mean what they say
- Ignored columns tell you what the model couldn't use
- Model card tells you about training quality issues
If you check the guardrails and confidence, you always know when to trust a prediction and when to escalate to human review.
This is the difference between "the model said 0.85" and "the model is 85% confident, with no guardrail warnings, using a well-calibrated probability distribution from a training run with no quality issues."
The second one is actionable. The first is gambling.