Training Predictors

A Predictor is a classifier or regressor trained on top of a Foundational Model. Because the Foundational Model already understands your data, predictors train quickly (2-5 minutes) and achieve high accuracy.

Quick Start

from featrixsphere.api import FeatrixSphere

featrix = FeatrixSphere()

# Get your Foundational Model
fm = featrix.foundational_model("your-session-id")

# Create a binary classifier
predictor = fm.create_binary_classifier(target_column="churned")
predictor.wait_for_training()

print(f"Accuracy: {predictor.accuracy}")
print(f"AUC: {predictor.auc}")

Predictor Types

Binary Classifier

For target columns with exactly 2 classes (yes/no, true/false, 0/1):

predictor = fm.create_binary_classifier(
    target_column="churned",
    name="churn_predictor"
)
predictor.wait_for_training()

Multiclass Classifier

For target columns with 3 or more classes:

predictor = fm.create_multi_classifier(
    target_column="product_category",
    name="category_predictor"
)
predictor.wait_for_training()

Regressor

For numeric target columns (prices, scores, quantities):

predictor = fm.create_regressor(
    target_column="price",
    name="price_predictor"
)
predictor.wait_for_training()

Handling Class Imbalance

Specifying the Rare Class

For binary classification with imbalanced classes, tell Featrix which class is the minority:

predictor = fm.create_binary_classifier(
    target_column="is_fraud",
    rare_label_value="fraud"  # The minority class
)

This enables:

  • Focal loss (focuses learning on hard cases)
  • Automatic class weighting
  • Better threshold optimization
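
For intuition on the first of these, here is a minimal sketch of binary focal loss (the standard formulation such options are generally based on). The `gamma` and `alpha` values are illustrative, not necessarily what Featrix uses internally:

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.75):
    """Binary focal loss for a single example.

    p:     predicted probability of the positive (rare) class
    y:     true label, 1 for the rare class, 0 otherwise
    gamma: focusing parameter; higher values down-weight easy examples
    alpha: extra weight on the rare class (handles imbalance)
    """
    p_t = p if y == 1 else 1 - p          # probability of the true class
    alpha_t = alpha if y == 1 else 1 - alpha
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)

# A confidently correct example contributes almost nothing to the loss,
# while a misclassified rare example dominates it:
easy = focal_loss(0.95, 1)
hard = focal_loss(0.10, 1)
```

The `(1 - p_t) ** gamma` factor is what "focuses learning on hard cases": it shrinks the loss of well-classified examples toward zero, so gradient updates concentrate on the examples the model still gets wrong.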

Cost-Sensitive Classification

When false positives and false negatives have different business costs:

predictor = fm.create_binary_classifier(
    target_column="is_fraud",
    rare_label_value="fraud",
    cost_false_positive=100,    # Cost per false alarm ($100)
    cost_false_negative=5000    # Cost per missed fraud ($5000)
)

Featrix will compute the Bayes-optimal decision threshold that minimizes total cost.
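
Under standard decision theory, and assuming well-calibrated probabilities, that threshold has a closed form. A sketch of the calculation (illustrative, not Featrix's exact internals):

```python
def bayes_optimal_threshold(cost_fp: float, cost_fn: float) -> float:
    """Flag as positive when p(positive) exceeds this threshold.

    Derivation: flagging costs (1 - p) * cost_fp in expectation, while
    not flagging costs p * cost_fn; flag whenever the latter is larger,
    i.e. whenever p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)

# With the costs above, the threshold is 100 / 5100, roughly 0.02:
# far below the default 0.5, so even low-probability fraud gets flagged.
threshold = bayes_optimal_threshold(cost_fp=100, cost_fn=5000)
```

Note how asymmetric costs pull the threshold away from 0.5: the more expensive a missed positive is, the lower the probability at which flagging becomes worthwhile.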

Real-World Class Distribution

If your training data was resampled but production has different class frequencies:

predictor = fm.create_binary_classifier(
    target_column="approved",
    class_imbalance={
        "approved": 0.97,
        "rejected": 0.03
    }
)

This calibrates predictions to the real-world distribution.
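
This kind of calibration is typically a prior-shift correction via Bayes' rule. A minimal sketch of the idea (illustrative only, not Featrix's exact implementation):

```python
def prior_shift(p: float, train_prior: float, prod_prior: float) -> float:
    """Map a probability calibrated to the training-set class prior
    onto the production class prior using Bayes' rule."""
    pos = p * prod_prior / train_prior
    neg = (1 - p) * (1 - prod_prior) / (1 - train_prior)
    return pos / (pos + neg)

# A model trained on a balanced 50/50 resample scoring 0.5 is a coin
# flip; against a 3% real-world "rejected" rate that maps back to 0.03.
calibrated = prior_shift(0.5, train_prior=0.5, prod_prior=0.03)
```

When the training and production priors match, the correction is the identity; the further your resampling moved the class frequencies, the larger the adjustment.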

Training with Separate Labels

If your labels are in a separate file:

# From CSV file
predictor = fm.create_binary_classifier(
    target_column="churned",
    labels_file="labels.csv"
)

# From DataFrame
predictor = fm.create_binary_classifier(
    target_column="churned",
    labels_df=labels_dataframe
)

The labels file must contain a column whose name matches target_column, and its rows must be aligned with the training data (same order and same row count).
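
A hypothetical pandas sketch of that alignment step (customer_id, plan, and the raw-labels layout are made-up names for illustration):

```python
import pandas as pd

# Training data and labels arrive in different row orders.
train_df = pd.DataFrame({"customer_id": [1, 2, 3], "plan": ["a", "b", "a"]})
raw_labels = pd.DataFrame({"customer_id": [3, 1, 2], "churned": [0, 1, 0]})

# Reorder the labels to match the training data's row order by joining
# on the shared key, then pass the result as labels_df.
labels_df = train_df[["customer_id"]].merge(
    raw_labels, on="customer_id", how="left"
)
```

A left merge on the training frame's key column guarantees one label row per training row, in training-row order; rows with no matching label surface as NaN, which is worth checking for before training.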

Training Configuration

Epochs

By default, Featrix determines the optimal number of epochs. You can override:

predictor = fm.create_binary_classifier(
    target_column="churned",
    epochs=100  # Force 100 epochs (0 = automatic)
)

Webhooks

Get notified when training completes:

predictor = fm.create_binary_classifier(
    target_column="churned",
    webhooks={
        "training_finished": "https://your-server.com/predictor-done"
    }
)

Waiting for Training

Predictor training typically takes 2-5 minutes:

predictor = fm.create_binary_classifier(target_column="churned")

predictor.wait_for_training(
    max_wait_time=600,      # Maximum wait: 10 minutes
    poll_interval=5,        # Check every 5 seconds
    show_progress=True      # Print progress updates
)

Predictor Attributes

After training completes:

print(predictor.id)              # Predictor ID
print(predictor.session_id)      # Parent Foundational Model session
print(predictor.target_column)   # Target column name
print(predictor.target_type)     # "set" (classification) or "numeric" (regression)
print(predictor.status)          # "done"
print(predictor.accuracy)        # Training accuracy
print(predictor.auc)             # ROC-AUC score (classification)
print(predictor.f1)              # F1 score (classification)

Listing Predictors

Get all predictors for a Foundational Model:

predictors = fm.list_predictors()
for p in predictors:
    print(f"{p.target_column}: accuracy={p.accuracy:.4f}, status={p.status}")

Continue Training

If you want to train more epochs:

predictor.train_more(epochs=50)
predictor.wait_for_training()

Training Suggestions

Get suggestions for improving performance:

# As structured data
suggestions = predictor.training_suggestions()
print(suggestions)

# As human-readable text
text_report = predictor.training_suggestions(as_text=True)
print(text_report)

Training Metrics

Get detailed training history:

metrics = predictor.get_metrics()
print(metrics)

Model Cards

Every trained model generates a comprehensive model card—a structured JSON document containing everything about the model: architecture details, training data statistics, performance metrics, quality checks, and column importance scores.

# Get the model card for a foundational model
model_card = fm.get_model_card()

# Model cards include:
# - Model architecture (layers, parameters, dimensions)
# - Training data statistics (rows, splits, column types)
# - Performance metrics (loss curves, validation results)
# - Quality checks (overfitting, collapse detection, gradient health)
# - Column importance scores

To render model cards as HTML reports, use the Featrix Model Card Renderer:

pip install featrix-model-card

# Render to HTML
featrix-model-card render model_card.json -o report.html

Multiple Predictors

You can create multiple predictors on the same Foundational Model:

# Churn predictor
churn_predictor = fm.create_binary_classifier(target_column="churned")

# Lifetime value predictor
ltv_predictor = fm.create_regressor(target_column="lifetime_value")

# Segment predictor
segment_predictor = fm.create_multi_classifier(target_column="customer_segment")

# Wait for all
churn_predictor.wait_for_training()
ltv_predictor.wait_for_training()
segment_predictor.wait_for_training()

Each predictor trains independently on the same foundation—no need to retrain the Foundational Model.

Best Practices

1. Exclude the Target from the Foundational Model

When creating the Foundational Model, exclude columns you'll predict:

fm = featrix.create_foundational_model(
    data_file="data.csv",
    ignore_columns=["churned", "lifetime_value"]  # Prediction targets
)

This prevents information leakage.

2. Use Cost-Sensitive Thresholds for Business Decisions

Don't just accept the default 0.5 threshold. Think about business costs:

predictor = fm.create_binary_classifier(
    target_column="should_call_customer",
    cost_false_positive=10,    # Cost of unnecessary call
    cost_false_negative=500    # Cost of missing an at-risk customer
)

3. Specify Rare Class for Imbalanced Data

For fraud detection, churn prediction, or any rare event:

predictor = fm.create_binary_classifier(
    target_column="is_fraud",
    rare_label_value="yes"  # Tell Featrix which is rare
)

4. Check Metrics Before Deploying

Always review accuracy, AUC, and F1 before using a predictor:

print(f"Accuracy: {predictor.accuracy:.4f}")
print(f"AUC: {predictor.auc:.4f}")
print(f"F1: {predictor.f1:.4f}")

# Get suggestions if performance is low
if predictor.auc < 0.8:
    print(predictor.training_suggestions(as_text=True))

Next Steps