Training Predictors

A Predictor is a classifier or regressor trained on top of a Foundational Model. Because the Foundational Model already understands your data, predictors train quickly (2-5 minutes) and achieve high accuracy.

Quick Start

from featrixsphere.api import FeatrixSphere

featrix = FeatrixSphere()

# Get your Foundational Model
fm = featrix.foundational_model("your-session-id")

# Create a binary classifier
predictor = fm.create_binary_classifier(target_column="churned")
predictor.wait_for_training()

print(f"Accuracy: {predictor.accuracy}")
print(f"AUC: {predictor.auc}")

Predictor Types

Binary Classifier

For target columns with exactly 2 classes (yes/no, true/false, 0/1):

predictor = fm.create_binary_classifier(
    target_column="churned",
    name="churn_predictor"
)
predictor.wait_for_training()

Multiclass Classifier

For target columns with 3 or more classes:

predictor = fm.create_multi_classifier(
    target_column="product_category",
    name="category_predictor"
)
predictor.wait_for_training()

Regressor

For numeric target columns (prices, scores, quantities):

predictor = fm.create_regressor(
    target_column="price",
    name="price_predictor"
)
predictor.wait_for_training()

Handling Class Imbalance

Specifying the Rare Class

For binary classification with imbalanced classes, tell Featrix which class is the minority:

predictor = fm.create_binary_classifier(
    target_column="is_fraud",
    rare_label_value="fraud"  # The minority class
)

This enables:

  • Focal loss (focuses learning on hard cases)
  • Automatic class weighting
  • Better threshold optimization
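
For intuition on the first of these, here is a minimal sketch of binary focal loss (the standard formulation such options are generally based on). The `gamma` and `alpha` values are illustrative, not necessarily what Featrix uses internally:

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.75):
    """Binary focal loss for a single example.

    p:     predicted probability of the positive (rare) class
    y:     true label, 1 for the rare class, 0 otherwise
    gamma: focusing parameter; higher values down-weight easy examples
    alpha: extra weight on the rare class (handles imbalance)
    """
    p_t = p if y == 1 else 1 - p          # probability of the true class
    alpha_t = alpha if y == 1 else 1 - alpha
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)

# A confidently correct example contributes almost nothing to the loss,
# while a misclassified rare example dominates it:
easy = focal_loss(0.95, 1)
hard = focal_loss(0.10, 1)
```

The `(1 - p_t) ** gamma` factor is what "focuses learning on hard cases": it shrinks the loss of well-classified examples toward zero, so gradient updates concentrate on the examples the model still gets wrong.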

Cost-Sensitive Classification

When false positives and false negatives have different business costs:

predictor = fm.create_binary_classifier(
    target_column="is_fraud",
    rare_label_value="fraud",
    cost_false_positive=100,    # Cost per false alarm ($100)
    cost_false_negative=5000    # Cost per missed fraud ($5000)
)

Featrix will compute the Bayes-optimal decision threshold that minimizes total cost.
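
Under standard decision theory, and assuming well-calibrated probabilities, that threshold has a closed form. A sketch of the calculation (illustrative, not Featrix's exact internals):

```python
def bayes_optimal_threshold(cost_fp: float, cost_fn: float) -> float:
    """Flag as positive when p(positive) exceeds this threshold.

    Derivation: flagging costs (1 - p) * cost_fp in expectation, while
    not flagging costs p * cost_fn; flag whenever the latter is larger,
    i.e. whenever p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)

# With the costs above, the threshold is 100 / 5100, roughly 0.02:
# far below the default 0.5, so even low-probability fraud gets flagged.
threshold = bayes_optimal_threshold(cost_fp=100, cost_fn=5000)
```

Note how asymmetric costs pull the threshold away from 0.5: the more expensive a missed positive is, the lower the probability at which flagging becomes worthwhile.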

Real-World Class Distribution

If your training data was resampled but production has different class frequencies:

predictor = fm.create_binary_classifier(
    target_column="approved",
    class_imbalance={
        "approved": 0.97,
        "rejected": 0.03
    }
)

This calibrates predictions to the real-world distribution.
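
This kind of calibration is typically a prior-shift correction via Bayes' rule. A minimal sketch of the idea (illustrative only, not Featrix's exact implementation):

```python
def prior_shift(p: float, train_prior: float, prod_prior: float) -> float:
    """Map a probability calibrated to the training-set class prior
    onto the production class prior using Bayes' rule."""
    pos = p * prod_prior / train_prior
    neg = (1 - p) * (1 - prod_prior) / (1 - train_prior)
    return pos / (pos + neg)

# A model trained on a balanced 50/50 resample scoring 0.5 is a coin
# flip; against a 3% real-world "rejected" rate that maps back to 0.03.
calibrated = prior_shift(0.5, train_prior=0.5, prod_prior=0.03)
```

When the training and production priors match, the correction is the identity; the further your resampling moved the class frequencies, the larger the adjustment.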

Training with Separate Labels

If your labels are in a separate file:

# From CSV file
predictor = fm.create_binary_classifier(
    target_column="churned",
    labels_file="labels.csv"
)

# From DataFrame
predictor = fm.create_binary_classifier(
    target_column="churned",
    labels_df=labels_dataframe
)

The labels file must contain a column whose name matches target_column, and its rows must be aligned with the training data (same order and same row count).
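
A hypothetical pandas sketch of that alignment step (customer_id, plan, and the raw-labels layout are made-up names for illustration):

```python
import pandas as pd

# Training data and labels arrive in different row orders.
train_df = pd.DataFrame({"customer_id": [1, 2, 3], "plan": ["a", "b", "a"]})
raw_labels = pd.DataFrame({"customer_id": [3, 1, 2], "churned": [0, 1, 0]})

# Reorder the labels to match the training data's row order by joining
# on the shared key, then pass the result as labels_df.
labels_df = train_df[["customer_id"]].merge(
    raw_labels, on="customer_id", how="left"
)
```

A left merge on the training frame's key column guarantees one label row per training row, in training-row order; rows with no matching label surface as NaN, which is worth checking for before training.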

Training Configuration

Epochs

By default, Featrix determines the optimal number of epochs. You can override:

predictor = fm.create_binary_classifier(
    target_column="churned",
    epochs=100  # Force 100 epochs (0 = automatic)
)

Webhooks

Get notified when training completes:

predictor = fm.create_binary_classifier(
    target_column="churned",
    webhooks={
        "training_finished": "https://your-server.com/predictor-done"
    }
)

Waiting for Training

Predictor training typically takes 2-5 minutes:

predictor = fm.create_binary_classifier(target_column="churned")

predictor.wait_for_training(
    max_wait_time=600,      # Maximum wait: 10 minutes
    poll_interval=5,        # Check every 5 seconds
    show_progress=True      # Print progress updates
)

Predictor Attributes

After training completes:

print(predictor.id)              # Predictor ID
print(predictor.session_id)      # Parent Foundational Model session
print(predictor.target_column)   # Target column name
print(predictor.target_type)     # "set" (classification) or "numeric" (regression)
print(predictor.status)          # "done"
print(predictor.accuracy)        # Training accuracy
print(predictor.auc)             # ROC-AUC score (classification)
print(predictor.f1)              # F1 score (classification)

Listing Predictors

Get all predictors for a Foundational Model:

predictors = fm.list_predictors()
for p in predictors:
    print(f"{p.target_column}: accuracy={p.accuracy:.4f}, status={p.status}")

Continue Training

If you want to train more epochs:

predictor.train_more(epochs=50)
predictor.wait_for_training()

Training Suggestions

Get suggestions for improving performance:

# As structured data
suggestions = predictor.training_suggestions()
print(suggestions)

# As human-readable text
text_report = predictor.training_suggestions(as_text=True)
print(text_report)

Training Metrics

Get detailed training history:

metrics = predictor.get_metrics()
print(metrics)

Model Cards

Every trained model generates a comprehensive model card—a structured JSON document containing everything about the model: architecture details, training data statistics, performance metrics, quality checks, and column importance scores.

# Get the model card for a foundational model
model_card = fm.get_model_card()

# Model cards include:
# - Model architecture (layers, parameters, dimensions)
# - Training data statistics (rows, splits, column types)
# - Performance metrics (loss curves, validation results)
# - Quality checks (overfitting, collapse detection, gradient health)
# - Column importance scores

To render model cards as HTML reports, use the Featrix Model Card Renderer:

pip install featrix-model-card

# Render to HTML
featrix-model-card render model_card.json -o report.html

Multiple Predictors

You can create multiple predictors on the same Foundational Model:

# Churn predictor
churn_predictor = fm.create_binary_classifier(target_column="churned")

# Lifetime value predictor
ltv_predictor = fm.create_regressor(target_column="lifetime_value")

# Segment predictor
segment_predictor = fm.create_multi_classifier(target_column="customer_segment")

# Wait for all
churn_predictor.wait_for_training()
ltv_predictor.wait_for_training()
segment_predictor.wait_for_training()

Each predictor trains independently on the same foundation—no need to retrain the Foundational Model.

Best Practices

1. Exclude the Target from the Foundational Model

When creating the Foundational Model, exclude columns you'll predict:

fm = featrix.create_foundational_model(
    data_file="data.csv",
    ignore_columns=["churned", "lifetime_value"]  # Prediction targets
)

This prevents information leakage.

2. Use Cost-Sensitive Thresholds for Business Decisions

Don't just accept the default 0.5 threshold. Think about business costs:

predictor = fm.create_binary_classifier(
    target_column="should_call_customer",
    cost_false_positive=10,    # Cost of unnecessary call
    cost_false_negative=500    # Cost of missing an at-risk customer
)

3. Specify Rare Class for Imbalanced Data

For fraud detection, churn prediction, or any rare event:

predictor = fm.create_binary_classifier(
    target_column="is_fraud",
    rare_label_value="yes"  # Tell Featrix which is rare
)

4. Check Metrics Before Deploying

Always review accuracy, AUC, and F1 before using a predictor:

print(f"Accuracy: {predictor.accuracy:.4f}")
print(f"AUC: {predictor.auc:.4f}")
print(f"F1: {predictor.f1:.4f}")

# Get suggestions if performance is low
if predictor.auc < 0.8:
    print(predictor.training_suggestions(as_text=True))

Next Steps