Featrix Troubleshooting Guide¶
Common issues and solutions for AI agents.
Training Issues¶
Model stuck in "training" status¶
Check progress:
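A minimal status check, using only calls shown elsewhere in this guide (the session ID is a placeholder):
fm = featrix.foundational_model("session-id")
info = fm.refresh()   # re-query the server for the latest state
print(fm.status)      # "training" while the model is still building
print(info)           # full status payload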
Common causes:
- Large dataset (normal, wait longer)
- Network issues (retry)
- Server issues (check featrix.health_check())
Solution:
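A sketch combining the checks above (the session ID is a placeholder):
featrix.health_check()                        # rule out server-side problems
fm = featrix.foundational_model("session-id")
if fm.status == "training":
    fm.wait_for_training()                    # blocks until training finishes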
Training failed with error status¶
Check:
fm = featrix.foundational_model("session-id")
info = fm.refresh()
print(info.get('error')) # Error message
Common causes:
- Empty dataset
- All columns are ID-like (no learnable patterns)
- Data format issues
"Column not found" error¶
Cause: Target column doesn't exist in data.
Solution:
# Check available columns
print(fm.columns)
# Verify target exists
if "is_fraud" in fm.columns:
predictor = fm.create_binary_classifier(target_column="is_fraud")
Predictor has low accuracy¶
Check metrics:
print(f"Accuracy: {predictor.accuracy}")
print(f"AUC: {predictor.auc}")
metrics = predictor.get_metrics()
print(metrics)
Common causes:
- Target column has no predictive signal
- Target column is an ID (random, unpredictable)
- Extreme class imbalance
Solutions:
# For imbalanced data, specify the rare class
predictor = fm.create_binary_classifier(
    target_column="is_fraud",
    rare_label_value="fraud"  # The minority class
)
# Specify production class distribution
predictor = fm.create_binary_classifier(
    target_column="is_fraud",
    class_imbalance={"fraud": 0.01, "not_fraud": 0.99}
)
Prediction Issues¶
Empty or null predictions¶
Check:
result = predictor.predict(record)
print(f"Predicted: {result.predicted_class}")
print(f"Probabilities: {result.probabilities}")
print(f"Guardrails: {result.guardrails}")
Common causes:
- Input record missing required columns
- Input values far outside training distribution
Check for column issues:
print(f"Ignored columns: {result.ignored_query_columns}")
print(f"Available columns: {result.available_query_columns}")
Guardrails warnings¶
Meaning: Input values are unusual (out-of-distribution).
result = predictor.predict(record)
if result.guardrails:
    for col, warning in result.guardrails.items():
        print(f"Warning: {col} - {warning}")
Types of warnings:
- out_of_range - Numeric value outside training range
- unknown_category - Categorical value never seen in training
- missing_value - Column is null/missing
Solution: Predictions may still be valid, but use caution. Consider:
- Flagging for human review (sketched below)
- Using a fallback/default
- Retraining with more diverse data
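For example, a simple routing sketch; review_queue and act_on are illustrative placeholders, not part of the SDK:
result = predictor.predict(record)
if result.guardrails:
    # Out-of-distribution input: hold for human review instead of acting automatically
    review_queue.append({"record": record, "warnings": result.guardrails})
else:
    act_on(result.predicted_class)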
Wrong columns being used¶
Check what model knows:
print(fm.columns)
model_card = fm.get_model_card()
print("Excluded:", model_card.get('excluded_columns'))
Check what you're sending:
record = {"col1": "value", "col2": 123, "unknown_col": "x"}
result = predictor.predict(record)
print("Ignored:", result.ignored_query_columns) # ["unknown_col"]
Confidence always low or always high¶
Possible causes:
- Model undertrained (train for more epochs)
- Model overfit (fewer epochs, more regularization)
- Data has no real signal
Check training metrics:
metrics = fm.get_training_metrics()
# Look for validation loss diverging from training loss (overfitting)
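# Rough overfitting check, assuming get_training_metrics() returns a dict with
# per-epoch losses under keys like "training_loss" and "validation_loss"
# (key names are assumptions; print(metrics) to see the real structure first):
val_loss = metrics.get("validation_loss", [])
if val_loss and val_loss[-1] > min(val_loss) * 1.2:
    print("Validation loss has drifted well above its best value: possible overfitting")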
API/Connection Issues¶
Cannot connect to Featrix¶
try:
    result = featrix.health_check()
    print(result)
except Exception as e:
    print(f"Connection failed: {e}")
Solutions:
- Check internet connection
- Verify the API URL is correct
- Check firewall/proxy settings
Timeout errors¶
For long training:
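A polling sketch built from calls already shown in this guide; the interval and cutoff are arbitrary, and it assumes refresh() keeps fm.status current:
import time

fm = featrix.foundational_model("session-id")
deadline = time.time() + 4 * 60 * 60         # give up after 4 hours
while fm.status == "training" and time.time() < deadline:
    time.sleep(60)                            # poll once a minute
    fm.refresh()                              # pull the latest state from the server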
For predictions:
# Batch predictions have implicit timeout, retry on failure
import time
for attempt in range(3):
    try:
        results = predictor.batch_predict(records)
        break
    except TimeoutError:
        time.sleep(5)
Rate limiting¶
Symptom: HTTP 429 errors
Solution: Add delays between requests
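For example (the delay length is arbitrary, and handle() is a placeholder for your own logic):
import time

for record in records:
    result = predictor.predict(record)
    handle(result)    # placeholder: your own handling
    time.sleep(0.2)   # brief pause between requests to stay under the limit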
Or use batch predictions:
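Using the batch_predict() call shown earlier:
# One request for the whole list instead of many small requests
results = predictor.batch_predict(records)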
Data Issues¶
Data file not found¶
from pathlib import Path
file_path = Path("data.csv")
if not file_path.exists():
    print(f"File not found: {file_path.absolute()}")

# Use absolute path to be safe
fm = featrix.create_foundational_model(data_file=str(file_path.absolute()))
Data format issues¶
Supported formats: CSV, Parquet, JSON
For CSV issues:
import pandas as pd
# Load and inspect
df = pd.read_csv("data.csv")
print(df.dtypes)
print(df.head())
# Use DataFrame directly if file loading is problematic
fm = featrix.create_foundational_model(df=df)
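# For Parquet or JSON, the same pattern applies (file names are placeholders):
df = pd.read_parquet("data.parquet")   # or: pd.read_json("data.json")
fm = featrix.create_foundational_model(df=df)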
Too many columns / too few rows¶
Recommendations:
- Minimum: 100 rows for meaningful training (a quick check is sketched below)
- Ideal: 1000+ rows
- Max columns: no hard limit, but consider ignoring irrelevant ones
# Ignore columns that won't help
fm = featrix.create_foundational_model(
    data_file="data.csv",
    ignore_columns=["id", "uuid", "timestamp", "notes", "internal_code"]
)
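# Quick pre-flight row-count check mirroring the recommendations above
# (the thresholds are this guide's suggestions, not an SDK limit):
import pandas as pd

df = pd.read_csv("data.csv")
if len(df) < 100:
    print(f"Only {len(df)} rows: likely too few for meaningful training")
elif len(df) < 1000:
    print(f"{len(df)} rows: usable, but 1000+ is recommended")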
Resume / Recovery¶
Find existing models¶
# List all sessions
sessions = featrix.list_sessions()
print(sessions)
# Filter by name
sessions = featrix.list_sessions(name_prefix="fraud_model")
Resume a model¶
# By session ID
fm = featrix.foundational_model("20250115-143022_abc123")
print(fm.status)
# If training, wait for it
if fm.status == "training":
fm.wait_for_training()
Find predictors for a model¶
fm = featrix.foundational_model("session-id")
predictors = fm.list_predictors()
for p in predictors:
    print(f"{p.id}: {p.target_column} - {p.status}")
Common Mistakes¶
1. Not waiting for training¶
Wrong:
fm = featrix.create_foundational_model(data_file="data.csv")
predictor = fm.create_binary_classifier(target_column="target") # FAILS - FM not ready
Right:
fm = featrix.create_foundational_model(data_file="data.csv")
fm.wait_for_training() # Wait!
predictor = fm.create_binary_classifier(target_column="target")
predictor.wait_for_training() # Wait again!
2. Forgetting to specify rare class¶
For imbalanced binary classification:
# Without rare_label_value, may get poor recall on minority class
predictor = fm.create_binary_classifier(
    target_column="is_fraud",
    rare_label_value="fraud"  # Always specify for imbalanced data
)
3. Including ID columns¶
Wrong:
fm = featrix.create_foundational_model(data_file="data.csv")
# If the data has customer_id, transaction_id, etc., the model may memorize rows instead of generalizing
Right:
fm = featrix.create_foundational_model(
    data_file="data.csv",
    ignore_columns=["customer_id", "transaction_id", "row_id"]
)
4. Not saving prediction UUIDs¶
For feedback, you MUST save the UUID:
result = predictor.predict(record)
# Save this somewhere (database, log, etc.)
prediction_uuid = result.prediction_uuid
# Later, when you know the truth
featrix.prediction_feedback(prediction_uuid, ground_truth="actual_value")
5. Batch vs single prediction for many records¶
Slow:
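A sketch of the per-record anti-pattern (one round trip per record):
results = [predictor.predict(r) for r in records]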
Fast:
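And the batched equivalent, using batch_predict() as shown earlier:
results = predictor.batch_predict(records)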