Troubleshooting and FAQ

Training Issues

Training is stuck at the same loss

The learning rate may be too low. The system automatically boosts the learning rate 3x after 15 epochs of no progress. If you see "NO_LEARNING" in the logs, the system has blocked early stopping for 10 additional epochs to give the model more time.

What to check: Your data quality. Is there actually signal to learn? Do you have enough rows? Are important columns mostly null?
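
A quick way to sanity-check this with pandas (the file name below is a placeholder for your own training data):

import pandas as pd

df = pd.read_csv("training_data.csv")  # placeholder path

# Enough rows to learn from?
print(f"rows: {len(df)}")

# How null is each column? Heavily null columns carry little signal.
null_rates = df.isna().mean().sort_values(ascending=False)
print(null_rates.head(10))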

Validation loss is increasing while training loss decreases

The model is overfitting—memorizing the training data instead of learning generalizable patterns. The automatic dropout scheduling (0.5 down to 0.25) should handle this, but you may need more training data.
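
For intuition, the dropout schedule described above behaves roughly like a decay from 0.5 down to 0.25 over the course of training. A minimal sketch of that idea, assuming a simple linear decay (this is illustrative, not Featrix's actual implementation):

def dropout_at(epoch: int, total_epochs: int,
               start: float = 0.5, end: float = 0.25) -> float:
    """Linearly interpolate dropout from `start` to `end` over training."""
    frac = min(epoch / max(total_epochs - 1, 1), 1.0)
    return start + (end - start) * frac

# e.g. for a 100-epoch run: 0.5 at epoch 0, ~0.37 at epoch 50, 0.25 at epoch 99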

Training takes forever

Possible causes:

  • Too many columns (especially text columns requiring BERT processing)
  • Too many unique categories in categorical columns
  • JSON columns that require child embedding spaces

What to do: Exclude high-cardinality columns you don't actually need for prediction.
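
To spot candidates for exclusion, count unique values per column (pandas sketch; the file name is a placeholder):

import pandas as pd

df = pd.read_csv("training_data.csv")  # placeholder path

# Columns with many unique values are expensive to encode
cardinality = df.nunique().sort_values(ascending=False)
print(cardinality.head(10))
# Free-text IDs, UUIDs, and raw timestamps are common offenders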

Out of memory (OOM) errors

The batch is too large for the GPU. The system automatically retries with half the batch size, up to 3 times. If it still fails, contact support.
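
Conceptually, the retry behavior looks like this sketch (illustrative only, not the server-side code; MemoryError stands in for a GPU OOM error):

def train_with_oom_retry(train_step, batch_size: int, max_retries: int = 3):
    """Run `train_step(batch_size)`, halving the batch on each OOM, up to `max_retries` times."""
    for attempt in range(max_retries + 1):
        try:
            return train_step(batch_size)
        except MemoryError:  # stand-in for a GPU out-of-memory error
            if attempt == max_retries:
                raise
            batch_size = max(batch_size // 2, 1)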

"DEAD_NETWORK" detected

Gradients are zero and nothing is learning. This usually indicates a data problem—check for all-constant columns or other data quality issues.
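
To check for all-constant columns before uploading (pandas sketch; the file name is a placeholder):

import pandas as pd

df = pd.read_csv("training_data.csv")  # placeholder path

# Columns with a single unique value (ignoring nulls) provide no signal to learn from
constant_cols = [c for c in df.columns if df[c].nunique(dropna=True) <= 1]
print("constant columns:", constant_cols)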

Prediction Issues

All predictions are the same class

Possible causes:

  • Severe class imbalance (the model learned to always predict the majority class)
  • Embedding collapse (all embeddings ended up identical)

What to check: Training metrics. Did the model actually learn anything? What's the validation accuracy?
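
One quick diagnostic is to compare the distribution of predicted classes against the target distribution in your training data (pandas sketch; the file, column, and prediction list are placeholders):

import pandas as pd

train_df = pd.read_csv("training_data.csv")       # placeholder path
predictions = ["churned", "churned", "churned"]   # placeholder: your predicted labels

print("training target distribution:")
print(train_df["your_target"].value_counts(normalize=True))

print("prediction distribution:")
print(pd.Series(predictions).value_counts(normalize=True))
# If predictions are ~100% one class while training is more balanced,
# suspect imbalance handling or embedding collapse.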

Every prediction has very low confidence

The model didn't converge. Check if training loss is still decreasing—you may need more epochs.

"Unknown column" warnings

You're including columns in your query that weren't in the training data. Only use columns that were present during training.

Check the ignored_query_columns field in the prediction response to see which columns were dropped.
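
For example, if the prediction response is available as a dict (the exact shape of the response object may differ by client version):

def report_ignored_columns(result: dict) -> None:
    """Warn about query columns the model never saw during training.

    Assumes the prediction response is dict-like; adjust access for your client version.
    """
    ignored = result.get("ignored_query_columns", [])
    if ignored:
        print("Ignored query columns (not present at training time):", ignored)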

Predictions don't make sense

Possible data encoding issue. Check:

  • Were column types detected correctly?
  • Do you need column overrides for things like zip codes (categorical, not numeric)?
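
If you are preparing data yourself, one low-effort safeguard is to cast identifier-like columns to strings before uploading, so they are not misread as numeric (pandas sketch; the file and column names are placeholders):

import pandas as pd

# Keep leading zeros and steer type detection toward categorical handling
df = pd.read_csv("training_data.csv", dtype={"zip_code": str})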

Automatic Interventions

The system detects and corrects many problems automatically. Each entry lists the problem detected and the automatic response:

  • No learning (< 0.5% improvement over 5 epochs) → boost learning rate 3x for 20 epochs
  • Still stuck after the LR boost → boost temperature 2x
  • Reverse class bias (minority class predicted at > 2x its actual frequency) → reduce FocalLoss gamma
  • Gradient explosion → apply adaptive gradient clipping
  • GPU memory exhausted → retry with 50% of the batch size
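
The no-learning check in the first entry amounts to a relative-improvement test over a sliding window. A minimal sketch of the idea (not the actual implementation):

def is_plateaued(losses: list[float], window: int = 5,
                 min_rel_improvement: float = 0.005) -> bool:
    """True if loss improved by less than `min_rel_improvement` (0.5%) over the last `window` epochs."""
    if len(losses) <= window:
        return False
    old, new = losses[-window - 1], losses[-1]
    return (old - new) / abs(old) < min_rel_improvement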

Diagnosing Poor Accuracy

1. Check data quality

  • Are there many null values? (Columns with > 50% nulls are auto-excluded)
  • Inconsistent formats? (e.g., dates as both "01/15/2024" and "2024-01-15"; see the normalization sketch after this list)
  • Data leakage? (Target information accidentally in the features)
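
Mixed date formats can usually be normalized before upload (pandas sketch; the file and column names are placeholders):

import pandas as pd

df = pd.read_csv("training_data.csv")  # placeholder path

# format="mixed" (pandas >= 2.0) parses each value independently
df["signup_date"] = pd.to_datetime(df["signup_date"], format="mixed", errors="coerce")
df["signup_date"] = df["signup_date"].dt.strftime("%Y-%m-%d")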

2. Check class balance

Look at value counts for your target. If one class is > 95% of the data, minority class recall may suffer.
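
For example, with pandas (placeholder file and column names):

import pandas as pd

df = pd.read_csv("training_data.csv")
counts = df["your_target"].value_counts(normalize=True)
print(counts)

if counts.iloc[0] > 0.95:
    print("Severe imbalance: consider collecting more minority-class rows.")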

3. Visualize the embeddings

Use the notebook helper to view the 3D embedding space colored by your target:

# fm: the trained model / embedding space handle from your training session
notebook = featrix.notebook()
fig = notebook.embedding_space_3d(fm, color_by="your_target")
fig.show()

  • Classes form separate clusters → the embedding space is good; the issue is in the predictor
  • Everything collapsed together → the embedding space didn't learn useful structure

4. Check per-class metrics

Overall accuracy can be misleading. One class might have 0% recall (the model never predicts it).
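
scikit-learn's classification_report is a convenient way to see per-class precision and recall (assuming you have held-out labels and the corresponding predictions; the values below are placeholders):

from sklearn.metrics import classification_report

y_true = ["no", "no", "yes", "no", "yes"]  # held-out labels
y_pred = ["no", "no", "no", "no", "no"]    # model predictions for the same rows

print(classification_report(y_true, y_pred, zero_division=0))
# Here "yes" has 0% recall even though overall accuracy is 60%.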

5. Try more data

Embedding spaces generally improve with more data, especially for minority classes.


FAQ

How is this different from scikit-learn or XGBoost?

Featrix is end-to-end. It handles:

  • Feature engineering: 20+ encoding strategies, automatic type detection
  • Model architecture: Transformers with attention
  • Hyperparameter tuning: All values computed from your data
  • Deployment: Production API endpoint included

Traditional tools require you to make all these decisions yourself.

Do I need GPUs?

No. The Featrix API handles all compute on our servers. You just need Python (or any HTTP client) to call the API.

What languages can I use?

Any language that can make HTTP requests. The Python client is most convenient, but the REST API works from JavaScript, Java, Go, or anything else.

How does it handle missing values?

Each encoder has a learned "replacement embedding" for missing values. The model learns what missingness means in context—a missing value in one column might have a very different meaning than a missing value in another.
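
Conceptually, a learned replacement embedding looks something like this PyTorch sketch (illustrative of the idea only, not Featrix's actual encoder):

import torch
import torch.nn as nn

class ScalarEncoder(nn.Module):
    """Encode a scalar column; missing values map to a learned vector."""

    def __init__(self, dim: int = 128):
        super().__init__()
        self.proj = nn.Linear(1, dim)
        self.missing = nn.Parameter(torch.zeros(dim))  # learned "missing" embedding

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: shape (batch,), may contain NaN for missing values
        is_missing = torch.isnan(x)
        filled = torch.nan_to_num(x, nan=0.0).unsqueeze(-1)
        encoded = self.proj(filled)
        return torch.where(is_missing.unsqueeze(-1), self.missing.expand_as(encoded), encoded)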

How does it handle new/unseen categories?

The categorical encoder uses BERT semantic embeddings as a fallback. "Senior Software Engineer" works even if only "Software Engineer" was in training, because BERT understands they're semantically similar.
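
The idea can be illustrated with a generic sentence-embedding model (this shows semantic fallback in general, not Featrix's internal code; the model name is just a common public checkpoint):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

known = ["Software Engineer", "Data Scientist", "Product Manager"]
unseen = "Senior Software Engineer"

known_emb = model.encode(known, convert_to_tensor=True)
unseen_emb = model.encode(unseen, convert_to_tensor=True)

scores = util.cos_sim(unseen_emb, known_emb)[0]
print(known[int(scores.argmax())])  # -> "Software Engineer"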

Is my data secure?

Data is encrypted in transit (HTTPS) and at rest on our servers. Contact Featrix for SOC2 compliance documentation.

Can I delete my data?

Yes. Contact support to request full data deletion.

Can I export the trained model?

Currently, models are accessed via API. Enterprise customers can discuss self-hosted deployment options.

How long are models stored?

Contact Featrix for the current retention policy. Save your session IDs securely—they're needed to access your models.

Can I retrain with new data?

Currently, you create a new session with updated data. Incremental training is on the roadmap.

What's the maximum dataset size?

The system has been tested with millions of rows. For very large datasets (> 10 million rows), contact support for guidance on batching strategies.

Why does embedding space training take longer than predictor training?

Embedding space training learns relationships between all columns using self-supervised learning (no labels needed). This is computationally intensive. Predictor training just maps the already-computed embeddings to your target—a much simpler task.

What embedding dimension is used?

Default is 128 dimensions, automatically scaling to 256-512 for complex datasets. The 3D visualization projects the full embedding down to 3 dimensions.
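
Projecting a high-dimensional embedding down to 3D for plotting typically uses something like PCA; the sketch below shows the general idea with random placeholder data (the library's own projection method may differ):

import numpy as np
from sklearn.decomposition import PCA

embeddings = np.random.randn(1000, 128)      # placeholder: your 128-dim embeddings
coords_3d = PCA(n_components=3).fit_transform(embeddings)
print(coords_3d.shape)                       # (1000, 3)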