When to Use Featrix

Featrix is powerful, but it's not the right tool for every problem. This guide helps you decide when Featrix is a good fit and what to expect.

Featrix Excels When...

You Have Mixed Data Types

Numbers, categories, text, dates, URLs, emails, and JSON, all in the same dataset. Featrix has a specialized encoder for each type and learns how the columns interact. Traditional tools require you to encode each type manually, with a different technique for each.

You Have Many Columns

Anywhere from 10 to 500+ columns. The transformer architecture automatically discovers relationships between columns, and more columns give the model more context to learn from.

You Don't Have ML Expertise

All hyperparameters are computed from your data automatically. No learning rates to tune, no architectures to select, no regularization to configure. Upload data, get predictions.

You Need Results Fast

Training takes minutes to hours, not days or weeks. You get a production API endpoint immediately after training.

You're Doing Categorical Prediction

The semantic encoders handle unseen categories gracefully. "Senior Software Engineer" works even if only "Software Engineer" was in training, because the model understands they're semantically similar.

Consider Alternatives When...

You Have Very Little Data

Under 100 rows? Simple models such as logistic regression or decision trees usually work better, because neural networks need more data to learn from.

Dataset Size	Recommendation
< 100 rows	Use simple models (logistic regression, decision trees)
100-500 rows	Featrix works, but accuracy may be limited
500-10,000 rows	Excellent for Featrix
10,000+ rows	Ideal—more data means better embeddings
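The table can be read as a simple threshold rule. The sketch below is illustrative only: `recommend` is a hypothetical helper, not part of Featrix, and because adjacent ranges in the table share endpoints (500 appears in two rows), it assumes each range includes its lower bound.

```python
def recommend(n_rows: int) -> str:
    """Map a row count to the dataset-size guidance (hypothetical helper).

    Assumption: each range includes its lower bound, so 500 rows counts
    as "Excellent" and 10,000 rows counts as "Ideal".
    """
    if n_rows < 100:
        return "Use simple models (logistic regression, decision trees)"
    if n_rows < 500:
        return "Featrix works, but accuracy may be limited"
    if n_rows < 10_000:
        return "Excellent for Featrix"
    return "Ideal: more data means better embeddings"


for rows in (50, 250, 5_000, 50_000):
    print(f"{rows:>6} rows: {recommend(rows)}")
```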

You Have Audio or Video

For audio, use specialized audio models. For video, use video understanding models. Featrix is designed for tabular data and images.

Note: Image support is available in beta. Contact us for access.

You Need Pure Time Series Forecasting

For forecasting future values based purely on historical patterns (stock prices, weather), use ARIMA, Prophet, or LSTMs.

Note: Time series support is coming soon. Featrix already handles tabular data with temporal features (timestamps, dates) effectively—it just doesn't do sequence-to-sequence forecasting yet.

You Require Legal Explainability

If you need to explain exactly why a model made a decision (for regulatory compliance, legal discovery, etc.), linear models are more interpretable than neural networks: their coefficients can be read directly, and SHAP values can attribute individual predictions to features.

Your Problem Is Very Simple

A single feature predicting a single output? Linear regression is sufficient and more interpretable.

Performance Expectations

Training Time

The system targets a fixed number of optimizer updates, so smaller datasets need more epochs while larger datasets need fewer.

Dataset Size	Columns	Embedding Space Training	Predictor Training
500 rows	10	5-10 min	1-2 min
1,000 rows	20	5-15 min	1-3 min
10,000 rows	30	15-30 min	3-7 min
100,000 rows	40	45-90 min	8-15 min
500,000 rows	50	2-4 hours	15-30 min
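The fixed-update budget explains the shape of this table: each epoch performs one optimizer update per mini-batch, so the number of epochs is roughly the budget divided by (rows / batch size). The sketch below works through that arithmetic. `TARGET_UPDATES` and `BATCH_SIZE` are hypothetical placeholders, not Featrix's actual values, and the result is floored at one epoch for datasets whose single pass already exceeds the budget.

```python
import math

# Hypothetical values for illustration only; Featrix computes its own.
TARGET_UPDATES = 1_000
BATCH_SIZE = 128


def epochs_for(n_rows: int,
               target_updates: int = TARGET_UPDATES,
               batch_size: int = BATCH_SIZE) -> int:
    """Epochs needed to reach roughly `target_updates` optimizer steps."""
    # One optimizer update per mini-batch; a partial last batch still counts.
    updates_per_epoch = math.ceil(n_rows / batch_size)
    # Always train for at least one full pass over the data.
    return max(1, math.ceil(target_updates / updates_per_epoch))


for rows in (500, 10_000, 500_000):
    print(f"{rows:>7} rows -> {epochs_for(rows)} epochs")
```

Small datasets cycle through many epochs to reach the budget, while large ones reach it within a single pass.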

Factors that increase training time:

More columns
More unique categories per column (larger embedding tables)
JSON columns (require child embedding spaces)

Note: Text processing is fast thanks to aggressive caching of string embeddings.
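To see why high-cardinality columns cost more, consider a plain lookup-table embedding: it stores one learned vector per unique category, so its parameter count is the number of categories times the embedding width. The width (`EMBED_DIM = 64`) and the example columns below are hypothetical, and Featrix's own encoders may size their tables differently.

```python
EMBED_DIM = 64  # hypothetical embedding width, for illustration only


def embedding_params(unique_values: int, dim: int = EMBED_DIM) -> int:
    """Parameters in a lookup table with one learned vector per category."""
    return unique_values * dim


# Hypothetical columns and their cardinalities.
columns = {"country": 200, "product_sku": 50_000, "is_active": 2}

total = 0
for name, cardinality in columns.items():
    params = embedding_params(cardinality)
    total += params
    print(f"{name:<12} {cardinality:>7} categories -> {params:>10,} params")
print(f"total: {total:,} params")
```

In this toy example the single SKU column accounts for almost all of the parameters, which is the effect the "more unique categories per column" factor describes.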

Inference Latency

Method	Latency	Throughput
Single prediction	10-50ms	20-100 requests/sec
Batch of 100	200-500ms	200-500 records/sec
Batch of 1,000	1-3 sec	300-1,000 records/sec
Batch of 10,000	10-30 sec	300-1,000 records/sec
Similarity search (top 10)	50-200ms	depends on index size
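The throughput column follows from the latency column: records per second is the batch size divided by the latency of one call, which is why batching raises throughput even though each call takes longer. The sketch below reproduces that arithmetic and splits a record list into batches. `throughput` and `chunked` are hypothetical helpers, not part of the Featrix client, and the default batch size of 1,000 is an illustrative choice.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def throughput(batch_size: int, latency_s: float) -> float:
    """Records per second for one call that scores `batch_size` records."""
    return batch_size / latency_s


def chunked(records: Sequence[T], size: int = 1_000) -> Iterator[List[T]]:
    """Yield consecutive batches of at most `size` records."""
    for start in range(0, len(records), size):
        yield list(records[start:start + size])


# Bounds from the table: a batch of 100 at 500 ms and at 200 ms.
print(throughput(100, 0.5), throughput(100, 0.2))
# A single prediction at 50 ms.
print(throughput(1, 0.05))

# 2,500 records become three calls instead of 2,500 single predictions.
print([len(batch) for batch in chunked(list(range(2_500)))])
```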

Accuracy Expectations

Accuracy depends on data quality and inherent predictability:

Data Quality	Expected Accuracy
Clean, well-structured, strong signal	90-99%+
Real-world business data with some noise	80-95%
Messy data with many missing values	70-90%
Target is inherently unpredictable	50-70%

Class imbalance effects:

Imbalance Ratio	System Response
1:1 to 3:1	Normal training
3:1 to 10:1	Automatic class weights
10:1 to 100:1	FocalLoss with aggressive weights
> 100:1	Minority class may have poor recall
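The imbalance ratio is the count of the most common class divided by the count of the rarest. The sketch below computes it, maps it to the tiers in the table, and shows two standard remedies: inverse-frequency class weights and the usual focal loss, FL(p_t) = -α(1 - p_t)^γ log(p_t), which shrinks the loss on examples the model already classifies confidently. The tier boundaries come from the table (shared endpoints are assigned to the lower tier here); the weighting formula and the defaults α = 1, γ = 2 are common choices assumed for illustration, not Featrix's documented settings.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable


def imbalance_ratio(labels: Iterable[Hashable]) -> float:
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def response_tier(ratio: float) -> str:
    """Map a ratio to the table's tiers (shared endpoints go to the lower tier)."""
    if ratio <= 3:
        return "Normal training"
    if ratio <= 10:
        return "Automatic class weights"
    if ratio <= 100:
        return "FocalLoss with aggressive weights"
    return "Minority class may have poor recall"


def inverse_frequency_weights(labels: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = sum(counts.values()), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def focal_loss(p_true: float, alpha: float = 1.0, gamma: float = 2.0) -> float:
    """Focal loss for one example, given the predicted probability of its true class."""
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)


labels = ["no"] * 950 + ["yes"] * 50
ratio = imbalance_ratio(labels)  # 19.0
print(ratio, response_tier(ratio))
print(inverse_frequency_weights(labels))
# A confident correct prediction contributes far less than under cross-entropy.
print(focal_loss(0.9), -math.log(0.9))
```

With 950 "no" and 50 "yes" labels, the rare class receives a weight about 19 times larger than the common one, matching the imbalance ratio.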

Quick Decision Checklist

Question	Featrix Fit
Tabular data? (rows and columns)	Excellent
Images?	Supported (beta)
JSON columns?	Excellent (automatic nested handling)
Text columns?	Excellent (cached BERT embeddings)
URLs, emails, dates?	Excellent (specialized encoders)
Need embeddings for downstream use?	Yes, `get_embedding()` API
Multiple column types mixed together?	This is where Featrix shines
< 100 rows?	Use simpler models
100-500 rows?	Featrix works
500+ rows?	Ideal for Featrix
Audio/video?	Use specialized models