When to Use Featrix¶
Featrix is powerful, but it's not the right tool for every problem. This guide helps you decide when Featrix is a good fit and what to expect.
Featrix Excels When...¶
You Have Mixed Data Types¶
Numbers, categories, text, dates, URLs, emails, and JSON can all appear in the same dataset. Featrix has a specialized encoder for each type and learns how they interact. Traditional tools require you to encode each type manually, with a different technique for each.
You Have Many Columns¶
Anywhere from 10 to 500+ columns. The transformer architecture automatically discovers relationships between columns. More columns means more context for the model to learn from.
You Don't Have ML Expertise¶
All hyperparameters are computed from your data automatically. No learning rates to tune, no architectures to select, no regularization to configure. Upload data, get predictions.
You Need Results Fast¶
Training takes minutes to hours, not days or weeks. You get a production API endpoint immediately after training.
You're Doing Categorical Prediction¶
The semantic encoders handle unseen categories gracefully. "Senior Software Engineer" works even if only "Software Engineer" was in training, because the model understands they're semantically similar.
Consider Alternatives When...¶
You Have Very Little Data¶
Under 100 rows? Simple logistic regression or decision trees work better. Neural networks need data to learn from.
| Dataset Size | Recommendation |
|---|---|
| < 100 rows | Use simple models (logistic regression, decision trees) |
| 100-500 rows | Featrix works, but accuracy may be limited |
| 500-10,000 rows | Excellent for Featrix |
| 10,000+ rows | Ideal—more data means better embeddings |
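As a quick illustration, the table above can be written as a small lookup. This is only a sketch of the guidance, not part of any Featrix API: the function name `recommend_by_row_count` is hypothetical, and because the table's ranges share their endpoints (500 and 10,000 each appear in two rows), the sketch assumes a boundary value belongs to the larger tier.

```python
def recommend_by_row_count(n_rows: int) -> str:
    """Return the guidance from the dataset-size table for a given row count.

    Assumption: shared boundary values (500, 10,000) fall into the larger tier.
    """
    if n_rows < 0:
        raise ValueError("n_rows must be non-negative")
    if n_rows < 100:
        return "Use simple models (logistic regression, decision trees)"
    if n_rows < 500:
        return "Featrix works, but accuracy may be limited"
    if n_rows < 10_000:
        return "Excellent for Featrix"
    return "Ideal: more data means better embeddings"


if __name__ == "__main__":
    for n in (50, 250, 5_000, 250_000):
        print(f"{n:>7} rows -> {recommend_by_row_count(n)}")
```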
You Have Audio or Video¶
For audio, use specialized audio models. For video, use video understanding models. Featrix is designed for tabular data and images.
Note: Image support is available in beta. Contact us for access.
You Need Pure Time Series Forecasting¶
For forecasting future values based purely on historical patterns (stock prices, weather), use ARIMA, Prophet, or LSTMs.
Note: Time series support is coming soon. Featrix already handles tabular data with temporal features (timestamps, dates) effectively—it just doesn't do sequence-to-sequence forecasting yet.
You Require Legal Explainability¶
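If you need to explain exactly why a model made a decision (for regulatory compliance, legal discovery, etc.), linear models are more interpretable than neural networks: their coefficients can be read directly, and tools such as SHAP can attribute each prediction to individual features.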
Your Problem is Very Simple¶
A single feature predicting a single output? Linear regression is sufficient and more interpretable.
Performance Expectations¶
Training Time¶
The system targets a fixed number of optimizer updates, so smaller datasets need more epochs while larger datasets need fewer.
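The arithmetic behind a fixed update budget can be sketched as follows. The batch size and target update count here are hypothetical placeholders chosen for illustration; Featrix computes its own values from your data, and this is not its actual implementation.

```python
import math


def epochs_for_update_budget(n_rows: int, batch_size: int, target_updates: int) -> int:
    """Number of epochs needed to reach roughly `target_updates` optimizer steps.

    One epoch performs ceil(n_rows / batch_size) optimizer updates, so fewer
    rows means fewer updates per epoch and therefore more epochs overall.
    """
    if n_rows <= 0 or batch_size <= 0 or target_updates <= 0:
        raise ValueError("all arguments must be positive")
    steps_per_epoch = math.ceil(n_rows / batch_size)
    return math.ceil(target_updates / steps_per_epoch)


if __name__ == "__main__":
    # Hypothetical settings: batch size 256, budget of 10,000 updates.
    for n_rows in (500, 10_000, 500_000):
        epochs = epochs_for_update_budget(n_rows, batch_size=256, target_updates=10_000)
        print(f"{n_rows:>7} rows -> {epochs} epochs")
```

With these placeholder settings, a 500-row dataset needs thousands of short epochs while a 500,000-row dataset needs only a handful of long ones, which is why training time grows far more slowly than row count.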
| Dataset Size | Columns | Embedding Space | Predictor |
|---|---|---|---|
| 500 rows | 10 | 5-10 min | 1-2 min |
| 1,000 rows | 20 | 5-15 min | 1-3 min |
| 10,000 rows | 30 | 15-30 min | 3-7 min |
| 100,000 rows | 40 | 45-90 min | 8-15 min |
| 500,000 rows | 50 | 2-4 hours | 15-30 min |
Factors that increase training time:
- More columns
- More unique categories per column (larger embedding tables)
- JSON columns (require child embedding spaces)
Note: Text processing is fast thanks to aggressive caching of string embeddings.
Inference Latency¶
| Method | Latency | Throughput |
|---|---|---|
| Single prediction | 10-50ms | 20-100 requests/sec |
| Batch of 100 | 200-500ms | 200-500 records/sec |
| Batch of 1,000 | 1-3 sec | 300-1,000 records/sec |
| Batch of 10,000 | 10-30 sec | 300-1,000 records/sec |
| Similarity search (top 10) | 50-200ms | depends on index size |
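The throughput column follows from the latency column if you assume calls are issued one after another with no concurrency: throughput is batch size divided by latency. The sketch below reproduces that arithmetic; the helper names are hypothetical, and real throughput will depend on network overhead and how many requests you run in parallel.

```python
def records_per_second(batch_size: int, latency_seconds: float) -> float:
    """Sequential throughput: records processed per second for one call."""
    if batch_size <= 0 or latency_seconds <= 0:
        raise ValueError("batch_size and latency_seconds must be positive")
    return batch_size / latency_seconds


def throughput_range(batch_size: int, fastest_s: float, slowest_s: float) -> tuple[float, float]:
    """Return (low, high) throughput for a latency range given in seconds."""
    return (
        records_per_second(batch_size, slowest_s),  # slowest call -> lowest throughput
        records_per_second(batch_size, fastest_s),  # fastest call -> highest throughput
    )


if __name__ == "__main__":
    # Latency ranges taken from the table above, converted to seconds.
    rows = [(1, 0.010, 0.050), (100, 0.2, 0.5), (1_000, 1.0, 3.0), (10_000, 10.0, 30.0)]
    for batch_size, fastest, slowest in rows:
        low, high = throughput_range(batch_size, fastest, slowest)
        print(f"batch of {batch_size:>6}: {low:.0f}-{high:.0f} records/sec")
```

The larger batches work out to roughly 333-1,000 records/sec, which the table rounds to 300-1,000.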
Accuracy Expectations¶
Accuracy depends on data quality and inherent predictability:
| Data Quality | Expected Accuracy |
|---|---|
| Clean, well-structured, strong signal | 90-99%+ |
| Real-world business data with some noise | 80-95% |
| Messy data with many missing values | 70-90% |
| Target is inherently unpredictable | 50-70% |
Class imbalance effects:
| Imbalance Ratio | System Response |
|---|---|
| 1:1 to 3:1 | Normal training |
| 3:1 to 10:1 | Automatic class weights |
| 10:1 to 100:1 | FocalLoss with aggressive weights |
| > 100:1 | Minority class may have poor recall |
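To see which row of the imbalance table applies to your data, you can compute the majority-to-minority ratio from your label counts. The sketch below is a plain lookup of the documented tiers, not Featrix's internal logic; the function name is hypothetical, and since adjacent ranges share their endpoints, it assumes a boundary value (3:1, 10:1, 100:1) belongs to the milder tier.

```python
def imbalance_response(majority_count: int, minority_count: int) -> str:
    """Look up the documented system response for a binary class imbalance.

    Assumption: boundary ratios (3, 10, 100) are assigned to the lower tier,
    consistent with the table's final row being strictly "> 100:1".
    """
    if minority_count <= 0:
        raise ValueError("minority_count must be positive")
    if majority_count < minority_count:
        raise ValueError("majority_count must be >= minority_count")
    ratio = majority_count / minority_count
    if ratio <= 3:
        return "Normal training"
    if ratio <= 10:
        return "Automatic class weights"
    if ratio <= 100:
        return "FocalLoss with aggressive weights"
    return "Minority class may have poor recall"


if __name__ == "__main__":
    # Example label counts: (majority, minority)
    for counts in ((500, 500), (900, 100), (9_900, 100), (100_000, 100)):
        print(counts, "->", imbalance_response(*counts))
```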
Quick Decision Checklist¶
| Question | Featrix Fit |
|---|---|
| Tabular data? (rows and columns) | Excellent |
| Images? | Supported (beta) |
| JSON columns? | Excellent (automatic nested handling) |
| Text columns? | Excellent (cached BERT embeddings) |
| URLs, emails, dates? | Excellent (specialized encoders) |
| Need embeddings for downstream use? | Yes, via the get_embedding() API |
| Multiple column types mixed together? | This is where Featrix shines |
| < 100 rows? | Use simpler models |
| 100-500 rows? | Featrix works, but accuracy may be limited |
| 500+ rows? | Ideal for Featrix |
| Audio/video? | Use specialized models |