
When to Use Featrix

Featrix is powerful, but it's not the right tool for every problem. This guide helps you decide when Featrix is a good fit and what to expect.

Featrix Excels When...

You Have Mixed Data Types

Numbers, categories, text, dates, URLs, emails, JSON—all in the same dataset. Featrix has specialized encoders for each type and learns how they interact. Traditional tools require you to manually encode each type differently.

You Have Many Columns

Anywhere from 10 to 500+ columns. The transformer architecture automatically discovers relationships between columns. More columns means more context for the model to learn from.

You Don't Have ML Expertise

All hyperparameters are computed from your data automatically. No learning rates to tune, no architectures to select, no regularization to configure. Upload data, get predictions.

You Need Results Fast

Training takes minutes to hours, not days or weeks. You get a production API endpoint immediately after training.

You're Doing Categorical Prediction

The semantic encoders handle unseen categories gracefully. "Senior Software Engineer" works even if only "Software Engineer" was in training, because the model understands they're semantically similar.

Consider Alternatives When...

You Have Very Little Data

Under 100 rows? Simple logistic regression or decision trees work better. Neural networks need data to learn from.

Dataset Size | Recommendation
< 100 rows | Use simple models (logistic regression, decision trees)
100-500 rows | Featrix works, but accuracy may be limited
500-10,000 rows | Excellent for Featrix
10,000+ rows | Ideal: more data means better embeddings
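
As a rough illustration, the size guidance above can be written as a small lookup. This is only a sketch: the function name is hypothetical and not part of Featrix, and because the table's ranges share their endpoints (500 appears in two rows), the code assumes each boundary value belongs to the higher tier.

```python
def size_recommendation(n_rows: int) -> str:
    """Map a row count to the dataset-size guidance in this guide.

    Assumption: boundary values (100, 500, 10,000) fall into the higher tier.
    """
    if n_rows < 0:
        raise ValueError("n_rows must be non-negative")
    if n_rows < 100:
        return "Use simple models (logistic regression, decision trees)"
    if n_rows < 500:
        return "Featrix works, but accuracy may be limited"
    if n_rows < 10_000:
        return "Excellent for Featrix"
    return "Ideal: more data means better embeddings"


if __name__ == "__main__":
    for rows in (50, 250, 5_000, 250_000):
        print(f"{rows:>7} rows -> {size_recommendation(rows)}")
```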

You Have Audio or Video

For audio, use specialized audio models. For video, use video understanding models. Featrix is designed for tabular data and images.

Note: Image support is available in beta. Contact us for access.

You Need Pure Time Series Forecasting

For forecasting future values based purely on historical patterns (stock prices, weather), use ARIMA, Prophet, or LSTMs.

Note: Time series support is coming soon. Featrix already handles tabular data with temporal features (timestamps, dates) effectively—it just doesn't do sequence-to-sequence forecasting yet.

You Need Full Explainability

If you need to explain exactly why a model made a decision (for regulatory compliance, legal discovery, etc.), linear models, optionally paired with SHAP values, are easier to interpret than neural networks.

Your Problem is Very Simple

A single feature predicting a single output? Linear regression is sufficient and more interpretable.

Performance Expectations

Training Time

The system targets a fixed number of optimizer updates, so smaller datasets need more epochs while larger datasets need fewer.
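
The arithmetic behind this can be sketched as follows. The numbers are purely illustrative: `batch_size` and `target_updates` are hypothetical values chosen for the example, not Featrix's actual settings, and the function is not part of any Featrix API.

```python
import math


def epochs_for_budget(n_rows: int, batch_size: int, target_updates: int) -> int:
    """Epochs needed to reach roughly `target_updates` optimizer steps.

    One optimizer update is assumed per mini-batch, so one epoch
    contributes ceil(n_rows / batch_size) updates.
    """
    if n_rows <= 0 or batch_size <= 0 or target_updates <= 0:
        raise ValueError("all arguments must be positive")
    steps_per_epoch = math.ceil(n_rows / batch_size)
    return max(1, math.ceil(target_updates / steps_per_epoch))


if __name__ == "__main__":
    # Hypothetical budget of 10,000 updates with batches of 64 rows.
    for rows in (500, 10_000, 100_000):
        print(rows, "rows ->", epochs_for_budget(rows, 64, 10_000), "epochs")
```

With these assumed values, 500 rows need 1,250 epochs while 100,000 rows need only 7, which is why training time grows much more slowly than dataset size.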

Dataset Size | Columns | Embedding Space (training time) | Predictor (training time)
500 rows | 10 | 5-10 min | 1-2 min
1,000 rows | 20 | 5-15 min | 1-3 min
10,000 rows | 30 | 15-30 min | 3-7 min
100,000 rows | 40 | 45-90 min | 8-15 min
500,000 rows | 50 | 2-4 hours | 15-30 min

Factors that increase training time:

  • More columns
  • More unique categories per column (larger embedding tables)
  • JSON columns (require child embedding spaces)

Note: Text processing is fast thanks to aggressive caching of string embeddings.

Inference Latency

Method | Latency | Throughput
Single prediction | 10-50ms | 20-100 req/sec
Batch of 100 | 200-500ms | 200-500 rec/sec
Batch of 1,000 | 1-3 sec | 300-1000 rec/sec
Batch of 10,000 | 10-30 sec | 300-1000 rec/sec
Similarity search (top 10) | 50-200ms | depends on index size
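
The throughput column follows from the latency column: records per second is the batch size divided by the latency. The helper below is a sketch for doing that arithmetic on your own measurements. It assumes requests are sent one after another with no concurrency, and the function name is hypothetical.

```python
def throughput_range(batch_size: int, latency_min_s: float, latency_max_s: float):
    """Return (low, high) records per second for a batch latency range.

    The slowest latency gives the lowest throughput and vice versa.
    """
    if batch_size <= 0 or latency_min_s <= 0 or latency_max_s < latency_min_s:
        raise ValueError("need batch_size > 0 and 0 < latency_min_s <= latency_max_s")
    return batch_size / latency_max_s, batch_size / latency_min_s


if __name__ == "__main__":
    # Batch of 100 records at 200-500 ms, as listed in the table above.
    low, high = throughput_range(100, 0.2, 0.5)
    print(f"{low:.0f}-{high:.0f} rec/sec")
```

The same calculation shows why batching pays off: a single prediction at 10-50 ms tops out near 100 requests per second, while a batch of 1,000 at 1-3 seconds reaches several hundred records per second.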

Accuracy Expectations

Accuracy depends on data quality and inherent predictability:

Data Quality | Expected Accuracy
Clean, well-structured, strong signal | 90-99%+
Real-world business data with some noise | 80-95%
Messy data with many missing values | 70-90%
Target is inherently unpredictable | 50-70%

Class imbalance effects:

Imbalance Ratio | System Response
1:1 to 3:1 | Normal training
3:1 to 10:1 | Automatic class weights
10:1 to 100:1 | FocalLoss with aggressive weights
> 100:1 | Minority class may have poor recall

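To see which row of the class imbalance table a dataset falls into, you can compute the ratio from the target column before uploading. The sketch below is not Featrix code and makes two assumptions: the ratio is taken as the largest class count over the smallest (the text does not say how multi-class targets are measured), and each boundary value is assigned to the lower tier.

```python
from collections import Counter


def imbalance_ratio(labels) -> float:
    """Largest class count divided by smallest class count."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    return max(counts.values()) / min(counts.values())


def expected_response(ratio: float) -> str:
    """Look up the system response listed in the class imbalance table.

    Assumption: boundary ratios (3, 10, 100) belong to the lower tier.
    """
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    if ratio <= 3:
        return "Normal training"
    if ratio <= 10:
        return "Automatic class weights"
    if ratio <= 100:
        return "FocalLoss with aggressive weights"
    return "Minority class may have poor recall"


if __name__ == "__main__":
    labels = ["no"] * 90 + ["yes"] * 10  # toy target column, 9:1
    ratio = imbalance_ratio(labels)
    print(f"{ratio:.0f}:1 -> {expected_response(ratio)}")
```
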
Quick Decision Checklist

Question | Featrix Fit
Tabular data? (rows and columns) | Excellent
Images? | Supported (beta)
JSON columns? | Excellent (automatic nested handling)
Text columns? | Excellent (cached BERT embeddings)
URLs, emails, dates? | Excellent (specialized encoders)
Need embeddings for downstream use? | Yes, get_embedding() API
Multiple column types mixed together? | This is where Featrix shines
< 100 rows? | Use simpler models
100-500 rows? | Featrix works, but accuracy may be limited
500+ rows? | Excellent for Featrix (ideal at 10,000+)
Audio/video? | Use specialized models