Technical Deep Dives

Under the hood, Featrix solves dozens of hard problems that would take months to solve and debug yourself. This section explains what we do and why it matters.

Chapters

| # | Document | What It Covers |
|----|----------|----------------|
| 01 | How Foundational Models Work | The complete architecture: how Featrix learns from your data without labels |
| 02 | Limited Labels & Background Data | How to get great results even when only 5% of your data has labels |
| 03 | Automatic Type Handling | How we handle mixed types, messy data, and columns that don't fit neat categories |
| 04 | Column Interactions | How different column types (numeric, categorical, text) interact and influence each other |
| 05 | Prediction Safety | Guardrails, calibration, and warnings that prevent silent failures |
| 06 | Cost-Sensitive Classification | Bayes-optimal thresholds when false positives and false negatives have different costs |
| 07 | Extending Trained Models | Adding new columns to an existing model without retraining from scratch |
| 08 | List Embeddings | How we handle multi-valued categorical features (tags, categories, skills) |
| 09 | Deep Relationship Discovery | Dates, geography, distances, automatic enrichment, nested JSON, linking |
| 10 | Training Safety | How Featrix detects and recovers from training failures automatically |

Why This Matters

Traditional ML requires you to make hundreds of decisions—and most teams get them wrong. Featrix makes these decisions automatically, using techniques that would take months to implement and tune yourself.

Examples of what we handle:

  • Numeric columns with outliers: We try 20+ normalization strategies and pick the best one
  • Categories with thousands of values: Learned embeddings that capture semantic relationships
  • Text fields: Semantic encoders that understand "cancelled" and "canceled" are the same
  • Missing data: Learned null tokens that preserve uncertainty instead of imputing fake values
  • Class imbalance: Automatic detection and cost-weighted training
  • Mixed-type columns: Columns that are sometimes numbers, sometimes text—we handle both
  • Related columns: Address components, lat/long pairs—we detect and encode them together
  • Cost-sensitive decisions: When a false negative costs 10x more than a false positive, we adjust thresholds automatically
  • Limited labels: Use all your data, not just the labeled portion
  • Training failures: Gradient explosions, embedding collapse, memory exhaustion—all handled automatically
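The cost-sensitive threshold adjustment mentioned above follows from basic decision theory: predict positive when the expected cost of a false negative outweighs that of a false positive, i.e. when p > c_fp / (c_fp + c_fn). A minimal sketch of that arithmetic (the function name is illustrative, not Featrix's API):

```python
def bayes_optimal_threshold(cost_fp: float, cost_fn: float) -> float:
    """Probability threshold that minimizes expected misclassification cost.

    Predicting positive risks cost_fp with probability (1 - p);
    predicting negative risks cost_fn with probability p.
    Positive is the cheaper call when p * cost_fn > (1 - p) * cost_fp,
    which rearranges to p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)

# When a false negative costs 10x a false positive (the example above),
# the optimal cutoff drops well below the default 0.5:
print(round(bayes_optimal_threshold(cost_fp=1.0, cost_fn=10.0), 4))  # 0.0909

# Symmetric costs recover the familiar 0.5 threshold:
print(bayes_optimal_threshold(1.0, 1.0))  # 0.5
```

With a 10:1 cost ratio, any prediction above roughly 9% probability of the positive class is worth flagging, which is why a fixed 0.5 cutoff silently loses money on imbalanced-cost problems.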

For Different Audiences

Data Scientists: Deep technical details on architecture, loss functions, and training dynamics.

ML Engineers: Production considerations, performance characteristics, and integration patterns.

Technical Evaluators: Evidence that Featrix solves real problems, not just demos.

Rendered Versions

For offline reading:

  • Featrix_Sphere_Technical_Reference.pdf
  • Featrix_Sphere_Technical_Reference.html