Technical Deep Dives

Under the hood, Featrix solves dozens of hard problems that would take months to solve and debug yourself. This section explains what we do and why it matters.

Chapters

| # | Document | What It Covers |
|----|----------|----------------|
| 01 | How Foundational Models Work | The complete architecture: how Featrix learns from your data without labels |
| 02 | Limited Labels & Background Data | How to get great results even when only 5% of your data has labels |
| 03 | Automatic Type Handling | How we handle mixed types, messy data, and columns that don't fit neat categories |
| 04 | Column Interactions | How different column types (numeric, categorical, text) interact and influence each other |
| 05 | Prediction Safety | Guardrails, calibration, and warnings that prevent silent failures |
| 06 | Cost-Sensitive Classification | Bayes-optimal thresholds when false positives and false negatives have different costs |
| 07 | Extending Trained Models | Adding new columns to an existing model without retraining from scratch |
| 08 | List Embeddings | How we handle multi-valued categorical features (tags, categories, skills) |
| 09 | Deep Relationship Discovery | Dates, geography, distances, automatic enrichment, nested JSON, linking |
| 10 | Training Safety | How Featrix detects and recovers from training failures automatically |

Why This Matters

Traditional ML requires you to make hundreds of decisions—and most teams get them wrong. Featrix makes these decisions automatically, using techniques that would take months to implement and tune yourself.

Examples of what we handle:

  • Numeric columns with outliers: We try 20+ normalization strategies and pick the best one
  • Categories with thousands of values: Learned embeddings that capture semantic relationships
  • Text fields: Semantic encoders that understand "cancelled" and "canceled" are the same
  • Missing data: Learned null tokens that preserve uncertainty instead of imputing fake values
  • Class imbalance: Automatic detection and cost-weighted training
  • Mixed-type columns: Columns that are sometimes numbers, sometimes text—we handle both
  • Related columns: Address components, lat/long pairs—we detect and encode them together
  • Cost-sensitive decisions: When a false negative costs 10x more than a false positive, we adjust thresholds automatically
  • Limited labels: Use all your data, not just the labeled portion
  • Training failures: Gradient explosions, embedding collapse, memory exhaustion—all handled automatically
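The cost-sensitive threshold adjustment mentioned above follows from basic decision theory: predict positive when the expected cost of a false negative outweighs that of a false positive, i.e. when p > c_fp / (c_fp + c_fn). A minimal sketch of that arithmetic (the function name is illustrative, not Featrix's API):

```python
def bayes_optimal_threshold(cost_fp: float, cost_fn: float) -> float:
    """Probability threshold that minimizes expected misclassification cost.

    Predicting positive risks cost_fp with probability (1 - p);
    predicting negative risks cost_fn with probability p.
    Positive is the cheaper call when p * cost_fn > (1 - p) * cost_fp,
    which rearranges to p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)

# When a false negative costs 10x a false positive (the example above),
# the optimal cutoff drops well below the default 0.5:
print(round(bayes_optimal_threshold(cost_fp=1.0, cost_fn=10.0), 4))  # 0.0909

# Symmetric costs recover the familiar 0.5 threshold:
print(bayes_optimal_threshold(1.0, 1.0))  # 0.5
```

With a 10:1 cost ratio, any prediction above roughly 9% probability of the positive class is worth flagging, which is why a fixed 0.5 cutoff silently loses money on imbalanced-cost problems.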

For Different Audiences

Data Scientists: Deep technical details on architecture, loss functions, and training dynamics.

ML Engineers: Production considerations, performance characteristics, and integration patterns.

Technical Evaluators: Evidence that Featrix solves real problems, not just demos.

Rendered Versions

For offline reading:

  • Featrix_Sphere_Technical_Reference.pdf
  • Featrix_Sphere_Technical_Reference.html