Featrix Sphere Model Card JSON Specification

This document describes the structure and meaning of the model card JSON returned by the Featrix Sphere API. Model cards are automatically generated during training and can be retrieved via GET /session/{id}/model_card.

Rendering Model Cards

To render model cards as HTML reports, use the Featrix Model Card Renderer:

```bash
pip install featrix-model-card

# Render a model card JSON file to HTML
featrix-model-card render model_card.json -o report.html
```

Overview

Model cards provide comprehensive metadata about trained models, including training configuration, performance metrics, data characteristics, and quality assessments. They are designed to be human-readable and suitable for rendering in UI components.

JSON Structure

Root Level Fields

The model card JSON contains the following top-level sections:

  • model_identification - Basic model metadata
  • training_dataset - Information about the training data
  • feature_inventory - Detailed information about each input feature
  • training_configuration - Hyperparameters and training settings
  • training_metrics - Performance metrics from training
  • model_architecture - Neural network structure details
  • model_quality - Quality assessments and warnings
  • technical_details - Technical implementation details
  • provenance - Creation metadata and timing
  • column_statistics - Per-column performance statistics (Embedding Space only)
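
As a quick sanity check when consuming the API response, the top-level sections above can be verified programmatically. This is a minimal sketch; the example card is hypothetical, and note that some sections (e.g. column_statistics) are legitimately absent for some model types.

```python
# Sketch: report which top-level model card sections are missing.
# Section names come from this spec; the sample card is hypothetical.
import json

EXPECTED_SECTIONS = [
    "model_identification", "training_dataset", "feature_inventory",
    "training_configuration", "training_metrics", "model_architecture",
    "model_quality", "technical_details", "provenance", "column_statistics",
]

def missing_sections(card: dict) -> list:
    # column_statistics is Embedding Space only, so its absence may be expected
    return [s for s in EXPECTED_SECTIONS if s not in card]

card = json.loads('{"model_identification": {"name": "demo"}, "training_dataset": {}}')
print(missing_sections(card))
```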

Section: model_identification

Purpose: Basic identifying information about the model.

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| session_id | string | Unique session identifier for this training run | "public-alphafreight-mini-8c482fa5-1304-442d-8875-4263d5bf79d6" |
| job_id | string | Unique job identifier within the session | "cadab2-20251118-010809" |
| name | string | Human-readable model name | "alphafreight-mini" |
| target_column | string \| null | Name of the target column being predicted (Single Predictor only) | "has_fuel_card_Comdata" |
| target_column_type | string \| null | Type of target: "set" (classification) or "scalar" (regression) (Single Predictor only) | "set" |
| compute_cluster | string | Compute node where training occurred | "BURRITO" |
| training_date | string | Date of training completion (YYYY-MM-DD) | "2025-11-18" |
| status | string | Training status: "DONE", "TRAINING", "FAILED" | "DONE" |
| model_type | string | Type of model: "Embedding Space" or "Single Predictor" | "Single Predictor" |
| framework | string | Framework version string | "FeatrixSphere v0.2.968" |

Section: training_dataset

Purpose: Information about the dataset used for training.

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| train_rows | integer | Number of training samples | 431 |
| val_rows | integer | Number of validation samples | 108 |
| total_rows | integer | Total samples (train + val) | 539 |
| total_features | integer | Number of input features | 15 |
| feature_names | array[string] | List of all input feature column names | ["fleet_size", "annual_revenue", ...] |
| target_column | string \| null | Target column name (Single Predictor only) | "has_fuel_card_Comdata" |

Section: feature_inventory

Purpose: Detailed information about each input feature/column.

Type: Array of feature objects. Each feature object contains:

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| name | string | Column/feature name | "fleet_size" |
| type | string | Data type: "scalar", "set", "free_string", "json" | "scalar" |
| encoder_type | string | Encoder class name | "ScalarCodec" |
| column_importance | object | Column importance metadata (weight, reason, description) | See below |
| unique_values | integer \| null | Number of unique values (set types only) | 8 |
| sample_values | array \| null | Sample of unique values (set types only, max 5) | ["long_haul", "local_delivery", ...] |
| statistics | object \| null | Statistical summary (scalar types only) | See below |

column_importance Object

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| weight | float | Importance weight assigned to the column (0.0 = excluded, 1.0 = normal) | 0.0 or 1.0 |
| reason | string | Reason for importance assignment | See below |
| description | string | Human-readable explanation of importance decision | "Column contains random/meaningless strings..." |
| confidence | float \| null | Confidence score for detection (random columns only) | 0.97 |
| unique_ratio | float \| null | Ratio of unique values (random columns only) | 0.98 |
| semantic_similarity | float \| null | Average semantic similarity score (random columns only) | 0.15 |
| average_loss | float \| null | Average reconstruction loss (pruned columns only) | 2.45 |
| pruning_method | string \| null | Method used for pruning (pruned columns only) | "progressive_worst_performers" |

reason Values

| Reason | Weight | When Applied | Description |
| --- | --- | --- | --- |
| "included_in_training" | 1.0 | Default | Column is included in model training with normal importance |
| "random_strings_detected" | 0.0 | At initialization | Column contains random/meaningless strings (UUIDs, hashes, transaction IDs) detected before training |
| "pruned_during_training" | 0.0 | During training | Column was dynamically pruned during training due to poor performance (high reconstruction loss) |

Note:

  • Columns with weight: 0.0 contribute zero information to the model
  • Random columns ("random_strings_detected") are detected at initialization and never used
  • Pruned columns ("pruned_during_training") are disabled during training at 10% and 20% progress milestones based on worst reconstruction losses
  • The model creates minimal zero-output encoders for excluded columns to maintain compatibility with the dataset schema
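
A consumer of the card can use these weights to separate active features from excluded ones. A minimal sketch, with a hypothetical feature_inventory (the default weight of 1.0 for a missing column_importance object is an assumption):

```python
# Sketch: split a feature_inventory list into included vs. excluded columns
# using column_importance.weight. The sample features are hypothetical.
def partition_features(feature_inventory):
    included, excluded = [], []
    for feat in feature_inventory:
        # Assumption: treat a missing column_importance as weight 1.0
        weight = feat.get("column_importance", {}).get("weight", 1.0)
        (included if weight > 0.0 else excluded).append(feat["name"])
    return included, excluded

features = [
    {"name": "fleet_size", "column_importance": {"weight": 1.0, "reason": "included_in_training"}},
    {"name": "txn_id", "column_importance": {"weight": 0.0, "reason": "random_strings_detected"}},
]
print(partition_features(features))  # (['fleet_size'], ['txn_id'])
```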

statistics Object (for scalar types)

| Field | Type | Description |
| --- | --- | --- |
| min | float | Minimum value in dataset |
| max | float | Maximum value in dataset |
| mean | float | Mean/average value |
| std | float | Standard deviation |
| median | float | Median value |
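
These fields correspond to standard summary statistics and can be reproduced directly from the raw column values, e.g. with Python's statistics module. One caveat: the spec does not state whether std is a sample or population standard deviation, so this sketch assumes sample.

```python
# Sketch: compute a statistics object for a scalar column. Values are
# illustrative; sample (not population) standard deviation is an assumption.
import statistics

def scalar_statistics(values):
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "median": statistics.median(values),
    }

print(scalar_statistics([10.0, 20.0, 30.0, 40.0]))
```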

Section: training_configuration

Purpose: Hyperparameters and training settings.

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| epochs_total | integer | Total number of training epochs | 32 |
| best_epoch | integer | Epoch with best validation performance | 28 |
| d_model | integer | Embedding dimension size | 512 |
| batch_size | integer \| null | Training batch size | 64 |
| learning_rate | float \| null | Learning rate used | 0.001 |
| optimizer | string | Optimizer name (typically "Adam") | "Adam" |
| dropout_schedule | object \| null | Dropout configuration (Embedding Space only) | See below |

dropout_schedule Object (Embedding Space only)

| Field | Type | Description |
| --- | --- | --- |
| enabled | boolean | Whether dropout was used |
| initial | float | Initial dropout rate |
| final | float | Final dropout rate |

Section: training_metrics

Purpose: Performance metrics from training.

For Single Predictors:

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| best_epoch | object | Metrics from the best epoch | See below |
| classification_metrics | object \| null | Classification metrics (set targets) | See below |
| optimal_threshold | object \| null | Optimal threshold info (binary classification) | See below |
| argmax_metrics | object \| null | Metrics using argmax prediction | See below |

best_epoch Object

| Field | Type | Description |
| --- | --- | --- |
| epoch | integer | Epoch number |
| validation_loss | float | Validation loss at this epoch |
| train_loss | float | Training loss at this epoch |

classification_metrics Object

| Field | Type | Description | Range |
| --- | --- | --- | --- |
| accuracy | float \| null | Classification accuracy | 0.0 - 1.0 |
| precision | float \| null | Precision score | 0.0 - 1.0 |
| recall | float \| null | Recall score | 0.0 - 1.0 |
| f1 | float \| null | F1 score | 0.0 - 1.0 |
| auc | float \| null | Area Under ROC Curve | 0.0 - 1.0 |
| is_binary | boolean | Whether this is binary classification | true/false |
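
For reference, the binary versions of these metrics relate to confusion-matrix counts in the usual way. A minimal sketch with hypothetical counts (this is the standard definition, not Featrix-specific code):

```python
# Sketch: binary classification metrics from confusion-matrix counts.
# tp/fp/fn/tn values are hypothetical.
def binary_metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    f1 = (2 * precision * recall / (precision + recall)
          if precision and recall else None)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

print(binary_metrics(tp=40, fp=10, fn=10, tn=40))
```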

optimal_threshold Object (Binary Classification)

| Field | Type | Description |
| --- | --- | --- |
| optimal_threshold | float | Threshold that maximizes F1 score |
| pos_label | string \| null | Label considered "positive" |
| optimal_threshold_f1 | float | F1 score at optimal threshold |
| accuracy_at_optimal_threshold | float | Accuracy at optimal threshold |
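
At inference time, applying this threshold to the positive-class probability can beat the default 0.5 cut. A minimal sketch with hypothetical probabilities and labels; whether the comparison should be >= or > is not stated in this spec, so >= is an assumption:

```python
# Sketch: apply optimal_threshold to positive-class probabilities.
# Probabilities, threshold, and labels are hypothetical; >= is an assumption.
def predict_with_threshold(pos_probs, optimal_threshold, pos_label, neg_label):
    return [pos_label if p >= optimal_threshold else neg_label for p in pos_probs]

labels = predict_with_threshold([0.10, 0.35, 0.80], 0.30, "true", "false")
print(labels)  # ['false', 'true', 'true']
```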

argmax_metrics Object (Multi-class)

| Field | Type | Description |
| --- | --- | --- |
| accuracy | float | Accuracy using argmax prediction |
| precision | float | Precision using argmax |
| recall | float | Recall using argmax |
| f1 | float | F1 score using argmax |

For Embedding Spaces:

| Field | Type | Description |
| --- | --- | --- |
| best_epoch | object | Metrics from best epoch (see below) |
| final_epoch | object | Metrics from final epoch (see below) |
| loss_progression | object | Training improvement metrics (see below) |

best_epoch Object (Embedding Space)

| Field | Type | Description |
| --- | --- | --- |
| epoch | integer | Epoch number |
| train_loss | float | Training loss |
| val_loss | float | Validation loss |
| spread_loss | float \| null | Spread loss component |
| joint_loss | float \| null | Joint loss component |
| marginal_loss | float \| null | Marginal loss component |

loss_progression Object

| Field | Type | Description |
| --- | --- | --- |
| initial_train | float | Initial training loss |
| initial_val | float | Initial validation loss |
| improvement_pct | float \| null | Percentage improvement from initial to final |
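
One plausible reading of improvement_pct is the relative drop from initial to final loss, expressed on a 0-100 scale. The exact formula is not stated in this spec, so treat the sketch below as an assumption rather than the definitive computation:

```python
# Sketch: a plausible improvement_pct formula (relative loss reduction).
# The actual formula used by Featrix is not specified here; this is an assumption.
def improvement_pct(initial_loss, final_loss):
    if not initial_loss:
        return None  # undefined when the initial loss is zero or missing
    return (initial_loss - final_loss) / initial_loss * 100.0

print(improvement_pct(2.0, 0.5))  # 75.0
```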

Section: model_architecture

Purpose: Neural network structure information.

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| predictor_layers | integer \| null | Number of layers in predictor network (Single Predictor only) | 5 |
| predictor_parameters | integer \| null | Total number of trainable parameters (Single Predictor only) | 264925317 |
| embedding_space_d_model | integer \| null | Embedding dimension (Single Predictor only) | 512 |

Section: model_quality

Purpose: Quality assessments, warnings, and recommendations.

| Field | Type | Description |
| --- | --- | --- |
| assessment | string \| null | Overall quality assessment: "EXCELLENT", "GOOD", "FAIR", "POOR", "UNKNOWN" (Embedding Space only) |
| recommendations | array \| null | List of recommendation objects (Embedding Space only) |
| warnings | array | List of warning objects |
| training_quality_warning | string \| null | Overall training quality warning message (Single Predictor only) |

recommendations Array Items (Embedding Space)

| Field | Type | Description |
| --- | --- | --- |
| issue | string | Description of the issue detected |
| suggestion | string | Suggested action to address the issue |

warnings Array Items

| Field | Type | Description |
| --- | --- | --- |
| type | string | Warning type: "DISTRIBUTION_SHIFT", "CLASS_IMBALANCE", etc. |
| severity | string | Severity level: "HIGH", "MODERATE", "LOW" |
| message | string | Human-readable warning message |
| details | object \| null | Additional warning details (structure varies by type) |
| recommendation | string \| null | Recommended action |

Example details Object (DISTRIBUTION_SHIFT)

| Field | Type | Description |
| --- | --- | --- |
| threshold | float | KL divergence threshold used |
| affected_columns | array | List of affected column objects |

Affected Column Object:

| Field | Type | Description |
| --- | --- | --- |
| column | string | Column name |
| kl_divergence | float | KL divergence value |
| interpretation | string | "HIGH" or "MODERATE" |
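
The kl_divergence field reports the kind of quantity sketched below: the discrete KL divergence between a column's train and validation value distributions, which is near zero when they match and grows with shift. The smoothing constant and example distributions are illustrative assumptions, not Featrix internals:

```python
# Sketch: discrete KL divergence KL(P || Q) between two value distributions
# over the same categories. eps smoothing and the inputs are assumptions.
import math

def kl_divergence(p, q, eps=1e-9):
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

same = kl_divergence([0.5, 0.5], [0.5, 0.5])      # ~0: no shift
shifted = kl_divergence([0.9, 0.1], [0.5, 0.5])   # positive: clear shift
print(round(same, 6), round(shifted, 4))
```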

Section: technical_details

Purpose: Technical implementation details.

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| pytorch_version | string | PyTorch version used | "2.1.0" |
| device | string | Training device: "GPU" or "CPU" | "GPU" |
| precision | string | Numerical precision | "float32" |
| normalization | string \| null | Normalization method (Embedding Space only) | "unit_sphere" |
| loss_function | string | Loss function used | "CrossEntropyLoss" or "MSELoss" or "InfoNCE (contrastive)" |

Section: provenance

Purpose: Creation metadata and timing information.

| Field | Type | Description | Example |
| --- | --- | --- | --- |
| created_at | string | ISO 8601 timestamp of model card creation | "2025-11-18T01:08:09.123456" |
| training_duration_minutes | float \| null | Total training time in minutes | 45.2 |
| version_info | object \| null | Version information dict (Embedding Space only) | Varies |

Section: column_statistics (Embedding Space Only)

Purpose: Per-column performance statistics for embedding spaces.

Type: Object mapping column names to statistics objects.

Statistics Object

| Field | Type | Description |
| --- | --- | --- |
| mutual_information_bits | float \| null | Estimated mutual information in bits |
| marginal_loss | float \| null | Marginal reconstruction loss for this column |

Rendering Guidelines

Visual Hierarchy

  1. Header Section: Display model_identification prominently with model name, type, and status
  2. Metrics Dashboard: Show key metrics from training_metrics (accuracy, F1, AUC for classifiers; loss for embedding spaces)
  3. Data Summary: Display training_dataset info (row counts, feature count)
  4. Feature List: Render feature_inventory as a searchable/filterable table
  5. Quality Indicators: Show model_quality warnings and recommendations prominently
  6. Technical Details: Collapsible section for technical_details and provenance

Color Coding

  • Status: Green for "DONE", Yellow for "TRAINING", Red for "FAILED"
  • Quality Assessment:
      • "EXCELLENT": Green
      • "GOOD": Blue
      • "FAIR": Yellow
      • "POOR": Orange
      • "UNKNOWN": Gray
  • Warning Severity:
      • "HIGH": Red
      • "MODERATE": Yellow
      • "LOW": Blue
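
In a renderer these rules reduce to simple lookup tables. A minimal sketch; the gray fallback for unrecognized values is an assumption borrowed from the "UNKNOWN" rule:

```python
# Sketch: the color-coding guidelines as lookup tables. Color names come
# from the guidelines above; the gray fallback is an assumption.
STATUS_COLORS = {"DONE": "green", "TRAINING": "yellow", "FAILED": "red"}
SEVERITY_COLORS = {"HIGH": "red", "MODERATE": "yellow", "LOW": "blue"}

def severity_color(severity):
    # Fall back to gray for values the guidelines do not cover
    return SEVERITY_COLORS.get(severity, "gray")

print(severity_color("HIGH"), severity_color("unexpected"))
```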

Interactive Elements

  • Feature Inventory: Allow filtering by type, sorting by name
  • Metrics: Show tooltips explaining each metric
  • Warnings: Expandable details sections
  • Training Timeline: Visualize loss progression if loss_progression data available

Responsive Design

  • Mobile: Stack sections vertically, collapse technical details
  • Desktop: Multi-column layout with metrics sidebar
  • Tablet: Hybrid layout

Example Use Cases

1. Model Comparison View

Compare multiple models side-by-side using:

  • model_identification.name
  • training_metrics (best epoch metrics)
  • model_quality.assessment
  • training_dataset.total_rows

2. Feature Importance Visualization

Use column_statistics.mutual_information_bits to create a bar chart showing which features are most informative.
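
A minimal sketch of the ranking step behind such a chart, using a hypothetical column_statistics object; null (missing) values are dropped before sorting:

```python
# Sketch: rank columns by mutual_information_bits for a feature importance
# chart. The column_statistics sample below is hypothetical.
def rank_by_information(column_statistics):
    usable = {
        col: stats["mutual_information_bits"]
        for col, stats in column_statistics.items()
        if stats.get("mutual_information_bits") is not None  # skip nulls
    }
    return sorted(usable.items(), key=lambda kv: kv[1], reverse=True)

stats = {
    "fleet_size": {"mutual_information_bits": 1.8, "marginal_loss": 0.4},
    "notes": {"mutual_information_bits": None, "marginal_loss": 2.1},
    "annual_revenue": {"mutual_information_bits": 2.3, "marginal_loss": 0.2},
}
print(rank_by_information(stats))  # [('annual_revenue', 2.3), ('fleet_size', 1.8)]
```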

3. Training Quality Dashboard

Display model_quality.warnings and model_quality.recommendations to help users understand model limitations.

4. Export/Share Model Card

Generate PDF or HTML export using all sections for documentation purposes.


Notes

  • All numeric values may be null if not available
  • String fields may be null if not applicable
  • Arrays may be empty [] if no items
  • Objects may be null if section doesn't apply to model type
  • Timestamps are ISO 8601 format
  • Ratio-valued metrics are represented as decimals (0.0 - 1.0), not as percentages (0 - 100)