
Working with Hidden Metadata Columns

Metadata columns let you include data in your records that is stored and returned with search results but never influences the model. They suit IDs, timestamps, descriptions, or any other contextual information you need to retrieve but don't want the model to learn from.

The __featrix_meta Prefix

Any column whose name starts with __featrix_meta_ (two leading underscores, and an underscore after "meta") is treated as metadata:

color,size,price,__featrix_meta_id,__featrix_meta_notes
red,small,10,SKU-001,Popular item
blue,large,20,SKU-002,New arrival
green,medium,15,SKU-003,Clearance

In this example:

  • color, size, price — trained features that influence embeddings
  • __featrix_meta_id, __featrix_meta_notes — stored but never trained on
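To make the split concrete, here is a minimal sketch that partitions the columns of the CSV above by prefix. It uses only the Python standard library. The helper name split_columns is hypothetical, and the code mirrors the rule described on this page rather than Featrix's internal detection logic.

```python
import csv
import io

META_PREFIX = "__featrix_meta_"  # prefix described in this guide

CSV_TEXT = """color,size,price,__featrix_meta_id,__featrix_meta_notes
red,small,10,SKU-001,Popular item
blue,large,20,SKU-002,New arrival
green,medium,15,SKU-003,Clearance
"""

def split_columns(fieldnames):
    """Return (trained, metadata) column-name lists, preserving order."""
    trained = [c for c in fieldnames if not c.startswith(META_PREFIX)]
    metadata = [c for c in fieldnames if c.startswith(META_PREFIX)]
    return trained, metadata

reader = csv.DictReader(io.StringIO(CSV_TEXT))
trained_cols, meta_cols = split_columns(reader.fieldnames)
rows = list(reader)

print("trained: ", trained_cols)
print("metadata:", meta_cols)
```

Running a check like this before uploading is a quick way to confirm that each column lands on the side you intended.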

Why Use Metadata Columns?

1. Keep IDs Out of Training

Record IDs, UUIDs, and transaction IDs are meaningless to the model—they're just noise. But you need them to identify records in your results.

customer_name,revenue,industry,__featrix_meta_customer_id
Acme Corp,500000,manufacturing,CUST-12345
TechStart,250000,software,CUST-67890

2. Preserve Timestamps

Timestamps for when records were created shouldn't affect similarity—a customer from January isn't inherently different from one in March. But you may want to know when they joined.

name,plan,monthly_spend,__featrix_meta_signup_date,__featrix_meta_last_login
Alice,premium,199,2024-01-15,2024-03-01
Bob,basic,29,2024-02-20,2024-03-02

3. Add Human-Readable Context

Include descriptions, notes, or display names that help you understand results without polluting the embedding space.

sku,category,price,__featrix_meta_display_name,__featrix_meta_warehouse_notes
WDG-001,tools,10.99,Deluxe Widget Set,Shelf A3 - high turnover
GDG-002,electronics,29.99,Smart Gadget Pro,Shelf B7 - fragile
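If your data already has ID or timestamp columns under ordinary names, one way to turn them into metadata is to rename them before uploading. The sketch below does this for a list of record dicts (the same shape that data.to_dict('records') produces). The helper tag_as_metadata is a hypothetical name for illustration and is not part of the Featrix API.

```python
META_PREFIX = "__featrix_meta_"

def tag_as_metadata(records, columns):
    """Return new records with the given columns renamed to carry the metadata prefix."""
    to_tag = set(columns)
    return [
        {(META_PREFIX + key if key in to_tag else key): value
         for key, value in record.items()}
        for record in records
    ]

customers = [
    {"customer_id": "CUST-12345", "customer_name": "Acme Corp",
     "revenue": 500000, "industry": "manufacturing"},
    {"customer_id": "CUST-67890", "customer_name": "TechStart",
     "revenue": 250000, "industry": "software"},
]

# customer_id becomes __featrix_meta_customer_id; other columns are untouched
tagged = tag_as_metadata(customers, ["customer_id"])
print(list(tagged[0].keys()))
```

With a pandas DataFrame, DataFrame.rename(columns={...}) achieves the same result.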

How It Works

  1. During Training: Metadata columns are automatically detected and excluded. They don't participate in type detection, encoding, or embedding generation.

  2. In Vector Databases: Metadata is stored alongside embeddings in the vector database.

  3. In Search Results: When you search, metadata columns come back with every matched record.

# Assumes vdb is an existing vector database built from the first example,
# and query is a dict of trained features (color, size, price)
results = vdb.similarity_search(query, k=5)

for match in results:
    # Trained features
    print(match['record']['color'])
    print(match['record']['price'])

    # Metadata (stored but never trained on)
    print(match['record']['__featrix_meta_id'])
    print(match['record']['__featrix_meta_notes'])
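The three steps can also be illustrated with a toy in-memory store. This is a conceptual sketch only: the ToyStore class is hypothetical, and it scores records by exact field matches instead of trained embeddings, so it says nothing about how Featrix computes similarity. The point it demonstrates is that metadata rides along with each stored record but never enters the comparison.

```python
META_PREFIX = "__featrix_meta_"

class ToyStore:
    """Stores full records, but compares them using non-metadata fields only."""

    def __init__(self):
        self.records = []

    def add_records(self, records):
        # Step 2: the whole record, metadata included, is stored.
        self.records.extend(records)

    @staticmethod
    def _features(record):
        # Step 1: metadata columns are dropped before any comparison.
        return {k: v for k, v in record.items() if not k.startswith(META_PREFIX)}

    def similarity_search(self, query, k=5):
        q = self._features(query)

        def score(record):
            feats = self._features(record)
            # Fraction of query fields matched exactly (stand-in for embedding similarity).
            return sum(feats.get(key) == value for key, value in q.items()) / (len(q) or 1)

        ranked = sorted(self.records, key=score, reverse=True)
        # Step 3: matches come back with their metadata attached.
        return [{"record": r, "similarity": score(r)} for r in ranked[:k]]

store = ToyStore()
store.add_records([
    {"color": "red", "size": "small", "price": 10,
     "__featrix_meta_id": "SKU-001", "__featrix_meta_notes": "Popular item"},
    {"color": "blue", "size": "large", "price": 20,
     "__featrix_meta_id": "SKU-002", "__featrix_meta_notes": "New arrival"},
])

# The metadata value in the query is ignored when scoring.
results = store.similarity_search(
    {"color": "blue", "size": "large", "__featrix_meta_id": "SKU-001"}, k=1
)
print(results[0]["record"]["__featrix_meta_id"], results[0]["similarity"])
```

Even though the query carries the ID of the red item, the blue item is returned, because only color and size take part in the score.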

Naming Rules

Valid metadata column names:

  • __featrix_meta_id
  • __featrix_meta_timestamp
  • __featrix_meta_anything_you_want

These won't work (not recognized as metadata):

  • __featrix_metaid — missing underscore after "meta"
  • _featrix_meta_id — single underscore at start
  • meta_id — doesn't have the required prefix
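Because a misspelled prefix makes a column an ordinary trained feature, a small pre-upload check can catch near misses. The sketch below is one possible approach, not a Featrix utility: both function names are hypothetical, and the regular expression is an assumption about what a likely typo looks like.

```python
import re

META_PREFIX = "__featrix_meta_"

# Names that look like an attempt at the prefix: any number of leading
# underscores, "featrix", optional underscores, then "meta".
_LOOKS_LIKE_META = re.compile(r"^_*featrix_*meta")

def is_metadata_column(name):
    """True if the name carries the exact metadata prefix."""
    return name.startswith(META_PREFIX)

def find_near_misses(column_names):
    """Return names that resemble the prefix but would not be treated as metadata."""
    return [
        name for name in column_names
        if not is_metadata_column(name) and _LOOKS_LIKE_META.match(name)
    ]

columns = ["__featrix_meta_id", "__featrix_metaid", "_featrix_meta_id", "meta_id", "price"]
print(find_near_misses(columns))
```

A name such as meta_id is deliberately not flagged, since it could be a legitimate feature column. The check only targets names that are clearly attempting the Featrix prefix.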

Example: Customer Similarity with Metadata

from featrixsphere.api import FeatrixSphere
import pandas as pd

# Data with metadata columns
data = pd.DataFrame({
    'industry': ['software', 'manufacturing', 'retail', 'software'],
    'employee_count': [50, 200, 100, 75],
    'annual_revenue': [5000000, 10000000, 3000000, 8000000],
    '__featrix_meta_company_id': ['C001', 'C002', 'C003', 'C004'],
    '__featrix_meta_account_manager': ['Alice', 'Bob', 'Alice', 'Carol'],
    '__featrix_meta_notes': ['Key account', 'Expanding', 'At risk', 'New customer']
})

featrix = FeatrixSphere()

# Metadata columns are automatically excluded from training
fm = featrix.create_foundational_model(
    name="customer_embeddings",
    df=data
)
fm.wait_for_training()

# Create vector database (metadata is stored)
vdb = fm.create_vector_database(name="customers")
vdb.add_records(data.to_dict('records'))

# Search returns metadata with results
similar = vdb.similarity_search({
    'industry': 'software',
    'employee_count': 60,
    'annual_revenue': 6000000
}, k=3)

for match in similar:
    print(f"Company: {match['record']['__featrix_meta_company_id']}")
    print(f"Manager: {match['record']['__featrix_meta_account_manager']}")
    print(f"Notes: {match['record']['__featrix_meta_notes']}")
    print(f"Similarity: {match['similarity']:.3f}")
    print()

Metadata Columns vs. ignore_columns

Both exclude columns from training, but they serve different purposes:

Behavior                     ignore_columns                   __featrix_meta_<name>
Excluded from training       Yes                              Yes
Stored in vector database    No                               Yes
Returned in search results   No                               Yes
Use for                      Columns you don't need at all    Columns you need for context

Use ignore_columns for true throwaway columns. Use metadata columns for data you want to retrieve but not train on.