Working with Hidden Metadata Columns

Metadata columns let you include data in your records that is stored and returned with search results but never influences the model. They are ideal for IDs, timestamps, descriptions, or any other context you need to retrieve but don't want the model to learn from.

The `__featrix_meta_` Prefix

Any column whose name starts with __featrix_meta_ (two leading underscores, plus an underscore after "meta") is treated as metadata:

color,size,price,__featrix_meta_id,__featrix_meta_notes
red,small,10,SKU-001,Popular item
blue,large,20,SKU-002,New arrival
green,medium,15,SKU-003,Clearance

In this example:

color, size, price — trained features that influence embeddings
__featrix_meta_id, __featrix_meta_notes — stored but never trained on
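The prefix rule can be sketched in plain Python. The helper below, `split_columns`, is hypothetical and not part of the Featrix API: it reads the CSV above with the standard `csv` module and partitions the header on the full `__featrix_meta_` prefix, since the Naming Rules section shows that a name like `__featrix_metaid` is not recognized.

```python
import csv
import io

META_PREFIX = "__featrix_meta_"

# The example CSV from above, inlined so the sketch is self-contained.
CSV_TEXT = """color,size,price,__featrix_meta_id,__featrix_meta_notes
red,small,10,SKU-001,Popular item
blue,large,20,SKU-002,New arrival
green,medium,15,SKU-003,Clearance
"""


def split_columns(fieldnames):
    """Hypothetical helper: partition column names into trained features and metadata."""
    features = [name for name in fieldnames if not name.startswith(META_PREFIX)]
    metadata = [name for name in fieldnames if name.startswith(META_PREFIX)]
    return features, metadata


reader = csv.DictReader(io.StringIO(CSV_TEXT))
features, metadata = split_columns(reader.fieldnames)
rows = list(reader)

print("trained:", features)   # trained: ['color', 'size', 'price']
print("metadata:", metadata)  # metadata: ['__featrix_meta_id', '__featrix_meta_notes']
```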

Why Use Metadata Columns?

1. Keep IDs Out of Training

Record IDs, UUIDs, and transaction IDs are meaningless to the model—they're just noise. But you need them to identify records in your results.

customer_name,revenue,industry,__featrix_meta_customer_id
Acme Corp,500000,manufacturing,CUST-12345
TechStart,250000,software,CUST-67890
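If your IDs already live in ordinary columns (for example `customer_id`), one way to keep them out of training is to rename them with the prefix before uploading. `to_metadata` below is a hypothetical helper that works on a list of record dicts, the same shape pandas produces with `DataFrame.to_dict('records')`.

```python
META_PREFIX = "__featrix_meta_"


def to_metadata(records, columns):
    """Hypothetical helper: return copies of records with the given columns renamed to __featrix_meta_<name>."""
    renamed = []
    for record in records:
        new_record = {}
        for key, value in record.items():
            new_key = META_PREFIX + key if key in columns else key
            new_record[new_key] = value
        renamed.append(new_record)
    return renamed


# Rows from the CSV above, with the ID still in an ordinary column.
records = [
    {"customer_name": "Acme Corp", "revenue": 500000,
     "industry": "manufacturing", "customer_id": "CUST-12345"},
    {"customer_name": "TechStart", "revenue": 250000,
     "industry": "software", "customer_id": "CUST-67890"},
]

converted = to_metadata(records, {"customer_id"})
print(converted[0]["__featrix_meta_customer_id"])  # CUST-12345
```

With pandas, the equivalent is a column rename such as `data.rename(columns={'customer_id': '__featrix_meta_customer_id'})`, where `data` is your DataFrame.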

2. Preserve Timestamps

Timestamps for when records were created shouldn't affect similarity—a customer from January isn't inherently different from one in March. But you may want to know when they joined.

name,plan,monthly_spend,__featrix_meta_signup_date,__featrix_meta_last_login
Alice,premium,199,2024-01-15,2024-03-01
Bob,basic,29,2024-02-20,2024-03-02
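Because metadata comes back with each match, you can still filter or sort on it after the search. This sketch assumes matches shaped like those in the search examples on this page (a dict with a `'record'` key); the `matches` list is built by hand from the CSV above rather than returned by a real search, and `joined_on_or_after` is a hypothetical helper.

```python
from datetime import date

# Hand-built stand-ins for search matches, using the rows from the CSV above.
matches = [
    {"record": {"name": "Alice", "plan": "premium", "monthly_spend": 199,
                "__featrix_meta_signup_date": "2024-01-15",
                "__featrix_meta_last_login": "2024-03-01"}},
    {"record": {"name": "Bob", "plan": "basic", "monthly_spend": 29,
                "__featrix_meta_signup_date": "2024-02-20",
                "__featrix_meta_last_login": "2024-03-02"}},
]


def joined_on_or_after(matches, cutoff):
    """Hypothetical helper: keep matches whose signup date is on or after cutoff."""
    return [
        m for m in matches
        if date.fromisoformat(m["record"]["__featrix_meta_signup_date"]) >= cutoff
    ]


recent = joined_on_or_after(matches, date(2024, 2, 1))
print([m["record"]["name"] for m in recent])  # ['Bob']
```

The signup date never shaped the similarity ranking; it is applied only as a post-filter on what the search returned.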

3. Add Human-Readable Context

Include descriptions, notes, or display names that help you understand results without polluting the embedding space.

sku,category,price,__featrix_meta_display_name,__featrix_meta_warehouse_notes
WDG-001,tools,10.99,Deluxe Widget Set,Shelf A3 - high turnover
GDG-002,electronics,29.99,Smart Gadget Pro,Shelf B7 - fragile
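A typical use is to label results with the display name when it is present and fall back to the trained `sku` column otherwise. `describe` is a hypothetical formatting helper; the record below is copied from the first row of the CSV above.

```python
def describe(record):
    """Hypothetical helper: build a one-line label from a search-result record."""
    # Prefer the metadata display name; fall back to the SKU if it is missing.
    name = record.get("__featrix_meta_display_name") or record["sku"]
    notes = record.get("__featrix_meta_warehouse_notes")
    label = f"{name} ({record['category']}, ${record['price']:.2f})"
    return f"{label} [{notes}]" if notes else label


record = {
    "sku": "WDG-001", "category": "tools", "price": 10.99,
    "__featrix_meta_display_name": "Deluxe Widget Set",
    "__featrix_meta_warehouse_notes": "Shelf A3 - high turnover",
}
print(describe(record))
# Deluxe Widget Set (tools, $10.99) [Shelf A3 - high turnover]
```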

How It Works

During Training: Metadata columns are automatically detected and excluded. They don't participate in type detection, encoding, or embedding generation.
In Vector Databases: Metadata is stored alongside embeddings in the vector database.
In Search Results: When you search, metadata columns come back with every matched record.

# Assumes `vdb` was built from the color/size/price data shown earlier
query = {'color': 'red', 'size': 'small', 'price': 10}
results = vdb.similarity_search(query, k=5)

for match in results:
    # Trained features
    print(match['record']['color'])
    print(match['record']['price'])

    # Metadata (stored but never trained on)
    print(match['record']['__featrix_meta_id'])
    print(match['record']['__featrix_meta_notes'])
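The three stages described above can be imitated with a toy, in-memory stand-in. This is not how Featrix computes embeddings: `ToyVectorDB` and its `embed` method are hypothetical, and the "embedding" is simply a one-hot encoding of categorical columns plus raw numeric values, compared with cosine similarity. It only illustrates the contract: metadata is stripped before embedding, stored with the record, and returned with each match.

```python
import math

META_PREFIX = "__featrix_meta_"


class ToyVectorDB:
    """Hypothetical in-memory stand-in; NOT the Featrix implementation."""

    def __init__(self, categories):
        # categories: {column_name: [allowed values]} used for one-hot encoding
        self.categories = categories
        self.entries = []  # list of (vector, full_record)

    def embed(self, record):
        # Training-time view: metadata columns are dropped before encoding.
        features = {k: v for k, v in record.items() if not k.startswith(META_PREFIX)}
        vector = []
        for column, value in sorted(features.items()):
            if column in self.categories:
                vector.extend(1.0 if value == c else 0.0 for c in self.categories[column])
            else:
                vector.append(float(value))  # numeric column used as-is
        return vector

    def add_records(self, records):
        # Storage: keep the full record, metadata included.
        for record in records:
            self.entries.append((self.embed(record), dict(record)))

    def similarity_search(self, query, k=5):
        # Search: rank by cosine similarity, return whole stored records.
        q = self.embed(query)
        scored = []
        for vector, record in self.entries:
            dot = sum(a * b for a, b in zip(q, vector))
            norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in vector))
            scored.append({"record": record, "similarity": dot / norm if norm else 0.0})
        scored.sort(key=lambda m: m["similarity"], reverse=True)
        return scored[:k]


db = ToyVectorDB({"color": ["red", "blue", "green"], "size": ["small", "medium", "large"]})
db.add_records([
    {"color": "red", "size": "small", "price": 10,
     "__featrix_meta_id": "SKU-001", "__featrix_meta_notes": "Popular item"},
    {"color": "blue", "size": "large", "price": 20,
     "__featrix_meta_id": "SKU-002", "__featrix_meta_notes": "New arrival"},
    {"color": "green", "size": "medium", "price": 15,
     "__featrix_meta_id": "SKU-003", "__featrix_meta_notes": "Clearance"},
])

top = db.similarity_search({"color": "red", "size": "small", "price": 10}, k=1)[0]
print(top["record"]["__featrix_meta_id"])  # SKU-001
```

Two records that differ only in their metadata get identical vectors here, which is the property the real system guarantees for `__featrix_meta_` columns.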

Naming Rules

Valid metadata column names:

__featrix_meta_id
__featrix_meta_timestamp
__featrix_meta_anything_you_want

These won't work (not recognized as metadata):

__featrix_metaid — missing underscore after "meta"
_featrix_meta_id — single underscore at start
meta_id — doesn't have the required prefix
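A near-miss name is not recognized as metadata, so it would be handled like any other column. It can be worth checking names before upload. The hypothetical `check_column_name` below encodes the rules listed above: a name is metadata only if it starts with `__featrix_meta_`. Flagging other names that contain `featrix_meta` or start with `meta_` as probable typos is a heuristic assumed by this sketch, not documented Featrix behavior.

```python
META_PREFIX = "__featrix_meta_"


def check_column_name(name):
    """Hypothetical pre-upload check; returns 'metadata', 'feature', or 'suspicious'."""
    if name.startswith(META_PREFIX):
        return "metadata"
    # Heuristic: looks like an attempt at the prefix that does not match it exactly.
    if "featrix_meta" in name or name.startswith("meta_"):
        return "suspicious"
    return "feature"


for name in ["__featrix_meta_id", "__featrix_meta_timestamp",
             "__featrix_metaid", "_featrix_meta_id", "meta_id", "price"]:
    print(f"{name:28} {check_column_name(name)}")
```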

Example: Customer Similarity with Metadata

from featrixsphere.api import FeatrixSphere
import pandas as pd

# Data with metadata columns
data = pd.DataFrame({
    'industry': ['software', 'manufacturing', 'retail', 'software'],
    'employee_count': [50, 200, 100, 75],
    'annual_revenue': [5000000, 10000000, 3000000, 8000000],
    '__featrix_meta_company_id': ['C001', 'C002', 'C003', 'C004'],
    '__featrix_meta_account_manager': ['Alice', 'Bob', 'Alice', 'Carol'],
    '__featrix_meta_notes': ['Key account', 'Expanding', 'At risk', 'New customer']
})

featrix = FeatrixSphere()

# Metadata columns are automatically excluded from training
fm = featrix.create_foundational_model(
    name="customer_embeddings",
    df=data
)
fm.wait_for_training()

# Create vector database (metadata is stored)
vdb = fm.create_vector_database(name="customers")
vdb.add_records(data.to_dict('records'))

# Search returns metadata with results
similar = vdb.similarity_search({
    'industry': 'software',
    'employee_count': 60,
    'annual_revenue': 6000000
}, k=3)

for match in similar:
    print(f"Company: {match['record']['__featrix_meta_company_id']}")
    print(f"Manager: {match['record']['__featrix_meta_account_manager']}")
    print(f"Notes: {match['record']['__featrix_meta_notes']}")
    print(f"Similarity: {match['similarity']:.3f}")
    print()

Metadata Columns vs. `ignore_columns`

Both exclude columns from training, but they serve different purposes:

	`ignore_columns`	`__featrix_meta_<name>`
Excluded from training	Yes	Yes
Stored in vector database	No	Yes
Returned in search results	No	Yes
Use for	Columns you don't need at all	Columns you need for context

Use ignore_columns for true throwaway columns. Use metadata columns for data you want to retrieve but not train on.
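The difference in the table can be pictured as a per-record transformation. In this sketch, `prepare_record` is hypothetical and only mirrors the table: ignored columns are dropped from both the training view and the stored record, while metadata columns are dropped from the training view but kept in the stored record. The column name `internal_scratch` is an invented example, and how Featrix applies `ignore_columns` internally is not shown here.

```python
META_PREFIX = "__featrix_meta_"


def prepare_record(record, ignore_columns):
    """Hypothetical illustration of the table above; returns (training_view, stored_record)."""
    # Ignored columns disappear entirely.
    kept = {k: v for k, v in record.items() if k not in ignore_columns}
    # Metadata columns are hidden from training but remain in storage.
    training_view = {k: v for k, v in kept.items() if not k.startswith(META_PREFIX)}
    return training_view, kept


record = {
    "industry": "software",
    "employee_count": 50,
    "internal_scratch": "tmp-123",        # throwaway: ignore it
    "__featrix_meta_company_id": "C001",  # needed later: keep as metadata
}
training_view, stored = prepare_record(record, ignore_columns={"internal_scratch"})
print(training_view)  # {'industry': 'software', 'employee_count': 50}
print(stored)         # {'industry': 'software', 'employee_count': 50, '__featrix_meta_company_id': 'C001'}
```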