Use Case: Customer Churn Prediction¶
Predict which customers are likely to cancel their subscription or stop using your service.
When to Use This¶
- Subscription businesses (SaaS, media, telecom)
- Customer retention programs
- Proactive support targeting
- Marketing budget allocation
Complete Implementation¶
from featrixsphere.api import FeatrixSphere
featrix = FeatrixSphere()
# 1. Create Foundational Model from customer data
fm = featrix.create_foundational_model(
    name="customer_churn_model",
    data_file="customers.csv",
    ignore_columns=["customer_id", "signup_date", "email"]  # Exclude IDs and PII
)
fm.wait_for_training()
print(f"Foundational Model trained: {fm.dimensions} dimensions")
# 2. Create cost-sensitive binary classifier
# - Missing a churner costs $500 (lost customer value)
# - False alarm costs $50 (wasted retention effort)
predictor = fm.create_binary_classifier(
    target_column="churned",
    name="churn_predictor_v1",
    rare_label_value="yes",   # "yes" is the positive class
    cost_false_negative=500,  # Cost of missing a churner
    cost_false_positive=50    # Cost of a false alarm
)
predictor.wait_for_training()
print(f"Accuracy: {predictor.accuracy:.4f}")
print(f"AUC: {predictor.auc:.4f}")
print(f"F1: {predictor.f1:.4f}")
# 3. Make predictions
customer = {
    "tenure_months": 8,
    "monthly_charges": 85.0,
    "contract": "month-to-month",
    "payment_method": "credit_card",
    "total_charges": 680.0,
    "support_tickets": 3
}
result = predictor.predict(customer)
print(f"Prediction: {result.predicted_class}")
print(f"Probability: {result.probability:.2%}")
print(f"Confidence: {result.confidence:.2%}")
# 4. Batch predict for risk scoring
import pandas as pd
customers_df = pd.read_csv("active_customers.csv")
results = predictor.batch_predict(customers_df, show_progress=True)
# Find high-risk customers
high_risk = []
for i, result in enumerate(results):
    if result.predicted_class == "yes" and result.confidence > 0.7:
        high_risk.append({
            "customer_id": customers_df.iloc[i]["customer_id"],
            "churn_probability": result.probability,
            "confidence": result.confidence
        })
print(f"Found {len(high_risk)} high-risk customers")
# 5. Create production endpoint
endpoint = predictor.create_api_endpoint(
    name="churn_api_v1",
    description="Production churn prediction endpoint"
)
print(f"Endpoint URL: {endpoint.url}")
print(f"API Key: {endpoint.api_key}")
# 6. Configure monitoring webhooks
predictor.configure_webhooks(
    alert_drift="https://your-slack-webhook.com/drift",
    alert_performance_degradation="https://your-slack-webhook.com/perf"
)
# 7. Publish to production
fm.publish(org_id="my_org", name="churn_model_v1")
Key Parameters¶
Cost-Sensitive Classification¶
Set costs based on business impact:
| Parameter | Description | Example |
|---|---|---|
| `cost_false_negative` | Cost of missing a churner | $500 (customer lifetime value) |
| `cost_false_positive` | Cost of a false churn alert | $50 (retention offer cost) |
The model optimizes the decision threshold using Bayes-optimal selection.
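For intuition, the textbook Bayes-optimal threshold for asymmetric costs can be sketched as below. This is the standard formula, not FeatrixSphere's internal implementation, which also accounts for the validation data (hence the 0.35 threshold shown later rather than the naive value):

```python
# Sketch of the standard Bayes-optimal decision threshold for asymmetric costs.

def bayes_threshold(cost_fp: float, cost_fn: float) -> float:
    """Predict positive when p(churn) exceeds this cutoff."""
    # The expected cost of predicting "no" (p * cost_fn) exceeds the
    # expected cost of predicting "yes" ((1 - p) * cost_fp) exactly when
    # p > cost_fp / (cost_fp + cost_fn).
    return cost_fp / (cost_fp + cost_fn)

t = bayes_threshold(cost_fp=50, cost_fn=500)
print(f"{t:.3f}")  # 0.091 -- far below 0.5, so even borderline customers get flagged
```

With symmetric costs the formula recovers the familiar 0.5 cutoff; the more expensive a missed churner is, the lower the cutoff drops.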
Class Imbalance¶
If your production data has a different class distribution than your training data:
predictor = fm.create_binary_classifier(
    target_column="churned",
    class_imbalance={"yes": 0.15, "no": 0.85}  # 15% churn rate in production
)
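To see why this matters, here is the standard prior-shift correction, shown for intuition only (the `class_imbalance` parameter lets the library handle this internally; the exact mechanism may differ):

```python
# Hypothetical illustration: a model trained on 50% churners will emit
# probabilities that are too high when production churn is only 15%.
# The standard fix reweights the odds by the ratio of the two priors.

def adjust_for_prior(p: float, train_pos: float, prod_pos: float) -> float:
    w = (prod_pos / train_pos) * ((1 - train_pos) / (1 - prod_pos))
    return p * w / (p * w + (1 - p))

print(f"{adjust_for_prior(0.60, train_pos=0.50, prod_pos=0.15):.3f}")  # 0.209
```

A raw 60% score drops to roughly 21% under the rarer production prior, which is why declaring the production distribution up front changes both probabilities and the decision threshold.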
Understanding Predictions¶
PredictionResult Fields¶
result = predictor.predict(customer)
# Classification result
result.predicted_class # "yes" or "no"
result.probability # Raw probability for predicted class (0.87)
result.confidence # Normalized confidence from threshold (0.74)
result.probabilities # {"yes": 0.87, "no": 0.13}
result.threshold # Decision threshold (0.35 after cost optimization)
# Tracking
result.prediction_uuid # UUID for feedback
# Warnings
result.guardrails # Per-column warnings for unusual values
Confidence vs Probability¶
- probability: Raw softmax output (e.g., 87% chance of churn)
- confidence: How far from the decision boundary (0 = uncertain, 1 = very certain)
- threshold: Optimized cutoff point based on costs (may not be 0.5)
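One plausible way to turn distance-from-threshold into a [0, 1] confidence is sketched below. Note this is an illustration, not FeatrixSphere's exact formula: it gives 0.80 for the example above, whereas the library reports 0.74, so the real calibration evidently differs.

```python
# Illustrative normalization of distance from the decision boundary.

def confidence_from_threshold(p: float, threshold: float) -> float:
    if p >= threshold:
        return (p - threshold) / (1 - threshold)  # how far into "yes" territory
    return (threshold - p) / threshold            # how far into "no" territory

print(f"{confidence_from_threshold(0.87, threshold=0.35):.2f}")  # 0.80
```

Either way, the key property holds: a probability sitting exactly on the threshold has confidence 0, and confidence grows as the probability moves away from the cutoff in either direction.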
Feature Importance¶
Understand why customers are predicted to churn:
result = predictor.predict(customer, feature_importance=True)
# Top factors driving this prediction
for feature, importance in sorted(
    result.feature_importance.items(),
    key=lambda x: abs(x[1]),
    reverse=True
)[:5]:
    print(f"{feature}: {importance:+.3f}")
Example output:
contract: +0.45 # Month-to-month increases churn risk
support_tickets: +0.23 # More tickets = higher risk
tenure_months: -0.18 # Longer tenure = lower risk
monthly_charges: +0.12 # Higher charges = higher risk
payment_method: -0.05 # Credit card = slightly lower risk
Sending Feedback¶
Track actual outcomes to improve future models:
# After customer actually churns or stays
actual_outcome = "yes" # Customer did churn
# Option 1: From the result object
result.send_feedback(ground_truth=actual_outcome)
# Option 2: Using stored prediction UUID
featrix.prediction_feedback(
prediction_uuid=stored_uuid,
ground_truth=actual_outcome
)
Production API Usage¶
Python¶
result = endpoint.predict(
    {"tenure_months": 8, "monthly_charges": 85.0, "contract": "month-to-month"},
    api_key=endpoint.api_key
)
HTTP¶
curl -X POST "https://sphere-api.featrix.com/endpoint/churn_api_v1/predict" \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"tenure_months": 8, "monthly_charges": 85.0, "contract": "month-to-month"}'
Response¶
{
  "predicted_class": "yes",
  "probability": 0.87,
  "confidence": 0.74,
  "probabilities": {"yes": 0.87, "no": 0.13},
  "threshold": 0.35,
  "prediction_uuid": "550e8400-e29b-41d4-a716-446655440000"
}
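Consuming this response from client code mirrors the batch risk-scoring logic earlier. A minimal sketch (the response body is hardcoded here for illustration; in practice it comes from the HTTP call above):

```python
import json

# The JSON body returned by the endpoint, hardcoded for this example.
body = '''{
  "predicted_class": "yes",
  "probability": 0.87,
  "confidence": 0.74,
  "probabilities": {"yes": 0.87, "no": 0.13},
  "threshold": 0.35,
  "prediction_uuid": "550e8400-e29b-41d4-a716-446655440000"
}'''

result = json.loads(body)

# Same high-risk rule as the batch scoring step: predicted churner
# with confidence above 0.7 triggers retention outreach.
needs_outreach = result["predicted_class"] == "yes" and result["confidence"] > 0.7
print(needs_outreach)  # True

# Store the UUID so the actual outcome can be sent back as feedback later.
print(result["prediction_uuid"])
```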
Best Practices¶
- Exclude ID columns - Customer IDs, emails, timestamps don't help prediction
- Set appropriate costs - False negatives (missed churners) often cost more than false positives
- Monitor for drift - Customer behavior changes over time
- Send feedback - Real outcomes improve future model versions
- Version your models - Use clear naming: churn_model_v1_2024_01
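The naming convention in the last bullet can be generated consistently with a small helper (hypothetical, not part of the FeatrixSphere API):

```python
from datetime import date

def versioned_name(base: str, major: int, trained: date) -> str:
    """Build a model name like churn_model_v1_2024_01 from its parts."""
    return f"{base}_v{major}_{trained:%Y_%m}"

print(versioned_name("churn_model", 1, date(2024, 1, 15)))  # churn_model_v1_2024_01
```

Deriving names from the training date rather than typing them by hand keeps versions sortable and unambiguous across retrains.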