Mobile Plan Recommendation Engine
01

Problem

Mobile carriers routinely carry customers on legacy plans that no longer match how those customers actually use their phones. For Megaline, this meant a substantial portion of their user base was on outdated pricing that was either overcharging customers who used less than they paid for or leaving revenue on the table for heavy users who needed more. The business had two modern plans — Smart and Ultra — but no systematic way to identify which plan each legacy customer should be moved to. Manual review was impractical at scale, and rule-based approaches require explicit thresholds that ignore the interactions between usage features. What Megaline needed was a classification model that could learn the boundary between Smart and Ultra from historical usage data and apply that boundary to new customers automatically.


02

Solution

This project trained a Random Forest classifier on Megaline's behavioral data — monthly calls, minutes used, messages sent, and internet data consumed — to recommend the appropriate modern plan for each legacy customer. The workflow followed a full end-to-end ML pipeline: feature inspection, train/validation/test split, model selection across Decision Tree and Random Forest algorithms, hyperparameter tuning, and final evaluation against a held-out test set. The target was a minimum test accuracy of 75%; the final model achieved 81.8%, clearing the threshold by 6.8 percentage points. Sanity checks against a dummy classifier baseline confirmed that the model was learning real signal from the behavioral features rather than exploiting class imbalance. This was the first complete ML project in the training program — the point where individual concepts like decision boundaries and ensemble methods connected into a working system.


03

Skills Acquired

What makes the result meaningful is not just the final accuracy number — it is the reasoning behind each decision along the way.


04

Deep Dive

Megaline, a mobile carrier, has a problem. Many of their customers are still on legacy plans — plans that no longer match how they actually use their phones. The business wants to move these customers to one of two modern options: Smart or Ultra. But how do you know which plan fits which customer?

You look at the data. Every month, Megaline tracks each customer's calls, minutes, messages, and data usage. That behavioral footprint tells a story about what plan they actually need — and a classification model can learn to read it.

The project requirement was clear: build a model that recommends the right plan with a minimum accuracy of 75% on the held-out test set. What followed was my first end-to-end ML workflow — and a lesson in how much a single algorithmic choice can change the outcome.

Why This Project?

This was Sprint 8 of my TripleTen AI and Machine Learning Bootcamp — my first full machine learning project. Up to this point, I had learned the theory: what a Decision Tree does, how Random Forest improves on it, what hyperparameters control overfitting. This was where I applied all of it to a real dataset for the first time.

I treated it like a genuine business problem. Megaline's goal isn't to maximize some abstract score — it's to recommend the right plan so customers stay satisfied and don't churn. That framing shaped every decision I made, including which metrics to focus on and which ones to deprioritize.

What I Learned

This was my first time making deliberate, justified decisions about which metric to optimize — and why accuracy + precision made more sense than F1 for this specific business case. That kind of reasoning — metric selection tied to real business impact — is something I now apply to every project.


The Dataset

Megaline's usage history: 3,214 customers, 5 columns. All four features captured behavioral signals — how often customers called, how long they talked, how many texts they sent, how much data they consumed. The target column — is_ultra — indicated which plan the customer was actually on that month.

Column     Description                    Range (approx.)
calls      Number of calls per month      0 – 244
minutes    Total call duration (min)      0 – 1,632
messages   Number of texts                0 – 224
mb_used    Data used (MB)                 0 – 49,745
is_ultra   Target: Ultra = 1, Smart = 0   —

No missing values. No duplicate rows. No encoding needed — all features were already numeric. This was as clean as a dataset gets, which meant the focus was entirely on modeling decisions.


My Process

Phase 1

Import Libraries & Load Data

Loaded the dataset and imported the tools needed: pandas for data handling, and scikit-learn's classifiers, splitters, and metrics for the full ML workflow.

import pandas as pd
from sklearn.model_selection  import train_test_split
from sklearn.tree             import DecisionTreeClassifier
from sklearn.ensemble         import RandomForestClassifier
from sklearn.metrics          import (
    accuracy_score, precision_score,
    recall_score, f1_score, confusion_matrix
)

df = pd.read_csv('/datasets/users_behavior.csv')

Phase 2

Exploratory Data Analysis

Explored the dataset with df.head(), df.info(), and df.describe(). Three things stood out immediately:

  • 3,214 records — no missing values in any column
  • All features are numeric — no encoding required
  • mb_used ranges 0–49,745 while calls ranges 0–244 — a scale difference worth investigating
print("Shape:", df.shape)           # (3214, 5)
df.info()                           # all non-null, float64 + int64
df.describe()                       # reveals mb_used >> other features

print(df['is_ultra'].value_counts(normalize=True))
# 0 (Smart): ~69.4%
# 1 (Ultra): ~30.6%
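The dummy-classifier sanity check mentioned in the solution summary works off exactly this class balance: a model that always predicts the majority class scores roughly the majority share, so a trained model must beat that number to prove it learned real signal. A minimal sketch, using synthetic labels in place of the Megaline data:

```python
# Sketch of a majority-class baseline check (synthetic labels; the real data
# was ~69.4% Smart / ~30.6% Ultra). A DummyClassifier with strategy
# 'most_frequent' always predicts the majority class, so its accuracy equals
# the majority-class share.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 4))              # 4 behavioral-style features
y = (rng.random(1000) < 0.306).astype(int)  # ~30.6% positive class

dummy = DummyClassifier(strategy='most_frequent')
dummy.fit(X, y)
baseline = accuracy_score(y, dummy.predict(X))
print(f"Dummy baseline accuracy: {baseline:.3f}")  # = share of majority class
```

Any model that only matches this baseline is exploiting class imbalance rather than reading the behavioral features.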

Phase 3

Feature Scaling Consideration

The scale difference between mb_used (~0–49,745) and calls (~0–244) raised a question: does this require feature scaling?

The answer depends on the algorithm. Distance-based methods — logistic regression, KNN, SVMs, neural networks — are affected by feature scales because they compute distances between data points. Tree-based methods are not. Decision Trees and Random Forests make binary splits on individual feature thresholds, so a feature's scale has no impact on its ability to split.

In my own interpretation of the difference: Decision Trees ask "is this feature above or below a threshold?" — scale doesn't change the answer. Logistic regression asks "how far is this point from the decision boundary?" — scale changes everything.
# Feature value ranges — scale difference is significant
print(df[['calls', 'minutes', 'messages', 'mb_used']].describe().loc[['min', 'max']])

# Conclusion: tree algorithms split on thresholds, not distances.
# Feature scaling is NOT required for Decision Tree or Random Forest.
# Skipping StandardScaler — would add complexity with zero benefit here.
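The threshold-vs-distance argument can be checked empirically. This is an illustrative sketch on synthetic data (not the project's actual code): train the same tree on raw and standardized features and compare predictions.

```python
# Illustrative check (synthetic data): a Decision Tree's predictions do not
# change under feature scaling, because splits compare each feature to a
# threshold rather than measuring distances between points. StandardScaler
# is monotonic per feature, so every split partitions the data identically.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4)) * [1, 10, 100, 50_000]  # wildly different scales
y = (X[:, 0] + X[:, 3] / 50_000 > 0).astype(int)

tree_raw = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, y)
X_scaled = StandardScaler().fit_transform(X)
tree_scaled = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_scaled, y)

# Same predictions on both versions of the data
same = bool((tree_raw.predict(X) == tree_scaled.predict(X_scaled)).all())
print("Identical predictions:", same)
```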

What I Learned

Before this project, I knew feature scaling as a "step you do before modeling." After this project, I understood why — and more importantly, when not to. Making a justified decision to skip a common preprocessing step is more valuable than following a checklist blindly.

Phase 4

Model Training & Validation

A stratified 60/20/20 split kept the ~30.6% Ultra class ratio consistent across train, validation, and test sets. The model had not seen the test set at any point during this phase.

X = df.drop('is_ultra', axis=1)
y = df['is_ultra']

# Step 1: Separate test set (20%), stratified on the target
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
# Step 2: Split remaining into train (60%) and validation (20%)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=42, stratify=y_temp
)
# Result: 1,928 train / 643 valid / 643 test
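Why stratify at all? On imbalanced data, a plain random split can let the class ratio drift between subsets; a stratified split pins it down to within a sample or two. A quick sketch with synthetic labels at the same ~30.6% positive rate:

```python
# Stratified splitting preserves the class ratio in every subset.
# Synthetic labels (~30.6% positive) stand in for the Megaline target here.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
y = (rng.random(3214) < 0.306).astype(int)
X = rng.normal(size=(3214, 4))

_, _, _, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(f"Overall positive rate: {y.mean():.3f}")
print(f"Test positive rate:    {y_test.mean():.3f}")  # nearly identical
```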

I ran two validation attempts. The first established baseline performance; the second tuned hyperparameters to push further. The goal here wasn't to pass the target — it was to understand how much each change actually moved the needle:

Validation Attempt 1 — Baseline:

model_dt_v1 = DecisionTreeClassifier(
    max_depth=6, min_samples_split=20,
    min_samples_leaf=10, max_features='sqrt', random_state=42
)
model_rf_v1 = RandomForestClassifier(
    n_estimators=100, max_depth=10,
    min_samples_split=5, max_features='sqrt', random_state=42
)
# Decision Tree: Accuracy 76.5%, Precision 67.8%
# Random Forest: Accuracy 79.5%, Precision 71.4%
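The commented scores above come from a fit-on-train, score-on-validation pattern. A minimal sketch of that pattern, with synthetic data standing in for the Megaline split (variable names here are illustrative, not the project's actual code):

```python
# Fit on the training set, score on the held-back validation set.
# Synthetic stand-in data with a learnable signal.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # toy target

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
model.fit(X_train, y_train)
pred = model.predict(X_valid)

acc  = accuracy_score(y_valid, pred)
prec = precision_score(y_valid, pred)
print(f"validation accuracy={acc:.3f}  precision={prec:.3f}")
```

The key discipline is that only the validation score drives tuning decisions; the test set stays untouched until the very end.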

Validation Attempt 2 — Tuned: Increased max_depth for Decision Tree; boosted Random Forest to 10,000 trees with more conservative splits:

model_dt_v2 = DecisionTreeClassifier(
    max_depth=10, min_samples_split=10,
    min_samples_leaf=5, max_features='sqrt', random_state=42
)
model_rf_v2 = RandomForestClassifier(
    n_estimators=10000,          # 100x more trees
    max_depth=10,
    min_samples_split=5, min_samples_leaf=2,
    max_features='sqrt', random_state=42
)
# Decision Tree: Accuracy 76.5% (no change), Precision 64.5% (−3.3 pp)
# Random Forest: Accuracy 79.9% (+0.4 pp), Precision 73.7% (+2.3 pp)

Model                          Accuracy   Precision   vs. Target (≥ 75%)
Decision Tree — Baseline       76.5%      67.8%       PASSES ✓
Random Forest — 100 trees      79.5%      71.4%       PASSES ✓
Decision Tree — Tuned          76.5%      64.5%       PASSES ✓
Random Forest — 10,000 trees   79.9%      73.7%       PASSES ✓

Phase 4 — Key Decision

Why Accuracy and Precision — Not F1

To understand the rationale, consider the question each metric is actually answering:

  • Accuracy: "How often does the model recommend the right plan across all my customers?"
  • Precision: "As a customer, when the model recommends Ultra — how much can I trust that?"
  • Recall: "Of all customers who should be on Ultra, how many did we actually catch?"
  • F1: "What's the balanced score between Precision and Recall?"

For a plan recommendation system, missing an Ultra recommendation is not catastrophic — a customer can upgrade later when they need more data. The business does not lose that customer over a missed recommendation. But an incorrect recommendation damages trust immediately. That's why Accuracy and Precision are the right metrics here, and why Recall and F1 matter less when a missed prediction is recoverable.
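To make these four questions concrete, here is the arithmetic on the final test confusion matrix reported below (TN = 426, FP = 29, FN = 88, TP = 100), worked out from the raw counts rather than via scikit-learn:

```python
# Each metric is a different ratio over the same four confusion-matrix counts
# (taken from the project's final test results).
tn, fp, fn, tp = 426, 29, 88, 100

accuracy  = (tp + tn) / (tp + tn + fp + fn)   # right plan, across all customers
precision = tp / (tp + fp)                    # trust in an "Ultra" recommendation
recall    = tp / (tp + fn)                    # Ultra customers actually caught
f1        = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
# accuracy=0.818 precision=0.775 recall=0.532 f1=0.631
```

The low recall is visible directly in the counts: 88 of 188 true Ultra customers were sent to Smart, which is the recoverable error by design.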


Final Test Results

The model had not seen the test set at any point — no training, no validation, no tuning. This is what makes the result meaningful: an unbiased estimate of real-world performance. Selected model: Random Forest with 10,000 trees.

y_test_pred = model_rf_v2.predict(X_test)

final_acc  = accuracy_score(y_test, y_test_pred)   # 0.818
final_prec = precision_score(y_test, y_test_pred)  # 0.775
final_rec  = recall_score(y_test, y_test_pred)     # 0.532
final_f1   = f1_score(y_test, y_test_pred)         # 0.631

# Confusion Matrix:
# [[426  29]
#  [ 88 100]]
# True Negatives (Smart → Smart): 426
# False Positives (Smart → Ultra):  29
# False Negatives (Ultra → Smart):  88
# True Positives  (Ultra → Ultra): 100

Metric                  Score    Notes
Accuracy (Test)         81.8%    Target ≥ 75% — PASSED ✓ (+6.8 pp)
Precision (Test)        77.5%    When the model says Ultra, it's right 77.5% of the time
Recall (Test)           53.2%    Expected — deprioritized for this business case
Accuracy (Validation)   79.9%    Generalization gap: +1.9 pp — model improved on test

The model actually performed better on the test set than on validation — 81.8% vs. 79.9%. This is rare, and it suggests the train/validation split may have landed slightly tougher examples in the validation set. Either way, the test result is what matters for deployment decisions.




Conclusion & Reflections

This was Sprint 8 — my first ML project. Looking back, what I'm most proud of isn't the 81.8% accuracy. It's the decision-making process that got there: the deliberate choice to skip feature scaling, the justification for accuracy over F1, the structured comparison across two validation attempts before touching the test set.

In a real deployment, a model like this could run inside Megaline's CRM to flag legacy-plan customers with a recommended upgrade. With 77.5% precision, nearly 4 out of 5 customers flagged for Ultra actually belong there — a reliable enough signal to drive outreach campaigns.

Growth from Sprint 8 → Sprint 9

Sprint 8 taught me the fundamentals of a clean ML workflow. By Sprint 9, I was handling class imbalance, running GridSearchCV over 108 parameter combinations, and validating data missingness with a formal MAR test before imputing. The habits built here — methodical splitting, careful metric selection, explicit before/after comparisons — carried forward into every project since.

Project Requirement                      Status
Accuracy ≥ 75% on test set               ACHIEVED — 81.8% (+6.8 pp margin) ✓
Train / validation / test split used     YES — stratified 60/20/20 ✓
Multiple models evaluated                YES — Decision Tree + Random Forest, 2 rounds ✓
Feature scaling decision documented      YES — justified skip for tree-based methods ✓
Metric selection justified               YES — Accuracy + Precision over F1 ✓

Want to Explore the Full Code?

The complete notebook — all phases, both validation attempts, final test results — is on GitHub.