Problem
Mobile carriers routinely carry customers on legacy plans that no longer match how those customers actually use their phones. For Megaline, this meant a substantial portion of their user base was on outdated pricing that was either overcharging customers who used less than they paid for or leaving revenue on the table for heavy users who needed more. The business had two modern plans — Smart and Ultra — but no systematic way to identify which plan each legacy customer should be moved to. Manual review was impractical at scale, and rule-based approaches require explicit thresholds that ignore the interactions between usage features. What Megaline needed was a classification model that could learn the boundary between Smart and Ultra from historical usage data and apply that boundary to new customers automatically.
Solution
This project trained a Random Forest classifier on Megaline's behavioral data — monthly calls, minutes used, messages sent, and internet data consumed — to recommend the appropriate modern plan for each legacy customer. The workflow followed a full end-to-end ML pipeline: feature inspection, train/validation/test split, model selection across Decision Tree and Random Forest algorithms, hyperparameter tuning, and final evaluation against a held-out test set. The target was a minimum test accuracy of 75%; the final model achieved 81.8%, clearing the threshold by 6.8 percentage points. Sanity checks against a dummy classifier baseline confirmed that the model was learning real signal from the behavioral features rather than exploiting class imbalance. This was the first complete ML project in the training program — the point where individual concepts like decision boundaries and ensemble methods connected into a working system.
Skills Acquired
- Python — the implementation language for the full pipeline: data loading, feature analysis, model training, hyperparameter tuning, and evaluation.
- scikit-learn — the machine learning framework used to train both the Decision Tree and Random Forest classifiers, run accuracy evaluations, and generate the dummy classifier baseline. scikit-learn's consistent estimator API made it straightforward to swap algorithms and compare results without rewriting the evaluation code.
- Pandas — used for data loading, inspection, and feature analysis. Pandas DataFrames were the primary data structure throughout preprocessing, enabling column-wise analysis of the four behavioral features before they were passed to the model.
- Decision Tree — the baseline model in the comparison. A single Decision Tree is interpretable and fast, but prone to overfitting — using it as the starting point established a performance floor that Random Forest needed to meaningfully exceed.
- Random Forest — the final model that achieved 81.8% test accuracy. Random Forest averages the predictions of many decorrelated decision trees, reducing variance and improving generalization over any individual tree — which is why it consistently outperforms a single Decision Tree on tabular classification tasks.
What makes the result meaningful is not just the final accuracy number — it is the reasoning behind each decision along the way.
Deep Dive
Megaline, a mobile carrier, has a problem. Many of their customers are still on legacy plans — plans that no longer match how they actually use their phones. The business wants to move these customers to one of two modern options: Smart or Ultra. But how do you know which plan fits which customer?
You look at the data. Every month, Megaline tracks each customer's calls, minutes, messages, and data usage. That behavioral footprint tells a story about what plan they actually need — and a classification model can learn to read it.
Why This Project?
This was Sprint 8 of my TripleTen AI and Machine Learning Bootcamp — my first full machine learning project. Up to this point, I had learned the theory: what a Decision Tree does, how Random Forest improves on it, what hyperparameters control overfitting. This was where I applied all of it to a real dataset for the first time.
I treated it like a genuine business problem. Megaline's goal isn't to maximize some abstract score — it's to recommend the right plan so customers stay satisfied and don't churn. That framing shaped every decision I made, including which metrics to focus on and which ones to deprioritize.
What I Learned
This was my first time making deliberate, justified decisions about which metric to optimize — and why accuracy + precision made more sense than F1 for this specific business case. That kind of reasoning — metric selection tied to real business impact — is something I now apply to every project.
What You'll Learn from This
- Why tree-based models don't need feature scaling — and the specific reason distance-based algorithms do
- How to choose between accuracy, precision, recall, and F1 based on the actual business question
- What happens when you increase `n_estimators` from 100 to 10,000 — and when that matters
- How to structure a clean train/validation/test workflow so test results are genuinely unbiased
- Why Random Forest consistently outperforms Decision Tree — and what the tradeoffs are
Key Takeaways
- Random Forest (10,000 trees) achieved 81.8% accuracy on the final test set — exceeding the ≥ 75% target by 6.8 percentage points
- Feature scaling was deliberately skipped — tree algorithms split on thresholds, not distances; scaling would add complexity with zero benefit
- Accuracy and precision were the right metrics here — a missed Ultra recommendation is recoverable; a wrong recommendation damages trust
- Increasing trees from 100 → 10,000 improved Random Forest precision by 2.3 percentage points — diminishing returns, but meaningful for this use case
- Random Forest outperformed Decision Tree on every metric across every experiment
The Dataset
Megaline's usage history: 3,214 customers, 5 columns. All four features captured behavioral signals — how often customers called, how long they talked, how many texts they sent, how much data they consumed. The target column — `is_ultra` — indicated which plan the customer was actually on that month.
| Column | Description | Range (approx.) |
|---|---|---|
| calls | Number of calls per month | 0 – 244 |
| minutes | Total call duration (min) | 0 – 1,632 |
| messages | Number of texts | 0 – 224 |
| mb_used | Data used (MB) | 0 – 49,745 |
| is_ultra | Target: Ultra=1, Smart=0 | — |
No missing values. No duplicate rows. No encoding needed — all features were already numeric. This was as clean as a dataset gets, which meant the focus was entirely on modeling decisions.
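Those cleanliness claims are cheap to verify in code. A minimal sketch of the checks, run here on a tiny synthetic sample with the same schema (the real checks ran on the full `users_behavior.csv`; the sample values below are illustrative):

```python
import pandas as pd

# Tiny stand-in with the same schema as users_behavior.csv
sample = pd.DataFrame({
    'calls':    [40, 85, 77],
    'minutes':  [311.90, 516.75, 467.66],
    'messages': [83, 56, 86],
    'mb_used':  [19915.42, 22696.96, 21060.45],
    'is_ultra': [0, 0, 0],
})

print(sample.isna().sum().sum())       # 0 → no missing values
print(sample.duplicated().sum())       # 0 → no duplicate rows
print(sample.dtypes.apply(pd.api.types.is_numeric_dtype).all())  # True → no encoding needed
```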
My Process
Phase 1
Import Libraries & Load Data
Loaded the dataset and imported the tools needed: pandas for data handling, and scikit-learn's classifiers, splitters, and metrics for the full ML workflow.
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score,
    f1_score, confusion_matrix
)

df = pd.read_csv('/datasets/users_behavior.csv')
```
Phase 2
Exploratory Data Analysis
Explored the dataset with `df.head()`, `df.info()`, and `df.describe()`. Three things stood out immediately:
- 3,214 records — no missing values in any column
- All features are numeric — no encoding required
- `mb_used` ranges 0–49,745 while `calls` ranges 0–244 — a scale difference worth investigating
```python
print("Shape:", df.shape)  # (3214, 5)
df.info()                  # all non-null, float64 + int64
df.describe()              # reveals mb_used >> other features

print(df['is_ultra'].value_counts(normalize=True))
# 0 (Smart): ~69.4%
# 1 (Ultra): ~30.6%
```
Phase 3
Feature Scaling Consideration
The scale difference between `mb_used` (~0–49,745) and `calls` (~0–244) raised a question: does this require feature scaling?
The answer depends on the algorithm. Scale-sensitive methods are affected by feature scales — KNN and SVMs because they compute distances between data points, logistic regression and neural networks because large-scale features dominate gradient updates and regularization. Tree-based methods are not: Decision Trees and Random Forests make binary splits on individual feature thresholds, so a feature's scale has no effect on where those splits land.
```python
# Feature value ranges — scale difference is significant
print(df[['calls', 'minutes', 'messages', 'mb_used']].describe().loc[['min', 'max']])

# Conclusion: tree algorithms split on thresholds, not distances.
# Feature scaling is NOT required for Decision Tree or Random Forest.
# Skipping StandardScaler — would add complexity with zero benefit here.
```
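The invariance claim can be demonstrated directly. A small sketch on synthetic data (not Megaline's): a tree fit on raw features and a tree fit on standardized features produce identical predictions, because standardization is a monotone per-feature transform that preserves the order of values — and therefore the available split points.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
# Two features on wildly different scales, mimicking calls vs. mb_used
X = np.column_stack([
    rng.integers(0, 244, 500),     # calls-like: 0–244
    rng.uniform(0, 49745, 500),    # mb_used-like: 0–49,745
])
y = ((X[:, 1] > 25000) ^ (rng.random(500) < 0.1)).astype(int)  # noisy target

tree_raw = DecisionTreeClassifier(max_depth=5, random_state=42).fit(X, y)

X_scaled = StandardScaler().fit_transform(X)
tree_scaled = DecisionTreeClassifier(max_depth=5, random_state=42).fit(X_scaled, y)

pred_raw = tree_raw.predict(X)
pred_scaled = tree_scaled.predict(X_scaled)
print((pred_raw == pred_scaled).all())  # True — scaling changed nothing
```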
What I Learned
Before this project, I knew feature scaling as a "step you do before modeling." After this project, I understood why — and more importantly, when not to. Making a justified decision to skip a common preprocessing step is more valuable than following a checklist blindly.
Phase 4
Model Training & Validation
A stratified 60/20/20 split kept the ~30.6% Ultra class ratio consistent across train, validation, and test sets. The model had not seen the test set at any point during this phase.
```python
X = df.drop('is_ultra', axis=1)
y = df['is_ultra']

# Step 1: separate the test set (20%), preserving the ~30.6% Ultra ratio
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Step 2: split the remainder into train (60%) and validation (20%)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_temp, y_temp, test_size=0.25, stratify=y_temp, random_state=42
)

# Result: 1,928 train / 643 valid / 643 test
```
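The arithmetic behind the two-step split is easy to miss: `test_size=0.25` of the remaining 80% is 20% of the total, which is what yields 60/20/20. A quick check on an index array the same size as the dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split

idx = np.arange(3214)  # one entry per customer

temp, test = train_test_split(idx, test_size=0.2, random_state=42)
train, valid = train_test_split(temp, test_size=0.25, random_state=42)  # 0.25 * 0.8 = 0.2

print(len(train), len(valid), len(test))  # 1928 643 643
```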
I ran two validation attempts. The first established baseline performance; the second tuned hyperparameters to push further. The goal here wasn't to pass the target — it was to understand how much each change actually moved the needle:
Validation Attempt 1 — Baseline:
```python
model_dt_v1 = DecisionTreeClassifier(
    max_depth=6, min_samples_split=20, min_samples_leaf=10,
    max_features='sqrt', random_state=42
)
model_rf_v1 = RandomForestClassifier(
    n_estimators=100, max_depth=10, min_samples_split=5,
    max_features='sqrt', random_state=42
)

# Decision Tree: Accuracy 76.5%, Precision 67.8%
# Random Forest: Accuracy 79.5%, Precision 71.4%
```
Validation Attempt 2 — Tuned: Increased `max_depth` for the Decision Tree; boosted Random Forest to 10,000 trees with more conservative splits:
```python
model_dt_v2 = DecisionTreeClassifier(
    max_depth=10, min_samples_split=10, min_samples_leaf=5,
    max_features='sqrt', random_state=42
)
model_rf_v2 = RandomForestClassifier(
    n_estimators=10000,  # 100x more trees
    max_depth=10, min_samples_split=5, min_samples_leaf=2,
    max_features='sqrt', random_state=42
)

# Decision Tree: Accuracy 76.5% (no change), Precision 64.5% (-3.3)
# Random Forest: Accuracy 79.9% (+0.4), Precision 73.7% (+2.3)
```
| Model | Accuracy | Precision | vs. Target (≥ 75%) |
|---|---|---|---|
| Decision Tree — Baseline | 76.5% | 67.8% | PASSES ✓ |
| Random Forest — 100 trees | 79.5% | 71.4% | PASSES ✓ |
| Decision Tree — Tuned | 76.5% | 64.5% | PASSES ✓ |
| Random Forest — 10,000 trees | 79.9% | 73.7% | PASSES ✓ |
Phase 4 — Key Decision
Why Accuracy and Precision — Not F1
To understand the rationale, consider the question each metric is actually answering:
- Accuracy: "How often does the model recommend the right plan across all my customers?"
- Precision: "As a customer, when the model recommends Ultra — how much can I trust that?"
- Recall: "Of all customers who should be on Ultra, how many did we actually catch?"
- F1: "What's the balanced score between Precision and Recall?"
For a plan recommendation system, missing an Ultra recommendation is not catastrophic — a customer can upgrade later when they need more data, and the business does not lose them over it. Recommending the wrong plan, however, damages trust. That's why Accuracy and Precision are the right metrics here: Recall and F1 matter less when a missed prediction is recoverable.
Final Test Results
The model had not seen the test set at any point — no training, no validation, no tuning. This is what makes the result meaningful: an unbiased estimate of real-world performance. Selected model: Random Forest with 10,000 trees.
```python
y_test_pred = model_rf_v2.predict(X_test)

final_acc  = accuracy_score(y_test, y_test_pred)   # 0.818
final_prec = precision_score(y_test, y_test_pred)  # 0.775
final_rec  = recall_score(y_test, y_test_pred)     # 0.532
final_f1   = f1_score(y_test, y_test_pred)         # 0.631

# Confusion Matrix:
# [[426  29]
#  [ 88 100]]
# True Negatives  (Smart → Smart): 426
# False Positives (Smart → Ultra):  29
# False Negatives (Ultra → Smart):  88
# True Positives  (Ultra → Ultra): 100
```
| Metric | Score | Notes |
|---|---|---|
| Accuracy (Test) | 81.8% | Target ≥ 75% — PASSED ✓ (+6.8 pp) |
| Precision (Test) | 77.5% | When model says Ultra, it's right 77.5% of the time |
| Recall (Test) | 53.2% | Expected — deprioritized for this business case |
| Accuracy (Validation) | 79.9% | Generalization gap: +1.9 pp — model improved on test |
The model actually performed better on the test set than on validation — 81.8% vs. 79.9%. This is rare, and it suggests the train/validation split may have landed slightly tougher examples in the validation set. Either way, the test result is what matters for deployment decisions.
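Every test metric above can be recomputed by hand from the reported confusion matrix — a quick sanity check on what each metric actually measures:

```python
# Counts from the test-set confusion matrix reported above
tn, fp, fn, tp = 426, 29, 88, 100

accuracy  = (tp + tn) / (tp + tn + fp + fn)   # right plan, either class
precision = tp / (tp + fp)                    # trust in an "Ultra" recommendation
recall    = tp / (tp + fn)                    # Ultra customers actually caught
f1        = 2 * precision * recall / (precision + recall)

print(round(accuracy, 3), round(precision, 3), round(recall, 3), round(f1, 3))
# 0.818 0.775 0.532 0.631 — matches the reported scores
```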
Main Takeaways
- Random Forest consistently dominated. It outperformed Decision Tree on every metric across every experiment — baseline, tuned, and test.
- More trees helped — up to a point. Going from 100 → 10,000 trees improved precision by 2.3 percentage points and accuracy by 0.4. Real gains, though diminishing — there's a compute cost to weigh against the marginal improvement.
- Feature scaling is algorithm-specific. Skipping StandardScaler was the correct decision for tree-based methods — and it was a decision I made deliberately, not by accident.
- Metric selection is a business decision, not a default. Choosing accuracy + precision over F1 required understanding what "wrong" and "missed" actually cost in this context.
- Clean data doesn't mean easy modeling. With no missing values and no encoding needed, every modeling decision stood on its own — no preprocessing noise to hide behind.
Conclusion & Reflections
This was Sprint 8 — my first ML project. Looking back, what I'm most proud of isn't the 81.8% accuracy. It's the decision-making process that got there: the deliberate choice to skip feature scaling, the justification for accuracy over F1, the structured comparison across two validation attempts before touching the test set.
In a real deployment, a model like this could run inside Megaline's CRM to flag legacy-plan customers with a recommended upgrade. With 77.5% precision, nearly 4 out of 5 customers flagged for Ultra actually belong there — a reliable enough signal to drive outreach campaigns.
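A hypothetical sketch of what that CRM-side usage might look like — the `legacy` table, the synthetic labels, and the 0.8 confidence cutoff are all illustrative assumptions, not part of Megaline's systems or this project:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in for legacy-plan customers pulled from the CRM
legacy = pd.DataFrame({
    'calls':    rng.integers(0, 244, 300),
    'minutes':  rng.uniform(0, 1632, 300),
    'messages': rng.integers(0, 224, 300),
    'mb_used':  rng.uniform(0, 49745, 300),
})

# Stand-in for the trained model_rf_v2, fit on synthetic labels so the
# sketch is self-contained
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(legacy, (legacy['mb_used'] > 25000).astype(int))

p_ultra = model.predict_proba(legacy)[:, 1]  # P(Ultra) per customer
flagged = legacy[p_ultra >= 0.8]             # keep only high-confidence flags
print(f"{len(flagged)} of {len(legacy)} legacy customers flagged for Ultra outreach")
```

Thresholding the predicted probability, rather than using raw `predict`, is one way to trade recall for even higher precision in an outreach campaign.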
Growth from Sprint 8 → Sprint 9
Sprint 8 taught me the fundamentals of a clean ML workflow. By Sprint 9, I was handling class imbalance, running GridSearchCV over 108 parameter combinations, and validating data missingness with a formal MAR test before imputing. The habits built here — methodical splitting, careful metric selection, explicit before/after comparisons — carried forward into every project since.
| Project Requirement | Status |
|---|---|
| Accuracy ≥ 75% on test set | ACHIEVED — 81.8% (+6.8 pp margin) ✓ |
| Train / validation / test split used | YES — stratified 60/20/20 ✓ |
| Multiple models evaluated | YES — Decision Tree + Random Forest, 2 rounds ✓ |
| Feature scaling decision documented | YES — justified skip for tree-based methods ✓ |
| Metric selection justified | YES — Accuracy + Precision over F1 ✓ |
Want to Explore the Full Code?
The complete notebook — all phases, both validation attempts, final test results — is on GitHub.