PGDDSA Study · Semester 1

PGD01C02

Module 4 · Model Evaluation

Prediction and Decision Making

Core Titles

Key headlines and terms for quick recall

Prediction — model outputs $\hat y$ for a new $x$
Point prediction vs prediction interval
Confidence interval for $E[Y | x]$ vs prediction interval for individual $y$
Decision rule — convert prediction into action
Threshold tuning in classification
Cost-sensitive decisions — incorporate FP/FN costs
Decision theory — minimise expected loss
Deployment considerations — latency, monitoring, fallback

Basic Idea

What it is, why it matters, how it works

From model to decision

A trained model produces predictions; decision-making turns each prediction into an action.

Steps.

Predict $\hat y$ (and ideally a measure of uncertainty).
Apply a decision rule — threshold, cost function, business logic.
Take action — auto-approve, route to human review, block, etc.
Measure outcomes — was the decision correct? Feed back into retraining.
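
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed design: the names `DecisionRecord`, `decide` and `record_outcome`, the three actions, and the cut-offs 0.2 and 0.8 are all hypothetical, and the scores stand in for the output of a model trained elsewhere.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecisionRecord:
    score: float                    # predicted probability of the risky outcome
    action: str                     # what the system did
    outcome: Optional[int] = None   # filled in later, once the truth is known


def decide(score: float, approve_below: float = 0.2, block_above: float = 0.8) -> DecisionRecord:
    """Steps 2-3: apply a decision rule and pick an action (cut-offs are illustrative)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability in [0, 1]")
    if score < approve_below:
        action = "auto_approve"
    elif score > block_above:
        action = "block"
    else:
        action = "human_review"
    return DecisionRecord(score=score, action=action)


def record_outcome(record: DecisionRecord, outcome: int) -> DecisionRecord:
    """Step 4: attach the observed outcome so the pair can feed retraining."""
    record.outcome = outcome
    return record


# Step 1 is assumed to have happened elsewhere: these scores stand in for model output.
records = [decide(s) for s in (0.05, 0.50, 0.95)]
print([r.action for r in records])  # ['auto_approve', 'human_review', 'block']
```

Keeping the score, the action and the eventual outcome in one record is what makes the feedback step possible: without the stored score, a later comparison against actuals cannot be tied back to the decision that was taken.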

Point prediction vs uncertainty

Point prediction — a single best estimate $\hat y$ .
Confidence interval for $E[Y | x]$ — the expected response.
Prediction interval for a new individual $y$ — wider than a CI because it includes the residual variance.

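Both intervals narrow as more data arrives, but only the confidence interval shrinks towards zero width. The prediction interval is bounded below by the intrinsic noise (residual variance), which extra data cannot remove.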
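
For reference, a sketch of the textbook formulas in one common setting, simple linear regression with independent normal errors (an assumption, since the notes do not fix a model). Here $s^2 = \text{RSS}/(n-2)$ is the residual variance estimate, $S_{xx} = \sum_i (x_i - \bar x)^2$, and $t$ is the $1-\alpha/2$ quantile of the $t$ distribution with $n-2$ degrees of freedom:

$$\text{CI for } E[Y | x_0]:\quad \hat y_0 \pm t \, s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}}$$

$$\text{PI for a new } y_0:\quad \hat y_0 \pm t \, s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}}$$

The extra $1$ under the square root of the prediction interval is the residual variance of a single new observation. As $n \to \infty$ the confidence interval half-width tends to $0$, while the prediction interval half-width tends to $z_{1-\alpha/2}\,\sigma$.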

Decisions in regression

Inventory: if predicted demand $\hat y >$ reorder point, place order.
Pricing: if the predicted elasticity is favourable (demand is inelastic, so revenue rises with price), raise the price.

Decisions in classification — threshold tuning

A probabilistic classifier outputs $P(y = 1 | x)$ . Convert to a hard label using a threshold $\tau$ : $\hat y = \mathbb{1}\{P(y = 1 | x) \ge \tau\}.$

The choice of $\tau$ depends on business cost:

Cancer screening: $\tau$ low → catch most positives even if more false alarms (high recall).
Spam filter: $\tau$ high → avoid blocking legitimate email (high precision).
Fraud detection: tune to maximise dollars-saved per false alarm.
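
The trade-off can be seen on a toy example. The sketch below uses plain Python and a small made-up set of scores and labels (hypothetical data, chosen only for illustration); `precision_recall` is a helper name introduced here, not a library function.

```python
def precision_recall(probs, labels, tau):
    """Precision and recall of the rule y_hat = 1 if p >= tau else 0."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        predicted = 1 if p >= tau else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
    # Convention used here: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical scores P(y = 1 | x) and true labels.
probs = [0.95, 0.80, 0.65, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

for tau in (0.3, 0.7):
    precision, recall = precision_recall(probs, labels, tau)
    print(f"tau={tau}: precision={precision:.2f}, recall={recall:.2f}")
# tau=0.3: precision=0.67, recall=1.00
# tau=0.7: precision=1.00, recall=0.50
```

On this data the low threshold catches every positive at the price of two false alarms, while the high threshold raises no false alarms but misses half the positives.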

Cost-sensitive decisions

Build a cost matrix $C(a, y)$ that assigns a cost to each (action, actual outcome) pair; for a plain classifier the action $a$ is simply the predicted label $\hat y$. Choose the action that minimises expected cost: $\hat a = \arg \min_a \sum_y C(a, y) \cdot P(y | x).$

Example — credit lending. $C(\text{approve}, \text{default}) =$ ₹100 k (lost principal); $C(\text{deny}, \text{repaid}) =$ ₹10 k (lost interest). Approve iff $P(\text{repaid} | x) \cdot 10\text{k} > P(\text{default} | x) \cdot 100\text{k}$ .
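
A minimal sketch of the expected-cost rule applied to the lending example. The costs of the two correct decisions are not given in the notes and are assumed to be zero here; `best_action` and the dictionary layout are illustrative choices, not a standard API.

```python
def best_action(cost, posterior):
    """Return (action, expected_costs) minimising sum_y C(a, y) * P(y | x).

    cost[action][outcome] is the cost matrix; posterior[outcome] is P(y | x).
    Ties go to the action listed first in `cost`.
    """
    expected = {
        action: sum(row[y] * posterior[y] for y in posterior)
        for action, row in cost.items()
    }
    return min(expected, key=expected.get), expected


# Costs in rupees. Zero cost for correct decisions is an assumption.
# "deny" is listed first so that an exact tie results in denial,
# matching the strict inequality in the approval rule.
COST = {
    "deny": {"repaid": 10_000, "default": 0},
    "approve": {"repaid": 0, "default": 100_000},
}

action, expected = best_action(COST, {"repaid": 0.95, "default": 0.05})
print(action)  # approve: about 5,000 expected cost versus about 9,500 for denying

action, expected = best_action(COST, {"repaid": 0.85, "default": 0.15})
print(action)  # deny: about 15,000 expected cost of approving versus about 8,500
```

Even a 15% default probability is enough to deny here, because a default costs ten times as much as a wrongly refused loan.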

Deployment considerations

Latency — model must score within SLA (e.g., < 50 ms for online ad bidding).
Throughput — handle peak request rate.
Monitoring — track prediction drift, business KPI, model health.
Fallback — degrade gracefully if model service fails (rules engine).
Audit / explainability — high-stakes decisions need traceable rationale.
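
The fallback item can be illustrated with a small wrapper. This is a sketch under assumptions: `score_with_fallback`, `broken_model` and `simple_rules` are hypothetical names, and a real service would usually also enforce a latency timeout, which is omitted here.

```python
import logging

logger = logging.getLogger("scoring")


def score_with_fallback(features, model_score, rule_score):
    """Return (score, source); fall back to the rules engine if the model call fails."""
    try:
        return model_score(features), "model"
    except Exception:
        # Log the failure so the degradation is visible to monitoring.
        logger.exception("model scoring failed; using rules engine")
        return rule_score(features), "rules"


def broken_model(features):
    # Stands in for a model service that is down.
    raise TimeoutError("model service unavailable")


def simple_rules(features):
    # Hypothetical rule: treat large amounts as risky.
    return 0.9 if features["amount"] > 10_000 else 0.1


result = score_with_fallback({"amount": 25_000}, broken_model, simple_rules)
print(result)  # (0.9, 'rules')
```

Returning the source alongside the score lets downstream monitoring count how often the system is running on rules rather than on the model.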

Continuous improvement

Compare model predictions against actual outcomes once they materialise.
Retrain on a schedule or when drift detected.
Test new model versions via A/B / shadow mode before full rollout.

Mind Map

Visual structure of the concept

PREDICTION & DECISION-MAKING
├── Prediction
│   ├── Point estimate ŷ
│   ├── Confidence interval (for E[y|x])
│   └── Prediction interval (for new y)
├── Decision rule
│   ├── Threshold tuning (classification)
│   └── Cost matrix → expected-cost minimisation
├── Cost-sensitive examples
│   ├── Cancer screening → low τ
│   ├── Spam filter → high τ
│   └── Fraud — $ per FP / FN
├── Deployment
│   ├── Latency / throughput
│   ├── Monitoring / drift
│   └── Fallback rules
└── Continuous improvement
    ├── Compare to actuals
    ├── Retrain on schedule / drift
    └── A/B test new versions

Exam Q&A

Part A (2 marks) and Part B (20 marks) style questions

Part A (2 marks each)

Q1. Differentiate point prediction and prediction interval.

Point prediction — single best-estimate $\hat y$ for new $x$ .
Prediction interval — a range $[\hat y - z s, \hat y + z s]$ expected to contain the actual $y$ with a stated confidence level, where $s$ is the standard error of prediction and $z$ the matching critical value; wider than the confidence interval because it includes residual noise.

Q2. What is threshold tuning in classification?

Adjusting the decision threshold $\tau$ that converts predicted probabilities into class labels: $\hat y = \mathbb{1}\{P(y = 1 | x) \ge \tau\}$ . Lower $\tau$ favours recall; higher $\tau$ favours precision.

Q3. What is a cost-sensitive decision?

A decision rule that picks the action minimising expected cost, based on a cost matrix $C$ that assigns a cost to each (predicted, actual) pair, rather than treating all errors equally.

Part B (20 marks)

Q. Discuss prediction and decision-making in machine-learning systems. Explain how predictions are translated into actions, the role of threshold tuning and cost-sensitive decisions, and deployment considerations.

From model to decision.

A trained model produces $\hat y$ (regression) or $P(y = 1 | x)$ (classification). Decision-making converts predictions into actions:

Predict $\hat y$ with optional uncertainty.
Decision rule — threshold, business logic, optimisation.
Action — auto-approve, route to human, block.
Outcome — observe actuals, feed back into model improvement.

Uncertainty.

Confidence interval (CI) — for the expected response $E[Y|x]$ .
Prediction interval (PI) — for a new individual $y$ ; wider, includes noise.

Reporting an interval rather than a single point lets downstream decisions account for risk.
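
A small numerical sketch of the two intervals for simple linear regression, using only the Python standard library. It uses the normal critical value $z$ as a large-sample approximation (for small samples the $t$ quantile is more accurate); the function name `intervals` and the data are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean


def intervals(xs, ys, x0, level=0.95):
    """Return (y_hat, ci, pi) at x0 for a least-squares line fitted to (xs, ys)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar

    # Residual standard deviation with n - 2 degrees of freedom.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    s = sqrt(sum(r * r for r in residuals) / (n - 2))

    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% level
    y_hat = intercept + slope * x0
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx

    ci_half = z * s * sqrt(leverage)        # uncertainty in the mean response
    pi_half = z * s * sqrt(1 + leverage)    # adds the noise of one new observation
    return y_hat, (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)


# Hypothetical data, roughly y = 2x with noise.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
y_hat, ci, pi = intervals(xs, ys, x0=3.5)
print(y_hat, ci, pi)
```

The only difference between the two half-widths is the `1 +` under the square root, which is why the prediction interval always contains the confidence interval.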

Threshold tuning in classification.

A probabilistic classifier returns $p = P(y = 1 | x)$ . Convert to a label using $\tau$ : $\hat y = \mathbb{1}\{p \ge \tau\}.$

The default $\tau = 0.5$ is optimal only when false positives and false negatives cost the same, which is rare in practice. The right $\tau$ depends on the business cost of FP vs FN:

Application	Cost FN	Cost FP	Better threshold
Cancer screening	Patient dies	False alarm	Low $\tau$ → high recall
Spam filter	Spam in inbox	Real email lost	High $\tau$ → high precision
Loan approval	Default (lost ₹)	Lost interest	Tuned to min expected cost

Sweep $\tau$ along the PR curve or ROC curve; pick the operating point that best serves the business.
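
One way to pick the operating point is to sweep candidate thresholds on validation data and keep the one with the lowest total cost. The sketch below assumes made-up scores, labels and unit costs; `total_cost` and `best_threshold` are illustrative helper names.

```python
def total_cost(probs, labels, tau, cost_fp, cost_fn):
    """Total misclassification cost at threshold tau (correct predictions cost 0)."""
    cost = 0.0
    for p, y in zip(probs, labels):
        predicted = 1 if p >= tau else 0
        if predicted == 1 and y == 0:
            cost += cost_fp
        elif predicted == 0 and y == 1:
            cost += cost_fn
    return cost


def best_threshold(probs, labels, cost_fp, cost_fn):
    """Sweep tau over 0.05, 0.10, ..., 0.95; ties go to the lowest tau."""
    candidates = [i / 20 for i in range(1, 20)]
    return min(candidates, key=lambda t: total_cost(probs, labels, t, cost_fp, cost_fn))


# Hypothetical validation scores P(y = 1 | x) and true labels.
probs = [0.92, 0.81, 0.67, 0.58, 0.43, 0.31, 0.22, 0.12]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

print(best_threshold(probs, labels, cost_fp=1, cost_fn=10))   # 0.35 (misses are expensive)
print(best_threshold(probs, labels, cost_fp=10, cost_fn=1))   # 0.7  (false alarms are expensive)
```

The same scores lead to very different thresholds once the cost ratio flips, which is the point of the table above.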

Cost-sensitive decisions.

Define a cost matrix $C(a, y)$ over (action, actual outcome) pairs, where the action $a$ may simply be the predicted label $\hat y$. Choose the action $\hat a$ minimising expected cost: $\hat a = \arg \min_a \sum_y C(a, y) \cdot P(y | x).$

Example — loan. Cost(approve, default) = ₹100 k; Cost(deny, repay) = ₹10 k; correct decisions are assumed to cost nothing. Expected cost of approving: $C_{\text{approve}} = P(\text{default} | x) \cdot 100\text{k}$ . Expected cost of denying: $C_{\text{deny}} = P(\text{repay} | x) \cdot 10\text{k}$ . Approve iff $C_{\text{approve}} < C_{\text{deny}}$ , i.e., $P(\text{default} | x) < 1/11 \approx 0.0909$ .

This is equivalent to approving only when $P(\text{repay} | x)$ exceeds the threshold $\tau = 10/11 \approx 0.909$ .
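
The cut-off in the loan example can be checked by writing $p_d = P(\text{default} | x)$ and using $P(\text{repay} | x) = 1 - p_d$. This assumes only two outcomes and zero cost for correct decisions, which the example implies but does not state:

$$p_d \cdot 100\text{k} < (1 - p_d) \cdot 10\text{k} \iff 110\, p_d < 10 \iff p_d < \tfrac{1}{11} \approx 0.0909$$

In general, with the same assumptions, the rule is to approve iff

$$P(\text{default} | x) < \frac{C(\text{deny}, \text{repay})}{C(\text{deny}, \text{repay}) + C(\text{approve}, \text{default})},$$

so the threshold depends only on the ratio of the two error costs.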

Decision theory generalises this: pick the action minimising expected loss under your posterior distribution of $y$ .

Deployment considerations.

Latency — model must respond within the SLA. Real-time ad bidding typically has a budget of tens of milliseconds or less; batch fraud scoring can take hours.
Throughput — handle peak QPS without timeouts. Often achieved with horizontal scaling and caching.
Monitoring.
- Input drift (population stability index, PSI; Kolmogorov-Smirnov test).
- Prediction drift.
- Outcome metrics (precision, recall, business KPI).
- Alert when degraded.
Calibration. Threshold and expected-cost rules assume the predicted probabilities are trustworthy, so recalibrate them periodically using Platt scaling or isotonic regression.
Fallback / safety net. If model service fails, route to rules engine or human review — never let predictions fail silently.
Auditability and explainability. SHAP / LIME explanations on individual predictions for high-stakes decisions (loan denial, medical recommendation).
Versioning. Track model artifact, training data snapshot, hyperparameters. Roll back instantly if a new version misbehaves.
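
As an illustration of the input-drift check listed under Monitoring, the sketch below computes the population stability index from binned counts: $\text{PSI} = \sum_i (a_i - e_i) \ln(a_i / e_i)$, where $e_i$ and $a_i$ are the shares of bin $i$ in the training (expected) and live (actual) data. The function name `psi`, the small floor `eps` for empty bins, and the counts are assumptions made for this example.

```python
from math import log


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population stability index between two binned distributions."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both inputs must use the same bins")
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    score = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Floor the shares so that an empty bin does not produce log(0).
        e_share = max(e / e_total, eps)
        a_share = max(a / a_total, eps)
        score += (a_share - e_share) * log(a_share / e_share)
    return score


# Hypothetical counts of one feature in four bins.
training = [25, 25, 25, 25]
live_same = [50, 50, 50, 50]       # same shape, larger volume
live_shifted = [10, 20, 30, 40]    # mass has moved to the upper bins

print(psi(training, live_same))      # 0.0
print(psi(training, live_shifted))   # positive; larger means more drift
```

Alert thresholds vary by team; values around 0.1 and 0.25 are often quoted as rules of thumb for moderate and significant shift, but they are conventions rather than a standard.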

Continuous improvement.

A/B test new model versions or new thresholds.
Shadow mode — score with new model but act with old; compare offline.
Retrain on a schedule or trigger when drift detected.
Feedback loop — labelled outcomes (loan repaid? fraud confirmed?) become training data.
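
Shadow mode can be sketched as a loop in which only the live model drives the action while the candidate's output is logged for later comparison. The function `run_shadow`, the toy models and the inputs are hypothetical.

```python
def run_shadow(inputs, live_model, shadow_model, tau=0.5):
    """Act on the live model, log the shadow model, and report how often they disagree."""
    if not inputs:
        raise ValueError("need at least one input")
    log = []
    for x in inputs:
        live_p = live_model(x)
        shadow_p = shadow_model(x)            # scored and logged, never acted on
        log.append({
            "input": x,
            "action": live_p >= tau,          # only the live model drives the action
            "shadow_action": shadow_p >= tau,
        })
    disagreement_rate = sum(r["action"] != r["shadow_action"] for r in log) / len(log)
    return log, disagreement_rate


# Toy stand-ins for the current and the candidate model.
def live(x):
    return x / 10


def shadow(x):
    return min(1.0, x / 8)


log, rate = run_shadow([1, 3, 4, 6, 9], live, shadow)
print(rate)  # 0.2: the two models disagree on one input out of five (x = 4)
```

The disagreement rate only shows where the models differ; deciding which one is right still needs the labelled outcomes from the feedback loop.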

Take-away. A model's accuracy is necessary but not sufficient; the system — thresholding, costs, monitoring, fallbacks — determines whether the model delivers business value.
