PGD01C02
Module 4 · Model Evaluation

Prediction and Decision Making

Core Titles
Key headlines and terms for quick recall
  • Prediction: model outputs $\hat y$ for a new $x$
  • Point prediction vs prediction interval
  • Confidence interval for $E[Y \mid x]$ vs prediction interval for an individual $y$
  • Decision rule — convert prediction into action
  • Threshold tuning in classification
  • Cost-sensitive decisions — incorporate FP/FN costs
  • Decision theory — minimise expected loss
  • Deployment considerations — latency, monitoring, fallback
Basic Idea
What it is, why it matters, how it works

From model to decision

A trained model produces predictions; decision-making turns each prediction into an action.

Steps.

  1. Predict $\hat y$ (and ideally a measure of uncertainty).
  2. Apply a decision rule — threshold, cost function, business logic.
  3. Take action — auto-approve, route to human review, block, etc.
  4. Measure outcomes — was the decision correct? Feed back into retraining.

Point prediction vs uncertainty

  • Point prediction: a single best estimate $\hat y$.
  • Confidence interval for $E[Y \mid x]$: a range for the expected (mean) response at $x$.
  • Prediction interval for a new individual $y$: wider than the confidence interval because it also includes the residual variance.

Both intervals narrow as the training set grows, but only the confidence interval shrinks towards zero width; the prediction interval levels off at a width set by the intrinsic noise.
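
The difference between the two intervals can be sketched for a simple least-squares line. This is a minimal illustration, not a prescribed method: the function name `intervals` and the toy data are hypothetical, and the normal quantile stands in for the t quantile (with n − 2 degrees of freedom) that a small sample would normally call for.

```python
import math
from statistics import NormalDist, mean


def intervals(xs, ys, x0, level=0.95):
    """Return (y_hat, ci, pi) at x0 for a simple least-squares line."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    y_hat = intercept + slope * x0

    # Residual variance estimate with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s2 = sse / (n - 2)

    # Assumption: normal quantile as a large-sample stand-in for the t quantile.
    z = NormalDist().inv_cdf(0.5 + level / 2)

    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    se_mean = math.sqrt(s2 * leverage)       # uncertainty in E[Y | x0]
    se_new = math.sqrt(s2 * (1 + leverage))  # adds the residual variance

    ci = (y_hat - z * se_mean, y_hat + z * se_mean)
    pi = (y_hat - z * se_new, y_hat + z * se_new)
    return y_hat, ci, pi


xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
y_hat, ci, pi = intervals(xs, ys, x0=3.5)
print(y_hat, ci, pi)
```

The only difference between the two standard errors is the extra `1 +` term, which is the residual variance of a single new observation; it does not shrink as `n` grows.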

Decisions in regression

  • Inventory: if predicted demand $\hat y$ exceeds the reorder point, place an order.
  • Pricing: if the predicted elasticity is favourable, raise the price.

Decisions in classification — threshold tuning

A probabilistic classifier outputs $P(y = 1 \mid x)$. Convert it to a hard label using a threshold $\tau$: $\hat y = \mathbb{1}\{P(y = 1 \mid x) \ge \tau\}$.

The choice of $\tau$ depends on business cost:

  • Cancer screening: $\tau$ low → catch most positives even if more false alarms (high recall).
  • Spam filter: $\tau$ high → avoid blocking legitimate email (high precision).
  • Fraud detection: tune to maximise dollars-saved per false alarm.
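
The effect of moving the threshold can be seen on a handful of scores. The sketch below is illustrative only: the helper names `classify` and `precision_recall` and the six toy scores are hypothetical, and an empty denominator is treated as 1.0 by convention here.

```python
def classify(probs, tau):
    """Apply the rule y_hat = 1 if P(y = 1 | x) >= tau, else 0."""
    return [1 if p >= tau else 0 for p in probs]


def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Toy data (hypothetical): predicted P(y = 1 | x) and the true labels.
probs = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
y_true = [1, 1, 0, 1, 0, 0]

for tau in (0.25, 0.70):
    p, r = precision_recall(y_true, classify(probs, tau))
    print(f"tau={tau:.2f}  precision={p:.2f}  recall={r:.2f}")
# tau=0.25  precision=0.60  recall=1.00
# tau=0.70  precision=1.00  recall=0.67
```

On this toy set the low threshold catches every positive at the price of two false alarms, while the high threshold raises no false alarms but misses one positive.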

Cost-sensitive decisions

Build a cost matrix $C(\hat y, y)$ assigning a cost to each (predicted, actual) pair. Choose the action that minimises expected cost: $\hat a = \arg\min_a \sum_y C(a, y) \cdot P(y \mid x)$.

Example: credit lending. $C(\text{approve}, \text{default}) =$ ₹100 k (lost principal); $C(\text{deny}, \text{repaid}) =$ ₹10 k (lost interest). Approve iff $P(\text{repaid} \mid x) \cdot 10\text{k} > P(\text{default} \mid x) \cdot 100\text{k}$.
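
The expected-cost rule for the lending example can be written out directly. This is a sketch under one stated assumption: the two correct decisions (approve a repayer, deny a defaulter) are given zero cost, which the example implies but does not state. The names `COST` and `best_action` are hypothetical.

```python
# Cost matrix C(action, outcome) in rupees.
# Assumption: correct decisions cost nothing; the two non-zero
# entries are the ones given in the lending example.
COST = {
    ("approve", "repaid"): 0,
    ("approve", "default"): 100_000,
    ("deny", "repaid"): 10_000,
    ("deny", "default"): 0,
}


def best_action(p_default, cost=COST):
    """Return (action, expected_costs) minimising sum_y C(a, y) * P(y | x)."""
    probs = {"default": p_default, "repaid": 1.0 - p_default}
    actions = sorted({action for action, _ in cost})
    expected = {
        a: sum(cost[(a, y)] * p for y, p in probs.items()) for a in actions
    }
    return min(expected, key=expected.get), expected


for p in (0.05, 0.20):
    action, expected = best_action(p)
    print(p, action, expected)
```

With a 5% default probability approving is cheaper in expectation (about ₹5 k against ₹9.5 k); at 20% denying is cheaper (₹8 k against ₹20 k).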

Deployment considerations

  • Latency — model must score within SLA (e.g., < 50 ms for online ad bidding).
  • Throughput — handle peak request rate.
  • Monitoring — track prediction drift, business KPI, model health.
  • Fallback — degrade gracefully if model service fails (rules engine).
  • Audit / explainability — high-stakes decisions need traceable rationale.
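
The fallback item above can be sketched as a thin wrapper around the scoring call. Everything here is hypothetical: the `rules_engine` rule, the `amount` feature, the 1000 limit, and the action names are placeholders chosen only to make the sketch runnable.

```python
def rules_engine(features):
    """Hypothetical hand-written rule used when the model is unavailable."""
    return "review" if features.get("amount", 0) > 1000 else "approve"


def decide(features, model_score, tau=0.5, fallback=rules_engine):
    """Return (action, source); degrade to the fallback if scoring fails."""
    try:
        p = model_score(features)  # may raise on timeout or service error
    except Exception:
        # Record the source so that monitoring can count fallback decisions.
        return fallback(features), "fallback"
    return ("block" if p >= tau else "approve"), "model"


def broken_model(_features):
    raise TimeoutError("model service unavailable")


print(decide({"amount": 5000}, broken_model))   # ('review', 'fallback')
print(decide({"amount": 50}, lambda f: 0.9))    # ('block', 'model')
```

Returning the source alongside the action keeps the failure visible: a rising share of `"fallback"` decisions is itself a model-health signal.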

Continuous improvement

  • Compare model predictions against actual outcomes once they materialise.
  • Retrain on a schedule or when drift detected.
  • Test new model versions via A/B / shadow mode before full rollout.
Mind Map
Visual structure of the concept
PREDICTION & DECISION-MAKING
├── Prediction
│   ├── Point estimate ŷ
│   ├── Confidence interval (for E[y|x])
│   └── Prediction interval (for new y)
├── Decision rule
│   ├── Threshold tuning (classification)
│   └── Cost matrix → expected-cost minimisation
├── Cost-sensitive examples
│   ├── Cancer screening → low τ
│   ├── Spam filter → high τ
│   └── Fraud — $ per FP / FN
├── Deployment
│   ├── Latency / throughput
│   ├── Monitoring / drift
│   └── Fallback rules
└── Continuous improvement
    ├── Compare to actuals
    ├── Retrain on schedule / drift
    └── A/B test new versions
Exam Q&A
Part A (2 marks) and Part B (20 marks) style questions

Part A (2 marks each)

Q1. Differentiate point prediction and prediction interval.

  • Point prediction: single best estimate $\hat y$ for a new $x$.
  • Prediction interval: a range $[\hat y - z s, \hat y + z s]$ likely to contain the actual $y$ with a given confidence, where $s$ is the standard error of the prediction and $z$ the critical value for that confidence level; it is wider than the confidence interval because it includes residual noise.

Q2. What is threshold tuning in classification? Adjusting the decision threshold $\tau$ that converts predicted probabilities into class labels: $\hat y = \mathbb{1}\{P(y = 1 \mid x) \ge \tau\}$. Lower $\tau$ favours recall; higher $\tau$ favours precision.

Q3. What is a cost-sensitive decision? A decision rule that picks the action minimising expected cost based on a cost matrix $C$ that assigns a cost to each (predicted, actual) pair, instead of treating all errors equally.


Part B (20 marks)

Q. Discuss prediction and decision-making in machine-learning systems. Explain how predictions are translated into actions, the role of threshold tuning and cost-sensitive decisions, and deployment considerations.

From model to decision.

A trained model produces $\hat y$ (regression) or $P(y = 1 \mid x)$ (classification). Decision-making converts predictions into actions:

  1. Predict $\hat y$ with optional uncertainty.
  2. Decision rule — threshold, business logic, optimisation.
  3. Action — auto-approve, route to human, block.
  4. Outcome — observe actuals, feed back into model improvement.

Uncertainty.

  • Confidence interval (CI): for the expected response $E[Y \mid x]$.
  • Prediction interval (PI): for a new individual $y$; wider, includes noise.

Reporting an interval rather than a single point lets downstream decisions account for risk.

Threshold tuning in classification.

A probabilistic classifier returns $p = P(y = 1 \mid x)$. Convert it to a label using $\tau$: $\hat y = \mathbb{1}\{p \ge \tau\}$.

The default $\tau = 0.5$ is rarely optimal. The right $\tau$ depends on the business cost of FP vs FN:

Application       | Cost of FN       | Cost of FP      | Better threshold
Cancer screening  | Patient dies     | False alarm     | Low $\tau$ → high recall
Spam filter       | Spam in inbox    | Real email lost | High $\tau$ → high precision
Loan approval     | Default (lost ₹) | Lost interest   | Tuned to minimise expected cost

Sweep $\tau$ along the PR curve or ROC curve; pick the operating point that best serves the business.
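
One way to pick the operating point is to sweep candidate thresholds on a labelled validation set and keep the one with the lowest total cost. The sketch below assumes fixed per-error costs; the function names, the toy scores, and the unit costs are all hypothetical.

```python
def total_cost(y_true, probs, tau, cost_fp, cost_fn):
    """Total cost of the errors made when thresholding at tau."""
    cost = 0
    for y, p in zip(y_true, probs):
        pred = 1 if p >= tau else 0
        if pred == 1 and y == 0:
            cost += cost_fp
        elif pred == 0 and y == 1:
            cost += cost_fn
    return cost


def best_threshold(y_true, probs, cost_fp, cost_fn):
    """Sweep every distinct score (plus 'predict nothing') as a threshold."""
    candidates = sorted(set(probs)) + [float("inf")]
    return min(
        candidates,
        key=lambda tau: total_cost(y_true, probs, tau, cost_fp, cost_fn),
    )


# Toy validation set (hypothetical scores and labels).
probs = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
y_true = [1, 1, 0, 1, 0, 0]

print(best_threshold(y_true, probs, cost_fp=1, cost_fn=10))  # 0.4
print(best_threshold(y_true, probs, cost_fp=10, cost_fn=1))  # 0.8
```

When a missed positive is ten times as costly as a false alarm the sweep settles on the lower threshold, and when the costs are reversed it settles on the higher one, which matches the direction given in the table.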

Cost-sensitive decisions.

Define a cost matrix $C(\hat y, y)$. Choose the action $\hat a$ minimising expected cost: $\hat a = \arg\min_a \sum_y C(a, y) \cdot P(y \mid x)$.

Example: loan. Cost(approve, default) = ₹100 k; Cost(deny, repay) = ₹10 k. Expected cost of approving: $C_{\text{approve}} = P(\text{default} \mid x) \cdot 100\text{k}$. Expected cost of denying: $C_{\text{deny}} = P(\text{repay} \mid x) \cdot 10\text{k}$. Approve iff $C_{\text{approve}} < C_{\text{deny}}$, i.e., $P(\text{default} \mid x) < 1/11 \approx 0.0909$.

This is mathematically equivalent to approving when $P(\text{repay} \mid x)$ exceeds the threshold $\tau = 10/11 \approx 0.91$.
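
For reference, the algebra behind the loan threshold can be written out as follows. The shorthand $q = P(\text{default} \mid x)$ is introduced here only for brevity, and the derivation assumes, as the example does, that correct decisions carry no cost.

```latex
\begin{align*}
C_{\text{approve}} < C_{\text{deny}}
  &\iff q \cdot 100\text{k} < (1 - q) \cdot 10\text{k} \\
  &\iff 110\text{k} \cdot q < 10\text{k} \\
  &\iff q < \tfrac{10}{110} = \tfrac{1}{11} \approx 0.0909 \\
  &\iff P(\text{repay} \mid x) = 1 - q > \tfrac{10}{11} \approx 0.909
\end{align*}
```

The same steps with general costs give the cut-off $q < C(\text{deny}, \text{repay}) \,/\, \bigl(C(\text{approve}, \text{default}) + C(\text{deny}, \text{repay})\bigr)$, so the threshold depends only on the ratio of the two error costs.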

Decision theory generalises this: pick the action minimising expected loss under your posterior distribution of $y$.

Deployment considerations.

  1. Latency: the model must respond within the SLA. Real-time ad bidding needs responses within tens of milliseconds; batch fraud scoring can take hours.

  2. Throughput — handle peak QPS without timeouts. Often achieved with horizontal scaling and caching.

  3. Monitoring.

    • Input drift (PSI, KS test).
    • Prediction drift.
    • Outcome metrics (precision, recall, business KPI).
    • Alert when degraded.

  4. Calibration. Recalibrate predicted probabilities periodically, for example with Platt scaling or isotonic regression, so that thresholds and expected-cost rules keep their meaning.

  5. Fallback / safety net. If model service fails, route to rules engine or human review — never let predictions fail silently.

  6. Auditability and explainability. SHAP / LIME explanations on individual predictions for high-stakes decisions (loan denial, medical recommendation).

  7. Versioning. Track model artifact, training data snapshot, hyperparameters. Roll back instantly if a new version misbehaves.
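
The input-drift check in the monitoring item can be illustrated with the Population Stability Index (PSI), which compares the binned distribution of a feature at training time with its distribution in production. This is a minimal sketch: the function name `psi`, the bin edges, and the toy samples are hypothetical, and the small `eps` floor is an assumption used to avoid taking the log of zero for empty bins.

```python
import math


def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between two samples of one feature.

    `edges` are ascending interior bin boundaries; a value v falls in
    the bin indexed by the number of edges that are <= v.
    """
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(1 for e in edges if v >= e)] += 1
        # Floor each share at eps so empty bins do not break the log.
        return [max(c / len(values), eps) for c in counts]

    exp_share, act_share = shares(expected), shares(actual)
    return sum(
        (a - e) * math.log(a / e) for e, a in zip(exp_share, act_share)
    )


train = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]   # training-time sample
live = [v + 0.3 for v in train]                     # shifted production sample
edges = [0.25, 0.5, 0.75]

print(psi(train, train, edges))  # 0.0 for identical distributions
print(psi(train, live, edges))   # clearly positive for the shifted sample
```

A commonly quoted rule of thumb treats a PSI below about 0.1 as stable and above about 0.25 as a significant shift, but the alert level is a judgment call for each feature.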

Continuous improvement.

  • A/B test new model versions or new thresholds.
  • Shadow mode — score with new model but act with old; compare offline.
  • Retrain on a schedule or trigger when drift detected.
  • Feedback loop — labelled outcomes (loan repaid? fraud confirmed?) become training data.
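
Shadow mode, as described above, can be sketched as a loop in which only the live model's score drives the action while the candidate's score is merely logged. The function names, the `"flag"`/`"pass"` actions, and the toy models are hypothetical.

```python
def shadow_run(requests, live_model, shadow_model, tau=0.5):
    """Act on the live model only; record the shadow score for comparison."""
    log = []
    for features in requests:
        live_p = live_model(features)
        action = "flag" if live_p >= tau else "pass"  # live model decides
        try:
            shadow_p = shadow_model(features)
        except Exception:
            shadow_p = None  # a shadow failure must not affect the live path
        log.append({"action": action, "live": live_p, "shadow": shadow_p})
    return log


def disagreement_rate(log, tau=0.5):
    """Share of logged requests where the two models would decide differently."""
    scored = [r for r in log if r["shadow"] is not None]
    if not scored:
        return 0.0
    differ = sum(
        1 for r in scored if (r["live"] >= tau) != (r["shadow"] >= tau)
    )
    return differ / len(scored)


requests = [{"x": 0.2}, {"x": 0.6}, {"x": 0.9}]
log = shadow_run(requests, lambda f: f["x"], lambda f: f["x"] - 0.2)
print([r["action"] for r in log])   # ['pass', 'flag', 'flag']
print(disagreement_rate(log))       # one of three decisions differs
```

The disagreement rate only shows where the candidate would act differently; whether it would act better still has to be judged against labelled outcomes once they arrive.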

Take-away. A model's accuracy is necessary but not sufficient; the system — thresholding, costs, monitoring, fallbacks — determines whether the model delivers business value.