Box Plots, Pivot Tables and Heat Maps
Core Titles
Key headlines and terms for quick recall
- Box plot — five-number summary + outliers
- Histogram — distribution of one numeric variable
- Scatter plot — relationship between two numerics
- Bar chart, Pie chart — categorical
- Pivot table — multi-dim aggregate (Excel / pandas pivot_table)
- Heatmap — colour-coded matrix
- Correlation heatmap — pairwise correlations
- Pair plot — scatter matrix of all pairs
Basic Idea
What it is, why it matters, how it works
Box plot
A graphical rendering of the five-number summary: min, $Q_1$, median, $Q_3$, max, plus outliers flagged outside $[Q_1 - 1.5\,\mathrm{IQR},\ Q_3 + 1.5\,\mathrm{IQR}]$, where $\mathrm{IQR} = Q_3 - Q_1$.
   •       |-----[     |     ]-----|
outlier    whisker    box    whisker
           (the line inside the box is the median)
Reveals.
- Median and spread.
- Skewness (box position within whiskers).
- Outliers as individual points.
- Comparison across groups (side-by-side box plots).
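As a minimal sketch of the numbers behind a box plot, the snippet below computes the five-number summary and flags points outside the 1.5 × IQR fences. The sample values and the function names are hypothetical, and quartiles use NumPy's default linear interpolation; other tools may use slightly different quartile conventions.

```python
import numpy as np

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a 1-D sequence of numbers."""
    data = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(data, [25, 50, 75])
    return data.min(), q1, median, q3, data.max()

def tukey_outliers(values, k=1.5):
    """Return the points lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    data = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return data[(data < lower) | (data > upper)]

# Hypothetical sample: nine ordinary values and one extreme value.
sample = [12, 14, 15, 15, 16, 17, 18, 19, 21, 45]
summary = five_number_summary(sample)
outliers = tukey_outliers(sample)
print("min, Q1, median, Q3, max:", summary)
print("outliers:", outliers)
```

Here the value 45 lies above the upper fence, so a box plot would draw it as an individual point beyond the whisker.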
Histogram
Bars of counts (or density) per bin of a numeric variable. Shape reveals modality, skewness, gaps.
Bin choice matters — too few hides structure; too many adds noise.
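To illustrate the effect of bin choice, the sketch below bins the same hypothetical bimodal sample three ways with `numpy.histogram`. The data and the bin counts are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
# Hypothetical bimodal sample: two normal clusters centred at 0 and 5.
data = np.concatenate([rng.normal(0, 1, 500), rng.normal(5, 1, 500)])

for bins in (3, 20, 200):
    counts, edges = np.histogram(data, bins=bins)
    print(f"{bins:>3} bins: tallest bar holds {counts.max()} of {counts.sum()} points")

# NumPy can also choose the bin count from the data,
# for example with the Freedman-Diaconis rule.
auto_counts, auto_edges = np.histogram(data, bins="fd")
print("Freedman-Diaconis bins:", len(auto_counts))
```

With very few bins the two clusters tend to merge into broad bars; with very many, each bar holds only a handful of points and the outline becomes jagged.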
Pivot table
A spreadsheet-style aggregate that summarises one variable across two (or more) grouping variables.
| Region \ Quarter | Q1 | Q2 | Q3 | Q4 |
|---|---|---|---|---|
| North | ₹2.1M | ₹2.4M | ₹2.8M | ₹3.1M |
| South | ₹1.8M | ₹2.0M | ₹2.5M | ₹2.7M |
| East | ₹1.5M | ₹1.6M | ₹1.9M | ₹2.0M |
| West | ₹2.2M | ₹2.3M | ₹2.5M | ₹2.6M |
Use. Quickly slice business metrics by multiple dimensions. Available in Excel, Google Sheets, pandas (df.pivot_table).
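A pivot table is usually built from long-format records, one row per transaction. The sketch below shows that step in pandas; the individual records are hypothetical, chosen so that the sums reproduce the North and South figures for Q1 and Q2 in the table above.

```python
import pandas as pd

# Hypothetical long-format sales records (revenue in ₹ millions).
records = pd.DataFrame({
    "region":  ["North", "North", "North", "South", "South", "South"],
    "quarter": ["Q1", "Q1", "Q2", "Q1", "Q2", "Q2"],
    "revenue": [1.0, 1.1, 2.4, 1.8, 0.9, 1.1],
})

# One row per region, one column per quarter, each cell the summed revenue.
table = records.pivot_table(
    values="revenue", index="region", columns="quarter", aggfunc="sum"
)
print(table)
```

Swapping `aggfunc="sum"` for `"mean"` or `"count"` changes the summary shown in each cell without changing the layout.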
Heatmap
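Visualise a matrix with colour: each cell's shade encodes its value. A common convention is darker (or warmer) = higher, but the mapping depends on the colour map chosen.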
Common uses.
- Correlation heatmap — pairwise correlations between numeric features. Spot multicollinearity instantly.
- Confusion matrix — classification performance.
- Geographic — population, sales density on a map.
- Time-of-day vs day-of-week — when do users log in?
- Student vs subject performance — find weak subjects / students at a glance.
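The matrix behind a correlation heatmap can be sketched as follows. The three features are hypothetical (one is deliberately constructed from another), and the 0.8 threshold for flagging a pair is an assumption rather than a fixed rule.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)
n = 200
# Hypothetical features: weight_kg is built from height_cm plus noise,
# so the two are strongly correlated; age_years is generated independently.
height = rng.normal(170, 10, n)
features = pd.DataFrame({
    "height_cm": height,
    "weight_kg": 0.9 * height - 90 + rng.normal(0, 3, n),
    "age_years": rng.integers(18, 65, n).astype(float),
})

corr = features.corr()  # Pearson correlation by default

# List each pair once (upper triangle) whose |r| exceeds the threshold.
threshold = 0.8
cols = corr.columns
flagged = [
    (cols[i], cols[j], round(corr.iloc[i, j], 2))
    for i in range(len(cols))
    for j in range(i + 1, len(cols))
    if abs(corr.iloc[i, j]) > threshold
]
print(corr.round(2))
print("Highly correlated pairs:", flagged)
```

Passing `corr` to a plotting call such as `seaborn.heatmap(corr, annot=True)` would turn this matrix into the coloured grid; the flagged list is the same information in text form.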
Pair plot (scatter matrix)
A grid of scatter plots for every pair of features, with histograms (or density curves) on the diagonal. seaborn's pairplot is a common tool; pandas offers scatter_matrix for the same purpose.
Use. Spot pairwise linear / non-linear relationships, clusters, outliers in a multi-feature dataset.
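A scatter matrix can be sketched with pandas alone, as below. The three columns are hypothetical: one is linearly related to `x` and one quadratically, so the grid shows both kinds of relationship. The `Agg` backend is an assumption that lets the script run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window required
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(seed=2)
x = rng.normal(0, 1, 150)
# Hypothetical dataset with one linear and one non-linear relationship.
df = pd.DataFrame({
    "x": x,
    "linear": 2 * x + rng.normal(0, 0.5, 150),
    "quadratic": x ** 2 + rng.normal(0, 0.5, 150),
})

# One scatter plot per pair of columns, histograms on the diagonal.
axes = scatter_matrix(df, diagonal="hist", figsize=(6, 6))
print(axes.shape)
```

The returned array holds one Axes object per cell of the grid; with seaborn installed, `seaborn.pairplot(df)` produces a comparable figure.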
Other essential plots
- Bar / column chart — categorical counts.
- Pie chart — share of a whole (use sparingly).
- Line plot — time series trends.
- Violin plot — box + KDE; richer distribution view.
- Q-Q plot — checks normality.
- Geographic / choropleth — spatial data.
Tools
- Python: matplotlib, seaborn, plotly.
- R: ggplot2.
- Commercial: Tableau, Power BI, Looker.
Why visualisation matters
- Humans grok pictures faster than tables.
- EDA depends heavily on plots.
- Stakeholders absorb dashboards better than statistics.
- Anomalies and trends often jump out visually before any statistic catches them.
Mind Map
Visual structure of the concept
EDA VISUALISATIONS
├── One variable
│   ├── Histogram (distribution shape)
│   ├── Box plot (5-number + outliers)
│   ├── Violin plot (box + density)
│   └── Bar (categorical counts)
├── Two variables
│   ├── Scatter (numeric–numeric)
│   ├── Grouped box (numeric–categorical)
│   └── Contingency / mosaic (cat–cat)
├── Aggregates
│   ├── Pivot table (cross-tabulation)
│   └── Heatmap (matrix colour map)
├── Many variables
│   ├── Pair plot (scatter matrix)
│   └── Correlation heatmap
└── Time / space
    ├── Line plot (trend)
    └── Choropleth map
Exam Q&A
Part A (2 marks) and Part B (20 marks) style questions
Part A (2 marks each)
Q1. What does a box plot show?
A graphical summary of a dataset's five-number summary (minimum, $Q_1$, median, $Q_3$, maximum), with outliers flagged as individual points outside $[Q_1 - 1.5\,\mathrm{IQR},\ Q_3 + 1.5\,\mathrm{IQR}]$, where $\mathrm{IQR} = Q_3 - Q_1$.
Q2. What is a pivot table?
A spreadsheet-style aggregation that summarises one variable across two or more grouping variables — e.g., revenue by region × quarter — available in Excel, Google Sheets, and pandas df.pivot_table.
Q3. How can a heatmap help in evaluating student performance across subjects?
A heatmap displays a students × subjects score matrix as a coloured grid. Cold columns reveal weak subjects across the cohort; cold rows reveal struggling students; clusters reveal patterns such as students strong in maths also doing well in physics.
Part B (20 marks)
Q. Describe the role of box plots, pivot tables and heatmaps in Exploratory Data Analysis with examples.
Box plot.
What it shows. The five-number summary visually:
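- Box spans $Q_1$ to $Q_3$; its length is the interquartile range, $\mathrm{IQR} = Q_3 - Q_1$.
- Median marked inside.
- Whiskers extend to the most extreme non-outlier points.
- Outliers plotted individually below $Q_1 - 1.5\,\mathrm{IQR}$ or above $Q_3 + 1.5\,\mathrm{IQR}$.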
Information revealed.
- Centre and spread.
- Skewness (box offset within whiskers).
- Outliers.
- Multiple groups side-by-side compare distributions across categories.
Example use. Plot salary grouped by department — instantly see whether sales pays more than engineering and whether anyone is an outlier.
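A side-by-side box plot of this kind might be drawn as in the sketch below. The salary figures are randomly generated and purely hypothetical, with one very high earner appended to the Sales group so that an outlier appears; the `Agg` backend is an assumption that lets the script run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window required
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=3)
# Hypothetical salaries in thousands; 150 is an artificial extreme value.
salaries = {
    "Sales": np.append(rng.normal(60, 8, 40), 150),
    "Engineering": rng.normal(75, 10, 40),
}

fig, ax = plt.subplots()
parts = ax.boxplot(list(salaries.values()))  # default whiskers: 1.5 * IQR
ax.set_xticklabels(list(salaries.keys()))
ax.set_ylabel("Salary (thousands)")

# Points drawn individually as outliers ("fliers"), one array per group.
fliers = [line.get_ydata() for line in parts["fliers"]]
print("Sales outliers:", fliers[0])
```

Comparing the two median lines answers the "who pays more" question, while the flier points show which individual salaries fall outside the whiskers.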
Pivot table.
What it is. A multi-dimensional aggregate — for each combination of two grouping variables, compute a summary (sum, mean, count) of a third.
| Region \ Quarter | Q1 | Q2 | Q3 | Q4 |
|---|---|---|---|---|
| North | ₹2.1 M | ₹2.4 M | ₹2.8 M | ₹3.1 M |
| South | ₹1.8 M | ₹2.0 M | ₹2.5 M | ₹2.7 M |
| East | ₹1.5 M | ₹1.6 M | ₹1.9 M | ₹2.0 M |
| West | ₹2.2 M | ₹2.3 M | ₹2.5 M | ₹2.6 M |
Insights. Quickly spot best-performing region (North), seasonal trends (Q4 highest), weakest region (East).
pandas one-liner.
df.pivot_table(values="revenue", index="region", columns="quarter", aggfunc="sum")
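The insights listed above can be checked directly from the pivoted table. The sketch below re-enters the table's values (in ₹ millions) and derives the best region, weakest region and peak quarter from the row and column totals; the variable names are illustrative.

```python
import pandas as pd

# Revenue in ₹ millions, copied from the region × quarter table.
revenue = pd.DataFrame(
    {
        "Q1": [2.1, 1.8, 1.5, 2.2],
        "Q2": [2.4, 2.0, 1.6, 2.3],
        "Q3": [2.8, 2.5, 1.9, 2.5],
        "Q4": [3.1, 2.7, 2.0, 2.6],
    },
    index=["North", "South", "East", "West"],
)

region_totals = revenue.sum(axis=1)   # one total per region (row)
quarter_totals = revenue.sum(axis=0)  # one total per quarter (column)

best_region = region_totals.idxmax()
weakest_region = region_totals.idxmin()
peak_quarter = quarter_totals.idxmax()
print(best_region, weakest_region, peak_quarter)
```

In `pivot_table` itself, passing `margins=True` appends comparable totals to the output.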
Heatmap.
What it is. A 2-D colour-coded matrix where each cell's colour intensity encodes a numeric value.
Common uses.
- Correlation heatmap — pairwise correlations between numeric features. Identifies multicollinearity at a glance — important before linear regression.
- Confusion matrix — actual vs predicted classes. Shows where the classifier confuses categories.
- Geographic / spatial — population density, sales by city.
- Time × day — login activity by hour of day × day of week.
- Student × subject performance — instantly reveals weak subjects (cold columns), struggling students (cold rows), and learning patterns (clusters of strong subjects).
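As one example of the matrices that feed such heatmaps, a confusion matrix can be tabulated with `pandas.crosstab`. The labels below are hypothetical and stand in for a classifier's actual and predicted classes.

```python
import pandas as pd

# Hypothetical true labels and classifier predictions.
actual    = ["cat", "cat", "cat", "dog", "dog", "dog", "dog", "bird"]
predicted = ["cat", "dog", "cat", "dog", "dog", "cat", "dog", "bird"]

# Rows are actual classes, columns are predicted classes, cells are counts.
confusion = pd.crosstab(
    pd.Series(actual, name="actual"),
    pd.Series(predicted, name="predicted"),
)
print(confusion)
```

Off-diagonal cells count the mistakes, so in a heatmap of this table any bright cell away from the diagonal marks a pair of classes the model confuses.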
Example — student performance heatmap.
| Student \ Subject | Maths | Physics | Chem | Biology | English |
|---|---|---|---|---|---|
| Alice | 92 | 88 | 85 | 78 | 70 |
| Bob | 45 | 50 | 52 | 80 | 88 |
| Carol | 78 | 82 | 75 | 70 | 65 |
| ... |
Plot the matrix as a heatmap with a red ↔ blue gradient, with low scores in cold (blue) tones. Alice's coolest cell is English (top-right), and Bob's cold cluster across Maths, Physics and Chem is striking. From this a teacher might decide that Alice needs language coaching and Bob needs maths support.
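A minimal sketch of this plot with matplotlib follows, using only the three rows shown in the table (the elided rows are left out). The `coolwarm` colour map, the 0 to 100 colour scale and the `Agg` backend are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window required
import matplotlib.pyplot as plt
import pandas as pd

# Scores for the three students listed in the table.
scores = pd.DataFrame(
    {
        "Maths": [92, 45, 78],
        "Physics": [88, 50, 82],
        "Chem": [85, 52, 75],
        "Biology": [78, 80, 70],
        "English": [70, 88, 65],
    },
    index=["Alice", "Bob", "Carol"],
)

fig, ax = plt.subplots()
# "coolwarm" maps low scores to blue and high scores to red.
image = ax.imshow(scores.to_numpy(), cmap="coolwarm", vmin=0, vmax=100)
ax.set_xticks(range(scores.shape[1]))
ax.set_xticklabels(scores.columns)
ax.set_yticks(range(scores.shape[0]))
ax.set_yticklabels(scores.index)
fig.colorbar(image, ax=ax, label="Score")

# Numeric counterparts of the "cold cells" a reader would spot by eye.
weakest_subject_per_student = scores.idxmin(axis=1)
subject_means = scores.mean(axis=0)
print(weakest_subject_per_student)
```

The `idxmin` result names each student's lowest-scoring subject, which is the same conclusion the colours are meant to convey at a glance.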
Why these tools in EDA.
- Box plot — pinpoints outliers and compares distributions.
- Pivot table — slices business metrics across dimensions.
- Heatmap — turns dense matrices into instant patterns.
Together they form the core of fast, visual EDA — the foundation on which good models rest.