What is Data Science and its Evolution
Core Titles
Key headlines and terms for quick recall
- Data Science — interdisciplinary field for extracting knowledge from data
- Pillars: Statistics + Computer Science + Mathematics + Domain knowledge
- Evolution: Statistics → BI → Data Mining → Big Data → Data Science → AI/ML
- Drivers: data abundance, cheap compute, modern algorithms, business need
- Related disciplines: Statistics, Machine Learning, Big Data Analytics, AI
- Fourth Industrial Revolution — data-driven decision making
Basic Idea
What it is, why it matters, how it works
What is Data Science?
Data Science is the interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data, and applies that knowledge across a broad range of application areas.
It combines:
- Mathematics & Statistics — for modelling, inference, uncertainty.
- Computer Science — for algorithms, data structures, scalable systems.
- Domain knowledge — to ask the right questions and interpret results.
A data scientist's goal: turn raw data into actionable decisions.
Evolution — how we got here
| Era | Key developments | Drivers |
|---|---|---|
| 1960s–80s — Classical statistics | Hypothesis testing, regression on small samples | Manual data, academic use |
| 1980s–90s — Business Intelligence (BI) | Dashboards, OLAP, reporting from databases | Relational DBs, decision support |
| 1990s–2000s — Data Mining / KDD | Pattern discovery from large databases | Cheap storage, retail/credit data |
| 2000s — Web & Big Data | Hadoop, MapReduce, NoSQL | Web scale, social media |
| 2010s — Data Science | Combined stats, code, viz, ML | Cheap compute, open-source (R, Python), Kaggle |
| 2015 → — AI / Deep Learning | GPUs, neural nets, foundation models | TensorFlow / PyTorch, GPT-style LLMs |
| 2020 → — MLOps & Generative AI | Production ML pipelines, LLMs | Cloud, transformer revolution |
Why now — what made data science explode
- Data abundance — IoT, mobile, web logs, sensors produce data at zettabyte scale.
- Compute — multi-core CPUs, GPUs, cloud (AWS/Azure/GCP) became cheap.
- Algorithms — open-source ML (scikit-learn, TensorFlow, PyTorch) democratised research.
- Storage — distributed file systems and cloud object stores (HDFS, S3) made it cheap to retain raw data at scale.
- Business pressure — competition demands data-driven decisions.
Data Science vs related fields
- Data Analytics — descriptive: what happened? (subset of DS)
- Machine Learning — algorithms that learn from data (tool of DS)
- Big Data — handling volume, velocity, variety (infrastructure for DS)
- AI — broader goal: systems that think/act intelligently (DS feeds AI)
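The difference between descriptive analytics ("what happened?") and machine learning ("learn from data, then predict") can be illustrated with a minimal sketch. The sales figures and the function names `describe`, `fit_line` and `predict` are hypothetical, chosen only for illustration, and a straight-line least-squares fit stands in for ML in general.

```python
from statistics import mean

# Hypothetical monthly sales figures (illustrative data, not from the notes).
monthly_sales = [100.0, 110.0, 120.0, 130.0, 140.0]


def describe(values):
    """Descriptive analytics: summarise what already happened."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def fit_line(values):
    """ML in miniature: learn a slope and intercept by least squares.

    The x-values are the positions 0, 1, 2, ... of each observation.
    """
    xs = list(range(len(values)))
    x_bar, y_bar = mean(xs), mean(values)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Apply the learned line to an unseen position x."""
    slope, intercept = model
    return slope * x + intercept


summary = describe(monthly_sales)                 # what happened
model = fit_line(monthly_sales)                   # what was learned
next_month = predict(model, len(monthly_sales))   # what may happen next
print(summary, next_month)
```

The summary only restates the past, while the fitted line extrapolates to a month that has not been observed. A real project would add validation on held-out data before trusting such a forecast.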
Why it matters
DS now powers many widely used products and services: Google search, Netflix recommendations, fraud detection at banks, MRI-based diagnosis at hospitals, route optimisation at delivery companies, dynamic pricing at airlines.
Mind Map
Visual structure of the concept
DATA SCIENCE
├── Definition
│ └── Extract knowledge from data
├── Pillars
│ ├── Mathematics & Statistics
│ ├── Computer Science / programming
│ └── Domain expertise
├── Evolution timeline
│ ├── Classical Stats (1960s)
│ ├── BI & OLAP (1980s)
│ ├── Data Mining / KDD (1990s)
│ ├── Big Data & Hadoop (2000s)
│ ├── Modern Data Science (2010s)
│ └── AI / LLM era (2020s)
├── Why now
│ ├── Data abundance
│ ├── Cheap compute (GPU, cloud)
│ ├── Open-source ML
│ └── Business pressure
└── Related fields
  ├── ML — algorithms
  ├── Big Data — infrastructure
  ├── AI — broader goal
  └── Analytics — descriptive subset
Exam Q&A
Part A (2 marks) and Part B (20 marks) style questions
Part A (2 marks each)
Q1. Define Data Science. The interdisciplinary field that uses scientific methods, processes and algorithms to extract knowledge and insights from structured and unstructured data, combining statistics, computer science and domain expertise.
Q2. List the three core pillars of Data Science. Mathematics & Statistics, Computer Science, and Domain Knowledge.
Q3. How does Data Science differ from Machine Learning? Data Science is the broader field of extracting insight from data (it includes data wrangling, visualisation, statistics and ML); Machine Learning is one set of techniques within it: algorithms that learn patterns from data.
Part B (20 marks)
Q. Trace the evolution of Data Science. Explain the factors that have driven its rapid growth in recent years and discuss its key pillars with examples.
Evolution.
| Era | Key development |
|---|---|
| 1960s–80s | Classical statistics on small samples (regression, ANOVA). |
| 1980s–90s | Business Intelligence — dashboards, OLAP, reporting from RDBMS. |
| 1990s–2000s | Data Mining / KDD — pattern discovery from large databases (Apriori, decision trees, clustering). |
| 2000s | Big Data — Hadoop (2006), MapReduce, NoSQL, distributed storage. Web 2.0 generated unprecedented data. |
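| 2010s | Data Science formalised: Hal Varian's 2009 remark that "the sexy job in the next ten years will be statisticians" and the 2012 Harvard Business Review article calling data scientist "the sexiest job of the 21st century"; R / Python ecosystems (pandas, scikit-learn); Kaggle competitions; the job title "Data Scientist." |
| 2015 → | Deep learning goes mainstream: AlexNet (2012) had already shown what deep CNNs could do; TensorFlow (2015) and PyTorch (2016) made them widely accessible. |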
| 2020 → | MLOps, LLMs, foundation models (BERT, GPT, Stable Diffusion). |
Drivers of rapid growth.
- Data abundance. An often-cited estimate puts global data creation at ~2.5 quintillion bytes per day, from IoT sensors, social media, mobile devices and transactions.
- Cheap compute. GPUs (originally for graphics) revolutionised neural-network training; cloud computing made on-demand scaling affordable.
- Storage. Distributed file systems and cloud object stores (HDFS, S3, Azure Blob) made it possible to keep "all" data for later analysis.
- Open-source algorithms. scikit-learn, TensorFlow, PyTorch and Hugging Face made cutting-edge methods available outside academia.
- Business need. Competitive markets demand data-driven decisions — Amazon's pricing, Uber's routing, Netflix's recommendations are existential, not optional.
- Talent and education. Postgraduate programmes, MOOCs, bootcamps and communities (Kaggle, Stack Overflow) widened the talent pool.
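To put the data-abundance figure in the drivers above into more familiar units, the short calculation below converts roughly 2.5 quintillion bytes per day into exabytes per day and zettabytes per year. It is a back-of-the-envelope sketch: it takes the quoted estimate at face value and assumes decimal (SI) units.

```python
# Rough unit conversion for the "~2.5 quintillion bytes daily" estimate.
BYTES_PER_DAY = 2.5e18   # figure quoted in the notes; an estimate, not a measurement
EXABYTE = 1e18           # SI definition: 10^18 bytes
ZETTABYTE = 1e21         # SI definition: 10^21 bytes

exabytes_per_day = BYTES_PER_DAY / EXABYTE
zettabytes_per_year = BYTES_PER_DAY * 365 / ZETTABYTE

print(f"{exabytes_per_day:.1f} EB/day, about {zettabytes_per_year:.2f} ZB/year")
```

On this estimate the yearly total comes to a little under one zettabyte, which is the sense in which data volumes are described as zettabyte scale.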
Three pillars with examples.
- Mathematics & Statistics. Probability, linear algebra, calculus and hypothesis testing underpin most ML algorithms. Example: a Naive Bayes spam classifier uses Bayes' theorem; PCA uses eigenvectors of the covariance matrix.
- Computer Science. Algorithms, data structures, databases, distributed systems, programming. Example: a large-scale recommendation system such as Netflix's can train models with distributed computation on Spark and serve low-latency online predictions backed by a data store such as Cassandra.
- Domain knowledge. Without understanding the field — medicine, finance, retail — a data scientist can build statistically valid but practically useless models. Example: a fraud-detection model must understand legitimate transaction patterns (online vs in-store, holiday seasonality) to set meaningful thresholds.
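The Naive Bayes example from the Mathematics & Statistics pillar can be sketched in a few lines. This is a minimal illustration under stated assumptions: the four training messages are invented, the helper names `train`, `log_posterior` and `classify` are hypothetical, and add-one (Laplace) smoothing is one common choice rather than the only one.

```python
import math
from collections import Counter

# Hypothetical toy training set (illustrative only): (message, label) pairs.
TRAIN = [
    ("win money now", "spam"),
    ("free money offer", "spam"),
    ("meeting at noon", "ham"),
    ("lunch at noon tomorrow", "ham"),
]


def train(examples):
    """Count words per class and messages per class."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, class_counts, vocab


def log_posterior(text, label, model):
    """Unnormalised log P(label | text) via Bayes' theorem.

    log P(label) + sum of log P(word | label), with the "naive" assumption
    that words are independent given the label. Add-one smoothing keeps
    unseen words from driving the probability to zero.
    """
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    score = math.log(class_counts[label] / total_docs)      # prior
    total_words = sum(word_counts[label].values())
    for word in text.split():
        likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
        score += math.log(likelihood)
    return score


def classify(text, model):
    """Pick the label with the highest posterior score."""
    _, class_counts, _ = model
    return max(class_counts, key=lambda label: log_posterior(text, label, model))


model = train(TRAIN)
print(classify("free money", model))
print(classify("meeting tomorrow", model))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied. The shared denominator P(text) is omitted because it does not change which label scores highest.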
Why all three? A pure statistician might miss algorithmic efficiency. A pure programmer might overfit a model without sound statistical validation. A pure domain expert lacks the modelling toolkit. The data scientist sits at the intersection, which is why the role is valuable and difficult to fill.