Applications of Data Science
Core Titles
Key headlines and terms for quick recall
- Healthcare — disease prediction, drug discovery, imaging
- Finance — fraud detection, credit scoring, algorithmic trading
- Retail / E-commerce — recommendation, demand forecasting, dynamic pricing
- Manufacturing — predictive maintenance, quality control
- Marketing — customer segmentation, churn prediction, A/B testing
- Transportation — route optimisation, autonomous vehicles
- Education — personalised learning, dropout prediction
- Agriculture — yield prediction, precision farming, disease detection
- Sports — player analytics, game strategy
- Public sector — smart cities, policy analytics, crime mapping
Basic Idea
What it is, why it matters, how it works
Why data science is everywhere
Almost every industry now generates data, and competitive pressure pushes organisations to extract value from it. Below are the major application domains with concrete examples.
1. Healthcare
- Disease prediction / diagnosis — CNNs detect pneumonia from chest X-rays at radiologist-level accuracy (CheXNet, Stanford).
- Personalised medicine — genomic data + clinical history to choose drug & dose.
- Drug discovery — graph neural networks screen billions of molecules; AlphaFold predicts protein structures.
- Hospital operations — ICU early-warning models, readmission prediction, optimised staffing.
2. Finance
- Fraud detection — ML models (often gradient-boosted trees such as XGBoost) score every transaction in real time at payment firms such as Visa and PayPal.
- Credit scoring — FICO and successors use ML on credit-history data.
- Algorithmic trading — high-frequency trading firms exploit microsecond patterns.
- Insurance — risk pricing, claim fraud detection.
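Credit scoring is at heart a classification problem: estimate the probability that an applicant will default. The sketch below fits a plain logistic regression by batch gradient descent in pure Python. It does not reproduce FICO's methodology, which is proprietary. The two features (credit utilisation and missed payments), the eight applicants, and the helper names `fit_logistic` and `default_probability` are all made up for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error: predicted default probability minus the true label
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the average gradient
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def default_probability(w, b, x):
    """Predicted probability that applicant x defaults."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical applicants: [credit utilisation (0-1), missed payments in the last year]
X = [[0.1, 0], [0.2, 0], [0.3, 1], [0.5, 0], [0.7, 2], [0.8, 3], [0.9, 2], [0.95, 4]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = defaulted

w, b = fit_logistic(X, y)
print(f"low-risk applicant:  {default_probability(w, b, [0.15, 0]):.3f}")
print(f"high-risk applicant: {default_probability(w, b, [0.9, 3]):.3f}")
```

Real scorecards add many more features, careful validation and fairness checks. Logistic models stay popular in lending partly because each weight can be read directly as the effect of one feature on the log-odds of default.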
3. Retail / E-commerce
- Recommendation systems — Amazon's "Customers who bought this also bought" drives an estimated 35% of sales; Netflix's recommender is estimated to save ~$1B/year by reducing churn.
- Demand forecasting — Walmart uses ML across 4,700 stores.
- Dynamic pricing — airlines, ride-hailing.
- Inventory optimisation — minimise stockouts and overstocks.
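The "Customers who bought this also bought" idea can be illustrated by counting how often two items appear in the same order, then ranking each item's companions by that count. This is a minimal co-occurrence sketch, not Amazon's production system. The baskets and the helper names `build_co_counts` and `also_bought` are invented for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_counts(orders):
    """For every ordered pair of distinct items, count the orders containing both."""
    co = defaultdict(Counter)
    for basket in orders:
        for a, b in permutations(set(basket), 2):
            co[a][b] += 1
    return co


def also_bought(co, item, k=3):
    """Top-k companions of `item`: highest co-count first, ties broken alphabetically."""
    ranked = sorted(co.get(item, Counter()).items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical order history
orders = [
    ["tent", "sleeping bag", "torch"],
    ["tent", "sleeping bag"],
    ["tent", "stove"],
    ["torch", "batteries"],
    ["sleeping bag", "pillow", "tent"],
]

co = build_co_counts(orders)
print(also_bought(co, "tent"))  # → ['sleeping bag', 'pillow', 'stove']
```

Raw counts favour best-sellers that appear in many baskets. Practical systems therefore usually normalise them, for example with cosine similarity between items, before ranking.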
4. Manufacturing & Industry 4.0
- Predictive maintenance — sensor data on motors / turbines predicts failure before it happens (GE, Siemens).
- Quality control — computer vision detects defects on assembly lines.
- Supply-chain optimisation — chained ML models link demand forecasts to inventory and logistics decisions.
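Predictive maintenance starts from a simple question: is this sensor reading unusual compared with the machine's recent behaviour? The sketch below flags any reading more than three standard deviations above the mean of the preceding window. It is a basic statistical early-warning rule, standing in for the ML models mentioned above. The window size, threshold, vibration values and the function name `flag_anomalies` are assumptions for illustration.

```python
import statistics


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings more than `threshold` standard deviations
    above the mean of the preceding `window` readings."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mean = statistics.mean(history)
        stdev = statistics.stdev(history)  # sample standard deviation
        if stdev > 0 and (readings[i] - mean) / stdev > threshold:
            flagged.append(i)
    return flagged


# Hypothetical vibration readings from one motor (mm/s)
vibration = [2.1, 2.0, 2.2, 2.1, 2.0, 2.2, 2.1, 2.3, 4.8, 5.5]
print(flag_anomalies(vibration))  # → [8]
```

The 5.5 reading at index 9 is not flagged, because the 4.8 spike has already widened the window's spread. More robust approaches avoid this in two ways: they learn a baseline from known-healthy periods, and they model time-to-failure rather than thresholding raw values.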
5. Marketing & Customer
- Segmentation — k-means clustering on behaviour creates targetable groups.
- Churn prediction — telecom and SaaS companies predict which customers are likely to leave.
- A/B testing — Booking.com runs >1000 A/B tests per year.
- Customer Lifetime Value (CLV) modelling.
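Segmentation with k-means (the "k-means clustering on behaviour" item above) alternates two steps until nothing changes. First, each customer is assigned to the nearest centroid. Then each centroid moves to the mean of its customers. The sketch below uses a deterministic farthest-point initialisation so the result is reproducible. That initialisation is a swapped-in choice; libraries such as scikit-learn default to k-means++. The two behavioural features and all customer values are hypothetical.

```python
import math


def farthest_first_init(points, k):
    """Start from the first point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    centroids = farthest_first_init(points, k)
    for _ in range(max_iter):
        # Assignment step: each customer joins the nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members (keep it if the cluster is empty)
        updated = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if updated == centroids:  # converged: assignments can no longer change
            break
        centroids = updated
    return centroids, clusters


# Hypothetical customers: (orders per month, average basket value in £)
customers = [(1, 20), (2, 22), (1, 25),      # occasional, small baskets
             (12, 30), (14, 28), (13, 32),   # frequent, small baskets
             (2, 150), (3, 160), (1, 155)]   # occasional, large baskets

centroids, clusters = kmeans(customers, k=3)
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

Basket value spans a much wider range than order count, so it dominates the Euclidean distance here. In practice, features are usually standardised before clustering so that each behaviour counts comparably.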
6. Transportation
- Route optimisation — Google Maps, Uber, FedEx.
- Autonomous vehicles — Tesla, Waymo use CNNs + sensor fusion.
- Flight delay prediction, traffic-flow modelling.
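At its core, route optimisation is a shortest-path problem on a road graph. The sketch below is textbook Dijkstra's algorithm with a binary heap. Services such as Google Maps add live traffic data and far more elaborate engineering, none of which is shown here. The graph, its travel times in minutes and the name `shortest_route` are hypothetical.

```python
import heapq
import math


def shortest_route(graph, start, goal):
    """Dijkstra's algorithm on a dict-of-dicts graph with non-negative edge weights.

    Returns (total_cost, path), or (math.inf, []) if the goal is unreachable.
    """
    queue = [(0, start, [start])]  # (cost so far, node, path taken)
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue  # a cheaper route to this node was already expanded
        settled.add(node)
        for neighbour, minutes in graph.get(node, {}).items():
            if neighbour not in settled:
                heapq.heappush(queue, (cost + minutes, neighbour, path + [neighbour]))
    return math.inf, []


# Hypothetical road network: travel time in minutes between junctions
roads = {
    "Depot": {"A": 4, "B": 2},
    "A": {"C": 5},
    "B": {"A": 1, "C": 8, "D": 10},
    "C": {"D": 2},
    "D": {},
}

print(shortest_route(roads, "Depot", "D"))  # → (10, ['Depot', 'B', 'A', 'C', 'D'])
```

The detour through B and A beats the direct B to D road because travel times add up along the path. Dijkstra guarantees the cheapest total as long as no edge weight is negative.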
7. Education
- Personalised learning paths — Duolingo, Khan Academy adapt to learner skill.
- Dropout prediction to intervene early.
- Automated grading for essays and short answers.
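One widely studied way to "adapt to learner skill" is Bayesian Knowledge Tracing (BKT). BKT keeps a running probability that a learner has mastered a skill and updates it after every answer. It is offered here to illustrate the idea, not as the method Duolingo or Khan Academy actually use. The slip, guess and learning probabilities, the starting prior and the 0.95 mastery cut-off are all assumed values.

```python
def bkt_update(p_known, correct, p_slip=0.1, p_guess=0.2, p_learn=0.15):
    """One Bayesian Knowledge Tracing step.

    p_slip:  probability a learner who knows the skill still answers wrongly
    p_guess: probability a learner who does not know it answers correctly
    p_learn: probability of learning the skill during this practice opportunity
    """
    if correct:
        evidence_known = p_known * (1 - p_slip)
        evidence_unknown = (1 - p_known) * p_guess
    else:
        evidence_known = p_known * p_slip
        evidence_unknown = (1 - p_known) * (1 - p_guess)
    # Bayes' rule: posterior probability of mastery given this answer
    posterior = evidence_known / (evidence_known + evidence_unknown)
    # Learning transition: an unmastered skill may be learned by practising
    return posterior + (1 - posterior) * p_learn


p = 0.3  # assumed prior probability of mastery
for answer in [True, False, True, True, True]:
    p = bkt_update(p, answer)
    next_step = "advance to next skill" if p > 0.95 else "keep practising"
    print(f"correct={answer!s:<5}  p(mastery)={p:.3f}  -> {next_step}")
```

A personalised path then follows from a simple rule: keep serving exercises on a skill until its mastery estimate crosses the cut-off, then move on.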
8. Agriculture
- Yield prediction from satellite imagery + weather data.
- Disease detection — CNNs spot crop disease from smartphone photos.
- Precision farming — drones + sensors tell farmers exactly where to water / fertilise.
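In its simplest form, yield prediction is regression: learn how yield varies with inputs such as rainfall. The sketch fits an ordinary-least-squares line to a single feature. Real systems combine many weather and satellite-derived features. The five seasons of data and the name `fit_line` are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical seasons: growing-season rainfall (mm) and wheat yield (t/ha)
rainfall = [300, 400, 500, 600, 700]
yield_t_ha = [2.0, 2.6, 3.1, 3.5, 4.0]

slope, intercept = fit_line(rainfall, yield_t_ha)
forecast = intercept + slope * 550
print(f"yield ≈ {intercept:.2f} + {slope:.4f} × rainfall; forecast at 550 mm: {forecast:.2f} t/ha")
```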
9. Sports
- Player analytics — Moneyball (baseball), expected goals (xG) in football.
- Wearable tracking for performance and injury prevention.
- In-game strategy — basketball shot selection, tennis serve placement.
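Expected goals (xG) assigns every shot a probability of being scored, typically from features such as distance and the angle the goal mouth subtends. A team's xG is then the sum over its shots. The sketch below uses a logistic model whose coefficients are invented purely for illustration; real xG models are fitted to large shot datasets. Only the 7.32 m goal width is a real constant. The shots and the helper names are hypothetical.

```python
import math

GOAL_WIDTH_M = 7.32  # regulation goal width


def shot_angle(distance, offset):
    """Angle (radians) between the two posts as seen from the shot location.

    distance: metres from the goal line; offset: metres left/right of the goal's centre line.
    """
    half = GOAL_WIDTH_M / 2
    return math.atan2(half - offset, distance) - math.atan2(-half - offset, distance)


def shot_xg(distance, offset, b0=-1.5, b_angle=1.5, b_dist=-0.1):
    """Probability of scoring from a logistic model; the coefficients are hypothetical."""
    z = b0 + b_angle * shot_angle(distance, offset) + b_dist * distance
    return 1 / (1 + math.exp(-z))


# Hypothetical shots in one match: (distance m, lateral offset m)
shots = [(8, 0), (11, 4), (18, -6), (25, 2), (30, 0)]
match_xg = sum(shot_xg(d, o) for d, o in shots)
print(f"match xG: {match_xg:.2f}")
```

Comparing a team's xG with the goals it actually scored helps separate chance creation from finishing quality.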
10. Government / Public Sector
- Smart cities — traffic, energy, waste optimisation.
- Public health surveillance (e.g., COVID dashboards).
- Crime mapping for police resource allocation (with fairness concerns).
- Policy analytics — evaluating program impact.
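Program impact is often evaluated with difference-in-differences. The method compares the change in an outcome where a policy was introduced with the change over the same period where it was not. The sketch assumes the two groups would otherwise have followed parallel trends. The attendance figures and the name `did_estimate` are hypothetical.

```python
from statistics import mean


def did_estimate(treated_before, treated_after, control_before, control_after):
    """Difference-in-differences: change in the treated group minus change in the control group."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Hypothetical school attendance rates (%) per district, before and after a free-meals program
treated_before, treated_after = [78, 82], [87, 89]
control_before, control_after = [80, 78], [82, 82]

effect = did_estimate(treated_before, treated_after, control_before, control_after)
print(f"estimated program effect: {effect} percentage points")
```

Treated districts rose by 8 points and control districts by 3. Subtracting the control change removes the trend both groups shared, which gives an effect of 5 points, provided the parallel-trends assumption holds.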
Cross-cutting impact
Across all sectors, data science converts subjective judgment into measurable, optimisable decisions — increasing efficiency, personalisation and safety.
Risks and responsibilities
- Bias and fairness — biased data → biased models (e.g., credit scoring).
- Privacy — GDPR, HIPAA constrain data use.
- Explainability — high-stakes decisions need interpretable models.
- Accountability — humans must remain in the loop for critical decisions.
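A simple first check for the bias described above is to compare outcome rates across groups. The sketch computes each group's approval rate and the ratio of the lowest rate to the highest. It uses a threshold of 0.8, the "four-fifths" rule of thumb from US employment-selection guidelines, purely as an illustrative screening cut-off. The decisions and helper names are made up. A passing ratio does not by itself show that a model is fair.

```python
def selection_rates(decisions_by_group):
    """Share of positive decisions (1 = approved) per group."""
    return {group: sum(d) / len(d) for group, d in decisions_by_group.items()}


def disparate_impact_ratio(decisions_by_group):
    """Lowest group selection rate divided by the highest."""
    rates = selection_rates(decisions_by_group)
    return min(rates.values()) / max(rates.values())


# Hypothetical loan decisions from a model, split by a protected attribute
decisions = {
    "group_a": [1, 1, 1, 0, 1, 1, 0, 1, 1, 1],  # 8 of 10 approved
    "group_b": [1, 0, 0, 1, 0, 1, 0, 1, 0, 1],  # 5 of 10 approved
}

ratio = disparate_impact_ratio(decisions)
print(f"selection rates: {selection_rates(decisions)}")
print(f"disparate impact ratio: {ratio:.3f}", "(below 0.8: investigate)" if ratio < 0.8 else "")
```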
Mind Map
Visual structure of the concept
DATA SCIENCE APPLICATIONS
├── Healthcare
│ ├── Disease prediction (CheXNet)
│ ├── Drug discovery (AlphaFold)
│ └── Hospital ops
├── Finance
│ ├── Fraud detection
│ ├── Credit scoring
│ └── Algorithmic trading
├── Retail / E-commerce
│ ├── Recommenders
│ ├── Demand forecasting
│ └── Dynamic pricing
├── Manufacturing
│ ├── Predictive maintenance
│ └── Visual QC
├── Marketing
│ ├── Segmentation
│ ├── Churn prediction
│ └── A/B testing
├── Transportation
│ ├── Route optimisation
│ └── Autonomous vehicles
├── Education
│ ├── Personalised learning
│ └── Dropout prediction
├── Agriculture
│ ├── Yield prediction
│ └── Disease detection
├── Sports analytics
└── Public sector / smart cities
Cross-cutting concerns: bias, privacy, explainability
Exam Q&A
Part A (2 marks) and Part B (20 marks) style questions
Part A (2 marks each)
Q1. Give two applications of data science in healthcare.
- Disease prediction / diagnosis — CNNs analyse medical images to detect conditions like diabetic retinopathy or pneumonia.
- Personalised treatment — combining genomic + clinical data to tailor drug choice and dose.
Q2. Give two applications of data science in retail.
- Recommendation systems — collaborative filtering personalises product suggestions (Amazon, Netflix).
- Demand forecasting — ML models predict per-store, per-product demand to optimise inventory.
Q3. How is data science used in agriculture?
- Yield prediction from rainfall, temperature and satellite imagery.
- Crop-disease detection via CNN-based image classification on smartphone photos.
- Precision farming using IoT sensors to guide water and fertiliser use.
Part B (20 marks)
Q. Discuss applications of data science across diverse fields. Provide at least two specific examples each in healthcare, finance, retail and manufacturing. What ethical concerns must be considered?
Healthcare.
- Disease detection from medical images. CNNs trained on chest X-rays detect pneumonia at radiologist-level accuracy (CheXNet, Stanford 2017). Google Health's diabetic-retinopathy model is deployed in India and Thailand to screen patients where ophthalmologists are scarce.
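- Personalised / precision medicine. Genomic and clinical data are combined to tailor treatment to the individual patient. IBM Watson for Oncology was designed to recommend cancer treatments by matching tumour profiles to similar historical cases and clinical trials, though it drew widespread criticism over the quality of its recommendations. Pharmacogenomics adjusts drug dosage based on a patient's genetic profile.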
(Other uses: hospital re-admission prediction, drug discovery via AlphaFold-style protein modelling, ICU early-warning systems.)
Finance.
- Fraud detection. Visa screens over 100 billion transactions a year using ML; XGBoost or deep models flag anomalies in real time. A false negative (missed fraud) costs stolen money; a false positive (a blocked legitimate purchase) costs customer friction.
- Credit scoring. FICO and successor models apply logistic regression / gradient boosting to credit history to estimate default risk, which lenders then use to set credit limits. Newer fintechs (LenddoEFL, Upstart) add alternative data (mobile usage, transaction history) to score unbanked or thin-file customers.
(Other uses: algorithmic trading, insurance pricing, anti-money-laundering.)
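The false-negative versus false-positive trade-off under Fraud detection can be made concrete. Given model scores on labelled transactions, pick the alert threshold that minimises total cost. The per-error costs, scores and labels below are hypothetical, and the helper names are illustrative.

```python
def total_cost(scores, labels, threshold, cost_fn, cost_fp):
    """Cost of flagging every transaction whose fraud score is >= threshold."""
    cost = 0
    for score, is_fraud in zip(scores, labels):
        flagged = score >= threshold
        if is_fraud and not flagged:
            cost += cost_fn  # false negative: fraud slips through, money is lost
        elif not is_fraud and flagged:
            cost += cost_fp  # false positive: a genuine customer is blocked
    return cost


def best_threshold(scores, labels, cost_fn, cost_fp):
    """Try each observed score as a threshold and keep the cheapest."""
    return min(sorted(set(scores)), key=lambda t: total_cost(scores, labels, t, cost_fn, cost_fp))


# Hypothetical model scores and ground truth (1 = fraud) for eight transactions
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.90]
labels = [0, 0, 0, 1, 0, 0, 1, 1]

print(best_threshold(scores, labels, cost_fn=200, cost_fp=10))   # → 0.35
print(best_threshold(scores, labels, cost_fn=200, cost_fp=150))  # → 0.7
```

When blocking a genuine customer becomes more expensive, the cheapest threshold rises, and the business knowingly accepts some missed fraud in exchange for less friction.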
Retail / E-commerce.
- Recommendation systems. Amazon's "Customers who bought this also bought" engine generates an estimated 35% of total revenue. Netflix's recommender is estimated to save ~$1B/year by reducing churn.
- Demand forecasting and inventory optimisation. Walmart uses ML demand forecasts across 4,700 stores to plan inventory; Zara optimises restock cycles using twice-weekly sales feeds. Accurate forecasts reduce both stock-outs (lost revenue) and over-stocks (cash tied up in inventory).
(Other uses: dynamic pricing, churn prediction, market-basket analysis, fraud detection on returns/refunds.)
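A minimal way to turn sales history into an inventory decision is to forecast next week's demand and order enough stock to cover it. The sketch uses simple exponential smoothing, a classic forecasting baseline, not the models Walmart or Zara actually use. The smoothing factor, the two-week cover policy, the sales figures and the helper names are all assumptions.

```python
def exponential_smoothing_forecast(history, alpha=0.3):
    """Simple exponential smoothing: each new observation pulls the level
    toward itself by a fraction alpha; the final level is the next-period forecast."""
    level = history[0]
    for observed in history[1:]:
        level = alpha * observed + (1 - alpha) * level
    return level


def order_quantity(forecast, on_hand, weeks_of_cover=2):
    """Hypothetical policy: top stock up to `weeks_of_cover` weeks of forecast demand."""
    return max(0, round(forecast * weeks_of_cover - on_hand))


# Hypothetical weekly unit sales for one product in one store
weekly_units = [120, 135, 128, 150, 142]

forecast = exponential_smoothing_forecast(weekly_units)
print(f"next-week forecast: {forecast:.1f} units")
print(f"order now: {order_quantity(forecast, on_hand=90)} units")
```

A larger `alpha` reacts faster to recent weeks but chases noise. A smaller one gives a steadier forecast that lags genuine shifts in demand.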
Manufacturing.
- Predictive maintenance. Sensor data (vibration, temperature, current) from motors and turbines feeds ML models that predict failures days or weeks before they happen. GE Predix and Siemens MindSphere deploy this at industrial scale; downtime reductions of 30–50% are commonly cited.
- Visual quality control. Computer-vision systems on assembly lines (e.g., Cognex, Landing AI) detect microscopic defects in semiconductors, automotive parts and pharmaceuticals with a speed and consistency that human inspectors cannot match.
(Other uses: supply-chain optimisation, energy management, robotics path-planning.)
Ethical concerns — cross-cutting.
- Bias and fairness. Models trained on historical data inherit its biases — biased credit scoring, biased predictive policing, gender-biased hiring algorithms.
- Privacy. GDPR, HIPAA, CCPA restrict what personal data can be collected/used. Data leaks can ruin trust.
- Explainability. High-stakes decisions (lending, medical, judicial) demand interpretable models — black-box deep learning may not be acceptable.
- Accountability. Who is responsible when a model harms someone? Companies must keep humans in the loop for irreversible decisions.
- Security. Adversarial attacks can fool image classifiers; data poisoning can corrupt training.
- Environmental impact. Training a single large LLM can consume on the order of a thousand megawatt-hours of electricity; sustainability matters.
Responsible practice. Document datasets (Datasheets for Datasets), models (Model Cards), monitor for drift and bias, and submit high-impact applications to ethics review.
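Monitoring for drift usually means comparing the distribution of a feature, or of the model's scores, in live traffic against what the model saw in training. The sketch computes the Population Stability Index (PSI), one common drift statistic, over fixed bins. A commonly quoted reading is that values under about 0.1 indicate little shift and values above about 0.25 a significant one. These cut-offs are conventions rather than laws. The bin edges, data and helper names are hypothetical.

```python
import math
from bisect import bisect_right


def bin_shares(values, cut_points):
    """Fraction of values falling in each bin defined by sorted interior cut points."""
    counts = [0] * (len(cut_points) + 1)
    for v in values:
        counts[bisect_right(cut_points, v)] += 1
    return [c / len(values) for c in counts]


def psi(expected_shares, actual_shares, eps=1e-6):
    """Population Stability Index: sum over bins of (actual - expected) * ln(actual / expected)."""
    total = 0.0
    for e, a in zip(expected_shares, actual_shares):
        e, a = max(e, eps), max(a, eps)  # guard against empty bins
        total += (a - e) * math.log(a / e)
    return total


cuts = [0.25, 0.5, 0.75]                                          # four score bins
training_scores = [0.1, 0.3, 0.6, 0.9] * 25                       # hypothetical: 25% in each bin
live_scores = [0.1] * 10 + [0.3] * 20 + [0.6] * 30 + [0.9] * 40   # hypothetical live traffic

drift = psi(bin_shares(training_scores, cuts), bin_shares(live_scores, cuts))
print(f"PSI = {drift:.3f}")  # → PSI = 0.228
```

Running such a check on a schedule, alongside per-group metrics like the selection-rate ratio, turns "monitor for drift and bias" into an alert that a human can act on.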