Data Science Roles
Core Titles
Key headlines and terms for quick recall
- Data Analyst: reporting, dashboards, descriptive insight
- Data Scientist: modelling, predictions, statistical inference
- Data Engineer: pipelines, infrastructure, ETL/ELT
- Machine Learning Engineer: production ML systems
- Business / BI Analyst: translating data to business KPIs
- Data Architect: schema, governance, system design
- Research Scientist: novel algorithms and publications
- MLOps Engineer: deployment, monitoring, CI/CD for models
Basic Idea
What it is, why it matters, how it works
Why so many roles?
Data work spans the entire pipeline from raw bytes to business decisions. Each phase requires different skills and tools, so the data team is multi-disciplinary.
Major roles
1. Data Analyst. Focus on descriptive insight — what happened and why.
- Skills: SQL, Excel, Tableau / Power BI, basic statistics.
- Tasks: dashboards, ad-hoc reports, KPI tracking, A/B test analysis.
- Example: analyses last-quarter sales and tells marketing which regions underperformed.
2. Data Scientist. Focus on predictive / prescriptive modelling.
- Skills: Python / R, statistics, ML, experimentation.
- Tasks: build models (regression, classification, clustering), design experiments, explain results to stakeholders.
- Example: predicts customer churn probability with logistic regression and recommends retention strategies.
3. Data Engineer. Builds the plumbing that delivers clean data to the rest of the team.
- Skills: SQL, Python/Java/Scala, distributed systems (Spark, Kafka), cloud (AWS / GCP / Azure), data warehousing (Snowflake, BigQuery).
- Tasks: design and operate ETL/ELT pipelines, schema management, data quality.
- Example: moves transactional data from MySQL into a Snowflake warehouse every hour for analysts to query.
4. Machine Learning Engineer. Bridges research and production.
- Skills: software engineering + ML; PyTorch / TensorFlow, Docker, Kubernetes, model serving.
- Tasks: turn a notebook prototype into a scalable, low-latency, monitored ML service.
- Example: deploys a fraud-detection model to score 1000 transactions/sec in under 50 ms.
5. Business / BI Analyst. Sits between business and data — converts strategic questions into data questions.
- Skills: domain expertise, SQL, Tableau / Power BI, storytelling.
- Tasks: requirements gathering, building scorecards, presenting to leadership.
6. Data Architect. Designs the enterprise data platform — what databases / warehouses / lakes exist, how they connect, who governs what.
- Skills: data modelling, master data management, governance, security.
7. Research Scientist. Pushes the frontier — invents new algorithms, publishes papers.
- Skills: deep mathematical foundation, deep learning, academic background.
- Tasks: fundamental research at organisations such as Google DeepMind (which absorbed Google Brain in 2023), OpenAI, and Microsoft Research.
8. MLOps Engineer. Treats ML models as products — versioning, CI/CD, monitoring, drift detection.
- Skills: DevOps + ML; MLflow, Kubeflow, Airflow, observability tooling.
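The Data Scientist example in item 2 (churn probability with logistic regression) can be sketched in a few lines. This is a minimal illustration using only the standard library, not how a real team would do it (they would more likely use a library such as scikit-learn). The two features, the toy data, and the helper names `fit_logistic` and `churn_probability` are all assumptions made up for this sketch.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss w.r.t. the linear score
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def churn_probability(weights, bias, x):
    """Predicted probability that a customer with features x churns."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Hypothetical toy data: [support tickets last month, years as customer]
customers = [[0, 5], [1, 4], [0, 3], [4, 1], [5, 0.5], [3, 1]]
churned = [0, 0, 0, 1, 1, 1]

weights, bias = fit_logistic(customers, churned)
p_risky = churn_probability(weights, bias, [4, 0.5])  # many tickets, new customer
p_loyal = churn_probability(weights, bias, [0, 4])    # no tickets, long tenure
print(f"risky: {p_risky:.2f}, loyal: {p_loyal:.2f}")
```

The output is a probability per customer, which is what lets the scientist rank customers and recommend who should receive a retention offer.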
Who does what — comparison
| Role | Primary output | Primary tools | Focus |
|---|---|---|---|
| Data Analyst | Reports, dashboards | SQL, Tableau | Descriptive |
| Data Scientist | Models, insights | Python, scikit-learn | Predictive |
| Data Engineer | Pipelines, storage | Spark, SQL, cloud | Data infrastructure |
| ML Engineer | Production models | PyTorch, Docker, K8s | Scaling, serving |
| Business Analyst | KPI reports, requirements | Excel, SQL, BI tools | Business translation |
| Data Architect | Data models, governance | RDBMS, lakes, MDM | Platform |
| Research Scientist | Papers, new algorithms | Math, DL, simulation | Frontier |
| MLOps Engineer | Deployment, monitoring | MLflow, Airflow | Reliability |
Modern reality
In a small startup one person often covers several of these roles. In large organisations (Google, Amazon, banks) the roles tend to be much more sharply separated. Job titles still vary: what one company calls "Data Scientist" another calls "ML Engineer."
Mind Map
Visual structure of the concept
DATA SCIENCE ROLES
├── DATA ANALYST
│   ├── SQL, Excel, BI tools
│   └── Reporting, dashboards
├── DATA SCIENTIST
│   ├── Python, R, statistics
│   ├── ML modelling, A/B tests
│   └── Insight → recommendations
├── DATA ENGINEER
│   ├── ETL / ELT, Spark, Kafka
│   └── Pipelines + warehouses
├── ML ENGINEER
│   ├── PyTorch, TF, Docker, K8s
│   └── Production model serving
├── BUSINESS / BI ANALYST
│   ├── Translation: biz ↔ data
│   └── KPIs, exec dashboards
├── DATA ARCHITECT
│   └── Platform, schema, governance
├── RESEARCH SCIENTIST
│   └── New algorithms, papers
└── MLOPS ENGINEER
    └── CI/CD, monitoring, drift
Exam Q&A
Part A (2 marks) and Part B (20 marks) style questions
Part A (2 marks each)
Q1. Differentiate between Data Analyst and Data Scientist. A Data Analyst focuses on descriptive analytics — dashboards, KPIs, ad-hoc questions, using SQL and BI tools. A Data Scientist focuses on predictive / prescriptive analytics — building statistical and ML models, designing experiments, using Python / R.
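The descriptive versus predictive distinction in the answer above can be made concrete with a small sketch. The sales figures, region names, and the helpers `last_quarter_by_region` and `forecast_next` are hypothetical. The forecast is a plain least-squares trend line, chosen only because it is the simplest possible predictive model.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical quarterly sales records: (region, quarter index, revenue)
sales = [
    ("North", 1, 100.0), ("North", 2, 110.0), ("North", 3, 120.0), ("North", 4, 130.0),
    ("South", 1, 90.0), ("South", 2, 85.0), ("South", 3, 80.0), ("South", 4, 75.0),
]

def last_quarter_by_region(records, quarter):
    """Descriptive (analyst-style): total revenue per region in one quarter."""
    totals = defaultdict(float)
    for region, q, revenue in records:
        if q == quarter:
            totals[region] += revenue
    return dict(totals)

def forecast_next(records, region):
    """Predictive (scientist-style): fit a least-squares line, extrapolate one quarter."""
    points = [(q, rev) for r, q, rev in records if r == region]
    xs = [q for q, _ in points]
    ys = [rev for _, rev in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in points)
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return intercept + slope * (max(xs) + 1)

# Analyst question: which regions were below average last quarter?
q4 = last_quarter_by_region(sales, 4)
underperformers = [r for r, rev in q4.items() if rev < mean(q4.values())]

# Scientist question: what do we expect next quarter?
north_q5 = forecast_next(sales, "North")
south_q5 = forecast_next(sales, "South")
print(underperformers, north_q5, south_q5)
```

The first function only summarises data that already exists; the second produces a number for a quarter that has not happened yet, which is the core of the difference between the two roles.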
Q2. What does a Data Engineer do? Builds and maintains the data infrastructure — ETL/ELT pipelines, data warehouses and lakes — that delivers clean, reliable data to analysts and scientists. Uses tools like SQL, Spark, Kafka, Airflow and cloud services.
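A toy extract-transform-load job shows the shape of the work described in this answer. This is a sketch under stated assumptions: in-memory SQLite databases stand in for the real source system and warehouse, and the table names `orders` and `fact_orders`, the cleaning rules, and the function `run_etl` are hypothetical.

```python
import sqlite3

def run_etl(source: sqlite3.Connection, warehouse: sqlite3.Connection) -> int:
    """Extract raw orders, drop invalid rows, load the rest; return rows loaded."""
    # Extract: read raw rows from the operational database
    rows = source.execute("SELECT order_id, customer, amount FROM orders").fetchall()

    # Transform: skip missing or negative amounts, normalise customer names
    cleaned = [
        (order_id, customer.strip().lower(), float(amount))
        for order_id, customer, amount in rows
        if amount is not None and amount >= 0
    ]

    # Load: keyed on order_id so that re-running the job does not duplicate rows
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    warehouse.executemany(
        "INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?)", cleaned
    )
    warehouse.commit()
    return len(cleaned)

# Hypothetical source data with one missing and one negative amount
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, " Alice ", 20.0), (2, "BOB", None), (3, "carol", -5.0), (4, "Dave", 12.5)],
)

warehouse = sqlite3.connect(":memory:")
loaded = run_etl(source, warehouse)
run_etl(source, warehouse)  # a second scheduled run should not create duplicates
total_rows = warehouse.execute("SELECT COUNT(*) FROM fact_orders").fetchone()[0]
loaded_customers = [
    row[0] for row in warehouse.execute(
        "SELECT customer FROM fact_orders ORDER BY order_id"
    )
]
print(loaded, total_rows, loaded_customers)
```

Making the load step safe to repeat is a common design choice for scheduled pipelines, because a job that is retried after a failure should leave the warehouse in the same state as a single successful run.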
Q3. Define MLOps. MLOps = ML + DevOps. It is the set of practices for deploying, monitoring, versioning and retraining machine-learning models in production reliably and at scale.
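Monitoring is the part of this definition that is easiest to illustrate. One common drift check is the Population Stability Index (PSI), which compares how a feature or score is distributed in training data versus live data. The sketch below is a minimal version: the bin edges, sample values, and the 0.2 alert threshold (a widely quoted rule of thumb, not a standard) are assumptions for illustration.

```python
import math

def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between two samples over fixed, sorted bin edges."""
    def proportions(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            idx = sum(v >= e for e in edges)  # count of edges at or below v = bin index
            counts[idx] += 1
        # Clip empty bins to eps so the logarithm below is always defined
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

edges = [0.25, 0.5, 0.75]

# Hypothetical model scores at training time and in two later production windows
training = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
live_stable = [0.05, 0.22, 0.35, 0.45, 0.55, 0.65, 0.85, 0.95]
live_shifted = [0.6, 0.7, 0.8, 0.9, 0.85, 0.95, 0.77, 0.3]

psi_stable = psi(training, live_stable, edges)
psi_shifted = psi(training, live_shifted, edges)

DRIFT_THRESHOLD = 0.2  # assumed rule-of-thumb alert level
needs_review = psi_shifted > DRIFT_THRESHOLD
print(round(psi_stable, 4), round(psi_shifted, 4), needs_review)
```

In practice a check like this would run on a schedule, and a value above the threshold would raise an alert or trigger retraining, which is the "monitoring and retraining" half of the definition.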
Part B (20 marks)
Q. Describe the major roles in a modern data science team. Compare their responsibilities, required skills and typical tools. How does a Data Analyst contribute to a company's decision-making?
Major roles.
- Data Analyst — descriptive insight, dashboards, reports.
- Data Scientist — predictive models, experiments, recommendations.
- Data Engineer — pipelines, ETL/ELT, warehouses.
- Machine Learning Engineer — production ML systems.
- Business / BI Analyst — converts business questions to data questions.
- Data Architect — platform design and governance.
- Research Scientist — novel algorithms, often academic publishing.
- MLOps Engineer — CI/CD, monitoring, model lifecycle.
Comparison.
| Role | Responsibility | Tools | Skills |
|---|---|---|---|
| Data Analyst | Reports, dashboards, A/B analysis | SQL, Excel, Tableau, Power BI | Statistics, visualisation |
| Data Scientist | Modelling, experimentation | Python, R, scikit-learn, Jupyter | Stats, ML, communication |
| Data Engineer | Data pipelines & infra | Spark, Kafka, Snowflake, Airflow | Distributed systems, SQL |
| ML Engineer | Productionise & scale models | PyTorch, TF, Docker, K8s | SWE + ML |
| Business Analyst | Translate biz ↔ data | Excel, SQL, BI dashboards | Domain knowledge |
| Data Architect | Platform design | RDBMS, data lakes, MDM | Data modelling |
| Research Scientist | New algorithms | Math, simulation, deep learning | PhD-level research |
| MLOps Engineer | Reliability, deployment | MLflow, Kubeflow, monitoring | DevOps + ML |
Data Analyst's contribution to decision-making.
1. Data collection and integration. Pull data from CRM, ERP, web analytics and APIs into a single warehouse for a cross-functional view.
2. Data cleaning. Handle missing values and outliers and remove duplicates so reports reflect reality.
3. Exploratory Data Analysis (EDA). Compute descriptive stats, plot distributions and correlations to surface trends and anomalies.
4. Dashboard & report building. Translate metrics into Tableau / Power BI dashboards so leadership can monitor KPIs in real time.
5. KPI definition and monitoring. Define quantitative targets (revenue, churn, conversion) and track deviation; trigger alerts.
6. Ad-hoc business questions. Answer specific stakeholder queries (e.g., "Which products caused last month's revenue dip?") with quick analyses.
7. A/B test analysis. Help product / marketing decide whether a tested change had a real impact.
8. Storytelling & recommendation. Translate numbers into clear narratives + action items in slide decks and presentations.
Impact. A capable data analyst helps replace gut-feel decisions with evidence-based ones across the company: pricing, marketing budgets, hiring plans, supply-chain re-routing. They are the bridge between raw data and the people making choices.
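Point 7 of the answer (A/B test analysis) is the step that most benefits from a worked example. The sketch below uses a two-proportion z-test, one standard way to judge whether a difference in conversion rates is likely to be real. The visitor and conversion counts are made up, and the 0.05 significance level is a conventional choice rather than a rule.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical experiment: control A converts 200/2000, variant B converts 260/2000
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
significant = p_value < 0.05  # conventional significance level
print(f"z = {z:.2f}, p = {p_value:.4f}, significant = {significant}")
```

Here the analyst would report that variant B's higher conversion rate is unlikely to be due to chance alone, and leave the decision about rolling it out to the product or marketing team.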