PGDDSA Study · Semester 1

PGD01C02

Module 5 · Data Structures and Algorithm Design

Overview of Data Structures: Linear and Non-Linear

Core Titles

Key headlines and terms for quick recall

Data Structure (DS) — way to organise data for efficient access and modification
Linear DS — elements in sequence: Array, Linked List, Stack, Queue
Non-Linear DS — hierarchical / network: Tree, Graph, Heap, Hash Table
Operations: insert, delete, search, traverse
Time / space trade-offs
Why choice matters — same algorithm can be $O(n)$ or $O(\log n)$ depending on DS

Basic Idea

What it is, why it matters, how it works

What is a Data Structure?

A data structure is a way to organise, store and access data so that operations on it are efficient. Choosing the right DS is often more impactful than choosing the algorithm.

Linear Data Structures

Elements arranged sequentially — each element (except first / last) has exactly one predecessor and one successor.

1. Array.

Fixed-size, contiguous memory.
Access by index: $O(1)$ .
Insert / delete: $O(n)$ — shift elements.
Cache-friendly; foundation for higher structures.

2. Linked List.

Nodes scattered in memory; each holds a value and a pointer to the next.
Insert / delete at known node: $O(1)$ .
Access by index: $O(n)$ .
Variants: singly, doubly, circular.

3. Stack (LIFO).

Push / pop at the top.
Operations $O(1)$ .
Uses: function-call stack, undo, balanced parentheses, depth-first search.

4. Queue (FIFO).

Enqueue at back, dequeue at front.
Operations $O(1)$ (with circular buffer or linked list).
Uses: BFS, scheduling, IO buffering.
Variants: deque, priority queue.

Non-Linear Data Structures

Elements arranged hierarchically or as networks — relationships are not strictly sequential.

1. Tree.

Hierarchical; root + children.
Binary tree — at most 2 children per node.
Binary Search Tree (BST) — left < root < right.
Balanced BSTs — AVL, Red-Black; guarantee $O(\log n)$ operations.
Heap — complete binary tree with heap property (parent ≤ children for min-heap); priority queues.
B-Tree, B+ Tree — multi-way; backbone of databases / file systems.
Trie — prefix tree for strings; autocomplete, IP routing.

2. Graph.

Vertices + edges (directed / undirected, weighted / unweighted).
Representations: adjacency matrix ( $O(V^2)$ space), adjacency list ( $O(V+E)$ ).
Backbone of networks, knowledge graphs, dependency analysis.

3. Hash Table.

Key → value via hash function.
Average $O(1)$ lookup; $O(n)$ worst-case.
Collision handling: chaining, open addressing.
Foundation of Python dicts, Java HashMap.

Operations to think about

Insert, delete, search, traverse, sort.
Each DS optimises a subset of these.

Time complexity cheat sheet

DS	Access	Search	Insert	Delete
Array	$O(1)$	$O(n)$	$O(n)$	$O(n)$
Sorted Array	$O(1)$	$O(\log n)$	$O(n)$	$O(n)$
Linked List	$O(n)$	$O(n)$	$O(1)$ *	$O(1)$ *
Stack	$O(n)$	$O(n)$	$O(1)$	$O(1)$
Queue	$O(n)$	$O(n)$	$O(1)$	$O(1)$
Hash Table	–	$O(1)$ avg	$O(1)$ avg	$O(1)$ avg
BST (balanced)	–	$O(\log n)$	$O(\log n)$	$O(\log n)$
Heap	–	$O(n)$	$O(\log n)$	$O(\log n)$

* assuming pointer to the node.

Why this matters in Data Science

Pandas DataFrame is built on arrays + hash tables.
Graph algorithms power knowledge graphs and social networks.
Trees power decision-tree learners and indexes.
Heap = backbone of A*, Dijkstra and priority-based ML schedulers.

The same operation can be $O(n)$ or $O(\log n)$ depending on DS — choose wisely.

Mind Map

Visual structure of the concept

DATA STRUCTURES
├── LINEAR
│   ├── Array (contiguous, O(1) index)
│   ├── Linked List (pointers)
│   ├── Stack (LIFO)  — DFS, undo
│   └── Queue (FIFO)  — BFS, scheduling
└── NON-LINEAR
    ├── Tree
    │   ├── Binary tree, BST
    │   ├── Balanced (AVL, RB)
    │   ├── Heap (priority queue)
    │   ├── B-tree / B+ tree (DB)
    │   └── Trie (string prefix)
    ├── Graph
    │   ├── Adj matrix vs list
    │   └── Networks
    └── Hash Table
        ├── O(1) avg
        └── Chaining / open address

Exam Q&A

Part A (2 marks) and Part B (20 marks) style questions

Part A (2 marks each)

Q1. Define a linear data structure with one example. A data structure in which elements are arranged sequentially — each (except first/last) has exactly one predecessor and one successor. Example: an array, a fixed-size sequence in contiguous memory with $O(1)$ index access.

Q2. Give an example of a non-linear data structure and one use. A tree — hierarchical structure with a root and children. Used in databases (B-tree index), file systems, decision-tree machine-learning models.

Q3. What is the difference between a stack and a queue?

Stack — LIFO (Last-In-First-Out); operations at the top.
Queue — FIFO (First-In-First-Out); enqueue at back, dequeue at front.

Part B (20 marks)

Q. Give an overview of linear and non-linear data structures. Discuss the main types of each, their operations and time complexities, with examples and use-cases.

Linear Data Structures.

Elements arranged sequentially; each has one predecessor / successor.

1. Array.

Fixed-size, contiguous memory.
Index access $O(1)$ .
Insert / delete $O(n)$ (shift).
Use: storing scores, numerical computation (NumPy arrays).

2. Linked List.

Nodes scattered, connected by pointers.
Insert / delete at known node $O(1)$ .
Random access $O(n)$ .
Variants: singly, doubly, circular.
Use: dynamic data with frequent insertions/deletions; LRU caches.

3. Stack (LIFO).

Push / pop at top, both $O(1)$ .
Use: function-call stack, undo/redo, balanced parentheses, DFS.

4. Queue (FIFO).

Enqueue at back, dequeue at front, both $O(1)$ .
Use: BFS, scheduling, IO buffers.
Variants: deque (double-ended), priority queue.

Non-Linear Data Structures.

Elements form a hierarchy or network.

1. Tree. Root + child nodes.

Binary Search Tree (BST) — $O(\log n)$ search/insert if balanced.
AVL / Red-Black trees — self-balancing.
Heap — complete binary tree; root is min/max; priority queues.
B-Tree / B+ Tree — multi-way; backbone of databases (MySQL InnoDB index).
Trie — prefix tree for strings; autocomplete, IP routing.

2. Graph. Vertices and edges.

Represented by adjacency matrix ( $O(V^2)$ ) or adjacency list ( $O(V + E)$ ).
Use: social networks, knowledge graphs, road networks, dependency graphs.

3. Hash Table. Key → value via hash function.

Average $O(1)$ lookup; worst $O(n)$ .
Collision handling: chaining, open addressing.
Use: Python dict, Java HashMap, caches.

Time complexity cheat sheet.

DS	Access	Search	Insert	Delete
Array	$O(1)$	$O(n)$	$O(n)$	$O(n)$
Sorted Array	$O(1)$	$O(\log n)$	$O(n)$	$O(n)$
Linked List	$O(n)$	$O(n)$	$O(1)$	$O(1)$
Stack/Queue	$O(n)$	$O(n)$	$O(1)$	$O(1)$
Hash Table	–	$O(1)$ avg	$O(1)$ avg	$O(1)$ avg
Balanced BST	–	$O(\log n)$	$O(\log n)$	$O(\log n)$
Heap	–	$O(n)$	$O(\log n)$	$O(\log n)$

Choosing a DS — examples.

Need	Best DS
Fast random access	Array
Frequent insert/delete in the middle	Linked list
LIFO history	Stack
FIFO queue	Queue
Fast key lookup	Hash table
Sorted with fast range queries	Balanced BST
Always extract min/max	Heap
Word autocomplete	Trie
Networks of relationships	Graph

Why it matters in Data Science.

pandas DataFrame uses arrays + hash tables under the hood.
Graph algorithms power knowledge graphs and recommenders.
Heaps drive priority-based ML training schedulers.
B+ Trees power SQL indexes that speed feature lookups.

Picking the right data structure can convert an $O(n^2)$ problem into $O(n \log n)$ — often the difference between a working ML pipeline and one that times out.

Overfitting, Underfitting and Model Selection Algorithm Analysis: Time and Space Complexity