PGD01C02
Module 5 · Data Structures and Algorithm Design

Overview of Data Structures: Linear and Non-Linear

Core Titles
Key headlines and terms for quick recall
  • Data Structure (DS) — way to organise data for efficient access and modification
  • Linear DS — elements in sequence: Array, Linked List, Stack, Queue
  • Non-Linear DS — hierarchical / network: Tree, Graph, Heap, Hash Table
  • Operations: insert, delete, search, traverse
  • Time / space trade-offs
  • Why choice matters — same algorithm can be O(n)O(n) or O(logn)O(\log n) depending on DS
Basic Idea
What it is, why it matters, how it works

What is a Data Structure?

A data structure is a way to organise, store and access data so that operations on it are efficient. Choosing the right DS is often more impactful than choosing the algorithm.

Linear Data Structures

Elements arranged sequentially — each element (except first / last) has exactly one predecessor and one successor.

1. Array.

  • Fixed-size, contiguous memory.
  • Access by index: O(1)O(1).
  • Insert / delete: O(n)O(n) — shift elements.
  • Cache-friendly; foundation for higher structures.

2. Linked List.

  • Nodes scattered in memory; each holds a value and a pointer to the next.
  • Insert / delete at known node: O(1)O(1).
  • Access by index: O(n)O(n).
  • Variants: singly, doubly, circular.

3. Stack (LIFO).

  • Push / pop at the top.
  • Operations O(1)O(1).
  • Uses: function-call stack, undo, balanced parentheses, depth-first search.

4. Queue (FIFO).

  • Enqueue at back, dequeue at front.
  • Operations O(1)O(1) (with circular buffer or linked list).
  • Uses: BFS, scheduling, IO buffering.
  • Variants: deque, priority queue.

Non-Linear Data Structures

Elements arranged hierarchically or as networks — relationships are not strictly sequential.

1. Tree.

  • Hierarchical; root + children.
  • Binary tree — at most 2 children per node.
  • Binary Search Tree (BST) — left < root < right.
  • Balanced BSTs — AVL, Red-Black; guarantee O(logn)O(\log n) operations.
  • Heap — complete binary tree with heap property (parent ≤ children for min-heap); priority queues.
  • B-Tree, B+ Tree — multi-way; backbone of databases / file systems.
  • Trie — prefix tree for strings; autocomplete, IP routing.

2. Graph.

  • Vertices + edges (directed / undirected, weighted / unweighted).
  • Representations: adjacency matrix (O(V2)O(V^2) space), adjacency list (O(V+E)O(V+E)).
  • Backbone of networks, knowledge graphs, dependency analysis.

3. Hash Table.

  • Key → value via hash function.
  • Average O(1)O(1) lookup; O(n)O(n) worst-case.
  • Collision handling: chaining, open addressing.
  • Foundation of Python dicts, Java HashMap.

Operations to think about

  • Insert, delete, search, traverse, sort.
  • Each DS optimises a subset of these.

Time complexity cheat sheet

DSAccessSearchInsertDelete
ArrayO(1)O(1)O(n)O(n)O(n)O(n)O(n)O(n)
Sorted ArrayO(1)O(1)O(logn)O(\log n)O(n)O(n)O(n)O(n)
Linked ListO(n)O(n)O(n)O(n)O(1)O(1)*O(1)O(1)*
StackO(n)O(n)O(n)O(n)O(1)O(1)O(1)O(1)
QueueO(n)O(n)O(n)O(n)O(1)O(1)O(1)O(1)
Hash TableO(1)O(1) avgO(1)O(1) avgO(1)O(1) avg
BST (balanced)O(logn)O(\log n)O(logn)O(\log n)O(logn)O(\log n)
HeapO(n)O(n)O(logn)O(\log n)O(logn)O(\log n)

* assuming pointer to the node.

Why this matters in Data Science

  • Pandas DataFrame is built on arrays + hash tables.
  • Graph algorithms power knowledge graphs and social networks.
  • Trees power decision-tree learners and indexes.
  • Heap = backbone of A*, Dijkstra and priority-based ML schedulers.

The same operation can be O(n)O(n) or O(logn)O(\log n) depending on DS — choose wisely.

Mind Map
Visual structure of the concept
DATA STRUCTURES
├── LINEAR
│   ├── Array (contiguous, O(1) index)
│   ├── Linked List (pointers)
│   ├── Stack (LIFO)  — DFS, undo
│   └── Queue (FIFO)  — BFS, scheduling
└── NON-LINEAR
    ├── Tree
    │   ├── Binary tree, BST
    │   ├── Balanced (AVL, RB)
    │   ├── Heap (priority queue)
    │   ├── B-tree / B+ tree (DB)
    │   └── Trie (string prefix)
    ├── Graph
    │   ├── Adj matrix vs list
    │   └── Networks
    └── Hash Table
        ├── O(1) avg
        └── Chaining / open address
Exam Q&A
Part A (2 marks) and Part B (20 marks) style questions

Part A (2 marks each)

Q1. Define a linear data structure with one example. A data structure in which elements are arranged sequentially — each (except first/last) has exactly one predecessor and one successor. Example: an array, a fixed-size sequence in contiguous memory with O(1)O(1) index access.

Q2. Give an example of a non-linear data structure and one use. A tree — hierarchical structure with a root and children. Used in databases (B-tree index), file systems, decision-tree machine-learning models.

Q3. What is the difference between a stack and a queue?

  • Stack — LIFO (Last-In-First-Out); operations at the top.
  • Queue — FIFO (First-In-First-Out); enqueue at back, dequeue at front.

Part B (20 marks)

Q. Give an overview of linear and non-linear data structures. Discuss the main types of each, their operations and time complexities, with examples and use-cases.

Linear Data Structures.

Elements arranged sequentially; each has one predecessor / successor.

1. Array.

  • Fixed-size, contiguous memory.
  • Index access O(1)O(1).
  • Insert / delete O(n)O(n) (shift).
  • Use: storing scores, numerical computation (NumPy arrays).

2. Linked List.

  • Nodes scattered, connected by pointers.
  • Insert / delete at known node O(1)O(1).
  • Random access O(n)O(n).
  • Variants: singly, doubly, circular.
  • Use: dynamic data with frequent insertions/deletions; LRU caches.

3. Stack (LIFO).

  • Push / pop at top, both O(1)O(1).
  • Use: function-call stack, undo/redo, balanced parentheses, DFS.

4. Queue (FIFO).

  • Enqueue at back, dequeue at front, both O(1)O(1).
  • Use: BFS, scheduling, IO buffers.
  • Variants: deque (double-ended), priority queue.

Non-Linear Data Structures.

Elements form a hierarchy or network.

1. Tree. Root + child nodes.

  • Binary Search Tree (BST)O(logn)O(\log n) search/insert if balanced.
  • AVL / Red-Black trees — self-balancing.
  • Heap — complete binary tree; root is min/max; priority queues.
  • B-Tree / B+ Tree — multi-way; backbone of databases (MySQL InnoDB index).
  • Trie — prefix tree for strings; autocomplete, IP routing.

2. Graph. Vertices and edges.

  • Represented by adjacency matrix (O(V2)O(V^2)) or adjacency list (O(V+E)O(V + E)).
  • Use: social networks, knowledge graphs, road networks, dependency graphs.

3. Hash Table. Key → value via hash function.

  • Average O(1)O(1) lookup; worst O(n)O(n).
  • Collision handling: chaining, open addressing.
  • Use: Python dict, Java HashMap, caches.

Time complexity cheat sheet.

DSAccessSearchInsertDelete
ArrayO(1)O(1)O(n)O(n)O(n)O(n)O(n)O(n)
Sorted ArrayO(1)O(1)O(logn)O(\log n)O(n)O(n)O(n)O(n)
Linked ListO(n)O(n)O(n)O(n)O(1)O(1)O(1)O(1)
Stack/QueueO(n)O(n)O(n)O(n)O(1)O(1)O(1)O(1)
Hash TableO(1)O(1) avgO(1)O(1) avgO(1)O(1) avg
Balanced BSTO(logn)O(\log n)O(logn)O(\log n)O(logn)O(\log n)
HeapO(n)O(n)O(logn)O(\log n)O(logn)O(\log n)

Choosing a DS — examples.

NeedBest DS
Fast random accessArray
Frequent insert/delete in the middleLinked list
LIFO historyStack
FIFO queueQueue
Fast key lookupHash table
Sorted with fast range queriesBalanced BST
Always extract min/maxHeap
Word autocompleteTrie
Networks of relationshipsGraph

Why it matters in Data Science.

  • pandas DataFrame uses arrays + hash tables under the hood.
  • Graph algorithms power knowledge graphs and recommenders.
  • Heaps drive priority-based ML training schedulers.
  • B+ Trees power SQL indexes that speed feature lookups.

Picking the right data structure can convert an O(n2)O(n^2) problem into O(nlogn)O(n \log n) — often the difference between a working ML pipeline and one that times out.