Data Structures and Algorithms Notes

Arrays

Two Pointer

Usage: Two pointer is a technique to find valid subarrays. Keep two pointers, l and r. Move l or r depending on some condition.

def f(arr):
    l, r = 0, len(arr) - 1
    while l < r:
        if cond:
            l += 1
        else:
            r -= 1
    return l

Examples: 3Sum, Container With Most Water and Trapping Rainwater

Sliding Window

Usage: Sliding window is a technique used to find valid subarrays.
Time complexity: There are n(n+1)2\frac{n(n + 1)}{2} subarrays, but a sliding window runs in O(n).

def f(arr):
    output, l, curr = 0, 0
    for r in range(len(arr)):
        curr += arr[r]
        while curr > k:
            curr -= arr[l]
            l += 1
        output = max(output, r - l + 1)
    return output

Prefix Sum

Usage: Prefix sum is used to efficiently calculate the sum of a subarray.

def f(arr):
    prefix = [arr[0]]
    for i in range(1, len(arr)):
        prefix.append(prefix[-1] + arr[i])
    return prefix

Kadane's Algorithm

Usage: Kadane's algorithm is used to find the maximum sum of a contiguous subarray. At each step, either start a new subarray at n, or extend the previous subarray with curr_sum + n.

def f(arr):
    curr_sum = 0
    best_sum = float('-inf')
    for n in arr:
        curr_sum = max(n, curr_sum + n)
        best_sum = max(best_sum, curr_sum)
    return best_sum

Usage: Binary search is used to find a specific element in a sorted array.

DescriptionLoop invariantLoop end conditionPostprocessing
Used for finding exact targets, when it is not necessary to compare to neighborsl <= rSearch space is emptyNone
Used when the condition depends on a neighbor, such as finding finding the minimum or maximum value that satisfies a condition.l < rExactly one element remainsCheck if l is valid
Used when the condition depends on two neighbors, such as finding a peak element in an arrayl + 1 < rExactly two elements remainCheck if l or r are valid
def f(n, arr):
    l, r = 0, len(arr) - 1
    while l <= r:
        m = (l + r) // 2
        if arr[m] == n:
            return m
        elif arr[m] < n:
            l = m + 1
        else:
            r = m - 1
  return -1

Examples: Search a 2D Matrix.

Sorting Algorithms

Usage: Sorting algorithms are used to sort an array.

Construction: Merge sort recursively breaks the array into left and right subarrays, sorts them, and merges the sorted lists. In the base case, the list is length 0 or 1 and is considered sorted.

def merge_sort(arr): 
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    return merge(left, right)

def merge(left, right):
    output = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            output.append(left[i])
            i += 1
        else:
            output.append(right[j])
            j += 1
    output.extend(left[i:])
    output.extend(right[j:])
    return output

Construction: Bucket sort distributes the elements into a number of buckets, which are then sorted individually.

Bucket Sort

Examples: Maximum Gap.

Linked Lists

Description: Linked lists are lists where each element has a pointer to the next element in the list.

Advice: Use a dummy node when the head of the linked list needs to be removed or modified, such as in Remove Linked List Elements and Reverse Linked List).

Fast and Slow Pointers

Usage: Fast and slow pointers are used to detect cycles in a linked list. The distance between slow and fast increases by 1 every turn.

def has_cycle(head):
    slow, fast = head, head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        if fast == slow:
            return True
    return False

We can also calculate where a cycle begins. At the point when slow and fast meet, we have A + B = A + B + C + B. We also know that slow is half the speed of fast, so 2(A + B) = A + B + C + B. Simplifying, we get A = C.

Fast and Slow Pointers

Trees

Description: A tree is a directed graph where each node has exactly one parent, except for the root node.

Construction: A tree is generally constructed by having each parent store pointers to their child nodes.

class Node:
    def __init__(self, value):
        self.value = value
        self.children = []

Binary Trees

Description: A binary tree is a tree where nodes can have at most two children. An important type of binary tree are complete binary trees, which are trees that fill every level from left to right.

Construction: Complete binary trees are often implemented with arrays for efficiency.

Let the root node be index 0. For a given node at index i:

  1. Its left child is at index 2 * i + 1
  2. Its right child is at index 2 * i + 2
  3. Its parent is at index floor((i - 1) / 2)

Usage: Complete binary trees are used in binary heaps and segment trees.

Tree Traversal

There are multiple ways to traverse a tree. They can be categorized by whether they are depth-first (DFS) or breadth-first (BFS). There are three variants of DFS: pre-order (node-left-right), post-order (left-right-node), and in-order (left-node-right).

Construction: DFS needs a stack, so it is usually implemented recursively to take advantage of the call stack. However, it can also be implemented iteratively.

# DFS, pre-order, recursive
def traverse(node):
    if node is None:
        return
    visit(node)
    traverse(node.left)
    traverse(node.right)
# DFS, pre-order, iterative
def traverse(root):
    if root is None:
        return
    stack = [root]
    while len(stack) > 0:
        node = stack.pop()
        if node is None:
            continue
        visit(node)
        # note that right is pushed first
        # so that left is processed first
        if node.right is not None:
            stack.push(node.right)
        if node.left is not None:
            stack.push(node.left)

Construction: BFS needs a queue to keep track of nodes at the current level, so it is usually implemented iteratively.

# BFS, iterative
def traverse(root):
    curr = [root]
    while curr:
        nxt = []
        for node in curr:
            if node is not None:
                for child in node.children:
                    nxt.append(child)
        curr = nxt

Segment Trees

Usage: A segment tree is used for range queries and updates.

Time complexity: It can achieve both in O(log n) time. This is better than simple loops, which take O(n) time for querying, and prefix sums, which take O(n) time for updates.

tree = [0] * (2 * n) # n is the len of the array

def build(arr):
    # arr is at the bottom of the tree 
    # from n to 2n-1
    for i in range(n):
        tree[i + n] = arr[i]
    # parents start at n-1 and before 0
    # the left child is at 2i and the right child is at 2i + 1
    for i in range(n - 1, 0, -1):
        tree[i] = tree[i << 1] + tree[i << 1 | 1]

# [l, r)
def query(l, r):
    output = 0
    l += n # index of l in tree
    r += n # index of r in tree
    while l < r:
        # if l is a right child node, take it 
        if l & 1:
            output += tree[l]
            l += 1
        # if r is a right child node, take the left child
        # because of exclusive or
        if r & 1:
            r -= 1
            output += tree[r]
        # get the next parent
        l >>= 1
        r >>= 1 
    return output

def update(i, val):
    i += n # index of val in tree
    tree[i] = val
    while i > 1:
        # parent is the sum of its children
        tree[i >> 1] = tree[i] + tree[i^1] 
        i >>= 1

Examples: Block Placement Queries.
References: [1].

Heaps

Description: A heap is a tree-based data structure that satisfies the heap property, which states that a parent node should be less than (or greater than) its children nodes. It generally supports functions like push and pop.

Construction: A binary heap uses a binary tree to implement the heap.

  • In push, instantiate the element to the bottom left of the binary tree. While the element is smaller than its parent, swap it.
  • In pop, take out the root and replace it with the node at the bottom left of the tree. While the new root is larger than either of its children, swap with the smaller child.

Time complexity: push and pop are O(log n).

Advice: The python library heapq also implements a min heap.

import heapq
arr = [3, 4, 1, 2]
heapq.heapify(arr)
heapq.heappop(arr)
heapq.heappush(5)
heapq.nsmallest(1, arr)
heapq.nlargest(1, arr)

Graphs

Description: A graph is a data structure with a set of nodes connected by edges.

Topological Sort

Usage: Topological sort is used to find an ordering on a directed acyclic graph. Note that there is no ordering if there is a cycle.

Construction: Kahn's algorithm can be used to find a topological sort.

  1. Find all free nodes with no dependencies
  2. Whenever a free node is added to the ordering, decrement the degree of its neighboring nodes.
  3. If the length of the ordering is less than the number of nodes in the DAG, then there is at least one cycle.

Construction: One can also do a depth-first search to find a topological sort.

  1. Perform DFS on every unvisited node on the graph
  2. In each DFS call, first visit all unvisited neighbors, pushing them onto a stack while backtracking. The first visited node should be on the top of the stack.
  3. The stack contains the ordering. If the length of the stack is less than the number of nodes in the DAG, then there is at least one cycle.

References: Kahn's Algorithm and Topological Sort with DFS.

Union Find

Usage: Union find is used to keep track of disjoint sets. To determine whether a and b are in the same sets, simply check whether find(a) == find(b).

Construction: Union find is constructed by maintaining a representative element ("parent") for the entire set.

  • union takes two elements and combines their sets by making one element the parent of the other
  • find takes an element and returns the topmost parent
parent = list(range(n))

def find(i):
    if parent[i] == i:
        return i
    return find(parent[i])

def union(a, b):
    pa, pb = find(a), find(b)
    parent[pa] = pb

find can be optimized with path compression, which flattens the structure of the tree whenever find is called. This means that every node points directly to the topmost parent, speeding up future find operations.

union can be optimized with rank-ordering, which keeps track of the maximum height of the trees. When two trees are combined, the tree with a lower height is made a child of the tree with a higher height. This keeps the overall structure more balanced and prevents long chains of nodes, which would slow down find operations. Only when the heights of the trees are equal could the height increase.

parent = list(range(n))
rank = [0] * n

def find(i):
    if parent[i] == i:
        return i
    parent[i] = find(parent[i])
    return parent[i]

def union(a, b):
    pa, pb = find(a), find(b)
    if pa == pb:
        return
    if rank[pa] > rank[pb]:
        parent[pb] = pa
    elif rank[pa] < rank[pb]:
        parent[pa] = pb
    else:
        parent[pa] = pb
        rank[pb] += 1 

Backtracking

Usage: Backtracking is used to solve constraint satisfaction problems (CSP) where partial candidate solutions which can be checked relatively quickly.

Construction: Incrementally build candidates to the solution. Abandon a candidate ("backtrack") as soon as you determine that the candidate cannot lead to a valid solution.

def f():
    output = []  # All possible solutions
    partial = [] # Current partial solution

    def backtrack():
        nonlocal output, partial

        # Base case
        if is_solution(partial):   
            output.append(partial[:])
            return

        # Pruning
        if invalid(partial):
            return

        # Try every candidate
        for c in candidates:
            partial.append(c) # Add to the partial
            backtrack(i + 1)
            partial.pop()     # Remove from the partial

    backtrack()
    return output

Examples: Permutations and Combinations.

Dynamic Programming

Usage: Dynamic programming is used for problems where future decisions depend on earlier decisions. These may involve looking for the optimum value or the number of ways to do something.

Construction: One way to do dynamic programming is with top-down memoization.

memo = {}
def fib(n):
    if n == 0 or i == 1:
        return n
    if n not in memo:
        memo[n] = f(n - 1) + f(n - 2)
    return memo[n]

Construction: Another way to do dynamic programming is from the bottom-up. Start with base cases and iteratively build up to the solution in order.

def fib(n):
    if n == 0:
        return 0
    prev, curr = 0, 1
    for i in range(2, n):
        prev, curr = curr, prev + curr
    return curr