Data structures

List (ADT)

A list is an ordered collection of elements which can be implemented with arrays or linked lists.

  1. get returns the element at index i
  2. insert adds an element val at index i

Priority queue (ADT)

A priority queue gives each element a priority, and serves elements with a higher priority before elements with a lower priority.

  1. push adds an element val with some priority p
  2. pop removes the element with the highest priority

A stack is a priority queue where the most recently added element has the highest priority (LIFO). A queue is a priority queue where the least recently added element has the highest priority (FIFO). One way to implement arbitrary priority values is with heaps.

Tree (ADT)

A tree represents a hierarchical structure between nodes. Each node in a tree must be connected to exactly one parent node, except for the root node, which has no parent.

Any arbitrary tree can be implemented with pointers.

class Node:
    def __init__(self, value):
        self.value = value
        self.children = []

There are multiple ways to traverse a tree. They can be categorized by whether they are breadth-first (BFS) or depth-first (DFS).

There are three variants of DFS: pre-order (node-left-right), post-order (left-right-node), and in-order (left-node-right).

DFS Tree Traversal

BFS needs a queue to keep track of nodes at the current level, so it is usually implemented iteratively.

# BFS, iterative
def traverse(root):
    curr = [root]
    while curr:
        nxt = []
        for node in curr:
            if node is not None:
                for child in node.children:
                    nxt.append(child)
        curr = nxt

DFS needs a stack, so it is usually implemented recursively to take advantage of the call stack. However, it can also be implemented iteratively.

# DFS, pre-order, recursive
def traverse(node):
    if node is None:
        return
    visit(node)
    traverse(node.left)
    traverse(node.right)
# DFS, pre-order, iterative
def traverse(root):
    if root is None:
        return
    stack = [root]
    while len(stack) > 0:
        node = stack.pop()
        if node is None:
            continue
        visit(node)
        # note that right is pushed first
        # so that left is processed first
        if node.right is not None:
            stack.push(node.right)
        if node.left is not None:
            stack.push(node.left)

Binary tree

A binary tree is a tree where nodes can have at most two children, referred to as the left and right child. An important type of binary tree are complete binary trees, which are trees that fill every level from left to right. Complete binary trees are used in binary heaps and segment trees. Complete binary trees are often implemented with arrays for efficiency.

Let the root node be index 0. For any node at index i:

  1. Left child at index 2 * i + 1
  2. Right child at index 2 * i + 2
  3. Parent at index floor((i - 1) / 2)

Heap

A heap is a tree-based data structure that satisfies the heap property, which states that a parent node should be less than (or greater than) its children nodes.

Priority queues are often implemented using heaps. Types of heaps include binary, skew, binomial, and fibonacci.

Binary heap

A binary heap is a heap implemented as a binary tree. It implements push and pop in O(log n) time. While not the fastest, it can be initialized by inserting every element into it.

push

  1. Add the element to the bottom left
  2. While the element is smaller than its parent, swap

pop

  1. Replace the root with the node at the bottom left
  2. While the new root is larger than either of its children, swap with the smaller child

Segment Tree

A segment tree is used for range queries and udpates. It can achieve both in O(log n) time. This is better than simple loops, which take O(n) time for querying, and prefix sums, which take O(n) time for updates.

tree = [0] * (2 * n) # n is the len of the array

def build(arr):
    # arr is at the bottom of the tree 
    # from n to 2n-1
    for i in range(n):
        tree[i + n] = arr[i]
    # parents start at n-1 and before 0
    # the left child is at 2i and the right child is at 2i + 1
    for i in range(n - 1, 0, -1):
        tree[i] = tree[i << 1] + tree[i << 1 | 1]

# [l, r)
def query(l, r):
    output = 0
    l += n # index of l in tree
    r += n # index of r in tree
    while l < r:
        # if l is a right child node, take it 
        if l & 1:
            output += tree[l]
            l += 1
        # if r is a right child node, take the left child
        # because of exclusive or
        if r & 1:
            r -= 1
            output += tree[r]
        # get the next parent
        l >>= 1
        r >>= 1 
    return output

def update(i, val):
    i += n # index of val in tree
    tree[i] = val
    while i > 1:
        # parent is the sum of its children
        tree[i >> 1] = tree[i] + tree[i^1] 
        i >>= 1

Note that a segment tree can also be implemented with pointers.

References: [1].

Union Find

Used to keep track of disjoint sets.

union takes two elements and combines their sets. find takes an element and returns the representative element ("parent") in the set.

Basic

At first, every element is its own parent. union makes one element the parent of the other. find returns the topmost parent.

parent = list(range(n))

def find(i):
    if parent[i] == i:
        return i
    return find(parent[i])

def union(a, b):
    pa, pb = find(a), find(b)
    parent[pa] = pb

Path compression and rank-ordering

find can be optimized with path compression, which flattens the structure of the tree whenever find is called. This means that every node points directly to the topmost parent, speeding up future find operations.

union can be optimized with rank-ordering, which keeps track of the maximum height of the trees. When two trees are combined, the tree with a lower height is made a child of the tree with a higher height. This keeps the overall structure more balanced and prevents long chains of nodes, which would slow down find operations. Note that only when they are equal could the height increase.

parent = list(range(n))
rank = [0] * n

def find(i):
    if parent[i] == i:
        return i
    parent[i] = find(parent[i])
    return parent[i]

def union(a, b):
    pa, pb = find(a), find(b)
    if pa == pb:
        return
    if rank[pa] > rank[pb]:
        parent[pb] = pa
    elif rank[pa] < rank[pb]:
        parent[pa] = pb
    else:
        parent[pa] = pb
        rank[pb] += 1 

Built-In Data Structures

The python heap library heapq implements a min heap.

import heapq

arr = [3, 4, 1, 2]
heapq.heapify(arr)
heapq.heappop(arr)
heapq.heappush(5)
heapq.nsmallest(1, arr)
heapq.nlargest(1, arr)

Counter takes an iterator and returns the count of each element. It return a zero count for missing elements. Methods: items(), most_common([n])

defaultdict takes a default_factory function and returns a dictionary that returns the default value for missing keys. eg. defaultdict(list) or defaultdict(lambda x: x).