List (ADT)
A list is an ordered collection of elements which can be implemented with arrays or linked lists.
get
returns the element at indexi
insert
adds an elementval
at indexi
Priority queue (ADT)
A priority queue gives each element a priority, and serves elements with a higher priority before elements with a lower priority.
push
adds an elementval
with some priorityp
pop
removes the element with the highest priority
A stack is a priority queue where the most recently added element has the highest priority (LIFO). A queue is a priority queue where the least recently added element has the highest priority (FIFO). One way to implement arbitrary priority values is with heaps.
Tree (ADT)
A tree represents a hierarchical structure between nodes. Each node in a tree must be connected to exactly one parent node, except for the root node, which has no parent.
Any arbitrary tree can be implemented with pointers.
class Node:
def __init__(self, value):
self.value = value
self.children = []
There are multiple ways to traverse a tree. They can be categorized by whether they are breadth-first (BFS) or depth-first (DFS).
There are three variants of DFS: pre-order (node-left-right), post-order (left-right-node), and in-order (left-node-right).
BFS needs a queue to keep track of nodes at the current level, so it is usually implemented iteratively.
# BFS, iterative
def traverse(root):
curr = [root]
while curr:
nxt = []
for node in curr:
if node is not None:
for child in node.children:
nxt.append(child)
curr = nxt
DFS needs a stack, so it is usually implemented recursively to take advantage of the call stack. However, it can also be implemented iteratively.
# DFS, pre-order, recursive
def traverse(node):
if node is None:
return
visit(node)
traverse(node.left)
traverse(node.right)
# DFS, pre-order, iterative
def traverse(root):
if root is None:
return
stack = [root]
while len(stack) > 0:
node = stack.pop()
if node is None:
continue
visit(node)
# note that right is pushed first
# so that left is processed first
if node.right is not None:
stack.push(node.right)
if node.left is not None:
stack.push(node.left)
Binary tree
A binary tree is a tree where nodes can have at most two children, referred to as the left and right child. An important type of binary tree are complete binary trees, which are trees that fill every level from left to right. Complete binary trees are used in binary heaps and segment trees. Complete binary trees are often implemented with arrays for efficiency.
Let the root node be index 0
. For any node at index i
:
- Left child at index
2 * i + 1
- Right child at index
2 * i + 2
- Parent at index
floor((i - 1) / 2)
Heap
A heap is a tree-based data structure that satisfies the heap property, which states that a parent node should be less than (or greater than) its children nodes.
Priority queues are often implemented using heaps. Types of heaps include binary, skew, binomial, and fibonacci.
Binary heap
A binary heap is a heap implemented as a binary tree. It implements push
and pop
in O(log n) time. While not the fastest, it can be initialized by inserting every element into it.
push
- Add the element to the bottom left
- While the element is smaller than its parent, swap
pop
- Replace the root with the node at the bottom left
- While the new root is larger than either of its children, swap with the smaller child
Segment Tree
A segment tree is used for range queries and udpates. It can achieve both in O(log n) time. This is better than simple loops, which take O(n) time for querying, and prefix sums, which take O(n) time for updates.
tree = [0] * (2 * n) # n is the len of the array
def build(arr):
# arr is at the bottom of the tree
# from n to 2n-1
for i in range(n):
tree[i + n] = arr[i]
# parents start at n-1 and before 0
# the left child is at 2i and the right child is at 2i + 1
for i in range(n - 1, 0, -1):
tree[i] = tree[i << 1] + tree[i << 1 | 1]
# [l, r)
def query(l, r):
output = 0
l += n # index of l in tree
r += n # index of r in tree
while l < r:
# if l is a right child node, take it
if l & 1:
output += tree[l]
l += 1
# if r is a right child node, take the left child
# because of exclusive or
if r & 1:
r -= 1
output += tree[r]
# get the next parent
l >>= 1
r >>= 1
return output
def update(i, val):
i += n # index of val in tree
tree[i] = val
while i > 1:
# parent is the sum of its children
tree[i >> 1] = tree[i] + tree[i^1]
i >>= 1
Note that a segment tree can also be implemented with pointers.
References: [1].
Union Find
Used to keep track of disjoint sets.
union
takes two elements and combines their sets. find
takes an element and returns the representative element ("parent") in the set.
Basic
At first, every element is its own parent. union
makes one element the parent of the other. find
returns the topmost parent.
parent = list(range(n))
def find(i):
if parent[i] == i:
return i
return find(parent[i])
def union(a, b):
pa, pb = find(a), find(b)
parent[pa] = pb
Path compression and rank-ordering
find
can be optimized with path compression, which flattens the structure of the tree whenever find
is called. This means that every node points directly to the topmost parent, speeding up future find
operations.
union
can be optimized with rank-ordering, which keeps track of the maximum height of the trees. When two trees are combined, the tree with a lower height is made a child of the tree with a higher height. This keeps the overall structure more balanced and prevents long chains of nodes, which would slow down find
operations. Note that only when they are equal could the height increase.
parent = list(range(n))
rank = [0] * n
def find(i):
if parent[i] == i:
return i
parent[i] = find(parent[i])
return parent[i]
def union(a, b):
pa, pb = find(a), find(b)
if pa == pb:
return
if rank[pa] > rank[pb]:
parent[pb] = pa
elif rank[pa] < rank[pb]:
parent[pa] = pb
else:
parent[pa] = pb
rank[pb] += 1
Built-In Data Structures
The python heap library heapq
implements a min heap.
import heapq
arr = [3, 4, 1, 2]
heapq.heapify(arr)
heapq.heappop(arr)
heapq.heappush(5)
heapq.nsmallest(1, arr)
heapq.nlargest(1, arr)
Counter
takes an iterator and returns the count of each element. It return a zero count for missing elements. Methods: items()
, most_common([n])
defaultdict
takes a default_factory
function and returns a dictionary that returns the default value for missing keys. eg. defaultdict(list)
or defaultdict(lambda x: x)
.