Contents
Data structures
- Abstract data types
- Arrays
- Techniques: Two pointer, Sliding window, Prefix sum
- Algorithms: Kadane's algorithm, Binary search, Merge sort, Bucket sort
- Linked lists
- Techniques: Dummy node
- Algorithms: Fast and slow pointers
- Trees
- Implementations: Tree with pointers, Complete binary tree with arrays
- Techniques: Tree traversal
- Algorithms: Segment trees
- Heaps
- Implementations: Binary heap, Built-in heap
- Graphs
- Implementations:
- Algorithms: Topological sort
- Union find
- Implementations: Basic union find, Path compression and rank-ordering
- Multisets
- Implementations: Counter
- Maps
- Implementations: Defaultdict
Algorithms
Abstract data types
Abstract data types include lists, priority queues, trees, sets, multisets, maps, and graphs.
Abstract data type | Implementation |
---|---|
List | Arrays, linked lists |
Priority queue | Heaps |
Trees | Pointers, arrays |
Sets | set |
Multisets | Counter |
Maps | dict , defaultdict |
Graphs | Adjacency lists, adjacency matrices |
Disjoint set | Union find |
*The difference between lists and priority queues is that lists maintain order, while priority queues maintain priority. Generally lists implement get
and insert
and priority queues implement push
and pop
.
*A stack is a priority queue where the most recently added element has the highest priority (LIFO). A queue is a priority queue where the least recently added element has the highest priority (FIFO).
Arrays
Two pointer
Used to find valid subarrays.
Keep two pointers, l
and r
. Move l
or r
depending on some condition.
def f(arr):
l, r = 0, len(arr) - 1
while l < r:
if l > r:
l += 1
else:
r -= 1
return l
Example problems include 3Sum, Container With Most Water and Trapping Rainwater.
Sliding window
Used to find valid subarrays.
To find the most [BLANK] of valid subarrays, keep two pointers l
and r
, the output
value, and the curr
value. Slide r
until condition is broken, and then slide l
until condition is not broken.
To find the total number of valid subarrays, fix the value of r
. The number of valid windows ending at index r
is equal to the size of the window.
There are n(n + 1)/2 subarrays, but a sliding window runs in O(n).
def f(arr):
output, l, curr = 0, 0
for r in range(len(arr)):
curr += arr[r]
while curr > k:
curr -= arr[l]
l += 1
output = max(output, r - l + 1)
return output
Prefix sum
Used to efficiently calculate the sum of a subarray.
def f(arr):
prefix = [arr[0]]
for i in range(1, len(arr)):
prefix.append(prefix[-1] + arr[i])
return prefix
Kadane's algorithm
Used to find the maximum sum of a contiguous subarray within an array.
At each step, either start a new subarray at n
, or extend the previous subarray with curr_sum + n
.
def f(arr):
curr_sum = 0
best_sum = float('-inf')
for n in arr:
curr_sum = max(n, curr_sum + n)
best_sum = max(best_sum, curr_sum)
return best_sum
Binary search
Used to find an element in a sorted array.
Classic
Used for finding exact targets, when it is not necessary to compare to neighbors. The loop invariant is l <= r
, and loop ends when the search space is empty.
def f(n, arr):
l, r = 0, len(arr) - 1
while l <= r:
m = (l + r) // 2
if arr[m] == n:
return m
elif arr[m] < n:
l = m + 1
else:
r = m - 1
return -1
Example problems include Search a 2D Matrix.
Single-candidate postprocessing
Used when the condition depends on a neighbor, such as finding finding the minimum or maximum value that satisfies a condition. The loop invariant is l < r
, and the loop ends when exactly one element remains.
def f(n, arr):
l, r = 0, len(arr) - 1
while l < r:
m = (l + r) // 2
if g(arr[m]):
r = mid
else:
l = mid + 1
if g(arr[l]) == n:
return l
return -1
Double-candidate postprocessing
Used when the condition depends on two neighbors, such as finding a peak element in an array. The loop invariant is l + 1 < r
, and the loop ends when exactly two elements remain.
def f(n, arr):
l, r = 0, len(arr) - 1
while l + 1 < r:
m = (l + r) // 2
if g(arr[m]):
r = m
else:
l = m
if g(arr(l)):
return l
if g(arr(r)):
return r
return -1
Merge sort
The intuition behind mergesort
is that you recursively break the list into left and right sublists. At the base case, the list is length 0 or 1 and is considered sorted. After calling mergesort
recursively on the left and the right side, the merge
function merges the now sorted lists.
def merge_sort(arr):
if len(arr) <= 1:
return arr
left = []
right = []
for i in range(len(arr)):
if i < len(arr) / 2:
left.append(arr[i])
else:
right.append(arr[i])
left = merge_sort(left)
right = merge_sort(right)
return merge(left, right)
def merge(left, right):
output = []
while left and right:
if left[0] <= right[0]:
output.append(left[0])
left = left[1:]
else:
output.append(right[0])
right = right[1:]
if left:
output.extend(left)
if right:
output.extend(right)
return output
Bucket sort
The idea of bucket sort is to distribute the elements into a number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm, or by recursively applying the bucket sort algorithm. Combined with the pigeonhole principle, it can be useful for problems like Maximum Gap.
Linked lists
Dummy node
Some problems can be simplified by using a dummy node at the start of the linked list. This avoids edge cases when the head needs to be removed or modified. Example problems include Remove Linked List Elements and Reverse Linked List).
Fast and slow pointers
Used to detect cycles in a linked list. Notice that the distance between slow
and fast
decreases by 1
every turn.
def hasCycle(head):
slow, fast = head, head
while fast and fast.next:
slow = slow.next
fast = fast.next.next
if fast == slow:
return True
return False
At the point when slow
and fast
meet, we have A + B = A + B + C + B. We also know that slow
is half the speed of fast
, so 2(A + B) = A + B + C + B. Simplifying, we get A = C. This helps us find when the cycle starts.
Trees
Tree with pointers
Any arbitrary tree can be implemented with pointers.
class Node:
def __init__(self, value):
self.value = value
self.children = []
Complete binary tree with arrays
A binary tree is a tree where nodes can have at most two children, referred to as the left and right child. An important type of binary tree are complete binary trees, which are trees that fill every level from left to right. Complete binary trees are used in binary heaps and segment trees. Complete binary trees are often implemented with arrays for efficiency.
Let the root node be index 0
. For any node at index i
:
- Left child at index
2 * i + 1
- Right child at index
2 * i + 2
- Parent at index
floor((i - 1) / 2)
Tree traversal
There are multiple ways to traverse a tree. They can be categorized by whether they are breadth-first (BFS) or depth-first (DFS).
There are three variants of DFS: pre-order (node-left-right), post-order (left-right-node), and in-order (left-node-right).
BFS needs a queue to keep track of nodes at the current level, so it is usually implemented iteratively.
# BFS, iterative
def traverse(root):
curr = [root]
while curr:
nxt = []
for node in curr:
if node is not None:
for child in node.children:
nxt.append(child)
curr = nxt
DFS needs a stack, so it is usually implemented recursively to take advantage of the call stack. However, it can also be implemented iteratively.
# DFS, pre-order, recursive
def traverse(node):
if node is None:
return
visit(node)
traverse(node.left)
traverse(node.right)
# DFS, pre-order, iterative
def traverse(root):
if root is None:
return
stack = [root]
while len(stack) > 0:
node = stack.pop()
if node is None:
continue
visit(node)
# note that right is pushed first
# so that left is processed first
if node.right is not None:
stack.push(node.right)
if node.left is not None:
stack.push(node.left)
Segment trees
A segment tree is used for range queries and udpates. It can achieve both in O(log n) time. This is better than simple loops, which take O(n) time for querying, and prefix sums, which take O(n) time for updates.
tree = [0] * (2 * n) # n is the len of the array
def build(arr):
# arr is at the bottom of the tree
# from n to 2n-1
for i in range(n):
tree[i + n] = arr[i]
# parents start at n-1 and before 0
# the left child is at 2i and the right child is at 2i + 1
for i in range(n - 1, 0, -1):
tree[i] = tree[i << 1] + tree[i << 1 | 1]
# [l, r)
def query(l, r):
output = 0
l += n # index of l in tree
r += n # index of r in tree
while l < r:
# if l is a right child node, take it
if l & 1:
output += tree[l]
l += 1
# if r is a right child node, take the left child
# because of exclusive or
if r & 1:
r -= 1
output += tree[r]
# get the next parent
l >>= 1
r >>= 1
return output
def update(i, val):
i += n # index of val in tree
tree[i] = val
while i > 1:
# parent is the sum of its children
tree[i >> 1] = tree[i] + tree[i^1]
i >>= 1
Note that a segment tree can also be implemented with pointers.
References: [1].
Heaps
A heap is a tree-based data structure that satisfies the heap property, which states that a parent node should be less than (or greater than) its children nodes. Types of heaps include binary, skew, binomial, and fibonacci.
Binary heap
A binary heap is a heap implemented as a binary tree. It implements push
and pop
in O(log n) time. While not the fastest, it can be initialized by inserting every element into it.
push
- Add the element to the bottom left
- While the element is smaller than its parent, swap
pop
- Replace the root with the node at the bottom left
- While the new root is larger than either of its children, swap with the smaller child
Built-in heap
The python heap library heapq
implements a min heap.
import heapq
arr = [3, 4, 1, 2]
heapq.heapify(arr)
heapq.heappop(arr)
heapq.heappush(5)
heapq.nsmallest(1, arr)
heapq.nlargest(1, arr)
Graphs
Topological Sort
You can only implement topological sort on DAGs. There no ordering if there is a cycle.
Kahn's algorithm
This is based on BFS.
- Keep track of free nodes that have no dependencies
- When removing the node, decrement the degree of neighboring nodes
output, free = [], []
while free:
n = free.pop()
output.append(n)
for node m adjacent to n:
decrement degree of m
if degree of m == 0:
free.append(m)
if len(output) != len(nodes):
return False # graph has at least one cycle
Depth-first search
- Choose a node
- Do a DFS on the unvisited node
- On callback, add current node to the toplogical ordering
output, temp = [], []
def dfs(node n)
if n in output:
return True
if n in temp:
return False # graph has at least one cycle
temp.append(n)
for node m adjacent to n
dfs(m)
temp.remove(n)
output.append(n)
for n in nodes:
dfs(n)
References:
Union find
Used to keep track of disjoint sets.
union
takes two elements and combines their sets. find
takes an element and returns the representative element ("parent") in the set.
Basic union find
At first, every element is its own parent. union
makes one element the parent of the other. find
returns the topmost parent.
parent = list(range(n))
def find(i):
if parent[i] == i:
return i
return find(parent[i])
def union(a, b):
pa, pb = find(a), find(b)
parent[pa] = pb
Path compression and rank-ordering
find
can be optimized with path compression, which flattens the structure of the tree whenever find
is called. This means that every node points directly to the topmost parent, speeding up future find
operations.
union
can be optimized with rank-ordering, which keeps track of the maximum height of the trees. When two trees are combined, the tree with a lower height is made a child of the tree with a higher height. This keeps the overall structure more balanced and prevents long chains of nodes, which would slow down find
operations. Note that only when they are equal could the height increase.
parent = list(range(n))
rank = [0] * n
def find(i):
if parent[i] == i:
return i
parent[i] = find(parent[i])
return parent[i]
def union(a, b):
pa, pb = find(a), find(b)
if pa == pb:
return
if rank[pa] > rank[pb]:
parent[pb] = pa
elif rank[pa] < rank[pb]:
parent[pa] = pb
else:
parent[pa] = pb
rank[pb] += 1
Multisets
Counter
Counter
takes an iterator and returns the count of each element. It return a zero count for missing elements. Methods: items()
, most_common([n])
Maps
Defaultdict
defaultdict
takes a default_factory
function and returns a dictionary that returns the default value for missing keys. eg. defaultdict(list)
or defaultdict(lambda x: x)
.
Backtracking
Incrementally build candidates to the solution. Abandon a candidate ("backtrack") as soon as you determine that the candidate cannot lead to a valid solution.
Used for constraint satisfaction problems (CSP) with partial candidate solutions which can be checked relatively quickly.
def f():
output = [] # All possible solutions
partial = [] # Current partial solution
def backtrack():
nonlocal output, partial
# Base case
if is_solution(partial):
output.append(partial[:])
return
# Pruning
if invalid(partial):
return
# Try every candidate
for c in candidates:
partial.append(c) # Add to the partial
backtrack(i + 1)
partial.pop() # Remove from the partial
backtrack()
return output
Popular problems include Permutations and Combinations.
Dynamic programming
Might be right for problems that involve looking for the optimum value or the number of ways to do something. Most importantly, future decisions depend on earlier decisions.
Top-down memoization
memo = {}
def fib(n):
if n == 0 or i == 1:
return n
if n not in memo:
memo[n] = f(n - 1) + f(n - 2)
return memo[n]
Bottom-up dynamic programming
Start with base cases and iteratively build up to the solution in order.
def fib(n):
if n == 0:
return 0
prev, curr = 0, 1
for i in range(2, n):
prev, curr = curr, prev + curr
return curr
Bit math
Use bin
to convert an integer to its binary representation:
n = 97
b = bin(n) # '0b1100001'
b[2:]
Use int
to convert a binary string to an integer:
bin('1100001', 2) # 97
Bit shifting:
1 << 3 # 01000
To check if the n
th bit of x
is set with 0-based indexing:
x & (1 << n)