sveska

Grokking DSA Part 1 Core Patterns Explained

Here are fundamental data structure and algorithm patterns, explained in simple terms for practical application in software design and problem-solving.

Sliding Window

What it is: The Sliding Window pattern is used for problems that require finding a contiguous sub-array or sub-string in a larger collection that meets a specific condition. Think of it as a “window” of a certain size that slides over your data.

How it works: Instead of re-evaluating every possible sub-array from scratch (which is inefficient), you maintain a “window” of elements. As you slide the window across the data, you add a new element at one end and remove an element from the other. This way, you only need to update your calculations based on the single element that enters and the single element that leaves the window, making it much faster.

When to use it: Use this pattern when you’re asked to find the longest, shortest, or a specific sub-array or sub-string that satisfies a certain property (e.g., “find the longest sub-string with no more than K distinct characters”). It’s a go-to for optimizing problems that would otherwise be brute-forced with nested loops, typically reducing time complexity from O(n^2) to O(n).

Islands (Matrix Traversal)

What it is: This pattern is for problems involving a 2D grid or matrix, where you need to find and analyze groups of connected cells. These groups are often called “islands” (e.g., cells with value ‘1’) surrounded by “water” (cells with value ‘0’).

How it works: You traverse the matrix cell by cell. When you encounter a cell that is part of an island (and hasn’t been visited yet), you start a graph traversal algorithm like Breadth-First Search (BFS) or Depth-First Search (DFS) from that cell. This traversal finds all connected cells of the same island. You mark each visited cell to ensure you don’t count the same island multiple times.

When to use it: Apply this pattern for problems like counting the number of islands, finding the largest island, or determining the perimeter of an island. It’s essentially about applying graph traversal techniques to a grid.

Two Pointers

What it is: The Two Pointers pattern involves using two separate pointers to iterate through a data structure, usually a sorted array. The pointers can move towards each other, away from each other, or in the same direction at different speeds.

How it works: A common use case is to have one pointer at the beginning and one at the end of a sorted array. You then move them inwards based on some condition. For example, if you’re looking for a pair of numbers that sum to a target, you can adjust the pointers based on whether the current sum is too high or too low.

When to use it: This pattern is ideal for problems where you need to find a pair of elements that meet a certain criteria in a sorted array. It’s also useful for tasks like reversing an array or string in-place. It provides a very efficient way to solve these problems in O(n) time.

Fast & Slow Pointers

What it is: This is a specific variation of the Two Pointers pattern, most famously used for problems involving cycles in data structures like linked lists.

How it works: You have a “slow” pointer that moves one step at a time and a “fast” pointer that moves two steps at a time. If there is a cycle, the fast pointer will eventually lap the slow pointer and they will meet. If there’s no cycle, the fast pointer will reach the end of the list.

When to use it: This is the classic solution for detecting a cycle in a linked list. It can also be adapted to find the middle of a linked list, the start of a cycle, or to determine if a linked list is a palindrome.

Merge Intervals

What it is: This pattern deals with problems involving a collection of intervals (e.g., [startTime, endTime]). The goal is usually to consolidate overlapping intervals into a smaller set of merged intervals.

How it works: First, you sort the intervals based on their start times. Then, you iterate through the sorted intervals, comparing the current interval with the previous one. If they overlap (i.e., the current interval’s start is before the previous interval’s end), you merge them by updating the end time of the previous interval to be the maximum of the two end times.

When to use it: Use this for any problem where you need to manage and consolidate overlapping ranges. Common examples include scheduling problems (e.g., “find the minimum number of conference rooms needed for a set of meetings”) or calendar management.

Cyclic Sort

What it is: The Cyclic Sort pattern is a clever technique for problems involving an array containing numbers in a specific, continuous range (e.g., all numbers from 1 to n).

How it works: The core idea is to place each number at its “correct” index. For an array of numbers from 1 to n, the number x should ideally be at index x-1. You iterate through the array, and if the number at the current index is not in its correct place, you swap it with the number at its correct index. You repeat this until all numbers are sorted.

When to use it: This is extremely useful for finding missing numbers, duplicate numbers, or the smallest missing positive number in an unsorted array, all in O(n) time and with O(1) space.

In-place Reversal of a LinkedList

What it is: This pattern involves reversing the direction of a linked list without creating a new list or using extra memory.

How it works: You iterate through the list one node at a time, reversing the next pointer of each node to point to the previous node. You’ll need three pointers to keep track of your state: previous, current, and next. At each step, you save the next node, point the current node’s next to previous, and then advance previous and current down the list.

When to use it: This is a fundamental linked list operation and a building block for more complex problems, such as reversing a sub-list or checking for palindromic linked lists.

Tree Breadth-First Search (BFS)

What it is: Tree BFS is a traversal algorithm that explores a tree level by level, from top to bottom.

How it works: It uses a queue data structure. You start by adding the root node to the queue. Then, you enter a loop that continues as long as the queue is not empty. In each iteration, you dequeue a node, process it, and then enqueue all of its children. This ensures you visit all nodes at the current level before moving to the next.

When to use it: BFS is perfect for finding the shortest path between two nodes in a tree or graph. It’s also the right choice for any problem that requires level-order traversal, such as finding the minimum depth of a tree, or connecting all nodes at the same level.

Tree Depth First Search (DFS)

What it is: Tree DFS is a traversal algorithm that explores a tree by going as deep as possible down one branch before backtracking.

How it works: It can be implemented elegantly using recursion (which uses the call stack) or an explicit stack. You start at the root, explore one of its subtrees completely, and then move to the next subtree. The three main variations (pre-order, in-order, post-order) depend on when you process the node relative to its children.

When to use it: DFS is well-suited for problems involving pathfinding, such as checking if a path exists between two nodes or finding a path that sums to a specific value. It’s also used for checking tree properties, like whether a tree is a valid Binary Search Tree.

Two Heaps

What it is: This pattern uses two heaps to partition a set of numbers into two halves: a smaller half and a larger half.

How it in works: You use a Max-Heap to store the smaller half of the numbers and a Min-Heap to store the larger half. The heaps are kept balanced in size. This setup allows you to efficiently access the largest number in the small half (top of the Max-Heap) and the smallest number in the large half (top of the Min-Heap).

When to use it: This is the classic solution for finding the median of a stream of numbers in real-time. By keeping the heaps balanced, the median can always be calculated in O(1) time (it’s either the top of one of the heaps or the average of the tops of both). It’s also useful for problems that require maintaining a “median-like” state, such as in scheduling or resource allocation.