How HashMaps Actually Work

Explore the internal architecture of a HashMap, how it uses hash functions to map keys to array indices, handles collisions, and resizes to maintain efficient O

Ask your own question

In depth

A HashMap is a data structure that stores key-value pairs, allowing for very fast retrieval of values based on their associated keys. Unlike searching a list, which can be slow, a HashMap provides a near-instant way to access data.

How HashMaps Work

At its core, a HashMap uses an array, often called a "bucket array," to store data. When you want to insert a key-value pair, the HashMap doesn't just put it anywhere. Instead, it uses a special function called a hash function.

The Hash Function

The hash function takes your key (e.g., the string "cat") and converts it into a numerical hash code. This hash code is then used to calculate an index within the bucket array. Typically, this is done by taking the hash code and applying a modulo operation with the array's size (`index = hash_code % array_size`). This calculated index tells the HashMap exactly where in the array to store the value associated with that key.

Handling Collisions

It's possible for two different keys to produce the same hash code, or for their hash codes to map to the same array index after the modulo operation. This situation is called a collision. To handle collisions, HashMaps commonly use a technique called "chaining." When a collision occurs at a particular index, the new key-value pair is added to a linked list (or similar structure) at that array index. Each element in this linked list is often referred to as a "bucket."

function insert(key, value):
    idx = hash(key) % arraySize
    if bucket[idx] is empty:
        bucket[idx] = Node(key, value)
    else:
        // Collision: append to the linked list at this index
        append_to_list(bucket[idx], key, value)

function get(key):
    idx = hash(key) % arraySize
    if bucket[idx] is empty:
        return null // Key not found
    else:
        // Traverse the linked list to find the key
        for each node in bucket[idx]:
            if node.key == key:
                return node.value
        return null // Key not found in chain

Rehashing for Performance

While linked lists handle collisions, if too many keys map to the same index, these chains can become very long. A long chain means that retrieving a value might require traversing many elements, reducing the HashMap's performance from its typical O(1) average-case speed towards O(n) in the worst case. To prevent this, HashMaps monitor the "load factor" (the ratio of stored items to array size). When the load factor exceeds a certain threshold, the HashMap performs a rehashing operation. This involves creating a new, larger bucket array and then re-inserting all existing key-value pairs into the new array. Each key's hash is re-calculated, and a new index is determined based on the new array's size, distributing the elements more evenly and shortening the chains.

Key Takeaways

HashMaps use a hash function to map keys to specific array indices for direct access.
Collisions, where multiple keys map to the same index, are typically resolved using linked lists (chaining).
Rehashing, which involves resizing the internal array and re-calculating all indices, maintains efficient performance.
HashMaps offer average-case O(1) time complexity for insertions, deletions, and lookups.
Effective hash functions and proper resizing are crucial for optimal HashMap performance.

Got a different question? SeaThru generates a fresh video for any topic where systems talk or data structures move.

Ask your own question →