The Magic of Hash Tables, A Quick Deep Dive into O(1)

Jake Mellichamp · Published in Nerd For Tech · Nov 3, 2021 · 5 min read

Check out the video above if you would rather watch the explanation.

Ahh yes, the Hash Table, a Computer Science classic. Often our first thought as software developers when facing an unfamiliar problem is:

“hmmm, can I use a Hash Table to solve this problem?”

Which, I mean, is the correct thought! Hash Tables not only make our logic simple, they are also extremely quick, with constant-time access!

The Inspiration

Time and space complexity of a Hash Table

As I wrote the simple Map<String, Integer> my_map = new HashMap<String, Integer>(); I grew curious about how many lines of code were running under the hood to make this super convenient data structure work. Which… led me on a climactic journey into discovering the ins and outs of our friend, the Hash Table. Here is a digest of the findings!

What is a Hash Table?

Ok ok, I’ll keep this part short and sweet, I swear… I won’t doubt your intelligence. There are two parts:

  1. A Hash Table is simply a block of memory (an array) that contains ‘buckets’ which hold the data we want to map to. Phew, easy right? We call them ‘buckets’ instead of indexes because, in some implementations, each index holds a Linked List (for collisions); there is a small sketch of this layout after the illustration below.
  2. A Hash Table is also abstractly addressable, meaning that instead of using an outright offset like Data = (Memory Location of Array) + (Array Offset), we substitute the offset with a user-defined key and hash it into a unique* integer.
  • *In a perfect world the value is unique.
Illustration of a hash table https://en.wikipedia.org/wiki/Hash_table
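
To make that concrete, here is a minimal sketch of what the underlying storage might look like: a plain array whose slots are the buckets, each bucket being the head of a linked list of entries. The names (SimpleHashTable, Entry, buckets) are made up for illustration and are not any particular library’s internals.

// A minimal sketch of a hash table's storage, assuming separate chaining.
class SimpleHashTable<K, V> {

    // One entry in a bucket's linked list: a key, its value,
    // and a pointer to the next entry that landed in the same bucket.
    static class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;

        Entry(K key, V value, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // The "block of memory": a plain array whose indexes are the buckets.
    Entry<K, V>[] buckets;

    @SuppressWarnings("unchecked")
    SimpleHashTable(int capacity) {
        buckets = (Entry<K, V>[]) new Entry[capacity];
    }
}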

Hashing Algorithm

Hashing with the function ‘f’

So you might say to yourself: okay, a hashing algorithm, easy! The end goal is to take the supplied key and convert it to an Integer. In our example:

Map<String, Integer> my_map = new HashMap<String, Integer>();

We want to convert our key of type String into an Integer so that we can use it as a substitute for our Array_Offset value. That process is conducted by a hashing algorithm, which will yield a large number back, so we use modulo arithmetic with the current size of the array to prevent an ‘Array out of bounds’ error.
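
As a rough sketch (here using Java’s built-in hashCode() as the hashing step, which is just one possible choice), the index computation boils down to something like this:

// Hash the key to some integer, then use modulo arithmetic so the
// result always fits inside the array's bounds.
static int indexFor(String key, int arrayLength) {
    int hash = key.hashCode();               // hashing algorithm yields some integer
    return (hash & 0x7fffffff) % arrayLength; // clear the sign bit, keep it in [0, arrayLength)
}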

Closer Look at the Hashing Algorithm

Neat right?! Okay, but now you might ask yourself:

“A Hash Table’s key can be any object, not just a string! How do we convert any Object into an Integer Array Offset?”

The beauty of a Hash Table is that it can work with any object. How? Well, the one universal trait amongst all objects is that they are ultimately made of bits! So our hash function uses bit-wise operations to scramble the object’s bits before converting them to an Integer.
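
As a toy illustration of “scrambling the bits” (not any real library’s hash function; the shift amounts and multiplier are made up), the idea can be as simple as a few shifts, XORs, and a multiply:

// A toy bit-mixing step: take the raw 32 bits of an object's representation
// and scramble them so every input bit influences the output.
static int mixBits(int rawBits) {
    int h = rawBits;
    h ^= (h >>> 16);   // fold the high bits into the low bits
    h *= 0x7feb352d;   // multiply by an odd constant to spread the bits (illustrative constant)
    h ^= (h >>> 15);   // fold again
    return h;
}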

There is no single ‘standard hashing function’ used across all OOP languages, but GCC’s C++ implementation of unordered_map, for example, uses an algorithm based on ‘MurmurHash’. Below is a simple example of how a 32-bit object can be hashed into a 32-bit Integer.

An example of hashing logic on a 32-bit Object

In the above example:

  1. We take a 32-bit Object and split it into 4 groups of 8-bit blocks.
  2. With each individual group, we perform an AND instruction with a random static variable, Var1.
  3. Rotate the bits of the result, e.g. with a ROL (rotate left) instruction.

An Example of an ROL instruction

  4. Repeat steps 2 and 3 with a different variable, Var2.
  5. Combine each individual 8-bit block back into a 32-bit Object.
  6. Read the result as an Integer.
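
Here is a hedged sketch of those six steps in Java. The constants VAR1 and VAR2 and the rotation amounts are placeholders chosen for illustration; a real function such as MurmurHash uses carefully chosen multipliers and rotations instead.

// Illustrative only: split a 32-bit value into four 8-bit blocks, AND and
// rotate each block in two rounds, then reassemble the blocks into one integer.
static int toyHash32(int objectBits) {
    final int VAR1 = 0xA5;   // first "random" 8-bit constant (made up)
    final int VAR2 = 0x3C;   // second "random" 8-bit constant (made up)
    int result = 0;

    for (int block = 0; block < 4; block++) {
        // Step 1: pull out one 8-bit group.
        int bits = (objectBits >>> (block * 8)) & 0xFF;

        // Steps 2-3: AND with VAR1, then rotate within the 8-bit block.
        bits = bits & VAR1;
        bits = ((bits << 3) | (bits >>> 5)) & 0xFF;   // ROL by 3 inside 8 bits

        // Step 4: repeat with a different constant and rotation.
        bits = bits & VAR2;
        bits = ((bits << 5) | (bits >>> 3)) & 0xFF;   // ROL by 5 inside 8 bits

        // Step 5: put the block back in its original position.
        result |= bits << (block * 8);
    }

    // Step 6: read the combined blocks as one integer.
    return result;
}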

Load Factor and Inserting Elements

After hashing the key into an indexable integer, we are ready to insert a value at that memory location (the corresponding bucket).

Remember, the hash table itself is a block of memory, so after a certain number of inserts, the table will have to recreate itself. This process is relatively expensive, as the data structure has to allocate a new block of memory, rehash the keys, and copy the values into the new buckets.

What determines when a hash table is full? A ‘load factor’ threshold.

Load Factor Equation

Load factor is defined as the number of entries in the table (n) divided by the size of the hash table (k): load factor = n / k.
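
For example (numbers chosen purely for illustration), a table with k = 8 buckets that currently holds n = 6 entries has a load factor of 6 / 8 = 0.75.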

There’s a check that happens before the insertion of a new element into a hash table: if the load factor would be exceeded by the next insert, the Hash Table allocates a new block of memory, rehashes the keys, and copies all the data into the new table.

We want to have some wiggle room in the hash table to avoid collisions; this usually means keeping the array size greater than the number of entries at all times.
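
Putting the pieces together, here is a minimal sketch of how an insert might check the load factor and trigger a resize. It assumes a 0.75 threshold, String keys and values, and separate chaining via java.util.LinkedList; the class and field names are made up, and it skips details like replacing an existing key.

import java.util.LinkedList;

// A hedged sketch of insertion with a load-factor check and resize.
class GrowableTable {
    static final double LOAD_FACTOR_THRESHOLD = 0.75;

    LinkedList<String[]>[] buckets;   // each bucket holds {key, value} pairs
    int entries = 0;

    @SuppressWarnings("unchecked")
    GrowableTable() {
        buckets = new LinkedList[8];
    }

    void put(String key, String value) {
        // Check the load factor *before* inserting; grow if this insert would exceed it.
        if ((entries + 1.0) / buckets.length > LOAD_FACTOR_THRESHOLD) {
            resize();
        }
        int index = (key.hashCode() & 0x7fffffff) % buckets.length;
        if (buckets[index] == null) {
            buckets[index] = new LinkedList<>();
        }
        buckets[index].add(new String[] { key, value });
        entries++;
    }

    // The expensive step: allocate a bigger array, rehash every key,
    // and copy the entries into their new buckets.
    @SuppressWarnings("unchecked")
    void resize() {
        LinkedList<String[]>[] old = buckets;
        buckets = new LinkedList[old.length * 2];
        for (LinkedList<String[]> bucket : old) {
            if (bucket == null) continue;
            for (String[] pair : bucket) {
                int index = (pair[0].hashCode() & 0x7fffffff) % buckets.length;
                if (buckets[index] == null) {
                    buckets[index] = new LinkedList<>();
                }
                buckets[index].add(pair);
            }
        }
    }
}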

Conclusion

Tada! Magic right?

Overall, the purpose of this article was to explore how the constant-time insertion and search of a standard Hash Table is achieved. There are plenty of questions left unanswered, such as “how do we handle collisions in a hash table optimally?” Haha… I’ll let Stack Overflow answer some of those for you. I hope, however, that you can now confidently see how constant time complexity is achieved with hash tables. Below are five key takeaways:

Key Takeaways

  • Hashing functions use bit-wise operations on the key value.
  • The C++ standard library hashing function (in GCC’s implementation) is based on “MurmurHash”.
  • The hashing algorithms used for hash tables are not cryptographic algorithms; they prioritize speed rather than security.
  • A hash table has to rehash its values once the load factor threshold is reached.
  • The standard load factor threshold is 3/4, so at least a quarter of a hash table’s buckets sit empty, which means using roughly a third more memory than packing the same elements densely into an array.
