A hash table in C/C++ is a data structure that maps keys to values. A hash table uses a hash function to compute indexes for a key. You can store the value at the appropriate location based on the hash table index.
The benefit of using a hash table is its very fast access time. Typically, the time complexity (amortized time complexity) is a constant O(1)
access time.
If two different keys get the same index, you will need to use other data structures (buckets) to account for these collisions. If you choose a very good hash function, the likelihood of a collision can be negligible.
The C++ STL (Standard Template Library) has the std::unordered_map()
data structure.
In this article, you will construct a hash table from scratch comprised of:
- A hash function to map keys to values.
- A hash table data structure that supports
insert
, search
, and delete
operations.
- A data structure to account for a collision of keys.
The first step is to choose a reasonably good hash function that has a low chance of collision. However, for the purposes of this tutorial, a poor hash function will be applied to better illustrate hash collisions. This limited example will also only utilize strings (or character arrays in C).
HashTable.cpp
Run this code and test different strings for potential collisions. For example, the strings Hel
and Cau
will collide since they have the same ASCII value.
This code must return a number within the bounds of the capacity of the hash table. Otherwise, it may access an unbound memory location, leading to an error.
A hash table is an array of items, which are { key: value }
pairs.
First, define the item structure:
HashTable.cpp
Now, the hash table has an array of pointers that point to Ht_item
, so it is a double-pointer.
HashTable.cpp
Your hash table will need to return the number of elements in the hash table using count
and size of the hash table using size
.
Next, create functions for allocating memory and creating items.
Create items by allocating memory for a key and value, and return a pointer to the item:
HashTable.cpp
Create the table by allocating memory and setting size
, count
, and items
:
HashTable.cpp
The preceding example allocates memory for the wrapper structure HashTable
and sets all the items to NULL
.
Freeing up memory is a C/C++ best practice. Free up memory that you’ve allocated on the heap with malloc()
and calloc()
.
Write functions that free up a table item and the whole table.
HashTable.cpp
Add a print_table()
to display the index
, key
, and value
for each item:
HashTable.cpp
This concludes the basic functionality of your custom hash table. You will now write insert, search, and delete functions.
Create a function, ht_insert()
, that performs insertions.
The function takes a HashTable
pointer, a key
, and a value
as parameters:
Now, there are certain steps involved in the ht_insert()
function.
- Create the item based on the
{ key: value }
pair.
- Compute the index based on the hash function.
- Check if the index is already occupied or not, by comparing the
key
.
- If it is not occupied, you can directly insert it into
index
.
- Otherwise, it is a collision, and you will need to handle it.
This tutorial will address handling collisions after the initial model has been created.
First, create the item:
Then, compute the index:
When inserting the key for the first time, the item must be a NULL
:
HashTable.cpp
Consider the scenario where the { key: value }
pair already exists because the same item has been inserted into the hash table. To address this, the code must update the item value to the new one:
HashTable.cpp
Consider the scenario where a collision has to be handled. To address this, a placeholder has been added:
HashTable.cpp
Now, your ht_insert()
function is complete.
Create a function, ht_search()
, that checks if the key exists, and returns the corresponding value if it does.
The function takes a HashTable
pointer and a key
as parameters:
Search for an item with the key
in the HashTable
. If the item cannot be found in the HashTable
, NULL
is returned.
HashTable.cpp
Add a print_search()
to display the item that matches the key
:
HashTable.cpp
Now, your ht_search()
function is complete.
There are different ways to resolve a collision. This tutorial will rely upon a method called Separate Chaining, which aims to create independent chains for all items that have the same hash index. The implementation in this tutorial will create these chains using linked lists.
Whenever there is a collision, additional items that collide on the same index are added to an overflow bucket list. Thus, you will not have to delete any existing records on the hash table.
Due to linked lists having O(n)
time complexity for insertion, searching, and deletion, in case of a collision, you will have a worst-case access time of O(n)
as well. The advantage of this method is that it is a good choice if your hash table has a low capacity.
Implement the overflow bucket list:
HashTable.cpp
Now, add these overflow bucket lists to your HashTable
. Every item will have a chain, so the whole table is an array of LinkedList
pointers.
HashTable.cpp
Now that overflow_buckets
have been defined, add functions to create and delete them. You will also need to account for them in the create_table()
and free_table()
functions.
HashTable.cpp
If the overflow bucket list for the item does not exist, create a list and add the item to it.
Update handle_collision()
for insertions:
HashTable.cpp
And the call:
HashTable.cpp
Now, update the search method to use overflow buckets:
HashTable.cpp
Finally, collisions are now handled in insert()
and search()
!
Let’s now finally look at the delete function:
Again, the method is similar to insertion.
- Compute the hash index and get the item.
- If it is
NULL
, don’t do anything.
- Otherwise, after comparing keys, if there is no collision chain for that index, remove the item from the table.
- If a collision chain exists, remove that element and shift the links accordingly.
HashTable.cpp
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define CAPACITY 50000
unsigned long hash_function(char *str)
{
unsigned long i = 0;
for (int j = 0; str[j]; j++)
i += str[j];
return i % CAPACITY;
}
typedef struct Ht_item
{
char *key;
char *value;
} Ht_item;
typedef struct LinkedList
{
Ht_item *item;
LinkedList *next;
} LinkedList;
typedef struct HashTable
{
Ht_item **items;
LinkedList **overflow_buckets;
int size;
int count;
} HashTable;
LinkedList *allocate_list()
{
LinkedList *list = (LinkedList *)malloc(sizeof(LinkedList));
return list;
}
LinkedList *linkedlist_insert(LinkedList *list, Ht_item *item)
{
if (!list)
{
LinkedList *head = allocate_list();
head->item = item;
head->next = NULL;
list = head;
return list;
}
else if (list->next == NULL)
{
LinkedList *node = allocate_list();
node->item = item;
node->next = NULL;
list->next = node;
return list;
}
LinkedList *temp = list;
while (temp->next->next)
{
temp = temp->next;
}
LinkedList *node = allocate_list();
node->item = item;
node->next = NULL;
temp->next = node;
return list;
}
Ht_item *linkedlist_remove(LinkedList *list)
{
if (!list)
return NULL;
if (!list->next)
return NULL;
LinkedList *node = list->next;
LinkedList *temp = list;
temp->next = NULL;
list = node;
Ht_item *it = NULL;
memcpy(temp->item, it, sizeof(Ht_item));
free(temp->item->key);
free(temp->item->value);
free(temp->item);
free(temp);
return it;
}
void free_linkedlist(LinkedList *list)
{
LinkedList *temp = list;
while (list)
{
temp = list;
list = list->next;
free(temp->item->key);
free(temp->item->value);
free(temp->item);
free(temp);
}
}
LinkedList **create_overflow_buckets(HashTable *table)
{
LinkedList **buckets = (LinkedList **)calloc(table->size, sizeof(LinkedList *));
for (int i = 0; i < table->size; i++)
buckets[i] = NULL;
return buckets;
}
void free_overflow_buckets(HashTable *table)
{
LinkedList **buckets = table->overflow_buckets;
for (int i = 0; i < table->size; i++)
free_linkedlist(buckets[i]);
free(buckets);
}
Ht_item *create_item(char *key, char *value)
{
Ht_item *item = (Ht_item *)malloc(sizeof(Ht_item));
item->key = (char *)malloc(strlen(key) + 1);
item->value = (char *)malloc(strlen(value) + 1);
strcpy(item->key, key);
strcpy(item->value, value);
return item;
}
HashTable *create_table(int size)
{
HashTable *table = (HashTable *)malloc(sizeof(HashTable));
table->size = size;
table->count = 0;
table->items = (Ht_item **)calloc(table->size, sizeof(Ht_item *));
for (int i = 0; i < table->size; i++)
table->items[i] = NULL;
table->overflow_buckets = create_overflow_buckets(table);
return table;
}
void free_item(Ht_item *item)
{
free(item->key);
free(item->value);
free(item);
}
void free_table(HashTable *table)
{
for (int i = 0; i < table->size; i++)
{
Ht_item *item = table->items[i];
if (item != NULL)
free_item(item);
}
free_overflow_buckets(table);
free(table->items);
free(table);
}
void handle_collision(HashTable *table, unsigned long index, Ht_item *item)
{
LinkedList *head = table->overflow_buckets[index];
if (head == NULL)
{
head = allocate_list();
head->item = item;
table->overflow_buckets[index] = head;
return;
}
else
{
table->overflow_buckets[index] = linkedlist_insert(head, item);
return;
}
}
void ht_insert(HashTable *table, char *key, char *value)
{
Ht_item *item = create_item(key, value);
int index = hash_function(key);
Ht_item *current_item = table->items[index];
if (current_item == NULL)
{
if (table->count == table->size)
{
printf("Insert Error: Hash Table is full\n");
free_item(item);
return;
}
table->items[index] = item;
table->count++;
}
else
{
if (strcmp(current_item->key, key) == 0)
{
strcpy(table->items[index]->value, value);
return;
}
else
{
handle_collision(table, index, item);
return;
}
}
}
char *ht_search(HashTable *table, char *key)
{
int index = hash_function(key);
Ht_item *item = table->items[index];
LinkedList *head = table->overflow_buckets[index];
if (item != NULL)
{
if (strcmp(item->key, key) == 0)
return item->value;
if (head == NULL)
return NULL;
item = head->item;
head = head->next;
}
return NULL;
}
void ht_delete(HashTable *table, char *key)
{
int index = hash_function(key);
Ht_item *item = table->items[index];
LinkedList *head = table->overflow_buckets[index];
if (item == NULL)
{
return;
}
else
{
if (head == NULL && strcmp(item->key, key) == 0)
{
table->items[index] = NULL;
free_item(item);
table->count--;
return;
}
else if (head != NULL)
{
if (strcmp(item->key, key) == 0)
{
free_item(item);
LinkedList *node = head;
head = head->next;
node->next = NULL;
table->items[index] = create_item(node->item->key, node->item->value);
free_linkedlist(node);
table->overflow_buckets[index] = head;
return;
}
LinkedList *curr = head;
LinkedList *prev = NULL;
while (curr)
{
if (strcmp(curr->item->key, key) == 0)
{
if (prev == NULL)
{
free_linkedlist(head);
table->overflow_buckets[index] = NULL;
return;
}
else
{
prev->next = curr->next;
curr->next = NULL;
free_linkedlist(curr);
table->overflow_buckets[index] = head;
return;
}
}
curr = curr->next;
prev = curr;
}
}
}
}
void print_search(HashTable *table, char *key)
{
char *val;
if ((val = ht_search(table, key)) == NULL)
{
printf("Key:%s does not exist\n", key);
return;
}
else
{
printf("Key:%s, Value:%s\n", key, val);
}
}
void print_table(HashTable *table)
{
printf("\nHash Table\n-------------------\n");
for (int i = 0; i < table -> size; i++)
{
if (table -> items[i])
{
printf("Index:%d, Key:%s, Value:%s\n", i, table -> items[i] -> key, table -> items[i] -> value);
}
}
printf("-------------------\n\n");
}
int main()
{
HashTable *ht = create_table(CAPACITY);
ht_insert(ht, (char *)"1", (char *)"First address");
ht_insert(ht, (char *)"2", (char *)"Second address");
ht_insert(ht, (char *)"Hel", (char *)"Third address");
ht_insert(ht, (char *)"Cau", (char *)"Fourth address");
print_search(ht, (char *)"1");
print_search(ht, (char *)"2");
print_search(ht, (char *)"3");
print_search(ht, (char *)"Hel");
print_search(ht, (char *)"Cau");
print_table(ht);
ht_delete(ht, (char *)"1");
ht_delete(ht, (char *)"Cau");
print_table(ht);
free_table(ht);
return 0;
}
This code will produce the following output:
Output
Key:1, Value:First address
Key:2, Value:Second address
Key:3 does not exist
Key:Hel, Value:Third address
Key:Cau does not exist
Hash Table
-------------------
Index:49, Key:1, Value:First address
Index:50, Key:2, Value:Second address
Index:281, Key:Hel, Value:Third address
-------------------
Hash Table
-------------------
Index:50, Key:2, Value:Second address
Index:281, Key:Hel, Value:Third address
-------------------
ht_insert()
, ht_search()
, and ht_delete
behave as expected.
In this article, you implemented a hash table from scratch in C/C++.
Experiment with other collision handling algorithms and different hash functions. Continue your learning with more C++ tutorials
if i want the user to input the key and value, how to do it? can you please help me?
- irene
else if (head != NULL) { // Collision Chain exists if (strcmp(item->key, key) == 0) { // Remove this item and set the head of the list // as the new item free_item(item); LinkedList* node = head; head = head->next; node->next = NULL; table->items[index] = create_item(node->item->key, node->item->value); free_linkedlist(node); table->overflow_buckets[index] = head; return; } LinkedList* curr = head; LinkedList* prev = NULL; while (curr) { if (strcmp(curr->item->key, key) == 0) { if (prev == NULL) { // First element of the chain. Remove the chain free_linkedlist(head); table->overflow_buckets[index] = NULL; return; } else { // This is somewhere in the chain prev->next = curr->next; curr->next = NULL; free_linkedlist(curr); table->overflow_buckets[index] = head; return; } } curr = curr->next; prev = curr; } } } } HI, I am quite not sure I understand the logic part of deleting element from table here. If I understand correctly, there are 2 possibilities if the collision chain exist. The element is on the table index, or it is on the chain, if it is on the table, we will delete it on the table, then move the first node of chain to the table, and the rest of the original chain will become the new chain attached to it. what confuses me is the second case; if it is on the chain, I saw your code check if it is on the head node or somewhere else in the chain. Why you remove the whole chain if there is one element matched on the chain. Aren’t we supposed to just detele the matched node and put the rest of chain back to it ? We are deleting the matched string, not the matched hashed function index. Thank you!
- unknown
This is the best tutorial for implementing hash table! Thank you so much ! Have a littble doubl in the delete table function. if (strcmp(curr->item->key, key) == 0) { if (prev == NULL) { // First element of the chain. Remove the chain free_linkedlist(head); table->overflow_buckets[index] = NULL; return; } Maybe if the matched element found on the first element of the chain, table->overflow_buckets[index] = NULL; We shouldn’t set it to null, since the chain has another nodes as well. we just need to take the heed node off the chain, then set the chain back to table ? thank you
- unknow
HI, i would like to add another thing I found. In the insert element into table function, I don’t think we need to check if the table is full, since a table using separate chaining can never be full. Once there is a collision, we add it to the chain, as long as we have enough memory. I feel like we need to check the memory allocation in this instead of the table index. thx!
- another
In the function ht_insert(), we should not call strcpy directly for the value, first we need to free the existing value and allocate new memory for the new value or we must call realloc. so below code will lead overflow. // Scenario 1: We only need to update value if (strcmp(current_item->key, key) == 0) { strcpy(table->items[index]->value, value); return; }
- Dileep
Since the linked list is not ordered, you can simplify the linkedlist_insert function by inserting at the head instead of the tail. This simplifies the function and makes insertion time O(1).
- Justin Bode
Could you post how to alter the same code for phone book application that compiles in dev c++
- Anon
Ht_item* it = NULL; memcpy(temp->item, it, sizeof(Ht_item)); Doesn’t your compiler warn about this? you are copying NULL to temp->item. shouldn’t this be: memcpy( it, temp->item, sizeof(Ht_item)); Because in the current implementation you are freeing NULL this should throw an error and can lead to an memory leak.
- Rick Nijhuis
Thank you for putting this article together. Conceptually I can follow and appreciate your work. I think their is some issue with the ht_insert and linkedlist_insert functions. At the time of writing I am unable to pin point the issue. Maybe you fixed it. If so I suggest revising the article. Regardless thank you for sharing. Regards Mahendra Gunawardena
- Mahendra Gunawardena
Create table as a redundant loop. You allocated memory with malloc which makes all inner cells NULL.
- Moshe