Hashmaps

Introduction

A hashmap is a data structure that stores key-value pairs, where each key is unique. It uses a hashing algorithm to map the keys to their corresponding values, which allows for efficient lookups, insertions, and deletions. Hashmaps are often used in programming languages as an efficient way to implement dictionaries, associative arrays, or maps.

In simple words, A hashmap is a way of storing data in a computer program, where each piece of data (a value) is associated with a unique identifier (a key). It allows for fast lookups, insertions, and deletions of data. It's similar to a dictionary in real life, where you can quickly look up a word and find its definition.

Internal Working of Hashmaps

The internal working of hashmaps involves several key components:

  1. Hash function: This is a function that takes a key as input and generates an index (also known as a hash code) that corresponds to a specific location in the array (also known as a bucket) where the value associated with that key is stored. The hash function should be designed to minimize collisions, where multiple keys are mapped to the same index.

  2. Array (or table): This is a data structure that stores the key-value pairs. The array has a fixed size, and each position corresponds to a specific index generated by the hash function.

  3. Collision resolution: When multiple keys are mapped to the same index, a collision occurs. Collision resolution is the process of determining how to handle this situation. Different types of hashmaps use other collision resolution techniques, such as open addressing, separate chaining, linear probing, Robin Hood hashing, and Cuckoo hashing.

  4. Load factor: The load factor is a measure of how full the hashmap is. It is calculated as the number of key-value pairs in the hashmap divided by the size of the array. A high load factor can lead to a large number of collisions, which can slow down the performance of the hashmap. To avoid this, the size of the array can be increased (rehashing) when the load factor exceeds a certain threshold.

  5. Rehashing: Rehashing is the process of creating a new hashmap with a larger array and reinserting all the key-value pairs from the original hashmap. Rehashing is done when the load factor exceeds a certain threshold, to maintain good performance.

Overall, Hashmaps use a combination of an array, a hash function, and collision resolution techniques to efficiently store and retrieve data, by mapping keys to their corresponding values, allowing for fast lookups and insertions.

Here is an example of how to implement a basic hashmap in C++ using open addressing and linear probing:

#include <iostream>
#include <vector>

const int SIZE = 10;

class HashMap {
 private:
  std::vector<int> arr;
  int size;

 public:
  HashMap() {
    size = 0;
    arr.resize(SIZE);
    for (int i = 0; i < SIZE; i++) {
      arr[i] = -1;
    }
  }

  int hash(int key) {
    return key % SIZE;
  }

  void insert(int key, int value) {
    int index = hash(key);
    if (arr[index] == -1) {
      arr[index] = value;
      size++;
    } else {
      int i = index;
      while (i < SIZE) {
        if (arr[i] == -1) {
          arr[i] = value;
          size++;
          break;
        }
        i++;
      }
    }
  }

  int search(int key) {
    int index = hash(key);
    if (arr[index] != -1) {
      return arr[index];
    } else {
      int i = index;
      while (i < SIZE) {
        if (arr[i] != -1) {
          return arr[i];
        }
        i++;
      }
    }
    return -1;
  }
};

int main() {
  HashMap hm;
  hm.insert(1, 10);
  hm.insert(2, 20);
  hm.insert(3, 30);

  std::cout << hm.search(1) << std::endl;
  std::cout << hm.search(2) << std::endl;
  std::cout << hm.search(3) << std::endl;
  return 0;
}

How to solve collision on hashmap?

Collision resolution in a hashmap is the process of determining how to handle the situation when multiple keys are mapped to the same index (or bucket) in the array by the hash function. There are several ways to resolve collisions, including:

  1. Open addressing: In this method, when a collision occurs, the algorithm looks for the next empty slot in the array to store the key-value pair. This can be done using linear probing, where the algorithm checks the next slot in the array until it finds an empty one; or using quadratic probing, where the algorithm checks the next slot using a quadratic function of the index.

  2. Separate chaining: In this method, when a collision occurs, the key-value pair is stored in a linked list at the index of the array where the collision occurred. This allows multiple key-value pairs to be stored at the same index and the linked list helps to traverse the values in O(n) time.

  3. Robin Hood Hashing: In this method when a collision occurs, it tries to find a slot that has a key that was hashed to a farther away slot. This way it tries to keep the keys that were hashed to farther away slots closer to their ideal position.

  4. Cuckoo Hashing: In this method, when a collision occurs, it replaces the key-value pair that's already in the table with the new one and re-inserts the replaced key-value pair using a different hash function.

Each of these collision resolution techniques has its advantages and disadvantages, and the choice of which one to use depends on the specific requirements of the application. Open addressing and separate chaining are the most commonly used techniques in hashmaps.

Application Of Hashmaps

Hashing can be implemented in a wide variety of applications, including:

  1. Data storage: Hashmaps are used to store key-value pairs in a way that allows for efficient lookups, insertions, and deletions. This is used in many databases, key-value stores, and other data storage systems.

  2. Caching: Hashmaps can be used to implement caching systems, where frequently accessed data is stored in memory for fast access.

  3. Searching: Hashmaps can be used to implement search algorithms for large data sets, such as in-memory search engines or spell checkers.

  4. Cryptography: Hash functions can be used to secure data by generating a unique, fixed-size value (hash) from an input of any size. This is used in digital signatures, password storage, and more.

  5. Networking: Hash functions can be used to distribute data across a network, such as in distributed hash tables (DHTs) used in peer-to-peer networks.

  6. Load balancing: Hash functions can be used to distribute incoming traffic evenly across multiple servers in a load-balancing setup.

  7. Data compression: Hash functions can be used to compress data by replacing repetitive patterns with unique, fixed-size values.

  8. Pattern matching: Hash functions can be used to quickly search for patterns in large data sets, such as in bioinformatics or network security.

These are just some examples of how hash functions and hashmaps can be used in various applications. The specific type of hashing algorithm used will depend on the requirements of the application, such as the need for collision resolution, the size of the data set, the desired speed and security level, and so on.