【数据结构】哈希表的原理及其实现

文章目录

哈希表的概念
哈希函数的设计
- 常见的哈希函数
哈希冲突
- 1. 闭散列
- - 代码实现
- 2. 开散列
- 拉链法的优点
针对开散列哈希的扩展
基于开散列拉链法封装哈希表
- MyHash.h
基于哈希表实现unordered_map类
- Myunordered_map.h
基于哈希表实现unordered_set类
- Myunordered_map.h

哈希表的概念

哈希表，也被称为散列表。是一种通过键可以快速找到对应值的一种数据结构。像我们常使用的unordered系列的容器，其本质就是一个哈希表。哈希表最显著的特点就是查找数据的效率非常的高，平均查找的时间复杂度为O(1)。通过哈希函数将键值映射到数组的具体位置，再对其进行操作。

哈希表查找的过程为：通过键来找到数组中的一个位置，通过这个位置我们能找到该键唯一映射的值。
哈希表的y重要设计模块就是哈希函数以及解决哈希冲突的方法。下面将对此做出解释：

哈希函数的设计

哈希表的性能的一个关键因素就是哈希函数的设计。具体的来说，哈希函数指的是通过键值找到映射到数组具体位置的方法。一个好的哈希函数能将键均匀的分布到哈希表的各个位置。

常见的哈希函数

直接定值法
取关键字的某个线性函数为散列地址，该线性函数的返回值应该是一个整数。比如Hash(key)=A*Key+B
除留余数法
设散列表中允许的地址数为m，取一个不大于m，但最接近或者等于m的质数p作为除数，按照哈希函数：Hash(key) = key% p(p<=m),将关键码转换成哈希地址

哈希冲突

什么叫哈希冲突？哈希冲突又叫做哈希碰撞。具体是指，多个键通过一个哈希函数得到的位置相同。造成哈希冲突的主要原因是，哈希表的大小是有限的，而输入的数据可能会非常多，因此不可能避免出现哈希冲突。

那出现哈希冲突之后该如何解决呢？下面给出几种常见的解决哈希冲突的方法：

1. 闭散列

闭散列又叫开放定址法。当发生哈希冲突时，如果该哈希表还有空位，那么就从冲突位置开始往后找，直到找到空位。该如何往后找空位呢？一个一个找嘛？

线性探测：线性探测，从发生冲突的位置开始，一个一个往后面找，直到找到空位。
二次探测：从发生冲突的位置开始，使用一个二次探测公式来寻找下一个探测的位置，探测公式为：
H(k)=(Hash(k)+C1*i+C2*(i^2))%p
- 其中H(k)表示的是第i次探测的位置。
- Hash(k)表示通过哈希函数得到的初始地址
- C1和C2是一个常数，C1通常为0，C2通常为1，即H(k)=(Hash(k)+(i^2))%p
- i是探测的次数，从0开始
- p是哈希表的大小

二次探测的优点是能有效减少一次探测中数据过于“集中”的问题，即连续的冲突位置，会形成长链。而二次探测会使得探测的位置更加分散。

研究表明：当表的长度为质数且表装载因子a不超过0.5时，新的表项一定能够插入，而且任何一个位置都不会被探查两次。因此只要表中有一半的空位置，就不会存在表满的问题。在搜索时可以不考虑表装满的情况，但在插入时必须确保表的装载因子a不超过0.5，如果超出必须考虑增容。

装载因子又叫做负载因子，是哈希表中元素个数与哈希表大小的比值。

代码实现

哈希表类主要有两个模块需要实现，一个是节点类，还有一个是存储节点的容器。容器我们选择使用STL中的vector。对于节点类，我们希望至少能实现以下功能：

存储键值对
存储状态

值得注意的是，由于线性探测法的特性，删除一个元素之后可能会影响后续查找元素。我们需要给每个节点赋予三种状态表示：空节点、存在值得节点、被删除得节点。有了这三种种状态，我们在查找元素时，就不会因为中间曾经被删除的节点而影响。

#define _CRT_SECURE_NO_WARNINGS 1

#include<iostream>
#include<vector>
using namespace std;
namespace Close_Hash{
	enum Statu{
		EMPTY,
		EXIST,
		DEL
	};


	template<class K,class V>
	struct Node {
		Node():_statu(EMPTY){}
		pair<K, V> _kv;
		Statu _statu;
	};

	template<class K>
	struct HashFunc{
		size_t operator()(const K& key) {
			return (size_t)key;
		}
	};

	template<>
	struct HashFunc<string> {
		size_t operator()(const string& key) {
			size_t res = 0;
			for (auto it : key) {
				int u = it - '0';
				res = res * 131 + u;
			}
			return res;
		}
	};


	template<class K,class V,class Hash=HashFunc<K>>
	class HashTable {
	public:
		HashTable(size_t capacity = 10)
			: _totalSize(0)
			, _ht(capacity)

		{

		}
		V& operator[](const K& key) {

			Node<K, V>* node = Find(key);
			if (node) {
				return (node->_kv).second;
			}
			else {
				this->Insert(make_pair(key,0));

				return Find(key)->_kv.second;
			}
		}
		// 插入
		bool Insert(const pair<K, V>& val) {
			//负载超标
			if (_totalSize * 10 >= 7 * _ht.size()) {
				HashTable<K, V> newtable;
				
				newtable._ht.resize(_ht.size() * 2);
				for (int i = 0; i < _ht.size(); i++) {
					if (_ht[i]._statu == EXIST) {
						newtable.Insert(_ht[i]._kv);
					}
				}
				_ht.swap(newtable._ht);
			 }
			Hash ha;
			size_t pos = ha(val.first) % _ht.size();
			//线性探测

			while (_ht[pos]._statu == EXIST) {
				pos++;
				pos = pos % _ht.size();
			}
			_ht[pos]._kv = val;
			_ht[pos]._statu = EXIST;
			_totalSize++;
			return true;

		}

		// 查找
		Node<K,V>* Find(const K& key) {
			Hash ha;
			size_t pos = ha(key) % _ht.size();
			while (_ht[pos]._statu != EMPTY) {
				if (_ht[pos]._statu == EXIST && _ht[pos]._kv.first == key) {
					return &_ht[pos];
				}
				pos++;
				pos = pos % _ht.size();
			}
			return nullptr;
		}

		// 删除
		bool Erase(const K& key) {
			Node<K, V>* node = Find(key);
			if (!node)return false;
			node->_statu = DEL;
			return true;

		}

		size_t Size() {
			return _ht.size();
		}
		bool Empty() const
		{
			return _ht.empty();
		}


	private:
		vector<Node<K,V>> _ht;
		size_t _totalSize;
	};

}

2. 开散列

开散列法又被叫做链地址法或者拉链法。和闭散列不同的是，具有相同地址的关键码属于同一集合，并用一个单链表维护这个集合。我们将这个集合称为桶。开散列得到哈希表的每一个元素实际上是一个名为桶的单链表。于是，当我们发生哈希冲突时，不用去哈希表中找空位了，因为当前桶是一个单链表，插入元素时直接头插就好了。
在这里插入图片描述

拉链法的优点

拉链法处理哈希冲突的效率是非常高的。即使多个元素通过哈希函数得到的地址是一样的，也不会显著影响哈希表的性能。此外，拉链法下的负载因子是可以超过1的，因此拉链法下的哈希表可以灵活的应对动态数据集的增长，不需要频繁的调整哈希表的大小。值得一提的是，虽然看上去会使用额外的空间来存放指针，但由于不需要严格维护平衡因子，空间效率并不会比开放定址法低。因为开放定址法需要开大量额外的空间来保证负载因子不溢出。

代码实现：

跟线性探测代码不一样的地方在于，拉链法的节点实际上是一个链表的头节点（桶）。所以对于哈希表的节点类，我们希望有以下功能：

指向下一个元素的指针
存储键值对

在插入元素时，将新节点插入到桶中。

#pragma once
#include<vector>
#include<iostream>
using namespace std;

namespace hash_bucket {
	
	template<class K,class V>
	struct HashNode {
		HashNode(const pair<K, V>& kv)
			: _kv(kv)
			,_next(nullptr)
		{

		}
		HashNode()
		   :_next(nullptr)
		{

		}
		pair<K, V> _kv;
		HashNode<K, V>* _next;
	};

	template<class K>
	struct HashFunc {
		size_t operator()(const K& k) {
			return (size_t)k;
		}
	};
	template<>
	struct HashFunc<string> {
		size_t operator()(const string& str) {
			size_t res = 0;
			for (auto it : str) {
				int u = it;
				res = res * 131 + u;
			}
			return res;
		}
	};

	template<class K,class V,class Hash=HashFunc<K>>
	class HashTable {
	public:
		typedef HashNode<K, V> Node;
		typedef Node* pNode;
		HashTable()
			:_table(10, nullptr)
			, _n(0)
		{

		}
		//随机访问
		V& operator[](const K& k) {
			if (!Find(k)) {
				Insert(make_pair(k,V()));
			}
			return Find(k)->_kv.second;
		}

		//插入
		bool Insert(const pair<K, V>& kv) {
			Hash ha;
			//负载因子为1时扩容
			if (_n == _table.size()) {
				HashTable<K, V> newtable;
				newtable._table.resize(_table.size() * 2);

				for (size_t i = 0; i < _table.size(); i++) {
					if (_table[i]) {
						pNode cur = _table[i];
						while (cur) {
							pNode temp = cur->_next;
							size_t pos = ha(cur->_kv.first) % newtable._table.size();
							if (newtable._table[pos]) {
								cur->_next = newtable._table[pos];
							}
							newtable._table[pos] = cur;
							cur = temp;
						}
					}
				}
				_table.swap(newtable._table);
			}
	
			size_t pos = ha(kv.first) % _table.size();
			pNode newnode = new Node(kv);
	 		if (_table[pos] != nullptr) {
				newnode->_next = _table[pos];
			}
			_table[pos] = newnode;
			_n++;
			return true;
		}

		//查找
		pNode Find(const K& k) {
			Hash ha;
			size_t pos= ha(k) % _table.size();
			pNode cur = _table[pos];
			while (cur) {
				if (cur->_kv.first == k)return cur;
				cur = cur->_next;
			}
			return nullptr;
		}

		//删除
		bool Erase(const K& k) {
			Hash ha;
			if (!Find(k))return false;
			size_t pos = ha(k) % _table.size();
			pNode pre = nullptr;
			pNode cur = _table[pos];
			if (cur->_kv.first == k) {
				delete cur;
				_table[pos] = nullptr;
				return true;
			}
			else {
				while (cur) {
					if (cur->_kv.first == k) {
						pre->_next = cur->_next;
						delete cur;
						cur = nullptr;
						return true;
					}
					pre = cur;
					cur = cur->_next;
				}
				return false;
			}
			
		}

	private:
		vector<pNode> _table;
		int _n;//存储有效桶的个数
	};

}

上述代码大致实现了基于开散列法的哈希表，支持键值对的插入，删除、查询以及修改。

注意开散列法哈希表的扩容方案。当负载因子达到一定值时，我们选择扩大哈希表的大小，对于旧哈希表的元素不用再重新拷贝构造，而是移植到新表中。这样就节省了很多空间。具体实现如下：

//负载因子为1时扩容
if (_n == _table.size()) {
	HashTable<K, V> newtable;//新表
	newtable._table.resize(_table.size() * 2);

	for (size_t i = 0; i < _table.size(); i++) {
		if (_table[i]) {//遍历旧表，移植节点
			pNode cur = _table[i];
			while (cur) {
				pNode temp = cur->_next;
				size_t pos = ha(cur->_kv.first) % newtable._table.size();
				if (newtable._table[pos]) {
					cur->_next = newtable._table[pos];
				}
				newtable._table[pos] = cur;
				cur = temp;
			}
		}
	}
	_table.swap(newtable._table);
}

针对开散列哈希的扩展

如何给上述开散列哈希表类增加迭代器呢？
由于迭代器中需要访问哈希表而不仅仅是桶，而哈希表又属于哈希表类的私有成员。因此，设计迭代器模块时，我们可以将迭代器类设置为哈希表类的内部类。而一个类的内部类是这个类的友元，因此迭代器就能访问到哈希表。给出以迭代器代码，每一个迭代器对象指向哈希表中桶内的一个节点：

//迭代器内部类
template<class Ptr, class Ref>
struct _HashIterator {
	typedef _HashIterator<Ptr, Ref> self;
	pNode _node;
   const HashTable* _pht;

	_HashIterator(pNode pnode, const HashTable* pht)
		:_node(pnode)
		, _pht(pht)
	{

	}
	Ref operator*() {
		return _node->_data;
	}

	Ptr operator->() {
		return &_node->_data;
	}

	self& operator=(const self it) {
		_node = it._node;
		_pht = it._pht;
		return *this;
	}

	self& operator++() {
		if (_node->_next) {
			_node = _node->_next;
		}
		else {
			KeyOft kft;
			Hash ha;
			size_t pos = ha(kft(_node->_data)) % _pht->_table.size();
			size_t i = pos + 1;
			for (; i < _pht->_table.size(); i++) {
				if (_pht->_table[i]) {
					break;
				}
			}
			if (i == _pht->_table.size()) {
				_node = nullptr;
			}
			else {
				_node = _pht->_table[i];
			}

		}
		return *this;
	}

	bool operator!=(const self& iterator) {
		return iterator._node != _node;
	}
};

该迭代器主要实现了重载操作符的作用，希望能将迭代器当指针来使用。

基于开散列拉链法封装哈希表

MyHash.h

该文件封装实现了一个哈希表类，哈希函数是除留余数，解决哈希冲突的方法采用拉链法。
代码：

#pragma once
#include<vector>
#include<iostream>
using namespace std;

namespace hash_bucket {
	template<class T>
	struct HashNode {
		HashNode(const T& data)
			: _data(data)
			, _next(nullptr)
		{

		}
		HashNode()
			:_next(nullptr)
		{

		}
		T _data;
		HashNode<T>* _next;
	};

	template<class K>
	struct HashFunc {
		size_t operator()(const K& k) {
			return (size_t)k;
		}
	};


	template<>
	struct HashFunc<string> {
		size_t operator()(const string& str) {
			size_t res = 0;
			for (auto it : str) {
				int u = it;
				res = res * 131 + u;
			}
			return res;
		}
	};


	template<class K, class T, class KeyOft, class Hash = HashFunc<K>>
	class HashTable {
	public:
		typedef HashNode<T> Node;
		typedef Node* pNode;
		HashTable()
			:_table(10, nullptr)
			, _n(0)
		{

		}
		//迭代器内部类
		template<class Ptr, class Ref>
		struct _HashIterator {
			typedef _HashIterator<Ptr, Ref> self;
			pNode _node;
		   const HashTable* _pht;

			_HashIterator(pNode pnode, const HashTable* pht)
				:_node(pnode)
				, _pht(pht)
			{

			}
			Ref operator*() {
				return _node->_data;
			}

			Ptr operator->() {
				return &_node->_data;
			}

			self& operator=(const self it) {
				_node = it._node;
				_pht = it._pht;
				return *this;
			}

			self& operator++() {
				if (_node->_next) {
					_node = _node->_next;
				}
				else {
					KeyOft kft;
					Hash ha;
					size_t pos = ha(kft(_node->_data)) % _pht->_table.size();
					size_t i = pos + 1;
					for (; i < _pht->_table.size(); i++) {
						if (_pht->_table[i]) {
							break;
						}
					}
					if (i == _pht->_table.size()) {
						_node = nullptr;
					}
					else {
						_node = _pht->_table[i];
					}

				}
				return *this;
			}

			bool operator!=(const self& iterator) {
				return iterator._node != _node;
			}
		};

		typedef _HashIterator<T*, T&> Iterator;
		typedef _HashIterator<const T*, const T&> Const_Iterator;

		//迭代器
		Iterator begin() {
			for (size_t i = 0; i < _table.size(); i++) {
				if (_table[i]) {
					return Iterator(_table[i], this);
				}
			}
			return Iterator(nullptr, this);
		}

		Iterator end() {
			return Iterator(nullptr, this);
		}
		Const_Iterator begin()const {
			for (size_t i = 0; i < _table.size(); i++) {
				if (_table[i]) {
					return Iterator(_table[i], this);
				}
			}
			return Iterator(nullptr, this);
		}

		Const_Iterator end() const {
			return Iterator(nullptr, this);
		}



		//随机访问
		/*V& operator[](const K& k) {
			if (!Find(k)) {
				Insert(make_pair(k,V()));
			}
			return Find(k)->_kv.second;
		}*/

		//插入
		pair<Iterator, bool> Insert(const T& val) {

			Hash ha;
			KeyOft kft;
			Iterator it = Find(kft(val));
			if (it._node != nullptr)return make_pair(it, false);
			//负载因子为1时扩容
			if (_n == _table.size()) {
				HashTable<K, T, KeyOft> newtable;
				newtable._table.resize(_table.size() * 2);

				for (size_t i = 0; i < _table.size(); i++) {
					if (_table[i]) {
						pNode cur = _table[i];
						while (cur) {
							pNode temp = cur->_next;
							size_t pos = ha(kft(cur->_data)) % newtable._table.size();
							if (newtable._table[pos]) {
								cur->_next = newtable._table[pos];
							}
							newtable._table[pos] = cur;
							cur = temp;
						}
					}
				}
				_table.swap(newtable._table);
			}

			size_t pos = ha(kft(val)) % _table.size();
			pNode newnode = new Node(val);
			if (_table[pos] != nullptr) {
				newnode->_next = _table[pos];
			}
			_table[pos] = newnode;
			_n++;
			return make_pair(Iterator(newnode, this), true);
		}

		//查找
		Iterator Find(const K& k) {
			Hash ha;
			KeyOft kft;
			size_t pos = ha(k) % _table.size();
			pNode cur = _table[pos];
			while (cur) {
				if (kft(cur->_data) == k)return Iterator(cur, this);
				cur = cur->_next;
			}
			return Iterator(nullptr, this);
		}

		//删除
		bool Erase(const K& k) {
			Hash ha;
			KeyOft kft;
			if (!Find(k))return false;
			size_t pos = ha(k) % _table.size();
			pNode pre = nullptr;
			pNode cur = _table[pos];
			if (kft(cur->_data) == k) {
				delete cur;
				_table[pos] = nullptr;
				return true;
			}
			else {
				while (cur) {
					if (kft(cur->_data) == k) {
						pre->_next = cur->_next;
						delete cur;
						cur = nullptr;
						return true;
					}
					pre = cur;
					cur = cur->_next;
				}
				return false;
			}

		}

	private:
		vector<pNode> _table;
		int _n;
	};

}

基于哈希表实现unordered_map类

基于MyHash.h封装的哈希表类提供的接口，实现一个unordered_map。unordered_map的值是一个键值对。

Myunordered_map.h

该文件基本实现了unordered_map类。
代码：

#define _CRT_SECURE_NO_WARNINGS 1
#include"MyHash.h"

namespace bit {
	template<class K, class V, class Hash =	hash_bucket::HashFunc<K> >
	class unordered_map {
		struct MapKeyOft {
			const K& operator()(const pair<K, V>& kv) {
				return kv.first;
			}
		};
	public:
		typedef typename hash_bucket::HashTable<K, pair<const K, V>, MapKeyOft, Hash>::Iterator iterator;
		//typedef typename hash_bucket::HashTable<K, pair<const K, V>, KeyOft, Hash>::Const_Iterator const_iterator;
		iterator begin() {
			return _ht.begin();
		}
		iterator end() {
			return _ht.end();
		}

		V& operator[](const K& k) {
			pair<iterator,bool>res=insert(make_pair(k, V()));
			return res.first->second;
		}

		pair<iterator, bool> insert(const pair<K,V>& kv) {
			return _ht.Insert(kv);
		}

		bool erase(const K& k) {
			return _ht.Erase(k);
		}
		pair<iterator, bool> find(const K& k) {
			return _ht.Find(k);
		}

	private:
		hash_bucket::HashTable<K, pair<const K, V>, MapKeyOft, Hash> _ht;
	};

}

在这里插入图片描述

基于哈希表实现unordered_set类

基于MyHash.h封装的哈希表类提供的接口，实现一个unordered_set类。

Myunordered_map.h

该文件基本实现了unordered_set类。
代码：

#include"MyHash.h"

namespace bit {
	template<class K,class Hash = hash_bucket::HashFunc<K> >
	class unordered_set {
		struct SetKeyOft {
			const K& operator()(const K& k) {
				return k;
			}
		};
	public:
		typedef typename hash_bucket::HashTable<K, const K, SetKeyOft, Hash>::Iterator iterator;
		typedef typename hash_bucket::HashTable<K, const K, SetKeyOft, Hash>::Const_Iterator const_iterator;
		iterator begin() {
			return _ht.begin();
		}
		iterator end() {
			return _ht.end();
		}

		

		pair<iterator, bool> insert(const K& kv) {
			return _ht.Insert(kv);
		}

		bool erase(const K& k) {
			return _ht.Erase(k);
		}
		pair<iterator, bool> find(const K& k) {
			return _ht.Find(k);
		}

	private:
		hash_bucket::HashTable<K, const K, SetKeyOft, Hash> _ht;
	};

}