【C++】unordered_map and unordered_set

Hash tables

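1. unordered_map
- 1.1 Concept
- 1.2 Common interfaces
2. unordered_set
- 2.1 Concept
- 2.2 Common interfaces
3. Underlying implementation
- 3.1 Hashing
- 3.2 Hash functions
- 3.3 Closed hashing and open hashing
  - 3.3.1 Closed hashing
  - 3.3.2 Open hashing
- 3.4 Mock implementation
  - 3.4.1 Adapting the hash bucket table
  - 3.4.2 Implementing unordered_set
  - 3.4.3 Implementing unordered_map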

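C++11 added four unordered_xxx associative containers to the STL (unordered_map, unordered_set, unordered_multimap and unordered_multiset). They are used much like the associative containers built on red-black trees, but lookups are usually faster. A red-black tree lookup compares keys once per level, so in the worst case it needs as many comparisons as the tree is high, which is O(log N) and adds up when the tree holds many nodes. The unordered_xxx containers find an element with only a few comparisons, O(1) on average.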

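1. unordered_map

1.1 Concept

unordered_map stores key-value pairs of type pair<const key, value>.
Keys in an unordered_map are unique, and a key is used to look up its value.
The elements of an unordered_map are not kept in any particular order.
unordered_map iterators are forward iterators (they only move in one direction).
unordered_map supports operator[], which takes a key and returns a reference to its value.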
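
The properties above can be seen in a short sketch with std::unordered_map. The function name `count_words` and the word-counting task are illustrative assumptions, not part of the original article.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Counts how often each word occurs. Keys are unique, so a repeated word
// updates the same entry. operator[] inserts a value-initialized count (0)
// the first time a key is seen and returns a reference to the value.
std::unordered_map<std::string, int> count_words(const std::vector<std::string>& words)
{
	std::unordered_map<std::string, int> counts;
	for (const auto& w : words)
	{
		++counts[w];
	}
	return counts;
}
```

Calling `count_words({"a", "b", "a"})` yields two entries, with "a" mapped to 2 and "b" mapped to 1. The order in which a range-for loop visits them is unspecified.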

1.2 Common interfaces

Construction

Capacity

Iterators

Only forward iterators are provided; there are no reverse iterators.
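
The construction, capacity and iterator interfaces can be sketched roughly as follows. The function names `sum_values` and `size_after_copy` are made up for this illustration; the member functions used (`begin`, `end`, `empty`, `size`, and the initializer-list, copy and range constructors) are standard.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Builds a small map from an initializer list and walks it with forward
// iterators. The visiting order is unspecified, so only the sum is returned.
int sum_values()
{
	std::unordered_map<std::string, int> m = { {"one", 1}, {"two", 2}, {"three", 3} };
	int sum = 0;
	for (auto it = m.begin(); it != m.end(); ++it) // ++ only, there is no --
	{
		sum += it->second;
	}
	return sum;
}

// Copy construction, range construction, and the capacity queries.
std::size_t size_after_copy()
{
	std::unordered_map<int, int> a = { {1, 10}, {2, 20} };
	std::unordered_map<int, int> b(a);                  // copy constructor
	std::unordered_map<int, int> c(b.begin(), b.end()); // iterator-range constructor
	return c.empty() ? 0 : c.size();
}
```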

Element lookup

(1) find

(2) count

(3) equal_range
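
A minimal sketch of the three lookup functions follows. The helper names `value_or`, `contains` and `range_length` are hypothetical wrappers written for this example; `find`, `count` and `equal_range` are the standard members.

```cpp
#include <string>
#include <unordered_map>

// find() returns end() when the key is absent.
int value_or(const std::unordered_map<std::string, int>& m, const std::string& key, int fallback)
{
	auto it = m.find(key);
	return it == m.end() ? fallback : it->second;
}

// For unordered_map, count() is 0 or 1 because keys are unique.
bool contains(const std::unordered_map<std::string, int>& m, const std::string& key)
{
	return m.count(key) == 1;
}

// equal_range() returns the range [first, second) of elements with the key;
// for unordered_map that range holds at most one element.
int range_length(const std::unordered_map<std::string, int>& m, const std::string& key)
{
	auto range = m.equal_range(key);
	int n = 0;
	for (auto it = range.first; it != range.second; ++it)
	{
		++n;
	}
	return n;
}
```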

Insertion and deletion
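
Insertion and deletion might be exercised like this. The function names are illustrative assumptions; the behavior shown (insert does not overwrite an existing key, erase by key returns the number of removed elements) is that of std::unordered_map.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// insert() does not overwrite: for an existing key it returns {iterator, false}
// and leaves the stored value untouched.
bool second_insert_succeeds()
{
	std::unordered_map<std::string, int> m;
	m.insert(std::make_pair(std::string("k"), 1));
	auto ret = m.insert(std::make_pair(std::string("k"), 2));
	return ret.second; // false, and ret.first->second is still 1
}

// erase(key) returns how many elements were removed (0 or 1 for unordered_map).
std::size_t erase_key(std::unordered_map<std::string, int>& m, const std::string& key)
{
	return m.erase(key);
}
```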

Hash buckets
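
The bucket interface exposes the underlying hash structure. Below is a small sketch, with hypothetical helper names, that uses the standard members `bucket_count`, `bucket_size` and `bucket`.

```cpp
#include <cstddef>
#include <unordered_map>

// Adds up bucket_size(i) over all buckets. The total equals size(),
// because every element lives in exactly one bucket.
std::size_t total_in_buckets(const std::unordered_map<int, int>& m)
{
	std::size_t total = 0;
	for (std::size_t i = 0; i < m.bucket_count(); ++i)
	{
		total += m.bucket_size(i);
	}
	return total;
}

// bucket(key) reports the index of the bucket the key maps to.
// Assumes a non-empty map, so that bucket_count() > 0.
bool bucket_index_in_range(const std::unordered_map<int, int>& m, int key)
{
	return m.bucket(key) < m.bucket_count();
}
```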

[] access
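
One point about [] access is easy to miss: reading a missing key through operator[] inserts it. The sketch below contrasts this with `at`, which never inserts. The function names are assumptions made for the example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>

// operator[] inserts a value-initialized element when the key is missing,
// so even a read grows the map.
std::size_t size_after_subscript_read()
{
	std::unordered_map<std::string, int> m;
	int v = m["missing"]; // inserts {"missing", 0}
	(void)v;
	return m.size();
}

// at() looks up without inserting and throws std::out_of_range for a missing key.
bool at_throws_on_missing()
{
	std::unordered_map<std::string, int> m;
	try
	{
		(void)m.at("missing");
	}
	catch (const std::out_of_range&)
	{
		return true;
	}
	return false;
}
```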

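2. unordered_set

2.1 Concept

2.2 Common interfaces

The common interfaces of unordered_set match those of unordered_map, apart from operator[], which unordered_set does not provide, so they are not repeated here.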
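
As a brief illustration of the shared interface, here is a sketch that removes duplicates with std::unordered_set. The function name `count_distinct` is an assumption made for the example.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Returns the number of distinct values. insert() rejects a duplicate and
// returns {iterator, false}, exactly as unordered_map::insert does.
std::size_t count_distinct(const std::vector<int>& values)
{
	std::unordered_set<int> seen;
	for (int v : values)
	{
		seen.insert(v);
	}
	return seen.size();
}
```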

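3. Underlying implementation

The unordered associative containers are efficient because they are built on a hash structure.

3.1 Hashing

Hashing establishes a mapping between a stored value and its storage location. A hash table is the structure built with a hash function; through this mapping an element can be located directly, without the repeated comparisons a red-black tree needs.
When different keys produce the same hash address under the same hash function, this is called a hash conflict or hash collision. Different keys that map to the same hash address are called synonyms.
How can hash collisions be reduced? Choose a hash function that suits the scenario. Collisions that still occur are handled by closed hashing or open hashing (section 3.3).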
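
A collision is easy to reproduce with a toy hash function. In the sketch below, `slot_of` and `collides` are hypothetical helpers that use the key modulo the table length as the hash address.

```cpp
#include <cstddef>

// Toy hash function: the slot index is the key modulo the table length.
std::size_t slot_of(std::size_t key, std::size_t table_len)
{
	return key % table_len;
}

// Two different keys collide (are synonyms) when they map to the same slot.
bool collides(std::size_t a, std::size_t b, std::size_t table_len)
{
	return a != b && slot_of(a, table_len) == slot_of(b, table_len);
}
```

With a table of length 10, the keys 4 and 14 both land in slot 4, so they are synonyms, while 4 and 15 do not collide.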

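3.2 Hash functions

The two methods described here are the most common ones.

Direct addressing

Take a linear function of the key as the hash address: Hash(Key) = A*Key + B. The method is simple and the resulting hash values are evenly distributed. It only suits keys that fall in a compact range; the wider the spread of the keys, the more space is wasted.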
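
A typical use of direct addressing is counting lowercase letters, where the key range is small and dense. The sketch below assumes A = 1 and B = -'a'; the function name `count_letters` is made up for this example.

```cpp
#include <array>
#include <string>

// Direct addressing with Hash(key) = 1 * key + (-'a'): every lowercase letter
// owns exactly one slot of a 26-element table.
std::array<int, 26> count_letters(const std::string& s)
{
	std::array<int, 26> counts{}; // zero-initialized
	for (char ch : s)
	{
		if (ch >= 'a' && ch <= 'z') // ignore characters outside the key range
		{
			++counts[ch - 'a'];
		}
	}
	return counts;
}
```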

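Division method (remainder method)

Let m be the number of addresses available in the hash table. Pick the largest prime p that does not exceed m as the divisor,
and convert the key to a hash address with the hash function Hash(key) = key % p (p <= m).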
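
The choice of p can be sketched as follows. The helper names `is_prime`, `largest_prime_at_most` and `division_hash` are assumptions for this illustration, and the trial-division primality test is only meant for small table sizes.

```cpp
#include <cstddef>

// Trial division; adequate for small n.
bool is_prime(std::size_t n)
{
	if (n < 2) return false;
	for (std::size_t d = 2; d * d <= n; ++d)
	{
		if (n % d == 0) return false;
	}
	return true;
}

// Largest prime p with p <= m; returns 0 when m < 2 (no such prime exists).
std::size_t largest_prime_at_most(std::size_t m)
{
	for (std::size_t p = m; p >= 2; --p)
	{
		if (is_prime(p)) return p;
	}
	return 0;
}

// Hash(key) = key % p. Precondition: m >= 2, so that p is a valid divisor.
std::size_t division_hash(std::size_t key, std::size_t m)
{
	return key % largest_prime_at_most(m);
}
```

For m = 10 the divisor is p = 7, so the key 25 maps to address 25 % 7 = 4.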

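3.3 Closed hashing and open hashing

Closed hashing and open hashing are two ways of resolving hash collisions. The implementations below use the division method as the hash function.

3.3.1 Closed hashing

In closed hashing (also called open addressing), if the slot at the key's hash address is empty when inserting, the key is stored there directly; if the slot is occupied, the key is stored in the next empty slot. There are two ways of finding that next empty slot: linear probing and quadratic probing.

Linear probing

(1) When an insertion finds that the mapped slot is already taken, probe forward one slot at a time until an empty slot is found.
(2) An element cannot simply be removed when deleting, because emptying the slot would break lookups and insertions of other elements that probed past it. Instead, each slot carries a state marker.
(3) What if elements keep being inserted and no empty slot is left? That does not happen, because the table is expanded once the ratio of elements to table length reaches a threshold. This ratio is the load factor α = number of elements / table length. The larger α is, the more likely collisions become; the smaller α is, the less likely collisions are, but the more space is wasted. Choosing a suitable α therefore matters. It is typically kept in the range 0.7 to 0.8, and the table is expanded when α reaches that range.
(4) What if the inserted key is not an integer? Convert it to an integer first.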

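#include <string>
#include <utility>
#include <vector>
using namespace std;

//The modulo operation needs an integer. If the key is not an integer, convert it to one,
//so that lookups and deletions compute the same position as the insertion did.
//A functor does this conversion.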
template<class K>
struct ConvertToInt
{
	size_t operator()(const K& key)
	{
		return size_t(key);
	}
};
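//If the key is a string, how is it converted to an integer?
//Returning the first character collides too easily. Summing all characters also collides easily,
//because strings with the same characters in a different order produce the same sum.
//The BKDR hash reduces collisions: multiply the running hash by 131 before adding each character.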
template<>
struct ConvertToInt< string >
{
	size_t operator()(const string&str)
	{
		size_t hash = 0;
		for (auto e : str)
		{
			hash *= 131;
			hash += e;
		}
		return hash;
	}
};

namespace open_address
{
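	//Slot state
	//Why track a state per slot? Physically removing an element would mean shifting the elements
	//with the same remainder forward by one slot, which is slow.
	//Instead a deleted slot is marked DELETE, an occupied slot EXIST, and an unused slot EMPTY.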
	enum state
	{
		EXIST,
		EMPTY,
		DELETE
	};

	//Type of the elements stored in the hash table
	//Here the table stores pairs
	template<class K,class V>
	struct HashData
	{
		pair<K, V> _data;
		state _state = EMPTY;
	};

	template<class K,class V,class HashFunc = ConvertToInt<K>>
	class HashTable
	{
	public:
		HashTable()
		{
			//Initial size is 10
			_table.resize(10);
		}

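		//Lookup
		HashData<const K, V>* Find(const K&key)
		{
			HashFunc hf;

			//Division method: compute the index into the table.
			//Why modulo size() and not capacity()? An index from % capacity() could be >= size(),
			//but operator[] only allows indices < size().
			size_t hashi = hf(key) % _table.size();
			//Stop at an EMPTY slot: an insertion of key would have used that slot, so key is absent.
			//Also stop after probing every slot, because DELETE markers can leave the table without any EMPTY slot.
			for (size_t probed = 0; probed < _table.size() && _table[hashi]._state != EMPTY; ++probed)
			{
				if(_table[hashi]._state == EXIST && _table[hashi]._data.first == key)
				{
					return (HashData<const K, V>*) & _table[hashi];
				}
				++hashi;
				//Wrap around instead of running past the end
				hashi %= _table.size();
			}
			return nullptr;
		}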

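		//Insertion
		bool Insert(const pair<K, V>& data)
		{
			if (Find(data.first))
			{
				return false;
			}

			//Check whether to expand
			//load factor = number of stored elements (_n) / table length (_table.size())
			//A larger load factor means more collisions; a smaller one means fewer collisions but lower space utilization.
			//Do not wait until the table is full: expand once the load factor reaches a threshold, here 0.7
			if ((double)_n / _table.size() >= 0.7)
			{
				//After expanding, positions may change: keys that collided may no longer collide, and vice versa
				//Pass HashFunc along so the new table hashes keys the same way
				HashTable<K, V, HashFunc> newHT;
				newHT._table.resize(_table.size() * 2);
				for (auto& e : _table)
				{
					if (e._state == EXIST)
					{
						//newHT has enough room, so this call never expands again and cannot recurse endlessly
						newHT.Insert(e._data);
					}
				}
				_table.swap(newHT._table);
			}

			HashFunc hf;
			size_t hashi = hf(data.first) % _table.size();
			//A free slot always exists because the table was expanded above
			while (_table[hashi]._state == EXIST)
			{
				++hashi;
				hashi %= _table.size();
			}
			_table[hashi]._data = data;
			_table[hashi]._state = EXIST;
			++_n;
			return true;
		}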

		//Deletion
		bool Erase(const K& key)
		{
			HashData<const K,V>*ret = Find(key);
			if (ret)
			{
				//Do not physically remove the element; mark the slot as DELETE
				ret->_state = DELETE;
				--_n;
				return true;
			}
			return false;
		}

	private:
		vector<HashData<K, V>> _table;
		size_t _n = 0;//Number of live elements in the table (deleted slots excluded)
	};
}

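Quadratic probing

With quadratic probing, when a slot is taken the next candidate is hashi + i^2 or hashi - i^2, where i is the number of collisions encountered so far during the lookup or insertion. Only the rule for choosing the next slot changes; everything else is the same as linear probing.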
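
The probe rule can be sketched in isolation. The function below is a hypothetical helper that lists the slots visited by the forward variant only, (home + i^2) % table length, and wraps around the end of the table like the linear-probing code does.

```cpp
#include <cstddef>
#include <vector>

// Returns the first `count` slots visited by forward quadratic probing:
// (home + i*i) % table_len for i = 0, 1, 2, ...
std::vector<std::size_t> quadratic_probe_sequence(std::size_t home, std::size_t table_len, std::size_t count)
{
	std::vector<std::size_t> slots;
	for (std::size_t i = 0; i < count; ++i)
	{
		slots.push_back((home + i * i) % table_len);
	}
	return slots;
}
```

Starting from slot 4 in a table of length 10, the first four probes visit slots 4, 5, 8 and 3, so colliding keys spread out instead of piling up in adjacent slots.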

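3.3.2 Open hashing

Closed hashing uses space poorly, and collisions affect one another: one key takes another key's slot, which in turn takes a third key's slot. Hence the second approach, open hashing (also called separate chaining or hash buckets).
Implementation of open hashing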

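#include <cstdio>
#include <iostream>
#include <string>
#include <utility>
#include <vector>
using namespace std;

//Collision resolution by separate chaining (hash buckets)

//The modulo operation needs an integer. If the key is not an integer, convert it to one,
//so that lookups and deletions compute the same position as the insertion did.
//A functor does this conversion.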
template<class K>
struct ConvertToInt
{
	size_t operator()(const K& key)
	{
		return size_t(key);
	}
};
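//If the key is a string, how is it converted to an integer?
//Returning the first character collides too easily. Summing all characters also collides easily,
//because strings with the same characters in a different order produce the same sum.
//The BKDR hash reduces collisions.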
template<>
struct ConvertToInt< string >
{
	size_t operator()(const string&str)
	{
		size_t hash = 0;
		for (auto e : str)
		{
			hash *= 131;
			hash += e;
		}
		return hash;
	}
};

namespace hash_bucket
{
	//Node
	//Unlike open addressing, each bucket holds a linked list of nodes
	template<class K, class V>
	struct HashNode
	{
		pair<K, V> _data;
		HashNode<K, V>* _next;

		HashNode(const pair<K, V>& data)
			:_data(data)
			, _next(nullptr)
		{}
	};

	//Hash table
	template<class K, class V, class HashFunc = ConvertToInt<K>>
	class HashTable
	{
	public:
		typedef HashNode<K, V> Node;
		//Constructor
		HashTable()
		{
			//Start with 10 buckets
			_table.resize(10,nullptr);
		}
		
		//Destructor
		//It must be written by hand: raw pointers are built-in types, so the compiler-generated destructor does not free the nodes
		~HashTable()
		{
			for (size_t i = 0; i < _table.size(); i++)
			{
				Node* cur = _table[i];
				Node* next = nullptr;
				while (cur)
				{
					next = cur->_next;
					delete cur;
					cur = next;
				}
				_table[i] = nullptr;
			}
		}

		//Lookup
		Node* Find(const K& key)
		{
			HashFunc hf;
			size_t hashi = hf(key) % _table.size();
			Node* cur = _table[hashi];
			while (cur)
			{
				if (cur->_data.first == key)
				{
					return cur;
				}
				cur = cur->_next;
			}
			return nullptr;
		}

		//Insertion
		bool Insert(const pair<K,V>& data)
		{
			if (Find(data.first))
			{
				return false;
			}

			HashFunc hf;

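			//Expansion
			//Why expand at all, when new elements can always be chained onto a bucket?
			//Without expansion some buckets grow longer and longer, and lookups degrade to linked-list scans.
			//The load factor can be larger than in open addressing. It is usually capped at 1, so each bucket holds one element on average and lookups stay fast.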
			if (_n == _table.size())
			{
				vector<Node*> newtable;
				size_t newsize = _table.size() * 2;
				newtable.resize(newsize, nullptr);
				//Walk the old table and relink each node into its bucket in the new table
				for (size_t i = 0; i < _table.size(); i++)
				{
					Node* cur = _table[i];
					while (cur)
					{
						Node* next = cur->_next;
						//The bucket index may change in the new table
						size_t hashi = hf(cur->_data.first) % newsize;
						//Insert at the head of the bucket
						cur->_next = newtable[hashi];
						newtable[hashi] = cur;
						cur = next;
					}
					_table[i] = nullptr;
				}
				_table.swap(newtable);
			}

			size_t hashi = hf(data.first) % _table.size();
			//Insert at the head of the bucket
			Node* newnode = new Node(data);
			newnode->_next = _table[hashi];
			_table[hashi] = newnode;
			++_n;
			return true;
		}

		//Deletion
		bool Erase(const K& key)
		{
			HashFunc hf;
			size_t hashi = hf(key) % _table.size();
			Node* cur = _table[hashi];
			Node* prev = nullptr;
			while (cur)
			{
				if (cur->_data.first == key)
				{
					if (prev)
					{
						prev->_next = cur->_next;
					}
					else
					{
						_table[hashi] = cur->_next;
					}
					delete cur;
					--_n;//keep the element count in sync, otherwise the expansion check drifts
					return true;
				}
				prev = cur;
				cur = cur->_next;
			}
			return false;
		}

		//Print
		void Print()
		{
			for (size_t i = 0; i < _table.size(); i++)
			{
				printf("[%zu]->", i);
				Node* cur = _table[i];
				while (cur)
				{
					cout << cur->_data.first << ":" << cur->_data.second << "->";
					cur = cur->_next;
				}
				cout << "NULL" << endl;
			}
			cout << endl;
		}
	private:
		vector<Node*> _table;
		size_t _n = 0;//Number of stored elements; must start at 0
	};
}

3.4 Mock implementation

3.4.1 Adapting the hash bucket table

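#include <cstdio>
#include <iostream>
#include <string>
#include <utility>
#include <vector>
using namespace std;

template<class K>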
struct ConvertToInt
{
	size_t operator()(const K& key)
	{
		return (size_t)key;
	}
};
template<>
struct ConvertToInt< string >
{
	size_t operator()(const string&str)
	{
		size_t hash = 0;
		for (auto e : str)
		{
			hash *= 131;
			hash += e;
		}
		return hash;
	}
};
//Adapt the hash bucket table so that it can back both unordered_set and unordered_map
namespace hash_bucket
{
	//Node
	//Unlike open addressing, each bucket holds a linked list of nodes
	template<class T>
	struct HashNode
	{
		T _data;
		HashNode<T>* _next;

		HashNode(const T& data)
			:_data(data)
			, _next(nullptr)
		{}
	};

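	//Forward declaration
	//The hash table is defined later, but the iterator holds a pointer to it, so it must be declared first
	//K is the key type; T is the stored element type: K for unordered_set, a key-value pair for unordered_map
	//KeyOfT is a functor that extracts the key from a T; HashFunc is a functor that converts a key to an integer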
	template<class K, class T, class KeyOfT, class HashFunc>
	class HashTable;

	//Iterator
	//To keep the implementation simple, the iterator needs access to the HashTable itself
	template<class K, class T, class Ref, class Ptr, class KeyOfT, class HashFunc>
	struct HashTableIterator
	{
		typedef HashNode<T> Node;
		typedef HashTableIterator<K, T, Ref, Ptr, KeyOfT, HashFunc> Self;
		typedef HashTableIterator<K, T, T&, T*, KeyOfT, HashFunc> iterator;
		
		//When this class is a const_iterator, this constructor converts an iterator to a const_iterator
		//When this class is a plain iterator, it is the copy constructor
		HashTableIterator(const iterator& it)
			:_node(it._node)
			, _pht(it._pht)
		{}


		Node* _node;
		//The iterator never modifies the table through _pht, so the pointer can be const
		const HashTable<K, T, KeyOfT, HashFunc>* _pht;//pointer to the hash table

		HashTableIterator(Node* node,HashTable<K,T,KeyOfT,HashFunc>* pht)
			:_node(node)
			,_pht(pht)
		{}
		//Overload that accepts a const HashTable
		HashTableIterator(Node* node, const HashTable<K, T, KeyOfT, HashFunc>* pht)
			:_node(node)
			, _pht(pht)//_pht must be const as well, otherwise constness would be dropped
		{}

		Ref operator*()
		{
			return _node->_data;
		}

		Ptr operator->()
		{
			return &(_node->_data);
		}

		Self& operator++()
		{
			//The current bucket still has nodes
			if (_node->_next)
			{
				_node = _node->_next;
			}
			//The current bucket is exhausted: use the table pointer to move to the next bucket
			else
			{
				KeyOfT kot;
				HashFunc hf;
				size_t hashi = hf(kot(_node->_data)) % _pht->_table.size();
				//Search for a non-empty bucket starting from the next index
				++hashi;
				while (hashi < _pht->_table.size())
				{
					if (_pht->_table[hashi])
					{
						_node = _pht->_table[hashi];
						return *this;
					}
					++hashi;
				}
				_node = nullptr;
			}
			return *this;
		}

		bool operator!=(const Self& s)
		{
			return _node != s._node;
		}

		bool operator==(const Self& s)
		{
			return _node == s._node;
		}
	};

	//Hash table
	template<class K, class T, class KeyOfT, class HashFunc = ConvertToInt<K>>
	class HashTable
	{
	public:
		typedef HashNode<T> Node;
		typedef HashTableIterator<K, T, T&, T*, KeyOfT, HashFunc> iterator;
		typedef HashTableIterator<K, T, const T&, const T*, KeyOfT, HashFunc> const_iterator;


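		//Friend declaration, so the iterator can read _table
		//The parameter names must differ from those of the enclosing template; reusing them shadows the outer parameters, which standard C++ rejects
		template<class K1, class T1, class Ref1, class Ptr1, class KeyOfT1, class HashFunc1>
		friend struct HashTableIterator;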

		iterator begin()
		{
			for (size_t i = 0; i < _table.size(); i++)
			{
				if (_table[i])
				{
					return iterator(_table[i], this);
				}
			}
			return iterator(nullptr, this);
		}

		iterator end()
		{
			return iterator(nullptr, this);
		}

		const_iterator begin()const
		{
			for (size_t i = 0; i < _table.size(); i++)
			{
				if (_table[i])
				{
					//In a const member function, this points to a const HashTable,
					//so the iterator constructor must accept a const HashTable pointer
					return const_iterator(_table[i], this);
				}
			}
			return const_iterator(nullptr, this);
		}

		const_iterator end()const
		{
			return const_iterator(nullptr, this);
		}

		//Constructor
		HashTable()
		{
			//Start with 10 buckets
			_table.resize(10,nullptr);
		}
		
		//Destructor
		//It must be written by hand: raw pointers are built-in types, so the compiler-generated destructor does not free the nodes
		~HashTable()
		{
			for (size_t i = 0; i < _table.size(); i++)
			{
				Node* cur = _table[i];
				Node* next = nullptr;
				while (cur)
				{
					next = cur->_next;
					delete cur;
					cur = next;
				}
				_table[i] = nullptr;
			}
		}

		//Lookup
		iterator Find(const K& key)
		{
			KeyOfT kot;
			HashFunc hf;
			size_t hashi = hf(key) % _table.size();
			Node* cur = _table[hashi];
			while (cur)
			{
				if (kot(cur->_data) == key)
				{
					return iterator(cur, this);
				}
				cur = cur->_next;
			}
			return iterator(nullptr, this);
		}

		//Insertion
		pair<iterator,bool> Insert(const T& data)
		{
			KeyOfT kot;
			iterator it = Find(kot(data));
			if (it!=end())
			{
				return make_pair(it,false);
			}

			HashFunc hf;

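			//Expansion
			//Why expand at all, when new elements can always be chained onto a bucket?
			//Without expansion some buckets grow longer and longer, and lookups degrade to linked-list scans.
			//The load factor can be larger than in open addressing. It is usually capped at 1, so each bucket holds one element on average and lookups stay fast.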
			if (_n == _table.size())
			{
				vector<Node*> newtable;
				size_t newsize = _table.size() * 2;
				newtable.resize(newsize, nullptr);
				//Walk the old table and relink each node into its bucket in the new table
				for (size_t i = 0; i < _table.size(); i++)
				{
					Node* cur = _table[i];
					while (cur)
					{
						Node* next = cur->_next;
						//The bucket index may change in the new table
						size_t hashi = hf(kot(cur->_data)) % newsize;
						cur->_next = newtable[hashi];
						newtable[hashi] = cur;
						cur = next;
					}
					_table[i] = nullptr;
				}
				_table.swap(newtable);
			}

			size_t hashi = hf(kot(data)) % _table.size();
			//Insert at the head of the bucket
			Node* newnode = new Node(data);
			newnode->_next = _table[hashi];
			_table[hashi] = newnode;
			++_n;
			return make_pair(iterator(newnode, this), true);
		}

		//Deletion
		bool Erase(const K& key)
		{
			KeyOfT kot;
			HashFunc hf;
			size_t hashi = hf(key) % _table.size();
			Node* cur = _table[hashi];
			Node* prev = nullptr;
			while (cur)
			{
				if (kot(cur->_data) == key)
				{
					if (prev)
					{
						prev->_next = cur->_next;
					}
					else
					{
						_table[hashi] = cur->_next;
					}
					delete cur;
					--_n;//keep the element count in sync, otherwise the expansion check drifts
					return true;
				}
				prev = cur;
				cur = cur->_next;
			}
			return false;
		}

		//Print
		void Print()
		{
			KeyOfT kot;
			for (size_t i = 0; i < _table.size(); i++)
			{
				printf("[%zu]->", i);
				Node* cur = _table[i];
				while (cur)
				{
					//Print only the key, so this compiles when T is a plain key as well as when T is a pair
					cout << kot(cur->_data) << "->";
					cur = cur->_next;
				}
				cout << "NULL" << endl;
			}
			cout << endl;
		}
	private:
		vector<Node*> _table;
		size_t _n = 0;//Number of stored elements; must start at 0
	};
}

3.4.2 Implementing unordered_set

#include"MyHashTable.h"
namespace zn
{
	template<class K>
	class unordered_set
	{
		struct SetKeyOfT
		{
			const K& operator()(const K& key)
			{
				return key;
			}
		};
	public:
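		//Keys in an unordered_set must not be modified, so the plain iterator is also the table's const_iterator
		//typename tells the compiler that the dependent name is a nested type, not a static member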
		typedef typename hash_bucket::HashTable<K, K, SetKeyOfT>::const_iterator iterator;
		typedef typename hash_bucket::HashTable<K, K, SetKeyOfT>::const_iterator const_iterator;
		//The table's const iterators come from its const overloads, so these members are const
		iterator begin()const
		{
			return _ht.begin();
		}
		iterator end()const
		{
			return _ht.end();
		}
		//The iterator in insert's return value is really a const_iterator
		pair<iterator,bool> insert(const K& key)
		{
			//HashTable::Insert returns a plain iterator
			//Converting it to const_iterator relies on the const_iterator constructor that takes an iterator
			pair<typename hash_bucket::HashTable<K, K, SetKeyOfT>::iterator, bool>ret = _ht.Insert(key);
			return pair<iterator, bool>(ret.first, ret.second);//constructing from ret.first invokes that converting constructor
		}
	private:
		hash_bucket::HashTable<K, K, SetKeyOfT> _ht;
	};
}

3.4.3 Implementing unordered_map

#include"MyHashTable.h"
namespace zn
{
	template<class K, class V>
	class unordered_map
	{
		struct MapKeyOfT
		{
			const K& operator()(const pair<const K,V>& kv)
			{
				return kv.first;
			}
		};

	public:
		typedef typename hash_bucket::HashTable<K, pair<const K,V>, MapKeyOfT>::iterator iterator;
		typedef typename hash_bucket::HashTable<K, pair<const K, V>, MapKeyOfT>::const_iterator const_iterator;
		iterator begin()
		{
			return _ht.begin();
		}
		iterator end()
		{
			return _ht.end();
		}
		const_iterator begin()const
		{
			return _ht.begin();
		}
		const_iterator end()const
		{
			return _ht.end();
		}
		pair<iterator,bool> insert(const pair<const K,V>& kv)//const reference, so temporaries such as make_pair(...) can bind
		{
			return _ht.Insert(kv);
		}
		V&operator[](const K& key)
		{
			pair<iterator, bool> ret = insert(make_pair(key, V()));
			return ret.first->second;//ret.first is an iterator; -> reaches the stored pair
		}
	private:
		hash_bucket::HashTable<K, pair<const K,V>, MapKeyOfT> _ht;
	};
}