Hashing and unordered_set, unordered

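1. The unordered family of associative containers

1.1. unordered_map interface examples

1.2. Underlying structure

Underlying differences

Hashing concepts

2. Implementing a hash table

3. Wrapping the unordered containers

3.1. Adapting the hash table

3.2. Upper-layer wrappers

3.2.1. Wrapping unordered_set

3.2.2. Wrapping unordered_map and implementing operator[]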

1. The unordered family of associative containers

C++11 added four unordered associative containers to the STL:

unordered_set

unordered_multiset

unordered_map

unordered_multimap

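These four containers are used in much the same way as the red-black-tree-based associative containers (set, multiset, map, multimap). The difference is the underlying structure: they are built on a hash table.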

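1.1. unordered_map interface examples

The most commonly used unordered_map functions are listed below.

1) Construction

Function	Description
(constructor)	Constructs an unordered_map object

2) Capacity

Function	Description
empty	Returns whether the container is empty
size	Returns the number of elements stored in the container

3) Modifiers

Function	Description
operator[]	Accesses the element with the given key, inserting it first if it does not exist
insert	Inserts an element
erase	Removes the element with the given key
clear	Removes all elements

4) Lookup

Function	Description
iterator find(const key_type& k)	Finds the element with key k and returns an iterator to it (end() if absent)
size_type count (const key_type& k)	Returns the number of elements whose key equals k (0 or 1 for unordered_map, since keys are unique)

5) Iterators

Function	Description
begin	Returns an iterator to the first element of the unordered_map
end	Returns an iterator to the position past the last element of the unordered_map
cbegin	Returns a const iterator to the first element of the unordered_map
cend	Returns a const iterator to the position past the last element of the unordered_map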
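
To make the tables above concrete, here is a minimal sketch that uses std::unordered_map through operator[], find and end. The helper names count_words and lookup are made up for this illustration and are not part of the library.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: counts how often each word occurs.
// operator[] inserts a value-initialized int (0) the first time a key is seen.
std::unordered_map<std::string, int> count_words(const std::vector<std::string>& words)
{
    std::unordered_map<std::string, int> counts;
    for (const auto& w : words)
    {
        ++counts[w];
    }
    return counts;
}

// Hypothetical helper: returns the count for `word`, or 0 if it is absent.
// find() is used instead of operator[] so that a miss does not insert anything.
int lookup(const std::unordered_map<std::string, int>& counts, const std::string& word)
{
    auto it = counts.find(word);
    return it == counts.end() ? 0 : it->second;
}
```

Note the difference between the two access paths: operator[] may modify the container, while find never does, which is why only find can be called on a const map.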

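1.2. Underlying structure

The unordered associative containers are efficient because they are built on a hash structure.

Underlying differences

1. Different requirements on the key

set: the key must support ordering comparison

unordered_set: the key must be convertible to an integer (hashable) and support equality comparison

2. Iterating a set visits the keys in sorted order; iterating an unordered_set visits them in no particular order

3. Performance (lookup time complexity)

set: O(log N)

unordered_set: O(1) on average, degrading to O(N) in the worst case when many keys collide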
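
The different key requirements can be seen with a user-defined key type. The sketch below is only an illustration: Point, PointLess, PointHash and PointEqual are made-up names, and the hash combination (multiply by 31, then add) is one simple choice rather than a recommended one.

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <unordered_set>

// Hypothetical key type used only for this illustration.
struct Point
{
    int x;
    int y;
};

// std::set needs an ordering on the key.
struct PointLess
{
    bool operator()(const Point& a, const Point& b) const
    {
        return a.x != b.x ? a.x < b.x : a.y < b.y;
    }
};

// std::unordered_set needs a hash (key -> integer) ...
struct PointHash
{
    std::size_t operator()(const Point& p) const
    {
        return std::hash<int>()(p.x) * 31 + std::hash<int>()(p.y);
    }
};

// ... and an equality comparison to tell apart keys that hash to the same bucket.
struct PointEqual
{
    bool operator()(const Point& a, const Point& b) const
    {
        return a.x == b.x && a.y == b.y;
    }
};

using OrderedPoints = std::set<Point, PointLess>;
using HashedPoints = std::unordered_set<Point, PointHash, PointEqual>;
```

Iterating an OrderedPoints visits points in the order defined by PointLess; iterating a HashedPoints gives no ordering guarantee.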

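Hashing concepts

Construct a storage structure in which some function (hashFunc) establishes a one-to-one mapping between an element's key and its storage location. A lookup can then use that function to find the element quickly.

The idea of hashing is to map keys to storage locations.

○ To insert an element, compute its storage location from its key with this function and store the element there

○ To search for an element, apply the same computation to its key, treat the result as the storage location, and compare against the element found there; if the keys are equal, the search succeeds

This approach is the hash method. The conversion function it uses is called the hash function, and the resulting structure is called a hash table.

The mapping can be built in the following two ways

1. Direct addressing

Advantages: fast, no hash collisions

Disadvantages: only suitable when the keys fall in a relatively concentrated range; otherwise space is wasted

2. Division method (take the remainder)

hash(key) = key % capacity

Hash collision: different keys are mapped to the same location by the hash function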
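
The two mapping methods can be sketched in a few lines. The function names count_lowercase and hash_mod are made up for this illustration; the first uses direct addressing over the dense key range 'a'..'z', the second is the division method from the formula above.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Direct addressing: the key range 'a'..'z' is small and dense,
// so every key gets its own slot (index = c - 'a') and collisions cannot occur.
std::array<int, 26> count_lowercase(const std::string& s)
{
    std::array<int, 26> counts{};
    for (char c : s)
    {
        if (c >= 'a' && c <= 'z')
        {
            ++counts[c - 'a'];
        }
    }
    return counts;
}

// Division method: hash(key) = key % capacity.
// Keys that differ by a multiple of capacity land in the same slot (a collision).
std::size_t hash_mod(std::size_t key, std::size_t capacity)
{
    return key % capacity;
}
```

With a capacity of 10, the keys 4 and 14 both map to slot 4, which is exactly the collision case that the next part addresses.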

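How are hash collisions resolved?

1. Closed hashing (open addressing): probe other positions according to some rule until an empty slot is found (a. linear probing; b. quadratic probing)

2. Open hashing (hash buckets / separate chaining): first compute the hash address of every key with the hash function. Keys with the same address belong to the same subset, and each subset is called a bucket. The elements of a bucket are linked together in a singly linked list, and the head node of each list is stored in the hash table.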
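
For open addressing, the difference between linear and quadratic probing is only the sequence of slots that gets tried. A small sketch, assuming the common definitions (home + i) % capacity and (home + i*i) % capacity; the function names are hypothetical:

```cpp
#include <cstddef>
#include <vector>

// Linear probing: try home, home+1, home+2, ... (wrapping around the table).
std::vector<std::size_t> linear_probe(std::size_t home, std::size_t capacity, std::size_t steps)
{
    std::vector<std::size_t> seq;
    for (std::size_t i = 0; i < steps; ++i)
    {
        seq.push_back((home + i) % capacity);
    }
    return seq;
}

// Quadratic probing: try home, home+1, home+4, home+9, ... (wrapping around the table).
std::vector<std::size_t> quadratic_probe(std::size_t home, std::size_t capacity, std::size_t steps)
{
    std::vector<std::size_t> seq;
    for (std::size_t i = 0; i < steps; ++i)
    {
        seq.push_back((home + i * i) % capacity);
    }
    return seq;
}
```

Quadratic probing spreads colliding keys further apart, which tends to reduce the clustering that linear probing produces around a busy slot.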

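2. Implementing a hash table

A mock implementation of a hash table follows.

HashFunc is a functor that converts a key to an integer.

// _n records the number of elements stored in the table; the load factor is _n / _tables.size()

size_t _n;

// Once _n / _tables.size() reaches a threshold, the table is expanded

// Load factor has reached 1: expand the table
           if (_n * 10 / _tables.size() >= 10)
           {
               vector<Node*> newtables(_tables.size() * 2, nullptr);
               Hash hashfun;
               KeyOfT kot;

               for (auto& e : _tables)
               {
                   // Relink every node into the new bucket array; nothing is copied or leaked
                   while (e)
                   {
                       Node* next = e->_next;
                       size_t newi = hashfun(kot(e->_data)) % newtables.size();
                       e->_next = newtables[newi];
                       newtables[newi] = e;
                       e = next;
                   }
               }

               // The old buckets are all empty now; swap in the new array
               _tables.swap(newtables);
           }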

// Default functor: converts the key to size_t; the table then applies the division method (% table size)
template<class K>
struct HashFunc
{
	size_t operator()(const K& key) const
	{
		return (size_t)key;
	}
};

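// Specialization so that the hash table supports string keys
template<>
struct HashFunc<string>
{
	size_t operator()(const string& key) const
	{
		size_t hash = 0;
		for (auto e : key)
		{
			// Multiplying by 31 before adding each character makes the result
			// depend on character order, which reduces collisions
			hash *= 31;
			hash += (unsigned char)e;
		}

		return hash;
	}
};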


// Open addressing below: collisions are resolved with linear probing
namespace open_address
{
    // Enum describing the state of each slot: holds an element, empty, or element deleted
	enum State
	{
		EXIST,
		EMPTY,
		DELETE

	};

	template<class K, class V>
	struct HashData
	{
		pair<K, V> _kv;
		State _state = EMPTY;
	};

	template<class K, class V, class Hash = HashFunc<K>>
	class HashTable
	{
	public:
		HashTable()
			:_n(0)
		{
			_tables.resize(10);
		}

		bool Insert(const pair<K, V>& kv)
		{
			if (Find(kv.first))
			{
				return false;
			}

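			// Load factor has reached 0.7: expand the table
			if (_n * 10 / _tables.size() >= 7)
			{
				// Pass Hash along so a custom hash functor is also used by the new table
				HashTable<K, V, Hash> newtable;
				newtable._tables.resize(_tables.size() * 2);
				for (const auto& e : _tables)
				{
					if (e._state == EXIST)
					{
						newtable.Insert(e._kv);
					}
				}

				// Elements were re-inserted through Insert so they follow the new table's probing; now swap
				_tables.swap(newtable._tables);
			}

			Hash hashfun;
			size_t hashi = hashfun(kv.first) % _tables.size();

			// Probe linearly until a slot that is EMPTY or DELETE is found
			while (_tables[hashi]._state == EXIST)
			{
				hashi++;
				hashi %= _tables.size();
			}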

			_tables[hashi]._kv = kv;
			_tables[hashi]._state = EXIST;
			++_n;
			return true;

		}
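		HashData<K, V>* Find(const K& key)
		{
			Hash hashfun;
			size_t hashi = hashfun(key) % _tables.size();

			// Keep probing past DELETE slots: an element with the same home slot may sit
			// beyond one that was erased. Stop at EMPTY, or after one full pass over the
			// table (needed when every slot is EXIST or DELETE, otherwise this loops forever).
			for (size_t probes = 0; probes < _tables.size() && _tables[hashi]._state != EMPTY; ++probes)
			{
				if (_tables[hashi]._state == EXIST && _tables[hashi]._kv.first == key)
				{
					return &_tables[hashi];
				}
				hashi++;
				hashi %= _tables.size();
			}
			return nullptr;
		}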
		bool Erase(const K& key)
		{
			// Reuse Find, then mark the slot as deleted
			HashData<K, V>* pdata = Find(key);
			if (pdata == nullptr)
			{
				return false;
			}
			pdata->_state = DELETE;
			--_n;
			return true;
		}

	private:
		vector<HashData<K, V>> _tables;
		size_t _n = 0;  // number of elements stored in the table
	};
}

// Hash buckets / separate chaining
namespace hash_bucket
{
	template<class K, class V>
	struct HashNode
	{
		pair<K, V> _kv;
		HashNode<K, V>* _next;
		HashNode(const pair<K, V>& kv)
			:_kv(kv)
			, _next(nullptr)
		{}
	};



	// Hash converts the key to an integer, because the hash function uses the division method
	template<class K, class V, class Hash = HashFunc<K>>
	class HashTable
	{

		typedef HashNode<K, V> Node;
	public:
		HashTable()
		{
			_tables.resize(10, nullptr);
		}

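		// Destroy the hash buckets: free every node in every chain
		~HashTable()
		{
			for (Node* cur : _tables)
			{
				while (cur)
				{
					Node* next = cur->_next;
					delete cur;
					cur = next;
				}
			}
		}

		// The table owns raw nodes, so copying is disabled to avoid double deletion
		HashTable(const HashTable&) = delete;
		HashTable& operator=(const HashTable&) = delete;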

		// Insert kv; if the key already exists, do not insert
		bool Insert(const pair<K, V>& kv)
		{
			if (Find(kv.first))
			{
				return false;
			}
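			Hash hashfun;

			// Load factor has reached 1: expand the table
			if (_n * 10 / _tables.size() >= 10)
			{
				vector<Node*> newtables(_tables.size() * 2, nullptr);

				for (auto& e : _tables)
				{
					// Relink every node into the new bucket array; nothing is copied or leaked
					while (e)
					{
						Node* next = e->_next;
						size_t newi = hashfun(e->_kv.first) % newtables.size();
						e->_next = newtables[newi];
						newtables[newi] = e;
						e = next;
					}
				}

				// The old buckets are all empty now; swap in the new array
				_tables.swap(newtables);
			}

			size_t hashi = hashfun(kv.first) % _tables.size();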

			Node* newnode = new Node(kv);
			newnode->_next = _tables[hashi];
			_tables[hashi] = newnode;


			++_n;
			return true;
		}

		// Look up key in the hash buckets; returns true if it exists, otherwise false
		bool Find(const K& key)
		{
			Hash hashfun;
			size_t hashi = hashfun(key) % _tables.size();

			Node* cur = _tables[hashi];
			while (cur)
			{
				if (cur->_kv.first == key)
				{
					return true;
				}
				cur = cur->_next;
			}
			return false;
		}

		// Remove the element with the given key; returns true on success, otherwise false
		bool Erase(const K& key)
		{
			Hash hashfun;
			size_t hashi = hashfun(key) % _tables.size();

			Node* cur = _tables[hashi];
			Node* parent = nullptr;
			while (cur)
			{
				if (cur->_kv.first == key)
				{
					Node* next = cur->_next;
					if (cur == _tables[hashi])
					{
						_tables[hashi] = next;
					}
					else
					{
						parent->_next = next;
					}

					delete cur;
					--_n;
					return true;
				}
				parent = cur;
				cur = cur->_next;
			}
			return false;
		}

	private:
		vector<Node*> _tables;  // array of bucket head pointers
		size_t _n = 0;			// number of elements stored in the table
	};
}
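
Why does the open-addressing table need a DELETE state instead of resetting an erased slot to EMPTY? The following self-contained sketch illustrates the reason. TinyProbeSet is a hypothetical, fixed-capacity set of ints written only for this demonstration (no resizing, names are not from the code above).

```cpp
#include <cstddef>
#include <vector>

enum class SlotState { Empty, Exist, Deleted };

struct Slot
{
    int key = 0;
    SlotState state = SlotState::Empty;
};

// Hypothetical fixed-capacity set of ints using linear probing with tombstones.
class TinyProbeSet
{
public:
    explicit TinyProbeSet(std::size_t capacity) : _slots(capacity) {}

    bool insert(int key)
    {
        if (contains(key) || _count == _slots.size())
        {
            return false;
        }
        std::size_t i = home(key);
        // Deleted slots are reused; only Exist slots are skipped.
        while (_slots[i].state == SlotState::Exist)
        {
            i = (i + 1) % _slots.size();
        }
        _slots[i].key = key;
        _slots[i].state = SlotState::Exist;
        ++_count;
        return true;
    }

    bool contains(int key) const { return find_index(key) != _slots.size(); }

    bool erase(int key)
    {
        std::size_t i = find_index(key);
        if (i == _slots.size())
        {
            return false;
        }
        // Leave a tombstone so that later keys in the same probe chain stay reachable.
        _slots[i].state = SlotState::Deleted;
        --_count;
        return true;
    }

private:
    std::size_t home(int key) const
    {
        return static_cast<std::size_t>(key) % _slots.size();
    }

    // Returns the slot index of key, or _slots.size() if it is absent.
    std::size_t find_index(int key) const
    {
        std::size_t i = home(key);
        // Bounded to one full pass so a table full of tombstones cannot loop forever.
        for (std::size_t probes = 0; probes < _slots.size(); ++probes)
        {
            if (_slots[i].state == SlotState::Empty)
            {
                break;
            }
            if (_slots[i].state == SlotState::Exist && _slots[i].key == key)
            {
                return i;
            }
            i = (i + 1) % _slots.size();
        }
        return _slots.size();
    }

    std::vector<Slot> _slots;
    std::size_t _count = 0;
};
```

With capacity 10, the keys 4, 14 and 24 occupy slots 4, 5 and 6. After erasing 14, a lookup of 24 still has to walk through slot 5; if that slot had been reset to Empty, the search would stop there and wrongly report 24 as missing.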

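3. Wrapping the unordered containers

Wrapping the unordered containers takes the following steps

1. Implement the hash table

2. Wrap unordered_set and unordered_map, solving the KeyOfT problem (extracting the key from the stored data type)

3. Implement the Iterator

4. Implement operator[]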
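
The KeyOfT problem in step 2 can be illustrated in isolation. In the sketch below, SetKeyOfT, MapKeyOfT and has_key are simplified, hypothetical stand-ins: generic code that only knows the stored type T asks a KeyOfT functor for the key, so the same code works whether T is a bare key or a key-value pair.

```cpp
#include <string>
#include <utility>

// For a set, the stored data is the key itself.
template<class K>
struct SetKeyOfT
{
    const K& operator()(const K& key) const
    {
        return key;
    }
};

// For a map, the stored data is a pair and the key is its first member.
template<class K, class V>
struct MapKeyOfT
{
    const K& operator()(const std::pair<const K, V>& kv) const
    {
        return kv.first;
    }
};

// Generic code (standing in for the hash table) never looks inside T directly;
// it always goes through KeyOfT to obtain the key.
template<class T, class KeyOfT, class K>
bool has_key(const T& data, const K& key)
{
    KeyOfT kot;
    return kot(data) == key;
}
```

The hash table takes KeyOfT as a template parameter for exactly this reason: it hashes and compares kot(data) rather than data.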

3.1. Adapting the hash table

The hash table implemented above is now adapted in two ways: solving the KeyOfT problem and implementing the Iterator.

// Hash buckets / separate chaining
namespace hash_bucket
{
	template<class T>
	struct HashNode
	{
		T _data;
		HashNode<T>* _next;
		HashNode(const T& data)
			:_data(data)
			, _next(nullptr)
		{}
	};

	// Forward declaration of the hash table
	template<class K, class T, class KeyOfT, class Hash>
	class HashTable;

	// Hash table iterator
	template<class K,class T,class Ptr,class Ref,class KeyOfT,class Hash = HashFunc<K>>
	struct HashTableIterator
	{
		typedef HashNode<T> Node;
		typedef HashTable<K, T, KeyOfT,Hash> HashBucket;
		typedef HashTableIterator Self;

		HashTableIterator(Node* node,const HashTable<K, T, KeyOfT,Hash>* pht)
			:_node(node)
			, _pht(pht)
		{

		}


		Self& operator++()
		{
			Hash hashfun;
			KeyOfT kot;
			Node* cur = _node;

			if (_node->_next)
			{
				_node = _node->_next;
			}
			else
			{
				// Current chain is exhausted: move on to the next non-empty bucket
				size_t hashi = hashfun(kot(cur->_data)) % _pht->_tables.size();
				++hashi;

				while (hashi < _pht->_tables.size() && _pht->_tables[hashi] == nullptr)
				{
					++hashi;
				}

				if (hashi >= _pht->_tables.size())
				{
					_node = nullptr;
					return *this;
				}

				_node = _pht->_tables[hashi];

			}

			return  *this;
		}


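		Ref operator*() const
		{
			return _node->_data;
		}
		Ptr operator->() const
		{
			return &_node->_data;
		}

		// The parameter must be a const reference because end() returns a temporary object
		bool operator!=(const Self& ito) const
		{
			return _node != ito._node;
		}

		bool operator==(const Self& ito) const
		{
			return _node == ito._node;
		}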


		Node* _node;
		const HashBucket* _pht;
	};


	// Hash converts the key to an integer, because the hash function uses the division method
	template<class K, class T, class KeyOfT, class Hash = HashFunc<K>>
	class HashTable
	{
	public:
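		typedef HashNode<T> Node;
		// Pass Hash through so that a custom hash functor is also used by the iterators
		typedef HashTableIterator<K, T, T*, T&, KeyOfT, Hash> Iterator;
		typedef HashTableIterator<K, T, const T*, const T&, KeyOfT, Hash> ConstIterator;
		// The iterator reads _tables, so every instantiation is made a friend.
		// The parameter names differ from the class's own to avoid shadowing them.
		template<class K1, class T1, class Ptr1, class Ref1, class KeyOfT1, class Hash1>
		friend struct HashTableIterator;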
	public:
		HashTable()
		{
			_tables.resize(10, nullptr);
		}

		// Destroy the hash buckets: free every node in every chain
		~HashTable()
		{
			size_t hashi = 0;
			Node* cur;
			Node* next;

			while (hashi < _tables.size())
			{
				cur = _tables[hashi];
				while (cur)
				{
					 next = cur->_next;
					delete cur;
					cur = next;
				}


				++hashi;
			}

		}

		Iterator Begin()
		{
			if (_n == 0)
				return End();
			size_t hashi = 0;

			// Skip empty buckets; the bound is < (not <=) so _tables[hashi] is never read out of range
			while (hashi < _tables.size() && _tables[hashi] == nullptr)
			{
				++hashi;
			}

			if (hashi >= _tables.size())
			{
				return Iterator(nullptr, this);
			}
			else
			{
				return Iterator(_tables[hashi],this);
			}
		}

		Iterator End()
		{
			return Iterator(nullptr, this);
		}

		ConstIterator Begin()const
		{
			size_t hashi = 0;

			// Skip empty buckets; the bound is < (not <=) so _tables[hashi] is never read out of range
			while (hashi < _tables.size() && _tables[hashi] == nullptr)
			{
				++hashi;
			}

			if (hashi >= _tables.size())
			{
				return ConstIterator(nullptr, this);
			}
			else
			{
				return ConstIterator(_tables[hashi],this);
			}
		}

		ConstIterator End()const
		{
			return ConstIterator(nullptr, this);
		}


		// Insert data; if an element with the same key already exists, do not insert
		pair<Iterator,bool> Insert(const T& data)
		{
			KeyOfT kot;
			Iterator ret(nullptr,this);
			ret = Find(kot(data));
			if (ret._node != nullptr)
			{
				return make_pair(ret,false);
			}
			Hash hashfun;

			// Load factor has reached 1: expand the table
			if (_n * 10 / _tables.size() >= 10)
			{
				vector<Node*> newtables(_tables.size() * 2, nullptr);

				for (auto& e : _tables)
				{
					// Relink every node into the new bucket array; nothing is copied or leaked
					while (e)
					{
						Node* next = e->_next;
						size_t newi = hashfun(kot(e->_data)) % newtables.size();
						e->_next = newtables[newi];
						newtables[newi] = e;
						e = next;
					}
				}

				// The old buckets are all empty now; swap in the new array
				_tables.swap(newtables);
			}

			size_t hashi = hashfun(kot(data)) % _tables.size();

			Node* newnode = new Node(data);
			newnode->_next = _tables[hashi];
			_tables[hashi] = newnode;
			ret._node = newnode;

			++_n;
			return make_pair(ret,true);
		}

		// Look up key in the hash buckets; returns an iterator to the element, or End() if it is absent
		Iterator Find(const K& key)
		{
			KeyOfT kot;
			Hash hashfun;
			size_t hashi = hashfun(key) % _tables.size();

			Node* cur = _tables[hashi];
			while (cur)
			{
				if (kot(cur->_data) == key)
				{
					return Iterator(cur,this);
				}
				cur = cur->_next;
			}
			return Iterator(nullptr,this);
		}

		// Remove the element with the given key; returns true on success, otherwise false
		bool Erase(const K& key)
		{
			KeyOfT kot;
			Hash hashfun;
			size_t hashi = hashfun(key) % _tables.size();

			Node* cur = _tables[hashi];
			Node* parent = nullptr;
			while (cur)
			{
				if (kot(cur->_data) == key)
				{
					Node* next = cur->_next;
					if (cur == _tables[hashi])
					{
						_tables[hashi] = next;
					}
					else
					{
						parent->_next = next;
					}

					delete cur;
					--_n;
					return true;
				}
				parent = cur;
				cur = cur->_next;
			}
			return false;
		}

	private:
		vector<Node*> _tables;  // array of bucket head pointers
		size_t _n = 0;			// number of elements stored in the table
	};

}

3.2. Upper-layer wrappers

Next, unordered_set and unordered_map are wrapped around the hash table, and unordered_map additionally implements operator[].

3.2.1. Wrapping unordered_set

namespace bit
{
	using namespace hash_bucket;
	template<class K>
	class unordered_set
	{
	public:

		struct setKeyOfT
		{
			const K& operator()(const K& key)
			{
				return key;
			}
		};


		typedef typename HashTable<K,const K, setKeyOfT>::Iterator iterator;
		typedef typename HashTable<K,const K, setKeyOfT>::ConstIterator const_iterator;


		pair<iterator, bool> insert(const K& data)
		{
			return _pht.Insert(data);
		}
		bool erase(const K& key)
		{
			return _pht.Erase(key);
		}
		iterator find(const K& key)
		{
			return _pht.Find(key);
		}

		iterator begin()
		{
			return _pht.Begin();
		}
		iterator end()
		{
			return _pht.End();
		}
		const_iterator begin()const
		{
			return _pht.Begin();
		}
		const_iterator end()const
		{
			return _pht.End();
		}


	private:

		HashTable<K,const K, setKeyOfT> _pht;
	};
}

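3.2.2. Wrapping unordered_map and implementing operator[]

Implementing operator[] depends on how the underlying iterator and Insert are implemented: Insert has to return a pair<Iterator, bool> so that operator[] can reach the element whether it was just inserted or already present.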

namespace bit
{

	template<class K, class V>
	class unordered_map
	{

	public:



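		struct mapKeyOfT
		{
			// The parameter type must match the stored type pair<const K, V> exactly.
			// Taking pair<K, V> would bind to a converted temporary and return a dangling reference.
			const K& operator()(const pair<const K, V>& t)
			{
				return t.first;
			}
		};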


		typedef typename HashTable<K, pair<const K,V>, mapKeyOfT>::Iterator iterator;
		typedef typename HashTable<K, pair<const K, V>, mapKeyOfT>::ConstIterator const_iterator;


		pair<iterator, bool> insert(const pair<K,V>& data)
		{
			return _pht.Insert(data);
		}
		bool erase(const K& key)
		{
			return _pht.Erase(key);
		}
		iterator find(const K& key)
		{
			return _pht.Find(key);
		}
        
        // Relies on Insert returning pair<iterator, bool> and on the iterator's operator->
		V& operator[](const K& key)
		{
			pair<iterator, bool>  pa = insert(make_pair(key, V()));
			return pa.first->second;

		}


		iterator begin()
		{
			return _pht.Begin();
		}
		iterator end()
		{
			return _pht.End();
		}
		const_iterator begin()const
		{
			return _pht.Begin();
		}
		const_iterator end()const
		{
			return _pht.End();
		}
	private:
		
		hash_bucket::HashTable<K, pair<const K,V>, mapKeyOfT> _pht;

	};
}
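
The operator[] shown above is just an insert followed by a dereference. The same pattern can be tried against std::unordered_map, whose insert also returns a pair of iterator and bool. The helper get_or_insert below is a hypothetical name used only for this sketch.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical helper that mirrors the operator[] implementation:
// insert (key, V()) and return a reference to the mapped value.
// If the key already exists, insert changes nothing and the returned
// iterator points at the existing element.
template<class K, class V>
V& get_or_insert(std::unordered_map<K, V>& m, const K& key)
{
    std::pair<typename std::unordered_map<K, V>::iterator, bool> result =
        m.insert(std::make_pair(key, V()));
    return result.first->second;
}
```

Because the bool in the result is ignored, the caller cannot tell an insertion from a lookup, which matches the behaviour of operator[].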