C++ 哈希桶和封装unordered_map和unordered

哈希桶的概念

哈希桶的结构

哈希桶的结点

哈希桶的类

Insert插入函数

Find查找函数

Erase删除函数

哈希的两种仿函数(int) 和(string)

哈希表的改造

编辑

迭代器

改造

unordered_map和unordered_set的封装

前言

上一篇文章讲的哈希表，属于闭散列。可以解决哈希冲突有两种方式：闭散列和开散列。现在要学习的哈希桶就是开散列。

哈希桶的概念

首先对关键码集合用散列函数计算散列地址，具有相同地址的关键码归于同一子集合，每一个子集合称为一个桶，各个桶中的元素通过一个单链表链接起来，各链表的头结点存储在哈希表中。

下面即是哈希桶的简易图

哈希桶的结构

哈希桶的结点

由上图就可以看出来，每个结点必要存一个 pair 和一个指向下一个节点的指针 _next。

template<class T>
struct HashNode
{
	T _data;
	HashNode<T>* _next;
	HashNode(const T& data)
		:_data(data)
		, _next(nullptr)
	{}

};

哈希桶的类

哈希桶类的构成和哈希表类似，都是一个由一个 vector 存放每个节点，但是这里与 HashTable 不同的是需要存放的是节点的指针。还有一个 _n 代表有效数据的个数：

template<class K, class V, class Hash = HashFunc<K>>
class HashBucket
{
    typedef HashNode Node;
private:
    vector<Node*> _ tables;
    size_t _n;
};

Insert插入函数

此时只需要使用 Key 模数组的大小来计算出该节点需要连接在 vector 上的位置，然后使用 new 得到储存 kv 的新节点，当 new 一个新节点时，节点的构造函数必不可少，下面先来看一下单个节点的构造函数以及类的构造函数：

template<class K, class V>
struct HashNode
{
	pair<K, V> _kv;
	HashNode* _next;
	HashNode(const pair<K, V>& kv):_kv(kv), _next(nullptr)
	{}
};
template<class K, class V, class Hash = HashFunc<K>>
class HashBucket
{    
    typedef HashNode<K, V> Node;
    HashBucket()
    {
        _tables.resize(10, nullptr);
        _n = 0;
    }
private:
    vector<Node*> tables;
    size_t _n;
}

这个仿函数HashFunc后面会为大家介绍

在插入的时候，我们是头插还是尾插呢？

很显然，头插的效率要更高；

size_t hashi = hf(kv.first) % _tables.size();
Node* newnode = new Node(kv);

// 头插
newnode->_next = _tables[hashi];
_tables[hashi] = newnode;
++_n;

当插入的值很多的时候我们就要扩容了,这里一直没提到_n的作用是什么

其实_n是负载因子，当负载因子==_tables.size()时就要进行扩容了

扩容逻辑:定义一个新的指针数组,大小扩容到原指针数组大小的2倍，再逐个插入到新数组。

最后交换。

if (_n * 10 / _tables.size() == 1)
{
	vector<Node*> Newtables;
	Newtables.resize(_tables.size() * 2);
	for (int i = 0; i < _tables.size(); i++)
	{
		Node* cur = _tables[i];
		while (cur)
		{
			size_t hashi = hf(kot(cur->_data)) % Newtables.size();
			Node* next = cur->_next;
			cur->_next = Newtables[hashi];
			Newtables[hashi] = cur;
			cur = next;
		}
		_tables[i] = nullptr;

		_tables.swap(Newtables);
	}
}

Find查找函数

Key 模数组的大小来计算出该节点需要连接在 vector 上的位置,找到在数组的位置通过while循环，查找桶内元素是否存在Key值。

Node* Find(const K& key)
		{
			Hash hf;
			size_t hashi = hf(key) % _tables.size();
			Node* cur = _tables[hashi];
			while (cur)
			{
				if (cur->_kv.first == key)
				{
					return cur;
				}

				cur = cur->_next;
			}

			return NULL;
		}

Erase删除函数

根据 Insert 函数中，可以得知， HashBucket 的每个节点都是 new 出来的，那删除的时候就要使用 delete ，又因为每个节点都是自定义类型，所以要为 HashBucket 写一个析构函数。
对类的析构就是遍历 vector 的每个节点，再从每个节点遍历每个链表，以此遍历全部节点。

	~HashTable()
	{
		for (int i = 0; i < _tables.size(); i++)
		{
			Node* cur = _tables[i];
			while (cur)
			{
				Node* next = cur->_next;
				delete cur;
				cur = next;
			}
			_tables[i] = nullptr;
		}
	}

Erase删除思路

1.计算哈希值：使用哈希函数 hs 计算给定键 Key 的哈希值，并确定它在桶中的索引 index。

2.遍历链表：从索引 index 开始，遍历链表中的每个节点。

3.查找节点：检查当前节点的键是否等于 Key。

如果找到匹配节点：

如果该节点是链表的第一个节点，将桶的头指针 _bucket[index] 指向下一个节点。
否则，将前一个节点的 _next 指针指向当前节点的下一个节点。
删除当前节点 cur，释放内存。
返回 true，表示删除成功。

如果没有找到匹配节点，继续遍历链表，更新 prev 和 cur。

返回结果：如果遍历完整个链表未找到匹配节点，返回 false，表示删除失败。

定义一个prev值防止删除指定元素时，找不到上一个元素

bool Erase(const K& key)
		{
			Hash hf;
			size_t hashi = hf(key) % _tables.size();
			Node* prev = nullptr;
			Node* cur = _tables[hashi];
			while (cur)
			{
				if (cur->_kv.first == key)
				{
					if (prev == nullptr)
					{
						_tables[hashi] = cur->_next;
					}
					else
					{
						prev->_next = cur->_next;
					}
					delete cur;

					return true;
				}

				prev = cur;
				cur = cur->_next;
			}

			return false;
		}

哈希的两种仿函数(int) 和(string)

是为了直接获取key值

template<class K>
struct HashFunc
{
	size_t operator()(const K& key)
	{
		return (size_t)key;
	}
};
template<>
struct HashFunc<string>
{
	size_t operator()(const string& key)
	{
		// BKDR
		size_t hash = 0;
		for (auto e : key)
		{
			hash *= 31;
			hash += e;
		}

		//cout << key << ":" << hash << endl;
		return hash;
	}
};

这里对string进行了特化，因为对string类的仿函数不同。

哈希表的改造

unordered_map和unordered_set这两个stl容器，从上图可以看到，他们底层似乎都是使用哈希来实现的。那么在理解了哈希的基础上，不妨尝试一下自己模拟实现一下简单的unordered_map和unordered_set也给之前的哈希结构加上迭代器操作。

unordered_map和unordered_set的基本结构

从C++库中可以看出来，map和set一大区别就在于set只有一个关键字key，而map中存在key和value。因此他们的基本结构可以分别这样设计：其中在原来哈希的基础上又多了一个KeyOfT的仿函数，他的作用主要是确定需要进行比较的关键字，例如map中需要比较的是T中的first，而set就是key

template<class K, class V, class Hash = HashFunc<K>>
class unordered_map
{
	struct MapKeyOfT
	{
		const K& operator()(const pair<K, V>& kv)
		{
			return kv.first;
		}
	};
  private:
	hash_bucket::HashTable<K, pair<const K, V>, MapKeyOfT, Hash> _ht;
};


	template<class K, class Hash = HashFunc<K>>
	class My_Ordered_set
	{
		struct KeyOfT
		{
			const K& operator()(const K& key)
			{
				return key;
			}
		};
      private:
	   hash_bucket::HashTable<K, K, KeyOfT, Hash> _ht;

    };

迭代器

基本框架

template<class K, class T,class Ref,class Ptr, class KeyOfT, class Hash>
struct HashIterator
{
	typedef HashNode<T> Node;
	Node* _node;
	typedef HashIterator<K, T,Ref,Ptr, KeyOfT, Hash> Self;
	const HashTable<K, T, KeyOfT, Hash>* _pht;
	size_t _hashi;
	HashIterator(Node* node, HashTable<K, T, KeyOfT, Hash>* pht, size_t hashi)
		:_node(node)
		, _pht(pht)
		, _hashi(hashi)
	{}

但在此之前要写一个前置声明，因为我们在基本框架里面定义了_pht，使用了哈希桶类，但代码只会向上查找，找不到哈希桶类

编译器根本不认识_pht前面的代码

前置声明

	template<class K, class T, class KeyOfT, class Hash>
	class HashTable;//前置声明

stl中的unordered_map和unordered_set内部实现了迭代器，可以使用迭代器访问其中的内容，这里我们也在哈希中实现一个迭代器操作，然后通过封装实现unordered_map和unordered_set的迭代器，实现代码如下，迭代器里面主要需要注意的是operator++操作。这个操作需要遍历哈希表以及每个点上所挂的链表。因此可以这样考虑：先判断当前链表是否走完，如果没走完就直接往下一个节点走，如果走完了就将hashi++，前往下一个哈希地址，然后开始遍历链表

迭代器完整代码

template<class K, class T, class KeyOfT, class Hash>
class HashTable;//前置声明

template<class K, class T,class Ref,class Ptr, class KeyOfT, class Hash>
struct HashIterator
{
	typedef HashNode<T> Node;
	Node* _node;
	typedef HashIterator<K, T,Ref,Ptr, KeyOfT, Hash> Self;
	const HashTable<K, T, KeyOfT, Hash>* _pht;
	size_t _hashi;
	HashIterator(Node* node, HashTable<K, T, KeyOfT, Hash>* pht, size_t hashi)
		:_node(node)
		, _pht(pht)
		, _hashi(hashi)
	{}
	HashIterator(Node* node, const HashTable<K, T, KeyOfT, Hash>* pht, size_t hashi)
		:_node(node)
		, _pht(pht)
		, _hashi(hashi)
	{}

	Self& operator++()
	{
		if (_node->_next)
		{
			_node = _node->_next;
		}
		else
		{
			_hashi++;
			while (_hashi < _pht->_tables.size())
			{
				if (_pht->_tables[_hashi])
				{
					_node = _pht->_tables[_hashi];
					break;
				}
				_hashi++;
			}
			if (_hashi == _pht->_tables.size())
			{
				_node = nullptr;
			}
		}
		return *this;
	}
	Ref operator*()
	{
		return _node->_data;
	}

	Ptr operator->()
	{
		return &_node->_data;
	}

	bool operator!=(const Self& s)
	{
		return _node != s._node;
	}
};

改造

在完成迭代器之后，可以在原来的哈希表结构体中完善一下insert等参数的返回值，并加入新的begin，end接口函数，代码如下：

template<class K, class T, class KeyOfT, class Hash>
class HashTable
{
	template<class K, class T, class Ref, class Ptr, class KeyOfT, class Hash>
	friend struct HashIterator;
	typedef HashNode<T> Node;
public:
	typedef HashIterator<K, T, T&, T*, KeyOfT, Hash> iterator;
	typedef HashIterator<K, T, const T&, const T*, KeyOfT, Hash> const_iterator;
	iterator begin()
	{
		for (size_t i = 0; i < _tables.size(); i++)
		{
			if (_tables[i])
			{
				return iterator(_tables[i], this, i);
			}
		}

		return end();
	}
	iterator end()
	{
		return iterator(nullptr, this, -1);
	}
	const_iterator begin() const
	{
		for (size_t i = 0; i < _tables.size(); i++)
		{
			if (_tables[i])
			{
				return const_iterator(_tables[i], this, i);
			}
		}

		return end();
	}

	// this-> const HashTable<K, T, KeyOfT, Hash>*
	const_iterator end() const
	{
		return const_iterator(nullptr, this, -1);
	}


	HashTable()
	{
		_tables.resize(10, nullptr);
	}

	// 哈希桶的销毁
	~HashTable()
	{
		for (int i = 0; i < _tables.size(); i++)
		{
			Node* cur = _tables[i];
			while (cur)
			{
				Node* next = cur->_next;
				delete cur;
				cur = next;
			}
			_tables[i] = nullptr;
		}
	}
	pair<iterator, bool>  Insert(const T& data)
	{
		Hash hf;
		KeyOfT kot;

		iterator it = Find(kot(data));
		if (it != end())
			return make_pair(it, false);
		if (_n * 10 / _tables.size() == 1)
		{
			vector<Node*> Newtables;
			Newtables.resize(_tables.size() * 2);
			for (int i = 0; i < _tables.size(); i++)
			{
				Node* cur = _tables[i];
				while (cur)
				{
					size_t hashi = hf(kot(cur->_data)) % Newtables.size();
					Node* next = cur->_next;
					cur->_next = Newtables[hashi];
					Newtables[hashi] = cur;
					cur = next;
				}
				_tables[i] = nullptr;

				_tables.swap(Newtables);
			}
		}
		size_t hashi = hf(kot(data)) % _tables.size();
		Node* newnode = new Node(data);
		newnode->_next = _tables[hashi];
		_tables[hashi] = newnode;

		_n++;
		return make_pair(iterator(newnode, this, hashi), true);
	}
	iterator Find(const K& key)
	{
		Hash hf;
		KeyOfT kot;

		size_t hashi = hf(key) % _tables.size();
		Node* cur = _tables[hashi];
		while (cur)
		{
			if (kot(cur->_data) == key)
			{
				return iterator(cur, this, hashi);
			}

			cur = cur->_next;
		}

		return end();
	}

	// 哈希桶中删除key的元素，删除成功返回true，否则返回false
	bool Erase(const K& key)
	{
		KeyOfT kot;
		Hash hf;
		size_t hashi = hf(kot(key)) % _tables.size();
		Node* cur = _tables[hashi];
		Node* prev = nullptr;
		while (cur)
		{
			if (kot(cur->_data) == key)
			{
				if (prev == nullptr)
				{
					_tables[hashi] = cur->_next;
				}
				else
				{
					prev->_next = cur->_next;
				}

				delete cur;
				--_n;
				return true;
			}

			prev = cur;
			cur = cur->_next;
		}

		return false;
	}

这里写了一个友元函数，是为什么呢？

当我们不写的时候报了一个这样的错误

具体意思就是在迭代器里面我们要访问哈希桶类的私有成员变量，如果没有友元函数就没办法访问私有成员变量。

这里我们Insert返回是pair<iterator,bool>

find返回的是iterator

这也是为了和库里面保持一致

完成了这些之后，接下来就是对unordered_map和unordered_set的封装了

unordered_map和unordered_set的封装

unordered_set的封装

template<class K, class Hash = HashFunc<K>>
class My_Ordered_set
{
	struct KeyOfT
	{
		const K& operator()(const K& key)
		{
			return key;
		}
	};
public:
	typedef typename hash_bucket::HashTable<K, K, KeyOfT, Hash>::const_iterator iterator;
	typedef typename hash_bucket::HashTable<K, K, KeyOfT, Hash>::const_iterator const_iterator;
	const_iterator begin()const
	{
		return _ht.begin();
	}
	const_iterator end()const
	{
		return _ht.end();
	}
	pair<const_iterator, bool> insert(const K& key)
	{
           auto ret= _ht.Insert(key);
		   return pair<const_iterator, bool>(const_iterator(ret.first._node, ret.first._pht, ret.first._hashi), ret.second);
	}
	iterator Find(const K& key)
	{
		return _ht.Find(key);
	}
	bool Erase(const K& key)
	{
		return _ht.Erase(key);
	}
private:
	hash_bucket::HashTable<K, K, KeyOfT, Hash> _ht;

这里需要注意的是，在封装插入函数的时候，我们为什么不写return _ht.insert呢？

这里_ht返回的是iterator迭代器，而封装之后返回的是const_iterator迭代器

这里我们就要写一个将非const转换为const函数了

	HashIterator(Node* node, const HashTable<K, T, KeyOfT, Hash>* pht, size_t hashi)
		:_node(node)
		, _pht(pht)
		, _hashi(hashi)
	{}

这样就构建出const迭代器了

unordered_map的封装

template<class K, class V, class Hash = HashFunc<K>>
class unordered_map
{
	struct MapKeyOfT
	{
		const K& operator()(const pair<K, V>& kv)
		{
			return kv.first;
		}
	};
public:
	typedef typename hash_bucket::HashTable<K, pair<const K, V>, MapKeyOfT, Hash>::iterator iterator;
	iterator begin()
	{
		return _ht.begin();
	}

	iterator end()
	{
		return _ht.end();
	}
	pair<iterator, bool> insert(const pair<K, V>& kv)
	{
		return _ht.Insert(kv);
	}

	V& operator[](const K& key)
	{
		pair<iterator, bool> ret = _ht.Insert(make_pair(key, V()));
		return ret.first->second;
	}

	const V& operator[](const K& key) const
	{
		pair<iterator, bool> ret = _ht.Insert(make_pair(key, V()));
		return ret.first->second;
	}
	iterator find(const K& key)
	{
		return _ht.Find(key);
	}

	bool erase(const K& key)
	{
		return _ht.Erase(key);
	}

private:
	hash_bucket::HashTable<K, pair<const K, V>, MapKeyOfT, Hash> _ht;