Common Methods for Resolving Hash Collisions [with C++ Code]

2025/4/28 17:04:01

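In C++, a hash table is a commonly used data structure that supports fast insertion, deletion, and lookup.

The core of a hash table is its hash function, which converts an input key into an array index. Different keys can map to the same index, however; this situation is called a hash collision.

Resolving collisions efficiently is the key to keeping a hash table fast.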
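To make the idea of a collision concrete, here is a minimal sketch that assumes a plain modulo hash function. The helper names `slotFor` and `collides` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: maps a non-negative key to a slot by taking the
// remainder of the key divided by the table size.
std::size_t slotFor(int key, std::size_t tableSize) {
    assert(tableSize > 0 && key >= 0);
    return static_cast<std::size_t>(key) % tableSize;
}

// Two different keys collide when they land in the same slot.
bool collides(int a, int b, std::size_t tableSize) {
    return a != b && slotFor(a, tableSize) == slotFor(b, tableSize);
}
```

With a table of 10 slots, keys 15 and 25 both map to slot 5, so `collides(15, 25, 10)` is true. The methods below differ only in what they do once that happens.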

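1. Open Addressing

Concept: When the slot a key maps to is already occupied, open addressing probes for the next free slot and stores the key there. A lookup follows the same probe sequence: if the key is not in its home slot, the following candidate slots are checked in the same order until the key is found or an empty slot is reached.

Advantages: Simple to implement; all entries live in the table itself, so no auxiliary data structure is needed.

Disadvantages: Occupied slots tend to cluster into dense runs, which lengthens probe sequences and hurts performance. Deletion is also complicated, because simply emptying a slot can cut off the probe chain of keys that were stored past it.

Code example: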

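#include <cstddef>
#include <vector>

class OpenAddressingHashTable {
public:
    explicit OpenAddressingHashTable(std::size_t size) : table(size, -1), used(size, false) {}

    // Returns false when the table has no free slot left.
    bool insert(int key) {
        if (table.empty()) return false;
        std::size_t index = key % table.size();
        // Probe at most table.size() slots so a full table cannot loop forever.
        for (std::size_t probes = 0; probes < table.size(); ++probes) {
            if (!used[index]) {
                table[index] = key;
                used[index] = true;
                return true;
            }
            index = (index + 1) % table.size(); // linear probing
        }
        return false;
    }

    bool search(int key) const {
        if (table.empty()) return false;
        std::size_t index = key % table.size();
        // Stop at the first free slot, or after one full pass over the table.
        for (std::size_t probes = 0; probes < table.size() && used[index]; ++probes) {
            if (table[index] == key) return true;
            index = (index + 1) % table.size();
        }
        return false;
    }

private:
    std::vector<int> table;
    std::vector<bool> used;
};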
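Deletion is the awkward part of open addressing: clearing a slot outright would make later keys in the same probe run unreachable. One common workaround is to mark deleted slots with a "tombstone" that lookups skip over and inserts may reuse. The sketch below is one possible way to do that, not part of the example above; the class name `TombstoneHashTable` and the three-state `Slot` enum are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch: linear probing with tombstones so erase() does not break probe chains.
class TombstoneHashTable {
private:
    enum class Slot : unsigned char { Empty, Full, Deleted };

public:
    explicit TombstoneHashTable(std::size_t size) : keys(size, 0), state(size, Slot::Empty) {
        assert(size > 0); // the size is used as the modulus
    }

    // Returns false if the key is already present or the table is full.
    bool insert(int key) {
        if (contains(key)) return false;
        std::size_t index = home(key);
        for (std::size_t i = 0; i < keys.size(); ++i) {
            if (state[index] != Slot::Full) { // Empty and Deleted slots are reusable
                keys[index] = key;
                state[index] = Slot::Full;
                return true;
            }
            index = (index + 1) % keys.size();
        }
        return false;
    }

    bool contains(int key) const { return find(key) != keys.size(); }

    // Marks the slot as Deleted instead of Empty, leaving a tombstone behind.
    bool erase(int key) {
        std::size_t index = find(key);
        if (index == keys.size()) return false;
        state[index] = Slot::Deleted;
        return true;
    }

private:
    std::size_t home(int key) const { return static_cast<std::size_t>(key) % keys.size(); }

    // Returns the slot holding key, or keys.size() if the key is absent.
    std::size_t find(int key) const {
        std::size_t index = home(key);
        for (std::size_t i = 0; i < keys.size(); ++i) {
            if (state[index] == Slot::Empty) break; // only a truly empty slot ends the chain
            if (state[index] == Slot::Full && keys[index] == key) return index;
            index = (index + 1) % keys.size();
        }
        return keys.size();
    }

    std::vector<int> keys;
    std::vector<Slot> state;
};
```

In a table of 5 slots, keys 1, 6, and 11 all share home slot 1 and occupy slots 1, 2, and 3. After `erase(6)`, a lookup for 11 still succeeds because the tombstone in slot 2 does not stop the probe. The trade-off is that tombstones accumulate, so a long-lived table would likely need to be rebuilt from time to time.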

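2. Separate Chaining (Hash Buckets)

Concept: Separate chaining attaches a linked list to each array slot; every element that maps to a slot is stored in that slot's list.

Advantages: Efficient when collisions are few; chains grow dynamically, so the table never runs out of room; deletion is simple.

Disadvantages: When a chain grows long, a lookup degrades toward a linear scan of that chain.

Code example: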

#include <cstddef>
#include <iostream>
#include <list>
#include <string>
#include <utility>
#include <vector>

class HashBucket {
public:
    // size must be at least 1, since it is used as the modulus.
    explicit HashBucket(std::size_t size = 10) : buckets(size) {}

    // Inserts a key/value pair; an existing key has its value overwritten.
    void insert(int key, const std::string& value) {
        auto& bucket = buckets[hashFunction(key)];
        for (auto& pair : bucket) {
            if (pair.first == key) {
                pair.second = value;
                return;
            }
        }
        bucket.push_back({key, value});
    }

    // Returns the stored value, or "Not Found" if the key is absent.
    std::string search(int key) const {
        for (const auto& pair : buckets[hashFunction(key)]) {
            if (pair.first == key) {
                return pair.second;
            }
        }
        return "Not Found";
    }

    void remove(int key) {
        // std::list::remove_if erases the matching nodes in place.
        buckets[hashFunction(key)].remove_if(
            [key](const std::pair<int, std::string>& p) { return p.first == key; });
    }

private:
    std::size_t hashFunction(int key) const {
        return key % buckets.size(); // simple modulo hash function
    }

    std::vector<std::list<std::pair<int, std::string>>> buckets;
};

int main() {
    HashBucket hashTable;

    hashTable.insert(10, "Apple");
    hashTable.insert(25, "Banana");
    hashTable.insert(20, "Cherry"); // collides with key 10 in bucket 0

    std::cout << "Search 10: " << hashTable.search(10) << std::endl; // should print Apple
    std::cout << "Search 30: " << hashTable.search(30) << std::endl; // should print Not Found

    hashTable.remove(20);
    std::cout << "Search 20 after removal: " << hashTable.search(20) << std::endl; // should print Not Found

    return 0;
}
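Long chains are usually kept in check by growing the bucket array once the load factor (stored elements divided by bucket count) passes some threshold. The following is a minimal sketch of that idea for a set of integers. The class name `ResizingChainedSet`, the threshold of 1.0, and the doubling policy are all assumptions chosen for illustration, not something the example above prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Sketch: a chained hash set that doubles its bucket count whenever the
// load factor (elements / buckets) would exceed an assumed threshold of 1.0.
class ResizingChainedSet {
public:
    explicit ResizingChainedSet(std::size_t bucketCount = 4) : buckets(bucketCount) {
        assert(bucketCount > 0); // the bucket count is used as the modulus
    }

    // Returns false if the key is already stored.
    bool insert(int key) {
        if (contains(key)) return false;
        if (count + 1 > buckets.size()) grow(); // keep the load factor <= 1.0
        buckets[indexFor(key, buckets.size())].push_back(key);
        ++count;
        return true;
    }

    bool contains(int key) const {
        for (int k : buckets[indexFor(key, buckets.size())]) {
            if (k == key) return true;
        }
        return false;
    }

    std::size_t size() const { return count; }
    std::size_t bucketCount() const { return buckets.size(); }

private:
    static std::size_t indexFor(int key, std::size_t n) {
        return static_cast<std::size_t>(key) % n;
    }

    // Doubles the bucket array and redistributes every stored key,
    // since each key's index depends on the bucket count.
    void grow() {
        std::vector<std::list<int>> bigger(buckets.size() * 2);
        for (const auto& bucket : buckets) {
            for (int k : bucket) {
                bigger[indexFor(k, bigger.size())].push_back(k);
            }
        }
        buckets.swap(bigger);
    }

    std::vector<std::list<int>> buckets;
    std::size_t count = 0;
};
```

Starting from 2 buckets and inserting ten distinct keys, this sketch grows to 4, then 8, then 16 buckets. Each growth step costs a full pass over the stored keys, but because the bucket count doubles, that cost is spread thinly across the inserts that triggered it.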

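3. Rehashing

Concept: When a collision occurs, a second hash function computes another position. If that position is also occupied, a third or further hash function is tried, and so on until a free slot is found.

Advantages: Reduces clustering.

Disadvantages: Several hash functions have to be designed, which adds implementation complexity.

Code example (simplified):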

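#include <cstddef>
#include <vector>

class RehashHashTable {
public:
    // size must be at least 1, since it is used as the modulus.
    explicit RehashHashTable(std::size_t size) : table(size, 0), used(size, false) {}

    // Returns false if both candidate slots are already taken.
    bool insert(int key) {
        std::size_t index = primaryHash(key);
        if (isOccupied(index)) {
            index = secondaryHash(key); // the second hash function
            // A fuller implementation would try further hash functions here
            // until it finds a free slot.
            if (isOccupied(index)) return false;
        }
        table[index] = key;
        used[index] = true;
        return true;
    }

private:
    // Both hash functions below are illustrative choices, not fixed by the method.
    std::size_t primaryHash(int key) const { return static_cast<std::size_t>(key) % table.size(); }
    std::size_t secondaryHash(int key) const { return (static_cast<std::size_t>(key) / table.size()) % table.size(); }
    bool isOccupied(std::size_t index) const { return used[index]; }

    std::vector<int> table;
    std::vector<bool> used;
};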
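Instead of writing a separate third and fourth hash function, a common variant called double hashing uses the second hash as a step size and probes the slots h1, h1 + h2, h1 + 2*h2, and so on, all taken modulo the table size. The sketch below shows that variant rather than the exact scheme described above. The class name `DoubleHashingTable`, both hash formulas, and the requirement that the table size be prime are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch: double hashing. The i-th probe is (h1 + i * h2) % m, where h2 is
// never zero. With a prime m, every step size visits all m slots.
class DoubleHashingTable {
public:
    // Assumption: primeSize is a prime number (this sketch does not check primality).
    explicit DoubleHashingTable(std::size_t primeSize)
        : table(primeSize, 0), used(primeSize, false) {
        assert(primeSize >= 2);
    }

    // Returns false if the key is already present or the table is full.
    bool insert(int key) {
        std::size_t slot = locate(key);
        if (slot == table.size() || used[slot]) return false;
        table[slot] = key;
        used[slot] = true;
        return true;
    }

    bool contains(int key) const {
        std::size_t slot = locate(key);
        return slot != table.size() && used[slot];
    }

private:
    // Returns the slot holding key, else the first free slot on its probe
    // sequence, else table.size() when the whole sequence is occupied.
    std::size_t locate(int key) const {
        const std::size_t m = table.size();
        const std::size_t k = static_cast<std::size_t>(key);
        const std::size_t start = k % m;          // primary hash
        const std::size_t step = 1 + k % (m - 1); // secondary hash, in [1, m - 1]
        for (std::size_t i = 0; i < m; ++i) {
            std::size_t index = (start + i * step) % m;
            if (!used[index] || table[index] == key) return index;
        }
        return m;
    }

    std::vector<int> table;
    std::vector<bool> used;
};
```

In a table of 7 slots, keys 3, 10, and 17 all start at slot 3 but use step sizes 4, 5, and 6, so they end up in slots 3, 1, and 2 instead of forming one contiguous run. This is how a second hash function reduces clustering. The sketch omits deletion, which would need the same care as in any open-addressing scheme.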