Source Code Deep Dive (1): HashMap (JDK 8)

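- 1. Usage example
- 2. new HashMap<>() analysis
  - 2.1 Load factor
  - 2.2 Constructor
- 3. put() analysis
  - 3.1 The public put(k, v)
  - 3.2 Computing the hash
    - 1) Why hash a second time?
    - 2) Worked example of the second hash
    - 3) Why (length-1)&hash instead of hash%length?
    - 4) Why is the HashMap array length a power of two?
  - 3.3 putVal() analysis
    - 1) Backing array, treeify threshold, untreeify threshold
    - 2) putVal() analysis
    - 3) resize() analysis
      - 3.1) threshold (resize threshold)
      - 3.2) Initial capacity
      - 3.3) Modification count
      - 3.4) Maximum capacity
      - 3.5) resize() analysis
      - 3.6) Why is the HashMap load factor 0.75?
      - 3.7) Why does HashMap convert a list longer than 8 into a red-black tree?
    - 4) Creating a new node
    - 5) Unimplemented methods
    - 6) treeifyBin() analysis
      - 6.1) Minimum treeify capacity
      - 6.2) treeifyBin() analysis
      - 6.3) Replacement tree node
- 4. get() analysis
  - 4.1 The public get(k)
  - 4.2 getNode() analysis
- 5. Supplement
  - 5.1 Node<k,v> singly linked list diagram
  - 5.2 Node<k,v> UML class diagram
  - 5.3 Detailed analysis of table resizing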

1. Usage example

import java.util.HashMap;
import java.util.Map;

public class MapExample {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("key", "1");
        String value = map.get("key");
        System.out.println("value=" + value);
    }
}

Output:

value=1

2. new HashMap<>() analysis

2.1 Load factor

/**
 * The load factor used when none specified in constructor.
 * ---------------------------
 * In short: this is the load factor the no-argument constructor uses.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

2.2 Constructor

/**
 * Constructs an empty <tt>HashMap</tt> with the default initial capacity
 * (16) and the default load factor (0.75).
 * ---------------------------
 * In short: only loadFactor is set here. The table array itself is not
 * allocated until the first put() calls resize().
 */
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

3. put() analysis

3.1 The public put(k, v)

/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 * ---------------------------
 * In short: binds the value to the key. If the key was already mapped,
 * the old value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 * ---------------------------
 * In short: the return value is the previous value for the key, or null
 * if there was none. A null return is ambiguous, because the key may
 * also have been explicitly mapped to null.
 */
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

3.2 Computing the hash

/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower.  Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.)  So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
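 * ---------------------------
 * In short: key.hashCode() is computed and its high 16 bits are XORed
 * into the low 16 bits. The table index is taken with a power-of-two
 * mask, so hash codes that differ only in bits above the mask would
 * otherwise always collide (for example, Float keys holding consecutive
 * whole numbers in a small table). One shift and one XOR is the cheapest
 * way to let the high bits influence the index; large collision sets
 * are handled by treeified bins anyway.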
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

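The expression (h = key.hashCode()) ^ (h >>> 16) XORs the hashCode with a copy of itself shifted right (unsigned) by 16 bits.
When an entry is added, the key's hashCode is XORed with its own high 16 bits (the "second hash") to produce the hash that HashMap actually uses, which reduces hash collisions.

1) Why hash a second time?

The array index is not computed as hash%length but as (length-1)&hash, a bitwise AND of the hash with the array length minus 1 (see putVal()). The table is usually much shorter than 2^16, so the high bits of length-1 are all 0 and (length-1)&hash effectively uses only the low bits of the hashCode. To let the whole hashCode take part and spread the keys more evenly, HashMap applies (h = key.hashCode()) ^ (h >>> 16), which folds the high 16 bits into the low 16 bits.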
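To make the effect concrete, here is a minimal sketch. The class SpreadDemo and its helper names are made up for this illustration; only the two expressions inside them come from the HashMap source. It feeds in hash codes that differ only in their high 16 bits and compares the slot with and without the second hash, for a table of length 16.

```java
class SpreadDemo {
    // Same transform as HashMap.hash(), applied to a raw hash code.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Same index expression as putVal(): (n - 1) & hash.
    static int indexFor(int hash, int tableLength) {
        return (tableLength - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16; // default table length
        for (int i = 1; i <= 4; i++) {
            int raw = i << 16; // hash codes that differ only in the high 16 bits
            System.out.println("0x" + Integer.toHexString(raw)
                    + " raw index=" + indexFor(raw, n)
                    + " spread index=" + indexFor(spread(raw), n));
        }
    }
}
```

Without the second hash all four values land in slot 0; with it they land in slots 1 to 4.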

2) Worked example of the second hash

String key = "my name is suser";
int hashcode = key.hashCode(); // 284986057 in decimal
String binaryString = Integer.toBinaryString(hashcode); // decimal to binary
System.out.println(binaryString);

Output:

10000111111001000101011001001

Now look at the value of h >>> 16:

int hashcode2 = hashcode >>> 16;
String binaryString2 = Integer.toBinaryString(hashcode2);
System.out.println(binaryString2);

Output:

1000011111100

Left-pad hashcode and hashcode2 with zeros to 32 bits and compare the two binary numbers:

0  0  0  1  0  0  0  0  1  1  1  1  1  1  0  0         1  0  0  0  1  0  1  0  1  1  0  0  1  0  0  1
0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0         0  0  0  1  0  0  0  0  1  1  1  1  1  1  0  0

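As the second row shows, the unsigned shift moves the high 16 bits of h into the low 16 positions and fills the top with zeros. XOR the two rows: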

int newHashcode = hashcode ^ (hashcode >>> 16);
String newBinaryString = Integer.toBinaryString(newHashcode);
System.out.println(newHashcode);
System.out.println(newBinaryString);

Output:

284990005
10000111111001001101000110101

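3) Why (length-1)&hash instead of hash%length?

In Java, % is an integer division, which costs noticeably more than a single bitwise AND. When length is a power of two, (length-1)&hash selects the same slot as a non-negative modulo, and unlike %, it never yields a negative index for a negative hash.

4) Why is the HashMap array length a power of two?

With a power-of-two length, length-1 is a mask of all 1 bits, so every low bit of the hash takes part in the index and the entries can spread evenly over the whole array, with no slot left unreachable.

It also makes resizing cheap. During a resize, hash&oldCap decides where each entry goes: if the result == 0 the entry keeps its index; if it is != 0 the entry moves to index + oldCap, in the newly added half. A power of two has exactly one bit set to 1 in binary, so the AND tests a single bit of the hash. For well-distributed hashes that bit is 0 about half the time, which splits the old contents roughly evenly across the new array.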
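A small sketch of the difference, assuming a table length of 16. MaskVsModulo and its method names are invented for this example; Math.floorMod is the standard library method for a non-negative remainder.

```java
class MaskVsModulo {
    // The expression HashMap uses.
    static int maskIndex(int hash, int length) {
        return (length - 1) & hash;
    }

    // The "obvious" alternative.
    static int moduloIndex(int hash, int length) {
        return hash % length;
    }

    public static void main(String[] args) {
        int length = 16; // must be a power of two for the mask to be valid
        for (int hash : new int[] {17, 31, -5}) {
            System.out.println(hash
                    + " mask=" + maskIndex(hash, length)
                    + " %=" + moduloIndex(hash, length)
                    + " floorMod=" + Math.floorMod(hash, length));
        }
    }
}
```

For 17 and 31 all three agree (1 and 15). For -5 the mask and floorMod give 11, while % gives -5, which could not be used as an array index.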
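The split rule can be checked with a few numbers. This is only a sketch with an invented class name (SplitDemo); it compares the hash&oldCap rule with recomputing the index directly against the doubled table.

```java
class SplitDemo {
    // New index according to the resize rule: keep the index or add oldCap.
    static int newIndex(int hash, int oldCap) {
        int oldIndex = (oldCap - 1) & hash;
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        // All four hashes share old index 5 in a table of length 16.
        for (int hash : new int[] {5, 21, 37, 53}) {
            int direct = (2 * oldCap - 1) & hash; // index in the doubled table
            System.out.println(hash + " -> rule=" + newIndex(hash, oldCap)
                    + " direct=" + direct);
        }
    }
}
```

Hashes 5 and 37 stay at index 5, while 21 and 53 move to 5 + 16 = 21, and both ways of computing the index agree.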

3.3 putVal() analysis

1) Backing array, treeify threshold, untreeify threshold

/**
 * The table, initialized on first use, and resized as
 * necessary. When allocated, length is always a power of two.
 * (We also tolerate length zero in some operations to allow
 * bootstrapping mechanics that are currently not needed.)
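 * ------------------------------
 * In short: the table array is allocated on first use and resized when
 * necessary; once allocated, its length is always a power of two.
 * (Length zero is also tolerated in some operations.)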
 */
transient Node<K,V>[] table;

/**
 * The bin count threshold for using a tree rather than list for a
 * bin.  Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
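 * ------------------------------
 * In short: a bin switches from a linked list to a tree when an element
 * is added to a bin that already holds at least this many nodes. The
 * value must be greater than 2 and should be at least 8.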
 */
static final int TREEIFY_THRESHOLD = 8;

/**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
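 * ------------------------------
 * In short: when a resize splits a tree bin and a resulting half holds
 * at most this many nodes, that half is converted back to a plain linked
 * list. Should be less than TREEIFY_THRESHOLD, and at most 6.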
 */
static final int UNTREEIFY_THRESHOLD = 6;

2) putVal() analysis

/**
 * Implements Map.put and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 * ------------------------------
 * In short: onlyIfAbsent=true keeps an existing non-null value instead
 * of overwriting it; evict is only passed on to afterNodeInsertion().
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // Has the table array been initialized yet?
    if ((tab = table) == null || (n = tab.length) == 0)
        // First use: with the defaults, resize() allocates length 16 (see 3.2) Initial capacity)
        n = (tab = resize()).length;
    // Compute the array index as (length - 1) & hash and read that slot
    if ((p = tab[i = (n - 1) & hash]) == null)
        // Empty slot: create a new node there (see 4) Creating a new node)
        tab[i] = newNode(hash, key, value, null);
    else {
        // The slot already holds a node
        Node<K,V> e; K k;
        // Same hash as the first node in the bin ...
        if (p.hash == hash &&
            // ... and the same key (by reference or by equals())
            ((k = p.key) == key || (key != null && key.equals(k))))
            // Remember the node so its value can be replaced below
            e = p;
        // The bin is a red-black tree (TreeNode)
        else if (p instanceof TreeNode)
            // Insert into the tree; the logic is involved, read it if you are curious
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        // Otherwise the bin is a singly linked list
        else {
            // Walk the list; binCount counts the nodes visited so far
            for (int binCount = 0; ; ++binCount) {
                // Reached the last node
                if ((e = p.next) == null) {
                    // Tail insertion: append the new node
                    p.next = newNode(hash, key, value, null);
                    // binCount >= 7 means the bin held 8 nodes before this insert
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        // Treeify the bin (or resize, if the table is still small)
                        treeifyBin(tab, hash);
                    break;
                }
                // Found an existing node with the same key: stop, e points to it
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // A mapping for the key already exists
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            // Overwrite unless onlyIfAbsent is set and the old value is non-null
            if (!onlyIfAbsent || oldValue == null)
                // Update the node's value
                e.value = value;
            // Empty hook in HashMap (see 5) Unimplemented methods)
            afterNodeAccess(e);
            return oldValue;
        }
    }
    // Count one more structural modification of this HashMap
    ++modCount;
    // Resize if the new size exceeds the threshold
    if (++size > threshold)
        // (see 3.5) resize() analysis)
        resize();
    // Empty hook in HashMap (see 5) Unimplemented methods)
    afterNodeInsertion(evict);
    return null;
}
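The return value and the onlyIfAbsent flag can be observed from outside through the public API. The sketch below uses an invented class name (PutSemantics); put() and putIfAbsent() are the real Map methods, and in JDK 8 HashMap.putIfAbsent() goes through putVal() with onlyIfAbsent set to true.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PutSemantics {
    // Collects what each call returns, in order.
    static List<String> demo() {
        Map<String, String> map = new HashMap<>();
        List<String> results = new ArrayList<>();
        results.add(map.put("k", "1"));         // no previous mapping
        results.add(map.put("k", "2"));         // replaces "1" and returns it
        results.add(map.putIfAbsent("k", "3")); // keeps "2" and returns it
        results.add(map.get("k"));              // still "2"
        return results;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [null, 1, 2, 2]
    }
}
```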

3) resize() analysis

3.1) threshold (resize threshold)

/**
 * The next size value at which to resize (capacity * load factor).
 *
 * @serial
 */
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
// ------------------------------
// In short: normally this is capacity * load factor. Before the table is
// allocated it holds the requested initial capacity instead, and 0 means
// "use DEFAULT_INITIAL_CAPACITY".
int threshold;

3.2) Initial capacity

/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

3.3) Modification count

/**
 * The number of times this HashMap has been structurally modified
 * Structural modifications are those that change the number of mappings in
 * the HashMap or otherwise modify its internal structure (e.g.,
 * rehash).  This field is used to make iterators on Collection-views of
 * the HashMap fail-fast.  (See ConcurrentModificationException).
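 * ------------------------------
 * In short: modCount changes whenever the number of mappings or the
 * internal structure changes (for example, on rehash). Iterators over the
 * collection views use it to detect such changes and fail fast.
 * (See ConcurrentModificationException.)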
 */
transient int modCount;

3.4) Maximum capacity

/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 * ------------------------------
 * In short: a constructor argument larger than this value is clamped to
 * it. It must be a power of two no greater than 1<<30, that is
 * 2^30 = 1073741824.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;

3.5) resize() analysis

/**
 * Initializes or doubles table size.  If null, allocates in
 * accord with initial capacity target held in field threshold.
 * Otherwise, because we are using power-of-two expansion, the
 * elements from each bin must either stay at same index, or move
 * with a power of two offset in the new table.
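 * ------------------------------
 * In short: initializes the table or doubles its length. If the table is
 * null, it is allocated using the initial capacity target held in the
 * threshold field. Otherwise, because growth is by powers of two, the
 * elements of each bin either stay at the same index or move by a
 * power-of-two offset in the new table.
 *
 * @return the table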
 */
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
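    // Current capacity (0 if the table is not allocated yet)
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    // Current resize threshold
    int oldThr = threshold;
    int newCap, newThr = 0;
    // The table already exists, so this call grows it
    if (oldCap > 0) {
        // Already at (or above) the maximum capacity
        if (oldCap >= MAXIMUM_CAPACITY) {
            // Stop resizing: raise the threshold to Integer.MAX_VALUE (2147483647)
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // Double the capacity; if the doubled capacity is below the maximum
        // and the old capacity is at least the default initial capacity (16)
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            // double the threshold as well
            newThr = oldThr << 1; // double threshold
    }
    // No table yet, but an initial capacity was stored in threshold
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        // First initialization with defaults: newCap=16, newThr=(int)(0.75 * 16)=12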
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    // Store the new threshold
    threshold = newThr;
    // Allocate the new array with the new capacity and assign it to table
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    // If there was an old table, move its entries over
    if (oldTab != null) {
        // Iterate over the old table
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            // Skip empty bins
            if ((e = oldTab[j]) != null) {
                // Clear the slot in the old table
                oldTab[j] = null;
                // The bin holds a single node
                if (e.next == null)
                    // Place it directly at its index in the new table
                    newTab[e.hash & (newCap - 1)] = e;
                // The bin is a red-black tree (TreeNode)
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                // Otherwise the bin is a singly linked list
                else { // preserve order
                    // lo* is the low list (index unchanged); loTail is used for tail insertion
                    Node<K,V> loHead = null, loTail = null;
                    // hi* is the high list (index + oldCap); hiTail is used for tail insertion
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // == 0: append to the low list (index unchanged).
                        // The test uses oldCap rather than (length - 1) because it
                        // is not computing an index. oldCap is a power of two, so
                        // the AND isolates the single hash bit that the doubled
                        // mask adds, and that bit sorts the nodes into two groups.
                        // (See 5.3 Detailed analysis of table resizing.)
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        // != 0: append to the high list (index becomes index + old capacity)
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    // Continue until the end of the old list
                    } while ((e = next) != null);
                    // Put the low list at the same index
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    // Put the high list at index + old capacity
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
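To get a feel for when resize() fires, here is a simplified simulation. It is not the JDK code: the class ResizeSchedule is invented, it assumes the default capacity and load factor, and it ignores MAXIMUM_CAPACITY and the extra resizes that treeifyBin() can trigger. It only mirrors the `++size > threshold` check in putVal() and the doubling of capacity and threshold.

```java
class ResizeSchedule {
    static final int DEFAULT_INITIAL_CAPACITY = 16;
    static final float DEFAULT_LOAD_FACTOR = 0.75f;

    // Table capacity after inserting `size` distinct keys (size >= 1),
    // under the simplifying assumptions stated above.
    static int capacityFor(int size) {
        int cap = DEFAULT_INITIAL_CAPACITY;
        int thr = (int) (DEFAULT_LOAD_FACTOR * cap); // 12
        for (int s = 1; s <= size; s++) {
            if (s > thr) {   // same check as "++size > threshold"
                cap <<= 1;   // double the capacity
                thr <<= 1;   // double the threshold
            }
        }
        return cap;
    }

    public static void main(String[] args) {
        for (int size : new int[] {12, 13, 24, 25, 49}) {
            System.out.println(size + " entries -> capacity " + capacityFor(size));
        }
    }
}
```

Under these assumptions the 13th entry grows the table to 32, the 25th to 64, and the 49th to 128.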

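3.6) Why is the HashMap load factor 0.75?

The load factor trades space for time.

Too large: the table fills up before it is resized, so hash collisions become more frequent and the linked lists grow longer.

Too small: the table is resized early and much of the array stays empty, which wastes space.

The HashMap class documentation describes the default of 0.75 as a good tradeoff between time and space costs.

3.7) Why does HashMap convert a list longer than 8 into a red-black tree?

According to the implementation notes in the HashMap source, with well-distributed hashCodes the number of nodes per bin roughly follows a Poisson distribution (parameter about 0.5 at the default load factor of 0.75), and the probability of a bin reaching 8 nodes is about 0.00000006. Long lists are therefore rare, and a tree node takes about twice the space of a regular node, so a bin is only treeified once it gets that long. As for lookup cost, a list of 8 nodes takes about 8/2=4 comparisons on average and a list of 6 takes about 6/2=3, while a red-black tree of 8 nodes needs about log2(8)=3.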
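The Poisson figures can be reproduced with a few lines. This is a sketch with an invented class name (BinSizeProbability); the parameter 0.5 is the value given in the implementation notes of the HashMap source, and the formula is the standard Poisson probability exp(-lambda) * lambda^k / k!.

```java
class BinSizeProbability {
    // Poisson probability of exactly k nodes in a bin.
    static double poisson(double lambda, int k) {
        double p = Math.exp(-lambda);
        for (int i = 1; i <= k; i++) {
            p *= lambda / i; // builds lambda^k / k! step by step
        }
        return p;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 8; k++) {
            System.out.println(String.format("%d: %.8f", k, poisson(0.5, k)));
        }
    }
}
```

The probability drops from about 0.61 for an empty bin to roughly 6 in 100 million for a bin of 8 nodes.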

4) Creating a new node

// Create a regular (non-tree) node, as opposed to a TreeNode
Node<K,V> newNode(int hash, K key, V value, Node<K,V> next) {
    return new Node<>(hash, key, value, next);
}

5) Unimplemented methods

// Callbacks to allow LinkedHashMap post-actions
void afterNodeAccess(Node<K,V> p) { }
void afterNodeInsertion(boolean evict) { }
void afterNodeRemoval(Node<K,V> p) { }
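These hooks are empty in HashMap but overridden by LinkedHashMap, which is how it maintains its ordering. A small LRU cache is a common way to see them at work. In the sketch below the class name LruCache and the limit of 2 entries are made up for illustration; the three-argument constructor (accessOrder = true) and removeEldestEntry() are the real LinkedHashMap API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: reads move an entry to the end
        this.maxEntries = maxEntries;
    }

    // Consulted after each insertion; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" becomes the most recently used entry
        cache.put("c", 3); // evicts "b", the least recently used entry
        System.out.println(cache.keySet()); // [a, c]
    }
}
```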

6) treeifyBin() analysis

6.1) Minimum treeify capacity

/**
 * The smallest table capacity for which bins may be treeified.
 * (Otherwise the table is resized if too many nodes in a bin.)
 * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
 * between resizing and treeification thresholds.
 * ------------------------------
 * In short: below this table capacity, a bin with too many nodes causes
 * a resize instead of being treeified. Should be at least
 * 4 * TREEIFY_THRESHOLD.
 */
static final int MIN_TREEIFY_CAPACITY = 64;

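6.2) treeifyBin() analysis

/**
 * Replaces all linked nodes in bin at index for given hash unless
 * table is too small, in which case resizes instead.
 * ------------------------------
 * In short: converts every node of the bin that the hash maps to into a
 * tree node, unless the table is too small, in which case the table is
 * resized instead.
 */
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    // Table not allocated, or shorter than the minimum treeify capacity
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        // Initialize or grow the table instead of treeifying
        resize();
    // The bin must not be empty
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        // hd is the head and tl the tail of the new TreeNode list
        TreeNode<K,V> hd = null, tl = null;
        do {
            // Replace the plain node with a TreeNode
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                // Link the TreeNodes as a doubly linked list
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        // Continue until the end of the list
        } while ((e = e.next) != null);
        if ((tab[index] = hd) != null)
            // The actual tree construction; the logic is involved, read it if you are curious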
            hd.treeify(tab);
    }
}

6.3) Replacement tree node

// For treeifyBin
TreeNode<K,V> replacementTreeNode(Node<K,V> p, Node<K,V> next) {
    return new TreeNode<>(p.hash, p.key, p.value, next);
}
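The treeify path is hard to observe without reflection, but it is easy to exercise. The sketch below uses a hypothetical key class, BadKey, whose hashCode() is constant, so every entry lands in the same bin. Per the code above, the map first answers the overlong bin with resize() while the table is shorter than 64 and only then builds a tree. The example does not inspect the internals; it only shows that lookups stay correct along the way.

```java
import java.util.HashMap;
import java.util.Map;

class CollidingKeys {
    // Hypothetical key type: all instances collide on purpose.
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        // Comparable lets a treeified bin order keys that share a hash.
        @Override public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    // Inserts `count` colliding keys and returns how many are found again.
    static int lookupAll(int count) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < count; i++) {
            map.put(new BadKey(i), i);
        }
        int hits = 0;
        for (int i = 0; i < count; i++) {
            Integer v = map.get(new BadKey(i));
            if (v != null && v == i) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(lookupAll(100)); // 100
    }
}
```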

4. get() analysis

4.1 The public get(k)

/**
 * Returns the value to which the specified key is mapped,
 * or {@code null} if this map contains no mapping for the key.
 *
 * <p>More formally, if this map contains a mapping from a key
 * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
 * key.equals(k))}, then this method returns {@code v}; otherwise
 * it returns {@code null}.  (There can be at most one such mapping.)
 *
 * <p>A return value of {@code null} does not <i>necessarily</i>
 * indicate that the map contains no mapping for the key; it's also
 * possible that the map explicitly maps the key to {@code null}.
 * The {@link #containsKey containsKey} operation may be used to
 * distinguish these two cases.
 *
 * @see #put(Object, Object)
 */
public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

4.2 getNode() analysis

/**
 * Implements Map.get and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @return the node, or null if none
 */
final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
    // The table is allocated, non-empty, and the slot holds a node
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {
        // Is the first node the one we want?
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        // Is there a next node?
        if ((e = first.next) != null) {
            // The bin is a red-black tree (TreeNode)
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            // Otherwise walk the singly linked list
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
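Because getNode() returns null for a missing key and get() returns e.value for a found one, a null result from get() is ambiguous, as the Javadoc of get() points out. A short sketch of telling the two cases apart with containsKey(); the class NullValueDemo and the method describe() are invented for this example.

```java
import java.util.HashMap;
import java.util.Map;

class NullValueDemo {
    // Distinguishes "mapped to null" from "no mapping at all".
    static String describe(Map<String, String> map, String key) {
        String value = map.get(key);
        if (value != null) {
            return "mapped to " + value;
        }
        return map.containsKey(key) ? "mapped to null" : "absent";
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("a", null);
        map.put("b", "1");
        System.out.println(describe(map, "a")); // mapped to null
        System.out.println(describe(map, "b")); // mapped to 1
        System.out.println(describe(map, "c")); // absent
    }
}
```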