Go语言入门之Map详解

1.基础定义

map是一个无序的，k-v键值对格式的集合

（1）特点

类型特点：map为引用类型，所以在函数中更新value值会永久改变
顺序特点：map的遍历是无序的，因为底层是哈希表，哈希表无序
初始化使用：0值或者未初始化值为nil，未初始化不可赋值使用，否则直接panic
键值对：key和value总是成对出现，key是唯一的，可以是任何可比较类型
动态性：可以在运行时动态地增加或删除键值对，而不需要预先声明大小
快速查找：map提供了快速查找、插入和删除操作，平均时间复杂度O(1)
并发：非线程安全的，保证安全需要加锁

（2）定义声明

var name map[key_type]value_type

name：变量名
key_type：键的类型
value_type：值的类型

// 方式一
var m map[int]string = map[int]string{}

// 方式二
m := map[int]string{
    1 : "老一",
    2 : "老二",
    3 : "老三",
}

// 方式三：5代表容量，也就是在内存中占用多大的空间，可以省略
m := make(map[int]string,5)

2.基本使用

（1）添加元素

1.最常见的就是通过字面量声明map的时候进行添加，如上方式2
2.其次是直接给指定键设置对应的值

mapName[key] = value

// 假设map名为m，key为int，value为string
m[5] = "老五"

（2）删除元素

根据键删除元素，删除不存在的key也不会报错

delete(mapName, key)  

// 假设map名为m，key为int，value为string
delete(m, 5)

（3）修改元素

修改直接修改指定键对应的值就可以

mapName[key] = newValue 

// 假设map名为m，key为int，value为string
m[5] = "五"

（4）获取元素

根据键获取值，ok 为是否找到的标志位，类型为布尔

如果未找到值，不会报错，会返回对应类型的空值

value, ok := mapName[key]
if !ok {
		fmt.Println(ok)
	}

（5）遍历所有元素

注意：map的遍历是无序的

for key, value := range myMap {
   // 处理每对键值
}

// 例子
for i, s := range m {
		fmt.Println(i, s)
	}

3.底层原理

golang语言的map底层本质是用hashmap进行实现的，所以map本质上是哈希表。

哈希表是一种使用哈希函数组织数据，以支持快速插入和搜索的数据结构。

哈希函数，又名散列函数，是一种将任意长度的输入（如字符串）通过特定的散列算法，变换成固定长度的输出的函数。通常会使用类似数组的形式来存储哈希值，从而保证哈希值的访问性能。

如果输入的范围超出映射输出的范围，就可能会导致不同的输入得到相同的输出，这就是哈希冲突。

解决这种问题通常两种方式：开放地址法和拉链法

（1）map的实现方案

开放地址法：

通常使用数组实现数据结构

1.首先数组是由不同的哈希值组成，称为哈希表
2.然后进行很多键，进行哈希函数确认地址放入相应位置，如果哈希表的这个槽位已经被占用，使用探测序列（如线性探测、二次探测或双重散列等）来找到下一个可用的槽位，并将冲突的键存储在那里。

缺点：

这种方法要求更多的空间来解决冲突，因为不仅要存储数据，还需要额外的空间来解决碰撞。

拉链法（go语言的map使用了该方法）：

通常使用数组和链表作为底层数据结构

1.首先数组是由不同的哈希值组成，称为哈希表，哈希表每个槽位都将存储一个链表
2.然后会进来许多键，不同键进行hash后，求模算出hash值，链接到数组上，哈希值相同的情况（哈希碰撞），新进来的键就会挂在已有键链表的后面
3.当需要查找特定键时，首先使用哈希函数确定其位置，然后在该位置的链表上进行线性搜索，直到找到匹配的键或者达到链表的末尾。

数组不同索引处链接的链表也被称之为桶(Bucket)

（2）map的底层结构

hmap

type hmap struct {
 count     int // 当前哈希表中的元素数量，即键值对数量，可用内置函数len()获取
 flags     uint8  // 标志位，标记map状态和属性的字段，如正在迭代等状态
 B         uint8  // 表示哈希表桶（buckets）的数量为2的B次方
 noverflow uint16 // 溢出桶的大致数量，扩容时会用到
 hash0     uint32 // 哈希种子，对key做哈希是加入种子计算哈希值，确保map安全性

 buckets    unsafe.Pointer // 存储桶数组的指针
 oldbuckets unsafe.Pointer // 扩容时用于保存旧桶数组的指针 , 大小为新桶数组的一半
 nevacuate  uintptr        // 扩容时的迁移进度器，迁移桶下标小于此值说明完成迁移

 extra *mapextra // 溢出桶的指针，指向mapextra结构体，用于存储一些额外的字段和信息
}

// mapextra 处理桶溢出的结构体
type mapextra struct {
 overflow    *[]*bmap    // 溢出桶数组指针，仅当key和elem非指针时才使用
 oldoverflow *[]*bmap    // 旧的溢出桶数组指针，仅当key和elem非指针时才使用

 nextOverflow *bmap      // 下一个可用的溢出桶地址
}

bmap

在源码中，bmap类型只有一个tophash字段。但在编译时期，Go编译器会根据用户代码自动注入相应的key，value等结构

表面的bmap

type bmap struct {
	// tophash generally contains the top byte of the hash value
	// for each key in this bucket. If tophash[0] < minTopHash,
	// tophash[0] is a bucket evacuation state instead.
	tophash [bucketCnt]uint8
	// Followed by bucketCnt keys and then bucketCnt elems.
	// NOTE: packing all the keys together and then all the elems together makes the
	// code a bit more complicated than alternating key/elem/key/elem/... but it allows
	// us to eliminate padding which would be needed for, e.g., map[int64]int8.
	// Followed by an overflow pointer.
}

实际的bmap

// 编译期间会动态地创建一个新的结构：
type bmap struct {
   topbits  [8]uint8   // 这里存储哈希值的高八位，用于在确定key的时候快速试错，加快增删改查寻址效率,有时候也叫tophash
   keys     [8]keytype   // 存储key的数组，这里bmap最多存储8个键值对
   elems   [8]valuetype    // 存储value的数组，这里bmap也最多存储8个键值对
   ...
   overflow uintptr     // 溢出桶指针
}

map底层图解

在这里插入图片描述

map的扩容

在go语言中，map的扩容是自动进行的，用于维护map的性能

首先，map在写入时会通过runtime.mapassign判断是否需要扩容

func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
...
 // If we hit the max load factor or we have too many overflow buckets,
 // and we're not already in the middle of growing, start growing.
 if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {
  hashGrow(t, h)
  goto again // Growing the table invalidates everything, so try again
 }
...
}

// overLoadFactor reports whether count items placed in 1<<B buckets is over loadFactor.
func overLoadFactor(count int, B uint8) bool {
 return count > bucketCnt && uintptr(count) > loadFactorNum*(bucketShift(B)/loadFactorDen)
}

func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
 if B > 15 {
  B = 15
 }
 return noverflow >= uint16(1)<<(B&15)
}

根据上面代码判断扩容有下面两个条件：

负载因子超过阈值6.5：overLoadFactor(h.count+1, h.B) ，负载因子 = 元素数量÷桶数量
使用了太多溢出桶（超出32768）：tooManyOverflowBuckets(h.noverflow, h.B))

扩容方式：

增量扩容：

当负载因子过大时，就新建一个bucket，新的bucket长度是原来的2倍，然后旧bucket数据搬迁到新的bucket。

等量扩容

数据不多，但是溢出桶太多。扩容时buckets数量不变，重新做一遍类似增量扩容的搬迁动作，把松散的键值对重新排列一次，以使bucket的使用率更高，进而保证更快的存取

扩容步骤：

1.新桶数组：新建一个大小为原来两倍的新的桶数组,map会标记为扩容状态。

func hashGrow(t *maptype, h *hmap) {
 ...
 // 原有桶设置给oldbuckets
 oldbuckets := h.buckets   
 // 创建新桶
 newbuckets, nextOverflow := makeBucketArray(t, h.B+bigger, nil)

 flags := h.flags &^ (iterator | oldIterator)
 if h.flags&iterator != 0 {
  flags |= oldIterator
 }
 // commit the grow (atomic wrt gc)
 h.B += bigger
 h.flags = flags
 h.oldbuckets = oldbuckets
 h.buckets = newbuckets
 h.nevacuate = 0
 h.noverflow = 0

 ...
}

2.重新哈希：用oldbuckets指向原来的桶数组，buckets指向新的桶数组，遍历旧的桶数组中的所有键值对，并使用哈希函数重新计算每个键的位置，将它们插入到新的桶数组中。

// 这个是mapdelete函数中的处理迁移的位置
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {
...
 if h.growing() {
  // 
  growWork(t, h, bucket)
 }
...
}

3.逐步迁移：为了避免在扩容时暂停整个程序，Go的Map实现可能会选择渐进式驱逐进行迁移键值对。这意味着在扩容期间，旧的桶数组和新的桶数组会同时存在，新插入的键值对会直接放入新的桶中，而对旧桶的访问会触发迁移操作。

// 进入后是一个简单的判断，之后的evacuate是核心逻辑处理，特别多，感兴趣自己看源码
func growWork(t *maptype, h *hmap, bucket uintptr) {
	// make sure we evacuate the oldbucket corresponding
	// to the bucket we're about to use
	evacuate(t, h, bucket&h.oldbucketmask())

	// evacuate one more oldbucket to make progress on growing
	if h.growing() {
		evacuate(t, h, h.nevacuate)
	}
}