[C++11] Concurrency Support Library

🌈 Home page: Zfox_
🔥 Series: C++ from Beginner to Mastery

Preface: 🚀 Concurrency Support Library

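🧑‍💻 This section assumes you have already read the Linux multithreading posts. We do not explain concurrency from scratch; we assume you already know the basics of processes and threads. This chapter therefore focuses on how to use the library and does not cover process and thread concepts or fundamentals.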

1: 🔥 The thread library

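thread library documentation: https://zh.cppreference.com/w/cpp/thread/thread and https://legacy.cplusplus.com/reference/thread/thread/
Under the hood, the thread library wraps each platform's native threading library, such as pthread on Linux and the Win32 thread API on Windows. The first feature of the C++11 thread library is therefore portability. The second is that, while the native Linux and Windows APIs are procedural, C++11 thread is an object-oriented library that also uses C++11 language features such as move semantics (rvalue references) and variadic templates, which makes it more pleasant to use.
There are 4 constructors listed below. The one used most often is (2): you pass a callable object and its arguments. Compared with pthread_create, you are no longer limited to a function pointer, and argument passing is easier too: with pthread_create, passing several arguments means packing them into a struct and passing a pointer to that struct.
You can also combine (1) and (4) to create a thread: move-construct or move-assign an rvalue thread object into another thread object.
(3) shows that thread objects cannot be copied.
join blocks the calling thread (usually the main thread) until the joined thread finishes. Without it, the process ends as soon as the main thread returns, and worker threads that are still running are killed. In addition, destroying a thread object that is still joinable calls std::terminate.
class thread::id is a nested class of thread that represents a thread id. It supports comparison and stream insertion (operator<<), and std::hash is specialized for it so that it can be used as a key in unordered_map and unordered_set. Underneath, thread is still a wrapper around each platform's thread API, and each platform represents thread ids with a different type, so the id has to be wrapped in a class. A thread object returns its id through get_id; inside the running thread, this_thread::get_id() returns the id of the current thread.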

default (1)
thread() noexcept;

initialization (2)
template <class Fn, class... Args>
explicit thread (Fn&& fn, Args&&... args);

copy [deleted] (3)
thread (const thread&) = delete;
copy [deleted] (3)
thread& operator= (const thread&) = delete;

move (4)
thread (thread&& x) noexcept;
move (4)
thread& operator= (thread&& rhs) noexcept;

// pthread library
int pthread_create(pthread_t *tidp, const pthread_attr_t *attr, void * (*start_rtn)(void*), void *arg);

// Windows thread-creation API
HANDLE CreateThread(
	LPSECURITY_ATTRIBUTES lpThreadAttributes,  // security descriptor
	SIZE_T dwStackSize,                        // initial stack size
	LPTHREAD_START_ROUTINE lpStartAddress,     // thread function
	LPVOID lpParameter,                        // thread argument
	DWORD dwCreationFlags,                     // creation option
	LPDWORD lpThreadId                         // thread identifier
);

void join();

#include<iostream>
#include<thread>
#include<vector>
#include<mutex>

using namespace std;

void Print(int n, int i)
{
	for (; i < n; i++)
	{
		cout << this_thread::get_id() << ":" << i << endl;
	}
	cout << endl;
}

int main()
{
	thread t1(Print, 10, 0);
	thread t2(Print, 20, 10);

	// Get the thread ids
	//cout << t1.get_id() << endl;
	//cout << t2.get_id() << endl;

	t1.join();
	t2.join();

	// Get the id of the currently running thread
	cout << this_thread::get_id() << endl;

	return 0;
}
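The points above about move-only thread objects and thread::id can be sketched as follows. This is a minimal illustration, not part of the original post: the helper names count_distinct_thread_ids and moved_from_thread_is_empty are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical helper: starts `count` threads, records each thread's id,
// and returns how many distinct ids were seen.
std::size_t count_distinct_thread_ids(std::size_t count)
{
	std::mutex mtx;                           // protects ids
	std::unordered_set<std::thread::id> ids;  // relies on std::hash<std::thread::id>
	std::vector<std::thread> workers;

	for (std::size_t i = 0; i < count; ++i)
	{
		// Constructor (2): any callable works, here a lambda
		std::thread t([&mtx, &ids] {
			mtx.lock();
			ids.insert(std::this_thread::get_id());
			mtx.unlock();
		});
		// thread is move-only, so ownership is moved into the vector
		workers.push_back(std::move(t));
	}

	// All threads are joined only after all of them were created,
	// so no id can have been reused in the meantime.
	for (auto& w : workers)
		w.join();

	return ids.size();
}

// Hypothetical helper: after a move, the source object no longer owns a thread.
bool moved_from_thread_is_empty()
{
	std::thread a([] {});
	std::thread b(std::move(a));  // move constructor (4)
	assert(b.joinable());
	bool empty = !a.joinable() && a.get_id() == std::thread::id();
	b.join();
	return empty;
}
```

Calling count_distinct_thread_ids(4) from main should report 4 distinct ids, and moved_from_thread_is_empty() should return true.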

2: 🔥 this_thread

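https://legacy.cplusplus.com/reference/thread/this_thread/
this_thread is a namespace that groups 4 free functions acting on the current thread.
get_id returns the id of the currently executing thread.
yield voluntarily gives up the current thread's time slice so that other threads can run first. The exact behavior depends on the implementation, in particular on the OS scheduler in use and on the state of the system. For example, a first-in-first-out real-time scheduler (SCHED_FIFO on Linux) suspends the current thread and puts it at the back of the queue of ready threads with the same priority; if no other thread has that priority, yield has no effect.
sleep_for blocks the current thread for at least the specified sleep_duration. Because of scheduling or resource-contention delays, it may block for longer than sleep_duration.
sleep_until blocks the current thread until the specified sleep_time is reached. Because of scheduling or resource-contention delays, it may return at some point after sleep_time.
https://legacy.cplusplus.com/reference/chrono/ chrono is the header (and namespace std::chrono) for time utilities.
https://legacy.cplusplus.com/reference/chrono/duration/ is the class that represents a relative time span.
https://legacy.cplusplus.com/reference/chrono/time_point/ is the class that represents an absolute point in time.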

template <class Clock, class Duration>
void sleep_until (const chrono::time_point<Clock, Duration>& abs_time);

template <class Rep, class Period>
void sleep_for (const chrono::duration<Rep, Period>& rel_time);

this_thread::sleep_for example

#include <iostream> 	// std::cout, std::endl
#include <thread> 		// std::this_thread::sleep_for
#include <chrono> 		// std::chrono::seconds

int main()
{
	std::cout << "countdown:\n";
	for (int i = 10; i > 0; --i) {
		std::cout << i << std::endl;
		std::this_thread::sleep_for(std::chrono::seconds(1));
	}
	std::cout << "Lift off!\n";
	return 0;
}

this_thread::sleep_until example

#include <iostream> // std::cout
#include <iomanip> 	// std::put_time
#include <thread> 	// std::this_thread::sleep_until
#include <chrono> 	// std::chrono::system_clock
#include <ctime> 	// std::time_t, std::tm, std::localtime, std::mktime

int main()
{
	using std::chrono::system_clock;
	std::time_t tt = system_clock::to_time_t(system_clock::now());

	struct std::tm* ptm = std::localtime(&tt);
	std::cout << "Current time: " << std::put_time(ptm, "%X") << '\n';

	std::cout << "Waiting for the next minute to begin...\n";
	++ptm->tm_min; ptm->tm_sec = 0;
	std::this_thread::sleep_until(system_clock::from_time_t(mktime(ptm)));

	std::cout << std::put_time(ptm, "%X") << " reached!\n";

	return 0;
}
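The sleep functions already have examples above; yield does not, so here is a minimal sketch of a polling wait that yields on every iteration. The helper name wait_with_yield and the 20 ms of simulated work are assumptions for illustration. std::atomic is used for the flag and is covered later in this post.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: waits for a worker by polling a flag and yielding
// the CPU on every iteration instead of spinning at full speed.
int wait_with_yield()
{
	std::atomic<bool> done(false);
	int result = 0;

	std::thread worker([&done, &result] {
		std::this_thread::sleep_for(std::chrono::milliseconds(20));  // simulated work
		result = 42;
		done.store(true);  // publish the result (sequentially consistent store)
	});

	// Poll the flag; yield lets other runnable threads go first
	while (!done.load())
		std::this_thread::yield();

	worker.join();
	assert(done.load());
	return result;
}
```

Whether yield actually helps depends on the scheduler, as noted above; for longer waits a condition variable is usually the better tool.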


3: 🔥 mutex

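https://legacy.cplusplus.com/reference/mutex/
mutex is a class that wraps a mutual-exclusion lock and is used to protect shared data in a critical section. Its main interface is lock and unlock. mutex offers exclusive, non-recursive ownership semantics:
- The calling thread owns the mutex from the moment it successfully calls lock or try_lock until it calls unlock.
- While a thread owns the mutex, any other thread that tries to acquire it blocks (when calling lock) or receives false (when calling try_lock).
If a mutex is destroyed while still owned by any thread, or a thread terminates while owning a mutex, the behavior is undefined.
Example 1 shows how to use mutex. Note that when an argument should reach the callable by reference, it must be passed as ref(obj). The reason is that thread is a wrapper around the system's thread API: after the thread constructor receives the parameter pack, it still has to call the thread-creation API, so it packs the arguments into a single structure and passes that along. During packing the arguments are copied into the structure. std::ref wraps an argument in a std::reference_wrapper; the wrapper itself is copied, and it converts back to a reference to the original object when the callable is invoked, which is what makes pass-by-reference work. Example 2 is an excerpt from the thread source in VS2019 to help illustrate this. https://legacy.cplusplus.com/reference/functional/ref/?kw=ref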

Example 1:

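#include <iostream>
#include <chrono>
#include <functional>	// std::ref
#include <thread>
#include <mutex>

using namespace std;

void Print(int n, int& rx, mutex& rmtx)
{
	rmtx.lock();
	for (int i = 0; i < n; i++)
	{
		// t1 t2
		++rx;
	}
	rmtx.unlock();
}

int main()
{
	int x = 0;
	mutex mtx;
	// ref() is required here so that the threads receive references to x and mtx;
	// see the analysis of the thread source code below for the reason
	thread t1(Print, 1000000, ref(x), ref(mtx));
	thread t2(Print, 2000000, ref(x), ref(mtx));

	t1.join();
	t2.join();
	cout << x << endl;
	return 0;
}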

int main()
{
	int x = 0;
	mutex mtx;

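	// Rewriting the code above with a lambda that captures the outer objects means
	// no arguments have to be passed, which sidesteps the problem described above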
	auto Print = [&x, &mtx](size_t n) {
		mtx.lock();
		for (size_t i = 0; i < n; i++)
		{
			++x;
		}
		mtx.unlock();
	};

	thread t1(Print, 1000000);
	thread t2(Print, 2000000);

	t1.join();
	t2.join();

	cout << x << endl;

	return 0;
}

Example 2 (excerpt from the thread source in VS2019):

template <class _Fn, class... _Args,
	enable_if_t<!is_same_v<_Remove_cvref_t<_Fn>, thread>, int> = 0>
_NODISCARD_CTOR explicit thread(_Fn&& _Fx, _Args&&... _Ax) {
	_Start(_STD forward<_Fn>(_Fx), _STD forward<_Args>(_Ax)...);
}

template <class _Fn, class... _Args>
void _Start(_Fn&& _Fx, _Args&&... _Ax) {
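	// As shown below, to call the system thread API the parameter pack is packed into
	// one tuple object that is handed to the new thread. The values the thread sees are
	// therefore copies of the arguments we passed, which is why ref() is needed to pass
	// by reference.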
    using _Tuple                 = tuple<decay_t<_Fn>, decay_t<_Args>...>;
    auto _Decay_copied           = _STD make_unique<_Tuple>(_STD forward<_Fn>(_Fx), _STD forward<_Args>(_Ax)...);
    constexpr auto _Invoker_proc = _Get_invoke<_Tuple>(make_index_sequence<1 + sizeof...(_Args)>{});
    // pointer or reference to potentially throwing function passed to
	// extern C function under -EHc. Undefined behavior may occur
	// if this function throws an exception. (/Wall)

    _Thr._Hnd =
        reinterpret_cast<void*>(_CSTD _beginthreadex(nullptr, 0, _Invoker_proc, _Decay_copied.get(), 0, &_Thr._Id));

    if (_Thr._Hnd) { // ownership transferred to the thread
        (void) _Decay_copied.release();
    } else { // failed to start thread
        _Thr._Id = 0;
        _Throw_Cpp_error(_RESOURCE_UNAVAILABLE_TRY_AGAIN);
    }
}
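The copy-versus-reference behavior described above can be checked with a short sketch. The function names increment, increment_through_ref and plain_argument_is_copied are hypothetical and only serve this illustration.

```cpp
#include <cassert>
#include <functional>
#include <thread>

void increment(int& target) { ++target; }

// With std::ref the thread function works on the caller's object.
// (Passing x without std::ref does not compile for an int& parameter.)
int increment_through_ref()
{
	int x = 0;
	std::thread t(increment, std::ref(x));
	t.join();
	return x;
}

// Without std::ref the argument is decay-copied into the thread's storage:
// the callable sees a different object than the caller's x.
bool plain_argument_is_copied()
{
	int x = 0;
	bool same_object = true;
	std::thread t([&x, &same_object](const int& v) { same_object = (&v == &x); }, x);
	t.join();
	assert(x == 0);
	return !same_object;
}
```

increment_through_ref() should return 1, and plain_argument_is_copied() should return true because v refers to the internal copy.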

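timed_mutex is just like mutex, except that it additionally provides try_lock_for and try_lock_until. These are similar to try_lock, but they do not return immediately when the mutex is held: the caller blocks until it acquires the lock (returning true) or until the duration elapses or the deadline is reached (returning false).
recursive_mutex is also like mutex, except that it offers exclusive, recursive ownership semantics:
- The calling thread owns the recursive_mutex from the moment it successfully calls lock or try_lock. During this period the same thread may make additional calls to lock or try_lock. Ownership ends when the thread has made a matching number of unlock calls.
- While a thread owns the recursive_mutex, all other threads that try to acquire it block (when calling lock) or receive false (when calling try_lock).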
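The recursive ownership rules above can be sketched as follows. The Counter class and run_recursive_counter are hypothetical names introduced only for this illustration; with a plain mutex, the nested lock in add_twice would be undefined behavior.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical class: add_twice holds the lock and then calls add,
// which locks the same recursive_mutex again on the same thread.
class Counter
{
public:
	void add(int n)
	{
		mtx_.lock();
		value_ += n;
		mtx_.unlock();
	}

	void add_twice(int n)
	{
		mtx_.lock();    // first level of ownership
		add(n);         // second level: allowed for recursive_mutex
		add(n);
		mtx_.unlock();  // ownership ends with the matching unlock
	}

	int value()
	{
		mtx_.lock();
		int v = value_;
		mtx_.unlock();
		return v;
	}

private:
	std::recursive_mutex mtx_;
	int value_ = 0;
};

int run_recursive_counter()
{
	Counter c;
	auto work = [&c] {
		for (int i = 0; i < 1000; ++i)
			c.add_twice(1);
	};
	std::thread t1(work);
	std::thread t2(work);
	t1.join();
	t2.join();
	assert(c.value() % 2 == 0);
	return c.value();
}
```

Two threads each call add_twice(1) 1000 times, so run_recursive_counter() should return 4000.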

timed_mutex::try_lock_for example

#include <iostream> // std::cout
#include <chrono>	// std::chrono::milliseconds
#include <thread>	// std::thread
#include <mutex>	// std::timed_mutex

std::timed_mutex mtx;

void fireworks(int i)
{
	//std::cout << i;
	// waiting to get a lock: each thread prints "-" every 1000ms:
	while (!mtx.try_lock_for(std::chrono::milliseconds(1000)))
	{
		std::cout << "-";
	}
	std::cout << i;

	// got a lock! - wait for 5s, then this thread prints "*"
	std::this_thread::sleep_for(std::chrono::milliseconds(5000));
	std::cout << "*\n";
	mtx.unlock();
}

int main()
{
	std::thread threads[2];

	// Use move assignment to move each temporary (rvalue) thread object
	// into an already-created empty thread object
	for (int i = 0; i < 2; ++i)
		threads[i] = std::thread(fireworks, i);

	for (auto& th : threads)
		th.join();

	return 0;
}

4: 🔥 lock_guard

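🐳 lock_guard is the class C++11 provides for managing a mutex in RAII style, which is a more effective way to prevent deadlocks caused by exceptions and similar early exits. Its principle is roughly the same as the LockGuard class simulated in Example 1 below.
🐳 lock_guard is deliberately simple: it only manages a lock object in RAII style. It can also take over a mutex that is already locked if you pass the adopt_lock object (of type adopt_lock_t) to the constructor. lock_guard does not support copy construction.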

Example 1

#include <iostream>
#include <chrono>
#include <thread>
#include <mutex>

using namespace std;

template<class Mutex>
class LockGuard
{
public:
	LockGuard(Mutex& mtx)
		:_mtx(mtx)
	{
		_mtx.lock();
	}

	~LockGuard()
	{
		_mtx.unlock();
	}
private:
	Mutex& _mtx;
};

int main()
{
	int x = 0;
	mutex mtx;

	auto Print = [&x, &mtx](size_t n) {
		//lock_guard<mutex> lock(mtx);
		LockGuard<mutex> lock(mtx);

		//mtx.lock();
		for (size_t i = 0; i < n; i++)
		{
			++x;
		}
		//mtx.unlock();
		};

	thread t1(Print, 1000000);
	thread t2(Print, 2000000);

	t1.join();
	t2.join();

	cout << x << endl;
	return 0;
}

locking (1)
explicit lock_guard (mutex_type& m);

adopting (2)
lock_guard (mutex_type& m, adopt_lock_t tag);

copy [deleted](3)
lock_guard (const lock_guard&) = delete;


#include <iostream>
#include <chrono>
#include <thread>
#include <mutex>

using namespace std;

std::mutex mtx; // mutex for critical section

void print_thread_id(int id) 
{
	mtx.lock();
	std::lock_guard<std::mutex> lck(mtx, std::adopt_lock);
	std::cout << "thread #" << id << '\n';
}

int main()
{
	std::thread threads[10];

	// spawn 10 threads:
	for (int i = 0; i < 10; ++i)
		threads[i] = std::thread(print_thread_id, i + 1);

	for (auto& th : threads) th.join();

	return 0;
}

📚 In the code above:

lock_guard (mutex_type& m, adopt_lock_t tag);

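🦅 std::adopt_lock_t is a tag type that tells std::lock_guard or std::unique_lock that the mutex is already held by the current thread: the guard must not lock it again and only needs to release it automatically at the end of the scope.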
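The claim that lock_guard protects against exceptions can be sketched as below. TracedMutex is a hypothetical wrapper, written only for this illustration, that records whether the underlying mutex is currently held; lock_guard accepts it because it provides lock and unlock.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Hypothetical wrapper around std::mutex that remembers whether it is held.
// Only used from one thread here, so the plain bool is sufficient.
class TracedMutex
{
public:
	void lock() { mtx_.lock(); held_ = true; }
	void unlock() { held_ = false; mtx_.unlock(); }
	bool held() const { return held_; }

private:
	std::mutex mtx_;
	bool held_ = false;
};

void risky_update(TracedMutex& m, bool fail)
{
	std::lock_guard<TracedMutex> guard(m);  // locks in the constructor
	assert(m.held());
	if (fail)
		throw std::runtime_error("update failed");
	// guard's destructor unlocks on normal return and during stack unwinding
}

bool released_after_exception()
{
	TracedMutex m;
	try
	{
		risky_update(m, true);
	}
	catch (const std::runtime_error&)
	{
		// the exception left risky_update while the guard was alive
	}
	return !m.held();
}
```

released_after_exception() should return true. With manual lock/unlock calls, the throw would skip the unlock and the mutex would stay locked.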

5: 🔥 unique_lock

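🐳 unique_lock is another C++11 class that manages a mutex in RAII style; compared with lock_guard it is richer and more complex. Documentation for unique_lock: https://legacy.cplusplus.com/reference/mutex/unique_lock/
🦈 First, unique_lock accepts different tags at construction, which select how the mutex is handled at that point.
Second, unique_lock can be constructed with a duration or a time point, for use with the timed_mutex family; these constructors call try_lock_for and try_lock_until.
unique_lock does not support copy construction or copy assignment, but it does support move construction and move assignment.
unique_lock also provides the lock / try_lock / unlock family of member functions.
unique_lock can report whether it currently owns the lock through operator bool.
- This means it can be tested directly in an if statement.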

default (1)
unique_lock() noexcept;

locking (2)
explicit unique_lock (mutex_type& m);

try-locking (3)
unique_lock (mutex_type& m, try_to_lock_t tag);

deferred (4)
unique_lock (mutex_type& m, defer_lock_t tag) noexcept;

adopting (5)
unique_lock (mutex_type& m, adopt_lock_t tag);

locking for (6)
template <class Rep, class Period>
unique_lock (mutex_type& m, const chrono::duration<Rep,Period>& rel_time);

locking until (7)
template <class Clock, class Duration>
unique_lock (mutex_type& m, const chrono::time_point<Clock,Duration>& abs_time);

copy [deleted] (8)
unique_lock (const unique_lock&) = delete;

move (9)
unique_lock (unique_lock&& x);
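A few of these constructors can be exercised in a short sketch. The three helper functions below are hypothetical names for illustration; the try_to_lock check runs in a second thread because calling try_lock on a std::mutex the calling thread already owns is undefined behavior.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>

// Constructor (4): defer_lock associates the mutex without locking it.
bool defer_then_lock(std::mutex& m)
{
	std::unique_lock<std::mutex> lk(m, std::defer_lock);
	bool before = lk.owns_lock();        // false: nothing locked yet
	lk.lock();
	bool after = static_cast<bool>(lk);  // operator bool: true while owning
	lk.unlock();
	return !before && after && !lk;
}

// Constructor (3): try_to_lock does not block; it fails while another
// thread owns the mutex.
bool try_to_lock_fails_when_held(std::mutex& m)
{
	std::unique_lock<std::mutex> owner(m);  // constructor (2): locks
	bool other_got_it = true;
	std::thread t([&m, &other_got_it] {
		std::unique_lock<std::mutex> lk(m, std::try_to_lock);
		other_got_it = lk.owns_lock();
	});
	t.join();
	return !other_got_it;
}

// Constructor (9): ownership moves, the source no longer owns the lock.
bool ownership_moves(std::mutex& m)
{
	std::unique_lock<std::mutex> a(m);
	std::unique_lock<std::mutex> b(std::move(a));
	assert(b.mutex() == &m);
	return !a.owns_lock() && b.owns_lock();
}
```

All three functions should return true when called with a fresh std::mutex.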

6: 🔥 lock and try_lock

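lock is a function template that locks several lockable objects at once. If one of them cannot be locked, lock releases the ones it has already locked and blocks, and it only returns once all of the objects are locked. This avoids deadlock regardless of the order in which the arguments are given.
try_lock is also a function template. It tries to lock each of the given objects in order. If all of them are locked successfully it returns -1. If one attempt fails, it unlocks the objects it has already locked and returns the 0-based index of the object that failed (the first argument has index 0).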

template <class Mutex1, class Mutex2, class... Mutexes>
void lock (Mutex1& a, Mutex2& b, Mutexes&... cde);

template <class Mutex1, class Mutex2, class... Mutexes>
int try_lock (Mutex1& a, Mutex2& b, Mutexes&... cde);

// std::lock example
#include <iostream> 	// std::cout
#include <thread> 		// std::thread
#include <mutex> 		// std::mutex, std::lock

std::mutex foo, bar;

void task_a() {
	 // foo.lock(); bar.lock(); // replaced by:
	 std::lock(foo, bar);
	 std::cout << "task a\n";
	 foo.unlock();
	 bar.unlock();
}

void task_b() {
	 // bar.lock(); foo.lock(); // replaced by:
	 std::lock(bar, foo);
	 std::cout << "task b\n";
	 bar.unlock();
	 foo.unlock();
}

int main()
{
	foo.lock();
	std::thread th1(task_a);
	std::thread th2(task_b);
	std::cout << "xxxxxx" << std::endl;
	bar.lock();
	foo.unlock();
	std::cout << "yyyyyy" << std::endl;
	bar.unlock();
	th1.join();
	th2.join();
	return 0;
}

// std::try_lock example
#include <iostream> 		// std::cout
#include <thread> 		// std::thread
#include <mutex> 		// std::mutex, std::try_lock

std::mutex foo, bar;

void task_a() {
	foo.lock();
	std::cout << "task a\n";
	bar.lock();
	// ...
	foo.unlock();
	bar.unlock();
}

void task_b() {
	int x = try_lock(bar, foo);
	if (x == -1) {
		std::cout << "task b\n";
	 	// ...
	 	bar.unlock();
	 	foo.unlock();
	 }
	 else {
	 	std::cout << "[task b failed: mutex " << (x ? "foo" : "bar")  << " locked]\n";
	 }
}

int main()
{
	std::thread th1(task_a);
	std::thread th2(task_b);
	
	th1.join();
	th2.join();
	
	return 0;
}
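std::lock combines well with the adopt_lock tag from the lock_guard section: lock both mutexes in one call, then hand them to guards so they are released automatically. The sketch below assumes a hypothetical Account type and transfer function; it is an illustration, not code from the original post.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical type: a balance protected by its own mutex.
struct Account
{
	explicit Account(int initial) : balance(initial) {}
	std::mutex mtx;
	int balance;
};

void transfer(Account& from, Account& to, int amount)
{
	assert(&from != &to);
	// Lock both mutexes without risking deadlock, whatever order callers use
	std::lock(from.mtx, to.mtx);
	// adopt_lock: the guards take over the already-held locks and unlock on exit
	std::lock_guard<std::mutex> g1(from.mtx, std::adopt_lock);
	std::lock_guard<std::mutex> g2(to.mtx, std::adopt_lock);
	from.balance -= amount;
	to.balance += amount;
}

int run_transfers()
{
	Account a(1000), b(1000);
	// The two threads lock the same pair of mutexes in opposite order
	std::thread t1([&a, &b] { for (int i = 0; i < 500; ++i) transfer(a, b, 1); });
	std::thread t2([&a, &b] { for (int i = 0; i < 300; ++i) transfer(b, a, 1); });
	t1.join();
	t2.join();
	assert(a.balance + b.balance == 2000);
	return a.balance;
}
```

Account a gives away 500 and receives 300, so run_transfers() should return 800.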

7: 🔥 call_once

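When several threads call call_once with the same once_flag, Fn (a callable object) is executed exactly once, by the first thread to get there; the other threads do not execute Fn again.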

template <class Fn, class... Args>
void call_once (once_flag& flag, Fn&& fn, Args&&... args);

call_once example

#include <iostream>
#include <thread>
#include <chrono>
#include <mutex>
 
int winner;
void set_winner (int x) { winner = x; }
std::once_flag winner_flag;

void wait_1000ms (int id) 
{
	// count to 1000, waiting 1ms between increments:
	for (int i=0; i<1000; ++i)
		std::this_thread::sleep_for(std::chrono::milliseconds(1));
	// claim to be the winner (only the first such call is executed):
 
	std::call_once (winner_flag,set_winner,id);
}

int main ()
{
	std::thread threads[10];
	// spawn 10 threads:
	for (int i=0; i<10; ++i)
		threads[i] = std::thread(wait_1000ms,i+1);

	std::cout << "waiting for the first among 10 threads to count 1000 ms...\n";
	
	for (auto& th : threads) 
		th.join();	
	
	std::cout << "winner thread: " << winner << '\n';
	
	return 0;
}

🌊 The code above can be used to find out which thread ran fastest, that is, which one reached call_once first.
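The "exactly once" guarantee can also be checked directly by counting how often the callable runs. The helper name count_initializations is hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: every thread calls call_once with the same flag;
// the return value is the number of times the callable actually ran.
int count_initializations(int thread_count)
{
	std::once_flag flag;
	int init_calls = 0;  // only written inside call_once, so no extra lock is needed
	std::vector<std::thread> threads;

	for (int i = 0; i < thread_count; ++i)
	{
		threads.emplace_back([&flag, &init_calls] {
			std::call_once(flag, [&init_calls] { ++init_calls; });
		});
	}

	for (auto& t : threads)
		t.join();

	assert(init_calls <= thread_count);
	return init_calls;
}
```

count_initializations(10) should return 1. A typical use of this pattern is one-time lazy initialization of a shared resource.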

8: 🔥 atomic

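🪜 atomic is a class template; each instantiation and full specialization of it defines an atomic type, and operations on an atomic object are guaranteed to be thread-safe.
Requirements on T: the template can be instantiated with any TriviallyCopyable type T that is also CopyConstructible and CopyAssignable. If any of the following checks yields false for T, T cannot be used with atomic (the program is ill-formed).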

std::is_trivially_copyable<T>::value
std::is_copy_constructible<T>::value
std::is_move_constructible<T>::value
std::is_copy_assignable<T>::value
std::is_move_assignable<T>::value
std::is_same<T, typename std::remove_cv<T>::type>::value

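For integral types atomic supports basic addition, subtraction and bitwise operations; for pointers it supports addition and subtraction. See the figure below:
load and store atomically read and modify the T object stored inside the atomic.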
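A minimal sketch of these operations, using the standard fetch_add, fetch_or, fetch_and, load and store members. The three function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Integral arithmetic: several threads increment one atomic counter.
int parallel_count(int thread_count, int per_thread)
{
	std::atomic<int> counter(0);
	std::vector<std::thread> threads;
	for (int i = 0; i < thread_count; ++i)
	{
		threads.emplace_back([&counter, per_thread] {
			for (int n = 0; n < per_thread; ++n)
				counter.fetch_add(1);  // same effect as ++counter
		});
	}
	for (auto& t : threads)
		t.join();
	return counter.load();
}

// Bitwise operations on an integral atomic.
unsigned bit_operations()
{
	std::atomic<unsigned> flags(0u);
	flags.fetch_or(0x1u);    // bits: 001
	flags.fetch_or(0x4u);    // bits: 101
	flags.fetch_and(~0x1u);  // bits: 100
	return flags.load();
}

// Pointer arithmetic: fetch_add advances by whole elements.
int pointer_arithmetic()
{
	int values[4] = {10, 20, 30, 40};
	std::atomic<int*> cursor(values);
	cursor.fetch_add(2);  // now points at values[2]
	assert(cursor.load() == values + 2);
	return *cursor.load();
}
```

parallel_count(4, 10000) should return 40000, bit_operations() should return 4, and pointer_arithmetic() should return 30.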
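atomic relies mainly on hardware support: modern processors provide atomic instructions. The x86 architecture, for example, has CMPXCHG (compare and exchange). Such an instruction reads, compares and writes memory as one indivisible operation, known as CAS (Compare And Swap). In addition, to keep data consistent across the caches of multiple processors, the hardware uses a cache-coherence protocol: when an atomic operation modifies a variable, the protocol ensures that copies of that variable in other processors' caches are updated or marked invalid.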

📚 The interfaces below may help make this concrete.

// CAS interfaces supported by gcc
bool __sync_bool_compare_and_swap (type *ptr, type oldval, type newval);
type __sync_val_compare_and_swap (type *ptr, type oldval, type newval);

// CAS interface supported by Windows
InterlockedCompareExchange ( __inout LONG volatile *Target,
							__in LONG Exchange,
							__in LONG Comperand);

// CAS interfaces supported by C++11
template <class T>
bool atomic_compare_exchange_weak (atomic<T>* obj, T* expected, T val) noexcept;

template <class T>
bool atomic_compare_exchange_strong (atomic<T>* obj, T* expected, T val) noexcept;

// Member functions of the C++11 atomic class
bool compare_exchange_weak (T& expected, T val,
		memory_order sync = memory_order_seq_cst) noexcept;
bool compare_exchange_strong (T& expected, T val,
		memory_order sync = memory_order_seq_cst) noexcept;

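C++11 CAS semantics: if the atomic object compares bitwise equal to expected, the atomic object is updated with val and true is returned; if it does not, expected is updated with the current value of the atomic object and false is returned.
On some platforms compare_exchange_weak may fail "spuriously" (return false) even when the atomic variable equals expected. Such a failure comes from the underlying hardware implementation, and it does not change the value of the atomic variable.
compare_exchange_strong guarantees that it does not fail spuriously: whenever the atomic variable equals expected, the operation succeeds. compare_exchange_weak may be faster than compare_exchange_strong on some platforms. Spurious failures arise mainly on architectures that implement CAS with a load-linked/store-conditional instruction pair (for example ARM and PowerPC), where the conditional store can fail even though the value has not changed; compare_exchange_strong pays a price to hide this, typically by retrying internally. In code that already retries in a loop, compare_exchange_weak is therefore usually the natural choice.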
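The way a failed CAS refreshes expected is what makes the usual retry loop work. As a sketch, here is an "atomic maximum", an operation atomic does not provide directly in C++11; store_max and parallel_max are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: raises target to candidate if candidate is larger.
void store_max(std::atomic<int>& target, int candidate)
{
	int current = target.load();
	// On failure (real or spurious) compare_exchange_weak writes the value it
	// found into current, so the condition is re-checked against fresh data.
	while (current < candidate &&
	       !target.compare_exchange_weak(current, candidate))
	{
		// nothing to do: current has already been refreshed
	}
}

int parallel_max(int thread_count)
{
	std::atomic<int> best(0);
	std::vector<std::thread> threads;
	for (int i = 1; i <= thread_count; ++i)
		threads.emplace_back([&best, i] { store_max(best, i * 10); });
	for (auto& t : threads)
		t.join();
	assert(best.load() > 0);
	return best.load();
}
```

With 8 threads proposing 10, 20, ..., 80, parallel_max(8) should return 80 regardless of the order in which the threads run.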
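For background on CPU caches, here is a blog post by Chen Hao (陈皓) for readers who want to dig deeper: 与程序员相关的 CPU 缓存知识 | 酷壳 - CoolShell
For background on lock-free programming, here is another post by Chen Hao: ⽆锁队列的实现 | 酷壳 - CoolShell
In the C++11 standard library, std::atomic offers several memory order (memory_order) options that control how an atomic operation synchronizes memory. They let developers trade performance against ordering guarantees in multithreaded code. The six memory orders supported by std::atomic are: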

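memory_order_relaxed: the most relaxed memory order. It guarantees only the atomicity of the operation itself and imposes no synchronization or ordering constraints. Use case: situations that need no synchronization, such as counters or statistics.

std::atomic<int> x(0);
x.store(42, std::memory_order_relaxed); 	// atomicity only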

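memory_order_consume: a weaker ordering that only guarantees visibility of data that depends on the value loaded by the current operation. Use case: specific dependency-chain scenarios; rarely used in practice.

std::atomic<int*> ptr(nullptr);
int* p = ptr.load(std::memory_order_consume);
if (p) {
	int value = *p; // the data p points to is guaranteed to be visible
}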

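memory_order_acquire: guarantees that reads and writes after the current operation (in the current thread) are not reordered before it. Typically used for loads. Use case: the "acquire" step of a lock or other synchronization mechanism.

std::atomic<bool> flag(false);
int data = 0;

// thread 1
data = 42;
flag.store(true, std::memory_order_release);

// thread 2
while (!flag.load(std::memory_order_acquire)) {}
std::cout << data; // guaranteed to see data = 42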

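memory_order_release: guarantees that reads and writes before the current operation (in the current thread) are not reordered after it. Typically used for stores. Use case: the "release" step of a lock or other synchronization mechanism.

std::atomic<bool> flag(false);
int data = 0;

// thread 1
data = 42;
flag.store(true, std::memory_order_release); // data = 42 becomes visible before flag = true

// thread 2
while (!flag.load(std::memory_order_acquire)) {}
std::cout << data; // guaranteed to see data = 42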

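memory_order_acq_rel: combines the semantics of memory_order_acquire and memory_order_release. It applies to read-modify-write operations (such as fetch_add or compare_exchange_strong). Use case: operations that need both "acquire" and "release" semantics.

std::atomic<int> x(0);
x.fetch_add(1, std::memory_order_acq_rel); // operations before and after are not reordered across it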

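memory_order_seq_cst: the strictest memory order. It guarantees that all threads observe these operations in the same order (a single global order). This is the default memory order. Use case: situations that need strong consistency, at a higher performance cost.

std::atomic<int> x(0);
x.store(42, std::memory_order_seq_cst); // sequentially consistent
int value = x.load(std::memory_order_seq_cst);

Memory orders from relaxed to strict: memory_order_relaxed < memory_order_consume < memory_order_acquire < memory_order_release < memory_order_acq_rel < memory_order_seq_cst. Note that acquire applies to loads and release to stores, so those two complement each other rather than one being strictly stronger. Relaxed orders (such as memory_order_relaxed) generally perform best but give the weakest synchronization; strict orders (such as memory_order_seq_cst) generally cost the most but give the strongest.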
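The release/acquire fragments above show the two threads side by side; a self-contained version might look like the sketch below. The function name publish_and_read is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: one thread publishes data with a release store,
// another reads it after an acquire load observes the flag.
int publish_and_read()
{
	std::atomic<bool> ready(false);
	int data = 0;  // ordinary, non-atomic variable
	int seen = -1;

	std::thread producer([&ready, &data] {
		data = 42;                                     // happens before the store below
		ready.store(true, std::memory_order_release);  // publishes data
	});

	std::thread consumer([&ready, &data, &seen] {
		while (!ready.load(std::memory_order_acquire))
			std::this_thread::yield();
		seen = data;  // safe: the acquire load synchronized with the release store
	});

	producer.join();
	consumer.join();
	assert(ready.load());
	return seen;
}
```

publish_and_read() should return 42. If both operations used memory_order_relaxed instead, reading data would be a data race.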

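In summary, choosing the memory order that matches the actual requirement lets you maximize performance while keeping the code correct.
atomic_flag is an atomic boolean type. Unlike all specializations of atomic, it is guaranteed to be lock-free. Unlike atomic<bool>, atomic_flag provides no load or store operations. Its main operations are test_and_set, which atomically sets the flag to true and returns the previous value, and clear, which atomically sets the flag to false. An example further below uses atomic_flag to implement a spin lock.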

#include <atomic>
#include <iostream>
#include <thread>
#include <vector>

using namespace std;


atomic<int> acnt;
// atomic_int acnt;
int cnt;

// How can an atomic ++ be implemented?
void Add1(atomic<int>& cnt)
{
	int old = cnt.load();
	// If cnt equals old, set cnt to old + 1 and return true; this whole step is atomic.
	// If another thread changed cnt between load and compare_exchange_weak,
	// old and cnt differ, so old is updated to the value of cnt and false is returned.
	while (!atomic_compare_exchange_weak(&cnt, &old, old + 1));
	// while(!cnt.compare_exchange_weak(old,old + 1));
}

void f()
{
	for (int n = 0; n < 100000; ++n)
	{
		// ++acnt;
		// Add1 uses CAS to simulate atomic's operator++; the result is the same
		Add1(acnt);
		++cnt; // deliberately unsynchronized (a data race) to show lost updates
	}
}

int main()
{
	std::vector<thread> pool;

	for (int i = 0; i < 4; i++)
	{
		pool.emplace_back(f);
	}

	for (auto& e : pool)
	{
		e.join();
	}

	cout << "atomic counter: " << acnt << '\n' << "non-atomic counter: " << cnt << '\n';

	return 0;
}

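#include <iostream>
#include <string>
#include <type_traits>
#include <typeinfo>

using namespace std;

struct Date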
{
	int _year = 1;
	int _month = 1;
	int _day = 1;
};

template<class T>
void check()
{
	cout << typeid(T).name() << endl;
 	cout << std::is_trivially_copyable<T>::value << endl;
	cout << std::is_copy_constructible<T>::value << endl;
 	cout << std::is_move_constructible<T>::value << endl;
 	cout << std::is_copy_assignable<T>::value << endl;
 	cout << std::is_move_assignable<T>::value << endl;
 	cout << std::is_same<T, typename std::remove_cv<T>::type>::value << endl << endl;
}

int main()
{
	check<int>();
 	check<double>();
	check<int*>();
	check<Date>();
	check<Date*>();
	check<string>();
	check<string*>();

	return 0;
}

Spin lock (SpinLock)

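#include <atomic>
#include <functional>	// std::ref
#include <thread>
#include <iostream>
#include <vector>

// A spin lock (SpinLock) is a busy-waiting lock, suited to cases where the lock
// is held only very briefly. When a thread tries to acquire a lock that another
// thread holds, it keeps checking in a loop whether the lock is available
// instead of going to sleep. This avoids the cost of a context switch, but under
// heavy contention or long hold times it wastes CPU time.
// Below is a simple spin lock implemented with C++11:
class SpinLock
{
private:
	// ATOMIC_FLAG_INIT initializes the flag to false (clear)
	std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
	void lock()
	{
		// test_and_set sets the flag to true and returns the previous value.
		// The first thread to arrive atomically sets it to true and gets false back.
		// Later threads also set it to true but get true back, so they spin here
		// until the first thread calls unlock, which clears the flag to false.
		while (flag.test_and_set(std::memory_order_acquire));
	}

	void unlock()
	{
		// clear atomically sets the flag to false
		flag.clear(std::memory_order_release);
	}
};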

// Test the spin lock
void worker(SpinLock& lock, int& sharedValue) {
	lock.lock();

	// Simulate some work
	for (int i = 0; i < 1000000; ++i) {
		++sharedValue;
	}

	lock.unlock();
}

int main() 
{
	SpinLock lock;
	int sharedValue = 0;
	std::vector<std::thread> threads;

	// Create several threads
	for (int i = 0; i < 4; ++i) {
		threads.emplace_back(worker, std::ref(lock), std::ref(sharedValue));
	}

	// Wait for all threads to finish
	for (auto& thread : threads) {
		thread.join();
	}
	std::cout << "Final shared value: " << sharedValue << std::endl;

	return 0;
}

9: 🔥 condition_variable

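🪜 condition_variable is used together with the mutex family and mainly provides the wait and notify families of functions.
wait takes a unique_lock<mutex> and blocks the current thread until it is notified. At the moment it starts blocking it releases the mutex, so other threads can acquire the lock and work with the condition. When woken by a notify, it reacquires the lock before continuing. Because wakeups can also be spurious, wait is normally called in a loop that re-checks the condition.
notify_one wakes one of the threads waiting on the condition variable; if no thread is waiting, it does nothing. The shared state that the waiters check should be modified under the mutex (the notify call itself does not have to be made while holding it). notify_all wakes all threads waiting on the condition variable.
condition_variable_any is a generalization of std::condition_variable. While std::condition_variable works only with std::unique_lock<std::mutex>, condition_variable_any works with any lock that meets the BasicLockable requirements.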

condition_variable::notify_all

#include <iostream> // std::cout
#include <chrono> // std::chrono::milliseconds
#include <thread> // std::thread
#include <mutex> // std::mutex, std::unique_lock
#include <condition_variable> // std::condition_variable

std::mutex mtx;
std::condition_variable cv;
bool ready = false;

void print_id(int id) 
{
	std::unique_lock<std::mutex> lck(mtx);

	while (!ready)
		cv.wait(lck);

	// ...
	std::cout << "thread " << id << '\n';
}

void go() 
{
	std::unique_lock<std::mutex> lck(mtx);
	ready = true;

	// Notify all threads blocked on the condition variable
	cv.notify_all();
}

int main()
{
	std::thread threads[10];

	// spawn 10 threads:
	for (int i = 0; i < 10; ++i)
		threads[i] = std::thread(print_id, i);

	std::cout << "10 threads ready to race...\n";

	std::this_thread::sleep_for(std::chrono::milliseconds(100));

	go(); // go!
	for (auto& th : threads)
		th.join();

	return 0;
}
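Besides the explicit while loop used above, wait has an overload that takes a predicate and performs the same re-check internally. The sketch below uses it for a small producer/consumer queue; sum_through_queue and its variables are hypothetical names for this illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical helper: a producer pushes 1..count into a queue and a consumer
// sums them; the condition variable signals "items available or finished".
int sum_through_queue(int count)
{
	std::mutex mtx;
	std::condition_variable cv;
	std::queue<int> items;
	bool done = false;
	int sum = 0;

	std::thread consumer([&] {
		for (;;)
		{
			std::unique_lock<std::mutex> lck(mtx);
			// Equivalent to: while (!(pred())) cv.wait(lck);
			cv.wait(lck, [&] { return !items.empty() || done; });
			while (!items.empty())
			{
				sum += items.front();
				items.pop();
			}
			if (done)
				return;
		}
	});

	std::thread producer([&] {
		for (int i = 1; i <= count; ++i)
		{
			{
				std::lock_guard<std::mutex> guard(mtx);  // modify state under the lock
				items.push(i);
			}
			cv.notify_one();
		}
		{
			std::lock_guard<std::mutex> guard(mtx);
			done = true;
		}
		cv.notify_one();
	});

	producer.join();
	consumer.join();
	assert(items.empty());
	return sum;
}
```

sum_through_queue(100) should return 5050, the sum of 1 through 100.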

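The following demonstrates a classic problem: two threads alternately printing odd and even numbers.

🧐 Analysis: how the condition variable and the lock guarantee alternating output

Case 1: t1 starts first, and t2 starts a little later (not yet started, or still waiting to be scheduled)

After t1 starts it acquires the lock. flag is true, so t1 is not blocked by the condition variable: it prints i = 0, sets flag to false, sets i to 2, and then notifies the condition variable, although no thread is waiting. The loop continues and t1 acquires the lock again; flag was just set to false, so this time t1 blocks on the condition variable and releases the lock. This logic guarantees that t1 never prints twice in a row.
t2 now starts running and acquires the lock. flag was set to false by t1, so t2 is not blocked: t2 prints j = 1, sets flag to true, sets j to 3, and notifies the condition variable, which wakes t1. Once woken, t1 still has to wait for a time slice, so there are 2 possibilities. In the first, t1 does not run immediately and t2 keeps going: t2 acquires the lock, but flag is true, so it blocks on the condition variable and releases the lock. A moment later t1 runs; flag is true, so it is no longer blocked, it prints 2, and the loop described above repeats, producing alternating output. In the second, t1 runs immediately and grabs the lock: flag is true, so it is not blocked, it prints 2, sets i to 4, sets flag to false, and notifies the condition variable, although no thread is blocked. From there it is again a question of whether t1 or t2 runs first or gets the lock first, and the output still alternates.

Case 2: t2 starts first, and t1 starts a little later (not yet started, or still waiting to be scheduled)

After t2 starts it acquires the lock. flag is true, so t2 blocks on the condition variable and releases the lock at the same time.
A moment later t1 starts running and acquires the lock. flag is true, so t1 is not blocked: it prints i = 0, sets flag to false, sets i to 2, and notifies the condition variable, waking the blocked thread t2. As above, once woken t2 still has to wait for a time slice, so there are 2 possibilities. In the first, t2 does not run immediately and t1 continues its loop: it acquires the lock, but flag is false, so it blocks on the condition variable and releases the lock. A moment later t2 runs; flag is false, so it is no longer blocked, it prints 1, sets j to 3, sets flag to true, and wakes the blocked thread t1. From here the logic repeats as before and the output alternates. In the second, t2 runs immediately and grabs the lock: flag is false, so it is not blocked, it prints 1, sets j to 3, sets flag to true, and notifies the condition variable, although no thread is blocked on it at that moment. From there it is again a question of whether t1 or t2 runs first or gets the lock first, and the output still alternates.

Case 3: t1 and t2 start at almost the same time

Here the two threads simply race for the lock. If t1 wins, the situation is like case 1; if t2 wins, it is like case 2, so we do not analyze it in detail again.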

The code for alternately printing odd and even numbers with two threads:

#include <iostream> // std::cout
#include <thread> // std::thread
#include <mutex> // std::mutex, std::unique_lock
#include <condition_variable> // std::condition_variable

using namespace std;

int main()
{
	std::mutex mtx;
	condition_variable c;
	int n = 100;
	bool flag = true;

	// The first value printed is 0, by t1
	thread t1([&]() {
		int i = 0;
		while (i < n)
		{
			unique_lock<mutex> lock(mtx);
			// flag == false: t1 stays blocked
			// flag == true: t1 is not blocked
			while (!flag)
			{
				c.wait(lock);
			}
			cout << i << endl;
			flag = false;
			i += 2; // even numbers
			c.notify_one();
		}
		});

	// this_thread::sleep_for(std::chrono::milliseconds(3000));
	thread t2([&]() 
		{
		int j = 1;
		while (j < n)
		{
			unique_lock<mutex> lock(mtx);

			// while flag == true, t2 stays blocked
			// while flag == false, t2 is not blocked
			while (flag)
				c.wait(lock);
			cout << j << endl;

			j += 2; // odd numbers
			flag = true;

			c.notify_one();
		}
		});

	t1.join();
	t2.join();

	return 0;
}