干爆源码系列之Step by step lldb/gdb调试多线程

Step by step lldb/gdb调试多线程

0.叙谈1.断点分析2.多线程切换 2.1 并发队列 2.1.1 两次入队 2.2 线程调度 2.2.1 执行build端子MetaPipeline 2.2.1.1 Thread6调度第一个PipelineInitializeTask 2.2.1.2 Thread7调度第二个PipelineInitializeTask 2.2.1.3 Thread8调度build端PipelineEvent 2.2.2 执行下一个Metapipeline

书接上回，我们分析了InitializeInternal的ScheduleEvents函数，了解了如何从MetaPipeline构建各种Event事件，上一节中还提到在最终会进行调度，对无依赖节点发起Schedule()，那么本节就继续这一内容，详细从多线程角度看看这些Event对应的Task如何被调度的呢？

本节将会从lldb/gdb角度Step by step断点调试分析多线程如何玩转task执行。

0.叙谈

下面展示了一段无依赖的事件调度，初始化阶段会循环所有events，找到无依赖的event，并发起Schedule()，这里的event是PipelineInitializeEvent，两个MetaPipline各自一个，按照顺序入并发队列，接下来详细聊聊如何调试以及具体怎么调度多任务的呢？

for (auto &event : events) {
  if (!event->HasDependencies()) {
    event->Schedule();
  }
}

1.断点分析

下面是本次调试的断点list，每个都break一下便可以快速学习了。

Enqueue函数

Task入队的函数。

b task_scheduler.cpp:47

ExecuteForever函数

线程从队列中获取Task的函数。

b task_scheduler.cpp:135

打上这两个断点后，就可以调试多线程了。

Event::CompleteDependency函数

处理事件依赖关系，能够定位当前线程处理了哪些event。

Event::Finish函数

能够知道当前事件的依赖有哪些，下一步处理哪一个，这个跟上面的函数一起用。

2.多线程切换

2.1 并发队列

初始化时，当调用Schedule()后，会把PipelineInitializeTask放入并发队列中，见下图的queue(蓝色部分)。

2.1.1 两次入队

1）入队第一个PipelineInitializeTask

* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 9.1
    frame #0: 0x00000001160be794 duckdb`duckdb::ConcurrentQueue::Enqueue(this=0x0000616000000680, token=0x00006070000414e0, task=std::__1::shared_ptr<duckdb::Task>::element_type @ 0x000060600007cb20 strong=1 weak=2) at task_scheduler.cpp:47:8
   44   
   45   void ConcurrentQueue::Enqueue(ProducerToken &token, shared_ptr<Task> task) {
   46    lock_guard<mutex> producer_lock(token.producer_lock);
-> 47    if (q.enqueue(token.token->queue_token, std::move(task))) {
   48     semaphore.signal();
   49    } else {
   50     throw InternalException("Could not schedule task!");
(lldb) p task.get()
(duckdb::PipelineInitializeTask *) $126 = 0x000060600007cb20
(lldb) p ((duckdb::PipelineInitializeTask *)task.get())->event->PrintPipeline()

此时在sql终端输出的pipeline为：

┌───────────────────────────┐
│      RESULT_COLLECTOR     │
└─────────────┬─────────────┘                             
┌─────────────┴─────────────┐
│         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│            name           │
│           score           │
└─────────────┬─────────────┘                             
┌─────────────┴─────────────┐
│         HASH_JOIN         │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           INNER           │
│        stu_id = id        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           EC: 4           │
│          Cost: 4          │
└─────────────┬─────────────┘                             
┌─────────────┴─────────────┐
│         SEQ_SCAN          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           score           │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           stu_id          │
│           score           │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           EC: 4           │
└───────────────────────────┘

2）入队第二个PipelineInitializeTask

(lldb) c
Process 23807 resuming
Process 23807 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 9.1
    frame #0: 0x00000001160be794 duckdb`duckdb::ConcurrentQueue::Enqueue(this=0x0000616000000680, token=0x00006070000414e0, task=std::__1::shared_ptr<duckdb::Task>::element_type @ 0x000060600007d300 strong=1 weak=2) at task_scheduler.cpp:47:8
   44   
   45   void ConcurrentQueue::Enqueue(ProducerToken &token, shared_ptr<Task> task) {
   46    lock_guard<mutex> producer_lock(token.producer_lock);
-> 47    if (q.enqueue(token.token->queue_token, std::move(task))) {
   48     semaphore.signal();
   49    } else {
   50     throw InternalException("Could not schedule task!");
(lldb) p task.get()
(duckdb::PipelineInitializeTask *) $127 = 0x000060600007d300
(lldb) p ((duckdb::PipelineInitializeTask *)task.get())->event->PrintPipeline()

此时的pipeline为:

┌───────────────────────────┐
│         HASH_JOIN         │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           INNER           │
│        stu_id = id        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           EC: 4           │
│          Cost: 4          │
└─────────────┬─────────────┘                             
┌─────────────┴─────────────┐
│         SEQ_SCAN          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│          student          │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             id            │
│            name           │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│           EC: 3           │
└───────────────────────────┘

2.2 线程调度

2.2.1 执行build端子MetaPipeline

2.2.1 Thread6调度第一个PipelineInitializeTask

继续c之后，可以看到出队，第一个PipelineInitializeTask出来了，会发现被thread 6调度。

* thread #6, stop reason = breakpoint 8.1
    frame #0: 0x00000001160c19cc duckdb`duckdb::TaskScheduler::ExecuteForever(this=0x000060c000000280, marker=0x0000602000003fb0) at task_scheduler.cpp:135:26
   132    // wait for a signal with a timeout
   133    queue->semaphore.wait();
   134    if (queue->q.try_dequeue(task)) {
-> 135     auto execute_result = task->Execute(TaskExecutionMode::PROCESS_ALL);
   136  
   137     switch (execute_result) {
   138     case TaskExecutionResult::TASK_FINISHED:
(lldb) p task.get()
(duckdb::PipelineInitializeTask *) $128 = 0x000060600007cb20

此时会调用ExecuteTask，里面会FinishTask()。

event->FinishTask();

而FinishTask表示我完成了当前event的事情，接着处理父节点，由于由两个依赖，所以不会进行Schedule()。

(lldb) p total_dependencies
(duckdb::idx_t) $129 = 2

只有当满足下面条件时才会Schedule()，所以这个线程任务完成了，只处理了PipelineInitializeTask。

if (current_finished == total_dependencies) {
  // all dependencies have been completed: schedule the event
  D_ASSERT(total_tasks == 0);
  Schedule();
  if (total_tasks == 0) {
    Finish();
  }
}

2.2.2 Thread7调度第二个PipelineInitializeTask

随后切到另外一个线程，此时处理的是右侧build端的pipeline，可以看到获取到的是队列当中的第二个PipelineInitializeTask，此时线程是7号线程。

* thread #7, stop reason = breakpoint 17.1
(lldb) f 2
frame #2: 0x00000001161d8910 duckdb`duckdb::PipelineInitializeTask::ExecuteTask(this=0x000060600007d300, mode=PROCESS_ALL) at pipeline_initialize_event.cpp:23:10
   20   public:
   21    TaskExecutionResult ExecuteTask(TaskExecutionMode mode) override {
   22     pipeline.ResetSink();
-> 23     event->FinishTask();
   24     return TaskExecutionResult::TASK_FINISHED;
   25    }
   26   };
(lldb) p this
(duckdb::PipelineInitializeTask *) $130 = 0x000060600007d300

此时取PipelineInitializeTask的父亲，也就是右侧build端PipelineEvent，由于只有一个依赖，直接调度了，此时会放入队列中，当前线程完成任务，继续等待。

* thread #7, stop reason = breakpoint 9.1
    frame #0: 0x00000001160be794 duckdb`duckdb::ConcurrentQueue::Enqueue(this=0x0000616000000680, token=0x00006070000414e0, task=std::__1::shared_ptr<duckdb::Task>::element_type @ 0x00006060000462e0 strong=1 weak=2) at task_scheduler.cpp:47:8
   44   
   45   void ConcurrentQueue::Enqueue(ProducerToken &token, shared_ptr<Task> task) {
   46    lock_guard<mutex> producer_lock(token.producer_lock);
-> 47    if (q.enqueue(token.token->queue_token, std::move(task))) {
   48     semaphore.signal();
   49    } else {
   50     throw InternalException("Could not schedule task!");

2.2.3 Thread8调度build端PipelineEvent

c之后，可以看到又切到了另一个线程，此时出队，拿到上一个入队的PipelineEvent。

* thread #8, stop reason = breakpoint 8.1
    frame #0: 0x00000001160c19cc duckdb`duckdb::TaskScheduler::ExecuteForever(this=0x000060c000000280, marker=0x00006020000040b0) at task_scheduler.cpp:135:26
   132    // wait for a signal with a timeout
   133    queue->semaphore.wait();
   134    if (queue->q.try_dequeue(task)) {
-> 135     auto execute_result = task->Execute(TaskExecutionMode::PROCESS_ALL);
   136  
   137     switch (execute_result) {
   138     case TaskExecutionResult::TASK_FINISHED:
(lldb) p task.get()
(duckdb::PipelineTask *) $132 = 0x00006060000462e0

可以对比这个地址与上述的线程7号放入的task地址一样。

那么接着调度，我们可以看到依次处理了PipelineFinishEvent->PipelineCompleteEvent->PipelineEvent(HashJoin对应event)。

(lldb) p this
(duckdb::PipelineFinishEvent *) $134 = 0x000060e00002fb58
(lldb) p this
(duckdb::PipelineCompleteEvent *) $135 = 0x000060d0000028f8
(lldb) p this
(duckdb::PipelineEvent *) $137 = 0x000060e00002f6f8

这两个Event的Schedule()时空实现，没有入队操作，所以直接调度即可，而PipelineEvent实现了Schedule()，所以会放入队列，可以看到进入了入队断点：

* thread #8, stop reason = breakpoint 9.1
    frame #0: 0x00000001160be794 duckdb`duckdb::ConcurrentQueue::Enqueue(this=0x0000616000000680, token=0x00006070000414e0, task=std::__1::shared_ptr<duckdb::Task>::element_type @ 0x0000606000080660 strong=1 weak=2) at task_scheduler.cpp:47:8
   44   
   45   void ConcurrentQueue::Enqueue(ProducerToken &token, shared_ptr<Task> task) {
   46    lock_guard<mutex> producer_lock(token.producer_lock);
-> 47    if (q.enqueue(token.token->queue_token, std::move(task))) {
   48     semaphore.signal();
   49    } else {
   50     throw InternalException("Could not schedule task!");

2.2.2 执行下一个Metapipeline

此时切回主线程thread 1，执行下一个MetaPipeline，可以看到Executor::ExecuteTask()的completed_pipelines已经为1，说明前面的ChildMetaPipeline已经finish了，并且从队列中拿到了上面的PipelineTask。

PendingExecutionResult Executor::ExecuteTask() {
 while (completed_pipelines < total_pipelines) {
  // there are! if we don't already have a task, fetch one
  if (!task) {
   scheduler.GetTaskFromProducer(*producer, task);  // 队列中获取任务
  }
  if (task) {
   auto result = task->Execute(TaskExecutionMode::PROCESS_PARTIAL);
  }
 }
 return execution_result;
}

debug信息：

(lldb) p completed_pipelines
(std::atomic<unsigned long long>) $142 = {
  Value = 1
}
(lldb) p total_pipelines
(duckdb::idx_t) $143 = 2
(lldb) p task
(std::shared_ptr<duckdb::Task>) $147 = std::__1::shared_ptr<duckdb::Task>::element_type @ 0x0000606000080660 strong=1 weak=3 {
  __ptr_ = 0x0000606000080660
}

接下来要做的就是调度另外一个MetaPipeline，依次分别是：PipelieEvent(probe端)->PipelineFinishEvent->PipelineCompleteEvent->PipelieEvent(hashjoin->project->result的event)。

* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 9.1
    frame #0: 0x00000001160be794 duckdb`duckdb::ConcurrentQueue::Enqueue(this=0x0000616000000680, token=0x00006070000414e0, task=std::__1::shared_ptr<duckdb::Task>::element_type @ 0x000060600007e320 strong=1 weak=2) at task_scheduler.cpp:47:8
   44   
   45   void ConcurrentQueue::Enqueue(ProducerToken &token, shared_ptr<Task> task) {
   46    lock_guard<mutex> producer_lock(token.producer_lock);
-> 47    if (q.enqueue(token.token->queue_token, std::move(task))) {
   48     semaphore.signal();
   49    } else {
   50     throw InternalException("Could not schedule task!");

随后另一个线程继续出队列调度任务，依次在该线程上经历PipelieEvent->PipelineFinishEvent->PipelineCompleteEvent。

* thread #9, stop reason = breakpoint 8.1
    frame #0: 0x00000001160c19cc duckdb`duckdb::TaskScheduler::ExecuteForever(this=0x000060c000000280, marker=0x0000602000004130) at task_scheduler.cpp:135:26
   132    // wait for a signal with a timeout
   133    queue->semaphore.wait();
   134    if (queue->q.try_dequeue(task)) {
-> 135     auto execute_result = task->Execute(TaskExecutionMode::PROCESS_ALL);
   136  
   137     switch (execute_result) {
   138     case TaskExecutionResult::TASK_FINISHED:

最后这个线程完成继续等待信号。