GPORCA通过规则对表达式做转换,规则分为两类:Exploration,是对逻辑表达式做等价变换的;Implementation,是把逻辑操作符转换为物理操作符。Except the types of operator generated during Exploration (Logical => Logical) and Implementation (Logical => Physical) Phase, the process is similar. There are no Transform / CXform for Optimization Phase, so have excluded it from the below diagram. Will talk about Optimization phase a little later.
由于Exploration和Implementation流程一致,因此合在一起描述主要流程:(本篇文章仅关注第一步和第二步)
- Get the Group Expression to optimize
- Get the CXform Applicable for it
- Xform Pattern directs the Expected Child Group Expression to be extracted from the MEMO for that transform (CXform).
exploration xform pattern code snippet:
CXformExpandNAryJoinDPv2::CXformExpandNAryJoinDPv2(CMemoryPool *mp)
: CXformExploration(
// pattern
GPOS_NEW(mp) CExpression(mp, GPOS_NEW(mp) CLogicalNAryJoin(mp),
GPOS_NEW(mp) CExpression(mp, GPOS_NEW(mp) CPatternMultiLeaf(mp)),
GPOS_NEW(mp) CExpression(mp, GPOS_NEW(mp) CPatternTree(mp))))
{}
Above exploration pattern represents the below tree shape
+-CLogicalNAryJoin
|--CPatternMultiLeaf
+--CPatternTree
implementation xform pattern code snippet:
CXformImplementUnionAll::CXformImplementUnionAll(CMemoryPool *mp)
: // pattern
CXformImplementation(GPOS_NEW(mp) CExpression(
mp, GPOS_NEW(mp) CLogicalUnionAll(mp),
GPOS_NEW(mp) CExpression(mp, GPOS_NEW(mp) CPatternMultiLeaf(mp))))
{}
Above implementation pattern represents the below tree shape
+--CLogicalUnionAll
+--CPatternMultiLeaf
- Construct the expression tree by pulling out Group Expression from the MEMO matching the pattern. If there is no match, that’s fine, it just means that the transform is not applicable in this case
- Apply the ::Transform method of the xform to produce the resulting expressions
- Insert the resulting expression back to the MEMO
- Newly inserted expressions will be eligible to be pulled if they have PxfsCandidates to be applied, and will go through Step 1 -> 6
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ 6.Insert the resulting Expressions Back to the MEMO │
│ │
│ │
▼ │
│
┌───────────────────────────────┐ │
│MEMO │ │
│ │ │
│ │ │
│ ROOT │ 4.Extract the Group Expressions from the MEMO based on │
│ ┌───────────────────────────┐ │ the Xform Pattern │
│ │ Group N │ │ ◄─────────────────────────────────────────────────────────────────────┐ │
│ │ Group Expression 1[x x] │ │ │ │
│ │ ├─┼──────┐ │ │
│ └───────────────────────────┘ │ │ │ │
│ . │ │ │ │
│ . │ │ │ │
│ . │ │ │ Possible Number of Outputs >= 0 │
│ ┌───────────────────────────┐ │ │1. Get the Group Expression │ │
│ │ Group 2 │ │ │ │ │
│ │ Group Expression 2[x x x] │ │ │ │ ┌─────────────────────────────────────────────┐ │
│ └───────────────────────────┘ │ ▼ │ ┌──►│ Output Logical / Physical Expression Tree 1 ├──┤
│ │ ┌──────────────────────────────────┐ ┌──────────────────────────┴────────────────────────────┐ ┌───────────────┐ │ └─────────────────────────────────────────────┘ │
│ ┌───────────────────────────┐ │ │ │ │ Xform Pattern │ │4.Constructed │ │ │
│ │ Group 1 │ │ │ Group Expression 2 - Xform 1 ├─────►│ Group Expression [x x x] C<Operator><Identifier> │ │ Logical │ ┌────────┴───────────┐ │
│ │ Group Expression 3[x x x] │ │ │ │ │ +-CPatternLeaf ├──►│ or ├────►│ 5.Apply Xform │ │
│ │ │ │ │ Group Expression 2 - Xform 2 │ │ +-CPatternLeaf │ │ Physical │ └────────┬───────────┘ │
│ └───────────────────────────┘ │ │ │ │ +-CPatternTree │ │ Expression │ │ │
│ │ │ Group Expression 2 - Xform 3 │ │3.Take the 1st GE - Xform combination and lookup the │ │ │ │ ┌─────────────────────────────────────────────┐ │
└───────────────────────────────┘ │ (All Combination for GE ) │ │ child GEs from MEMO based on the Xform Pattern │ │ │ └──►│ Output Logical / Physical Expression Tree 2 ├──┘
└──────────────────────────────────┘ └──────────────┬────────────────────────────────────────┘ └───────────────┘ └─────────────────────────────────────────────┘
┌───────────────────────────────┐ ▲ ▲ │
│ Xform PxfsCandidates for │ │ │ │
│ operator of Group Expression │ │ │ │
│ │ │2. Get Exploration │ │
│ Exploration: │ │ or │ │
│ Xform 1 │ │ Implementation └──────────────────────┘
│ Xform 2 │ │ Xform 7. Take the next combination
│ Xform 3 │ │ for the Group
│ │ │ Expression
│ Implementation: │ ◄────┘
│ Xform 4 │
│ Xform 5 │
│ Xform n │
│ │
└───────────────────────────────┘
CJobGroupExpressionImplementation和CJobGroupExpressionExploration
如上图所示导出红线的虚线直角框图就是CJobGroupExpressionImplementation和CJobGroupExpressionExploration任务进行Apply CXform(Get the Group Expression to optimize和Get the CXform Applicable for it)的流程。获取Group Expression就无需解释,其实就是这些任务中的m_pgexpr成员。而Get the CXform Applicable for it流程中,首先需要获取Group Expression中的操作副的规则位图 【每个操作符,都存有自己需要的规则ID构成的规则的子集,这里面包括了逻辑变换和实现变换的规则】,如下代码中的PxfsCandidates就是获取操作符的规则位图,从下图中的CLogicalExternalGet代码中可以看出其规则位图为xform_set->ExchangeSet(cxform::ExfExternalGet2ExternalScan)
;然后需要和逻辑变换集合或实现变换集合取与【Exploration阶段,拿操作符的规则集和逻辑变换集合取与,得到操作符这个阶段需要的规则;Implementation阶段,拿操作符的规则集和实现变换集合取与,得到操作符这个阶段需要的规则集】,如下图代码CXformFactory::Pxff()->PxfsExploration()
和CXformFactory::Pxff()->PxfsImplementation()
就是获取CXformFactory中的m_pxfsExploration和m_pxfsImplementation,它们分别为bitset of exploration xforms和bitset of implementation xforms。
void CJobGroupExpressionExploration::ScheduleApplicableTransformations(CSchedulerContext *psc) {
// get all applicable xforms
COperator *pop = m_pgexpr->Pop();
CXformSet *xform_set = CLogical::PopConvert(pop)->PxfsCandidates(psc->GetGlobalMemoryPool());
// intersect them with required xforms and schedule jobs
xform_set->Intersection(CXformFactory::Pxff()->PxfsExploration());
xform_set->Intersection(psc->Peng()->PxfsCurrentStage());
ScheduleTransformations(psc, xform_set);
xform_set->Release();
SetXformsScheduled();
}
void CJobGroupExpressionImplementation::ScheduleApplicableTransformations(CSchedulerContext *psc){
// get all applicable xforms
COperator *pop = m_pgexpr->Pop();
CXformSet *xform_set = CLogical::PopConvert(pop)->PxfsCandidates(psc->GetGlobalMemoryPool());
// intersect them with required xforms and schedule jobs
xform_set->Intersection(CXformFactory::Pxff()->PxfsImplementation());
xform_set->Intersection(psc->Peng()->PxfsCurrentStage());
ScheduleTransformations(psc, xform_set);
xform_set->Release();
SetXformsScheduled();
}
总结概述一下就是:Exploration阶段,拿操作符的规则集和逻辑变换集合取与,得到操作符这个阶段需要的规则;Implementation阶段,拿操作符的规则集和实现变换集合取与,得到操作符这个阶段需要的规则集。
GPORCA通过规则工程管理规则集,包括Exploration和Implementation。上图中,0和1有8列,表示规则集。为了简单,我们假设CLogicalGet需要的规则ID是4。LogicalGet所在的行,左边的规则集记录了LogicalGet需要的规则。可以看到第四位被设置为1,可以知道LogicalGet需要ID为4的规则进行转换。Exploration所在的行的规则集,表示所有的规则的集合。5,6,7,8位被设置为1,所以Exploration的规则集是ID为{5,6,7,8}的规则集。同样的,Implementation的规则集为{1,2,3,4}
如下代码即是从上述流程中获取交集的xform规则(即是CXform类的子类)集合中遍历CXform类,为表达式m_pgexpr应用对应的xform规则创建CJobTransformation任务。
//---------------------------------------------------------------------------
// @function:
// CJobGroupExpression::ScheduleTransformations
// @doc:
// Schedule transformation jobs for the given set of xforms
//---------------------------------------------------------------------------
void CJobGroupExpression::ScheduleTransformations(CSchedulerContext *psc, CXformSet *xform_set) {
CXformSetIter xsi(*(xform_set)); // iterate on xforms
while (xsi.Advance()){
CXform *pxform = CXformFactory::Pxff()->Pxf(xsi.TBit());
CJobTransformation::ScheduleJob(psc, m_pgexpr, pxform, this);
}
}
CJobTransformation
如下为CJobTransformation任务实际执行的函数,其从类成员中获取到表达式和规则,并调用表达式的Transform方法实现【Xform Pattern directs the Expected Child Group Expression to be extracted from the MEMO for that transform (CXform);Construct the expression tree by pulling out Group Expression from the MEMO matching the pattern. If there is no match, that’s fine, it just means that the transform is not applicable in this case;Apply the ::Transform method of the xform to produce the resulting expressions】,最终Insert the resulting expression back to the MEMO。
CJobTransformation::EEvent CJobTransformation::EevtTransform(CSchedulerContext *psc, CJob *pjOwner) {
CJobTransformation *pjt = PjConvert(pjOwner); // get a job pointer
CMemoryPool *pmpGlobal = psc->GetGlobalMemoryPool(); CMemoryPool *pmpLocal = psc->PmpLocal();
CGroupExpression *pgexpr = pjt->m_pgexpr; CXform *pxform = pjt->m_xform;
ULONG ulElapsedTime = 0; ULONG ulNumberOfBindings = 0;
// insert transformation results to memo
CXformResult *pxfres = GPOS_NEW(pmpGlobal) CXformResult(pmpGlobal);
pgexpr->Transform(pmpGlobal, pmpLocal, pxform, pxfres, &ulElapsedTime, &ulNumberOfBindings);
psc->Peng()->InsertXformResult(pgexpr->Pgroup(), pxfres, pxform->Exfid(), pgexpr, ulElapsedTime, ulNumberOfBindings);
pxfres->Release();
return eevCompleted;
}
https://github.com/greenplum-db/gpdb/wiki
https://mp.weixin.qq.com/s?__biz=MzA4NjQzNjUyOA==&mid=2650571780&idx=1&sn=1b14e76622cb70b6dd008b4927df8c5f&chksm=87c09bdbb0b712cd8228fab3f2f44fa2aecab0ece6f5d9cdb199efb1ca34bdce58d8f46b92bd&scene=27