Orca is the new query optimizer for Pivotal data management products, including GPDB and HAWQ. Orca is a modern top-down query optimizer based on the Cascades optimization framework. While many Cascades optimizers are tightly coupled with their host systems, a unique feature of Orca is its ability to run outside the database system as a stand-alone optimizer. This ability is crucial to supporting products with different computing architectures (e.g., MPP and Hadoop) using one optimizer. It also allows leveraging the extensive legacy of relational optimization in new query processing paradigms like Hadoop. Furthermore, running the optimizer as a stand-alone product enables elaborate testing without going through the monolithic structure of a database system.
Decoupling the optimizer from the database system requires building a communication mechanism to process queries. Orca includes a framework for exchanging information between the optimizer and the database system called Data eXchange Language (DXL). The framework uses an XML-based language to encode the information needed for communication, such as input queries, output plans and metadata. Overlaid on DXL is a simple communication protocol to send the initial query structure and retrieve the optimized plan. A major benefit of DXL is packaging Orca as a stand-alone product. Figure 2 shows the interaction between Orca and an external database system. The input to Orca is a DXL query. The output of Orca is a DXL plan. During optimization, the database system can be queried for metadata (e.g., table definitions). Orca abstracts metadata access details by allowing the database system to register a metadata provider (MD Provider) that is responsible for serializing metadata into DXL before it is sent to Orca. Metadata can also be consumed from regular files containing metadata objects serialized in DXL format.
The database system needs to include translators that consume and emit data in DXL format. The Query2DXL translator converts a query parse tree into a DXL query, while the DXL2Plan translator converts a DXL plan into an executable plan. These translators are implemented completely outside Orca, which allows multiple systems to use Orca by providing the appropriate translators. The architecture of Orca is highly extensible; all components can be replaced individually and configured separately. Figure 3 shows the different components of Orca. We briefly describe these components as follows.
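The round trip described above (query in, metadata lookups during optimization, plan out) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: IToyMetadataProvider, ToyFileProvider, ToyOptimize and the XML-like strings are hypothetical and are neither Orca APIs nor the real DXL schema.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a registered MD Provider: it hands back
// metadata that is already serialized to a DXL-like string.
struct IToyMetadataProvider {
    virtual ~IToyMetadataProvider() = default;
    virtual std::string GetMetadataDXL(const std::string& object_name) const = 0;
};

// Serves metadata from an in-memory table, standing in for a regular file
// of serialized metadata objects.
class ToyFileProvider : public IToyMetadataProvider {
public:
    explicit ToyFileProvider(std::map<std::string, std::string> objects)
        : objects_(std::move(objects)) {}

    std::string GetMetadataDXL(const std::string& object_name) const override {
        auto it = objects_.find(object_name);
        return it == objects_.end() ? std::string() : it->second;
    }

private:
    std::map<std::string, std::string> objects_;
};

// Toy "optimizer": the input query is reduced to a single table name, the
// provider is asked for that table's metadata, and a plan string comes back.
std::string ToyOptimize(const std::string& table_name,
                        const IToyMetadataProvider& provider) {
    assert(!table_name.empty());
    const std::string metadata = provider.GetMetadataDXL(table_name);
    if (metadata.empty()) {
        return "<Error reason=\"unknown table\"/>";
    }
    return "<Plan><TableScan table=\"" + table_name + "\"/></Plan>";
}
```

Because the optimizer only sees the provider interface, the same ToyOptimize call works whether the metadata comes from a live catalog or from a file, which is the property that makes stand-alone testing possible.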
CTranslatorQueryToDXL
The main QueryToDXL call flow lives in the OptimizeTask function. The actual work is done by the CTranslatorQueryToDXL class, and QueryToDXLInstance is its factory function. The CTranslatorQueryToDXL constructor takes the metadata accessor (mda) and the Query tree, and the translation itself is carried out by the main function TranslateQueryToDXL.
CTranslatorQueryToDXL::QueryToDXLInstance is a static factory function that creates a new CTranslatorQueryToDXL object for translating the given top-level query. Note that it uses the CContextQueryToDXL class.
src\backend\gpopt\translate\CTranslatorQueryToDXL.cpp contains the implementation of the CTranslatorQueryToDXL class.
- CTranslatorQueryToDXL.h depends on: CContextQueryToDXL.h, CMappingVarColId.h, CTranslatorScalarToDXL.h, CTranslatorUtils.h, CDXLNode.h
- CTranslatorQueryToDXL.cpp depends on: CCTEListEntry.h, CQueryMutators.h, CTranslatorDXLToPlStmt.h, CTranslatorRelcacheToDXL.h, CDXLDatumInt8.h, CDXLScalarBooleanTest.h, dxlops.h, dxltokens.h, CMDIdGPDBCtas.h, CMDTypeBoolGPDB.h, IMDAggregate.h, IMDScalarOp.h, IMDTypeBool.h, IMDTypeInt8.h
The important members of the class are as follows:
CTranslatorScalarToDXL *m_scalar_translator; // scalar translator used to convert scalar operation into DXL.
CMappingVarColId *m_var_to_colid_map; // holds the var to col id information mapping
HMUlCTEListEntry *m_query_level_to_cte_map; // hash map that maintains the list of CTEs defined at a particular query level (key: query level, value: the list of CTEs)
CDXLNodeArray *m_dxl_cte_producers; // list of CTE producers
UlongBoolHashMap *m_cteid_at_current_query_level_map; // CTE producer IDs defined at the current query level
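To make the role of m_var_to_colid_map more concrete, here is a minimal sketch of a var-to-column-id mapping. It assumes the map is keyed by (query level, range table index, attribute number); that key layout, the class name ToyVarColIdMap and the Lookup method are assumptions for illustration, not the actual CMappingVarColId implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical toy version of a var -> column id mapping.
class ToyVarColIdMap {
public:
    // Registers the columns of one base relation. Attribute numbers are
    // assumed to be 1-based positions within the relation.
    void LoadTblColumns(unsigned query_level, unsigned rt_index,
                        const std::vector<unsigned>& column_ids) {
        for (std::size_t i = 0; i < column_ids.size(); ++i) {
            const unsigned attno = static_cast<unsigned>(i + 1);
            map_[std::make_tuple(query_level, rt_index, attno)] = column_ids[i];
        }
    }

    // Returns true and stores the column id if the variable is known.
    bool Lookup(unsigned query_level, unsigned rt_index, unsigned attno,
                unsigned* column_id) const {
        assert(column_id != nullptr);
        auto it = map_.find(std::make_tuple(query_level, rt_index, attno));
        if (it == map_.end()) {
            return false;
        }
        *column_id = it->second;
        return true;
    }

private:
    std::map<std::tuple<unsigned, unsigned, unsigned>, unsigned> map_;
};
```

Including the query level in the key is what would let a subquery and its outer query refer to range table index 1 without colliding.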
CTranslatorQueryToDXL::CTranslatorQueryToDXL(CContextQueryToDXL *context, CMDAccessor *md_accessor, const CMappingVarColId *var_colid_mapping, Query *query, ULONG query_level, BOOL is_top_query_dml, HMUlCTEListEntry *query_level_to_cte_map)
- Run CheckSupportedCmdType(query) and CheckRangeTable(query). WITH CHECK OPTION views are not supported yet.
- If var_colid_mapping is not NULL, copy it into m_var_to_colid_map; otherwise initialize a new, empty mapping.
- If query_level_to_cte_map is not NULL, walk it by CTE query level and insert the CTE lists of the outer levels (those below the current query level) into m_query_level_to_cte_map, so that the query at the current level sees only the CTEs defined in outer levels.
- CheckUnsupportedNodeTypes(query) checks whether the query tree contains unsupported node types; CheckSirvFuncsWithoutFromClause(query) checks whether the query has SIRV functions in the target list without a FROM clause.
- Normalize the query first: m_query = CQueryMutators::NormalizeQuery(m_mp, m_md_accessor, query, query_level)
- If m_query->cteList is not empty, call ConstructCTEProducerList(m_query->cteList, query_level).
- m_scalar_translator = GPOS_NEW(m_mp)CTranslatorScalarToDXL(m_context, m_md_accessor, m_query_level, m_query_level_to_cte_map, m_dxl_cte_producers)
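The CTE visibility rule used by the constructor (only CTE lists from levels below the current query level are copied) can be sketched in isolation. This is a simplified illustration, assuming CTEs are identified by name and the map is a std::map; CteMap and VisibleOuterCtes are hypothetical names, not part of the translator.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the query-level -> CTE-list map.
using CteMap = std::map<unsigned, std::vector<std::string>>;

// Copies only the CTE lists defined at levels strictly below the current
// query level, so the current query sees outer CTEs and nothing else.
CteMap VisibleOuterCtes(const CteMap& outer_map, unsigned current_level) {
    CteMap visible;
    for (const auto& entry : outer_map) {
        if (entry.first < current_level) {
            visible[entry.first] = entry.second;
        }
    }
    return visible;
}
```

With CTEs registered at levels 0, 1 and 2, a translator created for level 2 would start with the lists of levels 0 and 1 only; the CTEs of level 2 itself are added later from the query's own cteList.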
TranslateQueryToDXL is the main driver function. Its flow is described below using TranslateSelectQueryToDXL as the example.
TranslateSelectQueryToDXL translates a Query into a DXL tree. The function allocates memory in the translator memory pool, and the caller is responsible for freeing it.
- CTranslatorUtils::CheckRTEPermissions(m_query->rtable)
- Construct CTEAnchor operators for the CTEs defined at the top level:
CDXLNode *dxl_cte_anchor_top = NULL; CDXLNode *dxl_cte_anchor_bottom = NULL; ConstructCTEAnchors(m_dxl_cte_producers, &dxl_cte_anchor_top, &dxl_cte_anchor_bottom);
- If m_query->setOperations is not NULL, the query is a set operation such as UNION:
child_dxlnode = TranslateSetOpToDXL(m_query->setOperations, m_query->targetList, output_attno_to_colid_mapping)
CDXLLogicalSetOp *dxlop = CDXLLogicalSetOp::Cast(child_dxlnode->GetOperator());
const CDXLColDescrArray *dxl_col_descr_array = dxlop->GetDXLColumnDescrArray();
// resno is the 1-based position of the current entry in the target list (its declaration is omitted from this excerpt)
ForEach(lc, target_list) {
    TargetEntry *target_entry = (TargetEntry *) lfirst(lc);
    // only entries referenced by a sort or group clause are recorded
    if (0 < target_entry->ressortgroupref) {
        ULONG colid = ((*dxl_col_descr_array)[resno - 1])->Id();
        AddSortingGroupingColumn(target_entry, sort_group_attno_to_colid_mapping, colid);
    }
    resno++;
}
- If m_query->windowClause is not NULL:
CDXLNode *dxlnode = TranslateFromExprToDXL(m_query->jointree)
child_dxlnode = TranslateWindowToDXL(dxlnode, m_query->targetList, m_query->windowClause, m_query->sortClause, sort_group_attno_to_colid_mapping, output_attno_to_colid_mapping)
- Otherwise:
child_dxlnode = TranslateGroupingSets(m_query->jointree, m_query->targetList, m_query->groupClause, m_query->hasAggs, sort_group_attno_to_colid_mapping, output_attno_to_colid_mapping);
- Translate the limit clause:
CDXLNode *limit_dxlnode = TranslateLimitToDXLGroupBy(m_query->sortClause, m_query->limitCount, m_query->limitOffset, child_dxlnode, sort_group_attno_to_colid_mapping);
- If m_query->targetList is not NULL, build m_dxl_query_output_cols by calling CreateDXLOutputCols(m_query->targetList, output_attno_to_colid_mapping).
- result_dxlnode = limit_dxlnode
- If dxl_cte_anchor_top is not NULL, add the CTE anchors: dxl_cte_anchor_bottom->AddChild(result_dxlnode); result_dxlnode = dxl_cte_anchor_top;
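The final step of the flow, hanging the translated query under a chain of CTE anchors, can be sketched with a toy tree. This is an illustration only: ToyNode and WrapInCteAnchors are hypothetical, the anchors are assumed to form a single chain with one anchor per producer, and the order of the chain is not taken from the real ConstructCTEAnchors.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy tree node; not the real CDXLNode.
struct ToyNode {
    std::string name;
    std::vector<std::unique_ptr<ToyNode>> children;

    explicit ToyNode(std::string n) : name(std::move(n)) {}
    void AddChild(std::unique_ptr<ToyNode> child) {
        children.push_back(std::move(child));
    }
};

// Builds a chain anchor(p0) -> anchor(p1) -> ... and attaches `result`
// under the bottom anchor. With no producers, `result` is returned as is.
std::unique_ptr<ToyNode> WrapInCteAnchors(
    const std::vector<std::string>& producer_ids,
    std::unique_ptr<ToyNode> result) {
    std::unique_ptr<ToyNode> top;
    ToyNode* bottom = nullptr;
    for (const std::string& id : producer_ids) {
        std::unique_ptr<ToyNode> anchor(new ToyNode("CTEAnchor(" + id + ")"));
        ToyNode* raw = anchor.get();
        if (!top) {
            top = std::move(anchor);            // first anchor becomes the root
        } else {
            bottom->AddChild(std::move(anchor)); // extend the chain downward
        }
        bottom = raw;
    }
    if (!top) {
        return result;
    }
    bottom->AddChild(std::move(result));
    return top;
}
```

Tracking both the top and the bottom of the chain mirrors the two out-parameters in the walkthrough: the bottom anchor receives the query subtree, and the top anchor becomes the new root.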
CDXLLogical
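The CDXLNode class has four important members, of which only two are covered here. m_dxl_op is a CDXLOperator; in the QueryToDXL flow it is an instance of one of the CDXLOperator subclasses CDXLLogical or CDXLScalar. m_dxl_array is an array holding this node's child nodes, whose operators are likewise CDXLLogical or CDXLScalar. The CDXLLogical subclasses that ORCA currently supports include CDXLLogicalGet and CDXLLogicalExternalGet, both of which appear in the example that follows.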
TranslateRTEToDXLLogicalGet (which returns a CDXLNode representing a FROM-clause relation range table entry) serves as the example of how a child node of the Query tree is converted into a DXL node. First, a note on the RangeTblEntry node: a range table entry may represent a plain relation, a sub-select in FROM, or the result of a JOIN clause. (Only explicit JOIN syntax produces an RTE, not the implicit join resulting from multiple FROM items. This is because we only need the RTE to deal with SQL features like outer joins and join-output-column aliasing.) Other special RTE types also exist, as indicated by RTEKind:
- RTE_RELATION: ordinary relation reference
- RTE_SUBQUERY: subquery in FROM
- RTE_JOIN: join
- RTE_FUNCTION: function in FROM
- RTE_VALUES: VALUES (<exprlist>), (<exprlist>), ...
- RTE_VOID: CDB: deleted RTE
- RTE_CTE: common table expr (WITH list element)
- RTE_TABLEFUNCTION: CDB: functions over multiset input
TranslateRTEToDXLLogicalGet handles only RangeTblEntry nodes of kind RTE_RELATION (ordinary relation reference). It proceeds as follows:
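- Construct a table descriptor for the range table entry.
- Use the metadata accessor to retrieve the IMDRelation metadata object md_rel for the table descriptor.
- Create the CDXLLogical operator according to the storage type reported by md_rel: CDXLLogicalExternalGet for external tables, CDXLLogicalGet for all other tables.
- Create the CDXLNode and assign the dxl_op object created in the previous step to its m_dxl_op member.
- Record the table's column information in CTranslatorQueryToDXL.m_var_to_colid_map.
- Make note of the operator classes used in the distribution key.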
CDXLNode *CTranslatorQueryToDXL::TranslateRTEToDXLLogicalGet(const RangeTblEntry *rte, ULONG rt_index, ULONG /* current_query_level */) {
    if (false == rte->inh) {
        GPOS_ASSERT(RTE_RELATION == rte->rtekind);
        // RangeTblEntry::inh is set to false iff there is ONLY in the FROM clause. c.f. transformTableEntry, called from transformFromClauseItem
        GPOS_RAISE(gpdxl::ExmaDXL, gpdxl::ExmiQuery2DXLUnsupportedFeature, GPOS_WSZ_LIT("ONLY in the FROM clause"));
    }
    // construct table descriptor for the scan node from the range table entry
    CDXLTableDescr *dxl_table_descr = CTranslatorUtils::GetTableDescr(m_mp, m_md_accessor, m_context->m_colid_counter, rte, &m_context->m_has_distributed_tables);
    CDXLLogicalGet *dxl_op = NULL;
    const IMDRelation *md_rel = m_md_accessor->RetrieveRel(dxl_table_descr->MDId());
    // external tables get their own operator; every other storage type uses a plain get
    if (IMDRelation::ErelstorageExternal == md_rel->RetrieveRelStorageType()) {
        dxl_op = GPOS_NEW(m_mp) CDXLLogicalExternalGet(m_mp, dxl_table_descr);
    } else {
        dxl_op = GPOS_NEW(m_mp) CDXLLogicalGet(m_mp, dxl_table_descr);
    }
    CDXLNode *dxl_node = GPOS_NEW(m_mp) CDXLNode(m_mp, dxl_op);
    // make note of new columns from base relation
    m_var_to_colid_map->LoadTblColumns(m_query_level, rt_index, dxl_table_descr);
    // make note of the operator classes used in the distribution key
    NoteDistributionPolicyOpclasses(rte);
    return dxl_node;
}
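The decision logic of the function above (reject ONLY, then pick the operator by storage type) can be exercised without GPDB using a toy version. All names here (ToyStorageType, ToyLogicalGet, ToyLogicalExternalGet, MakeGetOperator) are hypothetical, and a std::runtime_error stands in for GPOS_RAISE; this is a sketch of the control flow, not the translator itself.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the relation storage type.
enum class ToyStorageType { Heap, External };

// Toy operator hierarchy: the external get derives from the plain get,
// matching how the excerpt stores both in a CDXLLogicalGet pointer.
struct ToyLogicalGet {
    virtual ~ToyLogicalGet() = default;
    virtual std::string Name() const { return "LogicalGet"; }
};

struct ToyLogicalExternalGet : ToyLogicalGet {
    std::string Name() const override { return "LogicalExternalGet"; }
};

// `inh` mirrors RangeTblEntry::inh: false means ONLY was used in FROM,
// which the translator reports as an unsupported feature.
std::unique_ptr<ToyLogicalGet> MakeGetOperator(ToyStorageType storage, bool inh) {
    if (!inh) {
        throw std::runtime_error("ONLY in the FROM clause");
    }
    if (storage == ToyStorageType::External) {
        return std::unique_ptr<ToyLogicalGet>(new ToyLogicalExternalGet());
    }
    return std::unique_ptr<ToyLogicalGet>(new ToyLogicalGet());
}
```

Calling MakeGetOperator(ToyStorageType::External, true) yields the external operator, while passing inh as false throws before any operator is created, just as the real function raises before building the table descriptor.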