Greenplum Database Optimizer: The New Path Type CdbMotionPath

2024/11/24 12:47:47

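A Path represents one possible way of computing a result (for example, a sequential scan or a hash join). More complex paths embed the Path struct and record additional information used during optimization. Greenplum adds a CdbPathLocus locus field to the Path struct (src/include/nodes/relation.h), which describes how the result tuples of that path are distributed and where they are processed. A table's distribution key determines how its tuples are distributed in storage, that is, on which segment's disk each tuple is stored. CdbPathLocus, by contrast, describes how tuples (which may come from a table or from a function) are distributed at execution time across processes (the QEs on different segments), that is, which process handles a given tuple.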

typedef enum CdbLocusType {
    CdbLocusType_Null,          /* no locus */
    CdbLocusType_Entry,         /* a single backend process on the entry db (the master): usually the qDisp itself, but could be a qExec started by the entry postmaster. */
    CdbLocusType_SingleQE,      /* a single backend process on any db: the qDisp itself, or a qExec started by a segment postmaster or the entry postmaster. */
    CdbLocusType_General,       /* compatible with any locus (data is self-contained in the query plan or generally available in any qExec or qDisp) */
    CdbLocusType_SegmentGeneral,/* generally available in any qExec, but not available in qDisp */
    CdbLocusType_Replicated,    /* replicated over all qExecs of an N-gang */
    CdbLocusType_Hashed,        /* hash partitioned over all qExecs of N-gang */
    CdbLocusType_HashedOJ,      /* result of hash partitioned outer join, NULLs can be anywhere */
    CdbLocusType_Strewn,        /* partitioned on no known function (rows are spread out, but the distribution key is unknown) */
    CdbLocusType_End            /* = last valid CdbLocusType + 1 */
} CdbLocusType;
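
The rest of this article keeps grouping these types into two families: "bottleneck" (Entry, SingleQE: the result lives in one process) and "partitioned" (Hashed, HashedOJ, Strewn: each row lives on one of many QEs). A minimal sketch of that grouping follows. The `Toy*` names are hypothetical stand-ins introduced for illustration; they are not Greenplum's CdbPathLocus_Is* macros, which operate on a CdbPathLocus struct rather than a bare enum.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for CdbLocusType; names are illustrative only. */
typedef enum ToyLocusType {
    TOY_NULL,
    TOY_ENTRY,
    TOY_SINGLEQE,
    TOY_GENERAL,
    TOY_SEGMENT_GENERAL,
    TOY_REPLICATED,
    TOY_HASHED,
    TOY_HASHEDOJ,
    TOY_STREWN,
    TOY_END
} ToyLocusType;

/* "Bottleneck": the result is produced by exactly one process. */
static bool toy_is_bottleneck(ToyLocusType t)
{
    assert(t >= TOY_NULL && t < TOY_END);
    return t == TOY_ENTRY || t == TOY_SINGLEQE;
}

/* "Partitioned": rows are split across the QEs of a gang. */
static bool toy_is_partitioned(ToyLocusType t)
{
    assert(t >= TOY_NULL && t < TOY_END);
    return t == TOY_HASHED || t == TOY_HASHEDOJ || t == TOY_STREWN;
}
```

General, SegmentGeneral and Replicated fall into neither family, which is why they get their own branches in cdbpath_create_motion_path.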

CdbMotionPath

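Greenplum introduces a new path type, CdbMotionPath, which represents how the result of a subpath is transferred from the sending processes to the receiving processes. The optimizer uses the locus types of the source and the target to decide where the data must flow and, from that, which Motion (if any) is appropriate.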

cdbpath_create_motion_path returns a Path that delivers the subpath result to the given locus, or NULL if that cannot be done. 'pathkeys' must specify an ordering equal to or weaker than the subpath's existing ordering. If no motion is needed, the caller's subpath is returned unchanged. Otherwise, if require_existing_order is true, NULL is returned when the motion would not preserve an ordering at least as strong as the specified ordering; NULL is also returned when pathkeys is NIL, meaning the caller is only checking and does not want a motion added. Otherwise a CdbMotionPath is returned that has either the specified pathkeys (if given and the motion uses Merge Receive), the pathkeys of the original subpath (if the motion is order-preserving), or no pathkeys (the usual case).
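
The ordering rules in that description can be traced to three families of motion-creating branches in the function's source: many-to-one (partitioned to a single process), one-to-many (a single process or General to a gang), and many-to-many (partitioned to partitioned or replicated). The sketch below is a simplified model of just those rules, under stated assumptions: pathkeys are modeled as plain strings with NULL standing in for NIL, and `SketchMotionKind` and `sketch_motion_ordering` are hypothetical names, not part of Greenplum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of the motion-creating branches. */
typedef enum SketchMotionKind {
    SKETCH_GATHER,       /* partitioned --> single process */
    SKETCH_ONE_TO_MANY,  /* single process or General --> gang */
    SKETCH_MANY_TO_MANY  /* partitioned --> partitioned/replicated */
} SketchMotionKind;

/*
 * Returns true if a motion path would be built and stores its ordering in
 * *out_keys (NULL models NIL). Returns false where the real function
 * returns NULL because the caller's ordering demands cannot be met.
 */
static bool sketch_motion_ordering(SketchMotionKind kind,
                                   const char *caller_keys,
                                   const char *subpath_keys,
                                   bool require_existing_order,
                                   const char **out_keys)
{
    assert(out_keys != NULL);
    switch (kind) {
    case SKETCH_GATHER:
        /* Caller's pathkeys drive Merge Receive; NIL means Union Receive. */
        if (require_existing_order && caller_keys == NULL)
            return false;
        *out_keys = caller_keys;
        return true;
    case SKETCH_ONE_TO_MANY:
        /* Each receiver gets a subset of the rows, order preserved. */
        if (require_existing_order && caller_keys == NULL)
            return false;
        *out_keys = subpath_keys;
        return true;
    case SKETCH_MANY_TO_MANY:
        /* Output streams lose any ordering they had. */
        if (require_existing_order)
            return false;
        *out_keys = NULL;
        return true;
    }
    return false;
}
```

For example, a gather with caller pathkeys keeps exactly those pathkeys, while a many-to-many redistribution always ends up unordered and is refused outright when require_existing_order is set.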

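cdbpath_create_motion_path takes two main arguments, subpath and locus. Based on the subpath's own CdbPathLocus locus field (how its result tuples are distributed and where they are processed) and the target locus, it adds the appropriate Motion to change the data flow. The main flow is as follows: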

  • If the target locus equals subpath->locus, no Motion is needed and the subpath is returned unchanged.
  • Determine the number of segments shared by the target locus and subpath->locus (CdbPathLocus_CommonSegments); the smaller of the two segment counts is used.
  • If the target locus is a bottleneck, that is, a single process (CdbLocusType_Entry: a single backend process on the entry db, i.e. the master, which may be the QD (Query Dispatcher) or a QE (Query Executor) started on the entry db; CdbLocusType_SingleQE: a single backend process on any db, which may be the QD or any QE):
      1. entry --> entry: no Motion needed.
      2. singleQE --> singleQE: no Motion needed.
      3. entry --> singleQE: no Motion needed (the slice's QE runs on the master).
      4. singleQE --> entry: a CdbMotionPath is created to indicate that the slice must be dispatched to a singleton gang running on the master. The node only records that the path has an Entry locus; no corresponding Motion node is created in the Plan tree. NULL is returned if the subpath is parameterized.
      5. SegmentGeneral --> entry/singleQE: the source data is available on any QE but not on the QD. A CdbMotionPath is created to distinguish this case from CdbLocusType_General and to indicate that the slice must run on a singleton gang. For SegmentGeneral --> singleQE the Motion may turn out to be redundant if the singleQE is not promoted to run on the QD in the end, so apply_motion_mutator() omits it in that case.
      6. General --> entry/singleQE: no Motion needed; the subpath can run directly on any of the common segments (General is compatible with any locus).
      7. Replicated --> entry/singleQE: invalid, because it would deliver redundant copies of the rows.
      8. Hashed/HashedOJ/Strewn --> entry/singleQE: valid, and a Motion is created. If the caller supplied pathkeys they are used for Merge Receive; otherwise Union Receive interleaves the rows from the partitions in no particular order.
  • If subpath->locus is a bottleneck (Entry or SingleQE): only entry/singleQE --> partitioned and entry/singleQE --> replicated are valid, and a Motion is created for them.
  • If subpath->locus is partitioned, that is, Hashed (hash partitioned over all qExecs of an N-gang), HashedOJ (result of a hash partitioned outer join, where NULLs can be anywhere) or Strewn (partitioned on no known function):
      1. partitioned --> partitioned: if the target locus equals the source locus, no Motion is needed; otherwise a Motion is created.
      2. partitioned --> replicated: a Motion is created.
      All other cases are invalid.
  • If subpath->locus is General (compatible with any locus):
      1. general --> general, general --> replicated and general --> segmentGeneral need no Motion.
      2. general --> partitioned (Hashed/HashedOJ/Strewn): a Motion is created.
      All other cases are invalid.
  • If subpath->locus is Replicated: replicated --> replicated needs no Motion. All other cases are invalid.
  • If subpath->locus is SegmentGeneral (generally available in any qExec, but not available in qDisp; used for replicated tables), most motions are disallowed:
      1. If the target locus is Hashed and has more segments than subpath->locus, the data is redistributed from SegmentGeneral to Hashed and a Motion is created.
      2. If the target locus is Replicated, a Motion is created only for an UPDATE/DELETE on a replicated table (root->upd_del_replicated_table is non-zero); otherwise the request is invalid.
      3. If the target locus is SegmentGeneral, no Motion is needed; the subpath is returned with numsegments set to the smaller of the two segment counts.
      All other cases are invalid.
Path *cdbpath_create_motion_path(PlannerInfo *root, Path *subpath, List *pathkeys, bool require_existing_order, CdbPathLocus locus){			
	if (cdbpathlocus_equal(subpath->locus, locus)) return subpath; /* Motion is to change path's locus, if target locus is the same as the subpath's, there is no need to add motion. */
	
	CdbMotionPath *pathnode; int numsegments = CdbPathLocus_CommonSegments(subpath->locus, locus);
	
	if (CdbPathLocus_IsBottleneck(locus)){ /* Moving subpath output to a single executor process (qDisp or qExec)? */	
		
		if (CdbPathLocus_IsEntry(subpath->locus) && CdbPathLocus_IsEntry(locus)) { /* entry-->entry?  No motion needed. */
			subpath->locus.numsegments = getgpsegmentCount(); return subpath;
		}		
		
		if (CdbPathLocus_IsSingleQE(subpath->locus) && CdbPathLocus_IsSingleQE(locus)) { /* singleQE-->singleQE?  No motion needed. */
			subpath->locus.numsegments = numsegments; return subpath;
		}		
		
		if (CdbPathLocus_IsEntry(subpath->locus)) { /* entry-->singleQE?  Don't move.  Slice's QE will run on entry db. */
			subpath->locus.numsegments = numsegments; return subpath;
		}		
		
		if (CdbPathLocus_IsSingleQE(subpath->locus)) { /* singleQE-->entry?  Don't move.  Slice's QE will run on entry db. */			
			if (!bms_is_empty(PATH_REQ_OUTER(subpath))) return NULL; /* If the subpath requires parameters, we cannot generate Motion atop of it. */
			/* Create CdbMotionPath node to indicate that the slice must be dispatched to a singleton gang running on the entry db.  We merely use this node to note that the path has 'Entry' locus; no corresponding Motion node will be created in the Plan tree. */
			pathnode = makeNode(CdbMotionPath); pathnode->path.pathtype = T_Motion;
			pathnode->path.parent = subpath->parent; pathnode->path.locus = locus; pathnode->path.pathkeys = pathkeys; pathnode->subpath = subpath;
			pathnode->path.rows = subpath->rows; pathnode->path.startup_cost = subpath->total_cost; pathnode->path.total_cost = subpath->total_cost;
			pathnode->path.memory = subpath->memory; pathnode->path.motionHazard = subpath->motionHazard;			
			pathnode->path.rescannable = false; /* Motion nodes are never rescannable. */
			return (Path *) pathnode;
		}

		if (CdbPathLocus_IsSegmentGeneral(subpath->locus)) {			
			if (!bms_is_empty(PATH_REQ_OUTER(subpath))) return NULL; /* If the subpath requires parameters, we cannot generate Motion atop of it. */
			/* Data is only available on segments, to distingush it with CdbLocusType_General, adding a motion to indicated this slice must be executed on a singleton gang. This motion may be redundant for segmentGeneral --> singleQE if the singleQE is not promoted to executed on qDisp in the end, so in apply_motion_mutator(), we will omit it. */
			pathnode = makeNode(CdbMotionPath); pathnode->path.pathtype = T_Motion;
			pathnode->path.parent = subpath->parent; pathnode->path.locus = locus; pathnode->path.pathkeys = pathkeys; pathnode->subpath = subpath;
			pathnode->path.rows = subpath->rows; pathnode->path.startup_cost = subpath->total_cost; pathnode->path.total_cost = subpath->total_cost;
			pathnode->path.memory = subpath->memory; pathnode->path.motionHazard = subpath->motionHazard;
			pathnode->path.rescannable = false; /* Motion nodes are never rescannable. */
			return (Path *) pathnode;
		}
		
		if (CdbPathLocus_IsGeneral(subpath->locus)){ /* No motion needed if subpath can run anywhere giving same output. */
			/* general-->(entry|singleqe), no motion is needed, can run directly on any of the common segments */
			subpath->locus.numsegments = numsegments; return subpath;
		}
	
		if (require_existing_order && !pathkeys) return NULL; /* Fail if caller refuses motion. */		
		if (CdbPathLocus_IsReplicated(subpath->locus)) /* replicated-->singleton would give redundant copies of the rows. */ goto invalid_motion_request;

		/* Must be partitioned-->singleton. If caller gave pathkeys, they'll be used for Merge Receive. If no pathkeys, Union Receive will arbitrarily interleave the rows from the subpath partitions in no special order. */
		if (!CdbPathLocus_IsPartitioned(subpath->locus)) goto invalid_motion_request;
	}

	
	else if (CdbPathLocus_IsBottleneck(subpath->locus)) { /* Output from a single process to be distributed over a gang? */ /* Must be bottleneck-->partitioned or bottleneck-->replicated */	
		if (!CdbPathLocus_IsPartitioned(locus) && !CdbPathLocus_IsReplicated(locus)) goto invalid_motion_request;		
		if (require_existing_order && !pathkeys) return NULL; /* Fail if caller disallows motion. */		
		pathkeys = subpath->pathkeys; /* Each qExec receives a subset of the rows, with ordering preserved. */
	}
	
	else if (CdbPathLocus_IsPartitioned(subpath->locus)) { /* Redistributing partitioned subpath output from one gang to another? */		
		if (CdbPathLocus_IsPartitioned(locus)) { /* partitioned-->partitioned? */			
			if (cdbpathlocus_equal(subpath->locus, locus)) return subpath; /* No motion if subpath partitioning matches caller's request. */
		}
		
		else if (!CdbPathLocus_IsReplicated(locus)) goto invalid_motion_request; /* Must be partitioned-->replicated */
		
		if (require_existing_order) return NULL; /* Fail if caller insists on ordered result or no motion. */		
		pathkeys = NIL; /* Output streams lose any ordering they had. Only a qDisp or singleton qExec can merge sorted streams (for now). */
	}

	
	else if (CdbPathLocus_IsGeneral(subpath->locus)) { /* If subplan uses no tables, it can run on qDisp or a singleton qExec. */
		/*
		 * No motion needed if general-->general or general-->replicated or
		 * general-->segmentGeneral
		 */
		if (CdbPathLocus_IsGeneral(locus) ||
			CdbPathLocus_IsReplicated(locus) ||
			CdbPathLocus_IsSegmentGeneral(locus))
		{
			subpath->locus.numsegments = numsegments;
			return subpath;
		}

		/* Must be general-->partitioned. */
		if (!CdbPathLocus_IsPartitioned(locus))
			goto invalid_motion_request;

		/* Fail if caller wants no motion. */
		if (require_existing_order &&
			!pathkeys)
			return NULL;

		/* Since the motion is 1-to-many, the rows remain in the same order. */
		pathkeys = subpath->pathkeys;
	}

	/* Does subpath produce same multiset of rows on every qExec of its gang? */
	else if (CdbPathLocus_IsReplicated(subpath->locus))
	{
		/*
		 * If the subpath requires parameters, we cannot generate Motion atop of it.
		 */
		if (!bms_is_empty(PATH_REQ_OUTER(subpath)))
			return NULL;
		/* No-op if replicated-->replicated. */
		if (CdbPathLocus_IsReplicated(locus))
		{

			subpath->locus.numsegments = numsegments;
			return subpath;
		}

		/* Other destinations aren't used or supported at present. */
		goto invalid_motion_request;
	}

	/* Most motions from SegmentGeneral (replicated table) are disallowed */
	else if (CdbPathLocus_IsSegmentGeneral(subpath->locus))
	{
		/*
		 * The only allowed case is a SegmentGeneral to Hashed motion,
		 * and SegmentGeneral's numsegments is smaller than Hashed's.
		 * In such a case we redistribute SegmentGeneral to Hashed.
		 *
		 * FIXME: HashedOJ?
		 */
		if (CdbPathLocus_IsHashed(locus) &&
			(CdbPathLocus_NumSegments(locus) >
			 CdbPathLocus_NumSegments(subpath->locus)))
		{
			pathkeys = subpath->pathkeys;
		}
		else if (CdbPathLocus_IsReplicated(locus))
		{
			/*
			 * Assume that this case only can be generated in
			 * UPDATE/DELETE statement
			 */
			if (root->upd_del_replicated_table == 0)
				goto invalid_motion_request;

		}
		else if (CdbPathLocus_IsSegmentGeneral(locus))
		{
			subpath->locus.numsegments = Min(subpath->locus.numsegments, locus.numsegments);
			return subpath;
		}
		else goto invalid_motion_request;
	}

	else goto invalid_motion_request;

	
	if (IsA(subpath, MaterialPath)) subpath = ((MaterialPath *) subpath)->subpath; /* Don't materialize before motion. */
	/* MPP-3300: materialize *before* motion can never help us, motion pushes data. other nodes pull. We relieve motion deadlocks by adding materialize nodes on top of motion nodes */

	/* Create CdbMotionPath node. */
	pathnode = makeNode(CdbMotionPath); pathnode->path.pathtype = T_Motion;
	pathnode->path.parent = subpath->parent; pathnode->path.locus = locus; pathnode->path.rows = subpath->rows; pathnode->path.pathkeys = pathkeys; pathnode->subpath = subpath;	
	cdbpath_cost_motion(root, pathnode); /* Cost of motion */
	/* Tell operators above us that slack may be needed for deadlock safety. */
	pathnode->path.motionHazard = true; pathnode->path.rescannable = false;
	return (Path *) pathnode;
	
invalid_motion_request: /* Unexpected source or destination locus. */
	return NULL;
}
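
The branch structure of the listing above can be condensed into a small decision function. This is a simplified model, not the real implementation: it ignores pathkeys, require_existing_order, parameterized subpaths and the numsegments bookkeeping, and it reduces three runtime facts to booleans. All `Model*` and `model_*` names are hypothetical. `same_locus` models cdbpathlocus_equal() returning true, and the sketch assumes that implies identical locus types.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum ModelLocus {
    M_ENTRY, M_SINGLEQE, M_GENERAL, M_SEGGENERAL,
    M_REPLICATED, M_HASHED, M_HASHEDOJ, M_STREWN
} ModelLocus;

typedef enum ModelDecision {
    MODEL_NO_MOTION,  /* subpath is returned unchanged */
    MODEL_MOTION,     /* a CdbMotionPath is created */
    MODEL_INVALID     /* invalid_motion_request: NULL is returned */
} ModelDecision;

static bool model_is_bottleneck(ModelLocus l)
{
    return l == M_ENTRY || l == M_SINGLEQE;
}

static bool model_is_partitioned(ModelLocus l)
{
    return l == M_HASHED || l == M_HASHEDOJ || l == M_STREWN;
}

static ModelDecision model_decide(ModelLocus src, ModelLocus dst,
                                  bool same_locus,
                                  bool dst_has_more_segments,
                                  bool upd_del_replicated)
{
    assert(!same_locus || src == dst);  /* assumption of this model */
    if (same_locus)
        return MODEL_NO_MOTION;

    if (model_is_bottleneck(dst)) {
        if (src == M_ENTRY)        /* entry-->entry, entry-->singleQE */
            return MODEL_NO_MOTION;
        if (src == M_SINGLEQE)     /* singleQE-->entry gets a marker path */
            return dst == M_SINGLEQE ? MODEL_NO_MOTION : MODEL_MOTION;
        if (src == M_SEGGENERAL)
            return MODEL_MOTION;
        if (src == M_GENERAL)
            return MODEL_NO_MOTION;
        if (src == M_REPLICATED)   /* would duplicate rows */
            return MODEL_INVALID;
        return model_is_partitioned(src) ? MODEL_MOTION : MODEL_INVALID;
    }
    if (model_is_bottleneck(src))
        return (model_is_partitioned(dst) || dst == M_REPLICATED)
            ? MODEL_MOTION : MODEL_INVALID;
    if (model_is_partitioned(src))
        return (model_is_partitioned(dst) || dst == M_REPLICATED)
            ? MODEL_MOTION : MODEL_INVALID;
    if (src == M_GENERAL) {
        if (dst == M_GENERAL || dst == M_REPLICATED || dst == M_SEGGENERAL)
            return MODEL_NO_MOTION;
        return model_is_partitioned(dst) ? MODEL_MOTION : MODEL_INVALID;
    }
    if (src == M_REPLICATED)
        return dst == M_REPLICATED ? MODEL_NO_MOTION : MODEL_INVALID;

    /* Remaining source type: SegmentGeneral. */
    if (dst == M_HASHED && dst_has_more_segments)
        return MODEL_MOTION;
    if (dst == M_REPLICATED)
        return upd_del_replicated ? MODEL_MOTION : MODEL_INVALID;
    if (dst == M_SEGGENERAL)
        return MODEL_NO_MOTION;
    return MODEL_INVALID;
}
```

Read this way, the function is essentially a lookup over (source, target) pairs: gathering partitioned data to one process yields a motion, replicated data can never be gathered, and SegmentGeneral to Replicated is only accepted when the planner is handling an UPDATE/DELETE on a replicated table.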

The last step of query_planner calls make_one_rel --> make_rel_from_joinlist. The call chains that reach cdbpath_create_motion_path are listed below.

make_one_rel --> set_base_rel_pathlists --> set_rel_pathlist --> set_append_rel_pathlist --> create_append_path --> set_append_path_locus --> cdbpath_create_motion_path
make_one_rel --> set_base_rel_pathlists --> set_rel_pathlist --> set_append_rel_pathlist --> get_cheapest_parameterized_child_path --> reparameterize_path --> create_append_path --> set_append_path_locus --> cdbpath_create_motion_path
set_append_rel_pathlist --> generate_mergeappend_paths --> create_merge_append_path --> set_append_path_locus --> cdbpath_create_motion_path
make_one_rel --> set_base_rel_sizes --> set_rel_size
set_rel_size --> set_dummy_rel_pathlist --> create_append_path --> set_append_path_locus --> cdbpath_create_motion_path
set_rel_size --> set_append_rel_size --> set_dummy_rel_pathlist --> create_append_path --> set_append_path_locus --> cdbpath_create_motion_path
set_rel_size --> set_subquery_pathlist --> set_dummy_rel_pathlist --> create_append_path --> set_append_path_locus --> cdbpath_create_motion_path

standard_join_search --> join_search_one_level --> make_join_rel --> join_is_legal --> create_unique_path --> cdbpath_create_motion_path
standard_join_search --> join_search_one_level --> make_rels_by_clause_joins --> make_join_rel --> join_is_legal --> create_unique_path --> cdbpath_create_motion_path
standard_join_search --> join_search_one_level --> make_rels_by_clauseless_joins --> make_join_rel --> join_is_legal --> create_unique_path --> cdbpath_create_motion_path
make_join_rel --> add_paths_to_joinrel --> sort_inner_and_outer --> create_unique_path --> cdbpath_create_motion_path
make_join_rel --> add_paths_to_joinrel --> match_unsorted_outer --> create_unique_path --> cdbpath_create_motion_path
make_join_rel --> add_paths_to_joinrel --> hash_inner_and_outer --> create_unique_path --> cdbpath_create_motion_path
make_join_rel --> mark_dummy_rel --> create_append_path --> set_append_path_locus --> cdbpath_create_motion_path
standard_join_search --> join_search_one_level --> have_join_order_restriction --> has_legal_joinclause --> join_is_legal --> create_unique_path --> cdbpath_create_motion_path
standard_join_search --> join_search_one_level --> make_rels_by_clause_joins --> have_join_order_restriction --> has_legal_joinclause --> join_is_legal --> create_unique_path --> cdbpath_create_motion_path


cdbpath_motion_for_join --> cdbpath_create_motion_path

CdbPathLocus
cdbpath_motion_for_join(PlannerInfo *root,
						JoinType jointype,		/* JOIN_INNER/FULL/LEFT/RIGHT/IN */
						Path **p_outer_path,	/* INOUT */
						Path **p_inner_path,	/* INOUT */
						List *redistribution_clauses,	/* equijoin RestrictInfo list */
						List *outer_pathkeys,
						List *inner_pathkeys,
						bool outer_require_existing_order,
						bool inner_require_existing_order)

cdbpath_dedup_fixup
cdbpath_contains_wts
has_redistributable_clause
turn_volatile_seggen_to_singleqe

Taking ForeignPath as an example:

/* ForeignPath represents a potential scan of a foreign table
 * fdw_private stores FDW private data about the scan.  While fdw_private is not actually touched by the core code during normal operations, it's generally a good idea to use a representation that can be dumped by nodeToString(), so that you can examine the structure during debugging with tools like pprint(). */
typedef struct ForeignPath{
	Path		path;	List	   *fdw_private;
} ForeignPath;
ForeignPath *create_foreignscan_path(PlannerInfo *root, RelOptInfo *rel, double rows, Cost startup_cost, Cost total_cost, List *pathkeys, Relids required_outer, List *fdw_private) {
	ForeignPath *pathnode = makeNode(ForeignPath);

	/* Since the path's required_outer should always include all the rel's lateral_relids, forcibly add those if necessary.  This is a bit of a hack, but up till early 2019 the contrib FDWs failed to ensure that, and it's likely that the same error has propagated into many external FDWs.  Don't risk modifying the passed-in relid set here. */
	if (rel->lateral_relids && !bms_is_subset(rel->lateral_relids, required_outer)) required_outer = bms_union(required_outer, rel->lateral_relids);

	/* Although this function is only designed to be used for scans of baserels, before v12 postgres_fdw abused it to make paths for join and upper rels.  It will work for such cases as long as required_outer is empty (otherwise get_baserel_parampathinfo does the wrong thing), which fortunately is the expected case for now. */
	if (!bms_is_empty(required_outer) && !(rel->reloptkind == RELOPT_BASEREL || rel->reloptkind == RELOPT_OTHER_MEMBER_REL)) elog(ERROR, "parameterized foreign joins are not supported yet");

	pathnode->path.pathtype = T_ForeignScan;
	pathnode->path.parent = rel;
	pathnode->path.param_info = get_baserel_parampathinfo(root, rel, required_outer);
	pathnode->path.rows = rows; pathnode->path.startup_cost = startup_cost; pathnode->path.total_cost = total_cost;
	pathnode->path.pathkeys = pathkeys;
	switch (rel->ftEntry->exec_location){
		case FTEXECLOCATION_ANY: CdbPathLocus_MakeGeneral(&(pathnode->path.locus), getgpsegmentCount()); break;
		case FTEXECLOCATION_ALL_SEGMENTS: CdbPathLocus_MakeStrewn(&(pathnode->path.locus), getgpsegmentCount()); break;
		case FTEXECLOCATION_MASTER: CdbPathLocus_MakeEntry(&(pathnode->path.locus)); break;
		default: elog(ERROR, "unrecognized exec_location '%c'", rel->ftEntry->exec_location);
	}
	pathnode->fdw_private = fdw_private;

	return pathnode;
}
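
The only Greenplum-specific part of create_foreignscan_path is the switch that derives the path's locus from the foreign table's exec_location. A minimal sketch of that mapping follows. The character codes and all `Fdw*`/`fdw_*` names are assumptions made for illustration; the actual values of the FTEXECLOCATION_* constants are not shown in this article, and the real code raises an error for an unrecognized value instead of returning a sentinel.

```c
#include <assert.h>

/* Hypothetical codes standing in for FTEXECLOCATION_* (values assumed). */
#define FDW_EXEC_ANY          'a'
#define FDW_EXEC_ALL_SEGMENTS 's'
#define FDW_EXEC_MASTER       'm'

typedef enum FdwLocus {
    FDW_LOCUS_GENERAL,  /* scan may run anywhere */
    FDW_LOCUS_STREWN,   /* every segment scans; distribution key unknown */
    FDW_LOCUS_ENTRY,    /* scan runs on the master only */
    FDW_LOCUS_UNKNOWN   /* sketch-only sentinel for bad input */
} FdwLocus;

/* Map an exec_location code to the locus the scan path would get. */
static FdwLocus fdw_locus_for(char exec_location)
{
    switch (exec_location) {
    case FDW_EXEC_ANY:
        return FDW_LOCUS_GENERAL;
    case FDW_EXEC_ALL_SEGMENTS:
        return FDW_LOCUS_STREWN;
    case FDW_EXEC_MASTER:
        return FDW_LOCUS_ENTRY;
    default:
        return FDW_LOCUS_UNKNOWN;
    }
}
```

A foreign scan that runs on all segments is Strewn rather than Hashed because the planner has no distribution key for the remote data; joining it on a key therefore requires a redistribution, which is where cdbpath_create_motion_path comes in.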
RelOptInfo *make_one_rel(PlannerInfo *root, List *joinlist)
  | -- set_base_rel_pathlists(PlannerInfo *root)
    | -- set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte)
           switch (rel->rtekind) case RTE_RELATION: if (rte->relkind == RELKIND_FOREIGN_TABLE)
             | -- set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte)
  else if (rte->inh) set_append_rel_pathlist(root, rel, rti, rte); /* It's an "append relation", process accordingly */
    | -- set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte)
      | -- set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte)
        | -- set_foreign_pathlist(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)
               rel->fdwroutine->GetForeignPaths(root, rel, rte->relid); /* Call the FDW's GetForeignPaths function to generate path(s) */
               set_cheapest(rel);  /* Select cheapest path */
/*
 * CdbPathLocus
 *
 * Specifies a distribution of tuples across segments.
 *
 * If locustype is CdbLocusType_Hashed or CdbLocusType_HashedOJ:
 *      Rows are distributed based on values of a partitioning key.  The
 *      partitioning key is often called a "distribution key", to avoid
 *      confusion with table partitioning.
 *
 *      A tuple's partitioning key consists of one or more key columns.
 *      When CdbPathLocus represents the distribution of a table, the
 *      partitioning key corresponds to the columns listed in DISTRIBUTED BY
 *      of the table, but the planner can distribute intermediate results
 *      based on arbitrary expressions.
 *
 *      The partitioning key is represented by a List of DistributionKeys,
 *      one for each key column. Each DistributionKey contains a list of
 *      EquivalenceClasses, which contain expressions that can be used
 *      to compute the value for the key column. Any of the expressions
 *      can be used to compute the value, depending on what relations are
 *      available at that part of the plan.
 *
 *      For example, if the query contains a "WHERE a=b" clause, the planner
 *      would form an EquivalenceClass that contains two members, "a" and "b".
 *      Because of the WHERE clause, either "a" or "b" can be used to
 *      compute the hash value. Usually, a DistributionKey contains only one
 *      EquivalenceClass, because whenever there is an equijoin on two
 *      expressions, the planner puts them in the same EquivalenceClass.
 *      However, if there are FULL JOINs in the query, the FULL JOIN quals do
 *      not form equivalence classes with other quals, because the NULL
 *      handling is different. See src/backend/optimizer/README for
 *      discussion on "outerjoin delayed" equivalence classes.
 *
 *      When a path locus is constructed for a FULL JOIN, CdbLocusType_HashedOJ
 *      is used instead of CdbLocusType_Hashed. The important distinction
 *      between Hashed and HashedOJ is the semantics for NULLs. In a Hashed
 *      distribution, a NULL is hashed like any other value, and all NULLs are
 *      located on a particular segment, based on the hash value of a NULL
 *      datum. But with HashedOJ, NULL values can legitimately appear on any
 *      segment!
 *
 *      For join optimization, Hashed and HashedOJ can both be used. In an inner
 *      join on A=B, NULL rows won't match anyway. And for an OUTER JOIN, it
 *      doesn't matter which segment the NULL rows appear on, as long as we
 *      correctly mark the resulting locus also as HashedOJ. But for grouping,
 *      HashedOJ can not be used, because you might end up with multiple
 *      NULL groups, one for each segment!
 *
 * If locustype == CdbLocusType_Strewn:
 *      Rows are distributed according to a criterion that is unknown or
 *      may depend on inputs that are unknown or unavailable in the present
 *      context.  The 'distkey' field is NIL.
 *
 * If the distribution is not partitioned, then the 'distkey' field is NIL.
 *
 * The numsegments attribute specifies how many segments the tuples are
 * distributed on, from segment 0 to segment `numsegments-1`.  In the future
 * we might further change it to a range or list so discontinuous segments
 * can be described.  This numsegments has different meaning for different
 * locustype:
 * - Null: numsegments is usually meaningless in Null locus as it will be
 *   remade to other locus types later.  But there is also cases that we set
 *   a valid numsegments in Null locus, this value will be kept when remade
 *   it to other locus types, and it becomes meaningful after that;
 * - Entry: numsegments in Entry locus specify the candidate segments to put
 *   the Entry node on, it's master and all the primary segments in current
 *   implementation;
 * - SingleQE: numsegments in SingleQE locus specify the candidate segments
 *   to put the SingleQE node on, although SingleQE is always executed on one
 *   segment but numsegments usually have a value > 1;
 * - General: similar with Entry and SingleQE;
 * - SegmentGeneral, Replicated, Hashed, HashedOJ, Strewn: numsegments in
 *   these locus types specify the segments that contain the tuples;
 */
typedef struct CdbPathLocus
{
	CdbLocusType locustype;
	List	   *distkey;
	int			numsegments;
} CdbPathLocus;
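
The comment's point about NULLs is easiest to see with a toy placement model. In the sketch below, a Hashed locus places every row, NULL keys included, by hash value, whereas a HashedOJ locus leaves NULL-keyed rows on whichever segment produced them. Counting the segments that hold a NULL key shows how many NULL groups a purely local GROUP BY would emit. Everything here is hypothetical: the `Demo*` names, the three-segment cluster, and the "key modulo segment count" hash, which is not Greenplum's hash function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NSEG 3  /* assumed number of segments */

typedef struct DemoRow {
    bool is_null;  /* true if the distribution key is NULL */
    int  key;      /* key value; ignored when is_null is true */
    int  origin;   /* segment that produced the row */
} DemoRow;

/* Hashed: every row, NULL included, is placed by its hash value. */
static int demo_segment_hashed(const DemoRow *r)
{
    unsigned int h = r->is_null ? 0u : (unsigned int) r->key;
    return (int) (h % DEMO_NSEG);
}

/* HashedOJ: NULL-keyed rows stay on the segment that produced them. */
static int demo_segment_hashedoj(const DemoRow *r)
{
    assert(r->origin >= 0 && r->origin < DEMO_NSEG);
    return r->is_null ? r->origin : demo_segment_hashed(r);
}

/* Number of segments that would emit a NULL group when grouping locally. */
static int demo_null_groups(const DemoRow *rows, size_t n,
                            int (*place)(const DemoRow *))
{
    bool seen[DEMO_NSEG] = { false };
    int groups = 0;

    for (size_t i = 0; i < n; i++) {
        if (!rows[i].is_null)
            continue;
        int seg = place(&rows[i]);
        if (!seen[seg]) {
            seen[seg] = true;
            groups++;
        }
    }
    return groups;
}
```

With three NULL-keyed rows produced on three different segments, the Hashed placement yields a single NULL group while the HashedOJ placement yields three, which is why HashedOJ is acceptable for joins but not for grouping.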
