Non-deterministic Transitions
AND-OR Search Trees
• In deterministic environments, branching is caused only by the agent’s choice (OR nodes)
• In non-deterministic environments, the environment’s choice of outcome also causes branching and must be taken into account (AND nodes)
• Solution is a subtree of the AND-OR tree that:
— Has a goal node at every leaf
— Specifies an action at each OR node
— Includes every outcome branch of its AND nodes
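The three conditions above can be sketched as a pair of mutually recursive functions. This is a minimal illustration rather than the lecture's own code: the `TRANSITIONS` table, the `GOALS` set, and the state and action names are all hypothetical, and cycles are handled simply by failing when a state repeats on the current path.

```python
# Hypothetical toy problem: TRANSITIONS[state][action] is the set of
# states the environment may pick after the agent takes that action.
TRANSITIONS = {
    "A": {"left": {"B", "C"}, "right": {"D"}},
    "B": {"go": {"G"}},
    "C": {"go": {"G"}},
    "D": {"loop": {"A"}},
}
GOALS = {"G"}


def or_search(state, path=()):
    """OR node: the agent needs only ONE action that works."""
    if state in GOALS:
        return []                      # empty plan: already at a goal leaf
    if state in path:
        return None                    # repeated state on this path: fail
    for action, outcomes in TRANSITIONS.get(state, {}).items():
        subplans = and_search(outcomes, path + (state,))
        if subplans is not None:
            return [action, subplans]  # action chosen at this OR node
    return None


def and_search(outcomes, path):
    """AND node: EVERY possible outcome must have a plan."""
    subplans = {}
    for outcome in outcomes:
        plan = or_search(outcome, path)
        if plan is None:
            return None                # one unsolved outcome sinks the action
        subplans[outcome] = plan
    return subplans


plan = or_search("A")
print(plan)
```

The returned plan is a nested structure: an action, followed by one sub-plan per outcome the environment might choose.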
AND-OR Graph Search
Adversarial Optimal Decisions
• Time Complexity O(b^m)
• Space Complexity O(bm)
• Chess, on average: b = 30, m = 40
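To make these figures concrete, here is a minimal sketch that assumes they refer to depth-first minimax (the notes do not name the algorithm). The nested-list tree is a made-up example: a number is a leaf utility and a list is an internal node. The last lines evaluate b^m and b × m for the chess values quoted above.

```python
def minimax(node, maximizing=True):
    """Return the minimax value of a nested-list game tree."""
    if not isinstance(node, list):
        return node                      # leaf: utility for MAX
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


# Hypothetical two-ply tree: MAX moves at the root, MIN replies.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
root_value = minimax(tree)

# Size of the chess figures from the notes.
b, m = 30, 40
time_nodes = b ** m       # exponential: nodes visited in the worst case
space_nodes = b * m       # linear: nodes held on one depth-first path
print(root_value, f"{time_nodes:.2e}", space_nodes)
```

With b = 30 and m = 40 the time term is on the order of 10^59 nodes, while the space term is only 1200.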
Reducing Complexity
• Reducing complexity of b^m
— Reduce branching factor (b)?
— Reduce maximum search depth (m)?
— Searching in a graph rather than a tree? In a tree, connections between states are hierarchical; in a graph, states can be connected in arbitrary ways.
Reducing Branching Factor
• Alpha-Beta Pruning
— Evaluate which nodes/branches would not affect MIN/MAX’s decision
— Based on keeping track of two parameters:
◦ α - value of the best (highest) choice found so far along MAX’s path
◦ β - value of the best (lowest) choice found so far along MIN’s path
• These values are updated as the search moves through the tree
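A minimal sketch of how α and β are tracked and used for cutoffs, on a made-up nested-list tree (number = leaf utility, list = internal node). The `visited` list is an addition for illustration only, so the pruned leaves can be seen.

```python
import math


def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True,
              visited=None):
    """Minimax value of a nested-list tree with alpha-beta pruning."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)         # record each leaf actually evaluated
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, value)    # best (highest) choice so far for MAX
            if alpha >= beta:
                break                    # MIN would never allow this branch
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, value)          # best (lowest) choice so far for MIN
        if alpha >= beta:
            break                        # MAX already has something better
    return value


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited = []
value = alphabeta(tree, visited=visited)
print(value, visited)
```

In this tree the second MIN node is abandoned after its first leaf (2), because MAX already has a guaranteed 3 from the first branch; the leaves 4 and 6 are never evaluated.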
Move Ordering
• Pruning is strongly affected by the ordering of the moves in the tree
— A good ordering enables us to prune many nodes
• Move ordering is often game-dependent knowledge (heuristic)
• Dynamic move ordering (killer-move heuristic) can exploit pruning information already discovered in the search tree
Reducing Depth - Killer Move
• Dynamic heuristic to determine a “good” ordering
• Search two plies ahead until MAX (or MIN) causes a beta (or alpha) cutoff
• The move that caused the cutoff is the killer move
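One common way to use a killer move, assumed here and not spelled out in the notes, is to remember the last move that caused a cutoff at each depth and try it first in sibling nodes at that depth. The sketch below is illustrative only: the dict-based tree, the move names, and the `stats` counter are hypothetical. Passing `None` for `killers` disables the heuristic so that the two runs can be compared.

```python
import math


def alphabeta_killer(node, killers, stats, depth=0,
                     alpha=-math.inf, beta=math.inf, maximizing=True):
    """Alpha-beta over a dict tree {move: child}; leaves are numbers."""
    if not isinstance(node, dict):
        stats["leaves"] += 1
        return node
    moves = list(node)
    killer = None if killers is None else killers.get(depth)
    if killer in node:                   # try the remembered move first
        moves.remove(killer)
        moves.insert(0, killer)
    value = -math.inf if maximizing else math.inf
    for move in moves:
        child = alphabeta_killer(node[move], killers, stats, depth + 1,
                                 alpha, beta, not maximizing)
        if maximizing:
            value = max(value, child)
            alpha = max(alpha, value)
        else:
            value = min(value, child)
            beta = min(beta, value)
        if alpha >= beta:
            if killers is not None:
                killers[depth] = move    # this move caused the cutoff
            break
    return value


# Hypothetical tree: MAX picks a/b/c, MIN replies with x/y/z.
TREE = {
    "a": {"x": 5, "y": 6, "z": 7},
    "b": {"x": 9, "y": 8, "z": 1},
    "c": {"x": 9, "y": 8, "z": 2},
}

plain = {"leaves": 0}
value_plain = alphabeta_killer(TREE, None, plain)
with_killer = {"leaves": 0}
value_killer = alphabeta_killer(TREE, {}, with_killer)
print(value_plain, plain["leaves"], value_killer, with_killer["leaves"])
```

Both runs return the same value. In the second run, reply `z` refutes move `b` and is then tried first under move `c`, which is cut off after a single leaf.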
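Reducing m - Evaluation Function
• Cut the search off at a limited depth and estimate the value of the state reached with an evaluation function
• Weighted linear function over features of a state
— Example: chess. Current state: pieces and position (structure)
— Example: Magic: The Gathering (card game). Current state: life points, cards in play, and cards in hand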
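A weighted linear evaluation function has the form Eval(s) = w1·f1(s) + ... + wn·fn(s). The sketch below assumes a material-only chess example: the feature names and the conventional piece values used as weights are illustrative choices, not taken from the notes.

```python
# Hypothetical weights: conventional chess piece values.
WEIGHTS = {"pawn": 1.0, "knight": 3.0, "bishop": 3.0, "rook": 5.0, "queen": 9.0}


def evaluate(features, weights=WEIGHTS):
    """Weighted linear sum over the features of a state.

    Each feature here is (own piece count - opponent piece count).
    """
    return sum(weights[name] * value for name, value in features.items())


# Example state: one pawn up, one bishop down.
features = {"pawn": 1, "knight": 0, "bishop": -1, "rook": 0, "queen": 0}
score = evaluate(features)
print(score)
```

A real evaluation function would add positional (structure) features with their own weights; the linear form stays the same.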
Graph Search
• As in non-adversarial search, many states will be revisited, because the search may explore different paths to the same state
• However, only recording visited states is not enough (since MIN can deviate in the future)
• Need to store actual loop paths (memory intensive)
— Requires “caching” strategy
Stochastic Games
• Outcome of agent choices is not deterministic
— Games must take into account multiple outcomes for the player
• Solution: weight outcomes by their probability
— Expected value
Expectiminimax
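A minimal sketch of expectiminimax, assuming a made-up tuple encoding of the game tree: a number is a leaf utility, `("max", [...])` and `("min", [...])` are player nodes, and `("chance", [(p, child), ...])` is a chance node whose value is the probability-weighted average of its children.

```python
def expectiminimax(node):
    """Value of a game tree with MAX, MIN and chance nodes."""
    if isinstance(node, (int, float)):
        return node                                   # leaf utility
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":
        # Expected value: weight each outcome by its probability.
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind!r}")


# Hypothetical game: MAX picks a move, chance resolves, then MIN replies.
tree = ("max", [
    ("chance", [(0.5, ("min", [10, 6])), (0.5, ("min", [2, 8]))]),
    ("chance", [(0.25, ("min", [20, 12])), (0.75, ("min", [4, 9]))]),
])
value = expectiminimax(tree)
print(value)
```

The first move is worth 0.5·6 + 0.5·2 = 4 and the second 0.25·12 + 0.75·4 = 6, so MAX prefers the second.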