Watcher study notes

2026/1/5 10:54:12

Architecture

The main components are watcher-api, watcher-applier, and watcher-decision-engine.

watcher-applier

watcher-decision-engine

DecisionEngineManager and DecisionEngineSchedulingService are wrapped as oslo_service services. The service's launch_service is then called, which in practice invokes the start method.
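The wrap-then-launch flow can be illustrated with a minimal sketch. The `Service` class and `launch_service` function below are hypothetical stand-ins written for this note; they are not the oslo_service API and only mimic the idea that launching a service ends up calling its start method.

```python
class DecisionEngineManager:
    """Placeholder for the real manager (assumption: no behavior needed here)."""


class DecisionEngineSchedulingService:
    """Placeholder for the real scheduling service."""


class Service:
    """Hypothetical wrapper; stands in for an oslo_service-style service."""

    def __init__(self, manager):
        self.manager = manager
        self.started = False

    def start(self):
        # In this sketch, starting only records that start() was called.
        self.started = True


def launch_service(service):
    """Toy launcher: launching a service boils down to calling start()."""
    service.start()
    return service


launched = [
    launch_service(Service(DecisionEngineManager())),
    launch_service(Service(DecisionEngineSchedulingService())),
]
```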

By default, nova is loaded and then execute is called.

DecisionEngineManager

conductor_topic defaults to watcher.decision.control

conductor_endpoints are audit_endpoint, data_model_endpoint, and strategy_endpoint

notification_topics default to nova.versioned_notifications and watcher.watcher_notifications

notification_endpoints is NovaClusterDataModelCollector

Conductor listeners

audit endpoint

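Takes an audit uuid as its parameter and runs the audit for the corresponding audit object.

The next step depends on the audit type: if the type is oneshot or event, there is a corresponding action for that type; if the type is continuous, there is no corresponding action, so possibly nothing is done.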
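The type-based dispatch described above could look roughly like the sketch below. The handler classes, the `HANDLERS` table, and the dict-shaped audit are all assumptions made for illustration, not Watcher's actual classes.

```python
class OneShotHandler:
    def execute(self, audit):
        return "oneshot audit %s executed" % audit["uuid"]


class EventHandler:
    def execute(self, audit):
        return "event audit %s executed" % audit["uuid"]


# Hypothetical lookup table: only oneshot and event have a handler here.
HANDLERS = {
    "ONESHOT": OneShotHandler(),
    "EVENT": EventHandler(),
}


def trigger_audit(audit):
    """Run the handler matching the audit type; do nothing if there is none."""
    handler = HANDLERS.get(audit["audit_type"])
    if handler is None:
        # e.g. CONTINUOUS: no matching action in this sketch.
        return None
    return handler.execute(audit)
```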

oneshot

The execute method is defined in the parent class and is split into pre_execute, execute, and post_execute.
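This is the template method pattern: the parent class fixes the order of the phases and subclasses fill them in. A minimal sketch follows, with hypothetical class names and a `do_execute` hook chosen here only to avoid clashing with the outer `execute`.

```python
class BaseAuditHandler:
    """Parent class: owns the phase ordering (illustrative, not Watcher code)."""

    def __init__(self):
        self.calls = []

    def pre_execute(self, audit):
        self.calls.append("pre_execute")

    def do_execute(self, audit):
        self.calls.append("execute")

    def post_execute(self, audit):
        self.calls.append("post_execute")

    def execute(self, audit):
        # The three phases always run in this order.
        self.pre_execute(audit)
        self.do_execute(audit)
        self.post_execute(audit)


class OneShotHandlerSketch(BaseAuditHandler):
    """Subclass: inherits execute() and could override any single phase."""
```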

audit pre execute

Checks whether Watcher has any action plan in the ongoing state; if so, an exception is raised, otherwise execution continues.

Sets the audit state to ongoing.
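The two pre-execute steps can be sketched as follows. The exception name, the state strings, and the dict-shaped audit and action plans are assumptions for this example.

```python
class ActionPlanOngoingError(Exception):
    """Hypothetical error raised when an action plan is still ongoing."""


def audit_pre_execute(audit, action_plans):
    # Step 1: refuse to start while any action plan is ongoing.
    if any(plan["state"] == "ONGOING" for plan in action_plans):
        raise ActionPlanOngoingError("an action plan is still ongoing")
    # Step 2: mark the audit itself as ongoing.
    audit["state"] = "ONGOING"
    return audit
```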

audit execute

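The strategy is executed in this phase.

Sending notifications: a notification is sent before and after the strategy runs, at INFO level by default. Each notification carries the audit's goal and strategy, as well as the audit's current phase: start before the strategy runs and end afterwards. The event type is built by concatenation, object + action (+ phase), so it probably looks like audit.strategy.start / audit.strategy.end.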
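The event type concatenation and the start/end bracketing can be sketched like this. `build_event_type` and `run_with_notifications` are hypothetical helpers, and the `emit` callback stands in for whatever actually sends the notification.

```python
def build_event_type(obj, action, phase=None):
    """Join object + action (+ phase) with dots."""
    parts = [obj, action]
    if phase:
        parts.append(phase)
    return ".".join(parts)


def run_with_notifications(run_strategy, emit):
    """Emit a start event, run the strategy, then emit an end event."""
    emit(build_event_type("audit", "strategy", "start"))
    result = run_strategy()
    emit(build_event_type("audit", "strategy", "end"))
    return result
```

For example, passing `events.append` as `emit` collects the two event types in order around the strategy call.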

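Selecting the strategy: the matching strategy has to be selected before it can run. If the audit object has a strategy name, that strategy is loaded directly. If it has no strategy but does have a goal name, all strategies are fetched and searched for one whose goal matches that name; if no strategy is found, an exception is raised.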
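The selection rule can be sketched as a small function. The dict-shaped strategies, the `StrategyNotFound` exception, and the sample entries are illustrative assumptions rather than Watcher's real objects.

```python
class StrategyNotFound(Exception):
    """Hypothetical error for the 'no strategy found' case."""


def select_strategy(strategies, strategy_name=None, goal_name=None):
    if strategy_name:
        # A named strategy is loaded directly.
        for strategy in strategies:
            if strategy["name"] == strategy_name:
                return strategy
        raise StrategyNotFound(strategy_name)
    # Otherwise scan all strategies for one that serves the goal.
    for strategy in strategies:
        if strategy["goal"] == goal_name:
            return strategy
    raise StrategyNotFound("no strategy for goal %r" % goal_name)


# Sample data for illustration only.
SAMPLE_STRATEGIES = [
    {"name": "basic", "goal": "server_consolidation"},
    {"name": "workload_balance", "goal": "workload_balancing"},
]
```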

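Validating the strategy and handling parameters: the audit's scope is set on the selected strategy and the strategy's schema is fetched. If the audit's parameters are empty and the schema is not, the audit's parameters are validated against the schema. The audit's parameters are then copied into the strategy.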
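A rough sketch of this step is shown below. The `validate` function is a deliberately tiny stand-in that only checks required keys; it is an assumption made to keep the example self-contained and is not the validator Watcher uses. Strategies and audits are plain dicts here.

```python
def validate(schema, parameters):
    """Stand-in validator: only checks that required keys are present."""
    missing = [key for key in schema.get("required", []) if key not in parameters]
    if missing:
        raise ValueError("missing parameters: %s" % ", ".join(missing))


def prepare_strategy(strategy, audit):
    # Hand the audit's scope to the selected strategy.
    strategy["audit_scope"] = audit["scope"]
    schema = strategy.get("schema") or {}
    # Validation only runs when parameters are empty and a schema exists.
    if not audit["parameters"] and schema:
        validate(schema, audit["parameters"])
    # Copy the audit's parameters into the strategy.
    strategy["input_parameters"].update(audit["parameters"])
    return strategy
```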

Executing the strategy: a strategy is also split into pre_execute, execute, and post_execute phases (using the workload_balance strategy as the example).

strategy pre execute

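This phase mainly obtains the compute model.

First, the collector (the class for compute) is obtained from the collector manager. The audit's audit_scope is then assigned to the collector's _data_model_scope attribute, and a scope handler is returned. The collector is called to get the latest compute model, and the scope handler then processes that model depending on whether a scope is present; if there is no scope, the compute model is returned as is.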
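The sequence above can be sketched with toy classes. Every class and method name below is a stand-in chosen for this example, and the scope is simplified to a plain list of host names, which is an assumption and not how Watcher represents scopes.

```python
class ModelRoot:
    def __init__(self, nodes=None):
        self.nodes = list(nodes or [])


class ScopeHandler:
    def __init__(self, scope):
        self.scope = scope

    def get_scoped_model(self, model):
        if not self.scope:
            # No scope: hand the model back unchanged.
            return model
        allowed = set(self.scope)
        return ModelRoot(n for n in model.nodes if n in allowed)


class ComputeCollector:
    def __init__(self, nodes):
        self._nodes = nodes
        self._data_model_scope = None

    def get_scope_handler(self, audit_scope):
        # Remember the audit scope on the collector, then return a handler.
        self._data_model_scope = audit_scope
        return ScopeHandler(audit_scope)

    def get_latest_model(self):
        return ModelRoot(self._nodes)


class CollectorManager:
    def __init__(self, collectors):
        self._collectors = collectors

    def get_collector(self, name):
        return self._collectors[name]


def get_compute_model(manager, audit_scope):
    collector = manager.get_collector("compute")
    handler = collector.get_scope_handler(audit_scope)
    model = collector.get_latest_model()
    return handler.get_scoped_model(model)
```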

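The latest compute model is obtained by calling the collector's execute method, which calls NovaModelBuilder and returns a ModelRoot instance as the compute model. Before the model is fetched, the scope is checked and then merged; the ModelRoot is then returned, and nodes are added to it.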

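Adding nodes: the hosts under the scope's aggregates and availability zones are collected first. If the scope is empty, or no hosts were found in the end, the hypervisor list is called to get every node that is not ironic. A detailed hypervisor list is then fetched, possibly via hypervisor-show. Finally, the first hypervisor in the detail list and the VMs under it are added to ModelRoot's node list and instance list.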
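The host collection and its fallback can be sketched as below. The scope keys (`host_aggregates`, `availability_zones`), the dict layout of hypervisors, and the sample data are assumptions for illustration; the real builder talks to Nova instead of in-memory lists.

```python
class ModelRootSketch:
    def __init__(self):
        self.nodes = {}      # hostname -> hypervisor details
        self.instances = {}  # vm name -> hostname


def collect_hosts(scope, aggregates, zones, hypervisors):
    hosts = set()
    for name in scope.get("host_aggregates", []):
        hosts.update(aggregates.get(name, []))
    for name in scope.get("availability_zones", []):
        hosts.update(zones.get(name, []))
    if not hosts:
        # Empty scope or nothing found: fall back to all non-ironic nodes.
        hosts = {h["hostname"] for h in hypervisors if h["type"] != "ironic"}
    return hosts


def build_model(scope, aggregates, zones, hypervisors):
    model = ModelRootSketch()
    for host in sorted(collect_hosts(scope, aggregates, zones, hypervisors)):
        details = [h for h in hypervisors if h["hostname"] == host]
        if not details:
            continue
        first = details[0]  # take the first matching hypervisor record
        model.nodes[host] = first
        for vm in first.get("servers", []):
            model.instances[vm] = host
    return model


# Sample data for illustration only.
HYPERVISORS = [
    {"hostname": "cn1", "type": "QEMU", "servers": ["vm1", "vm2"]},
    {"hostname": "cn2", "type": "QEMU", "servers": ["vm3"]},
    {"hostname": "bm1", "type": "ironic", "servers": []},
]
AGGREGATES = {"agg1": ["cn1"]}
ZONES = {"az1": ["cn2"]}
```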

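Assignment: once the compute model has been obtained, the parameters in the audit's parameter are assigned to instance attributes of the strategy object, such as threshold, period, metrics, and granularity.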
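The assignment step amounts to copying named parameters onto the strategy instance. In the sketch below, the class name and the parameter values are made up for illustration; only the four attribute names come from the notes above.

```python
class WorkloadBalanceSketch:
    """Toy strategy; not the real workload_balance implementation."""

    PARAMETER_NAMES = ("threshold", "period", "metrics", "granularity")

    def __init__(self, input_parameters):
        self.input_parameters = dict(input_parameters)

    def pre_execute(self):
        # Copy each known parameter onto the instance as an attribute.
        for name in self.PARAMETER_NAMES:
            setattr(self, name, self.input_parameters[name])


strategy = WorkloadBalanceSketch({
    "threshold": 25.0,                # illustrative values only
    "period": 300,
    "metrics": "instance_cpu_usage",
    "granularity": 300,
})
strategy.pre_execute()
```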