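1 Basic Window Concepts
1.1 Overview
A window splits an unbounded data stream into a series of bounded segments according to some rule, so that each segment can be computed on separately.
The purpose of cutting the stream into bounded segments is usually aggregation.
An important property of keyed windows: every window is bound to the key it belongs to, so records with different keys are never assigned to the same window.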
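As a rough illustration of the keyed-window property, the plain-Java sketch below (no Flink dependency) groups events by key first and by window second, then sums the values in each group. The `Event` record, the `sumPerKeyAndWindow` method, and the fixed 10-second window size are all assumptions made up for this example; they are not part of any Flink API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class KeyedWindowSketch {
    /** Hypothetical event type: a key, an event timestamp in milliseconds, and a value. */
    public record Event(String key, long timestampMs, long value) {}

    /**
     * Sums values per (key, window start). Windows have a fixed size and are
     * aligned to timestamp 0 (an assumption of this sketch).
     */
    public static Map<String, Map<Long, Long>> sumPerKeyAndWindow(List<Event> events, long windowSizeMs) {
        Map<String, Map<Long, Long>> result = new TreeMap<>();
        for (Event e : events) {
            // Start of the window that contains this event's timestamp.
            long windowStart = e.timestampMs() - Math.floorMod(e.timestampMs(), windowSizeMs);
            // The outer map is keyed by the event key, so two keys never share a window.
            result.computeIfAbsent(e.key(), k -> new TreeMap<>())
                  .merge(windowStart, e.value(), Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event("a", 1_000, 1),
            new Event("b", 2_000, 10),
            new Event("a", 9_000, 2),
            new Event("a", 12_000, 4));
        // prints {a={0=3, 10000=4}, b={0=10}}
        System.out.println(sumPerKeyAndWindow(events, 10_000));
    }
}
```

Keys "a" and "b" both have events in the time range starting at 0, yet each key gets its own window and its own aggregate.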
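1.2 Window Types
Tumbling window
Sliding window
Session window
A session window has neither a fixed window length nor a fixed slide step. Instead, windows are delimited by checking whether the time interval between two consecutive events in the stream exceeds a threshold (the session gap).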
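One way to see how the three window types differ is to look at how a timestamp could be mapped to windows. The sketch below is plain Java, not Flink code. The method names are hypothetical, windows are assumed to be aligned to timestamp 0, and a gap exactly equal to the threshold is assumed to stay in the same session.

```java
import java.util.ArrayList;
import java.util.List;

public class WindowAssignSketch {
    /** Tumbling: start of the single window of length size that contains ts. */
    public static long tumblingStart(long ts, long size) {
        return ts - Math.floorMod(ts, size);
    }

    /** Sliding: starts of every window of length size, advancing by slide, that contains ts. */
    public static List<Long> slidingStarts(long ts, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long lastStart = ts - Math.floorMod(ts, slide);
        // Walk backwards while the window [start, start + size) still covers ts.
        for (long start = lastStart; start > ts - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    /** Session: split sorted timestamps wherever the gap between neighbours exceeds gap. */
    public static List<List<Long>> sessions(List<Long> sortedTs, long gap) {
        List<List<Long>> result = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long ts : sortedTs) {
            if (!current.isEmpty() && ts - current.get(current.size() - 1) > gap) {
                result.add(current);          // gap exceeded: close the current session
                current = new ArrayList<>();
            }
            current.add(ts);
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tumblingStart(7, 5));                         // prints 5
        System.out.println(slidingStarts(7, 10, 5));                     // prints [5, 0]
        System.out.println(sessions(List.of(1L, 2L, 3L, 10L, 11L, 30L), 5)); // prints [[1, 2, 3], [10, 11], [30]]
    }
}
```

A tumbling window assigns each timestamp to exactly one window, a sliding window can assign it to several overlapping windows, and a session window has boundaries that depend only on the data.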
1.3 Window Function Template
Keyed Windows
stream
    .keyBy(...)                  <- keyed versus non-keyed windows
    .window(...)                 <- required: "assigner"
   [.trigger(...)]               <- optional: "trigger" (else default trigger)
   [.evictor(...)]               <- optional: "evictor" (else no evictor)
   [.allowedLateness(...)]       <- optional: "lateness" (else zero)
   [.sideOutputLateData(...)]    <- optional: "output tag" (else no side output for late data)
    .reduce/aggregate/apply()    <- required: "function"
   [.getSideOutput(...)]         <- optional: "output tag"
Non-Keyed Windows
stream
    .windowAll(...)              <- required: "assigner"
   [.trigger(...)]               <- optional: "trigger" (else default trigger)
   [.evictor(...)]               <- optional: "evictor" (else no evictor)
   [.allowedLateness(...)]       <- optional: "lateness" (else zero)
   [.sideOutputLateData(...)]    <- optional: "output tag" (else no side output for late data)
    .reduce/aggregate/apply()    <- required: "function"
   [.getSideOutput(...)]         <- optional: "output tag"