1. AllowedLateness API: Delaying Window Closure
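The allowedLateness method is called on a WindowedStream and takes a lateness duration. This duration determines when the window is actually closed, and it is added on top of the watermark delay. For example, take a watermark delay of 2s, an allowed lateness of 2s, and a 10s tumbling window. Without allowed lateness, the 0-10s window would close as soon as an element with a timestamp of 12s or later arrives. allowedLateness extends this by another 2s, so the 0-10s window is only closed for good once an element with a timestamp of 14s or later arrives. The window is still computed and its result emitted when the 12s element arrives; from then until the window is closed, every late element that belongs to the 0-10s window triggers another computation of that window as soon as it arrives.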
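The arithmetic in this example can be sketched in plain Java, without any Flink dependency. This is a minimal sketch, assuming the watermark follows the bounded-out-of-orderness rule (largest timestamp seen so far, minus the delay, minus 1 ms) and that a window fires once the watermark reaches the window's last millisecond. The class and method names are hypothetical and are not part of the Flink API.

```java
public class LatenessTimeline {

    // Bounded-out-of-orderness rule (assumed): the watermark trails the
    // largest event timestamp seen so far by the delay plus 1 ms.
    public static long watermarkFor(long maxEventTsMs, long delayMs) {
        return maxEventTsMs - delayMs - 1;
    }

    // Smallest event timestamp that makes the window ending at windowEndMs fire:
    // the watermark has to reach the window's last millisecond (end - 1).
    public static long firstFireEventTs(long windowEndMs, long delayMs) {
        return (windowEndMs - 1) + delayMs + 1;
    }

    // Smallest event timestamp that closes the window for good:
    // the watermark has to reach (end - 1) + allowed lateness.
    public static long closeEventTs(long windowEndMs, long delayMs, long latenessMs) {
        return (windowEndMs - 1) + latenessMs + delayMs + 1;
    }

    public static void main(String[] args) {
        long windowEnd = 10_000L; // the 0-10s window
        long delay = 2_000L;      // watermark delay
        long lateness = 2_000L;   // allowed lateness

        // prints 12000: the window fires for the first time
        System.out.println("first fire at event ts " + firstFireEventTs(windowEnd, delay));
        // prints 14000: the window state is dropped, later elements are too late
        System.out.println("closed at event ts " + closeEventTs(windowEnd, delay, lateness));
    }
}
```

Between those two timestamps the window still exists, which is why a late element for the 0-10s window can trigger another computation.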
2. OutputTag API: Side Output Stream
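Use the OutputTag API (together with sideOutputLateData) so that data arriving after a window has closed can still be retrieved. Once the allowed lateness has elapsed, the window is closed for good, and any further element that falls within that window's range is routed to the side output identified by the OutputTag. Consider the following test data, emitted one element every 3 seconds: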
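context.collect(new Event("A", "/user", 1000L));   // belongs to window [0s, 5s)
Thread.sleep(3000);
context.collect(new Event("B", "/prod", 6500L));   // window [5s, 10s); watermark moves to about 4.5s
Thread.sleep(3000);
context.collect(new Event("C", "/cart", 4000L));   // out of order but not late: [0s, 5s) has not fired yet
Thread.sleep(3000);
context.collect(new Event("D", "/user", 7500L));   // watermark passes 5s: [0s, 5s) fires for the first time
System.out.println("window [0s, 5s) fires ~ ");
Thread.sleep(3000);
context.collect(new Event("E", "/cente", 8500L));  // window [5s, 10s); watermark moves to about 6.5s
Thread.sleep(3000);
context.collect(new Event("F", "/cente", 4000L));  // late, but within the allowed lateness: [0s, 5s) fires again
Thread.sleep(3000);
context.collect(new Event("G", "/cente", 9200L));  // watermark passes 7s: [0s, 5s) is closed for good
Thread.sleep(3000);
context.collect(new Event("H", "/cente", 1000L));  // too late: goes to the side output
Thread.sleep(3000);
context.collect(new Event("I", "/cente", 1500L));  // too late: goes to the side output
Thread.sleep(3000);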
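Now define a 5s tumbling window with a watermark delay of 2s and an allowed lateness of 2s. The 0-5s window first fires when element D (7500L) arrives, and it is only closed once an element with a timestamp of 9s or later arrives (that is, once the watermark passes 7s), which here is G (9200L). The last two elements, H and I, therefore go to the side output identified by the OutputTag. The second 4000L element (F) arrives before the window is closed, so it triggers the window computation once more.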
The program's output matches these expectations exactly.
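The expected routing of the nine test elements can also be checked by hand with a small replay in plain Java. This is a simplified model and not Flink code: it assumes the watermark equals the largest timestamp seen so far minus the 2s delay minus 1 ms, and that the watermark has caught up before the next element arrives (reasonable here because the elements are 3 seconds apart). The class name, method name and the three labels are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class LateDataReplay {
    static final long WINDOW_SIZE = 5_000L;      // 5s tumbling window
    static final long WATERMARK_DELAY = 2_000L;  // watermark delay
    static final long ALLOWED_LATENESS = 2_000L; // allowed lateness

    // Classifies each timestamp, in arrival order, as ON_TIME, LATE_REFIRE
    // (late but still accepted, the window fires again) or SIDE_OUTPUT.
    public static List<String> replay(long[] timestamps) {
        List<String> verdicts = new ArrayList<>();
        long watermark = Long.MIN_VALUE;
        for (long ts : timestamps) {
            long windowStart = ts - (ts % WINDOW_SIZE);
            long windowMaxTs = windowStart + WINDOW_SIZE - 1; // last ms of the window
            if (windowMaxTs + ALLOWED_LATENESS <= watermark) {
                verdicts.add("SIDE_OUTPUT");  // window already closed for good
            } else if (windowMaxTs <= watermark) {
                verdicts.add("LATE_REFIRE");  // window already fired, but still open
            } else {
                verdicts.add("ON_TIME");      // window has not fired yet
            }
            // The watermark only ever moves forward.
            watermark = Math.max(watermark, ts - WATERMARK_DELAY - 1);
        }
        return verdicts;
    }

    public static void main(String[] args) {
        // Timestamps of A..I from the test data
        long[] events = {1000L, 6500L, 4000L, 7500L, 8500L, 4000L, 9200L, 1000L, 1500L};
        // prints [ON_TIME, ON_TIME, ON_TIME, ON_TIME, ON_TIME, LATE_REFIRE, ON_TIME, SIDE_OUTPUT, SIDE_OUTPUT]
        System.out.println(replay(events));
    }
}
```

In this model only F is late but still accepted, and only H and I end up in the side output, which is the behaviour described above.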
Full code:
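import java.sql.Timestamp;
import java.time.Duration;
import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.WindowedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.SourceFunction;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

// Event (with a public timestamp field) and Env are the author's own classes and are not shown here.
public class WindowOutputTest {
    public static void main(String[] args) throws Exception {
        // Env.getEnv() is assumed to return a StreamExecutionEnvironment with parallelism 1
        StreamExecutionEnvironment env = Env.getEnv();
        DataStreamSource<Event> dataStreamSource = env.addSource(new SourceFunction<Event>() {
            @Override
            public void run(SourceContext<Event> context) throws Exception {
                context.collect(new Event("A", "/user", 1000L));
                Thread.sleep(3000);
                context.collect(new Event("B", "/prod", 6500L));
                Thread.sleep(3000);
                context.collect(new Event("C", "/cart", 4000L));
                Thread.sleep(3000);
                context.collect(new Event("D", "/user", 7500L));
                System.out.println("window [0s, 5s) fires ~ ");
                Thread.sleep(3000);
                context.collect(new Event("E", "/cente", 8500L));
                Thread.sleep(3000);
                context.collect(new Event("F", "/cente", 4000L));
                Thread.sleep(3000);
                context.collect(new Event("G", "/cente", 9200L));
                Thread.sleep(3000);
                context.collect(new Event("H", "/cente", 1000L));
                Thread.sleep(3000);
                context.collect(new Event("I", "/cente", 1500L));
                Thread.sleep(3000);
            }

            @Override
            public void cancel() {
            }
        });

        // Assign event-time timestamps and watermarks
        SingleOutputStreamOperator<Event> operator = dataStreamSource.assignTimestampsAndWatermarks(
                WatermarkStrategy.<Event>forBoundedOutOfOrderness(Duration.ofSeconds(2)) // watermark delay: 2s
                        .withTimestampAssigner(new SerializableTimestampAssigner<Event>() {
                            @Override
                            public long extractTimestamp(Event event, long l) {
                                return event.timestamp;
                            }
                        })
        );

        // Anonymous subclass so that the generic type information is retained
        OutputTag<Event> eventOutputTag = new OutputTag<Event>("late") {
        };

        WindowedStream<Event, Boolean, TimeWindow> windowedStream = operator.keyBy(d -> true)
                .window(TumblingEventTimeWindows.of(Time.of(5, TimeUnit.SECONDS))) // 5s tumbling window
                .allowedLateness(Time.of(2, TimeUnit.SECONDS))                     // keep the window open 2s longer
                .sideOutputLateData(eventOutputTag);                               // anything later goes to the side output

        // Count the elements per window and format the result together with the window bounds
        SingleOutputStreamOperator<String> windowAgg = windowedStream.aggregate(new AggregateFunction<Event, Long, Long>() {
            @Override
            public Long createAccumulator() {
                return 0L;
            }

            @Override
            public Long add(Event event, Long acc) {
                return acc + 1;
            }

            @Override
            public Long getResult(Long acc) {
                return acc;
            }

            @Override
            public Long merge(Long aLong, Long acc1) {
                // Not called for tumbling windows, but returning the sum is safer than returning null
                return aLong + acc1;
            }
        }, new ProcessWindowFunction<Long, String, Boolean, TimeWindow>() {
            @Override
            public void process(Boolean key, Context context, Iterable<Long> iterable, Collector<String> collector) throws Exception {
                long start = context.window().getStart();
                long end = context.window().getEnd();
                collector.collect(new Timestamp(start) + " ~ " + new Timestamp(end) + " ===> " + iterable.iterator().next());
            }
        });
        windowAgg.print("window result ");

        // Fetch the late data from the side output
        DataStream<Event> sideOutput = windowAgg.getSideOutput(eventOutputTag);
        sideOutput.print("side output -> ");
        env.execute();
    }
}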