引言
接着上篇:Spring Batch 步骤对象-步骤Step与Tasklet 了解step步骤概念及其使用之后,本篇再来讲解spring batch使用更广,功能更强大的tasklet:居于块的批处理步骤:Chunk Tasklet
简介
居于chunk(块)的Tasklet相对简单Tasklet来说,多了3个模块:ItemReader( 读模块), ItemProcessor(处理模块),ItemWriter(写模块), 跟它们名字一样, 一个负责数据读, 一个负责数据加工,一个负责数据写。
结构图:
时序图:
需求:简单演示chunk Tasklet使用
ItemReader ItemProcessor ItemWriter 都接口,直接使用匿名内部类方式方便创建
package com.langfeiyes.batch._08_step_chunk_tasklet;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.item.*;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import java.util.Arrays;
import java.util.List;
@SpringBootApplication
@EnableBatchProcessing
public class ChunkTaskletJob {
@Autowired
private JobLauncher jobLauncher;
@Autowired
private JobBuilderFactory jobBuilderFactory;
@Autowired
private StepBuilderFactory stepBuilderFactory;
@Bean
public ItemReader itemReader(){
return new ItemReader() {
@Override
public Object read() throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException {
System.out.println("-------------read------------");
return "read-ret";
}
};
}
@Bean
public ItemProcessor itemProcessor(){
return new ItemProcessor() {
@Override
public Object process(Object item) throws Exception {
System.out.println("-------------process------------>" + item);
return "process-ret->" + item;
}
};
}
@Bean
public ItemWriter itemWriter(){
return new ItemWriter() {
@Override
public void write(List items) throws Exception {
System.out.println(items);
}
};
}
@Bean
public Step step1(){
return stepBuilderFactory.get("step1")
.chunk(3) //设置块的size为3次
.reader(itemReader())
.processor(itemProcessor())
.writer(itemWriter())
.build();
}
//定义作业
@Bean
public Job job(){
return jobBuilderFactory.get("step-chunk-tasklet-job")
.start(step1())
.incrementer(new RunIdIncrementer())
.build();
}
public static void main(String[] args) {
SpringApplication.run(ChunkTaskletJob.class, args);
}
}
执行完了之后结果
-------------read------------
-------------read------------
-------------read------------
-------------process------------>read-ret
-------------process------------>read-ret
-------------process------------>read-ret
[process-ret->read-ret, process-ret->read-ret, process-ret->read-ret]
-------------read------------
-------------read------------
-------------read------------
-------------process------------>read-ret
-------------process------------>read-ret
-------------process------------>read-ret
[process-ret->read-ret, process-ret->read-ret, process-ret->read-ret]
-------------read------------
-------------read------------
-------------read------------
-------------process------------>read-ret
-------------process------------>read-ret
-------------process------------>read-ret
[process-ret->read-ret, process-ret->read-ret, process-ret->read-ret]
....
观察上面打印结果,得出2个得出。
1>程序一直在循环打印,先循环打印3次reader, 再循环打印3次processor,最后一次性输出3个值。
2>死循环重复上面步骤
问题来了,为啥会出现这种效果,该怎么改进?
其实这个是ChunkTasklet 执行特点,ItemReader会一直循环读,直到返回null,才停止。而processor也是一样,itemReader读多少次,它处理多少次, itemWriter 一次性输出当前次输入的所有数据。
我们改进一下上面案例,要求只读3次, 只需要改动itemReader方法就行
int timer = 3;
@Bean
public ItemReader itemReader(){
return new ItemReader() {
@Override
public Object read() throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException {
if(timer > 0){
System.out.println("-------------read------------");
return "read-ret-" + timer--;
}else{
return null;
}
}
};
}
结果不在死循环了
-------------read------------
-------------read------------
-------------read------------
-------------process------------>read-ret-3
-------------process------------>read-ret-2
-------------process------------>read-ret-1
[process-ret->read-ret-3, process-ret->read-ret-2, process-ret->read-ret-1]
思考一个问题, 如果将timer改为 10,而 .chunk(3) 不变结果会怎样?
-------------read------------
-------------read------------
-------------read------------
-------------process------------>read-ret-10
-------------process------------>read-ret-9
-------------process------------>read-ret-8
[process-ret->read-ret-10, process-ret->read-ret-9, process-ret->read-ret-8]
-------------read------------
-------------read------------
-------------read------------
-------------process------------>read-ret-7
-------------process------------>read-ret-6
-------------process------------>read-ret-5
[process-ret->read-ret-7, process-ret->read-ret-6, process-ret->read-ret-5]
-------------read------------
-------------read------------
-------------read------------
-------------process------------>read-ret-4
-------------process------------>read-ret-3
-------------process------------>read-ret-2
[process-ret->read-ret-4, process-ret->read-ret-3, process-ret->read-ret-2]
-------------read------------
-------------process------------>read-ret-1
[process-ret->read-ret-1]
找出规律了嘛?
当chunkSize = 3 表示 reader 先读3次,提交给processor处理3次,最后由writer输出3个值
timer =10, 表示数据有10条,一个批次(趟)只能处理3条数据,需要4个批次(趟)来处理。
是不是有批处理味道出来
结论:chunkSize 表示: 一趟需要ItemReader读多少次,ItemProcessor要处理多少次。
到这,本篇就结束了,欲知后事如何,请听下回分解~
转视频版
看文字不过瘾可以切换视频版:Spring Batch高效批处理框架实战