Recently, a project required saving 1,000,000 rows to the database in one go. Several implementations came to mind.
Approach 1: implement a batch-insert dynamic SQL statement in the mapper file, concatenating a multi-row statement of the form insert into table_name values (?,?,?),(?,?,?),….
<insert id="batchSave" parameterType="java.util.Collection" useGeneratedKeys="true" keyProperty="id">
    insert into menu_resource(parent_id, path, resource_key, resource_type, status)
    values
    <foreach collection="list" item="menuResource" separator=",">
        (
            #{menuResource.parentId},
            #{menuResource.path},
            #{menuResource.resourceKey},
            #{menuResource.resourceType},
            #{menuResource.status}
        )
    </foreach>
</insert>
The service implementation:
List<MenuResource> menuResourceList =
        menuResourceAddDtoCollection.stream().map(this::convert2Do).collect(Collectors.toList());
menuResourceMapper.batchSave(menuResourceList);
With no limit in place, inserting all 1,000,000 rows at once first throws a runtime exception:
com.mysql.jdbc.PacketTooBigException: Packet for query is too large (16588949 > 4194304).
You can change this value on the server by setting the 'max_allowed_packet' variable.
The error means that MySQL caps the size of a single query packet, 4 MB by default, and a single statement carrying 1,000,000 rows clearly exceeds that limit.
Workaround: change the server configuration and raise the limit to 256 MB: set global max_allowed_packet = 2*1024*1024*128;
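Before raising max_allowed_packet, it helps to estimate how many rows actually fit in one multi-row INSERT. The sketch below is purely illustrative: the 120-byte average row size and 256-byte statement overhead are assumed numbers, not measurements from this project.

```java
// Rough estimate of how many rows of this table fit into a single
// multi-row INSERT under a given max_allowed_packet. The average row
// size (120 bytes) and statement overhead (256 bytes) are hypothetical
// values chosen for illustration.
public class PacketBudget {
    static long maxRows(long maxAllowedPacketBytes, long avgRowBytes, long overheadBytes) {
        return (maxAllowedPacketBytes - overheadBytes) / avgRowBytes;
    }

    public static void main(String[] args) {
        long defaultPacket = 4L * 1024 * 1024;    // MySQL default: 4 MB
        long raisedPacket = 256L * 1024 * 1024;   // after the change: 256 MB
        System.out.println(maxRows(defaultPacket, 120, 256)); // tens of thousands of rows
        System.out.println(maxRows(raisedPacket, 120, 256));  // millions of rows
    }
}
```

Even at 256 MB, shipping one giant statement is still risky for the client and the server, which is why batching remains necessary.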
After that change, however, repeated runs showed the program crashing with an exception, and in the worst case the machine froze. The cause turned out to be an out-of-memory error.
Optimization: a 1,000,000-row insert must be processed in batches. Through experimentation, about 1,000 rows per batch proved a reasonable size (for a table with many columns, smaller batches may be needed).
The optimized code:
List<MenuResource> menuResourceList =
        menuResourceAddDtoCollection.stream().map(this::convert2Do).collect(Collectors.toList());
long startTime = System.currentTimeMillis();
int size = menuResourceList.size();
if (size <= 1000) {
    menuResourceMapper.batchSave(menuResourceList);
    return Boolean.TRUE;
}
int i = size / 1000, remainder = size % 1000, j = 0;
for (; j < i; j++) {
    List<MenuResource> dos = menuResourceList.subList(j * 1000, j * 1000 + 1000);
    menuResourceMapper.batchSave(dos);
}
if (remainder != 0) {
    List<MenuResource> dos = menuResourceList.subList(j * 1000, j * 1000 + remainder);
    // custom batch insert: insert into ... values (remaining rows)
    menuResourceMapper.batchSave(dos);
}
long endTime = System.currentTimeMillis();
System.out.println("Custom mapper batch insert took: " + (endTime - startTime) + " ms");
Testing showed that inserting 1,000,000 rows via the custom mapper took 38237, 30524, 32508, 32455, and 33227 ms over five runs.
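The division/remainder arithmetic above can also be written as a small reusable helper. This is an equivalent sketch; the Chunks/partition names are mine, not from the project:

```java
import java.util.ArrayList;
import java.util.List;

// Generic chunking helper equivalent to the division/remainder loop
// above: splits a list into consecutive sublists of at most `size`
// elements, handling the remainder automatically.
public class Chunks {
    static <T> List<List<T>> partition(List<T> list, int size) {
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < list.size(); from += size) {
            int to = Math.min(from + size, list.size());
            chunks.add(list.subList(from, to)); // views, not copies
        }
        return chunks;
    }
}
```

Each chunk would then be handed to menuResourceMapper.batchSave(chunk) in turn.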
Approach 2: use MyBatis-Plus's saveBatch method:
List<MenuResource> menuResourceList =
        menuResourceAddDtoCollection.stream().map(this::convert2Do).collect(Collectors.toList());
long startTime = System.currentTimeMillis();
int size = menuResourceList.size();
if (size <= 1000) {
    saveBatch(menuResourceList);
    return Boolean.TRUE;
}
int i = size / 1000, remainder = size % 1000, j = 0;
for (; j < i; j++) {
    List<MenuResource> dos = menuResourceList.subList(j * 1000, j * 1000 + 1000);
    saveBatch(dos);
}
if (remainder != 0) {
    List<MenuResource> dos = menuResourceList.subList(j * 1000, j * 1000 + remainder);
    // call MyBatis-Plus's saveBatch
    saveBatch(dos);
}
long endTime = System.currentTimeMillis();
System.out.println("MyBatis-Plus batch insert took: " + (endTime - startTime) + " ms");
Testing showed an average of 34219 ms, roughly 1-2 s slower than the custom batch insert.
One more round of optimization.
Approach 3: use the Fork/Join framework introduced in JDK 1.7 to split the 1,000,000-row insert into subtasks and process them asynchronously in parallel across multiple threads.
The implementation starts with a custom batch-insert task:
public class MenuResourceBatchInsertTask extends RecursiveTask<Void> {
    // split until each leaf task holds roughly 1000 rows
    private final int threshold = 1000;
    private final List<MenuResource> menuResourceList;
    private final MenuResourceMapper menuResourceMapper;

    public MenuResourceBatchInsertTask(List<MenuResource> menuResourceList, MenuResourceMapper menuResourceMapper) {
        this.menuResourceList = menuResourceList;
        this.menuResourceMapper = menuResourceMapper;
    }

    @Override
    protected Void compute() {
        if (menuResourceList.size() < threshold) {
            menuResourceMapper.batchSave(menuResourceList);
        } else {
            int middle = menuResourceList.size() / 2;
            List<MenuResource> startResourceList = menuResourceList.subList(0, middle);
            MenuResourceBatchInsertTask task1 = new MenuResourceBatchInsertTask(startResourceList, menuResourceMapper);
            List<MenuResource> endResourceList = menuResourceList.subList(middle, menuResourceList.size());
            MenuResourceBatchInsertTask task2 = new MenuResourceBatchInsertTask(endResourceList, menuResourceMapper);
            invokeAll(task1, task2);
        }
        return null;
    }
}
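The splitting behavior can be verified without a database by swapping the mapper for a Consumer. The sketch below mirrors the task above; SplitTask and the countBatches harness are illustrative names of mine, not part of the project:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Self-contained sketch of the same divide-and-conquer pattern: the
// mapper is replaced by a Consumer<List<T>> so the splitting logic can
// be exercised without MyBatis or a database.
public class BatchInsertSketch {
    static class SplitTask<T> extends RecursiveTask<Void> {
        static final int THRESHOLD = 1000;
        final List<T> data;
        final Consumer<List<T>> sink;

        SplitTask(List<T> data, Consumer<List<T>> sink) {
            this.data = data;
            this.sink = sink;
        }

        @Override
        protected Void compute() {
            if (data.size() <= THRESHOLD) {
                sink.accept(data); // in the article: menuResourceMapper.batchSave(data)
            } else {
                int mid = data.size() / 2;
                invokeAll(new SplitTask<>(data.subList(0, mid), sink),
                          new SplitTask<>(data.subList(mid, data.size()), sink));
            }
            return null;
        }
    }

    // Counts how many leaf batches a list of `totalRows` rows produces.
    static int countBatches(int totalRows) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < totalRows; i++) rows.add(i);
        AtomicInteger batches = new AtomicInteger();
        new ForkJoinPool().invoke(new SplitTask<>(rows, chunk -> batches.incrementAndGet()));
        return batches.get();
    }
}
```

For 1,000,000 rows, halving continues until each leaf holds roughly 976-977 rows, producing 1024 leaf batches.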
The service-layer implementation:
@Service
public class MenuResourceServiceImpl extends ServiceImpl<MenuResourceMapper, MenuResource>
        implements MenuResourceService {

    @Resource
    private MenuResourceMapper menuResourceMapper;

    private final ForkJoinPool forkJoinPool = new ForkJoinPool();

    @Override
    public Boolean addResource(MenuResourceAddDto menuResourceAddDto) {
        MenuResource menuResource = new MenuResource();
        BeanUtils.copyProperties(menuResourceAddDto, menuResource);
        menuResource.setResourceType(menuResourceAddDto.getResourceTypeEnum().getCode());
        return save(menuResource);
    }

    @Override
    @Transactional(rollbackFor = Exception.class)
    public boolean batchSave(Collection<MenuResourceAddDto> menuResourceAddDtoCollection) {
        /*
         * Add menu resources; after a successful insert, send an email
         * notification to the responsible supervisors.
         */
        if (CollectionUtils.isEmpty(menuResourceAddDtoCollection)) {
            throw new BusinessException(ErrorCodeEnum.MENU_RESOURCE_NOT_EMPTY);
        }
        List<MenuResource> menuResourceList =
                menuResourceAddDtoCollection.stream().map(this::convert2Do).collect(Collectors.toList());
        /*long startTime = System.currentTimeMillis();
        int size = menuResourceList.size();
        if (size <= 1000) {
            menuResourceMapper.batchSave(menuResourceList);
            return Boolean.TRUE;
        }
        int i = size / 1000, remainder = size % 1000, j = 0;
        for (; j < i; j++) {
            List<MenuResource> dos = menuResourceList.subList(j * 1000, j * 1000 + 1000);
            menuResourceMapper.batchSave(dos);
        }
        if (remainder != 0) {
            List<MenuResource> dos = menuResourceList.subList(j * 1000, j * 1000 + remainder);
            // custom batch insert: insert into ... values (remaining rows)
            menuResourceMapper.batchSave(dos);
            // MyBatis-Plus batch insert: took 32598 ms
            // saveBatch(dos);
        }
        long endTime = System.currentTimeMillis();
        System.out.println("MyBatis-Plus batch insert took: " + (endTime - startTime) + " ms");
        */
        // menuResourceMapper.batchSave(menuResourceList);
        long startTime = System.currentTimeMillis();
        MenuResourceBatchInsertTask insertTask = new MenuResourceBatchInsertTask(menuResourceList, menuResourceMapper);
        forkJoinPool.invoke(insertTask);
        long endTime = System.currentTimeMillis();
        System.out.println("Took: " + (endTime - startTime) + " ms");
        return Boolean.TRUE;
    }

    private <T> MenuResourceVo convert2Vo(T dto) {
        if (Objects.isNull(dto)) {
            return null;
        }
        MenuResourceVo menuResourceVo = MenuResourceVo.builder().build();
        BeanUtils.copyProperties(dto, menuResourceVo);
        return menuResourceVo;
    }

    private <T> MenuResource convert2Do(T dto) {
        if (Objects.isNull(dto)) {
            return null;
        }
        MenuResource menuResource = new MenuResource();
        BeanUtils.copyProperties(dto, menuResource);
        menuResource.setResourceType(0);
        return menuResource;
    }
}
Testing showed that Fork/Join combined with the custom dynamic-SQL mapper inserts 1,000,000 rows in roughly 10 s. One caveat: Spring transactions are thread-bound, so the @Transactional boundary on batchSave does not extend to the ForkJoinPool worker threads; each chunk commits independently, and a mid-run failure can leave partial data behind.
The test class:
@RunWith(SpringRunner.class)
@SpringBootTest
public class MenuResourceServiceTest {

    @Resource
    private MenuResourceService menuResourceService;

    @Test
    public void testBatchSave() {
        List<MenuResourceAddDto> list = new ArrayList<>();
        for (int i = 0; i < 1000000; i++) {
            String uuid = UUID.randomUUID().toString();
            MenuResourceAddDto dto = new MenuResourceAddDto();
            dto.setResourceKey(uuid);
            dto.setPath(uuid + "/" + i);
            dto.setParentId(1L);
            dto.setStatus(0);
            dto.setResourceTypeEnum(MenuResourceTypeEnum.MENU);
            list.add(dto);
        }
        menuResourceService.batchSave(list);
    }
}
Some readers may wonder whether MyBatis-Plus's saveBatch could replace the custom batch insert inside MenuResourceBatchInsertTask's compute method. It cannot be recommended: testing showed it to be dramatically slower there.
The reason is that saveBatch issues one single-row INSERT statement per record (batched through the JDBC driver), rather than one multi-row INSERT per chunk.
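That gap can narrow if the JDBC driver is allowed to rewrite batches. This is an environment-dependent sketch, not part of the article's setup: with MySQL Connector/J, adding rewriteBatchedStatements=true to the datasource URL lets the driver collapse a JDBC batch of single-row INSERTs into multi-row INSERTs on the wire. The URL below is a hypothetical example.

```properties
# Hypothetical Spring Boot datasource configuration. With MySQL
# Connector/J, rewriteBatchedStatements=true rewrites a JDBC batch of
# single-row INSERTs into multi-row INSERTs, which can make saveBatch
# far more competitive with the hand-written foreach mapper.
spring.datasource.url=jdbc:mysql://localhost:3306/demo?rewriteBatchedStatements=true
```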
Summary: for large-volume batch inserts, deletes, and updates, combining Fork/Join with a custom mapper file is an effective way to improve performance.