1、问题描述
频繁调用datax服务(从oracle同步到mysql)出现报错,获取不到连接
oracle读取时报错信息
"errorMessage": "Code:[DBUtilErrorCode-10], Description:[连接数据库失败. 请检查您的 账号、密码、数据库名称、IP、Port或者向 DBA 寻求帮助(注意网络环境).]. - 数据库连接失败. 因为根据您配置的连接信息,无法从:jdbc:oracle:thin:@//10.27.19.106:1722/BMS 中找到可连接的jdbcUrl. 请检查您的配置并作出修改. - java.lang.Exception: DataX无法连接对应的数据库,可能原因是:1) 配置的ip/port/database/jdbc错误,无法连接。2) 配置的username/password错误,鉴权失败。请和DBA确认该数据库的连接信息是否正确。\tat com.alibaba.datax.plugin.rdbms.util.DBUtil$2.call(DBUtil.java:73)\tat com.alibaba.datax.plugin.rdbms.util.DBUtil$2.call(DBUtil.java:53)\tat com.alibaba.datax.common.util.RetryUtil$Retry.call(RetryUtil.java:164)\tat com.alibaba.datax.common.util.RetryUtil$Retry.doRetry(RetryUtil.java:111)\tat com.alibaba.datax.common.util.RetryUtil.executeWithRetry(RetryUtil.java:30)\tat com.alibaba.datax.plugin.rdbms.util.DBUtil.chooseJdbcUrl(DBUtil.java:53)\tat com.alibaba.datax.plugin.rdbms.reader.util.OriginalConfPretreatmentUtil.dealJdbcAndTable(OriginalConfPretreatmentUtil.java:92)\tat com.alibaba.datax.plugin.rdbms.reader.util.OriginalConfPretreatmentUtil.simplifyConf(OriginalConfPretreatmentUtil.java:59)\tat com.alibaba.datax.plugin.rdbms.reader.util.OriginalConfPretreatmentUtil.doPretreatment(OriginalConfPretreatmentUtil.java:33)\tat com.alibaba.datax.plugin.rdbms.reader.CommonRdbmsReader$Job.init(CommonRdbmsReader.java:55)\tat com.alibaba.datax.plugin.reader.oraclereader.OracleReader$Job.init(OracleReader.java:37)\tat com.alibaba.datax.core.job.JobContainer.initJobReader(JobContainer.java:696)\tat com.alibaba.datax.core.job.JobContainer.init(JobContainer.java:315)\tat com.alibaba.datax.core.job.JobContainer.start(JobContainer.java:119)\tat com.alibaba.datax.core.Engine.start(Engine.java:101)\tat com.alibaba.datax.core.Engine.entry(Engine.java:227)\tat com.midea.logistics.datasync.dataxservice.service.impl.DataxServiceImpl.lambda$doExecute$0(DataxServiceImpl.java:114)\tat java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)\tat java.util.concurrent.CompletableFuture$AsyncRun.exec(CompletableFuture.java:1632)\tat java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)\tat java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)\tat java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)\tat java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)"
mysql写入时报错信息
2024-10-18 15:32:07,133 [] ERROR c.a.datax.plugin.rdbms.util.DBUtil 420 - 从连接池中获取连接异常:
com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 10007, active 250, maxActive 250, creating 0
at com.alibaba.druid.pool.DruidDataSource.getConnectionInternal(DruidDataSource.java:1512)
at com.alibaba.druid.pool.DruidDataSource.getConnectionDirect(DruidDataSource.java:1255)
at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1235)
at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1225)
at com.alibaba.datax.plugin.rdbms.util.DBUtil.connect(DBUtil.java:418)
at com.alibaba.datax.plugin.rdbms.util.DBUtil.access$000(DBUtil.java:22)
at com.alibaba.datax.plugin.rdbms.util.DBUtil$3.call(DBUtil.java:324)
at com.alibaba.datax.plugin.rdbms.util.DBUtil$3.call(DBUtil.java:321)
at com.alibaba.datax.common.util.RetryUtil$Retry.call(RetryUtil.java:164)
at com.alibaba.datax.common.util.RetryUtil$Retry.doRetry(RetryUtil.java:111)
at com.alibaba.datax.common.util.RetryUtil.executeWithRetry(RetryUtil.java:30)
at com.alibaba.datax.plugin.rdbms.util.DBUtil.getConnection(DBUtil.java:321)
at com.alibaba.datax.plugin.rdbms.util.DBUtil.getConnection(DBUtil.java:305)
at com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter$Task.prepare(CommonRdbmsWriter.java:245)
at com.alibaba.datax.plugin.writer.mysqlwriter.MysqlWriter$Task.prepare(MysqlWriter.java:73)
at com.alibaba.datax.core.taskgroup.runner.WriterRunner.run(WriterRunner.java:50)
at java.lang.Thread.run(Thread.java:750)
2、问题排查解决过程
2.1 第一反应猜测是数据库连接池连接不够用,获取连接等待时长过短,导致获取不到连接时报错,原来设置的maxActive(最大连接)200个,maxWai(t最大等待时长)6s,修改为maxActive=250,maxWai=12s
结果发现并没什么效果,排除了这两个因素,但是发现了一个规律:maxActive=200时,从第201个任务就开始一直报错,maxActive=250时,从第251个任务就开始一直报错,于是猜测很有可能是连接池没有回收获取的连接
2.2 咨询业务系统报错任务的执行频率(配了5个任务,有的任务是每隔3分钟执行一次,有的任务是4分钟执行一次,基本是同一个时间点,耗时最长80S秒,3个小时后,就报连接失败了),发现频率也不是特别高,耗时也不是特别长
补充描述:datax源码oracle切片那里获取了连接,没关闭。排查代码发现确实没关闭,调整后如下图,正常关闭连接
接着担心还有其它地方有类似问题,于是把调用关闭连接那里的所有引用代码都排查了一遍,未发现其它有遗漏的
2.3 接着2.2调整后测试,虽然没有了oracle的报错,但是mysql的报错依然存在,于是又仔细排查了一下代码,mysql写入时调用关闭连接的方法,有没有正常关闭连接,结果发现都是没问题的,但是连接没回收确实是事实,只能怀疑有些地方代码没按套路出牌
2.4 验证主动关闭连接触发回收和强制回收,看看是哪个环节有问题(主动关闭肯定有问题),当前参数removeAbandonedTimeout(连接池强制回收连接)是8h,为了看出效果,把removeAbandonedTimeout设置成了10s,
测试发现:大部分任务都是成功的,仍有少部分任务获取不到连接,并且报错任务无规律,于是得出结论:强制回收连接是生效的,如果当前任务刚好获取到了强制回收的连接就是成功的,那么就要研究如何从代码中排查连接池泄漏
2.5 上百度搜索如何排查数据库连接池泄漏,了解到durid连接池有个参数:LogAbandoned,设置为true时,会打印回收连接时的错误信息,于是调整后有了新的发现,如下,直接把回收连接异常时的堆栈信息打印出来了,主要是这个方法有问题:OriginalConfPretreatmentUtil.dealColumnConf()
2024-10-21 10:33:19,144 [] ERROR c.a.druid.pool.DruidDataSource 2666 - abandon connection, owner thread: job-0, connected at : 1729477939179, open stackTrace
at java.lang.Thread.getStackTrace(Thread.java:1564)
at com.alibaba.druid.pool.DruidDataSource.getConnectionDirect(DruidDataSource.java:1313)
at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1235)
at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:1225)
at com.alibaba.datax.plugin.rdbms.util.DBUtil.connect(DBUtil.java:419)
at com.alibaba.datax.plugin.rdbms.util.DBUtil.access$000(DBUtil.java:22)
at com.alibaba.datax.plugin.rdbms.util.DBUtil$3.call(DBUtil.java:324)
at com.alibaba.datax.plugin.rdbms.util.DBUtil$3.call(DBUtil.java:321)
at com.alibaba.datax.common.util.RetryUtil$Retry.call(RetryUtil.java:164)
at com.alibaba.datax.common.util.RetryUtil$Retry.doRetry(RetryUtil.java:111)
at com.alibaba.datax.common.util.RetryUtil.executeWithRetry(RetryUtil.java:30)
at com.alibaba.datax.plugin.rdbms.util.DBUtil.getConnection(DBUtil.java:321)
at com.alibaba.datax.plugin.rdbms.util.DBUtil.getConnection(DBUtil.java:305)
at com.alibaba.datax.plugin.rdbms.util.JdbcConnectionFactory.getConnecttion(JdbcConnectionFactory.java:27)
at com.alibaba.datax.plugin.rdbms.writer.util.OriginalConfPretreatmentUtil.dealColumnConf(OriginalConfPretreatmentUtil.java:125)
at com.alibaba.datax.plugin.rdbms.writer.util.OriginalConfPretreatmentUtil.dealColumnConf(OriginalConfPretreatmentUtil.java:140)
at com.alibaba.datax.plugin.rdbms.writer.util.OriginalConfPretreatmentUtil.doPretreatment(OriginalConfPretreatmentUtil.java:35)
at com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter$Job.init(CommonRdbmsWriter.java:41)
at com.alibaba.datax.plugin.writer.mysqlwriter.MysqlWriter$Job.init(MysqlWriter.java:31)
at com.alibaba.datax.core.job.JobContainer.initJobWriter(JobContainer.java:727)
at com.alibaba.datax.core.job.JobContainer.init(JobContainer.java:316)
at com.alibaba.datax.core.job.JobContainer.start(JobContainer.java:119)
at com.alibaba.datax.core.Engine.start(Engine.java:101)
at com.alibaba.datax.core.Engine.entry(Engine.java:227)
at com.midea.logistics.datasync.dataxservice.service.impl.DataxServiceImpl.lambda$doExecute$0(DataxServiceImpl.java:108)
2.6 进入相关代码发现获取连接是个野路子,而常规都是从DBUtil封装的从连接池获取连接
点进去 dealColumnConf(originalConfig, jdbcConnectionFactory, oneTable);
发现非异常逻辑,会调用DBUtil.getColumnMetaData(connectionFactory.getConnecttion(), oneTable,StringUtils.join(userConfiguredColumns, ","));点进去
看到这里已经真相大白了,根本没关闭连接,于是对代码进行了改造,从DBUtil获取连接,并在最外面一层调用后关闭连接,如下:
修改完后继续测试,发现又多了个新的报错信息:
com.alibaba.datax.common.exception.DataXException: Code:[DBUtilErrorCode-01], Description:[获取表字段相关信息失败.]. - 获取表:GEN_IDGEN_TEST_18 的字段的元信息时失败. 请联系 DBA 核查该库、表信息. - java.sql.SQLException: connection holder is null
at com.alibaba.druid.pool.DruidPooledConnection.checkStateInternal(DruidPooledConnection.java:1157)
at com.alibaba.druid.pool.DruidPooledConnection.checkState(DruidPooledConnection.java:1148)
at com.alibaba.druid.pool.DruidPooledConnection.createStatement(DruidPooledConnection.java:636)
at com.alibaba.datax.plugin.rdbms.util.DBUtil.getColumnMetaData(DBUtil.java:602)
at com.alibaba.datax.plugin.rdbms.writer.util.OriginalConfPretreatmentUtil.dealColumnConf(OriginalConfPretreatmentUtil.java:119)
at com.alibaba.datax.plugin.rdbms.writer.util.OriginalConfPretreatmentUtil.dealColumnConf(OriginalConfPretreatmentUtil.java:136)
at com.alibaba.datax.plugin.rdbms.writer.util.OriginalConfPretreatmentUtil.doPretreatment(OriginalConfPretreatmentUtil.java:36)
at com.alibaba.datax.plugin.rdbms.writer.CommonRdbmsWriter$Job.init(CommonRdbmsWriter.java:41)
at com.alibaba.datax.plugin.writer.mysqlwriter.MysqlWriter$Job.init(MysqlWriter.java:31)
at com.alibaba.datax.core.job.JobContainer.initJobWriter(JobContainer.java:727)
at com.alibaba.datax.core.job.JobContainer.init(JobContainer.java:316)
at com.alibaba.datax.core.job.JobContainer.start(JobContainer.java:119)
at com.alibaba.datax.core.Engine.start(Engine.java:101)
at com.alibaba.datax.core.Engine.entry(Engine.java:227)
at com.midea.logistics.datasync.dataxservice.service.impl.DataxServiceImpl.lambda$doExecute$0(DataxServiceImpl.java:108)
于是根据报错信息找到相关代码
第一反应就是连接为空,猜测可能是连接被关闭了,还在执行逻辑
接着排查这一部分的代码,发现:List<String> allColumns = DBUtil.getTableColumnsByConn(DATABASE_TYPE, conn, oneTable, ""),这段代码提前将连接关闭了
排查发现这段代码还有一处地方调用,于是决定将这里调整为不关闭连接,引用到这个方法的地方自行关闭,如下:
继续测试,任务成功,没有报错信息