Production Issues (14): K8S CPU Contention Exhausts the Database Connection Pool

news2026/2/14 4:54:22

I. Introduction

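Production threw a large number of database connection failures twice in one day. My first guess was a database problem, but on reflection, if the database itself were at fault, many applications should have been affected, not just this one.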

A closer analysis showed that the container's CPU was being throttled (Throttled), which left a large number of threads blocked.
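One way to confirm throttling from inside a container is to read the cgroup's `cpu.stat` counters. The sketch below is only an illustration, not what the original investigation used: the class name `CpuThrottleCheck` is hypothetical, and the two file paths are assumptions about where cgroup v2 and cgroup v1 are mounted. A growing `nr_throttled` relative to `nr_periods` suggests the container keeps hitting its CPU limit.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CpuThrottleCheck {

    /** Parses "key value" lines from a cgroup cpu.stat file into a map. */
    public static Map<String, Long> parseCpuStat(List<String> lines) {
        Map<String, Long> stats = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 2) {
                continue; // not a "key value" pair
            }
            try {
                stats.put(parts[0], Long.parseLong(parts[1]));
            } catch (NumberFormatException ignored) {
                // skip lines whose value is not an integer
            }
        }
        return stats;
    }

    /** Fraction of enforcement periods in which the cgroup was throttled. */
    public static double throttledRatio(Map<String, Long> stats) {
        long periods = stats.getOrDefault("nr_periods", 0L);
        long throttled = stats.getOrDefault("nr_throttled", 0L);
        return periods == 0 ? 0.0 : (double) throttled / periods;
    }

    public static void main(String[] args) throws IOException {
        // Assumed mount points: cgroup v2 first, then cgroup v1.
        String[] candidates = {"/sys/fs/cgroup/cpu.stat", "/sys/fs/cgroup/cpu/cpu.stat"};
        for (String candidate : candidates) {
            Path path = Paths.get(candidate);
            if (Files.isReadable(path)) {
                Map<String, Long> stats = parseCpuStat(Files.readAllLines(path));
                System.out.printf("%s: nr_periods=%d nr_throttled=%d ratio=%.4f%n",
                        candidate,
                        stats.getOrDefault("nr_periods", 0L),
                        stats.getOrDefault("nr_throttled", 0L),
                        throttledRatio(stats));
                return;
            }
        }
        System.out.println("No readable cpu.stat found; paths are assumptions.");
    }
}
```

The counters are cumulative, so two samples taken a few seconds apart say more than a single reading.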

II. Analysis

1. Stack trace

Since the errors appeared without any new release, start with the stack trace and see where the error is thrown.

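Tracing down to the bottom of the stack shows that the database connection pool was exhausted: [size:100; busy:1; idle:0; lastwait:44]. Of the 100 connections, only 4 were actually doing work and none was idle, so clearly the threads holding the other connections were all blocked.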

2. Source code

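The framework is an unlikely culprit, but its source code is still worth a look. Most developers assume a framework cannot be wrong, so they never read its source and go straight to investigating other directions. That works 99% of the time, but I have more than once run into a framework that really was at fault, just under some extreme condition. For example: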

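MyBatis SQL concatenation error and source code analysis_<if test="month!=null"> - CSDN Blog

Production Issues (9): MySQL 8.0 DDL problem_mysql8 default_authentication_plugin - CSDN Blog

Production Issues (11): Logging JavaAgent-NoClassDefFoundError_-javaagent not-found error - CSDN Blog

Production Issues (13): Google Protobuf mistakenly modifies the system-wide time zone - CSDN Blog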

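None of these problems could have been traced to its cause without looking in the framework's direction. Back to the topic: here is the source. This is where the Tomcat JDBC connection pool hands a connection to the calling thread; if the wait times out, it throws: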

throw new PoolExhaustedException("[" + Thread.currentThread().getName()+"] " +
    "Timeout: Pool empty. Unable to fetch a connection in " + (maxWait / 1000) +
    " seconds, none available[size:"+size.get() +"; busy:"+busy.size()+"; idle:"+idle.size()+"; lastwait:"+timetowait+"].");

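This also matches the lastwait:44 we observed and points further in the same direction: many threads are holding database connections without doing any work, while many other threads are waiting for one.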
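The `lastwait` field in that message is the `timetowait` value that `borrowConnection` computes as `Math.max(0, maxWait - (System.currentTimeMillis() - now))`, that is, the wait budget still left on the last pass through the loop. A small worked example of the arithmetic, assuming a hypothetical `maxWait` of 30000 ms and made-up timestamps (the class and method names are illustrative only):

```java
class LastWaitExample {

    /** Mirrors the timetowait expression: remaining budget, never negative. */
    public static long remainingWait(long maxWaitMillis, long startMillis, long nowMillis) {
        return Math.max(0, maxWaitMillis - (nowMillis - startMillis));
    }

    public static void main(String[] args) {
        long maxWait = 30_000; // hypothetical value, not taken from the incident
        long start = 0;        // made-up timestamps for illustration

        // First pass: nothing elapsed yet, so the full budget is available.
        System.out.println(remainingWait(maxWait, start, 0));      // 30000
        // A later pass after 29956 ms: only 44 ms of the budget remain.
        System.out.println(remainingWait(maxWait, start, 29_956)); // 44
        // Past the deadline: the value is clamped to 0.
        System.out.println(remainingWait(maxWait, start, 31_000)); // 0
    }
}
```

Under these assumed numbers, a small `lastwait` such as 44 means the thread had already spent almost its whole budget waiting before the final poll.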

private PooledConnection borrowConnection(int wait, String username, String password) throws SQLException {

        if (isClosed()) {
            throw new SQLException("Connection pool closed.");
        } //end if

        //get the current time stamp
        long now = System.currentTimeMillis();
        //see if there is one available immediately
        PooledConnection con = idle.poll();

        while (true) {
            if (con!=null) {
                //configure the connection and return it
                PooledConnection result = borrowConnection(now, con, username, password);
                if (result!=null) return result;
            }

            //if we get here, see if we need to create one
            //this is not 100% accurate since it doesn't use a shared
            //atomic variable - a connection can become idle while we are creating
            //a new connection
            if (size.get() < getPoolProperties().getMaxActive()) {
                //atomic duplicate check
                if (size.addAndGet(1) > getPoolProperties().getMaxActive()) {
                    //if we got here, two threads passed through the first if
                    size.decrementAndGet();
                } else {
                    //create a connection, we're below the limit
                    return createConnection(now, con, username, password);
                }
            } //end if

            //calculate wait time for this iteration
            long maxWait = wait;
            //if the passed in wait time is -1, means we should use the pool property value
            if (wait==-1) {
                maxWait = (getPoolProperties().getMaxWait()<=0)?Long.MAX_VALUE:getPoolProperties().getMaxWait();
            }

            long timetowait = Math.max(0, maxWait - (System.currentTimeMillis() - now));
            waitcount.incrementAndGet();
            try {
                //retrieve an existing connection
                con = idle.poll(timetowait, TimeUnit.MILLISECONDS);
            } catch (InterruptedException ex) {
                if (getPoolProperties().getPropagateInterruptState()) {
                    Thread.currentThread().interrupt();
                }
                SQLException sx = new SQLException("Pool wait interrupted.");
                sx.initCause(ex);
                throw sx;
            } finally {
                waitcount.decrementAndGet();
            }
            if (maxWait==0 && con == null) { //no wait, return one if we have one
                if (jmxPool!=null) {
                    jmxPool.notify(org.apache.tomcat.jdbc.pool.jmx.ConnectionPool.POOL_EMPTY, "Pool empty - no wait.");
                }
                throw new PoolExhaustedException("[" + Thread.currentThread().getName()+"] " +
                        "NoWait: Pool empty. Unable to fetch a connection, none available["+busy.size()+" in use].");
            }
            //we didn't get a connection, lets see if we timed out
            if (con == null) {
                if ((System.currentTimeMillis() - now) >= maxWait) {
                    if (jmxPool!=null) {
                        jmxPool.notify(org.apache.tomcat.jdbc.pool.jmx.ConnectionPool.POOL_EMPTY, "Pool empty - timeout.");
                    }
                    throw new PoolExhaustedException("[" + Thread.currentThread().getName()+"] " +
                        "Timeout: Pool empty. Unable to fetch a connection in " + (maxWait / 1000) +
                        " seconds, none available[size:"+size.get() +"; busy:"+busy.size()+"; idle:"+idle.size()+"; lastwait:"+timetowait+"].");
                } else {
                    //no timeout, lets try again
                    continue;
                }
            }
        } //while
    }
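The failure mode described above, where every connection is held by a stalled thread and a waiter times out on `idle.poll(...)`, can be reproduced in miniature. This is a simplified simulation, not Tomcat's code: `PoolStarvationDemo` and `runScenario` are hypothetical names, plain objects stand in for connections, and `Thread.sleep` stands in for a thread that is starved of CPU while holding a connection.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class PoolStarvationDemo {

    /**
     * Fills a pool, lets holder threads take every connection and stall for
     * holdMillis, then tries to borrow one within maxWaitMillis.
     * Returns true if the waiter obtained a connection, false if it timed out.
     */
    public static boolean runScenario(int poolSize, long holdMillis, long maxWaitMillis) {
        BlockingQueue<Object> idle = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) {
            idle.add(new Object()); // stand-in for a pooled connection
        }

        CountDownLatch allBorrowed = new CountDownLatch(poolSize);
        Thread[] holders = new Thread[poolSize];
        for (int i = 0; i < poolSize; i++) {
            holders[i] = new Thread(() -> {
                try {
                    Object conn = idle.take();
                    allBorrowed.countDown();
                    Thread.sleep(holdMillis); // stalled while holding the connection
                    idle.put(conn);           // finally returned to the pool
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            holders[i].start();
        }

        try {
            allBorrowed.await(); // pool is now empty: idle is 0
            Object conn = idle.poll(maxWaitMillis, TimeUnit.MILLISECONDS);
            for (Thread holder : holders) {
                holder.join();
            }
            return conn != null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("holders stall 300 ms, waiter allows 50 ms: "
                + runScenario(4, 300, 50));
        System.out.println("holders stall 20 ms, waiter allows 1000 ms: "
                + runScenario(4, 20, 1000));
    }
}
```

In the first scenario the waiter gives up before any holder returns its connection, which is the point where the real pool would throw `PoolExhaustedException`; in the second the holders finish quickly and the borrow succeeds.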