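Analyzing the "Connection lease request time out" error
Background
With Apache HttpClient, setConnectionRequestTimeout() configures how long a request may wait to obtain a connection from the connection pool, and "Connection lease request time out" is the error raised when that wait expires. It usually means that the total pool size is too small, or that the per-route limit is too small. In my production environment, however, the error still occurred even though both limits were more than sufficient for the request volume at the time.
As the configuration below shows, the total pool size is 3800 and the per-route limit is 300, which is plenty for the actual workload: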
@Bean(initMethod = "start", destroyMethod = "close", name = "httpAsyncNotifyClient")
public CloseableHttpAsyncClient httpAsyncNotifyClient() {
    RequestConfig config = RequestConfig.custom()
            .setConnectTimeout(NOTIFY_CONNECT_TIMEOUT)
            .setConnectionRequestTimeout(DEFAULT_REQUEST_TIMEOUT)
            .setSocketTimeout(NOTIFY_SOCKET_TIMEOUT)
            .build();
    CloseableHttpAsyncClient httpClient = HttpAsyncClients.custom()
            .setDefaultRequestConfig(config)
            .setMaxConnTotal(3800)
            .setMaxConnPerRoute(300)
            .setKeepAliveStrategy(getKeepAliveStrategy(true))
            .build();
    return httpClient;
}
Yet "Connection lease request time out" still occurred.
Analysis
The exception is thrown from AbstractNIOConnPool#processPendingRequest():
private boolean processPendingRequest(final LeaseRequest<T, C, E> request) {
    final T route = request.getRoute();
    final Object state = request.getState();
    final long deadline = request.getDeadline();
    final long now = System.currentTimeMillis();
    if (now > deadline) {
        request.failed(new TimeoutException("Connection lease request time out"));
        return false;
    }
    ...
}
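On the request path, this method is called from AbstractNIOConnPool#lease(). Note that lease() takes a lock, and AbstractNIOConnPool is a pool shared by all requests. So if one call to processPendingRequest() is slow, every other request is blocked outside the lock. The LeaseRequest, which carries the lease timeout, is created before the lock is acquired, so time spent waiting for the lock counts against the deadline, and the blocked requests end up failing with a connection lease timeout.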
public Future<E> lease(
        final T route, final Object state,
        final long connectTimeout, final long leaseTimeout, final TimeUnit timeUnit,
        final FutureCallback<E> callback) {
    final LeaseRequest<T, C, E> leaseRequest = new LeaseRequest<T, C, E>(route, state,
            connectTimeout >= 0 ? timeUnit.toMillis(connectTimeout) : -1,
            leaseTimeout > 0 ? timeUnit.toMillis(leaseTimeout) : 0,
            future);
    this.lock.lock();
    try {
        final boolean completed = processPendingRequest(leaseRequest);
        if (!leaseRequest.isDone() && !completed) {
            this.leasingRequests.add(leaseRequest);
        }
        if (leaseRequest.isDone()) {
            this.completedRequests.add(leaseRequest);
        }
    } finally {
        this.lock.unlock();
    }
    ...
}
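In principle, processPendingRequest() only makes internal calls: checking whether a reusable connection exists, leasing it, or starting a new connection (asynchronously). None of these should take long. One step, however, resolves the remote address, which means a DNS lookup. That is normally fast as well, but if the DNS service is having problems, this call can indeed block for a long time: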
remoteAddress = this.addressResolver.resolveRemoteAddress(route);
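The mechanism described above can be reproduced without HttpClient at all. The following is a minimal sketch, not the library's code: LeaseLockDemo, its lease() method, and fastRequestTimesOut() are hypothetical names, a ReentrantLock stands in for the pool-wide lock, and Thread.sleep() stands in for a slow DNS lookup performed while the lock is held. It assumes, as in the excerpt above, that the deadline is fixed before the lock is taken and checked only afterwards.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class LeaseLockDemo {

    // Stand-in for the pool-wide lock in AbstractNIOConnPool (assumption for illustration).
    private final ReentrantLock lock = new ReentrantLock();

    /**
     * Mimics the shape of lease(): the deadline is computed before the lock is
     * taken and checked only once the lock has been acquired.
     * Returns true if the request would fail with a lease timeout.
     */
    boolean lease(long leaseTimeoutMs, long workUnderLockMs, CountDownLatch lockHeld)
            throws InterruptedException {
        final long deadline = System.currentTimeMillis() + leaseTimeoutMs;
        lock.lock();
        try {
            if (lockHeld != null) {
                lockHeld.countDown(); // signal that this request now owns the lock
            }
            if (System.currentTimeMillis() > deadline) {
                return true; // corresponds to "Connection lease request time out"
            }
            Thread.sleep(workUnderLockMs); // simulated slow work, e.g. a DNS lookup
            return false;
        } finally {
            lock.unlock();
        }
    }

    /** Starts a slow request, then a fast one; reports whether the fast one timed out. */
    static boolean fastRequestTimesOut(long slowWorkMs, long leaseTimeoutMs)
            throws InterruptedException {
        LeaseLockDemo pool = new LeaseLockDemo();
        CountDownLatch slowHoldsLock = new CountDownLatch(1);
        Thread slow = new Thread(() -> {
            try {
                // Very generous lease timeout: the slow request itself never times out.
                pool.lease(Long.MAX_VALUE / 2, slowWorkMs, slowHoldsLock);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        slow.start();
        slowHoldsLock.await(); // make sure the slow request holds the lock first
        boolean timedOut = pool.lease(leaseTimeoutMs, 0, null);
        slow.join();
        return timedOut;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("slow work 1000 ms, lease timeout 300 ms, timed out: "
                + fastRequestTimesOut(1000, 300));
        System.out.println("slow work 50 ms, lease timeout 300 ms, timed out: "
                + fastRequestTimesOut(50, 300));
    }
}
```

The fast request does no slow work itself; it times out purely because it waited for the lock longer than its lease timeout, which is the situation suspected in the production incident.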
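Verification
The test below submits two requests through asyncClient. For xxx.com I picked a domain whose DNS resolution is slow (one way to arrange this is to point the machine at an overseas DNS server); the second request targets a domain that resolves quickly.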
RequestConfig config = RequestConfig.custom()
        .setConnectTimeout(2000)
        .setConnectionRequestTimeout(300)
        .setSocketTimeout(10 * 1000)
        .build();
CloseableHttpAsyncClient asyncClient = HttpAsyncClients.custom()
        .setDefaultRequestConfig(config)
        .setMaxConnTotal(3800)
        .setMaxConnPerRoute(300)
        .build();
...
executorService.submit(() -> {
    asyncClient.execute(new HttpGet("https://xxx.com"), callback);
});
executorService.submit(() -> {
    asyncClient.execute(new HttpGet("https://www.baidu.com/"), callback);
});
Result
The two requests start acquiring a connection at almost the same time, yet the second one fails with a Connection lease request time out exception.
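To check that DNS really is the slow step for a given host, the lookup can be timed on its own and compared with the value passed to setConnectionRequestTimeout() (300 ms in the test above). This is a rough sketch: DnsTimer, Resolver, and timeLookupMillis() are hypothetical helper names, and it assumes the client resolves hosts through the JVM's default resolver, i.e. InetAddress.getAllByName(). Keep in mind that the JVM caches lookups, so only the first measurement for a host is meaningful.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class DnsTimer {

    /** Small resolver abstraction so the timing logic can be exercised without a network. */
    interface Resolver {
        InetAddress[] resolve(String host) throws UnknownHostException;
    }

    /** Returns the elapsed time of a single lookup in milliseconds. */
    static long timeLookupMillis(Resolver resolver, String host) {
        long start = System.nanoTime();
        try {
            resolver.resolve(host);
        } catch (UnknownHostException e) {
            // A failed lookup still blocks the caller for its full duration, so it is timed too.
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Host name taken from the command line; defaults to the fast domain used in the test.
        String host = args.length > 0 ? args[0] : "www.baidu.com";
        long ms = timeLookupMillis(InetAddress::getAllByName, host);
        System.out.println(host + " resolved in " + ms + " ms");
    }
}
```

If the measured time for the slow domain exceeds the connection request timeout, a request that waits behind that lookup would be expected to fail in the way the result above describes.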