问题分析
自己在测试环境部署了RocketMQ,发现namesrv很容易挂掉,于是就想着监控,挂了就发邮件通知。 查看了rocketmq-dashboard项目,发现只能监控Broker,遂放弃这一路径。 于是就从报错的日志入手,发现最终可以根据RocketMQTemplate获得可活动的NameServer。
报错日志
12月 25 13:59:22 192.168.240.65 java[59571]: 2023-12-25 13:59:22.598 INFO 59571 --- [tWorkerThread_2] RocketmqRemoting : NETTY CLIENT PIPELINE: CLOSE 192.168.240.86:9876
12月 25 13:59:22 192.168.240.65 java[59571]: 2023-12-25 13:59:22.598 INFO 59571 --- [tWorkerThread_2] RocketmqRemoting : closeChannel: the channel[192.168.240.86:9876] was removed from channel table
12月 25 13:59:22 192.168.240.65 java[59571]: 2023-12-25 13:59:22.598 INFO 59571 --- [tWorkerThread_2] RocketmqRemoting : NETTY CLIENT PIPELINE: CLOSE 192.168.240.86:9876
12月 25 13:59:22 192.168.240.65 java[59571]: 2023-12-25 13:59:22.598 INFO 59571 --- [tWorkerThread_2] RocketmqRemoting : eventCloseChannel: the channel[null] has been removed from the channel table before
12月 25 13:59:22 192.168.240.65 java[59571]: 2023-12-25 13:59:22.598 INFO 59571 --- [lientSelector_1] RocketmqRemoting : closeChannel: close the connection to remote address[192.168.240.86:9876] result: true
12月 25 13:59:25 192.168.240.65 java[59571]: 2023-12-25 13:59:25.597 INFO 59571 --- [ntScan_thread_1] RocketmqRemoting : createChannel: begin to connect remote host[192.168.240.86:9876] asynchronously
12月 25 13:59:25 192.168.240.65 java[59571]: 2023-12-25 13:59:25.597 INFO 59571 --- [tWorkerThread_3] RocketmqRemoting : NETTY CLIENT PIPELINE: CONNECT UNKNOWN => 192.168.240.86:9876
12月 25 13:59:25 192.168.240.65 java[59571]: 2023-12-25 13:59:25.598 WARN 59571 --- [ntScan_thread_1] RocketmqRemoting : createChannel: connect remote host[192.168.240.86:9876] failed, AbstractBootstrap$PendingRegistrationPromise@f2a3fc5(failure: io.netty.channel.AbstractChannel$AnnotatedConnectException: 拒绝连接: /192.168.240.86:9876)
根据日志可以发现是NettyRemotingClient类在做监控,持续调用,具体核心方法:
org.apache.rocketmq.remoting.netty.NettyRemotingClient#createChannel
private Channel createChannel ( String addr) throws InterruptedException {
NettyRemotingClient. ChannelWrapper cw = ( NettyRemotingClient. ChannelWrapper ) this . channelTables. get ( addr) ;
if ( cw != null && cw. isOK ( ) ) {
return cw. getChannel ( ) ;
} else {
if ( this . lockChannelTables. tryLock ( 3000L , TimeUnit . MILLISECONDS ) ) {
try {
cw = ( NettyRemotingClient. ChannelWrapper ) this . channelTables. get ( addr) ;
boolean createNewConnection;
if ( cw != null ) {
if ( cw. isOK ( ) ) {
Channel var4 = cw. getChannel ( ) ;
return var4;
}
if ( ! cw. getChannelFuture ( ) . isDone ( ) ) {
createNewConnection = false ;
} else {
this . channelTables. remove ( addr) ;
createNewConnection = true ;
}
} else {
createNewConnection = true ;
}
if ( createNewConnection) {
ChannelFuture channelFuture = this . bootstrap. connect ( RemotingHelper . string2SocketAddress ( addr) ) ;
LOGGER . info ( "createChannel: begin to connect remote host[{}] asynchronously" , addr) ;
cw = new NettyRemotingClient. ChannelWrapper ( channelFuture) ;
this . channelTables. put ( addr, cw) ;
}
} catch ( Exception var8) {
LOGGER . error ( "createChannel: create channel exception" , var8) ;
} finally {
this . lockChannelTables. unlock ( ) ;
}
} else {
LOGGER . warn ( "createChannel: try to lock channel table, but timeout, {}ms" , 3000L ) ;
}
if ( cw != null ) {
ChannelFuture channelFuture = cw. getChannelFuture ( ) ;
if ( channelFuture. awaitUninterruptibly ( ( long ) this . nettyClientConfig. getConnectTimeoutMillis ( ) ) ) {
if ( cw. isOK ( ) ) {
LOGGER . info ( "createChannel: connect remote host[{}] success, {}" , addr, channelFuture. toString ( ) ) ;
return cw. getChannel ( ) ;
}
LOGGER . warn ( "createChannel: connect remote host[" + addr + "] failed, " + channelFuture. toString ( ) ) ;
} else {
LOGGER . warn ( "createChannel: connect remote host[{}] timeout {}ms, {}" , new Object [ ] { addr, this . nettyClientConfig. getConnectTimeoutMillis ( ) , channelFuture. toString ( ) } ) ;
}
}
return null ;
}
}
追溯
以NettyRemotingClient类为起点,使用Debug分析,最终可以看到完整的调用链路:
监控开发
那么监控开发就很容易了,注册RocketMQTemplate,使用定时任务监听即可,示例代码如下:
@Slf4j
@Component
public class MQMonitorTask {
@Resource
private RocketMQTemplate rocketMQTemplate;
@Scheduled ( cron = "0/10 * * * * ?" )
public void scanNameServerBroker ( ) {
org. apache. rocketmq. remoting. RemotingClient remotingClient = rocketMQTemplate. getProducer ( )
. getDefaultMQProducerImpl ( ) . getMqClientFactory ( ) . getMQClientAPIImpl ( ) . getRemotingClient ( ) ;
List < String > nameServerAddressList = remotingClient. getNameServerAddressList ( ) ;
List < String > availableNameSrvList = remotingClient. getAvailableNameSrvList ( ) ;
log. info ( "nameServerAddressList:{}" , JSONUtil . toJsonStr ( nameServerAddressList) ) ;
log. info ( "availableNameSrvList:{}" , JSONUtil . toJsonStr ( availableNameSrvList) ) ;
}
}
另外要在SprongBoot启动类加上注解@EnableScheduling来开启定时任务。