Environment:
micrometer 1.8.2
prometheus 0.14.1
spring-boot-actuator 2.6.6
Usage examples
<!-- Spring Boot actuator starter; pulls in micrometer-core by default -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
<version>2.6.6</version>
</dependency>
<!-- Micrometer's Prometheus bridge; pulls in io.prometheus:simpleclient and related dependencies by default -->
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-registry-prometheus</artifactId>
<version>1.8.2</version>
</dependency>
Timer
Records the execution time of every run of a task. The fallback default time window is 1 minute. To change it, configure io.micrometer.core.instrument.distribution.DistributionStatisticConfig.Builder#expiry
Metrics.timer("my_name", "my_tag_1", "my_tag_2").record(() -> {
    doMyJob();
});
LongTaskTimer
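Like Timer, it records task execution time. As the official Javadoc notes, what counts as a "long" task is a subjective judgment, for example a task that runs for more than 1 minute.
One notable difference is an extra interface method, io.micrometer.core.instrument.LongTaskTimer#activeTasks, which returns the number of tasks currently in progress.
// Tags come in key/value pairs, and record(...) takes a Runnable, so pass a lambda rather than the result of doMyJob()
Metrics.more().longTaskTimer("my_name", "my_tag_1", "my_tag_2").record(() -> doMyJob());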
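The following minimal, stdlib-only sketch shows what activeTasks reports. It makes no assumptions about Micrometer's internals. The hypothetical SimpleLongTaskTracker counts a task as active from the moment record starts it until the task returns or throws, so any scrape taken in between would see it.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-in for LongTaskTimer#activeTasks (not Micrometer's
// implementation): a task is counted as active for the whole duration of record().
class SimpleLongTaskTracker {
    private final AtomicInteger active = new AtomicInteger();

    // Runs the task and keeps it counted as active until it returns or throws.
    void record(Runnable task) {
        active.incrementAndGet();
        try {
            task.run();
        } finally {
            active.decrementAndGet();
        }
    }

    // What a scrape taken right now would report.
    int activeTasks() {
        return active.get();
    }

    public static void main(String[] args) {
        SimpleLongTaskTracker tracker = new SimpleLongTaskTracker();
        tracker.record(() -> System.out.println("active during task: " + tracker.activeTasks()));
        System.out.println("active after task: " + tracker.activeTasks());
    }
}
```

In the real API the same number comes from LongTaskTimer#activeTasks. The class here only makes the semantics concrete.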
Gauge
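When the server scrapes metrics, or when the client pushes them, the supplied object and function are invoked to obtain the current value. In other words, a gauge records the current state.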
RingBuffer<MatchingOutput> rb = disruptor.getRingBuffer();
Metrics.gauge("ringbuffer_remaining", Tags.of("my_tag_1", "my_tag_2"), rb, RingBuffer::remainingCapacity);
Counter
Increments a counter.
Metrics.counter("my_request", "my_tag_1", "my_tag_2").increment();
DistributionSummary
Tracks the sample distribution of events. One example is the size of the responses to requests hitting an HTTP server.
DistributionSummary ds = DistributionSummary.builder("my.data.size")
.tag("type", "my_type_1")
.publishPercentileHistogram()
.register(Metrics.globalRegistry);
ds.record(myValue);
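With publishPercentileHistogram(), the Prometheus registry also exports bucket counts for the summary. The following hypothetical sketch uses made-up boundaries (Micrometer picks its own) to show the fixed-boundary idea. Each value increments the first bucket whose upper bound is at least the value, and the exported counts are cumulative, in the style of Prometheus "le" buckets.

```java
import java.util.Arrays;

// Hypothetical fixed-boundary histogram sketch (not Micrometer's class).
// counts[i] holds values in (upperBounds[i-1], upperBounds[i]]; the last slot is +Inf.
class FixedBoundaryHistogramSketch {
    private final double[] upperBounds; // sorted ascending
    private final long[] counts;

    FixedBoundaryHistogramSketch(double... upperBounds) {
        this.upperBounds = upperBounds.clone();
        Arrays.sort(this.upperBounds);
        this.counts = new long[this.upperBounds.length + 1];
    }

    synchronized void record(double value) {
        int i = 0;
        while (i < upperBounds.length && value > upperBounds[i]) {
            i++;
        }
        counts[i]++; // i == upperBounds.length means the +Inf bucket
    }

    // Element k = number of recorded values <= upperBounds[k]; the last element is the total.
    synchronized long[] cumulativeCounts() {
        long[] result = new long[counts.length];
        long running = 0;
        for (int k = 0; k < counts.length; k++) {
            running += counts[k];
            result[k] = running;
        }
        return result;
    }

    public static void main(String[] args) {
        FixedBoundaryHistogramSketch h = new FixedBoundaryHistogramSketch(100, 1_000, 10_000);
        for (double size : new double[] {50, 150, 900, 20_000}) {
            h.record(size);
        }
        System.out.println(Arrays.toString(h.cumulativeCounts())); // [1, 3, 3, 4]
    }
}
```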
Configuring actuator
Configure the port that metrics are scraped from, and the web endpoints to expose:
management:
  server:
    port: 9999
  endpoints:
    web:
      exposure:
        include: '*'
  metrics:
    tags:
      application: myAppName
How the Spring Boot integration starts up
Scrape metrics from: http://localhost:9999/actuator/prometheus
Servlet configuration
Endpoint auto-configuration has many entry points, for example these two:
- Regular web application: org.springframework.boot.actuate.autoconfigure.endpoint.web.servlet.WebMvcEndpointManagementContextConfiguration#webEndpointServletHandlerMapping
- Cloud provider (Cloud Foundry): org.springframework.boot.actuate.autoconfigure.cloudfoundry.servlet.CloudFoundryActuatorAutoConfiguration#cloudFoundryWebEndpointServletHandlerMapping
Servlet logic
org.springframework.boot.actuate.metrics.export.prometheus.PrometheusScrapeEndpoint
@ReadOperation(producesFrom = TextOutputFormat.class)
public WebEndpointResponse<String> scrape(TextOutputFormat format, @Nullable Set<String> includedNames) {
    try {
        Writer writer = new StringWriter(this.nextMetricsScrapeSize);
        Enumeration<MetricFamilySamples> samples = (includedNames != null)
                ? this.collectorRegistry.filteredMetricFamilySamples(includedNames)
                : this.collectorRegistry.metricFamilySamples();
        format.write(writer, samples);
        String scrapePage = writer.toString();
        this.nextMetricsScrapeSize = scrapePage.length() + METRICS_SCRAPE_CHARS_EXTRA;
        return new WebEndpointResponse<>(scrapePage, format);
    }
    catch (IOException ex) {
        // This actually never happens since StringWriter doesn't throw an IOException
        throw new IllegalStateException("Writing metrics failed", ex);
    }
}
With no filter configured, the registry returns an enumeration object:
io.prometheus.client.CollectorRegistry#metricFamilySamples -> io.prometheus.client.CollectorRegistry.MetricFamilySamplesEnumeration#MetricFamilySamplesEnumeration()
io.prometheus.client.CollectorRegistry.MetricFamilySamplesEnumeration
- sampleNameFilter
- collectorIter: iterates over the values of the io.prometheus.client.CollectorRegistry#namesToCollectors map
- The constructor fetches the next element once: findNextElement
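The enumeration above is lazy: collectors are asked to collect only as the consumer advances. Below is a stdlib-only model of that behaviour. LazyFamilyEnumeration and its Supplier-based "collectors" are hypothetical stand-ins for the simpleclient types, and metric families are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

// Hypothetical model of a lazy metric-family enumeration (not the simpleclient class).
class LazyFamilyEnumeration implements Enumeration<String> {
    private final Iterator<Supplier<List<String>>> collectorIter;
    private Iterator<String> next = Collections.emptyIterator();
    private String current;

    LazyFamilyEnumeration(Iterable<Supplier<List<String>>> collectors) {
        this.collectorIter = collectors.iterator();
        findNextElement(); // pre-fetch once, as described above
    }

    private void findNextElement() {
        current = null;
        // Drain the current collector's families before asking the next collector.
        while (!next.hasNext() && collectorIter.hasNext()) {
            next = collectorIter.next().get().iterator(); // "collect" one collector
        }
        if (next.hasNext()) {
            current = next.next();
        }
    }

    @Override
    public boolean hasMoreElements() {
        return current != null;
    }

    @Override
    public String nextElement() {
        if (current == null) {
            throw new NoSuchElementException();
        }
        String result = current;
        findNextElement();
        return result;
    }

    public static void main(String[] args) {
        Map<String, Supplier<List<String>>> namesToCollectors = new LinkedHashMap<>();
        namesToCollectors.put("ringbuffer_remaining", () -> List.of("ringbuffer_remaining"));
        namesToCollectors.put("my_request", () -> List.of("my_request"));
        Enumeration<String> families = new LazyFamilyEnumeration(namesToCollectors.values());
        List<String> seen = new ArrayList<>();
        while (families.hasMoreElements()) {
            seen.add(families.nextElement());
        }
        System.out.println(seen); // [ringbuffer_remaining, my_request]
    }
}
```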
findNextElement
Advances the collectorIter iterator by one step and collects metrics from that collector:
- io.prometheus.client.Collector#collect(io.prometheus.client.Predicate<java.lang.String>)
- io.micrometer.prometheus.MicrometerCollector#collect
- Iterate over every io.micrometer.prometheus.MicrometerCollector.Child object in the io.micrometer.prometheus.MicrometerCollector#children collection
- For example, the anonymous lambda implementation for the Gauge type in io.micrometer.prometheus.PrometheusMeterRegistry#newGauge
- Group all samples from the iterated children by conventionName (e.g. ringbuffer_remaining); each group becomes one metric family: io.prometheus.client.Collector.MetricFamilySamples
- Return the List and assign its iterator to the next field: io.prometheus.client.CollectorRegistry.MetricFamilySamplesEnumeration#next
- Iterate over the samples: io.prometheus.client.Collector.MetricFamilySamples#samples
- Write each sample (io.prometheus.client.Collector.MetricFamilySamples.Sample) into the response: org.springframework.boot.actuate.metrics.export.prometheus.TextOutputFormat#CONTENT_TYPE_004#write
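The grouping and writing steps can be sketched in plain Java. The Sample class and the write method below are hypothetical simplifications of MetricFamilySamples, Sample and the text-format writer. They do no escaping and emit no HELP text. They reproduce the shape of the output shown next, including the trailing comma after the last label.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TextExpositionSketch {
    // Hypothetical stand-in for a Prometheus Sample: name, ordered labels, value.
    static final class Sample {
        final String name;
        final Map<String, String> labels;
        final double value;

        Sample(String name, Map<String, String> labels, double value) {
            this.name = name;
            this.labels = labels;
            this.value = value;
        }
    }

    // Group samples by name; each group plays the role of one metric family.
    static Map<String, List<Sample>> groupByName(List<Sample> samples) {
        Map<String, List<Sample>> families = new LinkedHashMap<>();
        for (Sample s : samples) {
            families.computeIfAbsent(s.name, k -> new ArrayList<>()).add(s);
        }
        return families;
    }

    // Write the families in a simplified text format.
    static String write(Map<String, List<Sample>> families, Map<String, String> types) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, List<Sample>> family : families.entrySet()) {
            String name = family.getKey();
            out.append("# HELP ").append(name).append('\n');
            out.append("# TYPE ").append(name).append(' ')
               .append(types.getOrDefault(name, "untyped")).append('\n');
            for (Sample s : family.getValue()) {
                out.append(s.name).append('{');
                // Every label is followed by a comma, hence the ",}" in the output.
                for (Map.Entry<String, String> label : s.labels.entrySet()) {
                    out.append(label.getKey()).append("=\"").append(label.getValue()).append("\",");
                }
                out.append("} ").append(s.value).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("application", "myAppName");
        labels.put("type", "my_tag_1");
        List<Sample> samples = List.of(new Sample("ringbuffer_remaining", labels, 1024.0));
        System.out.print(write(groupByName(samples), Map.of("ringbuffer_remaining", "gauge")));
    }
}
```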
Sample endpoint output
The common tag configured above is attached to every metric: application=myAppName
Metric name: ringbuffer_remaining
Metric tag: type=my_tag_1
Metric type: gauge
# HELP ringbuffer_remaining
# TYPE ringbuffer_remaining gauge
ringbuffer_remaining{application="myAppName",type="my_tag_1",} 1024.0
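Sampling logic
Gauge
Based on the code from the Gauge usage example above:
io.micrometer.core.instrument.internal.DefaultGauge
- ref: a weak reference to the RingBuffer instance
- value: the RingBuffer::remainingCapacity method
Sampling simply calls that method on the instance and uses its return value as the metric value.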
public class DefaultGauge<T> extends AbstractMeter implements Gauge {
    ...
    private final WeakReference<T> ref;
    private final ToDoubleFunction<T> value;
    ...
    @Override
    public double value() {
        T obj = ref.get();
        if (obj != null) {
            try {
                return value.applyAsDouble(obj);
            }
            catch (Throwable ex) {
                logger.log("Failed to apply the value function for the gauge '" + getId().getName() + "'.", ex);
            }
        }
        return Double.NaN;
    }
    ...
}
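Here is a simplified, stdlib-only version of the same idea. WeakGaugeSketch is a hypothetical name, not Micrometer's class. The value is recomputed from a weakly referenced object on every read. It returns NaN if the object is gone or the function throws.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch of the DefaultGauge idea: weak reference + function evaluated on read.
class WeakGaugeSketch<T> {
    private final WeakReference<T> ref;
    private final ToDoubleFunction<T> fn;

    WeakGaugeSketch(T obj, ToDoubleFunction<T> fn) {
        this.ref = new WeakReference<>(obj);
        this.fn = fn;
    }

    double value() {
        T obj = ref.get();
        if (obj != null) {
            try {
                return fn.applyAsDouble(obj);
            } catch (RuntimeException ex) {
                // DefaultGauge logs and falls through (catching Throwable); this sketch just falls through.
            }
        }
        return Double.NaN; // object collected, or the function failed
    }

    public static void main(String[] args) {
        Deque<String> queue = new ArrayDeque<>();
        WeakGaugeSketch<Deque<String>> gauge = new WeakGaugeSketch<>(queue, Deque::size);
        System.out.println(gauge.value()); // 0.0
        queue.add("job");
        System.out.println(gauge.value()); // 1.0, read at "scrape" time
        Reference.reachabilityFence(queue); // keep queue strongly reachable until here
    }
}
```

A practical consequence is that the object handed to Metrics.gauge (the RingBuffer in the usage example) must stay strongly referenced elsewhere. Otherwise it can be garbage-collected, and the gauge then reports NaN.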
Timer
io.micrometer.prometheus.PrometheusTimer
- count: LongAdder, an incrementing counter
- totalTime: LongAdder, the accumulated task duration
- max: io.micrometer.core.instrument.distribution.TimeWindowMax, a simplified ring buffer that records the maximum value within the time window
- histogramFlavor: the histogram flavor (type); the current version has only two: Prometheus/VictoriaMetrics
- histogram
  - Prometheus flavor: io.micrometer.core.instrument.distribution.TimeWindowFixedBoundaryHistogram#TimeWindowFixedBoundaryHistogram
  - VictoriaMetrics flavor: io.micrometer.core.instrument.distribution.FixedBoundaryVictoriaMetricsHistogram#FixedBoundaryVictoriaMetricsHistogram
Recording is triggered each time the monitored method actually runs. Sampling merely takes a snapshot, by calling the instance's interface method when the endpoint is scraped:
- io.micrometer.core.instrument.distribution.HistogramSupport#takeSnapshot()
  - io.micrometer.prometheus.PrometheusTimer#takeSnapshot
    - io.micrometer.core.instrument.AbstractTimer#takeSnapshot
    - If histogram != null, the histogramCounts data is appended
--io.micrometer.core.instrument.AbstractTimer#takeSnapshot
@Override
public HistogramSnapshot takeSnapshot() {
    return histogram.takeSnapshot(count(), totalTime(TimeUnit.NANOSECONDS), max(TimeUnit.NANOSECONDS));
}
--io.micrometer.prometheus.PrometheusTimer#takeSnapshot
@Override
public HistogramSnapshot takeSnapshot() {
    HistogramSnapshot snapshot = super.takeSnapshot();
    if (histogram == null) {
        return snapshot;
    }
    return new HistogramSnapshot(snapshot.count(),
            snapshot.total(),
            snapshot.max(),
            snapshot.percentileValues(),
            histogramCounts(),
            snapshot::outputSummary);
}
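The split between recording and snapshotting can be modelled as follows. TimerSketch and its Snapshot are hypothetical. Unlike PrometheusTimer, the max here never decays (that is TimeWindowMax's job), and there is no histogram.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical model: record() updates accumulators, takeSnapshot() only copies them.
class TimerSketch {
    static final class Snapshot {
        final long count;
        final long totalNanos;
        final long maxNanos;

        Snapshot(long count, long totalNanos, long maxNanos) {
            this.count = count;
            this.totalNanos = totalNanos;
            this.maxNanos = maxNanos;
        }
    }

    private final LongAdder count = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();
    private final AtomicLong maxNanos = new AtomicLong(); // never decays in this sketch

    // Called on every measured execution: cheap, lock-free updates.
    void record(long amount, TimeUnit unit) {
        long nanos = unit.toNanos(amount);
        count.increment();
        totalNanos.add(nanos);
        maxNanos.accumulateAndGet(nanos, Math::max);
    }

    // Called at scrape time: just reads the current values.
    Snapshot takeSnapshot() {
        return new Snapshot(count.sum(), totalNanos.sum(), maxNanos.get());
    }

    public static void main(String[] args) {
        TimerSketch timer = new TimerSketch();
        timer.record(10, TimeUnit.MILLISECONDS);
        timer.record(30, TimeUnit.MILLISECONDS);
        Snapshot s = timer.takeSnapshot();
        System.out.println(s.count + " " + s.totalNanos + " " + s.maxNanos); // 2 40000000 30000000
    }
}
```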
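Time window
io.micrometer.core.instrument.distribution.TimeWindowMax
- rotatingUpdater: AtomicIntegerFieldUpdater, atomic updater for the rotating flag
- clock: Clock, the system clock, returns the current timestamp
- durationBetweenRotatesMillis: long, the rotation step size
- ringBuffer: AtomicLong[], the queue
- currentBucket: int, the current cursor into the queue
- lastRotateTimestampMillis: timestamp of the last rotation
- rotating: int, flag, 0 - not rotating, 1 - rotating
Every record write and every poll query first checks whether a rotation is needed by calling rotate.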
io.micrometer.core.instrument.distribution.TimeWindowMax#rotate
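- wallTime = current system time
- timeSinceLastRotateMillis = wallTime - lastRotateTimestampMillis, i.e. the time elapsed since the last rotation
- If that is below one step, return without rotating: timeSinceLastRotateMillis < durationBetweenRotatesMillis
- Otherwise, set the flag to mark a rotation as in progress; other callers have to wait for it
- If timeSinceLastRotateMillis already covers the whole queue: timeSinceLastRotateMillis >= durationBetweenRotatesMillis * ringBuffer.length
  - simply reset the queue and return:
    - set every slot of ringBuffer to 0
    - set currentBucket to 0
    - update the last rotation time: lastRotateTimestampMillis = wallTime - timeSinceLastRotateMillis % durationBetweenRotatesMillis
- Otherwise, reset to 0 each bucket that has expired between the last rotation and now: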
int iterations = 0;
do {
    ringBuffer[currentBucket].set(0);
    if (++currentBucket >= ringBuffer.length) {
        currentBucket = 0;
    }
    timeSinceLastRotateMillis -= durationBetweenRotatesMillis;
    lastRotateTimestampMillis += durationBetweenRotatesMillis;
} while (timeSinceLastRotateMillis >= durationBetweenRotatesMillis && ++iterations < ringBuffer.length);
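The rotate steps described above can be combined into a compact, self-contained sketch of a rotating max. It follows the same branches, with two assumptions of its own: a pluggable millisecond clock (a LongSupplier) and synchronized methods in place of the rotating CAS flag. It is not Micrometer's TimeWindowMax. In this sketch, record raises the max in every bucket and poll reads the current bucket. A value therefore drops out once the ring has rotated past every bucket that held it.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical rotating-max sketch following the rotate() steps described above.
class RotatingMaxSketch {
    private final LongSupplier clockMillis;
    private final long durationBetweenRotatesMillis;
    private final AtomicLong[] ringBuffer;
    private int currentBucket;
    private long lastRotateTimestampMillis;

    RotatingMaxSketch(LongSupplier clockMillis, long durationBetweenRotatesMillis, int bufferLength) {
        this.clockMillis = clockMillis;
        this.durationBetweenRotatesMillis = durationBetweenRotatesMillis;
        this.ringBuffer = new AtomicLong[bufferLength];
        for (int i = 0; i < bufferLength; i++) {
            ringBuffer[i] = new AtomicLong();
        }
        this.lastRotateTimestampMillis = clockMillis.getAsLong();
    }

    synchronized void record(long sample) {
        rotate();
        for (AtomicLong max : ringBuffer) {
            max.accumulateAndGet(sample, Math::max);
        }
    }

    synchronized long poll() {
        rotate();
        return ringBuffer[currentBucket].get();
    }

    private void rotate() {
        long wallTime = clockMillis.getAsLong();
        long timeSinceLastRotateMillis = wallTime - lastRotateTimestampMillis;
        if (timeSinceLastRotateMillis < durationBetweenRotatesMillis) {
            return; // still inside the current step
        }
        if (timeSinceLastRotateMillis >= durationBetweenRotatesMillis * ringBuffer.length) {
            // The whole ring has expired: clear it and realign to the step grid.
            for (AtomicLong bucket : ringBuffer) {
                bucket.set(0);
            }
            currentBucket = 0;
            lastRotateTimestampMillis = wallTime - timeSinceLastRotateMillis % durationBetweenRotatesMillis;
            return;
        }
        int iterations = 0;
        do {
            ringBuffer[currentBucket].set(0);
            if (++currentBucket >= ringBuffer.length) {
                currentBucket = 0;
            }
            timeSinceLastRotateMillis -= durationBetweenRotatesMillis;
            lastRotateTimestampMillis += durationBetweenRotatesMillis;
        } while (timeSinceLastRotateMillis >= durationBetweenRotatesMillis && ++iterations < ringBuffer.length);
    }

    public static void main(String[] args) {
        long[] now = {0};
        RotatingMaxSketch max = new RotatingMaxSketch(() -> now[0], 1, 3);
        max.record(5);
        now[0] = 1;
        max.record(2);
        System.out.println(max.poll()); // 5
        now[0] = 3;
        System.out.println(max.poll()); // 2: the 5 has aged out
        now[0] = 100;
        System.out.println(max.poll()); // 0: the whole ring expired
    }
}
```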
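Example: the current time is 4, the last rotation time is 2, the queue size is 3, durationBetweenRotatesMillis=1 and currentBucket=1, so timeSinceLastRotateMillis=4-2=2. Since 2 < 1 * 3, the full-reset branch is skipped and the loop runs.
Loop iteration 1
- set ringBuffer[1]=0
- currentBucket becomes 2
- timeSinceLastRotateMillis becomes 2-1=1
- lastRotateTimestampMillis becomes 2+1=3
- the condition holds (1 >= 1), so ++iterations makes iterations=1, which is < 3, and the loop continues
Loop iteration 2
- set ringBuffer[2]=0
- currentBucket becomes 3
  - currentBucket >= queue length
  - so currentBucket is reset to 0
- timeSinceLastRotateMillis becomes 1-1=0
- lastRotateTimestampMillis becomes 3+1=4
- timeSinceLastRotateMillis=0 is now less than durationBetweenRotatesMillis, so the loop ends; because && short-circuits, ++iterations is not evaluated and iterations stays 1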
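To double-check the walkthrough, the loop can be replayed with the same inputs. RotateLoopTrace is a hypothetical helper that copies only the do/while loop. The ring starts filled with an arbitrary non-zero value (7) so that the cleared buckets stand out.

```java
import java.util.Arrays;

// Hypothetical helper: replays only the do/while loop of rotate() on plain fields.
class RotateLoopTrace {
    final long[] ringBuffer;
    int currentBucket;
    long lastRotateTimestampMillis;
    long timeSinceLastRotateMillis;
    int iterations;

    RotateLoopTrace(int length, int currentBucket, long lastRotate, long wallTime) {
        this.ringBuffer = new long[length];
        Arrays.fill(ringBuffer, 7); // placeholder values so resets are visible
        this.currentBucket = currentBucket;
        this.lastRotateTimestampMillis = lastRotate;
        this.timeSinceLastRotateMillis = wallTime - lastRotate;
    }

    void runLoop(long durationBetweenRotatesMillis) {
        iterations = 0;
        do {
            ringBuffer[currentBucket] = 0;
            if (++currentBucket >= ringBuffer.length) {
                currentBucket = 0;
            }
            timeSinceLastRotateMillis -= durationBetweenRotatesMillis;
            lastRotateTimestampMillis += durationBetweenRotatesMillis;
        } while (timeSinceLastRotateMillis >= durationBetweenRotatesMillis && ++iterations < ringBuffer.length);
    }

    public static void main(String[] args) {
        RotateLoopTrace t = new RotateLoopTrace(3, 1, 2, 4); // length, currentBucket, last rotation, now
        t.runLoop(1);
        System.out.println("currentBucket=" + t.currentBucket
                + " lastRotate=" + t.lastRotateTimestampMillis
                + " iterations=" + t.iterations
                + " ring=" + Arrays.toString(t.ringBuffer));
        // currentBucket=0 lastRotate=4 iterations=1 ring=[7, 0, 0]
    }
}
```

On the second pass the && stops at the timeSinceLastRotateMillis check, so iterations ends at 1.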
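Illustration of a single rotation
When rotate finds that the last rotation time (lastRotateTimestampMillis) lags the current time (wallTime) by N step units, lastRotateTimestampMillis moves right by N units and currentBucket moves right by N slots. Because currentBucket is an array index, it wraps back to 0 when it runs past the end (a ring).
For example, with a queue length of 3 and a starting index of 0, moving 4 slots wraps around once and lands on index 1 (4 mod 3 = 1). Note that in the code above, a lag of 4 with a queue length of 3 actually takes the full-reset branch (4 >= 1 * 3). That branch clears every bucket and sets currentBucket to 0. Wrap-around through the loop only happens when the lag is shorter than the whole queue.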
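Summary
Micrometer can integrate with Prometheus, and also with time-series databases such as InfluxDB. Its main job is bridging, much like the relationship between SLF4J and Log4j/Logback. It provides a generic instrumentation API and forwards the recorded data to the corresponding time-series database, so users only need to decide when to record. For example:
- io.micrometer.prometheus.PrometheusMeterRegistry in the bridge package bridges recorded data into io.prometheus.client.CollectorRegistry
- io.micrometer.influx.InfluxMeterRegistry in the bridge package pushes recorded data to InfluxDB using the Influx protocol.
  - The default push interval is once per minute, configurable via io.micrometer.core.instrument.push.PushRegistryConfig#step
  - The thread pool is single-threaded by default: java.util.concurrent.Executors#newSingleThreadScheduledExecutor(java.util.concurrent.ThreadFactory)
  - Thread naming for the InfluxDB implementation: influx-metrics-publisher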
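The scheduling side of a push registry can be approximated with the JDK alone. In this hypothetical sketch, a single named thread calls a placeholder publish callback at a fixed step. Several details are assumptions of the sketch, not InfluxMeterRegistry's exact behaviour: the step is a constructor argument (for testability; the default described above is 1 minute), the thread is a daemon, and the first run happens after one step.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical push-publisher sketch: one named scheduler thread, fixed-rate publish.
class PushPublisherSketch implements AutoCloseable {
    private final ScheduledExecutorService scheduler;

    PushPublisherSketch(String threadName, long stepMillis, Runnable publish) {
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, threadName);
            t.setDaemon(true); // assumption of this sketch
            return t;
        });
        scheduler.scheduleAtFixedRate(publish, stepMillis, stepMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch published = new CountDownLatch(2);
        try (PushPublisherSketch p = new PushPublisherSketch("influx-metrics-publisher", 50, () -> {
            // Placeholder for "serialize meters and send them"; not a real InfluxDB client.
            System.out.println("publish on " + Thread.currentThread().getName());
            published.countDown();
        })) {
            published.await(5, TimeUnit.SECONDS);
        }
    }
}
```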
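Actuator acts like a starter. It automates the configuration needed to hook up a specific time-series database, for example, on the metrics side, the Prometheus web endpoint exposure, the Influx settings and the Micrometer metrics settings: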
- org.springframework.boot.actuate.autoconfigure.metrics.JvmMetricsAutoConfiguration
- org.springframework.boot.actuate.autoconfigure.metrics.KafkaMetricsAutoConfiguration
- org.springframework.boot.actuate.autoconfigure.metrics.Log4J2MetricsAutoConfiguration
- org.springframework.boot.actuate.autoconfigure.metrics.LogbackMetricsAutoConfiguration
- org.springframework.boot.actuate.autoconfigure.metrics.SystemMetricsAutoConfiguration
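Finally, a reporting tool such as Grafana can use the recorded data as a data source and render charts. A common architecture is Micrometer -> Prometheus -> Grafana.
Note: front-end rendering can become a bottleneck. For example, if a single metric has too many tag values (distinct tag combinations), dashboards become very sluggish. This usually becomes noticeable at around 5k and seriously hurts usability beyond 10k.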
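Every distinct combination of tag values is a separate time series, and each one is something the dashboard has to draw. Below is a small illustrative count. The tags (uri, status) and the numbers are made up for the example.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative cardinality count: one time series per distinct tag-value combination.
class SeriesCardinality {
    static int distinctSeries(List<Map<String, String>> recordedTags) {
        Set<Map<String, String>> series = new HashSet<>(recordedTags);
        return series.size();
    }

    // Upper bound: the product of the distinct-value counts of each tag.
    static long worstCase(int... distinctValuesPerTag) {
        long product = 1;
        for (int n : distinctValuesPerTag) {
            product *= n;
        }
        return product;
    }

    public static void main(String[] args) {
        List<Map<String, String>> recorded = List.of(
                Map.of("uri", "/a", "status", "200"),
                Map.of("uri", "/a", "status", "500"),
                Map.of("uri", "/b", "status", "200"),
                Map.of("uri", "/a", "status", "200"));
        System.out.println(distinctSeries(recorded)); // 3
        System.out.println(worstCase(100, 10, 10));   // 10000, e.g. 100 URIs x 10 statuses x 10 instances
    }
}
```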