1 Project Goals
(1) Deploy Pushgateway proficiently
(2) Create, read, update, and delete data through the API
(3) Push data to Pushgateway with the Python client SDK
2.1 Node Planning
Hostname | Host IP | Role |
prome-master01 | 10.0.1.10 | Server |
prome-node01 | 10.0.1.20 | Client |
2.2 Prerequisites
Environment: a working Prometheus environment and a Grafana environment
Project repository: prometheus/pushgateway on GitHub (push acceptor for ephemeral and batch jobs)
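3 Project Implementation
3.1 Deploying and Installing Pushgateway
What is Pushgateway
- Pushgateway is a standalone service that accepts Prometheus metrics over an HTTP REST API. It sits between the applications that emit metrics and the Prometheus server: it receives pushed metrics and then exposes them as a scrape target, so that Prometheus can collect them with its usual pull model.
When to use Pushgateway
- https://prometheus.io/docs/practices/pushing/
- According to the official guidance, the only valid use case for Pushgateway is capturing the outcome of service-level batch jobs.
- It is sometimes also used when Prometheus cannot reach a target over the network to pull from it, although alternative solutions exist for that case.
Pushgateway caveats
- Pushing samples with timestamps is not supported; Pushgateway rejects such pushes.
- When multiple instances are monitored through a single Pushgateway, the Pushgateway becomes both a single point of failure and a potential bottleneck.
- The up metric that Prometheus generates for every scraped target cannot be used to monitor the health of the pushing instances, because it only reflects the Pushgateway itself.
- Pushgateway never forgets series that were pushed to it. Unless they are deleted manually through the Pushgateway API, they stay exposed to Prometheus forever.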
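Because pushed series are never expired automatically, it can help to detect groups that have stopped pushing. The following is a minimal Python sketch, assuming the text served at the gateway's /metrics endpoint contains the push_time_seconds metric that Pushgateway records for each group. The function name stale_groups, the sample text, and the one-hour threshold are illustrative assumptions, and the parser is deliberately simple (it does not handle label values that contain braces or escaped quotes).

```python
import re

# Matches lines such as: push_time_seconds{instance="a",job="b"} 1.7e+09
_PUSH_TIME = re.compile(r'^push_time_seconds\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def stale_groups(metrics_text, now, max_age_seconds=3600):
    """Return the label sets of groups whose last push is older than max_age_seconds."""
    stale = []
    for line in metrics_text.splitlines():
        match = _PUSH_TIME.match(line.strip())
        if not match:
            continue  # comment line or a different metric
        last_push = float(match.group("value"))
        if now - last_push > max_age_seconds:
            stale.append(dict(_LABEL.findall(match.group("labels"))))
    return stale


# Hypothetical excerpt of the gateway's /metrics output (timestamps are made up)
sample = """\
# TYPE push_time_seconds gauge
push_time_seconds{instance="some_instance",job="some_job"} 1000
push_time_seconds{instance="",job="test_job"} 9000
"""

print(stale_groups(sample, now=10000))
```

In practice the text would be fetched from http://<ip>:9091/metrics and now would come from time.time(); groups reported as stale are candidates for deletion through the API.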
Download the latest release from the Releases page of prometheus/pushgateway on GitHub:
wget https://github.com/prometheus/pushgateway/releases/download/v1.8.0/pushgateway-1.8.0.linux-amd64.tar.gz
tar xf pushgateway-1.8.0.linux-amd64.tar.gz -C /usr/local/
mv /usr/local/pushgateway-1.8.0.linux-amd64/ /usr/local/pushgateway
vim /usr/lib/systemd/system/pushgateway.service
[Unit]
Description=pushgateway server
Wants=network-online.target
After=network-online.target
[Service]
ExecStart=/usr/local/pushgateway/pushgateway
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=pushgateway
[Install]
WantedBy=default.target
systemctl daemon-reload
systemctl start pushgateway
systemctl enable pushgateway
systemctl status pushgateway
Open http://<IP>:9091 in a browser to view the Pushgateway web page.
Pushgateway is now deployed.
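Add the Pushgateway to a Prometheus scrape job in prometheus.yml (replace the targets below with the address of your own Pushgateway, for example 10.0.1.10:9091 in this lab):
  - job_name: 'pushgateway'
    honor_timestamps: true
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets:
          - 172.20.70.205:9091
          - 172.20.70.215:9091
Reload the Prometheus configuration and refresh the Prometheus page; the pushgateway job should now appear among the targets.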
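3.2 Pushing Data with the API
To push data to Pushgateway, use its standard HTTP API. The default URL has the form http://<ip>:9091/metrics/job/<JOBNAME>{/<LABEL_NAME>/<LABEL_VALUE>}, where <JOBNAME> is required and becomes the value of the job label. Any number of label name/value pairs may follow. It is common to add an instance/<INSTANCE_NAME> pair so that metrics from different sources can be told apart.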
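The URL scheme described above can be sketched in Python as follows. This is only an illustration: the helper name build_push_url is hypothetical, and the handling of label values that contain a slash (base64 encoding with an @base64 suffix on the label name) follows the convention documented for Pushgateway as far as I know, so verify it against your version before relying on it.

```python
import base64
from urllib.parse import quote


def build_push_url(gateway, job, labels=None):
    """Build http://<gateway>/metrics/job/<job>{/<name>/<value>} for a group."""
    def segment(name, value):
        # A value containing "/" cannot be used as a path segment directly,
        # so it is sent base64-encoded with "@base64" appended to the name.
        if "/" in value:
            encoded = base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii")
            return "{}@base64/{}".format(name, encoded)
        return "{}/{}".format(name, quote(value, safe=""))

    parts = ["http://" + gateway.rstrip("/"), "metrics", segment("job", job)]
    for name, value in (labels or {}).items():
        parts.append(segment(name, value))
    return "/".join(parts)


print(build_push_url("10.0.1.10:9091", "some_job", {"instance": "some_instance"}))
print(build_push_url("10.0.1.10:9091", "some_job", {"path": "/var/tmp"}))
```

The first call yields the same URL that the curl examples in this section use for some_job and some_instance.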
Next, push a simple metric to Pushgateway as a test.
echo "test_metric 123456" | curl --data-binary @- http://10.0.1.10:9091/metrics/job/test_job
- Push more complex data. The URL usually includes an instance label that identifies where the data comes from:
cat <<EOF | curl --data-binary @- http://10.0.1.10:9091/metrics/job/some_job/instance/some_instance
# TYPE some_metric counter
some_metric{label="val1"} 42
# TYPE another_metric gauge
# HELP another_metric Just an example.
another_metric 2398.283
EOF
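The same kind of push can be made from Python with only the standard library. The sketch below is an assumption-laden illustration, not an official client: render_payload and push are hypothetical helper names, only unlabeled samples are supported, and the actual network call is left commented out. Note that every line of the text format, including the last one, has to end with a line feed.

```python
import urllib.request


def render_payload(samples):
    """Render (name, type, help, value) tuples in the Prometheus text format."""
    lines = []
    for name, metric_type, help_text, value in samples:
        if help_text:
            lines.append("# HELP {} {}".format(name, help_text))
        lines.append("# TYPE {} {}".format(name, metric_type))
        lines.append("{} {}".format(name, value))
    # The body must end with a newline, just like the curl heredoc does
    return ("\n".join(lines) + "\n").encode("utf-8")


def push(url, payload, timeout=5):
    """POST the payload to a Pushgateway group URL (performs a network request)."""
    request = urllib.request.Request(url, data=payload, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


payload = render_payload([("another_metric", "gauge", "Just an example.", 2398.283)])
print(payload.decode("utf-8"))
# Uncomment to send it to the gateway used in this lab:
# push("http://10.0.1.10:9091/metrics/job/some_job/instance/some_instance", payload)
```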
- Delete all metrics in the group identified by job some_job and instance some_instance:
curl -X DELETE http://10.0.1.10:9091/metrics/job/some_job/instance/some_instance
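- Delete all metrics in the group identified only by job some_job (groups that carry additional labels, such as an instance label, are separate groups and are not removed by this call):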
curl -X DELETE http://10.0.1.10:9091/metrics/job/some_job
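For scripts that clean up groups, the DELETE calls can also be issued from Python. A minimal standard-library sketch follows; build_delete_request is a hypothetical helper name, and the request is only constructed here, not sent.

```python
import urllib.request


def build_delete_request(gateway, job, instance=None):
    """Build a DELETE request for one Pushgateway group (nothing is sent yet)."""
    url = "http://{}/metrics/job/{}".format(gateway, job)
    if instance is not None:
        url += "/instance/{}".format(instance)
    return urllib.request.Request(url, method="DELETE")


request = build_delete_request("10.0.1.10:9091", "some_job", "some_instance")
print(request.get_method(), request.full_url)
# To actually delete the group (network request):
# urllib.request.urlopen(request, timeout=5)
```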
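As these examples show, data in Pushgateway is grouped by job and, usually, instance. The job label is mandatory; the instance label is optional but worth including in almost every case.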
3.3 Pushing Data to Pushgateway with the Python Client SDK
pip install prometheus_client
pip install requests
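Demo: integrate the Prometheus client library into a Python application, collect monitoring data, and push it periodically to Pushgateway. This lets Prometheus collect metrics from programs that cannot expose a metrics endpoint themselves.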
# coding:utf-8
import random
import time

import requests
from prometheus_client import CollectorRegistry, Counter, Gauge, Histogram, Summary, push_to_gateway

# Registry that holds every metric pushed by this script
r1 = CollectorRegistry()
# Pushgateway address
push_addr = "10.0.1.10:9091"

# One metric of each of the four types
g1 = Gauge('test_gauge_01', 'Description of gauge', ['k1', 'k2'], registry=r1)
c1 = Counter('test_counter_01', 'HTTP Request', ['method', 'endpoint'], registry=r1)
test_buckets = (-5, 0, 5)
h1 = Histogram('test_histogram_01', 'test of histogram', buckets=test_buckets, registry=r1)
s1 = Summary('test_summary_01', 'A summary', registry=r1)


# Business logic: update every metric once
def collect():
    g1.labels(k1='v1', k2='v2').set(random.randint(1, 100))
    c1.labels(method='get', endpoint='/login').inc(10)
    h1.observe(random.randint(-10, 10))
    f(random.uniform(0, 1))


# The summary records how long each call takes; the sleep simulates work
@s1.time()
def f(t):
    time.sleep(t)


# Custom handler: sends the request with requests instead of the default handler
def custom_handle(url, method, timeout, headers, data):
    def handle():
        h = dict(headers)
        if method == 'PUT':
            resp = requests.put(url, data=data, headers=h, timeout=timeout)
        elif method == 'POST':
            resp = requests.post(url, data=data, headers=h, timeout=timeout)
        elif method == 'DELETE':
            resp = requests.delete(url, data=data, headers=h, timeout=timeout)
        else:
            raise ValueError("unsupported method: {0}".format(method))
        if resp.status_code >= 400:
            raise IOError("error talking to pushgateway: {0} {1}".format(resp.status_code, resp.text))
    return handle


# Main loop: push the registry under four job names, then wait `step` seconds
if __name__ == '__main__':
    step = 10
    while True:
        for i in ["a", "b", "c", "d"]:
            collect()
            # push_to_gateway returns None; a failed push raises an exception
            push_to_gateway(push_addr, job='test_job_{}'.format(i), registry=r1, timeout=5, handler=custom_handle)
            print("pushed test_job_{}".format(i))
        time.sleep(step)
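The demo pushes without an instance label. The sketch below shows how a grouping key could be added, and how the three client functions map to HTTP methods: push_to_gateway replaces all metrics of the group (PUT), pushadd_to_gateway only replaces metrics with the same name (POST), and delete_from_gateway removes the group (DELETE). To keep it runnable without a gateway, it uses a dry-run handler that records each request instead of sending it; recording_handler, demo_job, and the metric name are hypothetical names chosen for illustration, and prometheus_client must be installed.

```python
from prometheus_client import (
    CollectorRegistry,
    Gauge,
    delete_from_gateway,
    push_to_gateway,
    pushadd_to_gateway,
)

recorded = []


def recording_handler(url, method, timeout, headers, data):
    """Dry-run handler: records the request instead of sending it."""
    def handle():
        recorded.append((method, url))
    return handle


registry = CollectorRegistry()
last_success = Gauge('demo_last_success_unixtime', 'Last successful run', registry=registry)
last_success.set_to_current_time()

# The grouping key becomes extra /<name>/<value> pairs in the push URL
key = {'instance': 'prome-node01'}
push_to_gateway('10.0.1.10:9091', job='demo_job', registry=registry,
                grouping_key=key, handler=recording_handler)
pushadd_to_gateway('10.0.1.10:9091', job='demo_job', registry=registry,
                   grouping_key=key, handler=recording_handler)
delete_from_gateway('10.0.1.10:9091', job='demo_job',
                    grouping_key=key, handler=recording_handler)

for method, url in recorded:
    print(method, url)
```

Swapping recording_handler for a real handler, such as custom_handle from the demo, would send the same three requests to the gateway.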
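The Pushgateway page now lists the jobs test_job_a, test_job_b, test_job_c, and test_job_d.
Open http://10.0.1.10:9091/metrics to see the pushed values.
Querying in the Prometheus Graph page also returns the metrics whose names start with test_.
Create a dashboard in Grafana with the following queries:
gauge: test_gauge_01
counter: rate(test_counter_01_total[1m])
histogram: histogram_quantile(0.90, sum(rate(test_histogram_01_bucket{}[30s])) by (le))
summary: rate(test_summary_01_count[1m])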
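To get a feel for what the histogram query returns, the following Python sketch models the estimate as linear interpolation inside the bucket that contains the requested rank. It is a simplified illustration of the idea behind histogram_quantile, not the Prometheus implementation, and the cumulative bucket counts are made up; the bucket bounds -5, 0, 5, +Inf match the demo histogram.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate a quantile from (upper_bound, cumulative_count) pairs.

    buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[index]
    if math.isinf(upper):
        # Nothing is known beyond the last finite bound, so report that bound
        return buckets[-2][0]
    if index == 0:
        if upper <= 0:
            return upper  # no sensible lower bound to interpolate from
        lower, previous = 0.0, 0.0
    else:
        lower, previous = buckets[index - 1]
    if count == previous:
        return upper
    # Assume observations are spread evenly inside the bucket
    return lower + (upper - lower) * (rank - previous) / (count - previous)


# Hypothetical cumulative counts for the le="-5", "0", "5", "+Inf" buckets
demo_buckets = [(-5, 10), (0, 40), (5, 80), (math.inf, 100)]
print(estimate_quantile(0.5, demo_buckets))
print(estimate_quantile(0.9, demo_buckets))
```

With these counts the 0.9 quantile falls into the +Inf bucket, so the estimate is capped at the highest finite bound, 5. This is one reason to choose bucket bounds that cover the expected range of values.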