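Done today
Jaeger
Features
- Monitor distributed workflows and troubleshoot them
- Identify performance bottlenecks
- Track down root causes
- Analyze service dependencies
Deployment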
- Deployment: Deployment - Jaeger documentation (jaegertracing.io)
- ClickHouse support: jaegertracing/jaeger-clickhouse: Jaeger ClickHouse storage plugin implementation (github.com)
- Monitoring with Prometheus: Service Performance Monitoring (SPM) - Jaeger documentation (jaegertracing.io)
- Using Elasticsearch: docker - How to configure Jaeger with elasticsearch? - Stack Overflow
- GitHub issue: jaeger-collector: Failed to init storage factory · Issue #931 · jaegertracing/jaeger (github.com)
version: "3"
services:
  proxy:
    image: traefik:v3.0
    container_name: proxy
    hostname: proxy
    networks:
      - elastic-jaeger
    ports:
      - "80:80"
      - "8080:8080"
    command:
      - --ping=true
      - --api.dashboard=true
      - --api.insecure=true
      - --providers.file.directory=/etc/traefik
      - --providers.file.watch=true
      - --entrypoints.web-entrypoint.address=:80
      - --entrypoints.kafka-entrypoint.address=:9092
      - --accesslog=true
      - --metrics.openTelemetry=true
      - --metrics.openTelemetry.address=jaeger-collector:4317
      - --metrics.openTelemetry.grpc=true
      - --metrics.openTelemetry.insecure=true
      - --tracing.openTelemetry=true
      - --tracing.openTelemetry.address=jaeger-collector:4317
      - --tracing.openTelemetry.grpc=true
      - --tracing.openTelemetry.insecure=true
      - --log.level=WARN # DEBUG, INFO, WARN, ERROR, FATAL, PANIC
    healthcheck:
      test: [ "CMD-SHELL", "traefik healthcheck --ping" ]
      interval: 5s
      timeout: 3s
      retries: 3
    volumes:
      - ./config/traefik:/etc/traefik
  elasticsearch:
    image: elasticsearch:7.17.12
    container_name: elasticsearch
    networks:
      - elastic-jaeger
    ports:
      - "127.0.0.1:9200:9200"
      - "127.0.0.1:9300:9300"
    restart: on-failure
    environment:
      - cluster.name=jaeger-cluster
      - discovery.type=single-node
      - http.host=0.0.0.0
      - transport.host=127.0.0.1
      - ES_JAVA_OPTS=-Xms512m -Xmx512m
      - xpack.security.enabled=false
    volumes:
      - esdata:/usr/share/elasticsearch/data
  jaeger-collector:
    container_name: jaeger-collector
    image: jaegertracing/jaeger-collector
    ports:
      - "14269:14269"
      - "14268:14268"
      - "14267:14267"
      - "14250:14250"
      - "9411:9411"
      - "4317:4317"
    networks:
      - elastic-jaeger
    restart: on-failure
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
    command: [
      "--es.server-urls=http://elasticsearch:9200",
      "--es.num-shards=1",
      "--es.num-replicas=0",
      "--log-level=error"
    ]
    depends_on:
      - elasticsearch
  jaeger-agent:
    container_name: jaeger-agent
    image: jaegertracing/jaeger-agent
    hostname: jaeger-agent
    command: [ "--reporter.grpc.host-port=jaeger-collector:14250" ]
    ports:
      - "5775:5775/udp"
      - "6831:6831/udp"
      - "6832:6832/udp"
      - "5778:5778"
    networks:
      - elastic-jaeger
    restart: on-failure
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
    depends_on:
      - jaeger-collector
  jaeger-query:
    container_name: jaeger-query
    image: jaegertracing/jaeger-query
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
      - no_proxy=localhost
    ports:
      - "16686:16686"
      - "16687:16687"
    networks:
      - elastic-jaeger
    restart: on-failure
    command: [
      "--es.server-urls=http://elasticsearch:9200",
      "--span-storage.type=elasticsearch",
      "--log-level=debug"
    ]
    depends_on:
      - jaeger-agent
volumes:
  esdata:
    driver: local
networks:
  elastic-jaeger:
    driver: bridge
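A plain depends_on only orders container startup; it does not wait for Elasticsearch to accept connections. A collector that starts too early is one common way to hit a "Failed to init storage factory" error like the one in the linked issue, and the compose file above relies on restart: on-failure to recover from it. The fragment below is a minimal sketch of an alternative that waits for a health check instead. It assumes a Compose implementation that supports the long depends_on form with condition, and that curl is available inside the Elasticsearch image; the interval and retry values are illustrative, not tuned.

```yaml
# Sketch only: merge into the services above rather than using it standalone.
services:
  elasticsearch:
    healthcheck:
      # Assumes curl exists in the image; fails until the HTTP API answers.
      test: [ "CMD-SHELL", "curl -fs http://localhost:9200/_cluster/health || exit 1" ]
      interval: 10s   # illustrative value
      timeout: 5s     # illustrative value
      retries: 12     # illustrative value
  jaeger-collector:
    depends_on:
      elasticsearch:
        # Start the collector only after the health check passes.
        condition: service_healthy
```

The same depends_on form could be applied to jaeger-query, which also opens an Elasticsearch connection at startup.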
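- Service Performance Monitoring (SPM) - Jaeger documentation (jaegertracing.io)
Metrics are now visible.
Jaeger's trace view is not consistent with what Grafana and SigNoz show.
An anomaly appeared even though no related code had been changed; before it, Traefik's metrics could be observed in Prometheus. [Fixed]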
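For reference, the SPM tab in Jaeger reads its metrics from Prometheus through jaeger-query. The fragment below is a hedged sketch of the extra environment variables that step usually needs. The hostname prometheus and port 9090 are assumptions, since no Prometheus service appears in the compose file in these notes, and the last variable only matters when the metrics come from the spanmetrics connector rather than the older processor.

```yaml
# Sketch only: extra settings for the existing jaeger-query service.
services:
  jaeger-query:
    environment:
      - SPAN_STORAGE_TYPE=elasticsearch
      # Tell jaeger-query to read SPM metrics from Prometheus.
      - METRICS_STORAGE_TYPE=prometheus
      # Assumed address of a Prometheus server on the same network.
      - PROMETHEUS_SERVER_URL=http://prometheus:9090
      # Use the metric names produced by the spanmetrics connector.
      - PROMETHEUS_QUERY_SUPPORT_SPANMETRICS_CONNECTOR=true
```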
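OTel Collector pipeline
Understood the end-to-end flow of the OTel Collector pipeline and what each component does
- spanmetrics is a connector
- It can act as a receiver that starts a metrics pipeline, receiving the span metrics produced by the upstream traces pipeline (where it is listed as an exporter)
- It can act as an exporter at the end of the traces pipeline, where it collects metrics from the spans
- When spanmetrics is instead defined as a processor (the older form, since deprecated in favor of the connector), it exports span metrics to Prometheus from within the traces pipeline
The metrics reported by traefik, venus, and profile can now be observed!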
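The connector wiring described above can be sketched as a Collector configuration. This is an illustration under assumptions, not the configuration actually used here: it assumes the contrib distribution of the Collector (which ships the spanmetrics connector and the prometheus exporter), and the exporter name otlp/jaeger, the jaeger-collector:4317 endpoint, and the 8889 scrape port are placeholders.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

connectors:
  # Default settings; the connector derives call and duration metrics from spans.
  spanmetrics:

exporters:
  # Assumed name and endpoint: forwards traces to Jaeger over OTLP/gRPC.
  otlp/jaeger:
    endpoint: jaeger-collector:4317
    tls:
      insecure: true
  # Assumed port: exposes the generated metrics for Prometheus to scrape.
  prometheus:
    endpoint: 0.0.0.0:8889

service:
  pipelines:
    traces:
      receivers: [otlp]
      # spanmetrics sits here as an exporter of the traces pipeline...
      exporters: [otlp/jaeger, spanmetrics]
    metrics/spanmetrics:
      # ...and here as the receiver that starts the metrics pipeline.
      receivers: [spanmetrics]
      exporters: [prometheus]
```

Listing the same component name in both pipelines is what joins them: spans enter through the traces pipeline and leave the connector as metrics.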
To do tomorrow
- Load-test Jaeger
- Test switching Jaeger's storage backend to Elasticsearch (ES)