Apache SkyWalking 监控 Linux 实战

news2025/1/12 10:09:31

SkyWalking 从 8.4 版本开始支持监控主机,用户可以轻松从 dashboard 上检测可能的问题,例如当 CPU 使用过载、内存或磁盘空间不足或者当网络状态不健康时等。
与监控 MySQL Server 类似,SkyWalking 也是利用 Prometheus 和 OpenTelemetry 收集主机的 metrics 数据。
同时 SkyWalking 也提供了使用 InfluxDB Telegraf 通过 Telegraf receiver 接收主机的 metrics 数据,telegraf receiver 插件负责接收、处理和转换 metrics,
然后将转换后的数据发送给 SkyWalking MAL 处理。

方式一:Prometheus + OpenTelemetry,处理流程如下:
infrastructure-monitoring

  • Prometheus Node Exporter 从主机收集 metrics 数据.
  • OpenTelemetry Collector 通过 Prometheus Receiver 从 Node Exporters 抓取 metrics 数据, 然后将 metrics 推送的到 SkyWalking OAP Server.
  • SkyWalking OAP Server 通过 MAL 引擎去分析、计算、聚合和存储,处理规则位于 /config/otel-oc-rules/vm.yaml 文件.
  • 用户可以通过 SkyWalking WebUI dashboard 查看监控数据。

方式二:通过 Telegraf receiver 具体参考官网文档的部署方式 https://skywalking.apache.org/docs/main/next/en/setup/backend/backend-vm-monitoring 。

下面让我们一起开始部署吧

部署 SkyWalking

之前的文章都是默认大家会部署 SkyWalking 的,这里我也提供下平时测试用的部署方式,我这里采用 docker compose 部署,主要是参考官网提供的部署配置,
SkyWalking 提供的部署基础配置在代码目录中,链接地址 https://github.com/apache/skywalking/tree/master/docker 。

docker-compose.yml
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch-oss:${ES_VERSION}
    container_name: elasticsearch
    ports:
      - "9200:9200"
    healthcheck:
      test: [ "CMD-SHELL", "curl --silent --fail localhost:9200/_cluster/health || exit 1" ]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
    environment:
      - discovery.type=single-node
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1

  oap:
    image: ${OAP_IMAGE}
    container_name: oap
    depends_on:
      elasticsearch:
        condition: service_healthy
    links:
      - elasticsearch
    ports:
      - "11800:11800"
      - "12800:12800"
    healthcheck:
      test: [ "CMD-SHELL", "/skywalking/bin/swctl ch" ]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
    environment:
      SW_STORAGE: elasticsearch
      SW_STORAGE_ES_CLUSTER_NODES: elasticsearch:9200
      SW_HEALTH_CHECKER: default
      SW_TELEMETRY: prometheus
      JAVA_OPTS: "-Xms2048m -Xmx2048m"

  ui:
    image: ${UI_IMAGE}
    container_name: ui
    depends_on:
      oap:
        condition: service_healthy
    links:
      - oap
    ports:
      - "8080:8080"
    environment:
      SW_OAP_ADDRESS: http://oap:12800
      SW_ZIPKIN_ADDRESS: http://oap:9412

.env

用于镜像版本,这里使用 SkyWalking 9.7.0 版本,也是当前最新的版本

# The docker-compose.yml file is meant to be used locally for testing only after a local build, if you want to use it
# with officially released Docker images, please modify the environment variables on your command line interface.
# i.e.:
# export OAP_IMAGE=apache/skywalking-oap-server:<tag>
# export UI_IMAGE=apache/skywalking-ui:<tag>
# docker compose up

ES_VERSION=7.4.2
OAP_IMAGE=apache/skywalking-oap-server:9.7.0
UI_IMAGE=apache/skywalking-ui:9.7.0

启动 SkyWalking

docker compose up

启动完成后,访问 http://IP:8080 就可以正常打开 dashboard 页面了,当然这个时候还看不到监控数据。

部署 Prometheus node-exporter

node-exporter 官方文档 ,下载地址 https://prometheus.io/download/#node_exporter
这里下载 1.7.0 版本

wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
cd node_exporter-1.7.0.linux-amd64
./node_exporter

启动成功后访问 http://IP:9100/metrics 可以看到采集到的 metrics 信息。
node-exporter-metrics

部署 OpenTelemetry Collector

OpenTelemetry Collector 官方文档
这里将 otel-collector 和 skywalking 一起通过 docker compose 部署,完整配置如下:

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch-oss:${ES_VERSION}
    container_name: elasticsearch
    ports:
      - "9200:9200"
    healthcheck:
      test: [ "CMD-SHELL", "curl --silent --fail localhost:9200/_cluster/health || exit 1" ]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
    environment:
      - discovery.type=single-node
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1

  oap:
    image: ${OAP_IMAGE}
    container_name: oap
    depends_on:
      elasticsearch:
        condition: service_healthy
    links:
      - elasticsearch
    ports:
      - "11800:11800"
      - "12800:12800"
    healthcheck:
      test: [ "CMD-SHELL", "/skywalking/bin/swctl ch" ]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
    environment:
      SW_STORAGE: elasticsearch
      SW_STORAGE_ES_CLUSTER_NODES: elasticsearch:9200
      SW_HEALTH_CHECKER: default
      SW_TELEMETRY: prometheus
      JAVA_OPTS: "-Xms2048m -Xmx2048m"

  ui:
    image: ${UI_IMAGE}
    container_name: ui
    depends_on:
      oap:
        condition: service_healthy
    links:
      - oap
    ports:
      - "8080:8080"
    environment:
      SW_OAP_ADDRESS: http://oap:12800
      SW_ZIPKIN_ADDRESS: http://oap:9412

  otel-collector:
    image: otel/opentelemetry-collector:0.50.0
    container_name: otel-collector
    command: [ "--config=/etc/otel-collector-config.yaml" ]
    volumes:
      - /opt/docker/skywalking/otel-collector-config.yaml:/etc/otel-collector-config.yaml
    expose:
      - 55678

在 /opt/docker/skywalking 目录(也可以自定义)创建配置文件 otel-collector-config.yaml

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: "vm-monitoring" # make sure to use this in the vm.yaml to filter only VM metrics
          scrape_interval: 10s
          static_configs:
            - targets: ["192.168.56.102:9100"] # OpenTelemetry Collector 的地址,根据自己的服务器IP进行调整

processors:
  batch:

exporters:
  otlp:
    endpoint: "oap:11800" # The OAP Server address
    # The config format of OTEL version prior to 0.34.0, eg. 0.29.0, should be:
    # insecure: true
    tls:
      insecure: true
    #insecure: true
  # Exports data to the console
  logging:
    loglevel: debug

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [otlp, logging]

docker compose 重新启动

docker compose restart

访问 SkyWalking dashboard 页面,正常情况下就可以看到监控到的数据了。linux-os-monitor
linux-os-monitor-metrics

相关链接

  • https://skywalking.apache.org/docs/main/next/en/setup/backend/backend-vm-monitoring/
  • https://skywalking.apache.org/blog/2021-02-07-infrastructure-monitoring/
  • https://prometheus.io/docs/guides/node-exporter/
  • https://opentelemetry.io/docs/collector/

更多精彩内容请关注公众号 geekymv,喜欢请分享给更多的朋友哦」如有问题,欢迎交流。

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/1511571.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

数据结构-链表(一)

一、链表简介 链表&#xff08;Linked List&#xff09;是一种常见的数据结构&#xff0c;用于存储和组织数据。与数组不同&#xff0c;链表的元素&#xff08;节点&#xff09;在内存中不必连续存储&#xff0c;而是通过指针链接在一起。 链表由多个节点组成&#xff0c;每个…

C++day2——引用、结构体、类

思维导图&#xff1a; 2、自己封装一个矩形类(Rect)&#xff0c; 拥有私有属性&#xff1a;宽度(width)、高度(height)&#xff0c; 定义公有成员函数初始化函数:void init(int w, int h) 更改宽度的函数:set_w(int w)更改高度的函数:set_h(int h) 输出该矩形的周长和面积函…

逆向案例七——中国天气质量参数搜不到加密,以及应对禁止打开开发者工具和反debuger技巧

进入相关城市数据页面&#xff0c;发现不能调试 应对方法&#xff0c;再另一个页面&#xff0c;打开开发者工具&#xff0c;选择取消停靠到单独页面 接着&#xff0c;复制链接在该页面打开。接着会遇到debugger 再debugger处打上断点&#xff0c;一律不在此处暂停。 然后点击继…

数据结构中的堆(Java)

文章目录 把普通数组转换大顶堆数组堆增删改查替换堆排序 把普通数组转换大顶堆数组 该方式适用索引为0起点的堆 在堆&#xff08;Heap&#xff09;这种数据结构中&#xff0c;节点被分为两类&#xff1a;叶子节点&#xff08;Leaf Nodes&#xff09;和非叶子节点&#xff08;N…

springboot的Converter和HttpMessageConveter

Converter和HttpMessageConveter是springboot和springmvc在处理请求的时候需要用到的。但是这两者的完全是不一样的&#xff0c;作用的地方也不一样。 1&#xff0c;springboot和springmvc处理请求的流程 先来回顾一下处理请求的流程&#xff1a; 用户向服务器发送请求&#…

云原生应用(2)之使用容器运行Nginx应用及Docker命令

一、使用Docker容器运行Nginx 1.1 使用docker run命令运行Nginx应用 1.1.1 观察下载容器镜像过程 查找本地容器镜像文件&#xff1b; 执行命令过程一&#xff1a;下载容器镜像 # docker run -d nginx:latest Unable to find image nginx:latest locally latest: Pulling from…

软考72-上午题-【面向对象技术2-UML】-UML中的图3

一、状态图 1-1、状态图的定义 状态图&#xff0c;展现了一个状态机&#xff0c;由&#xff1a;状态、转换、事件和活动组成&#xff0c;是系统的动态视图。 活动(动作) 可以在状态内执行也可以在状态转换(迁移) 时执行。 状态图强调&#xff1a;行为的事件顺序。 1-2、状态图…

【ollama】(4):在autodl中安装ollama工具,配置环境变量,修改端口,使用RTX 3080 Ti显卡,测试coder代码生成大模型

1&#xff0c;ollama项目 Ollama 是一个强大的框架&#xff0c;设计用于在 Docker 容器中部署 LLM。Ollama 的主要功能是在 Docker 容器内部署和管理 LLM 的促进者&#xff0c;它使该过程变得非常简单。它帮助用户快速在本地运行大模型&#xff0c;通过简单的安装指令&#xf…

【考研数学】660/880/1000/1800 使用手册

开门见山&#xff0c;直接介绍几个热门的习题册 660&#xff1a;660表面上叫基础通关660&#xff0c;但实际上很多题的难度并不适合基础阶段&#xff0c;建议在强化阶段搭配着 严选题做660&#xff0c;对提升做小题的速度和能力非常有帮助。 880&#xff1a;题量适中&#xf…

20240312-1-Graph(图)

Graph(图) 在面试的过程中,一般不会考到图相关的问题,因为图相关的问题难,而且描述起来很麻烦. 但是也会问道一下常见的问题,比如,最短路径,最小支撑树,拓扑排序都被问到过. 图常用的表示方法有两种: 分别是邻接矩阵和邻接表. 邻接矩阵是不错的一种图存储结构,对于边数相对顶点…

MooC下载pdf转为ppt后去除水印方法

1、从MooC下载的课件&#xff08;一般为pdf文件&#xff09;可能带有水印&#xff0c;如下图所示&#xff1a; 2、将pdf版课件转为ppt后&#xff0c;同样带有水印&#xff0c;如下图所示&#xff1a; 3、传统从pdf中去除水印方法不通用&#xff0c;未找到有效去除课件pdf方法…

c 语言中指针注意事项

看看下面两个 #include<iostream> using namespace std;int main() {int a 10;char p[6];*((int *)p) *(& a); // 正确写法*p *(&a); // 错误写法cout << *(int*)p; } 把原因写在评论区

飞塔防火墙开局百篇——002.FortiGate上网配置——在路由模式下使用虚拟接口对(virtual-wire-pair)

在路由模式下使用虚拟接口对&#xff08;virtual-wire-pair&#xff09; 拓扑配置接口配置策略 使用方有透明模式下一进一出的这样需求的组网&#xff0c;可以在路由模式下使用虚拟接口对&#xff08;virtual-wire-pair&#xff09;替代。 登陆FortiGate防火墙界面&#xff0c;…

01 THU大模型之基础入门

1. NLP Basics Distributed Word Representation词表示 Word representation: a process that transform the symbols to the machine understandable meanings 1.1 How to represent the meaning so that the machine can understand Compute word similarity 计算词相似度 …

中间件 | RabbitMq - [AMQP 模型]

INDEX 1 全局示意2 依赖 1 全局示意 AMQP&#xff0c;即高级消息队列协议&#xff08;Advanced Message Queuing Protocol&#xff09;&#xff0c;整体架构如下图 producer 发送消息给 rabbit mq brokerrabbit mq broker 分发消息给 consumer消费producer/consumer 都通过 …

Python算法题集_搜索旋转排序数组

Python算法题集_搜索旋转排序数组 题33&#xff1a;搜索旋转排序数组1. 示例说明2. 题目解析- 题意分解- 优化思路- 测量工具 3. 代码展开1) 标准求解【二分法区间判断】2) 改进版一【二分找分界标准二分法】3) 改进版二【递归实现二分法】 4. 最优算法5. 相关资源 本文为Pytho…

Android APK体积优化指南:清理项目,打造更小的APK、更快的构建速度和更好的开发体验

Android APK体积优化指南&#xff1a;清理项目&#xff0c;打造更小的APK、更快的构建速度和更好的开发体验 在任何软件项目中&#xff0c;开发是一个持续的过程&#xff0c;随着时间的推移&#xff0c;代码库会变得越来越复杂。这种复杂性可能导致构建时间变慢、APK体积变大&…

DayDreamInGIS 之 ArcGIS Pro二次开发 锐角检查

功能&#xff1a;检查图斑中所有的夹角&#xff0c;如果为锐角&#xff0c;在单独的标记图层中标记。生成的结果放在默认gdb中&#xff0c;以 图层名_锐角检查 的方式命名 大体实现方式&#xff1a;遍历图层中的所有要素&#xff08;多部件要素分别处理&#xff09;&#xff0…

Redis核心数据结构之压缩列表(二)

压缩列表 压缩列表节点的构成 encoding 节点的encoding属性记录了节点的content属性所保存数据的类型及长度: 1.一字节、两字节或者五字节长&#xff0c;值得最高位为00、01或者10的是字节数组编码:这种编码表示节点的content属性保存着字节数组&#xff0c;数组的长度由编…

MachineSink - 优化阅读笔记

注&#xff1a;该优化与全局子表达式消除刚好是相反的过程&#xff0c;具体该不该做这个优化得看代价模型算出来的结果(有采样文件指导算得会更准确) 该优化过程将指令移动到后继基本块中&#xff0c;以便它们不会在不需要其结果的路径上执行。 该优化过程并非旨在替代或完全…