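Cloud-Native Logging with Loki

1. Introduction to Loki

1.1 What Is Loki

Loki is an open source project from Grafana Labs: a horizontally scalable, highly available, multi-tenant log aggregation system. It is designed to be cost-effective and easy to operate because it does not index the contents of logs; it indexes only a set of labels for each log stream. The project is inspired by Prometheus, and its official tagline is "Like Prometheus, but for logs".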

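1.2 Key Features of Loki

Horizontal scalability: Loki increases its capacity by adding more nodes, so it can keep up with growing log volume.
High availability: The system is designed to keep running even when some components fail.
Multi-tenancy: A single Loki instance can provide isolated log storage and querying for different tenants.
Cost-effectiveness: Unlike traditional log systems, Loki does not build a full-text index of log content. It organizes data by a set of labels per log stream, which keeps the index small, greatly reduces storage requirements, and keeps label-based queries efficient.
Ease of operation: Loki is designed to be simple to use, so users can get started quickly and manage log data effectively.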

2. Deploying Loki

Hostname	Role	OS	IPv4
master	Collector (Loki server)	CentOS 7.9	192.168.110.21/24
node-01	Collected host	CentOS 7.9	192.168.110.22/24
node-02	Collected host	CentOS 7.9	192.168.110.23/24

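2.1 Main Components of Loki

A Loki-based logging stack manages logs with three main components:

Promtail
- Promtail is Loki's log collector. It runs on each host and collects log data from local files.
- It uses a service discovery mechanism similar to Prometheus to discover log files dynamically.
- Promtail sends logs to the Loki server as streams. Each stream is defined by a set of labels rather than by a full-text index.
Loki
- Loki itself is the log store. It receives log data from Promtail and persists it in a backend: the index in a store such as BoltDB or Cassandra, and the compressed log chunks on the filesystem or in object storage.
- Loki provides a RESTful API for querying and aggregating log data.
- Queries first narrow the search by label and only then scan the matching streams, instead of consulting a full-text index.
Grafana
- Grafana is a popular open source visualization and monitoring platform that integrates tightly with Loki.
- Users can build dashboards in Grafana to monitor and analyze the log data in Loki in real time.
- Grafana offers a wide range of charts and visualization options that make log data easier to interpret.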

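2.2 Main Characteristics of Loki

Compared with other log aggregation systems, Loki has several distinctive characteristics that make it well suited to large volumes of log data, particularly in cloud-native and containerized environments. Its simplicity, efficiency, and low cost set it apart.

No full-text indexing:
- Loki does not build a full-text index of log content. It stores compressed, unstructured log data and indexes only the metadata. This simplifies operation and reduces storage and compute requirements, which lowers cost.
Streams identified by labels:
- Loki uses a label system similar to Prometheus to index and group log streams. This helps it scale and lets users quickly query and aggregate the logs that carry specific labels.
Well suited to Kubernetes Pod logs:
- Loki is particularly suitable for storing and processing Pod logs in Kubernetes. Pods are short-lived, and Loki's design handles such short-lived log sources efficiently.
- Pod metadata such as labels can be attached to the log streams automatically, so users can query the logs of a specific Pod or namespace by those labels.
Horizontal scalability:
- Loki is designed to scale out easily: adding nodes lets it handle more log data as volume grows.
Multi-tenancy:
- Loki supports multi-tenant deployments, so several teams or projects can store and query logs independently on the same Loki instance without interfering with each other.
Integration with Grafana:
- Tight integration with Grafana provides visualization and monitoring, so users can easily create dashboards and alerts.
Cost-effectiveness:
- Because its design reduces storage and compute requirements, Loki is a more cost-effective log aggregation solution, especially for large volumes of log data.

A brief comparison of log collection stacks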

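Name	Components	Advantages
ELK/EFK	Elasticsearch, Logstash, Kibana, Filebeat, Kafka/Redis	Custom grok patterns can parse complex log content; dashboards support rich visualizations
Loki	Grafana, Loki, Promtail	Low resource usage; natively supported by Grafana; fast queries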

Related resources
- Official site: Grafana Loki OSS | Log aggregation system
- Documentation: Configure the Loki data source | Grafana documentation
- Git repository: https://github.com/grafana/loki/blob/master/docs/README.md
- Downloads: https://github.com/grafana/loki/releases/

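2.3 Installing Loki

Grafana Loki is a log aggregation tool and the core of a fully featured logging stack.
Loki is a datastore optimized for holding log data efficiently. What sets it apart from other logging systems is how it indexes:
the index is built from labels, and the original log messages are not indexed.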

 [root@master ~]# wget -c https://github.com/grafana/loki/releases/download/v2.9.8/loki-2.9.8.x86_64.rpm
 [root@master ~]# yum install loki-2.9.8.x86_64.rpm -y
 [root@master ~]# sed -i 's/127.0.0.1/192.168.110.21/' /etc/loki/config.yml  # Change the IP
 [root@master ~]# systemctl enable loki --now
 [root@master ~]# systemctl is-active loki.service 
 active
 [root@master ~]# ss -lnupt | egrep '3100|9096'
 tcp    LISTEN     0      128    [::]:9096               [::]:*                   users:(("loki",pid=11871,fd=9))
 tcp    LISTEN     0      128    [::]:3100               [::]:*                   users:(("loki",pid=11871,fd=8))
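
With the service active, a quick smoke test can confirm that Loki is ready and accepts writes. The sketch below is a minimal example and not part of the original setup: the function names build_push_payload and push_line and the smoke-test job label are illustrative assumptions, while /ready and /loki/api/v1/push are Loki HTTP endpoints. It assumes curl and GNU date (for nanosecond timestamps) are installed, and it does no JSON escaping, so the test message should not contain quotes or backslashes.

```shell
#!/bin/sh
# Address of the Loki server deployed above; override via the environment if needed.
LOKI_URL="${LOKI_URL:-http://192.168.110.21:3100}"

# Print a push-API payload holding one log line in one stream.
# $1 = value of the job label, $2 = log line, $3 = timestamp in nanoseconds.
# NOTE: no JSON escaping is done; this helper is only meant for simple test strings.
build_push_payload() {
  printf '{"streams":[{"stream":{"job":"%s"},"values":[["%s","%s"]]}]}\n' "$1" "$3" "$2"
}

# Send one log line to Loki (hypothetical helper; requires curl and GNU date).
push_line() {
  build_push_payload "$1" "$2" "$(date +%s%N)" |
    curl -sS -X POST -H 'Content-Type: application/json' \
      --data-binary @- "$LOKI_URL/loki/api/v1/push"
}

# Usage (run on a host that can reach Loki):
#   curl -sS "$LOKI_URL/ready"          # readiness check
#   push_line smoke-test "hello loki"   # write one test line
```

The payload shows the data model described earlier: a stream is identified only by its label set, and each entry is a pair of timestamp and raw line.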

Files installed by the package

 [root@master ~]# rpm -ql loki
 /etc/loki/config.yml               # Configuration file
 /etc/systemd/system/loki.service   # systemd service unit
 /usr/bin/loki                      # Binary
 
 [root@master ~]# cat /etc/loki/config.yml 
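 auth_enabled: false     # Whether authentication (multi-tenancy via the X-Scope-OrgID header) is enabled
 
 server:
   http_listen_port: 3100  # HTTP listen port
   grpc_listen_port: 9096  # gRPC listen port
 
 common:
   instance_addr: 127.0.0.1  # Change to your own IP or localhost
   path_prefix: /tmp/loki
   storage:
     filesystem:
       chunks_directory: /tmp/loki/chunks # Chunk storage directory; a chunk is flushed here once it is full or has timed out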
       rules_directory: /tmp/loki/rules
   replication_factor: 1
   ring:
     kvstore:
       store: inmemory
 
 query_range:            # Query settings
   results_cache:
     cache:
       embedded_cache:
         enabled: true
         max_size_mb: 100
 
 schema_config:
   configs:
     - from: 2020-10-24
       store: boltdb-shipper
       object_store: filesystem
       schema: v11
       index:
         prefix: index_
         period: 24h
 
 ruler:
   alertmanager_url: http://localhost:9093   # Alertmanager URL for alert notifications; change to your own address
   
 # By default, Loki will send anonymous, but uniquely-identifiable usage and configuration
 # analytics to Grafana Labs. These statistics are sent to https://stats.grafana.org/
 #
 # Statistics help us better understand how Loki is used, and they show us performance
 # levels for most users. This helps us prioritize features and documentation.
 # For more information on what's sent, look at
 # https://github.com/grafana/loki/blob/main/pkg/usagestats/stats.go
 # Refer to the buildReport method to see what goes into a report.
 #
 # If you would like to disable reporting, uncomment the following lines:
 #analytics:
 #  reporting_enabled: false

2.4 Installing Promtail

 [root@master ~]# wget -c https://github.com/grafana/loki/releases/download/v2.9.8/promtail-2.9.8.x86_64.rpm
 [root@master ~]# yum install promtail-2.9.8.x86_64.rpm -y
 [root@master ~]# setfacl -m u:promtail:r /var/log/messages  # The promtail user has no read permission on this file by default
 [root@master ~]# systemctl enable promtail --now
 [root@master ~]# systemctl is-active promtail
 active

Test
- Open http://192.168.110.21:9080/targets

 [root@master ~]# cat /tmp/positions.yaml 
 positions:
   /var/log/messages: "1008852"

Configuration file walkthrough

 [root@master ~]# cat /etc/promtail/config.yml
 # This minimal config scrape only single log file.
 # Primarily used in rpm/deb packaging where promtail service can be started during system init process.
 # And too much scraping during init process can overload the complete system.
 # https://github.com/grafana/loki/issues/11398
 
 server:
   http_listen_port: 9080
   grpc_listen_port: 0
 
 positions:
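   filename: /tmp/positions.yaml # Records the read offset of each log file so Promtail can resume from it after a restart
 
 clients:
 - url: http://localhost:3100/loki/api/v1/push  # Loki API endpoint that log streams are pushed to
 
 scrape_configs:     # Where to find log files and which labels to attach
 - job_name: system  # Job name
   static_configs:   # Static target configuration
   - targets:        # Targets; localhost for files on this machine
       - localhost
     labels:
       job: varlogs  # Value of the job label, usually named after the project
       #NOTE: Need to be modified to scrape any additional logs of the system.
       __path__: /var/log/messages  # Collect the local system log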

2.5 Installing the Command-Line Tool

 [root@master ~]# wget -c https://github.com/grafana/loki/releases/download/v2.9.8/logcli-2.9.8.x86_64.rpm
 [root@master ~]# yum install logcli-2.9.8.x86_64.rpm -y
 [root@master ~]# logcli --help
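 --help: Show help.
 --version: Show the application version.
 --quiet: Suppress query metadata.
 --stats: Show query statistics.
 --output: Output mode: default, raw, or jsonl. raw suppresses log labels and timestamps.
 --timezone: Timezone used when formatting output timestamps: Local (the default) or UTC.
 --cpuprofile and --memprofile: Where to write CPU and memory profiles.
 --stdin: Read log input from standard input.
 --addr: Loki server address; defaults to http://localhost:3100.
 --username and --password: Username and password for HTTP basic auth.
 --ca-cert: Path to the server CA certificate.
 --tls-skip-verify: Skip TLS server certificate verification.
 --cert and --key: Paths to the client certificate and key.
 --org-id: Tenant ID added to API requests.
 --query-tags: Add an HTTP header to requests for tracking query statistics.
 --bearer-token and --bearer-token-file: Authentication token used for API requests.
 --retries: Number of retries when a query fails.
 --min-backoff and --max-backoff: Minimum and maximum backoff between retries.
 --auth-header: Authentication header used for API requests.
 --proxy-url: URL of the HTTP or HTTPS proxy.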
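
The flags above are easier to place with a concrete invocation. The following is a minimal sketch: stream_selector and tail_job are hypothetical helper names introduced here, while logcli labels, logcli query, --since, --limit, and the LOKI_ADDR environment variable belong to logcli itself. The varlogs job label is the one set in the Promtail configuration earlier.

```shell
#!/bin/sh
# logcli reads the server address from LOKI_ADDR when --addr is not given.
export LOKI_ADDR="${LOKI_ADDR:-http://192.168.110.21:3100}"

# Build a stream selector such as {job="varlogs"} from a label name and value.
stream_selector() {
  printf '{%s="%s"}\n' "$1" "$2"
}

# Print recent lines of one job (hypothetical wrapper around logcli).
# $1 = job label value, $2 = maximum number of lines (default 20).
tail_job() {
  logcli query --since=1h --limit="${2:-20}" "$(stream_selector job "$1")"
}

# Usage:
#   logcli labels          # list the label names known to Loki
#   logcli labels job      # list the values of the job label
#   tail_job varlogs 10    # up to 10 lines collected from /var/log/messages in the last hour
```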

2.6 Installing Grafana

 [root@master ~]# wget -c https://dl.grafana.com/enterprise/release/grafana-enterprise-10.0.2-1.x86_64.rpm
 [root@master ~]# yum install grafana-enterprise-10.0.2-1.x86_64.rpm -y
 [root@master ~]# systemctl enable --now grafana-server

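Open http://192.168.110.21:3000
- The default username and password are both admin

Add the Loki data source

Data sources -> Add new data source -> Loki, then set the URL to the Loki address (here http://192.168.110.21:3100)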

2.7 Adding a Label

 [root@master ~]# vim /etc/promtail/config.yml
 ...
 scrape_configs:
 - job_name: system
   static_configs:
   - targets:
       - localhost
     labels:
       job: varlogs
       #NOTE: Need to be modified to scrape any additional logs of the system.
       __path__: /var/log/messages
   - targets:              # Add a second target
       - localhost
     labels:
       job: varlogs-02
       __path__: /var/log/secure
 
 [root@master ~]# setfacl -m u:promtail:r /var/log/secure 
 [root@master ~]# systemctl restart promtail.service
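
One way to check that the new target is being shipped is to ask Loki which values it knows for the job label. This is a sketch under stated assumptions: label_values_url and label_values are hypothetical helper names, and /loki/api/v1/label/<name>/values is Loki's label-values endpoint. The new value only shows up after Promtail has pushed at least one line from /var/log/secure.

```shell
#!/bin/sh
LOKI_URL="${LOKI_URL:-http://192.168.110.21:3100}"

# Print the URL of Loki's label-values endpoint for one label name.
label_values_url() {
  printf '%s/loki/api/v1/label/%s/values\n' "$LOKI_URL" "$1"
}

# Fetch the known values of a label as JSON (hypothetical helper; requires curl).
label_values() {
  curl -sS "$(label_values_url "$1")"
}

# Usage: once new lines have been shipped, the response should list
# both varlogs and varlogs-02.
#   label_values job
```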

Add monitoring metrics

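3. LogQL Syntax

When querying log data in Loki, label selectors and filter expressions are used together to locate and retrieve exactly the log streams you need.

3.1 Label Selectors

A label selector specifies a set of labels and retrieves the log streams in Loki that match them.
The selector is written inside {}, with multiple label expressions separated by commas.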

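3.1.1 Supported Operators

=: the label value must be exactly equal.
!=: the label value must not be equal.
=~: the label value must match the regular expression.
!~: the label value must not match the regular expression.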

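3.1.2 Filter Expressions

After the log streams have been selected, a search expression can filter the results further.
A search expression can be plain text or a regular expression.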

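3.1.3 Examples

{job="mysql"} |= "error": selects log streams whose job label is mysql and keeps lines that contain the string "error".
{name="kafka"} |~ "tsdb-ops.*io:2003": selects log streams whose name label is kafka and keeps lines that match the regular expression "tsdb-ops.*io:2003".
{instance=~"kafka-[23]",name="kafka"} != "kafka.server:type=ReplicaManager": selects log streams whose instance label matches the regular expression "kafka-[23]" and whose name label is kafka, and drops lines that contain kafka.server:type=ReplicaManager.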

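3.1.4 Supported Filter Operators

|=: the log line contains the given string.
!=: the log line does not contain the given string.
|~: the log line matches the given regular expression.
!~: the log line does not match the given regular expression.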

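3.1.5 Chaining Multiple Filters

Several filter conditions can be used in the same query, for example:
- {job="mysql"} |= "error" != "timeout": selects log streams whose job label is mysql and keeps lines that contain "error" but not "timeout".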
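
The selector and filter syntax can also be exercised outside Grafana through Loki's HTTP API. The sketch below is illustrative only: logql_include_exclude and query_range are hypothetical helper names, and /loki/api/v1/query_range with its query and limit parameters is the Loki endpoint being called. Since no start or end is passed, the API's default time range applies.

```shell
#!/bin/sh
LOKI_URL="${LOKI_URL:-http://192.168.110.21:3100}"

# Compose a LogQL query: a stream selector followed by two line filters.
# $1 = selector, $2 = string that must appear, $3 = string that must not appear.
logql_include_exclude() {
  printf '%s |= "%s" != "%s"\n' "$1" "$2" "$3"
}

# Run a query and print the JSON response (hypothetical helper; requires curl).
# curl -G with --data-urlencode takes care of URL-encoding the LogQL text.
# $1 = LogQL query, $2 = maximum number of entries (default 100).
query_range() {
  curl -sS -G "$LOKI_URL/loki/api/v1/query_range" \
    --data-urlencode "query=$1" \
    --data-urlencode "limit=${2:-100}"
}

# Usage:
#   query_range "$(logql_include_exclude '{job="mysql"}' error timeout)" 50
```

Filters are applied left to right, so putting the most selective filter first usually leaves less work for the later ones.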

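4. Collecting Nginx Logs with Loki

To make the logs easier to display in Grafana dashboards, we change the log format to JSON.

4.1 Modifying the Nginx Configuration

# Run on the monitored host
[root@node-01 ~]# vim /etc/nginx/nginx.conf   # Add inside the http block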
http {
...
log_format json escape=json '{'
          '"remote_addr": "$remote_addr", '
          '"request_uri": "$request_uri", '
          '"request_length": "$request_length", '
          '"request_time": "$request_time", '
          '"request_method": "$request_method", '
          '"status": "$status", '
          '"body_bytes_sent": "$body_bytes_sent", '
          '"http_referer": "$http_referer", '
          '"http_user_agent": "$http_user_agent", '
          '"http_x_forwarded_for": "$http_x_forwarded_for", '
          '"http_host": "$http_host", '
          '"server_name": "$server_name", '
          '"upstream": "$upstream_addr", '
          '"upstream_response_time": "$upstream_response_time", '
          '"upstream_status": "$upstream_status"'
          # '"geoip_country_code": "$geoip2_data_country_code", '
          # '"geoip_country_name": "$geoip2_data_country_name", '
          # '"geoip_city_name": "$geoip2_data_city_name"'
          '}';
access_log /var/log/nginx/json_access.log json;
}

[root@node-01 ~]# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
[root@node-01 ~]# systemctl start nginx

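Access Nginx

The default access.log still records requests in the combined format shown here; the JSON entries are written to /var/log/nginx/json_access.log.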

[root@node-01 ~]# tail /var/log/nginx/access.log 
192.168.110.23 - - [15/Jun/2024:19:45:37 +0800] "GET / HTTP/1.1" 200 615 "-" "curl/7.29.0" "-"
192.168.110.23 - - [15/Jun/2024:19:45:38 +0800] "GET / HTTP/1.1" 200 615 "-" "curl/7.29.0" "-"
192.168.110.23 - - [15/Jun/2024:19:45:38 +0800] "GET / HTTP/1.1" 200 615 "-" "curl/7.29.0" "-"
192.168.110.21 - - [15/Jun/2024:19:45:42 +0800] "GET / HTTP/1.1" 200 615 "-" "curl/7.29.0" "-"
192.168.110.21 - - [15/Jun/2024:19:45:42 +0800] "GET / HTTP/1.1" 200 615 "-" "curl/7.29.0" "-"
192.168.110.21 - - [15/Jun/2024:19:45:44 +0800] "GET / HTTP/1.1" 200 615 "-" "curl/7.29.0" "-"
192.168.110.21 - - [15/Jun/2024:19:45:44 +0800] "GET / HTTP/1.1" 200 615 "-" "curl/7.29.0" "-"
192.168.110.21 - - [15/Jun/2024:19:45:44 +0800] "GET / HTTP/1.1" 200 615 "-" "curl/7.29.0" "-"
192.168.110.21 - - [15/Jun/2024:19:45:45 +0800] "GET / HTTP/1.1" 200 615 "-" "curl/7.29.0" "-"
192.168.110.21 - - [15/Jun/2024:19:45:45 +0800] "GET / HTTP/1.1" 200 615 "-" "curl/7.29.0" "-"

4.2 Installing Promtail

4.2.1 Installation

[root@node-01 ~]# yum install promtail-2.9.8.x86_64.rpm -y

4.2.2 Modifying the targets

[root@node-01 ~]# vim /etc/promtail/config.yml 
# This minimal config scrape only single log file.
# Primarily used in rpm/deb packaging where promtail service can be started during system init process.
# And too much scraping during init process can overload the complete system.
# https://github.com/grafana/loki/issues/11398

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
- url: http://192.168.110.21:3100/loki/api/v1/push    # Address of the Loki server

scrape_configs:
- job_name: system
  static_configs:
  - targets:
      - localhost
    labels:
      job: nginxlogs     # Changed label
      host: 192.168.110.22
      #NOTE: Need to be modified to scrape any additional logs of the system.
      __path__: /var/log/nginx/*.log     # Logs to collect

4.2.3 Granting Permissions and Starting

[root@node-01 ~]# setfacl -m u:promtail:r /var/log/nginx/*.log
[root@node-01 ~]# systemctl restart promtail

4.3 Querying the Data in Grafana
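
Because the Nginx log lines are JSON, LogQL's json parser can turn their fields into labels at query time. The sketch below assumes that each line in /var/log/nginx/json_access.log is a valid JSON object and that Promtail attaches its usual filename label; nginx_status_query and nginx_by_status are hypothetical helper names, and the same query strings can be pasted into Grafana's Explore view.

```shell
#!/bin/sh
export LOKI_ADDR="${LOKI_ADDR:-http://192.168.110.21:3100}"

# Build a LogQL query that parses each line as JSON and keeps one status code.
# The filename matcher skips the non-JSON access.log and error.log in the same job.
# $1 = HTTP status to match, for example 404.
nginx_status_query() {
  printf '{job="nginxlogs", filename="/var/log/nginx/json_access.log"} | json | status="%s"\n' "$1"
}

# Show recent matching requests (hypothetical wrapper; requires logcli).
nginx_by_status() {
  logcli query --since=1h --limit=20 "$(nginx_status_query "$1")"
}

# Usage:
#   nginx_by_status 404
# A metric-style query that counts requests per status in 5-minute windows:
#   logcli query 'sum by (status) (count_over_time({job="nginxlogs", filename="/var/log/nginx/json_access.log"} | json [5m]))'
```

The status value is compared as a string because the log format above writes every field in quotes.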

4.4 Adding a Dashboard

Dashboard ID: 16101