Elasticearch数据流向

news2025/4/13 4:16:28

Elasticearch数据流向

数据流向图

Editor _ Mermaid Chart-2025-04-08-095354

--- config: layout: elk look: classic theme: mc --- flowchart LR subgraph s1["图例"] direction TB W["写入流程"] R["读取流程"] end A["Logstash Pipeline"] -- 写入请求 --> B["Elasticsearch协调节点"] B --> C["索引路由"] C -- 主分片写入 --> D["主分片"] D -- 副本同步 --> E["副本分片"] F["客户端"] -- 搜索请求 --> B B -- 广播查询 --> D & E D -- 返回结果 --> B E -- 返回结果 --> B B -- 聚合排序 --> F W:::writeLine R:::readLine classDef writeLine stroke:#4CAF50,stroke-width:2px,stroke-dasharray:5 classDef readLine stroke:#FF9800,stroke-width:2px linkStyle 0 stroke:#4CAF50,stroke-width:2px,stroke-dasharray:5,fill:none linkStyle 1 stroke:#4CAF50,stroke-width:2px,stroke-dasharray:5,fill:none linkStyle 2 stroke:#4CAF50,stroke-width:2px,stroke-dasharray:5,fill:none linkStyle 3 stroke:#4CAF50,stroke-width:2px,stroke-dasharray:5,fill:none linkStyle 4 stroke:#FF9800,stroke-width:2px,fill:none linkStyle 5 stroke:#FF9800,stroke-width:2px,fill:none linkStyle 6 stroke:#FF9800,stroke-width:2px,fill:none linkStyle 7 stroke:#FF9800,stroke-width:2px,fill:none linkStyle 8 stroke:#FF9800,stroke-width:2px,fill:none linkStyle 9 stroke:#FF9800,stroke-width:2px,fill:none

关键流程说明

1. 数据写入流程（Logstash → Elasticsearch）

步骤 1：Logstash 通过 output { elasticsearch { ... } } 插件发送数据到 Elasticsearch 协调节点。
步骤 2：协调节点根据索引名称和文档路由规则（默认 _id 哈希）将数据路由到对应的 主分片。
步骤 3：主分片写入成功后，自动将数据同步到所有 副本分片（高可用保障）。

2. 数据存储结构

索引：数据逻辑容器（如 logs-2023 表示 2023 年日志）。
分片：
- 主分片（Primary Shard）：负责数据写入和查询。
- 副本分片（Replica Shard）：主分片的副本，提供冗余和并行查询能力。

分片规则：

# 索引设置示例（3主分片+1副本）
PUT /logs-2023
{
  "settings": {
    "number_of_shards": 3,    # 主分片数量
    "number_of_replicas": 1   # 每个主分片的副本数
  }
}

3. 数据查询流程（客户端 → Elasticsearch）

步骤 1：客户端向协调节点发送搜索请求。
步骤 2：协调节点将请求广播到所有相关分片（主分片或副本分片）。
步骤 3：各分片执行本地搜索并返回结果。
步骤 4：协调节点聚合结果，排序后返回客户端。

配置参数关联

流程环节	相关配置	作用
Logstash输出	`output.elasticsearch.hosts`	指定Elasticsearch协调节点地址
索引路由	`document_id`（Logstash插件参数）	自定义文档路由规则（默认随机哈希）
分片同步	`index.number_of_replicas`	控制数据冗余和查询并发能力
搜索性能	`index.refresh_interval`（默认1秒）	控制近实时性（刷新间隔越短，写入可见越快）

场景示例（日志分析场景）

Logstash配置：

output {
  elasticsearch {
    hosts => ["es-node1:9200", "es-node2:9200"]
    index => "logs-%{+YYYY.MM.dd}"  # 按天分索引
    user => "elastic"
    password => "your_password"
  }
}