Contents

Aggregation
1) Case requirements
2) Requirement analysis
3) Implementation steps
Preparation
Create flume1-logger-flume.conf
Create flume2-netcat-flume.conf
Create flume3-flume-logger.conf
Run the configuration files
Aggregation

1) Case requirements:

Flume-1 on hadoop12 monitors the file /opt/module/group.log, and Flume-2 on hadoop13 monitors the data stream on a port. Flume-1 and Flume-2 both send their data to Flume-3 on hadoop14, and Flume-3 prints the final data to the console.
2) Requirement analysis

This is a multi-data-source aggregation case.
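Putting the requirement in one picture, the data flow looks like this (all details are taken from the configurations below):

hadoop12  Flume-1  exec source (tail -F /opt/module/group.log)  --avro-->  hadoop14:4141
hadoop13  Flume-2  netcat source (port 44444)                   --avro-->  hadoop14:4141
hadoop14  Flume-3  avro source (port 4141)  -->  logger sink  -->  console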
3) Implementation steps:

Preparation

Distribute Flume:

[lzl@hadoop12 module]$ xsync flume

xsync is a cluster file-sync script: it distributes a file from one server to the other servers in the cluster. The script contents are as follows:

#!/bin/bash
# 1. Check the number of arguments
if [ $# -lt 1 ]
then
    echo Not Enough Arguments!
    exit;
fi
# 2. Iterate over every machine in the cluster
for host in hadoop12 hadoop13 hadoop14
do
    echo ==================== $host ====================
    # 3. Iterate over all files/directories, sending each in turn
    for file in $@
    do
        # 4. Check whether the file exists
        if [ -e $file ]
        then
            # 5. Get the parent directory
            pdir=$(cd -P $(dirname $file); pwd)
            # 6. Get the file name
            fname=$(basename $file)
            ssh $host "mkdir -p $pdir"
            rsync -av $pdir/$fname $host:$pdir
        else
            echo $file does not exist!
        fi
    done
done
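For xsync to be invoked as a plain command, the script must be executable and on the PATH, and passwordless ssh/rsync must already be configured between the three hosts. A minimal setup sketch (the ~/bin location is my assumption, not part of the original steps):

[lzl@hadoop12 ~]$ mkdir -p ~/bin
[lzl@hadoop12 ~]$ vim ~/bin/xsync        # paste the script above
[lzl@hadoop12 ~]$ chmod +x ~/bin/xsync   # make it executable; ~/bin is typically on PATH for login shells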
Create a group3 folder under the /opt/module/flume/job directory on hadoop12, hadoop13, and hadoop14:

[lzl@hadoop12 job]$ mkdir group3
[lzl@hadoop13 job]$ mkdir group3
[lzl@hadoop14 job]$ mkdir group3
Create flume1-logger-flume.conf

Configure a Source to monitor the /opt/module/group.log file and a Sink to send the data to the next-hop Flume. Edit the configuration file on hadoop12:

[lzl@hadoop12 group3]$ vim flume1-logger-flume.conf

Add the following content:

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/module/group.log
a1.sources.r1.shell = /bin/bash -c

# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop14
a1.sinks.k1.port = 4141

# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
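Before starting the agent, it can be worth confirming that the tail command the exec source will run behaves as expected; this quick check is my addition, not part of the original steps:

[lzl@hadoop12 module]$ touch /opt/module/group.log    # make sure the file exists
[lzl@hadoop12 module]$ tail -F /opt/module/group.log  # leave running in one terminal
[lzl@hadoop12 module]$ echo 'probe' >> group.log      # run in a second terminal; 'probe' should appear in the first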
Create flume2-netcat-flume.conf

Configure a Source to monitor the data stream on port 44444 and a Sink to send the data to the next-hop Flume. Edit the configuration file on hadoop13:

[lzl@hadoop13 group3]$ vim flume2-netcat-flume.conf

Add the following content:

# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1

# Describe/configure the source
a2.sources.r1.type = netcat
a2.sources.r1.bind = hadoop13
a2.sources.r1.port = 44444

# Describe the sink
a2.sinks.k1.type = avro
a2.sinks.k1.hostname = hadoop14
a2.sinks.k1.port = 4141

# Use a channel which buffers events in memory
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1
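Note that binding the netcat source to hadoop13 makes it listen only on the interface that hostname resolves to. If you need the source to accept connections on all interfaces instead, a common alternative (my note, not part of the original tutorial) is:

a2.sources.r1.bind = 0.0.0.0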
Create flume3-flume-logger.conf

Configure a Source to receive the data streams sent by flume1 and flume2, and a Sink to print the merged data to the console. Edit the configuration file on hadoop14:

[lzl@hadoop14 group3]$ touch flume3-flume-logger.conf
[lzl@hadoop14 group3]$ vim flume3-flume-logger.conf

Add the following content:

# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c1

# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = hadoop14
a3.sources.r1.port = 4141

# Describe the sink
a3.sinks.k1.type = logger

# Describe the channel
a3.channels.c1.type = memory
a3.channels.c1.capacity = 1000
a3.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a3.sources.r1.channels = c1
a3.sinks.k1.channel = c1
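One detail about the logger sink that matters when reading the console output: by default it logs only the first 16 bytes of each event body, so longer test lines will look truncated. The limit can be raised with the sink's maxBytesToLog property (my addition; the original configuration leaves the default):

a3.sinks.k1.maxBytesToLog = 256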
Run the configuration files

Start the agents with the corresponding configuration files, in this order: flume3-flume-logger.conf, flume2-netcat-flume.conf, flume1-logger-flume.conf. The downstream agent (a3) must be started first so its avro source is listening before the upstream avro sinks try to connect. The agent name passed with --name must match the prefix used in each configuration file (a1 for flume1, a2 for flume2, a3 for flume3):

[lzl@hadoop14 flume]$ bin/flume-ng agent --conf conf/ --name a3 --conf-file job/group3/flume3-flume-logger.conf -Dflume.root.logger=INFO,console
[lzl@hadoop13 flume]$ bin/flume-ng agent --conf conf/ --name a2 --conf-file job/group3/flume2-netcat-flume.conf
[lzl@hadoop12 flume]$ bin/flume-ng agent --conf conf/ --name a1 --conf-file job/group3/flume1-logger-flume.conf
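As an optional sanity check (my addition, not part of the original steps), you can confirm that a3's avro source is actually listening on port 4141 before starting the upstream agents, and even send it a file directly with Flume's built-in avro client:

[lzl@hadoop14 flume]$ netstat -nltp | grep 4141    # or: ss -nltp | grep 4141
[lzl@hadoop12 flume]$ bin/flume-ng avro-client --conf conf/ --host hadoop14 --port 4141 --filename /opt/module/group.log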
Append content to group.log under the /opt/module directory on hadoop12 (the host where Flume-1 is tailing the file):

[lzl@hadoop12 module]$ echo 'hello' >> group.log
On hadoop12, send data to port 44444:

[lzl@hadoop12 flume]$ telnet hadoop13 44444
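If telnet is not installed, nc (netcat) works just as well for this test (my substitution, not part of the original steps):

[lzl@hadoop12 flume]$ nc hadoop13 44444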
Check the data on hadoop14: every line appended to group.log and every line typed into the telnet session should appear as an event on Flume-3's console (the logger sink prints each event's headers plus its body as hex and text).