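Teacher Yu's Full-Stack Big Data Frameworks Course, Chapter 12 (Flume), Section 12: Enterprise Development Case: Aggregation

Aggregation

Enterprise Development Case 3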

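Objective:
Flume-1 on app-11 monitors the file /hadoop/test/group.log, and Flume-2 on app-12 monitors the data stream on a port. Flume-1 and Flume-2 both send their data to Flume-3 on app-13, and Flume-3 prints the combined data to the console.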
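Analysis:
Flume-1 (agent a1) reads group.log through an exec source running tail -F and forwards events through an avro sink. Flume-2 (agent a2) reads port 44444 through a netcat source and also forwards events through an avro sink. Both avro sinks point at app-13:4141, where Flume-3 (agent a3) receives the events through an avro source and writes them out with a logger sink. Each agent buffers events in a memory channel.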

Steps:
I. Preparation
1. Create FlumeSqoopC1, C2, and C3.

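Log in to C1.
(1) Authenticate the three machines
Switch to the root user.
Command: sudo /bin/bash
Change to the /hadoop directory and list its contents.
Command: cd /hadoop/
ls
Run the initHosts.sh script to authenticate the three machines, and make sure all three are in the running state.
Command: ./initHosts.sh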

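(2) Start the cluster
1. Switch to the hadoop user (password: Yhf_1018).
Command: su - hadoop
2. Change to the /hadoop directory.
Command: cd /hadoop/
3. Run startAll.sh.
Command: ./startAll.sh
This script contains all the startup commands for the three machines.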
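(3) Prepare the test file and the job directories
1. Create the test directory, change into it, and create the group.log file.
Command: mkdir test
cd test
touch group.log
2. Change to the Flume/apache-flume-1.9.0-bin/ directory and create a job directory.
Command: cd ..
cd Flume/apache-flume-1.9.0-bin/
mkdir job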

3. On app-12 and app-13, create a job directory under Flume/apache-flume-1.9.0-bin/ as well.
Command: ssh hadoop@app-12 "cd /hadoop/Flume/apache-flume-1.9.0-bin/ && mkdir job"
ssh hadoop@app-13 "cd /hadoop/Flume/apache-flume-1.9.0-bin/ && mkdir job"
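The ssh commands in step 3 fail if they are run a second time, because mkdir reports an error when the job directory already exists. A minimal sketch of a rerunnable variant follows. The names make_job_dirs and SSH_CMD are hypothetical, introduced here only for illustration, and are not part of the lab environment.

```shell
# Hypothetical helper (not part of the lab scripts): create the Flume job
# directory on every host given as an argument.
# SSH_CMD is an assumed override hook; set SSH_CMD=echo to preview the commands.
make_job_dirs() (
  job_dir="/hadoop/Flume/apache-flume-1.9.0-bin/job"
  for host in "$@"; do
    # mkdir -p succeeds even when the directory already exists.
    ${SSH_CMD:-ssh} "hadoop@${host}" "mkdir -p ${job_dir}" || exit 1
  done
)
```

For this lab it would be called as: make_job_dirs app-12 app-13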

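II. Running the experiment
app-11
1. Change to the job directory.
Command: cd job
2. Create the f1.conf configuration file.
Configure a source that monitors the group.log file and a sink that forwards the data to the next-tier Flume agent.
Command: vi f1.conf
Press a or i to enter insert mode, then add the following content.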

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /hadoop/test/group.log
a1.sources.r1.shell = /bin/bash -c
# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = app-13
a1.sinks.k1.port = 4141
# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

3. Install the nc tool.
Command: sudo yum install -y nc
Wait until the installation completes successfully.
app-12
4. Log in to app-12 (passwordless SSH).
Command: ssh app-12

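Change to the /hadoop/Flume/apache-flume-1.9.0-bin/job directory and create the f2.conf configuration file.
Command: cd /hadoop/Flume/apache-flume-1.9.0-bin/job
vi f2.conf

Configure a source that listens for the data stream on port 44444 and a sink that forwards the data to the next-tier Flume agent.
Press a or i to enter insert mode, then add the following content.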

# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1
# Describe/configure the source
a2.sources.r1.type = netcat
a2.sources.r1.bind = app-12
a2.sources.r1.port = 44444
# Describe the sink
a2.sinks.k1.type = avro
a2.sinks.k1.hostname = app-13
a2.sinks.k1.port = 4141
# Use a channel which buffers events in memory
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1

Press Esc to leave insert mode, then type :wq to save and exit.
5. Install the nc tool.
Command: sudo yum install -y nc

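app-13
6. Log in to app-13 (passwordless SSH).
Command: ssh app-13
Change to the /hadoop/Flume/apache-flume-1.9.0-bin/job directory and create the f3.conf configuration file.
Configure a source that receives the data streams sent by Flume-1 and Flume-2, and a sink that writes the merged stream to the console.
Command: cd /hadoop/Flume/apache-flume-1.9.0-bin/job
vi f3.conf

Press a or i to enter insert mode, then add the following content.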

# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c1
# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = app-13
a3.sources.r1.port = 4141
# Describe the sink
a3.sinks.k1.type = logger
# Describe the channel
a3.channels.c1.type = memory
a3.channels.c1.capacity = 1000
a3.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a3.sources.r1.channels = c1
a3.sinks.k1.channel = c1

Press Esc to leave insert mode, then type :wq to save and exit.
7. Install the nc tool.
Command: sudo yum install -y nc
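A common mistake in files like f1.conf, f2.conf, and f3.conf is binding a source or sink to a channel name that the agent never declares. The sketch below is one possible way to catch that before an agent is started. check_channels is a hypothetical helper written for this tutorial, not a Flume command, and it only inspects the channels, sources.*.channels, and sinks.*.channel keys.

```shell
# check_channels AGENT FILE
# Hypothetical helper, not part of Flume: exits non-zero if a source or sink
# of AGENT is bound to a channel missing from the "AGENT.channels" line.
check_channels() (
  agent="$1"
  file="$2"
  status=0
  # Channel names declared for the agent, padded with spaces for matching.
  declared=" $(awk -F= -v key="${agent}.channels" '
    { k = $1; gsub(/[ \t]/, "", k) }
    k == key { print $2 }' "$file" | tr -s ' \t' ' ') "
  # Channel names referenced by sources (.channels) and sinks (.channel).
  refs=$(awk -F= -v a="$agent" '
    { k = $1; gsub(/[ \t]/, "", k) }
    k ~ ("^" a "\\.sources\\.[^.]+\\.channels$") { print $2 }
    k ~ ("^" a "\\.sinks\\.[^.]+\\.channel$") { print $2 }' "$file")
  for ref in $refs; do
    case "$declared" in
      *" $ref "*) ;;
      *) echo "undeclared channel: $ref"; status=1 ;;
    esac
  done
  exit "$status"
)
```

On app-13, for example, it could be run from the job directory as: check_channels a3 f3.conf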

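8. Start the agents.
On app-11, app-12, and app-13, from the /hadoop/Flume/apache-flume-1.9.0-bin/ directory, start the agent with the matching configuration file: f1.conf, f2.conf, and f3.conf respectively. Starting Flume-3 on app-13 first means its avro source is already listening when the avro sinks of the other two agents connect.

app-13:
Command: flume-ng agent --name a3 --conf-file job/f3.conf -Dflume.root.logger=INFO,console
app-12:
Command: flume-ng agent --name a2 --conf-file job/f2.conf
app-11:
Command: flume-ng agent --name a1 --conf-file job/f1.conf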
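Because the avro sinks in f1.conf and f2.conf connect to app-13 on port 4141, it can help to confirm that Flume-3 is listening before the other two agents are started. The sketch below polls the port. wait_for_port is a hypothetical helper, and it assumes it is run by bash, since it relies on bash's /dev/tcp redirection; an unreachable host may also make each attempt take longer than one second.

```shell
# wait_for_port HOST PORT [TRIES]
# Hypothetical helper: retry once per second until a TCP connection to
# HOST:PORT succeeds, giving up after TRIES attempts (default 30).
# Assumes bash, whose /dev/tcp/HOST/PORT redirection opens a TCP socket.
wait_for_port() (
  host="$1"
  port="$2"
  tries="${3:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # The inner subshell opens the connection and closes it again on exit.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      exit 0
    fi
    i=$((i + 1))
    sleep 1
  done
  exit 1
)
```

On app-11, for example, it could be combined with the start command as: wait_for_port app-13 4141 && flume-ng agent --name a1 --conf-file job/f1.conf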
9. Open a new terminal and log in to app-12.
Command: ssh app-12
On app-12, send data to port 44444.
Command: nc app-12 44444
Type bcd and press Enter.

10. On app-11, append content to group.log in the /hadoop/test directory.
Command: cd /hadoop/test
echo abc >> group.log
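To watch more than one event arrive at Flume-3, several lines can be appended in a row. The sketch below does this with a small loop. append_test_lines and the text of the generated lines are hypothetical choices made for illustration.

```shell
# append_test_lines FILE COUNT
# Hypothetical helper: append COUNT numbered lines to FILE so that tail -F,
# and therefore the exec source in f1.conf, picks them up as new data.
append_test_lines() (
  file="$1"
  count="$2"
  i=1
  while [ "$i" -le "$count" ]; do
    echo "group-log-test-$i" >> "$file"
    i=$((i + 1))
  done
)
```

On app-11 it could be called as: append_test_lines /hadoop/test/group.log 3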

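11. Check the data on app-13.
The console of the Flume-3 agent should show the events received from both upstream agents, with the bodies abc and bcd.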