Getting Started with Apache Storm

 

1. Understanding Storm

1.1 What is Storm?

Question: we already have Hadoop, so why do we still need Storm?

Official site: http://storm.apache.org/

Source code: https://github.com/apache/storm

  • Storm is a free, open-source distributed realtime computation system that makes it easy to process unbounded streams of data.

  • Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: over a million messages can be processed per second per node. It is scalable and fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.

  • Storm is only responsible for computing over data; it does not store data.

  • Around 2013 Alibaba used Java to build JStorm, a similar stream-computing framework based on the Storm model. At the end of 2016 Alibaba contributed the source code to Apache Storm, the two projects began to merge into what became Storm 2.x, and the Alibaba team turned its focus to Flink.


1.2 The architecture of stream computing

 

2. Storm Architecture

2.1 Core concepts of Storm

  • Topology

    • A topology is a graph of computation. Each node of a topology contains processing logic, and the links between nodes indicate how data should be passed between them. Running a topology is straightforward.

  • Stream

    • Streams are the core abstraction in Storm. A stream is an unbounded sequence of Tuples; a Tuple can hold integers, longs, shorts, bytes, strings, doubles, floats, booleans, and byte arrays. You can also define custom serializers so that your own types can be used natively within Tuples.

  • Spout

    • A Spout is a source of streams in a topology. Generally a Spout reads Tuples from an external source (such as a Kestrel queue or the Twitter API) and emits them into the topology. Spouts can be reliable or unreliable, and a Spout can emit more than one stream (a reliable-emit sketch follows the Spout code in section 3.3).

  • Bolt

    • All processing in a topology is done in Bolts. Bolts can filter data, run business logic, perform joins and aggregations, talk to databases, and so on. A Bolt can do simple stream transformations and can emit more than one stream; its main method is execute. A Bolt is also free to start new threads for asynchronous processing.

  • Stream grouping

    • A stream grouping defines how a stream should be partitioned among a Bolt's tasks.

  • Task

    • Each Spout or Bolt executes as many tasks across the cluster. Each task corresponds to one thread of execution, and stream groupings define how to send Tuples from one set of tasks to another.

  • Worker (worker process)

    • A topology executes across one or more worker processes. Each worker process is a physical JVM that runs a subset of all the tasks of the topology.

2.2 The programming model of a Storm application

What we need to know:

  • A Spout is the source of data;

  • A Bolt executes the concrete business logic;

  • The flow of data between components can be composed in any way;

  • A Topology is made up of a number of Spouts and Bolts (see the wiring sketch below).
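
The composition is expressed through TopologyBuilder: every setBolt call returns a declarer on which you can subscribe to one or more upstream components. Below is a minimal wiring sketch, not part of the original tutorial, that uses the spout and bolts written later in section 3 and only builds the topology object without submitting it, just to show that the wiring is free-form (here two spout instances fan in to one bolt):

package cn.itcast.storm;

import org.apache.storm.generated.StormTopology;
import org.apache.storm.topology.TopologyBuilder;

/**
 * Wiring sketch only: shows that spouts and bolts can be composed freely.
 */
public class WiringSketch {

    public static StormTopology build() {
        TopologyBuilder builder = new TopologyBuilder();
        // fan-in: one bolt subscribes to the streams of two spout instances
        builder.setSpout("spoutA", new RandomSentenceSpout());
        builder.setSpout("spoutB", new RandomSentenceSpout());
        builder.setBolt("split", new SplitSentenceBolt())
                .shuffleGrouping("spoutA")
                .shuffleGrouping("spoutB");
        // the rest of the chain: split -> count -> print
        builder.setBolt("count", new WordCountBolt()).shuffleGrouping("split");
        builder.setBolt("print", new PrintBolt()).shuffleGrouping("count");
        return builder.createTopology();
    }
}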

2.3 Cluster architecture

  • Nimbus: responsible for resource allocation and task scheduling.

  • Supervisor: accepts the tasks assigned by Nimbus, and starts and stops the worker processes it manages.

  • Worker: the process that runs the concrete processing-component logic.

  • Task: each spout/bolt thread inside a worker is called a task. Since Storm 0.8, a task no longer corresponds to a physical thread; tasks of the same spout/bolt may share a physical thread, which is called an executor.

How the architecture works:

  1. In the cluster architecture, the user submits a task to Storm, which hands it over to Nimbus.

  2. Nimbus looks up the state of the supervisors through ZooKeeper and picks supervisors to execute the task.

  3. The chosen supervisor starts a worker process, and threads inside the worker process execute the concrete business logic.

2.4 Development environment vs. production environment

When developing a Storm application you deal with two environments: the development environment, and the production environment, which is the cluster environment.

  • The development environment needs no cluster. Storm ships with local-mode simulation, so developers can run a Storm application on their own machine very easily, without installing anything.

  • The cluster environment must be deployed on Linux machines; the finished jar is then deployed to the cluster to run, similar to running a MapReduce program on Hadoop.

3. Storm Quick Start

3.1 Requirements analysis

Topology design:

Notes:

  • RandomSentenceSpout: randomly generates an English sentence, simulating user input;

  • SplitSentenceBolt: splits each received sentence on spaces;

  • WordCountBolt: counts how many times each word received from upstream has occurred;

  • PrintBolt: prints the data it receives;

3.2 Create the project and import dependencies

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>itcast-bigdata</artifactId>
        <groupId>cn.itcast.bigdata</groupId>
        <version>1.0.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>
​
    <artifactId>itcast-bigdata-storm</artifactId>
​
    <dependencies>
        <dependency>
            <groupId>org.apache.storm</groupId>
            <artifactId>storm-core</artifactId>
            <version>1.1.1</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
    
</project>

3.3 Writing RandomSentenceSpout

package cn.itcast.storm;
​
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
​
import java.util.Map;
import java.util.Random;
​
/**
 * A Spout extends the BaseRichSpout abstract class
 */
public class RandomSentenceSpout extends BaseRichSpout {
​
    private SpoutOutputCollector collector;
​
    private String[] sentences = new String[]{"the cow jumped over the moon", "an apple a day keeps the doctor away",
            "four score and seven years ago", "snow white and the seven dwarfs", "i am at two with nature"};
​
    /**
     * Put initialization work here
     *
     * @param conf      configuration information
     * @param context   the topology context
     * @param collector the collector used to emit data downstream
     */
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }
​
    /**
     * Business logic goes here; emit data downstream at the end
     */
    public void nextTuple() {
        // pick a random sentence
        String sentence = this.sentences[new Random().nextInt(sentences.length)];
        System.out.println("生成的句子为 --> " + sentence);
        // emit downstream
        this.collector.emit(new Values(sentence));
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // declare the name of the field emitted downstream
        declarer.declare(new Fields("sentence"));
    }
​
}
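
The spout above emits each tuple without a message id, so it runs in the "unreliable" mode mentioned in section 2.1. Below is a minimal sketch, not part of the original tutorial and assuming storm-core 1.1.1, of what a reliable variant could look like: emitting with a message id makes Storm call ack/fail back on the spout, and Utils.sleep keeps nextTuple from busy-spinning.

package cn.itcast.storm;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

import java.util.Map;
import java.util.Random;
import java.util.UUID;

/**
 * Sketch of a "reliable" spout: every tuple carries a message id, so Storm
 * will call ack()/fail() once the tuple tree is fully processed or times out.
 */
public class ReliableSentenceSpout extends BaseRichSpout {

    private SpoutOutputCollector collector;

    private String[] sentences = new String[]{"the cow jumped over the moon", "an apple a day keeps the doctor away"};

    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    public void nextTuple() {
        Utils.sleep(100); // throttle so nextTuple() does not busy-spin
        String sentence = sentences[new Random().nextInt(sentences.length)];
        // emitting with a message id is what makes the spout "reliable"
        this.collector.emit(new Values(sentence), UUID.randomUUID().toString());
    }

    @Override
    public void ack(Object msgId) {
        // the tuple tree rooted at msgId was fully processed
    }

    @Override
    public void fail(Object msgId) {
        // processing failed or timed out; a real spout would re-emit here
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("sentence"));
    }
}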
​

3.4 Writing SplitSentenceBolt

package cn.itcast.storm;
​
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
​
import java.util.Map;
​
/**
 * A Bolt extends BaseRichBolt
 */
public class SplitSentenceBolt extends BaseRichBolt{
​
    private OutputCollector collector;
​
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }
​
    public void execute(Tuple input) {
        // read the upstream value with getStringByField; "sentence" is the field name declared by the spout
        String sentence = input.getStringByField("sentence");

        // split the sentence on spaces
        String[] words = sentence.split(" ");

        // emit each word downstream
        for (String word : words) {
            this.collector.emit(new Values(word));
        }
    }
​
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
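
The bolt above emits unanchored tuples and never calls ack, which is fine while the spout is unreliable. If the spout emitted with message ids (as in the sketch in section 3.3) and you wanted guaranteed processing, an anchored variant could look like the sketch below; it is my illustration, not part of the original tutorial.

package cn.itcast.storm;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

import java.util.Map;

/**
 * Sketch: anchored emits plus an explicit ack, which is what guaranteed
 * processing requires. (BaseBasicBolt would do the anchoring and acking for you.)
 */
public class AnchoredSplitSentenceBolt extends BaseRichBolt {

    private OutputCollector collector;

    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    public void execute(Tuple input) {
        for (String word : input.getStringByField("sentence").split(" ")) {
            // anchoring: the emitted word becomes part of the input tuple's tree
            this.collector.emit(input, new Values(word));
        }
        // tell Storm this input tuple has been fully handled
        this.collector.ack(input);
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}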
​

3.5 Writing WordCountBolt

package cn.itcast.storm;
​
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
​
import java.util.HashMap;
import java.util.Map;
​
public class WordCountBolt extends BaseRichBolt {
​
    private Map<String, Integer> wordMaps = new HashMap<String, Integer>();
​
    private OutputCollector collector;
​
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }
​
    public void execute(Tuple input) {
        String word = input.getStringByField("word");
        Integer count = this.wordMaps.get(word);
        if (null == count) {
            count = 0;
        }
        count++;
        this.wordMaps.put(word, count);
​
        // emit downstream; note that multiple fields are emitted here
        this.collector.emit(new Values(word, count));
    }
​
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
​
    }
}
​

3.6 Writing PrintBolt

package cn.itcast.storm;
​
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;
​
import java.util.Map;
​
public class PrintBolt extends BaseRichBolt {
​
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
    }
​
    public void execute(Tuple input) {
        String word = input.getStringByField("word");
        Integer count = input.getIntegerByField("count");
​
        // print the data received from upstream
        System.out.println(word + " : " + count);

        // note: nothing is emitted here because there is no downstream bolt
    }
​
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
​
    }
}
​

3.7 Writing WordCountTopology

package cn.itcast.storm;
​
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;
​
public class WordCountTopology {
​
    public static void main(String[] args) {
​
        // step 1: create the TopologyBuilder used to build the topology
        TopologyBuilder topologyBuilder = new TopologyBuilder();
​
        // step 2: register the spout and the bolts
        topologyBuilder.setSpout("RandomSentenceSpout", new RandomSentenceSpout());
        topologyBuilder.setBolt("SplitSentenceBolt", new SplitSentenceBolt()).shuffleGrouping("RandomSentenceSpout");
        topologyBuilder.setBolt("WordCountBolt", new WordCountBolt()).shuffleGrouping("SplitSentenceBolt");
        topologyBuilder.setBolt("PrintBolt", new PrintBolt()).shuffleGrouping("WordCountBolt");
​
        // step 3: build the StormTopology object
        StormTopology topology = topologyBuilder.createTopology();

        // step 4: submit the topology; here it is submitted to the local simulated cluster for testing
        LocalCluster localCluster = new LocalCluster();
        Config config = new Config();
        localCluster.submitTopology("WordCountTopology", config, topology);
​
    }
}
​

3.8 Testing

生成的句子为 --> i am at two with nature
i : 1
am : 1
at : 1
two : 1
with : 1
nature : 1
生成的句子为 --> the cow jumped over the moon
the : 1
cow : 1
jumped : 1
over : 1
the : 2
moon : 1
生成的句子为 --> an apple a day keeps the doctor away
an : 1
apple : 1
a : 1
day : 1
keeps : 1
the : 3
doctor : 1
away : 1

At this point, a simple Storm application is complete.

4. Cluster Mode

A finished Storm topology ultimately has to be submitted to a cluster to run, so we first need to set up a Storm cluster environment.

4.1 Machine allocation in the cluster

Hostname   IP address        zookeeper   nimbus   supervisor
node01     192.168.40.133    ✓           ✓
node02     192.168.40.134    ✓                    ✓
node03     192.168.40.135    ✓                    ✓

Note: the Storm cluster depends on ZooKeeper, so make sure the ZooKeeper cluster is running correctly first.

4.2 Setting up the Storm cluster

cd /export/software/
rz    # upload apache-storm-1.1.1.tar.gz
tar -xvf apache-storm-1.1.1.tar.gz -C /export/servers/
cd /export/servers/
mv apache-storm-1.1.1/ storm
# configure environment variables (append to /etc/profile)
export STORM_HOME=/export/servers/storm
export PATH=${STORM_HOME}/bin:$PATH
source /etc/profile
​
​

Edit the configuration file:

cd /export/servers/storm/conf/
vim storm.yaml
​
# the addresses of the zookeeper servers
storm.zookeeper.servers:
     - "node01"
     - "node02"
     - "node03"

# the machine(s) where nimbus runs
nimbus.seeds: ["node01"]

# the port of the UI
ui.port: 18080

# save and exit

Distribute it to node02 and node03.

scp -r /export/servers/storm/ node02:/export/servers/
scp -r /export/servers/storm/ node03:/export/servers/
​
scp /etc/profile node02:/etc/
source /etc/profile   # run on node02
scp /etc/profile node03:/etc/
source /etc/profile   # run on node03
​

Start nimbus and ui on node01, and supervisor on node02 and node03.

node01:

nohup storm nimbus > /dev/null 2>&1 &
nohup storm ui > /dev/null 2>&1 &
​
# logviewer is used to view log files online
nohup storm logviewer > /dev/null 2>&1 &

node02:

nohup storm supervisor > /dev/null 2>&1 &
nohup storm logviewer > /dev/null 2>&1 &

node03:

nohup storm supervisor > /dev/null 2>&1 &
nohup storm logviewer > /dev/null 2>&1 &

4.3 Check that the cluster is running properly

Open a browser and go to: http://node01:18080/index.html

View logs online:

At this point, the Storm cluster setup is complete.

5. Submitting a Topology to the Cluster

5.1 Modify the topology submission code

package cn.itcast.storm;
​
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.generated.AlreadyAliveException;
import org.apache.storm.generated.AuthorizationException;
import org.apache.storm.generated.InvalidTopologyException;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;
​
public class WordCountTopology {
​
    public static void main(String[] args) {
​
        // step 1: create the TopologyBuilder used to build the topology
        TopologyBuilder topologyBuilder = new TopologyBuilder();
​
        // step 2: register the spout and the bolts
        topologyBuilder.setSpout("RandomSentenceSpout", new RandomSentenceSpout());
        topologyBuilder.setBolt("SplitSentenceBolt", new SplitSentenceBolt()).shuffleGrouping("RandomSentenceSpout");
        topologyBuilder.setBolt("WordCountBolt", new WordCountBolt()).shuffleGrouping("SplitSentenceBolt");
        topologyBuilder.setBolt("PrintBolt", new PrintBolt()).shuffleGrouping("WordCountBolt");
​
        // step 3: build the StormTopology object
        StormTopology topology = topologyBuilder.createTopology();
        Config config = new Config();


        // step 4: submit the topology; the local-mode submission is commented out, we now submit to the cluster
//        LocalCluster localCluster = new LocalCluster();
//        localCluster.submitTopology("WordCountTopology", config, topology);
​
        try {
            // submit to the cluster
            StormSubmitter.submitTopology("WordCountTopology", config, topology);
        } catch (AlreadyAliveException e) {
            e.printStackTrace();
        } catch (InvalidTopologyException e) {
            e.printStackTrace();
        } catch (AuthorizationException e) {
            e.printStackTrace();
        }
​
    }
}
​

5.2 Package the project

Packaging succeeds.

5.3 Upload it to the server

cd /tmp
rz    # upload itcast-bigdata-storm-1.0.0-SNAPSHOT.jar

5.4 Submit the topology to the cluster

# submit the jar with the storm jar command, specifying the entry (main) class to run
storm jar itcast-bigdata-storm-1.0.0-SNAPSHOT.jar cn.itcast.storm.WordCountTopology

The submission process looks like this:

Running: /export/software/jdk1.8.0_141/bin/java -client -Ddaemon.name= -Dstorm.options= -Dstorm.home=/export/servers/storm -Dstorm.log.dir=/export/servers/storm/logs -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib -Dstorm.conf.file= -cp /export/servers/storm/lib/asm-5.0.3.jar:/export/servers/storm/lib/objenesis-2.1.jar:/export/servers/storm/lib/log4j-core-2.8.2.jar:/export/servers/storm/lib/reflectasm-1.10.1.jar:/export/servers/storm/lib/storm-rename-hack-1.1.1.jar:/export/servers/storm/lib/kryo-3.0.3.jar:/export/servers/storm/lib/log4j-over-slf4j-1.6.6.jar:/export/servers/storm/lib/slf4j-api-1.7.21.jar:/export/servers/storm/lib/servlet-api-2.5.jar:/export/servers/storm/lib/clojure-1.7.0.jar:/export/servers/storm/lib/log4j-slf4j-impl-2.8.2.jar:/export/servers/storm/lib/log4j-api-2.8.2.jar:/export/servers/storm/lib/disruptor-3.3.2.jar:/export/servers/storm/lib/storm-core-1.1.1.jar:/export/servers/storm/lib/minlog-1.3.0.jar:/export/servers/storm/lib/ring-cors-0.1.5.jar:itcast-bigdata-storm-1.0.0-SNAPSHOT.jar:/export/servers/storm/conf:/export/servers/storm/bin -Dstorm.jar=itcast-bigdata-storm-1.0.0-SNAPSHOT.jar -Dstorm.dependency.jars= -Dstorm.dependency.artifacts={} cn.itcast.storm.WordCountTopology
1197 [main] WARN  o.a.s.u.Utils - STORM-VERSION new 1.1.1 old null
1248 [main] INFO  o.a.s.StormSubmitter - Generated ZooKeeper secret payload for MD5-digest: -6891877266277720388:-8731485235457199991
1412 [main] INFO  o.a.s.u.NimbusClient - Found leader nimbus : node01:6627
1539 [main] INFO  o.a.s.s.a.AuthUtils - Got AutoCreds []
1564 [main] INFO  o.a.s.u.NimbusClient - Found leader nimbus : node01:6627
1644 [main] INFO  o.a.s.StormSubmitter - Uploading dependencies - jars...
1651 [main] INFO  o.a.s.StormSubmitter - Uploading dependencies - artifacts...
1651 [main] INFO  o.a.s.StormSubmitter - Dependency Blob keys - jars : [] / artifacts : []
1698 [main] INFO  o.a.s.StormSubmitter - Uploading topology jar itcast-bigdata-storm-1.0.0-SNAPSHOT.jar to assigned location: /export/servers/storm/storm-local/nimbus/inbox/stormjar-d80d9d68-4257-4b69-b179-7ffff28134e5.jar
1742 [main] INFO  o.a.s.StormSubmitter - Successfully uploaded topology jar to assigned location: /export/servers/storm/storm-local/nimbus/inbox/stormjar-d80d9d68-4257-4b69-b179-7ffff28134e5.jar
1742 [main] INFO  o.a.s.StormSubmitter - Submitting topology WordCountTopology in distributed mode with conf {"storm.zookeeper.topology.auth.scheme":"digest","storm.zookeeper.topology.auth.payload":"-6891877266277720388:-8731485235457199991"}
1742 [main] WARN  o.a.s.u.Utils - STORM-VERSION new 1.1.1 old 1.1.1
2553 [main] INFO  o.a.s.StormSubmitter - Finished submitting topology: WordCountTopology

You can see in the UI that the topology now exists.

Tip: click the topology's name to view its details.

5.5 Viewing the results

From the management UI you can see that the task was assigned to node02:

Go to the logs directory on node02: /export/servers/storm/logs/workers-artifacts/WordCountTopology-1-1531816634/6700

tail -f worker.log
​
2018-07-17 16:48:06.401 STDIO Thread-4-RandomSentenceSpout-executor[2 2] [INFO] 生成的句子为 --> the cow jumped over the moon
2018-07-17 16:48:06.415 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] the : 2507
2018-07-17 16:48:06.415 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] cow : 642
2018-07-17 16:48:06.415 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] jumped : 642
2018-07-17 16:48:06.415 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] over : 642
2018-07-17 16:48:06.416 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] the : 2508
2018-07-17 16:48:06.417 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] moon : 642
2018-07-17 16:48:06.602 STDIO Thread-4-RandomSentenceSpout-executor[2 2] [INFO] 生成的句子为 --> i am at two with nature
2018-07-17 16:48:06.615 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] i : 625
2018-07-17 16:48:06.615 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] am : 625
2018-07-17 16:48:06.615 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] at : 625
2018-07-17 16:48:06.615 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] two : 625
2018-07-17 16:48:06.615 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] with : 625
2018-07-17 16:48:06.615 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] nature : 625
2018-07-17 16:48:06.803 STDIO Thread-4-RandomSentenceSpout-executor[2 2] [INFO] 生成的句子为 --> an apple a day keeps the doctor away
2018-07-17 16:48:06.811 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] an : 598
2018-07-17 16:48:06.812 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] apple : 598
2018-07-17 16:48:06.812 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] a : 598
2018-07-17 16:48:06.812 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] day : 598
2018-07-17 16:48:06.812 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] keeps : 598
2018-07-17 16:48:06.812 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] the : 2509
2018-07-17 16:48:06.812 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] doctor : 598
2018-07-17 16:48:06.812 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] away : 598
2018-07-17 16:48:07.004 STDIO Thread-4-RandomSentenceSpout-executor[2 2] [INFO] 生成的句子为 --> an apple a day keeps the doctor away
2018-07-17 16:48:07.017 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] an : 599
2018-07-17 16:48:07.018 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] apple : 599
2018-07-17 16:48:07.018 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] a : 599
2018-07-17 16:48:07.018 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] day : 599
2018-07-17 16:48:07.018 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] keeps : 599
2018-07-17 16:48:07.018 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] the : 2510
2018-07-17 16:48:07.018 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] doctor : 599
2018-07-17 16:48:07.018 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] away : 599
2018-07-17 16:48:07.205 STDIO Thread-4-RandomSentenceSpout-executor[2 2] [INFO] 生成的句子为 --> an apple a day keeps the doctor away
2018-07-17 16:48:07.215 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] an : 600
2018-07-17 16:48:07.215 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] apple : 600
2018-07-17 16:48:07.215 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] a : 600
2018-07-17 16:48:07.215 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] day : 600
2018-07-17 16:48:07.215 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] keeps : 600
2018-07-17 16:48:07.216 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] the : 2511
2018-07-17 16:48:07.216 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] doctor : 600
2018-07-17 16:48:07.216 STDIO Thread-8-PrintBolt-executor[1 1] [INFO] away : 600

You can see the task is executing normally.

Besides the command line, you can also check it in the UI, as shown below:

5.6 Stopping a task

There are two ways to stop a task in a Storm cluster (once stopped, the task has to be resubmitted if you want to run it again):

Option 1: stop it from the command line

# stop the topology by specifying its name
storm kill WordCountTopology
​
Running: /export/software/jdk1.8.0_141/bin/java -client -Ddaemon.name= -Dstorm.options= -Dstorm.home=/export/servers/storm -Dstorm.log.dir=/export/servers/storm/logs -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib -Dstorm.conf.file= -cp /export/servers/storm/lib/asm-5.0.3.jar:/export/servers/storm/lib/objenesis-2.1.jar:/export/servers/storm/lib/log4j-core-2.8.2.jar:/export/servers/storm/lib/reflectasm-1.10.1.jar:/export/servers/storm/lib/storm-rename-hack-1.1.1.jar:/export/servers/storm/lib/kryo-3.0.3.jar:/export/servers/storm/lib/log4j-over-slf4j-1.6.6.jar:/export/servers/storm/lib/slf4j-api-1.7.21.jar:/export/servers/storm/lib/servlet-api-2.5.jar:/export/servers/storm/lib/clojure-1.7.0.jar:/export/servers/storm/lib/log4j-slf4j-impl-2.8.2.jar:/export/servers/storm/lib/log4j-api-2.8.2.jar:/export/servers/storm/lib/disruptor-3.3.2.jar:/export/servers/storm/lib/storm-core-1.1.1.jar:/export/servers/storm/lib/minlog-1.3.0.jar:/export/servers/storm/lib/ring-cors-0.1.5.jar:/export/servers/storm/conf:/export/servers/storm/bin org.apache.storm.command.kill_topology WordCountTopology
3484 [main] INFO  o.a.s.u.NimbusClient - Found leader nimbus : node01:6627
3609 [main] INFO  o.a.s.c.kill-topology - Killed topology: WordCountTopology

Option 2: stop it from the management UI

The second option is recommended.

6. Core Concepts in Detail

With the material so far we have a basic grasp of Storm application development.

6.1 Topology parallelism

Questions:

  • What if the spout produces data faster than the downstream bolt can process it?

  • Likewise, what if a bolt produces data faster than its downstream bolt can process it?

  • What if the submitted task gets assigned to only one supervisor while the other one sits idle?

6.1.1 Worker processes, executors, and tasks

Before looking at topology parallelism, we need to sort out the relationship between worker processes, executors, and tasks.

Worker process (worker): when a topology is submitted, independent processes are started on the supervisor machines to execute it.

The number of workers can be set on the Config object:

config.setNumWorkers(2); // set the number of worker processes

Executor: a thread running inside a worker. The thread count can be set when adding a spout or bolt to the topology;

For example:

topologyBuilder.setSpout("RandomSentenceSpout", new RandomSentenceSpout(),2);

Note: the number 2 is the thread count, i.e. this component's parallelism, but it is not the parallelism of the whole topology.

Task: the smallest unit of work inside an executor. Since Storm 0.8, a task no longer corresponds to a physical thread; every spout or bolt runs as many tasks across the cluster. The task count can be set in code, for example:

topologyBuilder.setBolt("SplitSentenceBolt", new SplitSentenceBolt()).shuffleGrouping("RandomSentenceSpout").setNumTasks(4);

The number of tasks of each component stays fixed for the lifetime of the topology, but the number of executors of a component may change over time (for example through rebalancing). By default the number of tasks equals the number of executors, i.e. Storm runs one task per thread. For instance, a bolt with a parallelism hint of 2 and setNumTasks(4) starts with 2 executors running 2 tasks each; rebalanced to 4 executors, each executor runs 1 task.

The relationship among the three: a worker process contains one or more executors (threads), and each executor runs one or more tasks of the same spout or bolt.

6.1.2 Example

package cn.itcast.storm;
​
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.generated.AlreadyAliveException;
import org.apache.storm.generated.AuthorizationException;
import org.apache.storm.generated.InvalidTopologyException;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;
​
public class WordCountTopology {
​
    public static void main(String[] args) {
​
        // step 1: create the TopologyBuilder used to build the topology
        TopologyBuilder topologyBuilder = new TopologyBuilder();
​
        // step 2: register the spout and the bolts
        topologyBuilder.setSpout("RandomSentenceSpout", new RandomSentenceSpout(),2).setNumTasks(2);
        topologyBuilder.setBolt("SplitSentenceBolt", new SplitSentenceBolt(), 4).shuffleGrouping("RandomSentenceSpout").setNumTasks(4);
        topologyBuilder.setBolt("WordCountBolt", new WordCountBolt(), 2).shuffleGrouping("SplitSentenceBolt");
        topologyBuilder.setBolt("PrintBolt", new PrintBolt()).shuffleGrouping("WordCountBolt");
​
        // step 3: build the StormTopology object
        StormTopology topology = topologyBuilder.createTopology();
        Config config = new Config();
        config.setNumWorkers(2); // set the number of worker processes
​
​
        // step 4: submit the topology; the local-mode submission is commented out, we submit to the cluster
//        LocalCluster localCluster = new LocalCluster();
//        localCluster.submitTopology("WordCountTopology", config, topology);

        try {
            // submit to the cluster
            StormSubmitter.submitTopology("WordCountTopology", config, topology);
        } catch (AlreadyAliveException e) {
            e.printStackTrace();
        } catch (InvalidTopologyException e) {
            e.printStackTrace();
        } catch (AuthorizationException e) {
            e.printStackTrace();
        }
​
    }
}

After the topology above is submitted to the cluster, how many workers, executors, and tasks are there in total?

Workers: 2

Executors: 9

Tasks: 8

Is that right?

Not quite. Every executor runs at least one task, so the task count should also be 9 (spout 2 + SplitSentenceBolt 4 + WordCountBolt 2 + PrintBolt 1).

6.1.3 How should these numbers be set in real development?

These numbers cannot be picked off the top of your head. You have to work them out from how long each spout and bolt takes to execute and how much data it has to process; only then can you arrive at reasonable values, and they need to be adjusted as the business volume changes.

6.2 Stream grouping

As the figure above shows, when BoltA sends data to BoltB and BoltB has 3 tasks, which task should each tuple be sent to?

Stream grouping exists to answer exactly that question.

Storm has 8 built-in stream groupings:

package org.apache.storm.topology;
​
import org.apache.storm.generated.GlobalStreamId;
import org.apache.storm.generated.Grouping;
import org.apache.storm.grouping.CustomStreamGrouping;
import org.apache.storm.tuple.Fields;
​
​
public interface InputDeclarer<T extends InputDeclarer> {
​
    // fields grouping
    public T fieldsGrouping(String componentId, Fields fields);
    public T fieldsGrouping(String componentId, String streamId, Fields fields);
​
    // global grouping
    public T globalGrouping(String componentId);
    public T globalGrouping(String componentId, String streamId);
​
    // shuffle grouping
    public T shuffleGrouping(String componentId);
    public T shuffleGrouping(String componentId, String streamId);
​
    // local or shuffle grouping
    public T localOrShuffleGrouping(String componentId);
    public T localOrShuffleGrouping(String componentId, String streamId);
​
    // none grouping
    public T noneGrouping(String componentId);
    public T noneGrouping(String componentId, String streamId);
​
    // all (broadcast) grouping
    public T allGrouping(String componentId);
    public T allGrouping(String componentId, String streamId);
​
    // direct grouping
    public T directGrouping(String componentId);
    public T directGrouping(String componentId, String streamId);
​
    // partial key grouping
    public T partialKeyGrouping(String componentId, Fields fields);
    public T partialKeyGrouping(String componentId, String streamId, Fields fields);
​
    // custom grouping
    public T customGrouping(String componentId, CustomStreamGrouping grouping);
    public T customGrouping(String componentId, String streamId, CustomStreamGrouping grouping);
    
}

  • Fields grouping: the stream is partitioned by the values of the specified fields. For example, if a stream is grouped by "user-id", tuples with the same "user-id" always go to the same task, while tuples with different "user-id" values may go to different tasks. This is the textbook choice for word counting, but in practice it is used sparingly: if one value of the field occurs far more often than the others, the tasks become unbalanced.

  • Global grouping: the entire stream goes to the single task with the lowest task id. Since every tuple goes to that one task, resources can run out when the data volume is large.

  • Shuffle grouping: tuples are distributed randomly across the bolt's tasks, so that each task receives an equal number of tuples.

  • Local or shuffle grouping: like shuffle grouping, but if the target bolt has one or more tasks in the same worker process, tuples are shuffled only among those in-process tasks. In short, when sender and receiver sit in the same worker this avoids network transfer and improves the performance of the whole topology. With this grouping available there is little reason to use plain shuffle grouping.

  • None grouping: you do not care how the stream is grouped. Currently it behaves the same as shuffle grouping, but eventually Storm may use it to run these bolts in the same thread as the bolt or spout they subscribe to.

  • All grouping: the stream is replicated to all of the bolt's tasks; every subscribing task receives its own complete copy of every tuple.

  • Direct grouping: a special grouping in which the producer of a tuple decides which task of the consumer receives it.

  • Partial key grouping: the stream is partitioned by the fields given in the grouping, like fields grouping, but the load is balanced between two downstream tasks per key, which makes better use of resources when the incoming data is skewed. With this grouping available, plain fields grouping is rarely needed.

  • Custom grouping: implement the CustomStreamGrouping interface to define your own grouping strategy (a sketch follows below).
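
As an illustration of that last item, here is a minimal custom-grouping sketch. It is not part of the original tutorial and assumes storm-core 1.1.1; it routes every tuple to a task chosen by the hash of the tuple's first field, which is roughly what a fields grouping on a single field does.

package cn.itcast.storm;

import org.apache.storm.generated.GlobalStreamId;
import org.apache.storm.grouping.CustomStreamGrouping;
import org.apache.storm.task.WorkerTopologyContext;

import java.util.Collections;
import java.util.List;

/**
 * Sketch of a custom grouping: pick the target task from the hash of the first field.
 */
public class FirstFieldHashGrouping implements CustomStreamGrouping {

    private List<Integer> targetTasks;

    public void prepare(WorkerTopologyContext context, GlobalStreamId stream, List<Integer> targetTasks) {
        // the task ids of the downstream bolt this grouping routes to
        this.targetTasks = targetTasks;
    }

    public List<Integer> chooseTasks(int taskId, List<Object> values) {
        // the hash of the first field decides which downstream task receives the tuple
        int index = Math.floorMod(values.get(0).hashCode(), targetTasks.size());
        return Collections.singletonList(targetTasks.get(index));
    }
}

It would be wired in with something like topologyBuilder.setBolt("WordCountBolt", new WordCountBolt(), 2).customGrouping("SplitSentenceBolt", new FirstFieldHashGrouping()).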

6.2.1 Example

Which grouping should our WordCount program use? Is there a problem with the shuffle grouping we used before? (There is: with shuffle grouping, tuples carrying the same word may be sent to different WordCountBolt tasks, so each task only holds a partial count. Grouping by the word field, e.g. fields grouping or partial key grouping, keeps the counts correct.)

package cn.itcast.storm;
​
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.generated.AlreadyAliveException;
import org.apache.storm.generated.AuthorizationException;
import org.apache.storm.generated.InvalidTopologyException;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;
​
public class WordCountTopology {
​
    public static void main(String[] args) {
​
        // step 1: create the TopologyBuilder used to build the topology
        TopologyBuilder topologyBuilder = new TopologyBuilder();
​
        // step 2: register the spout and the bolts
        topologyBuilder.setSpout("RandomSentenceSpout", new RandomSentenceSpout(), 2).setNumTasks(2);
        topologyBuilder.setBolt("SplitSentenceBolt", new SplitSentenceBolt(), 4).shuffleGrouping("RandomSentenceSpout").setNumTasks(4);
        topologyBuilder.setBolt("WordCountBolt", new WordCountBolt(), 2).partialKeyGrouping("SplitSentenceBolt", new Fields("word"));
        topologyBuilder.setBolt("PrintBolt", new PrintBolt()).shuffleGrouping("WordCountBolt");
​
        // step 3: build the StormTopology object
        StormTopology topology = topologyBuilder.createTopology();
        Config config = new Config();
        config.setNumWorkers(1); // set the number of worker processes
​
​
        // step 4: submit the topology; here it is submitted to the local simulated cluster for testing
        LocalCluster localCluster = new LocalCluster();
        localCluster.submitTopology("WordCountTopology", config, topology);

//        try {
//            // submit to the cluster
//            StormSubmitter.submitTopology("WordCountTopology", config, topology);
//        } catch (AlreadyAliveException e) {
//            e.printStackTrace();
//        } catch (InvalidTopologyException e) {
//            e.printStackTrace();
//        } catch (AuthorizationException e) {
//            e.printStackTrace();
//        }
​
    }
}
​

Test:

生成的句子为 --> the cow jumped over the moon
生成的句子为 --> an apple a day keeps the doctor away
apple : 21
day : 21
keeps : 21
away : 21
over : 19
an : 21
a : 21
the : 71
doctor : 21
the : 72
cow : 19
jumped : 19
the : 73
moon : 19
生成的句子为 --> the cow jumped over the moon
生成的句子为 --> the cow jumped over the moon
over : 20
the : 74
cow : 20
jumped : 20
the : 75
moon : 20
over : 21
the : 76
cow : 21
jumped : 21
the : 77
moon : 21
生成的句子为 --> four score and seven years ago
four : 12
score : 12
years : 12
and : 26
seven : 26
ago : 12
生成的句子为 --> four score and seven years ago
four : 13
score : 13
years : 13
and : 27
seven : 27
ago : 13
生成的句子为 --> the cow jumped over the moon
over : 22
the : 78
cow : 22
jumped : 22
the : 79
moon : 22
生成的句子为 --> four score and seven years ago
four : 14
score : 14
years : 14
and : 28
seven : 28
ago : 14
生成的句子为 --> snow white and the seven dwarfs
snow : 15
white : 15
and : 29
the : 80
seven : 29
dwarfs : 15
生成的句子为 --> four score and seven years ago
four : 15
score : 15
years : 15
and : 30
seven : 30
ago : 15

6.2.2 Recommendations

Storm provides 8 groupings; how many are commonly used in practice? Usually 2:

  • Local or shuffle grouping

    • Optimises network transfer by preferring in-process delivery.

  • Partial key grouping

    • Groups by field while also balancing the load of the downstream tasks.

7. Case Study

We will improve the WordCount program written earlier: store the results in Redis, and display the number of occurrences of each word as a chart.

7.1 Deploying Redis

yum -y install cpp binutils glibc glibc-kernheaders glibc-common glibc-devel gcc make gcc-c++ libstdc++-devel tcl
​
cd /export/software
wget http://download.redis.io/releases/redis-3.0.2.tar.gz   # or upload with rz
tar -xvf redis-3.0.2.tar.gz -C /export/servers
cd /export/servers/
mv redis-3.0.2 redis
cd redis
make
make test   # skip this step; it takes a long time
make install
​
mkdir /export/servers/redis-server
cp /export/servers/redis/redis.conf /export/servers/redis-server
vi /export/servers/redis-server/redis.conf
# change the following (the default is no)
daemonize yes
​
cd /export/servers/redis-server/
# start the server
redis-server ./redis.conf
# test the connection
redis-cli

7.2 Import the jedis dependency

        <dependency>
            <groupId>redis.clients</groupId>
            <artifactId>jedis</artifactId>
            <version>2.9.0</version>
        </dependency>

7.3 Writing RedisBolt to store data in Redis

package cn.itcast.storm;
​
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;
​
import java.util.Map;
​
public class RedisBolt extends BaseRichBolt {
​
    private JedisPool jedisPool;
​
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        jedisPool = new JedisPool(new JedisPoolConfig(), "node01",6379);
    }
​
    public void execute(Tuple input) {
        String word = input.getStringByField("word");
        Integer count = input.getIntegerByField("count");
​
        // the key under which the count is stored in redis
        String key = "wordCount:" + word;
        Jedis jedis = null;
        try {
            jedis = this.jedisPool.getResource();
            jedis.set(key, String.valueOf(count));
        } finally {
            if(null != jedis){
                jedis.close();
            }
        }
    }
​
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
​
    }
}
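
One detail worth noting: the JedisPool opened in prepare is never released. A small method that could be added to the RedisBolt above (my addition, not in the original tutorial): BaseRichBolt already defines an empty cleanup, and Storm only guarantees it is called in local mode, but closing the pool is still good hygiene.

    @Override
    public void cleanup() {
        // release the connection pool when the worker shuts down
        if (jedisPool != null) {
            jedisPool.close();
        }
    }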
​

7.4 Modify the WordCountTopology class

Add the RedisBolt to the topology, as follows:

package cn.itcast.storm;
​
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.generated.AlreadyAliveException;
import org.apache.storm.generated.AuthorizationException;
import org.apache.storm.generated.InvalidTopologyException;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;
​
public class WordCountTopology {
​
    public static void main(String[] args) {
​
        // step 1: create the TopologyBuilder used to build the topology
        TopologyBuilder topologyBuilder = new TopologyBuilder();
​
        // step 2: register the spout and the bolts
        topologyBuilder.setSpout("RandomSentenceSpout", new RandomSentenceSpout(), 2).setNumTasks(2);
        topologyBuilder.setBolt("SplitSentenceBolt", new SplitSentenceBolt(), 4).localOrShuffleGrouping("RandomSentenceSpout").setNumTasks(4);
        topologyBuilder.setBolt("WordCountBolt", new WordCountBolt(), 2).partialKeyGrouping("SplitSentenceBolt", new Fields("word"));
//        topologyBuilder.setBolt("PrintBolt", new PrintBolt()).shuffleGrouping("WordCountBolt");
        topologyBuilder.setBolt("RedistBolt", new RedisBolt()).localOrShuffleGrouping("WordCountBolt");
​
        // step 3: build the StormTopology object
        StormTopology topology = topologyBuilder.createTopology();
        Config config = new Config();
        config.setNumWorkers(2); // set the number of worker processes
​
​
        // step 4: submit the topology; here it is submitted to the local simulated cluster for testing
        LocalCluster localCluster = new LocalCluster();
        localCluster.submitTopology("WordCountTopology", config, topology);

//        try {
//            // submit to the cluster
//            StormSubmitter.submitTopology("WordCountTopology", config, topology);
//        } catch (AlreadyAliveException e) {
//            e.printStackTrace();
//        } catch (InvalidTopologyException e) {
//            e.printStackTrace();
//        } catch (AuthorizationException e) {
//            e.printStackTrace();
//        }
​
    }
}
​

7.5 Test

You can see that data is now being stored in Redis.

7.6 Create the itcast-wordcount-web project

This project is used to display the data.

Technology used: SpringMVC + spring-data-redis + echarts

The result:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>itcast-bigdata</artifactId>
        <groupId>cn.itcast.bigdata</groupId>
        <version>1.0.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>
    <packaging>war</packaging>
​
    <artifactId>itcast-wordcount-web</artifactId>
​
    <dependencies>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-webmvc</artifactId>
            <version>5.0.7.RELEASE</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.data</groupId>
            <artifactId>spring-data-redis</artifactId>
            <version>2.0.8.RELEASE</version>
        </dependency>
        <dependency>
            <groupId>redis.clients</groupId>
            <artifactId>jedis</artifactId>
            <version>2.9.0</version>
        </dependency>
        <!-- Jackson JSON processing library -->
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.9.4</version>
        </dependency>
        <!-- JSP related -->
        <dependency>
            <groupId>jstl</groupId>
            <artifactId>jstl</artifactId>
            <version>1.2</version>
        </dependency>
        <dependency>
            <groupId>javax.servlet</groupId>
            <artifactId>servlet-api</artifactId>
            <version>2.5</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>javax.servlet</groupId>
            <artifactId>jsp-api</artifactId>
            <version>2.0</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>1.7.7</version>
        </dependency>
    </dependencies>
​
​
    <build>
        <finalName>${project.artifactId}</finalName>
        <plugins>
            <!-- resource copy plugin -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-resources-plugin</artifactId>
                <version>2.7</version>
                <configuration>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>
            <!-- java compiler plugin -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.2</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>
            <!-- Tomcat plugin -->
            <plugin>
                <groupId>org.apache.tomcat.maven</groupId>
                <artifactId>tomcat7-maven-plugin</artifactId>
                <version>2.2</version>
                <configuration>
                    <path>/</path>
                    <port>8086</port>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

7.7 Writing the configuration files

7.7.1 log4j.properties

log4j.rootLogger=DEBUG,A1
​
log4j.appender.A1=org.apache.log4j.ConsoleAppender
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%-d{yyyy-MM-dd HH:mm:ss} [%t] [%c]-[%p] %m%n

7.7.2 itcast-wordcount-servlet.xml

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:p="http://www.springframework.org/schema/p"
    xmlns:context="http://www.springframework.org/schema/context"
    xmlns:mvc="http://www.springframework.org/schema/mvc"
    xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-4.0.xsd
        http://www.springframework.org/schema/mvc http://www.springframework.org/schema/mvc/spring-mvc-4.0.xsd
        http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-4.0.xsd">
    
    <!-- scan packages for components -->
    <context:component-scan base-package="cn.itcast.wordcount"/>
    
    <!-- enable annotation-driven MVC -->
    <mvc:annotation-driven />
​
    <!-- view resolver -->
    <!-- 
        Example: prefix="/WEB-INF/jsp/", suffix=".jsp", viewname="test" -> "/WEB-INF/jsp/test.jsp" 
     -->
    <bean class="org.springframework.web.servlet.view.InternalResourceViewResolver">
        <property name="prefix" value="/WEB-INF/views/"/>
        <property name="suffix" value=".jsp"/>
    </bean>
​
    <!-- let the web container handle static resources -->
    <mvc:default-servlet-handler/>
    
</beans>

7.7.3 itcast-wordcount-redis.xml

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:p="http://www.springframework.org/schema/p"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd">
​
    <bean id="jedisConnectionFactory" class="org.springframework.data.redis.connection.jedis.JedisConnectionFactory"
          p:use-pool="true" p:hostName="node01" p:port="6379"/>
​
    <bean id="stringRedisTemplate" class="org.springframework.data.redis.core.StringRedisTemplate"
          p:connection-factory-ref="jedisConnectionFactory"/>
​
</beans>

7.7.4 web.xml

You need to create the webapp and WEB-INF directories.

<?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://java.sun.com/xml/ns/javaee" xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd" id="WebApp_ID" version="2.5">
    <display-name>itcast-wordcount</display-name>
​
​
    <!-- the SpringMVC entry point -->
    <servlet>
        <servlet-name>itcast-wordcount</servlet-name>
        <servlet-class>org.springframework.web.servlet.DispatcherServlet</servlet-class>
        <init-param>
            <param-name>contextConfigLocation</param-name>
            <param-value>classpath:itcast-wordcount-*.xml</param-value>
        </init-param>
        <load-on-startup>1</load-on-startup>
    </servlet>
​
    <servlet-mapping>
        <servlet-name>itcast-wordcount</servlet-name>
        <url-pattern>/</url-pattern>
    </servlet-mapping>
​
    <welcome-file-list>
        <welcome-file>index.jsp</welcome-file>
    </welcome-file-list>
​
</web-app>

7.8 Writing the code

7.8.1 Writing the Controller

package cn.itcast.wordcount.controller;
​
import cn.itcast.wordcount.service.WordCountService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.ResponseBody;
​
import java.util.List;
import java.util.Map;
​
@Controller
public class WordCountController {
​
    @Autowired
    private WordCountService wordCountService;
​
    @RequestMapping("view")
    public String wordCountView(){
        return "view";
    }
​
    @RequestMapping("data")
    @ResponseBody
    public Map<String,String> queryData(){
        return this.wordCountService.queryData();
    }
​
}
​

7.8.2 Writing WordCountService

package cn.itcast.wordcount.service;
​
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.stereotype.Service;
​
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
​
@Service
public class WordCountService {
​
    @Autowired
    private RedisTemplate redisTemplate;
​
    public Map<String, String> queryData() {
        Set<String> keys = this.redisTemplate.keys("wordCount:*");
        Map<String, String> result = new HashMap<>();
        for (String key : keys) {
            result.put(key.substring(key.indexOf(':') + 1), this.redisTemplate.opsForValue().get(key).toString());
        }
        return result;
    }
}
​

7.9 Writing view.jsp

Create view.jsp under WEB-INF/views (the directory configured as the view resolver prefix).

<%@ page contentType="text/html;charset=UTF-8" language="java" %>
<html>
<head>
    <title>Word Count View Page</title>
    <script type="application/javascript" src="/js/jquery.min.js"></script>
    <script type="application/javascript" src="/js/echarts.min.js"></script>
</head>
<body>
<div id="main" style="height: 100%"></div>
<script type="text/javascript">
    // initialise the echarts instance on the prepared DOM element
    var myChart = echarts.init(document.getElementById('main'));
​
    // chart options and data
    var option = {
        title: {
            text: 'Word Count'
        },
        tooltip : {    // tooltip shown on hover
            trigger : 'item',
            show:true,
            showDelay: 5,
            hideDelay: 2,
            transitionDuration:0,
            formatter: function (params,ticket,callback) {
                // console.log(params);
                var res = "次数:"+params.value;
                return res;
            }
        },
        xAxis: {
            data: [],
            type: 'category',
            axisLabel: {
                interval: 0
            }
        },
        yAxis: {},
        series: [{
            name: '数量',
            type: 'bar',
            data: [],
            itemStyle: {
                color: '#2AAAE3'
            }
        }, {
            name: '折线',
            type: 'line',
            itemStyle: {
                color: '#FF3300'
            },
            data: []
        }
        ]
​
    };
​
    // render the chart with the options and data just defined
    myChart.setOption(option);
    myChart.showLoading();
​
    // load the data asynchronously
    $.get('/data', function (data) {
        var words = [];
        var counts = [];
        var counts2 = [];
        for (var d in data) {
            words.push(d);
            counts.push(data[d]);
            counts2.push(eval(data[d]) + 50);
        }
        myChart.hideLoading();
        // fill in the data
        myChart.setOption({
            xAxis: {
                data: words
            },
            series: [{
                name: '数量',
                data: counts
            },{
                name: '折线',
                data: counts2
            }]
        });
    });
​
​
</script>
​
</body>
</html>
​

7.10 Packaging the itcast-bigdata-storm project

Now we need to package the itcast-bigdata-storm project into a jar and deploy it to the Storm cluster.

7.10.1 Modify WordCountTopology

package cn.itcast.storm;
​
import cn.itcast.storm.utils.SpringApplication;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.generated.AlreadyAliveException;
import org.apache.storm.generated.AuthorizationException;
import org.apache.storm.generated.InvalidTopologyException;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;
​
public class WordCountTopology {
​
    public static void main(String[] args) {
​
        // bootstrap the Spring container
        SpringApplication.init();
​
        // step 1: create the TopologyBuilder used to build the topology
        TopologyBuilder topologyBuilder = new TopologyBuilder();
​
        // step 2: register the spout and the bolts
        topologyBuilder.setSpout("RandomSentenceSpout", new RandomSentenceSpout(), 2).setNumTasks(2);
        topologyBuilder.setBolt("SplitSentenceBolt", new SplitSentenceBolt(), 4).localOrShuffleGrouping("RandomSentenceSpout").setNumTasks(4);
        topologyBuilder.setBolt("WordCountBolt", new WordCountBolt(), 2).partialKeyGrouping("SplitSentenceBolt", new Fields("word"));
//        topologyBuilder.setBolt("PrintBolt", new PrintBolt()).shuffleGrouping("WordCountBolt");
        topologyBuilder.setBolt("RedistBolt", new RedisBolt()).localOrShuffleGrouping("WordCountBolt");
​
        // step 3: build the StormTopology object
        StormTopology topology = topologyBuilder.createTopology();
        Config config = new Config();
        config.setNumWorkers(2); // set the number of worker processes
​
​
        // step 4: submit the topology; the local-mode submission is commented out, we submit to the cluster
//        LocalCluster localCluster = new LocalCluster();
//        localCluster.submitTopology("WordCountTopology", config, topology);

        try {
            // submit to the cluster
            StormSubmitter.submitTopology("WordCountTopology", config, topology);
        } catch (AlreadyAliveException e) {
            e.printStackTrace();
        } catch (InvalidTopologyException e) {
            e.printStackTrace();
        } catch (AuthorizationException e) {
            e.printStackTrace();
        }
​
    }
}
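
The cn.itcast.storm.utils.SpringApplication helper imported above is not shown anywhere in the original tutorial. Judging from the submission log later in this section, which shows a ClassPathXmlApplicationContext being refreshed, a hypothetical minimal version might look like the sketch below; the Spring configuration file name is an assumption.

package cn.itcast.storm.utils;

import org.springframework.context.support.ClassPathXmlApplicationContext;

/**
 * Hypothetical sketch of the SpringApplication helper (not from the original tutorial):
 * it bootstraps a ClassPathXmlApplicationContext exactly once.
 */
public class SpringApplication {

    private static volatile ClassPathXmlApplicationContext context;

    public static void init() {
        if (context == null) {
            synchronized (SpringApplication.class) {
                if (context == null) {
                    // assumed file name; use whatever Spring config the project actually ships
                    context = new ClassPathXmlApplicationContext("applicationContext.xml");
                }
            }
        }
    }

    public static <T> T getBean(Class<T> clazz) {
        return context.getBean(clazz);
    }
}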
​

7.10.2 Add the packaging plugin

There is a problem with the current packaging: the jar produced by a plain package command does not contain third-party dependencies (for example the jedis dependency), so the program cannot run. The following plugin is added to solve this.

   <build>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <archive>
                        <manifest>
                            <mainClass>cn.itcast.storm.WordCountTopology</mainClass>
                        </manifest>
                    </archive>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.7.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>

The packaging result looks like this:

Upload the jar to the server and submit it to Storm.

storm jar itcast-bigdata-storm-1.0.0-SNAPSHOT-jar-with-dependencies.jar cn.itcast.storm.WordCountTopology

It fails with an error:

[root@node01 tmp]# storm jar itcast-bigdata-storm-1.0.0-SNAPSHOT-jar-with-dependencies.jar cn.itcast.storm.WordCountTopology
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/export/servers/storm/lib/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/tmp/itcast-bigdata-storm-1.0.0-SNAPSHOT-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.storm.config$read_storm_config.invoke(config.clj:78)
    at org.apache.storm.config$fn__908.invoke(config.clj:100)
    at org.apache.storm.config__init.load(Unknown Source)
    at org.apache.storm.config__init.<clinit>(Unknown Source)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at clojure.lang.RT.classForName(RT.java:2154)
    at clojure.lang.RT.classForName(RT.java:2163)
    at clojure.lang.RT.loadClassForName(RT.java:2182)
    at clojure.lang.RT.load(RT.java:436)
    at clojure.lang.RT.load(RT.java:412)
    at clojure.core$load$fn__5448.invoke(core.clj:5866)
    at clojure.core$load.doInvoke(core.clj:5865)
    at clojure.lang.RestFn.invoke(RestFn.java:408)
    at clojure.core$load_one.invoke(core.clj:5671)
    at clojure.core$load_lib$fn__5397.invoke(core.clj:5711)
    at clojure.core$load_lib.doInvoke(core.clj:5710)
    at clojure.lang.RestFn.applyTo(RestFn.java:142)
    at clojure.core$apply.invoke(core.clj:632)
    at clojure.core$load_libs.doInvoke(core.clj:5753)
    at clojure.lang.RestFn.applyTo(RestFn.java:137)
    at clojure.core$apply.invoke(core.clj:634)
    at clojure.core$use.doInvoke(core.clj:5843)
    at clojure.lang.RestFn.invoke(RestFn.java:408)
    at org.apache.storm.command.config_value$loading__5340__auto____12278.invoke(config_value.clj:16)
    at org.apache.storm.command.config_value__init.load(Unknown Source)
    at org.apache.storm.command.config_value__init.<clinit>(Unknown Source)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at clojure.lang.RT.classForName(RT.java:2154)
    at clojure.lang.RT.classForName(RT.java:2163)
    at clojure.lang.RT.loadClassForName(RT.java:2182)
    at clojure.lang.RT.load(RT.java:436)
    at clojure.lang.RT.load(RT.java:412)
    at clojure.core$load$fn__5448.invoke(core.clj:5866)
    at clojure.core$load.doInvoke(core.clj:5865)
    at clojure.lang.RestFn.invoke(RestFn.java:408)
    at clojure.lang.Var.invoke(Var.java:379)
    at org.apache.storm.command.config_value.<clinit>(Unknown Source)
Caused by: java.lang.RuntimeException: java.io.IOException: Found multiple defaults.yaml resources. You're probably bundling the Storm jars with your topology jar. [jar:file:/export/servers/storm/lib/storm-core-1.1.1.jar!/defaults.yaml, jar:file:/tmp/itcast-bigdata-storm-1.0.0-SNAPSHOT-jar-with-dependencies.jar!/defaults.yaml]
    at org.apache.storm.utils.Utils.findAndReadConfigFile(Utils.java:383)
    at org.apache.storm.utils.Utils.readDefaultConfig(Utils.java:427)
    at org.apache.storm.utils.Utils.readStormConfig(Utils.java:463)
    at org.apache.storm.utils.Utils.<clinit>(Utils.java:177)
    ... 39 more
Caused by: java.io.IOException: Found multiple defaults.yaml resources. You're probably bundling the Storm jars with your topology jar. [jar:file:/export/servers/storm/lib/storm-core-1.1.1.jar!/defaults.yaml, jar:file:/tmp/itcast-bigdata-storm-1.0.0-SNAPSHOT-jar-with-dependencies.jar!/defaults.yaml]
    at org.apache.storm.utils.Utils.getConfigFileInputStream(Utils.java:409)
    at org.apache.storm.utils.Utils.findAndReadConfigFile(Utils.java:362)
    ... 42 more
Running: /export/software/jdk1.8.0_141/bin/java -client -Ddaemon.name= -Dstorm.options= -Dstorm.home=/export/servers/storm -Dstorm.log.dir=/export/servers/storm/logs -Djava.library.path= -Dstorm.conf.file= -cp /export/servers/storm/lib/asm-5.0.3.jar:/export/servers/storm/lib/objenesis-2.1.jar:/export/servers/storm/lib/log4j-core-2.8.2.jar:/export/servers/storm/lib/reflectasm-1.10.1.jar:/export/servers/storm/lib/storm-rename-hack-1.1.1.jar:/export/servers/storm/lib/kryo-3.0.3.jar:/export/servers/storm/lib/log4j-over-slf4j-1.6.6.jar:/export/servers/storm/lib/slf4j-api-1.7.21.jar:/export/servers/storm/lib/servlet-api-2.5.jar:/export/servers/storm/lib/clojure-1.7.0.jar:/export/servers/storm/lib/log4j-slf4j-impl-2.8.2.jar:/export/servers/storm/lib/log4j-api-2.8.2.jar:/export/servers/storm/lib/disruptor-3.3.2.jar:/export/servers/storm/lib/storm-core-1.1.1.jar:/export/servers/storm/lib/minlog-1.3.0.jar:/export/servers/storm/lib/ring-cors-0.1.5.jar:itcast-bigdata-storm-1.0.0-SNAPSHOT-jar-with-dependencies.jar:/export/servers/storm/conf:/export/servers/storm/bin -Dstorm.jar=itcast-bigdata-storm-1.0.0-SNAPSHOT-jar-with-dependencies.jar -Dstorm.dependency.jars= -Dstorm.dependency.artifacts={} cn.itcast.storm.WordCountTopology
689  [main] INFO  o.s.c.s.ClassPathXmlApplicationContext - Refreshing org.springframework.context.support.ClassPathXmlApplicationContext@fe18270: startup date [Wed Jul 18 17:38:08 CST 2018]; root of context hierarchy
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/export/servers/storm/lib/log4j-slf4j-impl-2.8.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/tmp/itcast-bigdata-storm-1.0.0-SNAPSHOT-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.storm.topology.TopologyBuilder$BoltGetter.customGrouping(TopologyBuilder.java:562)
    at org.apache.storm.topology.TopologyBuilder$BoltGetter.customGrouping(TopologyBuilder.java:557)
    at org.apache.storm.topology.TopologyBuilder$BoltGetter.partialKeyGrouping(TopologyBuilder.java:547)
    at org.apache.storm.topology.TopologyBuilder$BoltGetter.partialKeyGrouping(TopologyBuilder.java:476)
    at cn.itcast.storm.WordCountTopology.main(WordCountTopology.java:27)
Caused by: java.lang.RuntimeException: java.io.IOException: Found multiple defaults.yaml resources. You're probably bundling the Storm jars with your topology jar. [jar:file:/export/servers/storm/lib/storm-core-1.1.1.jar!/defaults.yaml, jar:file:/tmp/itcast-bigdata-storm-1.0.0-SNAPSHOT-jar-with-dependencies.jar!/defaults.yaml]
    at org.apache.storm.utils.Utils.findAndReadConfigFile(Utils.java:383)
    at org.apache.storm.utils.Utils.readDefaultConfig(Utils.java:427)
    at org.apache.storm.utils.Utils.readStormConfig(Utils.java:463)
    at org.apache.storm.utils.Utils.<clinit>(Utils.java:177)
    ... 5 more
Caused by: java.io.IOException: Found multiple defaults.yaml resources. You're probably bundling the Storm jars with your topology jar. [jar:file:/export/servers/storm/lib/storm-core-1.1.1.jar!/defaults.yaml, jar:file:/tmp/itcast-bigdata-storm-1.0.0-SNAPSHOT-jar-with-dependencies.jar!/defaults.yaml]
    at org.apache.storm.utils.Utils.getConfigFileInputStream(Utils.java:409)
    at org.apache.storm.utils.Utils.findAndReadConfigFile(Utils.java:362)

The error says that multiple defaults.yaml resources were found. What does that mean?

When we packaged the jar, all dependencies were bundled in, including the Storm jars themselves. The cluster environment already provides those jars, so they clash. The Storm dependency therefore has to be excluded from the packaged jar by marking it as provided:

        <dependency>
            <groupId>org.apache.storm</groupId>
            <artifactId>storm-core</artifactId>
            <version>1.1.1</version>
            <scope>provided</scope>
        </dependency>

7.11 Improving the topology submission logic

The earlier tests show that the submission logic in WordCountTopology has to be switched back and forth between local mode and cluster mode, which is tedious. Let's improve that logic as follows:

package cn.itcast.storm;
​
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.generated.AlreadyAliveException;
import org.apache.storm.generated.AuthorizationException;
import org.apache.storm.generated.InvalidTopologyException;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;
​
public class WordCountTopology {
​
    public static void main(String[] args) {
​
        // step 1: create the TopologyBuilder used to build the topology
        TopologyBuilder topologyBuilder = new TopologyBuilder();
​
        // step 2: register the spout and the bolts
        topologyBuilder.setSpout("RandomSentenceSpout", new RandomSentenceSpout(), 2).setNumTasks(2);
        topologyBuilder.setBolt("SplitSentenceBolt", new SplitSentenceBolt(), 4).localOrShuffleGrouping("RandomSentenceSpout").setNumTasks(4);
        topologyBuilder.setBolt("WordCountBolt", new WordCountBolt(), 2).partialKeyGrouping("SplitSentenceBolt", new Fields("word"));
//        topologyBuilder.setBolt("PrintBolt", new PrintBolt()).shuffleGrouping("WordCountBolt");
        topologyBuilder.setBolt("RedistBolt", new RedisBolt()).localOrShuffleGrouping("WordCountBolt");
​
        // step 3: build the StormTopology object
        StormTopology topology = topologyBuilder.createTopology();
        Config config = new Config();
​
​
        if (args == null || args.length == 0) {
            // local mode

            // step 4: submit the topology to the local simulated cluster for testing
            LocalCluster localCluster = new LocalCluster();
            localCluster.submitTopology("WordCountTopology", config, topology);
        } else {
            // cluster mode

            config.setNumWorkers(2); // set the number of worker processes
            try {
                // submit to the cluster, using the first argument as the topology name
                StormSubmitter.submitTopology(args[0], config, topology);
            } catch (AlreadyAliveException e) {
                e.printStackTrace();
            } catch (InvalidTopologyException e) {
                e.printStackTrace();
            } catch (AuthorizationException e) {
                e.printStackTrace();
            }
        }
    }
}

Usage in cluster mode:

# the argument WordCountTopology2 is the name given to the topology
​
storm jar itcast-bigdata-storm-1.0.0-SNAPSHOT-jar-with-dependencies.jar cn.itcast.storm.WordCountTopology WordCountTopology2

You can see that it was submitted successfully.


Reposted from blog.csdn.net/qq_41571974/article/details/83243495