Building and Using a Big Data Platform, Part 10: Integrating HDFS, Kafka, Storm, and HBase

This series of guides uses a real cluster environment, not a pseudo-distributed one, built on three Tencent Cloud servers.

Alternatively, visit my personal blog site (link).

Integrating the components

In many Hadoop-based application scenarios we need both offline and real-time analysis of the data. Offline analysis is easily handled with Hive, but Hive is not suitable for real-time requirements. For real-time scenarios we can use Storm, a real-time processing system that provides a computation model for real-time applications and is straightforward to program against. To unify offline and real-time computation, we generally want both paths to share a single data source as input, with the data then flowing into the real-time system and the offline analysis system to be processed separately. To achieve this, the data source (for example, logs collected with Flume) can be connected directly to a message broker such as Kafka: Flume and Kafka are integrated so that Flume acts as the message producer, publishing the data it generates (log data, business request data, and so on) to Kafka, while a Storm topology subscribes to Kafka as the consumer (a minimal sketch of the producer side follows the list below). The Storm cluster then handles the following two scenarios:

  • Use a Storm topology to analyze and process the data in real time
  • Integrate Storm with HDFS, writing the processed messages to HDFS for offline analysis
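Before wiring up Storm, it helps to see how messages reach Kafka in the first place. Below is a minimal sketch of the producer side using the plain Kafka Java client instead of Flume; the class name LogProducer, the broker list (master/slave1/slave2 on the default port 9092), and the topic my-replicated-topic5 are assumptions based on the cluster used in this series, so adjust them to your environment.

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class LogProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // assumed broker list; Kafka brokers listen on port 9092 by default
        props.put("bootstrap.servers", "master:9092,slave1:9092,slave2:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Producer<String, String> producer = new KafkaProducer<String, String>(props);
        // publish a few sample log lines to the topic that the Storm topology will consume
        for (int i = 0; i < 10; i++) {
            producer.send(new ProducerRecord<String, String>("my-replicated-topic5", "log line " + i));
        }
        producer.close();
    }
}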

Problems encountered

If you build the project from jars collected locally, the packages get messy and their versions are inconsistent, so it is better to create the project with Maven: find each package on this website and add the corresponding dependency to the pom file. After adding them, right-click the project and run Maven -> Update Project to refresh it.
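For illustration only, a dependency entry in pom.xml looks like the sketch below; the version 0.9.6 is a placeholder for the Storm artifacts and must match the release actually installed on the cluster.

<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-core</artifactId>
    <version>0.9.6</version>
    <!-- provided: the cluster already ships these classes under storm/lib -->
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-kafka</artifactId>
    <version>0.9.6</version>
</dependency>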

Before testing a new topology, completely delete all related files under kafka, storm, and zookeeper first (so the whole system is effectively fresh).

  • If you see the error kill: sending signal to 23543 failed: No such process, refer to this link to solve it.


  • For java.lang.NoClassDefFoundError: Could not initialize class org.apache.log4j.Log4jLoggerFactory, refer to this, this, and this link to solve it.

  • The errors listed above are only the tip of the iceberg; many others were fixed on the spot and forgotten before they could be written down.

  • I strongly recommend building the project with Maven, and keeping all local jars in sync with the ones under storm/lib on the cluster (note: on every host); whatever the cluster is missing, copy it in from the Maven repository.

  • Whenever a problem occurs, check Storm's logs, located in the logs folder under the Storm installation directory. Make sure you open the log for the right worker port.

  • Do not set the parallelism too high; the machines cannot handle the memory pressure.

storm+kafka

At this point I had not yet switched to Maven to build the project; in hindsight, use Maven whenever possible.
The following jars were used to build this project:
1. kafka_xxx-xxx.jar
2. kafka-client-xxx.jar
3. storm-kafka-xxx.jar
4. json-simple-xxx.jar
5. guava-xxx.jar
6. curator-client-xxx.jar
7. curator-framework-xxx.jar
8. storm-kafka-client-xxx.jar (placed in the extlib folder)
9. If anything else is missing, add the dependency indicated by the error message

Sample code

(only lightly commented)

package cn.colony.cloud.stormkafka;

import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Map.Entry;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.generated.AlreadyAliveException;
import backtype.storm.generated.InvalidTopologyException;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class MyKafkaTopology {

    public static class KafkaWordSplitter extends BaseRichBolt {

         private static final Log LOG = LogFactory.getLog(KafkaWordSplitter.class);
         private static final long serialVersionUID = 886149197481637894L;
         private OutputCollector collector;

         @Override
         public void prepare(Map stormConf, TopologyContext context,
                   OutputCollector collector) {
              this.collector = collector;              
         }

         @Override
         public void execute(Tuple input) {
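               // split the line received from Kafka into words; each word is emitted anchored
               // to the input tuple so that a failed tuple can be replayed from the spout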
              String line = input.getString(0);
              LOG.info("RECV[kafka -> splitter] " + line);
              String[] words = line.split("\\s+");
              for(String word : words) {
                   LOG.info("EMIT[splitter -> counter] " + word);
                   collector.emit(input, new Values(word, 1));
              }
              collector.ack(input);
         }

         @Override
         public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("word", "count"));         
         }

    }

    public static class WordCounter extends BaseRichBolt {

         private static final Log LOG = LogFactory.getLog(WordCounter.class);
         private static final long serialVersionUID = 886149197481637894L;
         private OutputCollector collector;
         private Map<String, AtomicInteger> counterMap;

         @Override
         public void prepare(Map stormConf, TopologyContext context,
                   OutputCollector collector) {
              this.collector = collector;    
              this.counterMap = new HashMap<String, AtomicInteger>();
         }

         @Override
         public void execute(Tuple input) {
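               // keep a running count per word in memory; the counts are only logged here, nothing is emitted downstream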
              String word = input.getString(0);
              int count = input.getInteger(1);
              LOG.info("RECV[splitter -> counter] " + word + " : " + count);
              AtomicInteger ai = this.counterMap.get(word);
              if(ai == null) {
                   ai = new AtomicInteger();
                   this.counterMap.put(word, ai);
              }
              ai.addAndGet(count);
              collector.ack(input);
              LOG.info("CHECK statistics map: " + this.counterMap);
         }

         @Override
         public void cleanup() {
              LOG.info("The final result:");
              Iterator<Entry<String, AtomicInteger>> iter = this.counterMap.entrySet().iterator();
              while(iter.hasNext()) {
                   Entry<String, AtomicInteger> entry = iter.next();
                   LOG.info(entry.getKey() + "\t:\t" + entry.getValue().get());
              }

         }

         @Override
         public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("word", "count"));         
         }
    }

    public static void main(String[] args) throws AlreadyAliveException, InvalidTopologyException, InterruptedException {
         String zks = "master:2181,slave1:2181,slave2:2181";
         String topic = "my-replicated-topic5";
         String zkRoot = "/storm"; // default zookeeper root configuration for storm
         String id = "word";

         BrokerHosts brokerHosts = new ZkHosts(zks,"/kafka/brokers");
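          // the "/kafka/brokers" path above reflects the /kafka chroot under which the brokers register in ZooKeeper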
         SpoutConfig spoutConf = new SpoutConfig(brokerHosts, topic, zkRoot, id);
         spoutConf.scheme = new SchemeAsMultiScheme(new StringScheme());
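          // false: resume from the offset recorded in ZooKeeper instead of replaying the topic from the beginning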
         spoutConf.forceFromStart = false;
         spoutConf.zkServers = Arrays.asList(new String[] {"master", "slave1", "slave2"});
         spoutConf.zkPort = 2181;

         TopologyBuilder builder = new TopologyBuilder();
         builder.setSpout("kafka-reader", new KafkaSpout(spoutConf), 1);
         builder.setBolt("word-splitter", new KafkaWordSplitter(), 1).shuffleGrouping("kafka-reader");
         builder.setBolt("word-counter", new WordCounter()).fieldsGrouping("word-splitter", new Fields("word"));

         Config conf = new Config();

         String name = MyKafkaTopology.class.getSimpleName();
         if (args != null && args.length > 0) {
              // Nimbus host name passed from command line
              conf.put(Config.NIMBUS_HOST, args[0]);
              conf.setNumWorkers(3);
              StormSubmitter.submitTopologyWithProgressBar(name, conf, builder.createTopology());
         } else {
              conf.setMaxTaskParallelism(3);
              LocalCluster cluster = new LocalCluster();
              cluster.submitTopology(name, conf, builder.createTopology());
              Thread.sleep(60000);
              cluster.shutdown();
         }
    }
}


Pay special attention to whether the Kafka spout is running normally. The key metric is the uptime of its executors in the Storm UI, which shows how long the spout has been running without problems.

When I came back in the evening to check on the process, it had already died.
Its last words:

2018-07-29T17:53:27.218+0800 k.c.SimpleConsumer [INFO] Reconnect due to socket error: java.nio.channels.ClosedByInterruptException
2018-07-29T17:53:27.224+0800 s.k.KafkaUtils [WARN] Network error when fetching messages:
java.nio.channels.ClosedChannelException: null

Besides the problem above, the two slave hosts also seem to have a memory leak.
Reference links are here, here, and here.

storm+hdfs

The official integration guide is here.
Some problems encountered during the integration and their solutions are here.
Other reference links are here, here, and here.

After the code is written, remember to right-click the project and run it as Maven clean.
Then go to the command line, cd into the project directory, and run mvn clean package. The generated jar is in the project's target folder; upload this jar to the server and run it with the usual storm jar command.

The following jars were used to build this project:
1. hadoop-client-xxx.jar
2. hadoop-common-xxx.jar
3. hadoop-hdfs-xxx.jar
4. hadoop-hdfs-client-xxx.jar
5. storm-hdfs-xxx.jar
6. If anything else is missing, add the dependency indicated by the error message

Sample code

(only lightly commented)

package stormhdfs;

import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Map;
import java.util.Random;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.storm.hdfs.bolt.HdfsBolt;
import org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat;
import org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat;
import org.apache.storm.hdfs.bolt.format.FileNameFormat;
import org.apache.storm.hdfs.bolt.format.RecordFormat;
import org.apache.storm.hdfs.bolt.rotation.FileRotationPolicy;
import org.apache.storm.hdfs.bolt.rotation.TimedRotationPolicy;
import org.apache.storm.hdfs.bolt.rotation.TimedRotationPolicy.TimeUnit;
import org.apache.storm.hdfs.bolt.sync.CountSyncPolicy;
import org.apache.storm.hdfs.bolt.sync.SyncPolicy;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.generated.AlreadyAliveException;
import backtype.storm.generated.InvalidTopologyException;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

public class StormToHDFSTopology {

     public static class EventSpout extends BaseRichSpout {

          private static final Log LOG = LogFactory.getLog(EventSpout.class);
          private static final long serialVersionUID = 886149197481637894L;
          private SpoutOutputCollector collector;
          private Random rand;
          private String[] records;

          public void open(Map conf, TopologyContext context,
                    SpoutOutputCollector collector) {
               this.collector = collector;    
               rand = new Random();
               records = new String[] {
                         "10001     ef2da82d4c8b49c44199655dc14f39f6     4.2.1     HUAWEI G610-U00     HUAWEI     2     70:72:3c:73:8b:22     2014-10-13 12:36:35",
                         "10001     ffb52739a29348a67952e47c12da54ef     4.3     GT-I9300     samsung     2     50:CC:F8:E4:22:E2     2014-10-13 12:36:02",
                         "10001     ef2da82d4c8b49c44199655dc14f39f6     4.2.1     HUAWEI G610-U00     HUAWEI     2     70:72:3c:73:8b:22     2014-10-13 12:36:35"
               };
          }

          public void nextTuple() {
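               // once a second, emit one randomly chosen sample record tagged with a timestamp,
               // which the topology later uses for fields grouping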
               Utils.sleep(1000);
               DateFormat df = new SimpleDateFormat("yyyy-MM-dd_HH-mm-ss");
               Date d = new Date(System.currentTimeMillis());
               String minute = df.format(d);
               String record = records[rand.nextInt(records.length)];
               LOG.info("EMIT[spout -> hdfs] " + minute + " : " + record);
               collector.emit(new Values(minute, record));
          }

          public void declareOutputFields(OutputFieldsDeclarer declarer) {
               declarer.declare(new Fields("minute", "record"));         
          }


     }

     public static void main(String[] args) throws AlreadyAliveException, InvalidTopologyException, InterruptedException {
          // use " : " instead of the default "," as the field delimiter
          RecordFormat format = new DelimitedRecordFormat()
                  .withFieldDelimiter(" : ");

          // sync the filesystem after every 1k tuples
          SyncPolicy syncPolicy = new CountSyncPolicy(1000);

          // rotate files 
          FileRotationPolicy rotationPolicy = new TimedRotationPolicy(1.0f, TimeUnit.MINUTES);

          FileNameFormat fileNameFormat = new DefaultFileNameFormat()
                  .withPath("/foo/").withPrefix("app_").withExtension(".log");

          HdfsBolt hdfsBolt = new HdfsBolt()
                  .withFsUrl("hdfs://master:9000")
                  .withFileNameFormat(fileNameFormat)
                  .withRecordFormat(format)
                  .withRotationPolicy(rotationPolicy)
                  .withSyncPolicy(syncPolicy);

          TopologyBuilder builder = new TopologyBuilder();
          builder.setSpout("event-spout", new EventSpout(), 3);
          builder.setBolt("hdfs-bolt", hdfsBolt, 2).fieldsGrouping("event-spout", new Fields("minute"));

          Config conf = new Config();

          String name = StormToHDFSTopology.class.getSimpleName();
          if (args != null && args.length > 0) {
               conf.put(Config.NIMBUS_HOST, args[0]);
               conf.setNumWorkers(3);
               StormSubmitter.submitTopologyWithProgressBar(name, conf, builder.createTopology());
          } else {
               conf.setMaxTaskParallelism(3);
               LocalCluster cluster = new LocalCluster();
               cluster.submitTopology(name, conf, builder.createTopology());
               Thread.sleep(60000);
               cluster.shutdown();
          }
     }
}

kafka+storm+hdfs

The official integration guide is here.

Notes

The code is built the same way as before, so I will not repeat it.
Before submitting the topology, pay attention to the following points:

  • Use the previous pom.xml; it already includes the Kafka dependencies
  • The Storm and Kafka clusters are up and running: the nimbus, core (UI), supervisor, kafka, and zookeeper processes plus the Hadoop-related processes are all present
  • The Kafka topic has been created in advance; on a cloud cluster I do not recommend giving this topic many replicas or partitions, for performance reasons (our lab servers are not much better; clusters are demanding on hardware)
  • Create the /kafka path in ZooKeeper; the KafkaSpout will use this path
  • In the Maven project, build with the maven-shade plugin in pom.xml and specify the main class there (a sketch of the plugin configuration follows this list)
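A minimal sketch of that plugin block, assuming maven-shade-plugin 2.4.3 as a placeholder version and the KafkaStormHDFS class from the sample code below as the main class:

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.3</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <transformers>
                                <!-- write the main class into the jar manifest so it can be run with storm jar -->
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>kafkastormhdfs.KafkaStormHDFS</mainClass>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>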

Sample code

(only lightly commented)

package kafkastormhdfs;

import java.util.Arrays;
import java.util.Map;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.storm.hdfs.bolt.HdfsBolt;
import org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat;
import org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat;
import org.apache.storm.hdfs.bolt.format.FileNameFormat;
import org.apache.storm.hdfs.bolt.format.RecordFormat;
import org.apache.storm.hdfs.bolt.rotation.FileRotationPolicy;
import org.apache.storm.hdfs.bolt.rotation.TimedRotationPolicy;
import org.apache.storm.hdfs.bolt.rotation.TimedRotationPolicy.TimeUnit;
import org.apache.storm.hdfs.bolt.sync.CountSyncPolicy;
import org.apache.storm.hdfs.bolt.sync.SyncPolicy;

import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.generated.AlreadyAliveException;
import backtype.storm.generated.InvalidTopologyException;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class KafkaStormHDFS {

     public static class KafkaWordToUpperCase extends BaseRichBolt {

          private static final Log LOG = LogFactory.getLog(KafkaWordToUpperCase.class);
          private static final long serialVersionUID = -5207232012035109026L;
          private OutputCollector collector;

          public void prepare(Map stormConf, TopologyContext context,
                    OutputCollector collector) {
               this.collector = collector;              
          }

          public void execute(Tuple input) {
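               // upper-case every non-empty line coming from Kafka and forward it together with its length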
               String line = input.getString(0).trim();
               LOG.info("RECV[kafka -> splitter] " + line);
               if(!line.isEmpty()) {
                    String upperLine = line.toUpperCase();
                    LOG.info("EMIT[splitter -> counter] " + upperLine);
                    collector.emit(input, new Values(upperLine, upperLine.length()));
               }
               collector.ack(input);
          }

          public void declareOutputFields(OutputFieldsDeclarer declarer) {
               declarer.declare(new Fields("line", "len"));         
          }

     }

     public static class RealtimeBolt extends BaseRichBolt {

          private static final Log LOG = LogFactory.getLog(RealtimeBolt.class);
          private static final long serialVersionUID = -4115132557403913367L;
          private OutputCollector collector;

          public void prepare(Map stormConf, TopologyContext context,
                    OutputCollector collector) {
               this.collector = collector;              
          }

          public void execute(Tuple input) {
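               // the real-time branch of the topology: for now it only logs the line it receives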
               String line = input.getString(0).trim();
               LOG.info("REALTIME: " + line);
               collector.ack(input);
          }

          public void declareOutputFields(OutputFieldsDeclarer declarer) {

          }

     }

     public static void main(String[] args) throws AlreadyAliveException, InvalidTopologyException, InterruptedException {

          // Configure Kafka
          String zks = "master:2181,slave1:2181,slave2:2181";
          String topic = "my-replicated-topic5";
          String zkRoot = "/storm"; // default zookeeper root configuration for storm
          String id = "word";
          BrokerHosts brokerHosts = new ZkHosts(zks,"/kafka/brokers"); // note the /kafka chroot that the brokers register under
          SpoutConfig spoutConf = new SpoutConfig(brokerHosts, topic, zkRoot, id);
          spoutConf.scheme = new SchemeAsMultiScheme(new StringScheme());
          spoutConf.forceFromStart = false;
          spoutConf.zkServers = Arrays.asList(new String[] {"master", "slave1", "slave2"});
          spoutConf.zkPort = 2181;

          // Configure HDFS bolt
          RecordFormat format = new DelimitedRecordFormat()
                  .withFieldDelimiter("\t"); // use "\t" instead of "," for field delimiter
          SyncPolicy syncPolicy = new CountSyncPolicy(10); // sync the filesystem after every 10 tuples
          FileRotationPolicy rotationPolicy = new TimedRotationPolicy(1.0f, TimeUnit.MINUTES); // rotate files
          FileNameFormat fileNameFormat = new DefaultFileNameFormat()
                  .withPath("/foo/").withPrefix("kafkastormhdfs_").withExtension(".log"); // set file name format
          HdfsBolt hdfsBolt = new HdfsBolt()
                  .withFsUrl("hdfs://master:9000")
                  .withFileNameFormat(fileNameFormat)
                  .withRecordFormat(format)
                  .withRotationPolicy(rotationPolicy)
                  .withSyncPolicy(syncPolicy);

          // configure & build topology
          TopologyBuilder builder = new TopologyBuilder();
          builder.setSpout("kafka-reader", new KafkaSpout(spoutConf), 1);
          builder.setBolt("to-upper", new KafkaWordToUpperCase(), 1).shuffleGrouping("kafka-reader");
          builder.setBolt("hdfs-bolt", hdfsBolt, 1).shuffleGrouping("to-upper");
          builder.setBolt("realtime", new RealtimeBolt(), 1).shuffleGrouping("to-upper");

          // submit topology
          Config conf = new Config();
          String name = KafkaStormHDFS.class.getSimpleName();
          if (args != null && args.length > 0) {
               String nimbus = args[0];
               conf.put(Config.NIMBUS_HOST, nimbus);
               conf.setNumWorkers(3);
               StormSubmitter.submitTopologyWithProgressBar(name, conf, builder.createTopology());
          } else {
               conf.setMaxTaskParallelism(3);
               LocalCluster cluster = new LocalCluster();
               cluster.submitTopology(name, conf, builder.createTopology());
               Thread.sleep(60000);
               cluster.shutdown();
          }
     }
}

kafka+storm+hbase

Notes

  • Make sure storm, kafka, and hbase are all running properly before running the code
  • When consuming messages from Kafka, give each topology a different consumer id
  • The jars under storm/lib on the cluster must match the Maven dependencies used locally, and both must match the software versions actually running on the cluster
  • Use the correct pom dependency definitions; the storm-hbase related entries are as follows
<dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-hbase</artifactId>
        <version>1.1.1</version>
        <scope>provided</scope>
        <exclusions>
            <exclusion>
                <groupId>org.apache.hbase</groupId>
                <artifactId>hbase-client</artifactId>
            </exclusion>
            <exclusion>
                <groupId>org.apache.hbase</groupId>
                <artifactId>hbase-common</artifactId>
            </exclusion>
            <exclusion>
                <groupId>org.apache.hbase</groupId>
                <artifactId>hbase-server</artifactId>
            </exclusion>
        </exclusions>
    </dependency>

    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-client</artifactId>
        <version>1.3.1</version>
        <exclusions>
            <exclusion>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-log4j12</artifactId>
            </exclusion>
        </exclusions>
    </dependency>

    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-common</artifactId>
        <version>1.3.1</version>
        <exclusions>
            <exclusion>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-log4j12</artifactId>
            </exclusion>
        </exclusions>
    </dependency>

    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-server</artifactId>
        <version>1.3.1</version>
        <exclusions>
            <exclusion>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-log4j12</artifactId>
            </exclusion>
        </exclusions>
    </dependency>

Sample code

SpliterBolt

package cn.colony.cloud.kafkastormhbase;

import java.util.StringTokenizer;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class SpliterBolt extends BaseBasicBolt{

    private static final Log LOG = LogFactory.getLog(SpliterBolt.class);
    private static final long serialVersionUID = -6406436449580657911L;

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
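        // split the sentence into words and emit each one; BaseBasicBolt acks the input tuple
        // automatically after execute returns, so no explicit ack is needed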
        String sentence = tuple.getString(0);
        LOG.info("SpliterBolt RECV: " + sentence);
        StringTokenizer iter = new StringTokenizer(sentence);
        while (iter.hasMoreElements()){
            String value = iter.nextToken();
            LOG.info("SpliterBolt EMIT: " + value);
            collector.emit(new Values(value));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));

    }
}

CountBolt

package cn.colony.cloud.kafkastormhbase;

import java.util.HashMap;
import java.util.Map;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;


public class CountBolt extends BaseBasicBolt{

    private static final Log LOG = LogFactory.getLog(CountBolt.class);
    private static final long serialVersionUID = 4906714131213303536L;
    Map<String, Integer> counts = new HashMap<String, Integer>(); 

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String word = tuple.getString(0);
        LOG.info("CountBolt RECV: " + word);
        Integer count = counts.get(word);
        if (count == null)
            count = 0;
        count++;
        counts.put(word, count);
        // log the running count before emitting the word and its updated count
        LOG.info("CountBolt EMIT: " + word+"\t"+count);
        collector.emit(new Values(word,count.toString()));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word","count"));
    }
}

HbaseTopology

package cn.colony.cloud.kafkastormhbase;

import java.util.Arrays;
import java.util.Map;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.generated.AlreadyAliveException;
import org.apache.storm.generated.AuthorizationException;
import org.apache.storm.generated.InvalidTopologyException;
import org.apache.storm.hbase.bolt.HBaseBolt;
import org.apache.storm.hbase.bolt.mapper.SimpleHBaseMapper;
import org.apache.storm.kafka.BrokerHosts;
import org.apache.storm.kafka.KafkaSpout;
import org.apache.storm.kafka.SpoutConfig;
import org.apache.storm.kafka.StringScheme;
import org.apache.storm.kafka.ZkHosts;
import org.apache.storm.shade.org.apache.curator.shaded.com.google.common.collect.Maps;
import org.apache.storm.spout.SchemeAsMultiScheme;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class HbaseTopology {

    private static final Log LOG = LogFactory.getLog(HbaseTopology.class);

    public static void main(String[] args) throws AlreadyAliveException, InvalidTopologyException, AuthorizationException{
        String zks = "master:2181,slave1:2181,slave2:2181";
        String topic = "HbaseTopology";
        String zkRoot = "/storm"; // default zookeeper root configuration for storm
        String id = "wordcounthbase"; // different id for different topology

        BrokerHosts brokerHosts = new ZkHosts(zks,"/kafka/brokers");
        SpoutConfig spoutConf = new SpoutConfig(brokerHosts, topic, zkRoot, id);
        spoutConf.scheme = new SchemeAsMultiScheme(new StringScheme());
        spoutConf.zkServers = Arrays.asList(new String[] {"master", "slave1", "slave2"});
        spoutConf.zkPort = 2181;
        KafkaSpout kafkaSpout = new KafkaSpout(spoutConf);


        Config conf = new Config();
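        // HBase client settings, handed to the HBaseBolt below through the "HBCONFIG" key in the topology configuration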
        Map<String, String> HBConfig = Maps.newHashMap();
        HBConfig.put("hbase.rootdir", "hdfs://master:9000/hbase");
        HBConfig.put("hbase.zookeeper.property.clientPort", "2181");
        HBConfig.put("hbase.zookeeper.quorum","master:2181,slave1:2181,slave2:2181");
        HBConfig.put("zookeeper.znode.parent", "/hbase");
        conf.put("HBCONFIG", HBConfig);

        SimpleHBaseMapper mapper = new SimpleHBaseMapper();
        mapper.withColumnFamily("result");            // column family
        mapper.withColumnFields(new Fields("count")); // column qualifier
        mapper.withRowKeyField("word");               // row key

        HBaseBolt hbaseBolt = new HBaseBolt("wordcount", mapper).withConfigKey("HBCONFIG");
        hbaseBolt.withFlushIntervalSecs(10);

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", kafkaSpout,1);
        builder.setBolt("word-spliter", new SpliterBolt(),1).shuffleGrouping("kafka-spout");
        builder.setBolt("counter", new CountBolt(),1).fieldsGrouping("word-spliter", new Fields("word"));
        builder.setBolt("hbase", hbaseBolt,1).shuffleGrouping("counter");


        conf.put(Config.NIMBUS_HOST, "master");
        conf.setNumWorkers(4);
        StormSubmitter.submitTopology("hbase-topology", conf, builder.createTopology());
    }
}


Reprinted from blog.csdn.net/moquancsdn/article/details/81700500