这个系列指南使用真实集群搭建环境,不是伪集群,用了三台腾讯云服务器
或者访问我的个人博客站点,链接
各个组件的整合
在基于Hadoop平台的很多应用场景中,我们需要对数据进行离线和实时分析,离线分析可以很容易地借助于Hive来实现统计分析,但是对于实时的 需求Hive就不合适了。实时应用场景可以使用Storm,它是一个实时处理系统,它为实时处理类应用提供了一个计算模型,可以很容易地进行编程处理。为 了统一离线和实时计算,一般情况下,我们都希望将离线和实时计算的数据源的集合统一起来作为输入,然后将数据的流向分别经由实时系统和离线分析系统,分别进行分析处理,这时我们可以考虑将数据源(如使用Flume收集日志)直接连接一个消息中间件,如Kafka,可以整合 Flume+Kafka,Flume作为消息的Producer,生产的消息数据(日志数据、业务请求数据等等)发布到Kafka中,然后通过订阅的方 式,使用Storm的Topology作为消息的Consumer,在Storm集群中分别进行如下两个需求场景的处理:
- 直接使用Storm的Topology对数据进行实时分析处理
- 整合Storm+HDFS,将消息处理后写入HDFS进行离线分析处理
出现的一些问题
使用本地包来建立项目的话,各种包很混杂,版本不一,所以建议使用maven建立项目,从这个网站上找到包后,添加相应的dependency到pom文件中。添加完成之后右击项目,使用maven->update project来更新项目。
在进行新的topo测试时,先彻底删除kafka,storm,zookeeper下所有的相关文件。(相当于整个系统是新的)
出现kill: sending signal to 23543 failed: No such process的报错时,参考这个链接来解决。
参考这个链接来解决。java.lang.NoClassDefFoundError: Could not initialize class org.apache.log4j.Log4jLoggerFactory 参考这个以及这个还有这个链接来解决
上述的错误只是凤毛麟角,很多错误当场解决后没有及时记录便遗忘了。
强烈建议使用maven构建项目,然后将本地所有的包和集群中(注意,是
所有
主机)storm/lib下的包同步
,集群中少的,就从maven仓库中复制进去。出现任何问题都要看storm的log,目录在storm安装目录下的Logs文件夹下。注意找准端口号。
不要设置过大的并发度。内存吃不消。
storm+kafka
到这一步的时候,我还没有用maven来构建项目,事后证明尽量使用maven
如下是构建这个项目的时候使用到的jar包列表
1. kafka_xxx-xxx.jar
2. kafka-client-xxx.jar
3. storm-kafka-xxx.jar
4. json-simple-xxx.jar
5. guava-xxx.jar
6. curator-client-xxx.jar
7. curator-framework-xxx.jar
8. storm-kafka-client-xxx.jar(放在extlib文件夹里)
9. 有其他的话按照报错信息添加依赖包
样例代码
(注释还没有写)
package cn.colony.cloud.stormkafka;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Map.Entry;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.generated.AlreadyAliveException;
import backtype.storm.generated.InvalidTopologyException;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;
public class MyKafkaTopology {
public static class KafkaWordSplitter extends BaseRichBolt {
private static final Log LOG = LogFactory.getLog(KafkaWordSplitter.class);
private static final long serialVersionUID = 886149197481637894L;
private OutputCollector collector;
@Override
public void prepare(Map stormConf, TopologyContext context,
OutputCollector collector) {
this.collector = collector;
}
@Override
public void execute(Tuple input) {
String line = input.getString(0);
LOG.info("RECV[kafka -> splitter] " + line);
String[] words = line.split("\\s+");
for(String word : words) {
LOG.info("EMIT[splitter -> counter] " + word);
collector.emit(input, new Values(word, 1));
}
collector.ack(input);
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word", "count"));
}
}
public static class WordCounter extends BaseRichBolt {
private static final Log LOG = LogFactory.getLog(WordCounter.class);
private static final long serialVersionUID = 886149197481637894L;
private OutputCollector collector;
private Map<String, AtomicInteger> counterMap;
@Override
public void prepare(Map stormConf, TopologyContext context,
OutputCollector collector) {
this.collector = collector;
this.counterMap = new HashMap<String, AtomicInteger>();
}
@Override
public void execute(Tuple input) {
String word = input.getString(0);
int count = input.getInteger(1);
LOG.info("RECV[splitter -> counter] " + word + " : " + count);
AtomicInteger ai = this.counterMap.get(word);
if(ai == null) {
ai = new AtomicInteger();
this.counterMap.put(word, ai);
}
ai.addAndGet(count);
collector.ack(input);
LOG.info("CHECK statistics map: " + this.counterMap);
}
@Override
public void cleanup() {
LOG.info("The final result:");
Iterator<Entry<String, AtomicInteger>> iter = this.counterMap.entrySet().iterator();
while(iter.hasNext()) {
Entry<String, AtomicInteger> entry = iter.next();
LOG.info(entry.getKey() + "\t:\t" + entry.getValue().get());
}
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word", "count"));
}
}
public static void main(String[] args) throws AlreadyAliveException, InvalidTopologyException, InterruptedException {
String zks = "master:2181,slave1:2181,slave2:2181";
String topic = "my-replicated-topic5";
String zkRoot = "/storm"; // default zookeeper root configuration for storm
String id = "word";
BrokerHosts brokerHosts = new ZkHosts(zks,"/kafka/brokers");
SpoutConfig spoutConf = new SpoutConfig(brokerHosts, topic, zkRoot, id);
spoutConf.scheme = new SchemeAsMultiScheme(new StringScheme());
spoutConf.forceFromStart = false;
spoutConf.zkServers = Arrays.asList(new String[] {"master", "slave1", "slave2"});
spoutConf.zkPort = 2181;
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("kafka-reader", new KafkaSpout(spoutConf), 1);
builder.setBolt("word-splitter", new KafkaWordSplitter(), 1).shuffleGrouping("kafka-reader");
builder.setBolt("word-counter", new WordCounter()).fieldsGrouping("word-splitter", new Fields("word"));
Config conf = new Config();
String name = MyKafkaTopology.class.getSimpleName();
if (args != null && args.length > 0) {
// Nimbus host name passed from command line
conf.put(Config.NIMBUS_HOST, args[0]);
conf.setNumWorkers(3);
StormSubmitter.submitTopologyWithProgressBar(name, conf, builder.createTopology());
} else {
conf.setMaxTaskParallelism(3);
LocalCluster cluster = new LocalCluster();
cluster.submitTopology(name, conf, builder.createTopology());
Thread.sleep(60000);
cluster.shutdown();
}
}
}
要特别关注kafka的spout是否运行正常。重点是底下的executors中的uptime,它表示这个spout正常的时长。
到晚上来看进程的时候,发现进程已经死了
最后的遗言:
2018-07-29T17:53:27.218+0800 k.c.SimpleConsumer [INFO] Reconnect due to socket error: java.nio.channels.ClosedByInterruptException
2018-07-29T17:53:27.224+0800 s.k.KafkaUtils [WARN] Network error when fetching messages:
java.nio.channels.ClosedChannelException: null
除了上述问题之外,两台slave主机貌似存在内存泄漏
情况。
参考链接在这里,这里还有这里
storm+hdfs
官方的集成链接在这里
整合过程中的一些问题解决点这里
其他参考链接在这里,这里还有这里
写好代码后,记得右击项目,run as maven clean
然后到命令行,进入项目地址,mvn clean package,生成的Jar包在项目的target文件里面,将这个jar包传到服务器上,再使用正常的storm jar命令执行。
如下是构建这个项目的时候使用到的jar包列表
1. hadoop-client-xxx.jar
2. hadoop-common-xxx.jar
3. hadoop-hdfs-xxx.jar
4. hadoop-hdfs-client-xxx.jar
5. storm-hdfs-xxx.jar
8. 有其他的话按照报错信息添加依赖包
样例代码
(还没有写注释)
package stormhdfs;
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Map;
import java.util.Random;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.storm.hdfs.bolt.HdfsBolt;
import org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat;
import org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat;
import org.apache.storm.hdfs.bolt.format.FileNameFormat;
import org.apache.storm.hdfs.bolt.format.RecordFormat;
import org.apache.storm.hdfs.bolt.rotation.FileRotationPolicy;
import org.apache.storm.hdfs.bolt.rotation.TimedRotationPolicy;
import org.apache.storm.hdfs.bolt.rotation.TimedRotationPolicy.TimeUnit;
import org.apache.storm.hdfs.bolt.sync.CountSyncPolicy;
import org.apache.storm.hdfs.bolt.sync.SyncPolicy;
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.generated.AlreadyAliveException;
import backtype.storm.generated.InvalidTopologyException;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;
public class StormToHDFSTopology {
public static class EventSpout extends BaseRichSpout {
private static final Log LOG = LogFactory.getLog(EventSpout.class);
private static final long serialVersionUID = 886149197481637894L;
private SpoutOutputCollector collector;
private Random rand;
private String[] records;
public void open(Map conf, TopologyContext context,
SpoutOutputCollector collector) {
this.collector = collector;
rand = new Random();
records = new String[] {
"10001 ef2da82d4c8b49c44199655dc14f39f6 4.2.1 HUAWEI G610-U00 HUAWEI 2 70:72:3c:73:8b:22 2014-10-13 12:36:35",
"10001 ffb52739a29348a67952e47c12da54ef 4.3 GT-I9300 samsung 2 50:CC:F8:E4:22:E2 2014-10-13 12:36:02",
"10001 ef2da82d4c8b49c44199655dc14f39f6 4.2.1 HUAWEI G610-U00 HUAWEI 2 70:72:3c:73:8b:22 2014-10-13 12:36:35"
};
}
public void nextTuple() {
Utils.sleep(1000);
DateFormat df = new SimpleDateFormat("yyyy-MM-dd_HH-mm-ss");
Date d = new Date(System.currentTimeMillis());
String minute = df.format(d);
String record = records[rand.nextInt(records.length)];
LOG.info("EMIT[spout -> hdfs] " + minute + " : " + record);
collector.emit(new Values(minute, record));
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("minute", "record"));
}
}
public static void main(String[] args) throws AlreadyAliveException, InvalidTopologyException, InterruptedException {
// use "|" instead of "," for field delimiter
RecordFormat format = new DelimitedRecordFormat()
.withFieldDelimiter(" : ");
// sync the filesystem after every 1k tuples
SyncPolicy syncPolicy = new CountSyncPolicy(1000);
// rotate files
FileRotationPolicy rotationPolicy = new TimedRotationPolicy(1.0f, TimeUnit.MINUTES);
FileNameFormat fileNameFormat = new DefaultFileNameFormat()
.withPath("/foo/").withPrefix("app_").withExtension(".log");
HdfsBolt hdfsBolt = new HdfsBolt()
.withFsUrl("hdfs://master:9000")
.withFileNameFormat(fileNameFormat)
.withRecordFormat(format)
.withRotationPolicy(rotationPolicy)
.withSyncPolicy(syncPolicy);
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("event-spout", new EventSpout(), 3);
builder.setBolt("hdfs-bolt", hdfsBolt, 2).fieldsGrouping("event-spout", new Fields("minute"));
Config conf = new Config();
String name = StormToHDFSTopology.class.getSimpleName();
if (args != null && args.length > 0) {
conf.put(Config.NIMBUS_HOST, args[0]);
conf.setNumWorkers(3);
StormSubmitter.submitTopologyWithProgressBar(name, conf, builder.createTopology());
} else {
conf.setMaxTaskParallelism(3);
LocalCluster cluster = new LocalCluster();
cluster.submitTopology(name, conf, builder.createTopology());
Thread.sleep(60000);
cluster.shutdown();
}
}
}
kafka+storm+hdfs
官方的集成指南在这里
注意点
代码的构建原理和之前的一样,不再赘述。
在提交topology之前,有以下几项注意点:
- pom.xml使用之前的pom,之前的Pom已经集成了kafka
- storm和kafka集群已经正常启动,有nimbus,core,supervisor,kafka,zookeeper以及hadoop相关进程
- kafka已经事先创建了相关的topic,这个topic在云集群中不建议多备份多分区(性能原因)(实验室里的服务器也好不到哪里去,集群对硬件的要求太高了)
- 在zookeeper里建立/kafka路径,kafkaSpout将使用这个路径
- 在maven项目里,pom.xml文件使用maven.plugins.shade来构建项目,并在pom文件里指定主类名称。
样例代码
(还没有写注释)
package kafkastormhdfs;
import java.util.Arrays;
import java.util.Map;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.storm.hdfs.bolt.HdfsBolt;
import org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat;
import org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat;
import org.apache.storm.hdfs.bolt.format.FileNameFormat;
import org.apache.storm.hdfs.bolt.format.RecordFormat;
import org.apache.storm.hdfs.bolt.rotation.FileRotationPolicy;
import org.apache.storm.hdfs.bolt.rotation.TimedRotationPolicy;
import org.apache.storm.hdfs.bolt.rotation.TimedRotationPolicy.TimeUnit;
import org.apache.storm.hdfs.bolt.sync.CountSyncPolicy;
import org.apache.storm.hdfs.bolt.sync.SyncPolicy;
import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.generated.AlreadyAliveException;
import backtype.storm.generated.InvalidTopologyException;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
public class KafkaStormHDFS {
public static class KafkaWordToUpperCase extends BaseRichBolt {
private static final Log LOG = LogFactory.getLog(KafkaWordToUpperCase.class);
private static final long serialVersionUID = -5207232012035109026L;
private OutputCollector collector;
public void prepare(Map stormConf, TopologyContext context,
OutputCollector collector) {
this.collector = collector;
}
public void execute(Tuple input) {
String line = input.getString(0).trim();
LOG.info("RECV[kafka -> splitter] " + line);
if(!line.isEmpty()) {
String upperLine = line.toUpperCase();
LOG.info("EMIT[splitter -> counter] " + upperLine);
collector.emit(input, new Values(upperLine, upperLine.length()));
}
collector.ack(input);
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("line", "len"));
}
}
public static class RealtimeBolt extends BaseRichBolt {
private static final Log LOG = LogFactory.getLog(KafkaWordToUpperCase.class);
private static final long serialVersionUID = -4115132557403913367L;
private OutputCollector collector;
public void prepare(Map stormConf, TopologyContext context,
OutputCollector collector) {
this.collector = collector;
}
public void execute(Tuple input) {
String line = input.getString(0).trim();
LOG.info("REALTIME: " + line);
collector.ack(input);
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
}
}
public static void main(String[] args) throws AlreadyAliveException, InvalidTopologyException, InterruptedException {
// Configure Kafka
String zks = "master:2181,slave1:2181,slave2:2181";
String topic = "my-replicated-topic5";
String zkRoot = "/storm"; // default zookeeper root configuration for storm
String id = "word";
BrokerHosts brokerHosts = new ZkHosts(zks,"/kafka/brokers");//pay attention!
SpoutConfig spoutConf = new SpoutConfig(brokerHosts, topic, zkRoot, id);
spoutConf.scheme = new SchemeAsMultiScheme(new StringScheme());
spoutConf.forceFromStart = false;
spoutConf.zkServers = Arrays.asList(new String[] {"master", "slave1", "slave2"});
spoutConf.zkPort = 2181;
// Configure HDFS bolt
RecordFormat format = new DelimitedRecordFormat()
.withFieldDelimiter("\t"); // use "\t" instead of "," for field delimiter
SyncPolicy syncPolicy = new CountSyncPolicy(10); // sync the filesystem after every 10 tuples
FileRotationPolicy rotationPolicy = new TimedRotationPolicy(1.0f, TimeUnit.MINUTES); // rotate files
FileNameFormat fileNameFormat = new DefaultFileNameFormat()
.withPath("/foo/").withPrefix("kafkastormhdfs_").withExtension(".log"); // set file name format
HdfsBolt hdfsBolt = new HdfsBolt()
.withFsUrl("hdfs://master:9000")
.withFileNameFormat(fileNameFormat)
.withRecordFormat(format)
.withRotationPolicy(rotationPolicy)
.withSyncPolicy(syncPolicy);
// configure & build topology
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("kafka-reader", new KafkaSpout(spoutConf), 1);
builder.setBolt("to-upper", new KafkaWordToUpperCase(), 1).shuffleGrouping("kafka-reader");
builder.setBolt("hdfs-bolt", hdfsBolt, 1).shuffleGrouping("to-upper");
builder.setBolt("realtime", new RealtimeBolt(), 1).shuffleGrouping("to-upper");
// submit topology
Config conf = new Config();
String name = KafkaStormHDFS.class.getSimpleName();
if (args != null && args.length > 0) {
String nimbus = args[0];
conf.put(Config.NIMBUS_HOST, nimbus);
conf.setNumWorkers(3);
StormSubmitter.submitTopologyWithProgressBar(name, conf, builder.createTopology());
} else {
conf.setMaxTaskParallelism(3);
LocalCluster cluster = new LocalCluster();
cluster.submitTopology(name, conf, builder.createTopology());
Thread.sleep(60000);
cluster.shutdown();
}
}
}
kafka+storm+hbase
注意点
- 运行代码时确保storm,kafka,hbase能够正常运行
- kafka在消费消息时,每个topology给一个不同的id
- 集群中storm/lib里的包要和本地开发时maven的包匹配,也要和集群中真实运行的软件版本匹配
- 使用正确的pom依赖定义,storm-hbase相关pom定义如下
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-hbase</artifactId>
<version>1.1.1</version>
<scope>provided</scope>
<exclusions>
<exclusion>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-common</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.3.1</version>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-common</artifactId>
<version>1.3.1</version>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
<version>1.3.1</version>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
</exclusions>
</dependency>
样例代码
SpliterBolt
package cn.colony.cloud.kafkastormhbase;
import java.util.StringTokenizer;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
public class SpliterBolt extends BaseBasicBolt{
private static final Log LOG = LogFactory.getLog(SpliterBolt.class);
private static final long serialVersionUID = -6406436449580657911L;
@Override
public void execute(Tuple tuple, BasicOutputCollector collector) {
String sentence = tuple.getString(0);
LOG.info("SpliterBolt RECV: " + sentence);
StringTokenizer iter = new StringTokenizer(sentence);
while (iter.hasMoreElements()){
String value = iter.nextToken();
LOG.info("SpliterBolt EMIT: " + value);
collector.emit(new Values(value));
}
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word"));
}
}
CountBolt
package cn.colony.cloud.kafkastormhbase;
import java.util.HashMap;
import java.util.Map;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
public class CountBolt extends BaseBasicBolt{
private static final Log LOG = LogFactory.getLog(CountBolt.class);
private static final long serialVersionUID = 4906714131213303536L;
Map<String, Integer> counts = new HashMap<String, Integer>();
@Override
public void execute(Tuple tuple, BasicOutputCollector collector) {
String word = tuple.getString(0);
LOG.info("CountBolt RECV: " + word);
Integer count = counts.get(word);
if (count == null)
count = 0;
count++;
counts.put(word, count);
//here to add the log function
LOG.info("CountBolt EMIT: " + word+"\t"+count);
collector.emit(new Values(word,count.toString()));
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word","count"));
}
}
HbaseTopology
package cn.colony.cloud.kafkastormhbase;
import java.util.Arrays;
import java.util.Map;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.generated.AlreadyAliveException;
import org.apache.storm.generated.AuthorizationException;
import org.apache.storm.generated.InvalidTopologyException;
import org.apache.storm.hbase.bolt.HBaseBolt;
import org.apache.storm.hbase.bolt.mapper.SimpleHBaseMapper;
import org.apache.storm.kafka.BrokerHosts;
import org.apache.storm.kafka.KafkaSpout;
import org.apache.storm.kafka.SpoutConfig;
import org.apache.storm.kafka.StringScheme;
import org.apache.storm.kafka.ZkHosts;
import org.apache.storm.shade.org.apache.curator.shaded.com.google.common.collect.Maps;
import org.apache.storm.spout.SchemeAsMultiScheme;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;
public class HbaseTopology {
private static final Log LOG = LogFactory.getLog(HbaseTopology.class);
public static void main(String[] args) throws AlreadyAliveException, InvalidTopologyException, AuthorizationException{
String zks = "master:2181,slave1:2181,slave2:2181";
String topic = "HbaseTopology";
String zkRoot = "/storm"; // default zookeeper root configuration for storm
String id = "wordcounthbase"; // different id for different topology
BrokerHosts brokerHosts = new ZkHosts(zks,"/kafka/brokers");
SpoutConfig spoutConf = new SpoutConfig(brokerHosts, topic, zkRoot, id);
spoutConf.scheme = new SchemeAsMultiScheme(new StringScheme());
spoutConf.zkServers = Arrays.asList(new String[] {"master", "slave1", "slave2"});
spoutConf.zkPort = 2181;
KafkaSpout kafkaSpout = new KafkaSpout(spoutConf);
Config conf = new Config();
Map<String, String> HBConfig = Maps.newHashMap();
HBConfig.put("hbase.rootdir", "hdfs://master:9000/hbase");
HBConfig.put("hbase.zookeeper.property.clientPort", "2181");
HBConfig.put("hbase.zookeeper.quorum","master:2181,slave1:2181,slave2:2181");
HBConfig.put("zookeeper.znode.parent", "/hbase");
conf.put("HBCONFIG", HBConfig);
SimpleHBaseMapper mapper = new SimpleHBaseMapper();
mapper.withColumnFamily("result");//列族
mapper.withColumnFields(new Fields("count"));//列限定
mapper.withRowKeyField("word");//行键
HBaseBolt hbaseBolt = new HBaseBolt("wordcount", mapper).withConfigKey("HBCONFIG");
hbaseBolt.withFlushIntervalSecs(10);
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("kafka-spout", kafkaSpout,1);
builder.setBolt("word-spliter", new SpliterBolt(),1).shuffleGrouping("kafka-spout");
builder.setBolt("counter", new CountBolt(),1).fieldsGrouping("word-spliter", new Fields("word"));
builder.setBolt("hbase", hbaseBolt,1).shuffleGrouping("counter");
conf.put(Config.NIMBUS_HOST, "master");
conf.setNumWorkers(4);
StormSubmitter.submitTopology("hbase-topology", conf, builder.createTopology());
}
}