Connecting Flume, Kafka, Spark Streaming, HBase, and Hive

Environment:
flume-1.6.0-cdh5.5.2
spark-1.6.0
kafka_2.10-0.8.2.1
zookeeper-3.4.5
hbase-1.0.0-cdh5.5.2
hive-1.2.2

1. Flume -> Kafka:

1.1 Create test.conf in Flume's conf directory (the same file name is passed to flume-ng below)
vi test.conf

fk.sources = s1
fk.channels = c1
fk.sinks = k1
# Source: tail the files matching /tmp/test/log*.* and track read positions in a JSON file
fk.sources.s1.type = TAILDIR
fk.sources.s1.positionFile = /tmp/test.json
fk.sources.s1.filegroups = dir01
fk.sources.s1.filegroups.dir01 = /tmp/test/log*.*
# In-memory channel (test-environment sizing)
fk.channels.c1.type = memory
fk.channels.c1.capacity = 100
fk.channels.c1.transactionCapacity = 10
# Sink: deliver events to Kafka
# requiredAcks: 0, 1, -1 (all); higher settings are safer but slower
fk.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
fk.sinks.k1.brokerList = h201:9092,h202:9092,h203:9092
fk.sinks.k1.topic = fktest
fk.sinks.k1.requiredAcks = 0
fk.sinks.k1.batchSize = 2
# Wire the source and sink to the channel
fk.sources.s1.channels = c1
fk.sinks.k1.channel = c1

1.2 Start Flume and Kafka

Start Flume as a background process and redirect its output to a file, which makes the logs easy to inspect while debugging:
nohup bin/flume-ng agent --conf conf --conf-file /usr/local/apache-flume-1.6.0-cdh5.5.2-bin/conf/test.conf --name fk -Dflume.root.logger=INFO,console > f_out 2>&1 &

Start Kafka (ZooKeeper must already be running):

bin/kafka-server-start.sh config/server.properties &

Create the message topic:
bin/kafka-topics.sh --create \
  --replication-factor 3 \
  --partitions 3 \
  --topic fktest \
  --zookeeper h201:2181,h202:2181,h203:2181
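
Before wiring up Spark, it is worth confirming that Flume is actually delivering events into the fktest topic. The sketch below does that with the Kafka 0.8 high-level consumer from the Scala client; the object name FkTestConsumer and the group id fktest-check are made up for this check and are not part of the original setup.

import java.util.Properties
import kafka.consumer.{Consumer, ConsumerConfig, Whitelist}
import kafka.serializer.StringDecoder

object FkTestConsumer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("zookeeper.connect", "h201:2181,h202:2181,h203:2181")
    props.put("group.id", "fktest-check")        // arbitrary group id, used only for this check
    props.put("auto.offset.reset", "smallest")   // read the topic from the beginning
    val connector = Consumer.create(new ConsumerConfig(props))
    val streams = connector.createMessageStreamsByFilter(
      new Whitelist("fktest"), 1, new StringDecoder(), new StringDecoder())
    // print every message Flume has pushed into the topic
    for (msg <- streams.head) println(msg.message())
  }
}

Run it with the Kafka client jars on the classpath while appending lines to a file under /tmp/test/; each new line should be printed by the consumer.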

2. Kafka -> Spark Streaming -> HBase

2.1 Create the log table rz in the HBase shell (optional; the Spark job below creates it if it does not exist)
create 'rz','test'

2.2 Run kafkaspark.scala to implement the Kafka -> Spark Streaming -> HBase flow.
The jar spark-streaming-kafka-assembly_2.10-1.6.3.jar is required; for a quick test it can simply be placed in the directory the job is launched from.

vi kafkaspark.scala

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming._
import kafka.serializer.StringDecoder
import scala.collection.mutable.ArrayBuffer
import org.apache.hadoop.hbase.{HBaseConfiguration,HColumnDescriptor, HTableDescriptor, TableName}
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapred.JobConf
import org.apache.spark.rdd.RDD.rddToPairRDDFunctions
import org.apache.hadoop.hbase.client.HBaseAdmin

object kafkaspark {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafkaspark").setMaster("local[2]")
    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(3))
    val topics = Set("fktest")
    val brokers = "h201:9092,h202:9092,h203:9092"
    val kafkaParams = Map[String, String](
      "metadata.broker.list" -> brokers,
      "serializer.class" -> "kafka.serializer.StringEncoder")
    // HBase connection parameters
    val hbconf = HBaseConfiguration.create()
    hbconf.set("hbase.zookeeper.property.clientPort", "2181")
    // set the quorum explicitly in case hbase-site.xml is not on the classpath
    hbconf.set("hbase.zookeeper.quorum", "h201,h202,h203")
    val tablename = "rz"
    val jc = new JobConf(hbconf)
    // configure the Hadoop job output format so the RDD can be written to the HBase table
    jc.setOutputFormat(classOf[TableOutputFormat])
    jc.set(TableOutputFormat.OUTPUT_TABLE, tablename)
    // create the target table if it does not exist yet
    val admin = new HBaseAdmin(hbconf)
    if (!admin.isTableAvailable(tablename)) {
      val td = new HTableDescriptor(TableName.valueOf(tablename))
      td.addFamily(new HColumnDescriptor("test"))
      admin.createTable(td)
    }
    // receive the data delivered by Kafka
    val kafkaStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics).map(_._2)
    kafkaStream.print()
    kafkaStream.foreachRDD((rdd: RDD[String]) => {
      rdd.map(i => {
        // write each record to HBase: row key = nanoTime + message, cell test:test = message
        val put = new Put(Bytes.toBytes(System.nanoTime().toString + i))
        put.add(Bytes.toBytes("test"), Bytes.toBytes("test"), Bytes.toBytes(i))
        (new ImmutableBytesWritable, put)
      }).saveAsHadoopDataset(jc)
    })
    ssc.start()
    ssc.awaitTermination()
  }
}
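
Once the streaming job is running, the rows landing in rz can be spot-checked from Scala with the plain HBase client API (the same check can be done with scan 'rz' in the HBase shell). This is only an illustrative read-back sketch; the object name RzScan is invented for the example.

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{HTable, Scan}
import org.apache.hadoop.hbase.util.Bytes
import scala.collection.JavaConverters._

object RzScan {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()
    conf.set("hbase.zookeeper.quorum", "h201,h202,h203")
    conf.set("hbase.zookeeper.property.clientPort", "2181")
    val table = new HTable(conf, "rz")
    val scanner = table.getScanner(new Scan())
    // print the first few rows written by the streaming job (column family test, qualifier test)
    for (r <- scanner.iterator().asScala.take(10)) {
      val value = Bytes.toString(r.getValue(Bytes.toBytes("test"), Bytes.toBytes("test")))
      println(Bytes.toString(r.getRow) + " -> " + value)
    }
    scanner.close()
    table.close()
  }
}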

2.3 Build and submit:
Compile: scalac kafkaspark.scala (with the Spark, Kafka, and HBase jars on the classpath)
Package: jar cvf kafkaspark.jar *.class
Submit: spark-submit --class kafkaspark --jars spark-streaming-kafka-assembly_2.10-1.6.3.jar kafkaspark.jar

3. Mapping the HBase log table into Hive:

Run the following in Hive (it creates an external table backed by the HBase table rz):

create external table rzb(key string, test string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'   
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,test:test")
TBLPROPERTIES ("hbase.table.name" = "rz");

Reposted from blog.csdn.net/weixin_43827745/article/details/84580648