Spark Streaming: processing data from two different Kafka clusters at the same time

As the title says, things are never perfect: the data we need to process lives in two different Kafka clusters. Life goes on and the problem has to be solved, so we create two DStreams, each connected to a topic on a different Kafka cluster, and then union them into a single stream for processing. The code is as follows:

 
```scala
package com.kingnet

import org.apache.spark.SparkConf
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

import scala.collection.JavaConversions._

object IOSChannelNewActiveDids {

  def createContext(params: KafkaStreamingParams) = {
    // Example argument:
    // {"batchTime":5,"sources":[{"zookeeper":"name85:2181,name86:2181,name87:2181","group":"group1","topics":"test1","numThreads":"1"},{"zookeeper":"name85:2181,name86:2181,name87:2181","group":"group1","topics":"test2","numThreads":"1"}]}
    val sparkConf = new SparkConf().setAppName("IOSChannelNewActiveDids")
    val ssc = new StreamingContext(sparkConf, Seconds(params.getBatchTime.toInt))
    // ssc.checkpoint(checkpointDirectory)

    // One receiver-based input DStream per configured Kafka source.
    val rawdata = params.getSources.map(p => {
      val topicMap = p.getTopics.split(",").map((_, p.getNumThreads.toInt)).toMap
      KafkaUtils.createStream(ssc, p.getZookeeper, p.getGroup, topicMap).map(_._2)
    }).toSeq

    // Union the DStreams so downstream processing sees a single stream.
    val union_rawdata = ssc.union(rawdata)
    union_rawdata.print()
    ssc
  }

  def main(args: Array[String]) {
    if (args.length < 1) {
      System.err.println("Usage: com.kingnet.IOSChannelNewActiveDids {\"batchTime\":5,\"sources\":[{\"zookeeper\":\"name85:2181,name86:2181,name87:2181\",\"group\":\"group1\",\"topics\":\"test1\",\"numThreads\":1},{\"zookeeper\":\"name85:2181,name86:2181,name87:2181\",\"group\":\"group1\",\"topics\":\"test2\",\"numThreads\":1}]}")
      System.exit(1)
    }

    val params = GsonObject.getInstance().fromJson(args(0), classOf[KafkaStreamingParams])
    params.getSources.foreach(p => println(p.getTopics))

    val ssc = createContext(params)
    ssc.start()
    ssc.awaitTermination()
  }
}
```
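The heart of the listing is the per-source `topicMap`: a comma-separated topic string becomes a map from topic name to receiver-thread count, which `KafkaUtils.createStream` expects. A minimal, Spark-free sketch of that transformation (the object name `TopicMapSketch` is mine, not from the original project):

```scala
object TopicMapSketch {
  // Same transformation as in createContext: "test1,test2" with
  // numThreads = 1 becomes Map("test1" -> 1, "test2" -> 1).
  def topicMap(topics: String, numThreads: Int): Map[String, Int] =
    topics.split(",").map((_, numThreads)).toMap

  def main(args: Array[String]): Unit = {
    println(topicMap("test1,test2", 1))
  }
}
```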

We pass a JSON string in args as the only parameter. The JSON configures a sources list with two connection entries (this is just a test, so both entries use the same ZooKeeper list). That JSON is then parsed into a Java object:

 
```java
package com.kingnet;

import java.util.List;

/**
 * Created by xiaoj on 2016/7/13.
 */
public class KafkaStreamingParams {
    private String batchTime;
    private List<KafkaParams> sources;

    public String getBatchTime() { return batchTime; }
    public void setBatchTime(String batchTime) { this.batchTime = batchTime; }

    public List<KafkaParams> getSources() { return sources; }
    public void setSources(List<KafkaParams> sources) { this.sources = sources; }

    @Override
    public String toString() {
        return "KafkaStreamingParams{" +
                "batchTime='" + batchTime + '\'' +
                ", sources=" + sources +
                '}';
    }

    // Declared static so Gson can instantiate it without an enclosing instance.
    static class KafkaParams {
        private String zookeeper;
        private String group;
        private String topics;
        private String numThreads;

        public String getZookeeper() { return zookeeper; }
        public void setZookeeper(String zookeeper) { this.zookeeper = zookeeper; }

        public String getGroup() { return group; }
        public void setGroup(String group) { this.group = group; }

        public String getTopics() { return topics; }
        public void setTopics(String topics) { this.topics = topics; }

        public String getNumThreads() { return numThreads; }
        public void setNumThreads(String numThreads) { this.numThreads = numThreads; }

        @Override
        public String toString() {
            return "KafkaParams{" +
                    "zookeeper='" + zookeeper + '\'' +
                    ", group='" + group + '\'' +
                    ", topics='" + topics + '\'' +
                    ", numThreads='" + numThreads + '\'' +
                    '}';
        }
    }
}
```
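For reference, a JSON document matching these fields would look like this, pretty-printed (the host names and topic names mirror the test setup above):

```json
{
  "batchTime": "10",
  "sources": [
    {"zookeeper": "name85:2181,name86:2181,name87:2181", "group": "group1", "topics": "test1", "numThreads": "1"},
    {"zookeeper": "name85:2181,name86:2181,name87:2181", "group": "group1", "topics": "test2", "numThreads": "1"}
  ]
}
```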


Yes, I do this all the time: creating Java classes inside a Scala project, thanks to the powerful IntelliJ IDEA.

 
```scala
package com.kingnet

import java.util

import com.google.gson.{Gson, GsonBuilder}

/**
 * Created by xiaoj on 2016/5/5.
 */
object GsonObject {
  @volatile private var instance: Gson = null

  // Double-checked locking: create the shared Gson instance lazily, only once.
  def getInstance(): Gson = {
    if (instance == null) {
      synchronized {
        if (instance == null) {
          instance = new GsonBuilder().create()
        }
      }
    }
    instance
  }

  def fromJson(s: String): Option[util.HashMap[String, Any]] = {
    try {
      Some(getInstance().fromJson(s, classOf[util.HashMap[String, Any]]))
    } catch {
      case e: Exception =>
        e.printStackTrace()
        None
    }
  }

  def toJson(src: Any) = {
    getInstance().toJson(src)
  }
}
```
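The `fromJson` helper wraps parsing in an `Option` so callers can handle failure without their own try/catch. The same pattern, reduced to a standalone sketch with no Gson dependency (`safeParse` is a hypothetical name, not part of the project):

```scala
object SafeParseSketch {
  // Mirrors GsonObject.fromJson: Some(result) on success, None on any failure.
  def safeParse[T](s: String)(parse: String => T): Option[T] =
    try Some(parse(s))
    catch { case e: Exception => None }

  def main(args: Array[String]): Unit = {
    println(safeParse("42")(_.toInt))   // Some(42)
    println(safeParse("oops")(_.toInt)) // None
  }
}
```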


Run the program, passing a single JSON string as the argument (when submitting from a shell, the inner quotes must be escaped): {"batchTime":"10","sources":[{"zookeeper":"name85:2181,name86:2181,name87:2181","group":"group1","topics":"test1","numThreads":"1"},{"zookeeper":"name85:2181,name86:2181,name87:2181","group":"group1","topics":"test2","numThreads":"1"}]}

Open two kafka-console-producer sessions and write data into topics test1 and test2 respectively; the streaming program's console will then print the messages it receives from both clusters.


Reposted from blog.csdn.net/yisun123456/article/details/81906425