Exception in thread "main" org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set in JobConf.

The error is thrown when a DataFrame obtained from Spark SQL is mapped to an RDD and the data is then saved directly to HBase:


Exception in thread "main" org.apache.hadoop.mapred.InvalidJobConfException: Output directory not set in JobConf.


The call being used was saveAsNewAPIHadoopDataset.
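The post does not show the failing call itself. A minimal sketch of the pattern described, assuming rdd1 is an RDD[(String, String)] of (rowkey, value) pairs and jobConf is the old-style JobConf shown further down (both names are taken from the snippets that follow):

// Sketch only: an old-API JobConf handed to the new-API save.
// As far as I can tell, saveAsNewAPIHadoopDataset does not read the output format
// registered via jobConf.setOutputFormat, falls back to its default file-based
// output format, and then throws InvalidJobConfException because no output
// directory has been set.
rdd1.map(
  x => {
    val put = new Put(Bytes.toBytes(x._1))
    put.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("c1"), Bytes.toBytes(x._2))
    (new ImmutableBytesWritable, put)
  }
).saveAsNewAPIHadoopDataset(jobConf)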


import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import SparkContext._
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.mapred.JobConf
import org.apache.hadoop.conf.Configuration
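
For context, the snippets below reference sc and rdd1 without showing how they are built. A minimal sketch of that setup, assuming local mode and a hypothetical two-column result of (rowkey, value) strings; the app name, query, and column names are illustrative only:

val sparkConf = new SparkConf().setAppName("DataFrameToHBase").setMaster("local[*]")
val sc = new SparkContext(sparkConf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// Hypothetical query: the first column becomes the rowkey, the second the cell value.
val df = sqlContext.sql("select rowkey, value from some_table")

// Map the DataFrame to an RDD[(String, String)] of (rowkey, value) pairs.
val rdd1 = df.rdd.map(row => (row.getString(0), row.getString(1)))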
 


Switching to saveAsHadoopDataset, however, works:


// Old (mapred) API: configure a JobConf and write with saveAsHadoopDataset.
val conf = new Configuration()
val tableName = "test_t1"
val jobConf = new JobConf(conf, this.getClass)
jobConf.set("hbase.zookeeper.quorum", "10.172.10.169,10.172.10.168,10.172.10.170")
jobConf.setOutputKeyClass(classOf[ImmutableBytesWritable])
// jobConf.setOutputValueClass(classOf[Put])
jobConf.setOutputFormat(classOf[org.apache.hadoop.hbase.mapred.TableOutputFormat])
jobConf.set(TableOutputFormat.OUTPUT_TABLE, tableName)

rdd1.map(
  x => {
    // x._1 becomes the rowkey, x._2 the value stored in column f1:c1.
    val put = new Put(Bytes.toBytes(x._1))
    put.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("c1"), Bytes.toBytes(x._2))
    (new ImmutableBytesWritable, put)
  }
).saveAsHadoopDataset(jobConf)
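As far as I can tell, the old-API org.apache.hadoop.hbase.mapred.TableOutputFormat checks the target table from the JobConf rather than requiring an output directory, which is why this variant runs without any output path being set.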
    
At first it was not obvious where the mistake was. Comparing the two versions shows the difference: the failing code passed a JobConf as the parameter instead of going through sc.hadoopConfiguration, and the new API cannot be driven by the old-style configuration.


// New (mapreduce) API: configure sc.hadoopConfiguration, wrap it in a Job,
// and pass job.getConfiguration to saveAsNewAPIHadoopDataset.
sc.hadoopConfiguration.set("hbase.zookeeper.quorum", "10.172.10.169,10.172.10.168,10.172.10.170")
// sc.hadoopConfiguration.set("zookeeper.znode.parent", "/hbase")
sc.hadoopConfiguration.set(TableOutputFormat.OUTPUT_TABLE, "test_t1")

val job = new Job(sc.hadoopConfiguration)
job.setOutputKeyClass(classOf[ImmutableBytesWritable])
job.setOutputValueClass(classOf[Result])
job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])

rdd1.map(
  x => {
    val put = new Put(Bytes.toBytes(x._1))
    put.addColumn(Bytes.toBytes("f1"), Bytes.toBytes("c1"), Bytes.toBytes(x._2))
    (new ImmutableBytesWritable, put)
  }
).saveAsNewAPIHadoopDataset(job.getConfiguration)
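Two small notes on the working version: new Job(...) is deprecated in newer Hadoop releases, so Job.getInstance(sc.hadoopConfiguration) is the preferred way to build the job; and since the pairs being written are (ImmutableBytesWritable, Put), setOutputValueClass(classOf[Put]) would match what is actually emitted (it still runs with classOf[Result], as the value class does not appear to be enforced for this output format).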


Reposted from blog.csdn.net/mtj66/article/details/80567168