How to get Hadoop MapReduce job runtime information

I need to get the runtime information of MapReduce jobs: the job state and the map/reduce progress.
Hadoop serves a web UI on port 50030, but I could not find any JSON or XML interface there.
So I traced how the 50030 service is set up, starting from:
org/apache/hadoop/mapred/JobTracker.java:
JobTracker(final JobConf conf, String identifier, Clock clock, QueueManager qm)
-->
private void createInstrumentation()
-->
...
    String infoAddr = 
      NetUtils.getServerAddress(conf, "mapred.job.tracker.info.bindAddress",
                                "mapred.job.tracker.info.port",
                                "mapred.job.tracker.http.address");
    InetSocketAddress infoSocAddr = NetUtils.createSocketAddr(infoAddr);
    String infoBindAddress = infoSocAddr.getHostName();
    int tmpInfoPort = infoSocAddr.getPort();
    this.startTime = clock.getTime();
    infoServer = new HttpServer("job", infoBindAddress, tmpInfoPort, 
        tmpInfoPort == 0, conf, aclsManager.getAdminsAcl());
    infoServer.setAttribute("job.tracker", this); 
...

Here the JobTracker starts a Jetty server (infoServer) to provide the HTTP service, and sets the application attribute job.tracker on that Jetty instance to the JobTracker itself (this).

On the node running the JobTracker there is a JSP page at:
$HADOOP_HOME/webapps/job/jobtracker.jsp

  
...
  JobTracker tracker = (JobTracker) application.getAttribute("job.tracker");
  ClusterStatus status = tracker.getClusterStatus();
  ClusterMetrics metrics = tracker.getClusterMetrics();
  String trackerName =
           StringUtils.simpleHostname(tracker.getJobTrackerMachine());
  JobQueueInfo[] queues = tracker.getQueues();
  Vector<JobInProgress> runningJobs = tracker.runningJobs();
  Vector<JobInProgress> completedJobs = tracker.completedJobs();
  Vector<JobInProgress> failedJobs = tracker.failedJobs();
...


This is where the JSP gets hold of the JobTracker object; by calling methods on it we can read out the jobs' execution information.
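For example, a custom page that prints one plain-text line per running job could look roughly like this (a sketch only; it assumes the Hadoop 1.x accessors runningJobs(), getProfile(), getStatus(), mapProgress() and reduceProgress()):

<%@ page contentType="text/plain; charset=UTF-8"
         import="org.apache.hadoop.mapred.*" %><%
  // The JobTracker registered itself under "job.tracker" when it started
  // the embedded Jetty server (see the constructor excerpt above).
  JobTracker tracker = (JobTracker) application.getAttribute("job.tracker");
  for (JobInProgress job : tracker.runningJobs()) {
    JobStatus status = job.getStatus();
    // one line per job: id <TAB> run state <TAB> map progress <TAB> reduce progress
    out.println(job.getProfile().getJobID() + "\t"
        + status.getRunState() + "\t"   // int, cf. JobStatus.RUNNING etc.
        + status.mapProgress() + "\t"
        + status.reduceProgress());
  }
%>

The same idea extends to completedJobs() and failedJobs(), and the output could just as well be JSON.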

So the plan:
write a jobtracker_1.jsp, modeled on jobtracker.jsp, that outputs exactly what I need.
First, simply copy the existing page and test it.
Visiting :50030/jobtracker_1.jsp fails.
I added the URL-to-servlet mapping in WEB-INF/web.xml and found that this requires a compiled class named org.apache.hadoop.mapred.jobtracker_1_jsp.
Moreover, the stock JSP pages all already ship as *_jsp.class files inside hadoop-core.jar.
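The entry needed in web.xml is plain servlet-spec syntax, roughly like this (a sketch; the class name assumes the JSP is later compiled with -p org.apache.hadoop.mapred as described below):

<servlet>
  <servlet-name>jobtracker_1.jsp</servlet-name>
  <servlet-class>org.apache.hadoop.mapred.jobtracker_1_jsp</servlet-class>
</servlet>
<servlet-mapping>
  <servlet-name>jobtracker_1.jsp</servlet-name>
  <url-pattern>/jobtracker_1.jsp</url-pattern>
</servlet-mapping>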

Compiling a JSP into a servlet is normally done by the servlet container (Tomcat, Resin), and each container puts the generated classes into its own package, typically
org.apache.jsp. How do we get them into org.apache.hadoop.mapred instead?

First experiment: write a JSP, let Resin or Tomcat compile it into a class, then map that org.apache.jsp.xxxx class in web.xml and see whether it works.

While writing the JSP I realized that reading the JobTracker's information touches many members, such as jobtracker.conf, that are not visible outside the package. So the servlet generated from this JSP cannot be compiled into some other package; it has to end up in org.apache.hadoop.mapred.
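To see why, here is a heavily simplified, hypothetical stand-in for the servlet that JspC would generate (it assumes only, as stated above, that conf is a package-visible JobConf field on JobTracker):

package org.apache.hadoop.mapred;   // must match JobTracker's package

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Simplified stand-in for the class JspC generates from jobtracker_1.jsp.
public class jobtracker_1_jsp extends HttpServlet {
  public void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    JobTracker tracker =
        (JobTracker) getServletContext().getAttribute("job.tracker");
    // conf has default (package) visibility, so this line compiles only
    // because this class sits in org.apache.hadoop.mapred; from
    // org.apache.jsp it would be a compile error.
    resp.getWriter().println(tracker.conf.get("mapred.job.tracker"));
  }
}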

I searched for how to compile a JSP into a servlet by hand and found this article:
http://blog.csdn.net/codolio/article/details/5177236
It uses org.apache.jasper.JspC to compile a JSP into a servlet manually:

java -cp /opt/jars/ant.jar:/opt/hadoop-1.0.4/lib/commons-logging-1.1.1.jar:/opt/hadoop-1.0.4/hadoop-ant-1.0.4.jar:/opt/hadoop-1.0.4/lib/commons-el-1.0.jar:/opt/hadoop-1.0.4/lib/jasper-compiler-5.5.12.jar:/opt/hadoop-1.0.4/lib/jasper-runtime-5.5.12.jar:/opt/hadoop-1.0.4/lib/servlet-api-2.5-20081211.jar:/opt/hadoop-1.0.4/lib/jsp-2.1/jsp-api-2.1.jar:/opt/hadoop-1.0.4/lib/commons-io-2.1.jar \
  org.apache.jasper.JspC \
  -classpath /opt/hadoop-1.0.4/hadoop-core-1.0.4.jar:/opt/hadoop-1.0.4/hadoop-ant-1.0.4.jar:/opt/hadoop-1.0.4/lib/commons-logging-1.1.1.jar:/opt/hadoop-1.0.4/lib/commons-logging-api-1.0.4.jar:/opt/hadoop-1.0.4/lib/log4j-1.2.15.jar:/opt/hadoop-1.0.4/lib/commons-io-2.1.jar \
  -p org.apache.hadoop.mapred -compile -v -d dist -uriroot ./ -webxml dist/web.xml xxxxx.jsp
Note:
the -p option sets the package of the generated servlet.

This should produce both a .java file and a .class file, but I only got the .java. My guess is that the classpath I passed in was incomplete, so the compilation step failed; yet there was no error message, which is odd.

So I compiled that generated .java into a class myself, packed it into a jar, and dropped it into $HADOOP_HOME/lib/ (roughly along the lines of the commands sketched below).
Then I pointed the servlet entry in job/WEB-INF/web.xml at my compiled class.
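For the manual compile-and-package step, something along these lines should work (a sketch: the jar list is trimmed from the JspC classpath above, the generated file name assumes the page was jobtracker_1.jsp, and the output directory and jar name are made up):

mkdir -p classes
javac -cp /opt/hadoop-1.0.4/hadoop-core-1.0.4.jar:/opt/hadoop-1.0.4/lib/servlet-api-2.5-20081211.jar:/opt/hadoop-1.0.4/lib/jsp-2.1/jsp-api-2.1.jar:/opt/hadoop-1.0.4/lib/jasper-runtime-5.5.12.jar \
  -d classes dist/org/apache/hadoop/mapred/jobtracker_1_jsp.java
jar cf jobtracker-1-jsp.jar -C classes .
cp jobtracker-1-jsp.jar /opt/hadoop-1.0.4/lib/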

ok


Reposted from kissmett.iteye.com/blog/1895314