A Brief Look at CapacityScheduler Mode: The Number of Containers in the First Batch of Map Tasks

I.     Preface:

1.        Terminology:

  • CS: short for Capacity Scheduler.
  • Hadoop version: 2.7.1
  • Relevant source files: LeafQueue.java, Resources.java, ResourceCalculator.java, DefaultResourceCalculator.java, DominantResourceCalculator.java, RMContainerState.java
  • A YARN container can be in the following states, defined in RMContainerState.java:
public enum RMContainerState {
  NEW, 
  RESERVED, 
  ALLOCATED, 
  ACQUIRED, 
  RUNNING, 
  COMPLETED, 
  EXPIRED, 
  RELEASED, 
  KILLED
}

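        A container normally moves through these five states: NEW => ALLOCATED => ACQUIRED => RUNNING => COMPLETED. RELEASED is a separate terminal state, reached when the ApplicationMaster releases a container rather than letting it run to completion.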
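As a rough illustration of the normal path, the sketch below walks a container through the happy-path states only. The class ContainerLifecycleSketch and its methods are hypothetical; in the ResourceManager the real transitions are driven by events in a state machine, not by a method like this.

```java
// Simplified sketch: only the happy path is modeled. The enum constants mirror
// RMContainerState; the class and method names are hypothetical.
final class ContainerLifecycleSketch {
    enum State { NEW, RESERVED, ALLOCATED, ACQUIRED, RUNNING, COMPLETED, EXPIRED, RELEASED, KILLED }

    // Returns the next state on the normal path, or null when there is none
    // (terminal states and states that are not on the happy path).
    static State nextOnHappyPath(State s) {
        switch (s) {
            case NEW:       return State.ALLOCATED;
            case ALLOCATED: return State.ACQUIRED;
            case ACQUIRED:  return State.RUNNING;
            case RUNNING:   return State.COMPLETED;
            default:        return null;
        }
    }

    // Renders the whole happy path as a single string, starting from NEW.
    static String happyPath() {
        StringBuilder sb = new StringBuilder();
        for (State s = State.NEW; s != null; s = nextOnHappyPath(s)) {
            if (sb.length() > 0) {
                sb.append(" => ");
            }
            sb.append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(happyPath());
    }
}
```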

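2.        Overview

This post looks at the two resource-calculation modes available under CS scheduling: DefaultResourceCalculator and DominantResourceCalculator. With the default, DefaultResourceCalculator, the number of map containers a node starts in its first batch is determined by the yarn.nodemanager.resource.memory-mb parameter (relative to the memory each container requests). When DominantResourceCalculator is configured instead, that number is determined jointly by yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores. The choice therefore also affects Hadoop job performance.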

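II.     Content:

1.        Flow overview: The story starts in LeafQueue.java. The Hadoop log file (yarn-xxx-resourcemanager-localhost.localdomain.log) shows clearly that a container's resource request is ultimately handled by a LeafQueue (here the default queue; out of the box there is only the parent queue root and the leaf queue default).

[Figure: ResourceManager log excerpt]

Jump into LeafQueue.java and look at the private Resource assignContainer(...) function.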

 private Resource assignContainer(Resource clusterResource, FiCaSchedulerNode node,
     FiCaSchedulerApp application, Priority priority,
     ResourceRequest request, NodeType type, RMContainer rmContainer,
     MutableObject createdContainer, ResourceLimits currentResoureLimits)
{

...

// The code below compares the requested resource, capability (a small gripe: requestResource would be a better name for this variable), with the node's totalResource. If the request exceeds totalResource, it logs a warning and returns Resources.none(). The lessThanOrEqual function is defined in Resources.java and ends up calling the compare(Resource clusterResource, Resource lhs, Resource rhs) function of either DefaultResourceCalculator or DominantResourceCalculator.

 if (!Resources.lessThanOrEqual(resourceCalculator, clusterResource,
        capability, totalResource)) {
     LOG.warn("Node : " + node.getNodeID()
          + " does not have sufficient resource for request : " + request
          + " node total capability : " + node.getTotalResource());
     return Resources.none();
}

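// The code below checks that available (the node's availableResource) is greater than zero. Note that this is an assert, not an ordinary early return: if the check fails and assertions are enabled, an AssertionError is thrown.

assert Resources.greaterThan(
        resourceCalculator, clusterResource, available, Resources.none());

 ...

// The code below computes how many containers of size capability (the requestResource) fit into available (the node's availableResource). If at least one fits, resources are available and the if branch allocates immediately.

// Can we allocate a container on this node?
   int availableContainers =
       resourceCalculator.computeAvailableContainers(available, capability);
...
if (availableContainers > 0) {
...
} else {
      // if we are allowed to allocate but this node doesn't have space, reserve it or
      // if this was an already a reserved container, reserve it again
...
}

}

// This concludes the analysis of the private Resource assignContainer(...) function.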
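The three steps just described (reject oversized requests, count how many containers fit, then allocate or reserve) can be condensed into a small standalone sketch. It is not the LeafQueue code: AssignContainerSketch, Decision and decide are hypothetical names, only memory is modeled (the DefaultResourceCalculator view), the request size is assumed to be positive, and the extra conditions the real scheduler checks before reserving are left out.

```java
// Simplified sketch of the decision flow in assignContainer; not the real LeafQueue code.
// All names are hypothetical and only memory (in MB) is considered.
final class AssignContainerSketch {
    enum Decision { REJECTED, ALLOCATED, RESERVED }

    static Decision decide(int nodeTotalMb, int nodeAvailableMb, int requestMb) {
        // Step 1: a request larger than the whole node can never be satisfied on it.
        if (requestMb > nodeTotalMb) {
            return Decision.REJECTED;
        }
        // Step 2: how many containers of this size fit into the free memory?
        int availableContainers = nodeAvailableMb / requestMb;
        // Step 3: allocate if at least one fits, otherwise reserve the node.
        return availableContainers > 0 ? Decision.ALLOCATED : Decision.RESERVED;
    }

    public static void main(String[] args) {
        System.out.println(decide(8192, 8192, 1024));   // empty node, small request
        System.out.println(decide(8192, 512, 1024));    // node nearly full
        System.out.println(decide(8192, 8192, 16384));  // request exceeds the node
    }
}
```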

2.        CS-related .xml files (yarn-site.xml and capacity-scheduler.xml)

yarn-site.xml:

<property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>

capacity-scheduler.xml:

<property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>

    <description>
      The ResourceCalculator implementation to be used to compare
      Resources in the scheduler.
      The default i.e. DefaultResourceCalculator only uses Memory while
      DominantResourceCalculator uses dominant-resource to compare
      multi-dimensional resources such as Memory, CPU etc.
    </description>
  </property>

The yarn.scheduler.capacity.resource-calculator property can alternatively be set to: org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
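For reference, switching calculators amounts to changing the value of that one property in capacity-scheduler.xml. A minimal fragment, assuming the rest of the file is left unchanged:

```xml
<property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
```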

3.        DefaultResourceCalculator and DominantResourceCalculator differ mainly in how they implement the calculation and comparison functions of their abstract parent class, ResourceCalculator. For example, each provides its own public int compare(Resource clusterResource, Resource lhs, Resource rhs) {…}, implementing the abstract method public abstract int compare(Resource clusterResource, Resource lhs, Resource rhs);

1)        The compare function of DefaultResourceCalculator compares memory only:

  public int compare(Resource unused, Resource lhs, Resource rhs) {
    // Only consider memory
    return lhs.getMemory() - rhs.getMemory();
  }

2)        The compare function of DominantResourceCalculator compares both memory and CPU vcores:

  public int compare(Resource clusterResource, Resource lhs, Resource rhs) {
    if (lhs.equals(rhs)) {
      return 0;
    }
    if (isInvalidDivisor(clusterResource)) {
      ...
    }
    float l = getResourceAsValue(clusterResource, lhs, true);
    float r = getResourceAsValue(clusterResource, rhs, true);
    if (l < r) {
      return -1;
    } else if (l > r) {
      return 1;
    } else {
      l = getResourceAsValue(clusterResource, lhs, false);
      r = getResourceAsValue(clusterResource, rhs, false);
      if (l < r) {
        return -1;
      } else if (l > r) {
        return 1;
      }
    }
    return 0;
  }
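To make the difference between the two compare functions concrete, here is a standalone sketch. It does not use the Hadoop classes: Res is a hypothetical stand-in for Resource, share is a simplified stand-in for getResourceAsValue (taken here to be the larger or smaller of the memory and vcore fractions of the cluster), the invalid-divisor branch is not modeled, and the cluster is assumed to have non-zero memory and vcores.

```java
// Standalone sketch; Res, share and the compare* methods are hypothetical names.
final class CompareSketch {
    // Stand-in for YARN's Resource: memory in MB plus virtual cores.
    static final class Res {
        final int memoryMb;
        final int vcores;
        Res(int memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    // Memory-only comparison, in the style of DefaultResourceCalculator.compare.
    static int compareDefault(Res lhs, Res rhs) {
        return lhs.memoryMb - rhs.memoryMb;
    }

    // Fraction of the cluster that r uses: the larger of the two fractions when
    // dominant is true, the smaller one otherwise.
    static float share(Res cluster, Res r, boolean dominant) {
        float mem = (float) r.memoryMb / cluster.memoryMb;
        float cpu = (float) r.vcores / cluster.vcores;
        return dominant ? Math.max(mem, cpu) : Math.min(mem, cpu);
    }

    // Compare by dominant share first; break ties with the non-dominant share.
    static int compareDominant(Res cluster, Res lhs, Res rhs) {
        int byDominant = Float.compare(share(cluster, lhs, true), share(cluster, rhs, true));
        if (byDominant != 0) {
            return byDominant;
        }
        return Float.compare(share(cluster, lhs, false), share(cluster, rhs, false));
    }

    public static void main(String[] args) {
        Res cluster = new Res(16384, 16);
        Res a = new Res(2048, 1);
        Res b = new Res(2048, 4);
        // Same memory, different vcores: equal for the memory-only rule,
        // but a sorts before b under the dominant-share rule.
        System.out.println(compareDefault(a, b));
        System.out.println(compareDominant(cluster, a, b) < 0);
    }
}
```

With the memory-only rule, two requests that differ only in vcores are indistinguishable, which is why CPU settings have no effect on scheduling decisions under DefaultResourceCalculator.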

3)        The computeAvailableContainers functions of DefaultResourceCalculator and DominantResourceCalculator differ in the same way, as follows:

DefaultResourceCalculator class:

  public int computeAvailableContainers(Resource available, Resource required) {
    // Only consider memory
    return available.getMemory() / required.getMemory();
  }

DominantResourceCalculator class:

  public int computeAvailableContainers(Resource available, Resource required) {
    return Math.min(
        available.getMemory() / required.getMemory(),
        available.getVirtualCores() / required.getVirtualCores());
  }
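A quick worked example shows how the two formulas lead to different first-batch sizes on the same node. The sketch below uses plain ints instead of Resource objects. The class name and the sample figures (an empty node with 16384 MB and 4 vcores, map containers requesting 1024 MB and 1 vcore) are made up for illustration.

```java
// Minimal sketch: plain ints stand in for YARN's Resource objects.
// The class name and the sample figures are hypothetical.
final class AvailableContainersSketch {

    // Memory-only rule, mirroring DefaultResourceCalculator.
    static int defaultCalculator(int availMb, int reqMb) {
        return availMb / reqMb;
    }

    // Memory and vcores, mirroring DominantResourceCalculator.
    static int dominantCalculator(int availMb, int availVcores, int reqMb, int reqVcores) {
        return Math.min(availMb / reqMb, availVcores / reqVcores);
    }

    public static void main(String[] args) {
        // Empty node: 16384 MB and 4 vcores; each map container asks for 1024 MB and 1 vcore.
        System.out.println(defaultCalculator(16384, 1024));         // 16
        System.out.println(dominantCalculator(16384, 4, 1024, 1));  // 4
    }
}
```

On this hypothetical node the memory-only rule admits 16 containers, while the rule that also counts vcores admits only 4.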

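4.     In CS mode with DominantResourceCalculator configured, does setting yarn.nodemanager.resource.cpu-vcores higher than the CPU(s) value reported by lscpu improve performance? The answer again lies in the code quoted above, repeated here:

DominantResourceCalculator class:

  public int computeAvailableContainers(Resource available, Resource required) {
    return Math.min(
        available.getMemory() / required.getMemory(),
        available.getVirtualCores() / required.getVirtualCores());
  }

When available resources are compared with requested resources, the result is the minimum of the memory ratio and the CPU ratio (available / requested). In other words, setting yarn.nodemanager.resource.cpu-vcores above the real core count yields a theoretical gain only while the CPU ratio is the bottleneck; once memory becomes the limit, raising it further changes nothing. How much real benefit there is depends on how fast the CPU is.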
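The point can be checked by sweeping the configured vcore count while holding memory fixed. This is a sketch under stated assumptions: the node is empty (so available equals total), it has 8192 MB, and each map container requests 1024 MB and 1 vcore. VcoreSweepSketch and firstBatch are hypothetical names.

```java
// Sketch: how the first-batch container count reacts to the configured vcore count.
// All names and figures are hypothetical.
final class VcoreSweepSketch {

    // Same min(memory ratio, vcore ratio) rule as DominantResourceCalculator.
    static int firstBatch(int nodeMb, int nodeVcores, int reqMb, int reqVcores) {
        return Math.min(nodeMb / reqMb, nodeVcores / reqVcores);
    }

    public static void main(String[] args) {
        int nodeMb = 8192;
        int reqMb = 1024;
        int reqVcores = 1;
        for (int vcores : new int[] {4, 8, 16, 32}) {
            System.out.println("cpu-vcores=" + vcores + " -> "
                + firstBatch(nodeMb, vcores, reqMb, reqVcores) + " containers");
        }
    }
}
```

Going from 4 to 8 vcores doubles the batch from 4 to 8 containers, but 16 and 32 vcores still give 8, because memory (8192 / 1024) has become the limiting ratio.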



 5.  Appendix

1) Related code:

Resources.java:

 public static boolean lessThanOrEqual(
      ResourceCalculator resourceCalculator,
      Resource clusterResource,
      Resource lhs, Resource rhs) {
    return (resourceCalculator.compare(clusterResource, lhs, rhs) <= 0);
  }

2) Hadoop: The Definitive Guide, 4th Edition, page 297

WARNING
While the number of cores is tracked during scheduling (so a container won’t be allocated on a machine where there are no spare cores, for example), the node manager will not, by default, limit actual CPU usage of running containers. This means that a container can abuse its allocation by using more CPU than it was given, possibly starving other containers running on the same host. YARN has support for enforcing CPU limits using Linux cgroups. The node manager’s container executor class (yarn.nodemanager.container-executor.class) must be set to use the LinuxContainerExecutor class, which in turn must be configured to use cgroups (see the properties under yarn.nodemanager.linux-container-executor).


Reposted from blog.csdn.net/don_chiang709/article/details/80083730