Hadoop YARN DistributedShell

  As its name suggests, this application runs a shell command on distributed nodes (containers). It is easy to use, so let's go ahead.

1. Run the 'ls' command in containers

2. Which directory does the command run in?

3. How to run meaningful, node-dependent commands

1. Run the 'ls' command in containers

  

hadoop jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.5.1.jar org.apache.hadoop.yarn.applications.distributedshell.Client -jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.5.1.jar -shell_command ls -num_containers 1 -container_memory 300 -master_memory 400 

   The 'ls' command will then run inside the allocated container, and the result will look like this:

more userlogs/application_1433385109839_0001/container_1433385109839_0001_01_000002/stdout
container_tokens
default_container_executor.sh
launch_container.sh
tmp
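
   Here the stdout was read straight from the NodeManager's local userlogs directory. As a side note, if log aggregation is enabled (yarn.log-aggregation-enable), the same logs can also be fetched after the application finishes with the yarn CLI, using the application id from the run above:

yarn logs -applicationId application_1433385109839_0001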

   Why does the stdout file contain this content? You can look into the <nodemanager.log>:

2015-06-04 15:55:10,424 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://localhost:9000/user/userxx/DistributedShell/application_1433403689317_0001/AppMaster.jar(->/usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/userxx/appcache/application_1433403689317_0001/filecache/10/AppMaster.jar) transitioned from DOWNLOADING to LOCALIZED
2015-06-04 15:55:10,502 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://localhost:9000/user/userxx/DistributedShell/application_1433403689317_0001/shellCommands(->/usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/userxx/appcache/application_1433403689317_0001/filecache/11/shellCommands) transitioned from DOWNLOADING to LOCALIZED
2015-06-04 15:55:10,644 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [nice, -n, 0, bash, /usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/userxx/appcache/application_1433403689317_0001/container_1433403689317_0001_01_000001/default_container_executor.sh]

   You will see that a file named 'default_container_executor.sh' is placed in the working dir (which is named after the current container), so the result of the 'ls' command is correct.

2. Which directory does the command run in?

  Yes, the result is absolutely right, but how can we verify that the current working dir really is the container's own directory?

  Of course, that is simple too: just use 'pwd' instead of 'ls' for the shell_command param.

hadoop jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.5.1.jar org.apache.hadoop.yarn.applications.distributedshell.Client -jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.5.1.jar -shell_command pwd -num_containers 1 -container_memory 300 -master_memory 400 

  Now check out the stdout file; the result will look like this:

/usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/userxx/appcache/application_1433403689317_0002/container_1433403689317_0002_01_000002

   This time the dir is a bit different from point 1, since this is the second application ;)

3. How to run meaningful, node-dependent commands

  But if you want to run a *custom script* (with its own parameters) that is *node-specific* (i.e. can produce different results on different nodes), you can use a script file to achieve this:

hadoop jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.5.1.jar org.apache.hadoop.yarn.applications.distributedshell.Client -jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.5.1.jar -shell_script ls-command.sh -num_containers 1 -container_memory 300 -master_memory 400

   The file 'ls-command.sh' is simple:

ls -al /tmp/

   Yep, this file must be executable, so do that before submitting the command above:

chmod +x ls-command.sh
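
   If you want genuinely node-dependent behaviour, the script itself can report on whatever differs per node. A hypothetical example (the file name and commands below are just an illustration, not part of DistributedShell) could be:

#!/usr/bin/env bash
# node-facts.sh: print some facts that differ from node to node
echo "running on node: $(hostname)"
df -h /tmp      # local disk usage of this node
uptime          # current load of this node

   Submit it with -shell_script node-facts.sh exactly as above, and each container's stdout will reflect the node it actually ran on.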

    

Appendix:

A. From the <nodemanager.log>, we found this info:

2015-06-04 15:55:17,223 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1433403689317_0001 transitioned from RUNNING to APPLICATION_RESOURCES_CLEANINGUP
2015-06-04 15:55:17,223 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/userxx/appcache/application_1433403689317_0001

  So if you check the appcache dir afterwards, nothing will be there:

ll /usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/userxx/appcache/
total 0
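
  If you want to poke around in those container directories after the application finishes (for example to read the generated launch_container.sh), one debugging aid is to delay this cleanup with yarn.nodemanager.delete.debug-delay-sec in yarn-site.xml (the value below is just an example; the default is 0, i.e. delete immediately):

<property>
  <!-- keep localized app dirs around for 10 minutes after the app finishes (debugging only) -->
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>600</value>
</property>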

  

B. The AM is responsible for setting up the containers; ultimately, it is the NM that starts them up:

more userlogs/application_1433385109839_0001/container_1433385109839_0001_01_000001/AppMaster.stderr 
15/06/04 12:26:09 INFO distributedshell.ApplicationMaster: Initializing ApplicationMaster
15/06/04 12:26:09 INFO distributedshell.ApplicationMaster: Application master for app, appId=1, clustertimestamp=1433385109839, attemptId=1
2015-06-04 12:26:09.755 java[1261:1903] Unable to load realm info from SCDynamicStore
15/06/04 12:26:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/06/04 12:26:10 INFO impl.TimelineClientImpl: Timeline service is not enabled
15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Starting ApplicationMaster
15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Executing with tokens:
15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Kind: YARN_AM_RM_TOKEN, Service: , Ident: (org.apache.hadoop.yarn.security.AMRMTokenIdentifier@7950d786)
15/06/04 12:26:10 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8030
15/06/04 12:26:10 INFO impl.NMClientAsyncImpl: Upper bound of the thread pool size is 500
15/06/04 12:26:10 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-nodemanagers-proxies : 500
15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Max mem capabililty of resources in this cluster 8192
15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Max vcores capabililty of resources in this cluster 32
15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Received 0 previous AM's running containers on AM registration.
15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[<memory:300, vCores:1>]Priority[0]
15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[<memory:300, vCores:1>]Priority[0]
15/06/04 12:26:12 INFO impl.AMRMClientImpl: Received new token for : localhost:52226
15/06/04 12:26:12 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1
15/06/04 12:26:12 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1433385109839_0001_01_000002, containerNode=localhost:52226, containerNodeURI=localhost:8042, containerResourceMemory1024, containerResourceVirtualCores1
15/06/04 12:26:12 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1433385109839_0001_01_000002
15/06/04 12:26:12 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1433385109839_0001_01_000002
15/06/04 12:26:12 INFO impl.ContainerManagementProtocolProxy: Opening proxy : localhost:52226
15/06/04 12:26:12 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1433385109839_0001_01_000002
15/06/04 12:26:13 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, completedCnt=1
15/06/04 12:26:13 INFO distributedshell.ApplicationMaster: Got container status for containerID=container_1433385109839_0001_01_000002, state=COMPLETE, exitStatus=0, diagnostics=
15/06/04 12:26:13 INFO distributedshell.ApplicationMaster: Container completed successfully., containerId=container_1433385109839_0001_01_000002
15/06/04 12:26:13 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1
15/06/04 12:26:13 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1433385109839_0001_01_000003, containerNode=localhost:52226, containerNodeURI=localhost:8042, containerResourceMemory1024, containerResourceVirtualCores1
15/06/04 12:26:13 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1433385109839_0001_01_000003
15/06/04 12:26:13 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1433385109839_0001_01_000003
15/06/04 12:26:13 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1433385109839_0001_01_000003
15/06/04 12:26:14 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, completedCnt=1
15/06/04 12:26:14 INFO distributedshell.ApplicationMaster: Got container status for containerID=container_1433385109839_0001_01_000003, state=COMPLETE, exitStatus=0, diagnostics=
15/06/04 12:26:14 INFO distributedshell.ApplicationMaster: Container completed successfully., containerId=container_1433385109839_0001_01_000003
15/06/04 12:26:14 INFO distributedshell.ApplicationMaster: Application completed. Stopping running containers
15/06/04 12:26:14 INFO impl.ContainerManagementProtocolProxy: Closing proxy : localhost:52226
15/06/04 12:26:14 INFO distributedshell.ApplicationMaster: Application completed. Signalling finish to RM
15/06/04 12:26:14 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
15/06/04 12:26:15 INFO distributedshell.ApplicationMaster: Application Master completed successfully. exiting

   And the AM itself always starts first, in the first container, before the other containers.
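
   To make that division of labour concrete, here is a heavily simplified sketch (not the actual DistributedShell source; the class name and wiring are made up, and registration with the RM, resource requests, localization and tokens are all omitted) of the AM-side callback that launches a shell command once the RM has allocated containers:

import java.util.Collections;
import java.util.List;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;
import org.apache.hadoop.yarn.client.api.async.NMClientAsync;

// Sketch: the AM asks the RM for containers elsewhere; once they are allocated,
// this handler asks the NM (via NMClientAsync) to actually start a process in each one.
public class LaunchOnAllocation implements AMRMClientAsync.CallbackHandler {

  private final NMClientAsync nmClient;

  public LaunchOnAllocation(NMClientAsync nmClient) {
    this.nmClient = nmClient;
  }

  @Override
  public void onContainersAllocated(List<Container> containers) {
    for (Container container : containers) {
      // The launch context carries the command line the NM will run for us.
      // No local resources, env, service data or tokens are set in this sketch.
      ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
          Collections.<String, LocalResource>emptyMap(),
          Collections.<String, String>emptyMap(),
          Collections.singletonList("ls 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr"),
          null, null, null);
      nmClient.startContainerAsync(container, ctx);  // the NM starts the container process
    }
  }

  // The remaining callbacks are no-ops in this sketch.
  @Override public void onContainersCompleted(List<ContainerStatus> statuses) { }
  @Override public void onShutdownRequest() { }
  @Override public void onNodesUpdated(List<NodeReport> updatedNodes) { }
  @Override public void onError(Throwable e) { }
  @Override public float getProgress() { return 0; }
}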

C. Questions: my MacBook Pro has 8 GB RAM and a dual-core i5 (2.4 GHz), yet the log above reports 32 vcores:

15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Max mem capabililty of resources in this cluster 8192
15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Max vcores capabililty of resources in this cluster 32

   Does anyone know why? I will dig into it tomorrow...

   After I ran a new job on a bigger cluster (32 GB mem, 8 CPUs), these numbers stayed the same, so I figured they are config values set in code or XML.

  Today I dug into the capacity scheduler code, 'CapacitySchedulerConfiguration#getMaximumAllocation()':

  public Resource getMaximumAllocation() {
    int maximumMemory = getInt(
        YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_MB,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_MB);
    int maximumCores = getInt(
        YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES);
    return Resources.createResource(maximumMemory, maximumCores);
  }
  public Resource getMinimumAllocation() {
    int minimumMemory = getInt(
        YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_MB);
    int minimumCores = getInt(
        YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_VCORES,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_VCORES);
    return Resources.createResource(minimumMemory, minimumCores);
  }
    
case  property                                  default in code  default in xml
max   yarn.scheduler.maximum-allocation-mb      8g               8g
      (max RAM per container: the maximum allocation for every container request at the RM,
       in MBs; memory requests higher than this won't take effect and will get capped to this value)
max   yarn.scheduler.maximum-allocation-vcores  4 cores          32 cores
      (max vcores per container: the maximum allocation for every container request at the RM,
       in terms of virtual CPU cores; requests higher than this won't take effect and will get capped to this value)
min   yarn.scheduler.minimum-allocation-mb      1g               1g
min   yarn.scheduler.minimum-allocation-vcores  1 core           1 core
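
  So the 32 vcores above appears to come straight from that XML default rather than from anything detected on the machine. If you want the scheduler's cap to match the real hardware, you can override it in yarn-site.xml; the values below are just an example for a 2-core, 8 GB box, not a recommendation:

<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>2</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>6144</value>
</property>

  (The capacity a single NodeManager advertises is a separate setting: yarn.nodemanager.resource.cpu-vcores / yarn.nodemanager.resource.memory-mb.)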

 Of course, some questions remain:

 1. If a node is configured with 4 GB (so max-allocation-mb should be less than or equal to 4 GB), but my task needs 5 GB to run, what happens? I think such a node would never run that task, so some kind of capping is necessary, e.g.:

// A resource ask cannot exceed the max. 
    if (amMemory > maxMem) {
      LOG.info("AM memory specified above max threshold of cluster. Using max value."
          + ", specified=" + amMemory
          + ", max=" + maxMem);
      amMemory = maxMem;
    }

D. The container id does not strictly follow the app attempt id, but the app id

  container id

container_1433385109839_0001_01_000003

  app attempt id

appattempt_1433385109839_0001_000001

  app id

application_1433385109839_0001

  Since one app may contain multiple attempts, the container is bound to the app id rather than the attempt id to preserve that umbilical relationship.

 ref:

http://dongxicheng.org/mapreduce-nextgen/how-to-run-distributedshell/


Reposted from leibnitz.iteye.com/blog/2217103