Hadoop YARN DistributedShell

  As its name suggests, this application runs a shell command on distributed nodes (containers). It is easy to use, so let's go ahead.

1. Run the 'ls' command in containers

2. Which directory does the command run in?

3. How to run meaningful, node-dependent commands

1. Run the 'ls' command in containers

  

hadoop jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.5.1.jar org.apache.hadoop.yarn.applications.distributedshell.Client -jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.5.1.jar -shell_command ls -num_containers 1 -container_memory 300 -master_memory 400 

   The 'ls' command will then run inside the allocated container, and the result will look like this:

more userlogs/application_1433385109839_0001/container_1433385109839_0001_01_000002/stdout
container_tokens
default_container_executor.sh
launch_container.sh
tmp
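
   Here the stdout was read straight from the NodeManager's local userlogs directory. As a side note, if log aggregation is enabled (yarn.log-aggregation-enable), the same logs can also be fetched after the application finishes with the yarn CLI, using the application id from the run above:

yarn logs -applicationId application_1433385109839_0001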

   Why does the stdout file contain this content? You can look into the <nodemanager.log>:

2015-06-04 15:55:10,424 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://localhost:9000/user/userxx/DistributedShell/application_1433403689317_0001/AppMaster.jar(->/usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/userxx/appcache/application_1433403689317_0001/filecache/10/AppMaster.jar) transitioned from DOWNLOADING to LOCALIZED
2015-06-04 15:55:10,502 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://localhost:9000/user/userxx/DistributedShell/application_1433403689317_0001/shellCommands(->/usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/userxx/appcache/application_1433403689317_0001/filecache/11/shellCommands) transitioned from DOWNLOADING to LOCALIZED
2015-06-04 15:55:10,644 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [nice, -n, 0, bash, /usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/userxx/appcache/application_1433403689317_0001/container_1433403689317_0001_01_000001/default_container_executor.sh]

   You will see that a file named 'default_container_executor.sh' is placed in the working dir (which is named after the current container), so the result of the 'ls' command is correct.

2. Which directory does the command run in?

  Yes, the result is absolutely right, but how can we verify that the current working dir really is the container's own directory?

  Of course, that is simple too: just use 'pwd' instead of 'ls' for the shell_command param.

hadoop jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.5.1.jar org.apache.hadoop.yarn.applications.distributedshell.Client -jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.5.1.jar -shell_command pwd -num_containers 1 -container_memory 300 -master_memory 400 

  Now check out the stdout file; the result will look like this:

/usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/userxx/appcache/application_1433403689317_0002/container_1433403689317_0002_01_000002

   This time the dir is a bit different from point 1, since this is the second application ;)

3. How to run meaningful, node-dependent commands

  But if you want to run a *custom script* (with its own parameters) that is *node-specific* (i.e. can produce different results on different nodes), you can use a script file to achieve this:

hadoop jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.5.1.jar org.apache.hadoop.yarn.applications.distributedshell.Client -jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.5.1.jar -shell_script ls-command.sh -num_containers 1 -container_memory 300 -master_memory 400

   The file 'ls-command.sh' is simple:

ls -al /tmp/

   Yep, this file must be executable, so do that before submitting the command above:

chmod +x ls-command.sh
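
   If you want genuinely node-dependent behaviour, the script itself can report on whatever differs per node. A hypothetical example (the file name and commands below are just an illustration, not part of DistributedShell) could be:

#!/usr/bin/env bash
# node-facts.sh: print some facts that differ from node to node
echo "running on node: $(hostname)"
df -h /tmp      # local disk usage of this node
uptime          # current load of this node

   Submit it with -shell_script node-facts.sh exactly as above, and each container's stdout will reflect the node it actually ran on.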

    

Appendix:

A. From the <nodemanager.log>, we found this info:

2015-06-04 15:55:17,223 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1433403689317_0001 transitioned from RUNNING to APPLICATION_RESOURCES_CLEANINGUP
2015-06-04 15:55:17,223 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/userxx/appcache/application_1433403689317_0001

  So if you check the appcache dir afterwards, nothing will be there:

ll /usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/userxx/appcache/
total 0
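
  If you want to poke around in those container directories after the application finishes (for example to read the generated launch_container.sh), one debugging aid is to delay this cleanup with yarn.nodemanager.delete.debug-delay-sec in yarn-site.xml (the value below is just an example; the default is 0, i.e. delete immediately):

<property>
  <!-- keep localized app dirs around for 10 minutes after the app finishes (debugging only) -->
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>600</value>
</property>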

  

B. The AM is responsible for setting up the containers; ultimately, it is the NM that starts them up:

more userlogs/application_1433385109839_0001/container_1433385109839_0001_01_000001/AppMaster.stderr 
15/06/04 12:26:09 INFO distributedshell.ApplicationMaster: Initializing ApplicationMaster
15/06/04 12:26:09 INFO distributedshell.ApplicationMaster: Application master for app, appId=1, clustertimestamp=1433385109839, attemptId=1
2015-06-04 12:26:09.755 java[1261:1903] Unable to load realm info from SCDynamicStore
15/06/04 12:26:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/06/04 12:26:10 INFO impl.TimelineClientImpl: Timeline service is not enabled
15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Starting ApplicationMaster
15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Executing with tokens:
15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Kind: YARN_AM_RM_TOKEN, Service: , Ident: (org.apache.hadoop.yarn.security.AMRMTokenIdentifier@7950d786)
15/06/04 12:26:10 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8030
15/06/04 12:26:10 INFO impl.NMClientAsyncImpl: Upper bound of the thread pool size is 500
15/06/04 12:26:10 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-nodemanagers-proxies : 500
15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Max mem capabililty of resources in this cluster 8192
15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Max vcores capabililty of resources in this cluster 32
15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Received 0 previous AM's running containers on AM registration.
15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[<memory:300, vCores:1>]Priority[0]
15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[<memory:300, vCores:1>]Priority[0]
15/06/04 12:26:12 INFO impl.AMRMClientImpl: Received new token for : localhost:52226
15/06/04 12:26:12 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1
15/06/04 12:26:12 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1433385109839_0001_01_000002, containerNode=localhost:52226, containerNodeURI=localhost:8042, containerResourceMemory1024, containerResourceVirtualCores1
15/06/04 12:26:12 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1433385109839_0001_01_000002
15/06/04 12:26:12 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1433385109839_0001_01_000002
15/06/04 12:26:12 INFO impl.ContainerManagementProtocolProxy: Opening proxy : localhost:52226
15/06/04 12:26:12 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1433385109839_0001_01_000002
15/06/04 12:26:13 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, completedCnt=1
15/06/04 12:26:13 INFO distributedshell.ApplicationMaster: Got container status for containerID=container_1433385109839_0001_01_000002, state=COMPLETE, exitStatus=0, diagnostics=
15/06/04 12:26:13 INFO distributedshell.ApplicationMaster: Container completed successfully., containerId=container_1433385109839_0001_01_000002
15/06/04 12:26:13 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1
15/06/04 12:26:13 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1433385109839_0001_01_000003, containerNode=localhost:52226, containerNodeURI=localhost:8042, containerResourceMemory1024, containerResourceVirtualCores1
15/06/04 12:26:13 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1433385109839_0001_01_000003
15/06/04 12:26:13 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1433385109839_0001_01_000003
15/06/04 12:26:13 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1433385109839_0001_01_000003
15/06/04 12:26:14 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, completedCnt=1
15/06/04 12:26:14 INFO distributedshell.ApplicationMaster: Got container status for containerID=container_1433385109839_0001_01_000003, state=COMPLETE, exitStatus=0, diagnostics=
15/06/04 12:26:14 INFO distributedshell.ApplicationMaster: Container completed successfully., containerId=container_1433385109839_0001_01_000003
15/06/04 12:26:14 INFO distributedshell.ApplicationMaster: Application completed. Stopping running containers
15/06/04 12:26:14 INFO impl.ContainerManagementProtocolProxy: Closing proxy : localhost:52226
15/06/04 12:26:14 INFO distributedshell.ApplicationMaster: Application completed. Signalling finish to RM
15/06/04 12:26:14 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
15/06/04 12:26:15 INFO distributedshell.ApplicationMaster: Application Master completed successfully. exiting

   And the AM itself always starts first, in the first container, before the other containers.
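
   To make that division of labour concrete, here is a heavily simplified sketch (not the actual DistributedShell source; the class name and wiring are made up, and registration with the RM, resource requests, localization and tokens are all omitted) of the AM-side callback that launches a shell command once the RM has allocated containers:

import java.util.Collections;
import java.util.List;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.ContainerStatus;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;
import org.apache.hadoop.yarn.client.api.async.NMClientAsync;

// Sketch: the AM asks the RM for containers elsewhere; once they are allocated,
// this handler asks the NM (via NMClientAsync) to actually start a process in each one.
public class LaunchOnAllocation implements AMRMClientAsync.CallbackHandler {

  private final NMClientAsync nmClient;

  public LaunchOnAllocation(NMClientAsync nmClient) {
    this.nmClient = nmClient;
  }

  @Override
  public void onContainersAllocated(List<Container> containers) {
    for (Container container : containers) {
      // The launch context carries the command line the NM will run for us.
      // No local resources, env, service data or tokens are set in this sketch.
      ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
          Collections.<String, LocalResource>emptyMap(),
          Collections.<String, String>emptyMap(),
          Collections.singletonList("ls 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr"),
          null, null, null);
      nmClient.startContainerAsync(container, ctx);  // the NM starts the container process
    }
  }

  // The remaining callbacks are no-ops in this sketch.
  @Override public void onContainersCompleted(List<ContainerStatus> statuses) { }
  @Override public void onShutdownRequest() { }
  @Override public void onNodesUpdated(List<NodeReport> updatedNodes) { }
  @Override public void onError(Throwable e) { }
  @Override public float getProgress() { return 0; }
}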

C. Questions: my MacBook Pro has 8 GB RAM and a dual-core i5 (2.4 GHz), yet the log above reports 32 vcores:

15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Max mem capabililty of resources in this cluster 8192
15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Max vcores capabililty of resources in this cluster 32

   Does anyone know why? I will dig into it tomorrow...

   After I ran a new job on a bigger cluster (32 GB mem, 8 CPUs), these numbers stayed the same, so I figured they are config values set in code or XML.

  Today I dug into the capacity scheduler code, 'CapacitySchedulerConfiguration#getMaximumAllocation()':

  public Resource getMaximumAllocation() {
    int maximumMemory = getInt(
        YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_MB,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_MB);
    int maximumCores = getInt(
        YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES);
    return Resources.createResource(maximumMemory, maximumCores);
  }
  public Resource getMinimumAllocation() {
    int minimumMemory = getInt(
        YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_MB);
    int minimumCores = getInt(
        YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_VCORES,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_VCORES);
    return Resources.createResource(minimumMemory, minimumCores);
  }
    
case  property                                  default in code  default in xml
max   yarn.scheduler.maximum-allocation-mb      8g               8g
      (max RAM per container: the maximum allocation for every container request at the RM,
       in MBs; memory requests higher than this won't take effect and will get capped to this value)
max   yarn.scheduler.maximum-allocation-vcores  4 cores          32 cores
      (max vcores per container: the maximum allocation for every container request at the RM,
       in terms of virtual CPU cores; requests higher than this won't take effect and will get capped to this value)
min   yarn.scheduler.minimum-allocation-mb      1g               1g
min   yarn.scheduler.minimum-allocation-vcores  1 core           1 core
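
  So the 32 vcores above appears to come straight from that XML default rather than from anything detected on the machine. If you want the scheduler's cap to match the real hardware, you can override it in yarn-site.xml; the values below are just an example for a 2-core, 8 GB box, not a recommendation:

<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>2</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>6144</value>
</property>

  (The capacity a single NodeManager advertises is a separate setting: yarn.nodemanager.resource.cpu-vcores / yarn.nodemanager.resource.memory-mb.)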

 Of course, some questions remain:

 1. If a node is configured with 4 GB (so max-allocation-mb should be less than or equal to 4 GB), but my task needs 5 GB to run, what happens? I think such a node would never run that task, so some kind of capping is necessary, e.g.:

// A resource ask cannot exceed the max. 
    if (amMemory > maxMem) {
      LOG.info("AM memory specified above max threshold of cluster. Using max value."
          + ", specified=" + amMemory
          + ", max=" + maxMem);
      amMemory = maxMem;
    }

D. The container id does not strictly follow the app attempt id, but the app id

  container id

container_1433385109839_0001_01_000003

  app attempt id

appattempt_1433385109839_0001_000001

  app id

application_1433385109839_0001

  Since one app may contain multiple attempts, the container is bound to the app id rather than the attempt id to preserve that umbilical relationship.

 ref:

http://dongxicheng.org/mapreduce-nextgen/how-to-run-distributedshell/


Reposted from leibnitz.iteye.com/blog/2217103