2020/11/27 [email protected]
Contents
- Hadoop Benchmarking
- 1. Debugging the Cluster
- 2. Benchmark Tools
Hadoop Benchmarking
1. Debugging the Cluster
Before starting any benchmarks, the HDFS and YARN services must be running.
When starting the YARN service, the ResourceManager failed to come up. The logs showed the following error:
2020-11-23 15:56:44,775 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: registered UNIX signal handlers for [TERM, HUP, INT]
2020-11-23 15:56:45,408 INFO org.apache.hadoop.conf.Configuration: found resource core-site.xml at file:/home/bduser101/modules/hadoop/etc/hadoop/core-site.xml
2020-11-23 15:56:45,931 FATAL org.apache.hadoop.conf.Configuration: error parsing conf java.io.BufferedInputStream@74294adb
org.xml.sax.SAXParseException; lineNumber: 19; columnNumber: 38; An 'include' failed, and no 'fallback' element was found.
This error came from the file-inclusion directive added to core-site.xml when configuring HDFS federation. After writing the settings from mountTable.xml directly into core-site.xml and syncing that file to all nodes, the YARN service started normally.
[Source of the error]
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include href="mountTable.xml"/>
After the fix, the xi:include line is gone and the mount-table properties sit directly inside core-site.xml.
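A minimal sketch of what the corrected core-site.xml might look like, assuming the standard ViewFS mount-table property naming; the mount point and target path below are illustrative, not the cluster's real ones:

```xml
<configuration>
  <!-- the original core-site.xml properties stay unchanged -->
  <!-- entries copied in from mountTable.xml; the names/paths here are assumptions -->
  <property>
    <name>fs.viewfs.mounttable.my-cluster.link./user</name>
    <value>hdfs://node101:8020/user</value>
  </property>
</configuration>
```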
2. Benchmark Tools
After deploying a new cluster, upgrading an existing one, or tuning its performance parameters, you need benchmarking tools to observe how the cluster's performance changes.
Hadoop ships with a test jar that bundles many such tools; DFSCIOTest, mrbench, and nnbench are among the most widely used. Running the jar without arguments lists the available programs:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.jar
- DFSCIOTest: Distributed i/o benchmark of libhdfs. (libhdfs is a shared library that provides HDFS file services to C/C++ applications.)
- DistributedFSCheck: Distributed checkup of the file system consistency.
- JHLogAnalyzer: Job History Log analyzer.
- MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures.
- NNdataGenerator: Generate the data to be used by NNloadGenerator.
- NNloadGenerator: Generate load on Namenode using NN loadgenerator run WITHOUT MR.
- NNloadGeneratorMR: Generate load on Namenode using NN loadgenerator run as MR job.
- NNstructureGenerator: Generate the structure to be used by NNdataGenerator.
- SliveTest: HDFS Stress Test and Live Data Verification.
- TestDFSIO: Distributed i/o benchmark.
- fail: a job that always fails.
- filebench: Benchmark SequenceFile(Input|Output)Format (block, record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed).
- largesorter: Large-Sort tester.
- loadgen: Generic map/reduce load generator.
- mapredtest: A map/reduce test check.
- minicluster: Single process HDFS and MR cluster.
- mrbench: A map/reduce benchmark that can create many small jobs.
- nnbench: A benchmark that stresses the namenode.
- sleep: A job that sleeps at each map and reduce task.
- testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce.
- testfilesystem: A test for FileSystem read/write.
- testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
- testsequencefile: A test for flat files of binary key value pairs.
- testsequencefileinputformat: A test for sequence file input format.
- testtextinputformat: A test for text input format.
- threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill.
2.1 TestDFSIO
TestDFSIO measures HDFS I/O performance. It uses a MapReduce job to run reads and writes in parallel: each map task reads or writes one file, the map output carries the statistics collected for that file, and the reduce task accumulates those statistics and produces the summary.
TestDFSIO is invoked as follows:
$>:hadoop jar hadoop-mapreduce-client-jobclient-2.7.6-tests.jar TestDFSIO
Usage: TestDFSIO [genericOptions] -read | -write | -append | -clean [-nrFiles N] [-fileSize Size[B|KB|MB|GB|TB]] [-resFile resultFileName] [-bufferSize Bytes] [-rootDir]
When a run finishes, its results are appended to TestDFSIO_results.log in the local working directory, where the run's summary can be reviewed.
2.1.1 Writing 10 files of 100 MB each to HDFS
$>cd /home/bduser101/modules/hadoop/share/hadoop/mapreduce
$>hadoop jar hadoop-mapreduce-client-jobclient-2.7.6-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 100
20/11/23 17:03:48 INFO fs.TestDFSIO: TestDFSIO.1.8
20/11/23 17:03:48 INFO fs.TestDFSIO: nrFiles = 10
20/11/23 17:03:48 INFO fs.TestDFSIO: nrBytes (MB) = 100.0
20/11/23 17:03:48 INFO fs.TestDFSIO: bufferSize = 1000000
20/11/23 17:03:48 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
20/11/23 17:03:50 INFO fs.TestDFSIO: creating control file: 104857600 bytes, 10 files
If you see the warning WARN hdfs.DFSClient: Caught exception java.lang.InterruptedException at java.lang.Object.wait(Native Method), don't panic: judging from most reports online, this is a Hadoop bug and the warning is harmless.
20/11/23 17:04:39 INFO mapreduce.Job: map 0% reduce 0%
20/11/23 17:05:03 INFO mapreduce.Job: map 13% reduce 0%
20/11/23 17:05:20 INFO mapreduce.Job: map 17% reduce 0%
20/11/23 17:05:21 INFO mapreduce.Job: map 20% reduce 0%
20/11/23 17:05:35 INFO mapreduce.Job: map 20% reduce 7%
20/11/23 17:05:44 INFO mapreduce.Job: map 27% reduce 7%
20/11/23 17:05:50 INFO mapreduce.Job: map 30% reduce 10%
20/11/23 17:05:58 INFO mapreduce.Job: map 77% reduce 10%
20/11/23 17:06:52 INFO mapreduce.Job: map 80% reduce 10%
Error: org.apache.hadoop.ipc.RemoteException(java.io.IOException): BP-112231132-192.168.159.101-1584624837518:blk_1073742201_1382 does not exist or is not under Constructionnull
If you hit the error org.apache.hadoop.ipc.RemoteException(java.io.IOException): BP-112231132-192.168.159.101-1584624837518:blk_1073742201_1382 does not exist or is not under Constructionnull, it is a balancer-related bug; for details see https://issues.apache.org/jira/browse/hdfs-8093
Things to rule out:
- Whether the system or HDFS has run out of space. Here, fs.default.name in core-site.xml was changed from viewfs://my-cluser to hdfs://node101:8020 and the cluster was checked with:
  hadoop/bin$>./hdfs dfsadmin -report
  The output showed the cluster still had plenty of free space.
- Whether the number of DataNodes is normal.
- Whether HDFS is in safe mode.
- Whether the firewall is disabled.
- Configuration issues.
- Clearing the NameNode tmp directory and reformatting the NameNode.
20/11/23 17:06:53 INFO mapreduce.Job: map 77% reduce 10%
20/11/23 17:07:14 INFO mapreduce.Job: map 80% reduce 10%
20/11/23 17:07:27 INFO mapreduce.Job: map 90% reduce 13%
Error: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /benchmarks/TestDFSIO/io_data/test_io_6 (inode 16834): File does not exist. Holder DFSClient_attempt_1606119502234_0004_m_000006_0_-1509478354_1 does not have any open files.
If you hit the error Error: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /benchmarks/TestDFSIO/io_data/test_io_6 (inode 16834): File does not exist. Holder DFSClient_attempt_1606119502234_0004_m_000006_0_-1509478354_1 does not have any open files.
- The immediate cause is that the file was deleted in the middle of a data-stream operation, usually because several MapReduce tasks operate on the same file and one task deletes it when it finishes.
- It is tied to a Hadoop feature: instead of trying to diagnose and repair slow-running tasks, Hadoop detects (speculates about) them and launches backup tasks that do the same work. In this case that work was writing data into HDFS. When one of the two identical tasks finishes, it deletes some temporary files; when the other finishes, it tries to delete the same files, which triggers this error.
- The error does not affect the benchmark's results and can be ignored. It can also be avoided by disabling speculative execution in Hadoop (and in Spark, if you use it).
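Disabling speculation can be done with a small mapred-site.xml fragment; the property names below are the standard Hadoop 2.x ones:

```xml
<!-- mapred-site.xml: turn off speculative (backup) task execution -->
<property>
  <name>mapreduce.map.speculative</name>
  <value>false</value>
</property>
<property>
  <name>mapreduce.reduce.speculative</name>
  <value>false</value>
</property>
```

The same properties can also be passed per job as generic options, e.g. -D mapreduce.map.speculative=false.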
When the run completes, the following is printed, including the MapReduce counters, throughput, and I/O rates for the test:
20/11/23 17:07:28 INFO mapreduce.Job: map 87% reduce 13%
20/11/23 17:07:31 INFO mapreduce.Job: map 90% reduce 13%
20/11/23 17:07:32 INFO mapreduce.Job: map 93% reduce 13%
20/11/23 17:07:33 INFO mapreduce.Job: map 97% reduce 13%
20/11/23 17:07:34 INFO mapreduce.Job: map 100% reduce 13%
20/11/23 17:07:36 INFO mapreduce.Job: map 100% reduce 100%
20/11/23 17:07:37 INFO mapreduce.Job: Job job_1606119502234_0004 completed successfully
20/11/23 17:07:38 INFO mapreduce.Job: Counters: 57
File System Counters
FILE: Number of bytes read=857
FILE: Number of bytes written=1377714
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2330
HDFS: Number of bytes written=1048576078
HDFS: Number of read operations=43
HDFS: Number of large read operations=0
HDFS: Number of write operations=12
VIEWFS: Number of bytes read=0
VIEWFS: Number of bytes written=0
VIEWFS: Number of read operations=0
VIEWFS: Number of large read operations=0
VIEWFS: Number of write operations=0
Job Counters
Failed map tasks=2
Killed map tasks=6
Launched map tasks=19
Launched reduce tasks=1
Other local map tasks=1
Data-local map tasks=18
Total time spent by all maps in occupied slots (ms)=1723294
Total time spent by all reduces in occupied slots (ms)=133402
Total time spent by all map tasks (ms)=1723294
Total time spent by all reduce tasks (ms)=133402
Total vcore-milliseconds taken by all map tasks=1723294
Total vcore-milliseconds taken by all reduce tasks=133402
Total megabyte-milliseconds taken by all map tasks=1764653056
Total megabyte-milliseconds taken by all reduce tasks=136603648
Map-Reduce Framework
Map input records=10
Map output records=50
Map output bytes=751
Map output materialized bytes=911
Input split bytes=1210
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=911
Reduce input records=50
Reduce output records=5
Spilled Records=100
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=71333
CPU time spent (ms)=66340
Physical memory (bytes) snapshot=1764884480
Virtual memory (bytes) snapshot=22712225792
Total committed heap usage (bytes)=2045894656
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
Test log
20/11/23 17:07:38 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
20/11/23 17:07:38 INFO fs.TestDFSIO: Date & time: Mon Nov 23 17:07:38 CST 2020
20/11/23 17:07:38 INFO fs.TestDFSIO: Number of files: 10
20/11/23 17:07:38 INFO fs.TestDFSIO: Total MBytes processed: 1000
20/11/23 17:07:38 INFO fs.TestDFSIO: Throughput mb/sec: 2.09
20/11/23 17:07:38 INFO fs.TestDFSIO: Average IO rate mb/sec: 3.48
20/11/23 17:07:38 INFO fs.TestDFSIO: IO rate std deviation: 2.52
20/11/23 17:07:38 INFO fs.TestDFSIO: Test exec time sec: 226.16
20/11/23 17:07:38 INFO fs.TestDFSIO:
Statistical analysis after running the same test 10 times on the company's test cluster:
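Since every run appends one summary block to TestDFSIO_results.log, a multi-run analysis like the one referenced above can be scripted. A small sketch, with made-up numbers standing in for the real log:

```python
import re
import statistics

def parse_throughputs(log_text):
    """Pull the 'Throughput mb/sec' value out of each TestDFSIO summary block."""
    return [float(v) for v in re.findall(r"Throughput mb/sec:\s*([\d.]+)", log_text)]

# Two illustrative runs; the numbers are invented, not the real 10-run data.
log = """\
----- TestDFSIO ----- : write
Throughput mb/sec: 2.09
----- TestDFSIO ----- : write
Throughput mb/sec: 2.41
"""
rates = parse_throughputs(log)
print(statistics.mean(rates))    # average write throughput across runs
print(statistics.pstdev(rates))  # run-to-run spread
```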
2.1.2 Reading 10 files of 1000 MB from HDFS
Run the write test above first, so that the data to read already exists.
$>cd /home/bduser101/modules/hadoop/share/hadoop/mapreduce
$>hadoop jar hadoop-mapreduce-client-jobclient-2.7.6-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
20/11/24 15:16:24 INFO fs.TestDFSIO: TestDFSIO.1.8
20/11/24 15:16:24 INFO fs.TestDFSIO: nrFiles = 10
20/11/24 15:16:24 INFO fs.TestDFSIO: nrBytes (MB) = 100.0
20/11/24 15:16:24 INFO fs.TestDFSIO: bufferSize = 1000000
20/11/24 15:16:24 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
20/11/24 15:16:26 INFO fs.TestDFSIO: creating control file: 104857600 bytes, 10 files
The same exception seen during the write test shows up again: WARN hdfs.DFSClient: Caught exception java.lang.InterruptedException at java.lang.Object.wait(Native Method)
20/11/24 15:16:29 INFO mapreduce.JobSubmitter: number of splits:10
20/11/24 15:16:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1606189764138_0004
20/11/24 15:16:29 INFO impl.YarnClientImpl: Submitted application application_1606189764138_0004
20/11/24 15:16:29 INFO mapreduce.Job: The url to track the job: http://node101:8088/proxy/application_1606189764138_0004/
20/11/24 15:16:29 INFO mapreduce.Job: Running job: job_1606189764138_0004
20/11/24 15:16:43 INFO mapreduce.Job: Job job_1606189764138_0004 running in uber mode : false
20/11/24 15:16:43 INFO mapreduce.Job: map 0% reduce 0%
20/11/24 15:17:18 INFO mapreduce.Job: map 27% reduce 0%
20/11/24 15:17:20 INFO mapreduce.Job: map 40% reduce 0%
20/11/24 15:17:38 INFO mapreduce.Job: map 80% reduce 0%
20/11/24 15:17:43 INFO mapreduce.Job: map 80% reduce 13%
20/11/24 15:17:45 INFO mapreduce.Job: map 97% reduce 13%
20/11/24 15:17:47 INFO mapreduce.Job: map 100% reduce 13%
20/11/24 15:17:49 INFO mapreduce.Job: map 100% reduce 100%
20/11/24 15:17:51 INFO mapreduce.Job: Job job_1606189764138_0004 completed successfully
20/11/24 15:17:51 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=847
FILE: Number of bytes written=1377672
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=962493722
HDFS: Number of bytes written=78
HDFS: Number of read operations=53
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
VIEWFS: Number of bytes read=0
VIEWFS: Number of bytes written=0
VIEWFS: Number of read operations=0
VIEWFS: Number of large read operations=0
VIEWFS: Number of write operations=0
Job Counters
Launched map tasks=11
Launched reduce tasks=1
Data-local map tasks=11
Total time spent by all maps in occupied slots (ms)=522728
Total time spent by all reduces in occupied slots (ms)=23213
Total time spent by all map tasks (ms)=522728
Total time spent by all reduce tasks (ms)=23213
Total vcore-milliseconds taken by all map tasks=522728
Total vcore-milliseconds taken by all reduce tasks=23213
Total megabyte-milliseconds taken by all map tasks=535273472
Total megabyte-milliseconds taken by all reduce tasks=23770112
Map-Reduce Framework
Map input records=10
Map output records=50
Map output bytes=741
Map output materialized bytes=901
Input split bytes=1210
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=901
Reduce input records=50
Reduce output records=5
Spilled Records=100
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=16407
CPU time spent (ms)=19210
Physical memory (bytes) snapshot=1245241344
Virtual memory (bytes) snapshot=22678716416
Total committed heap usage (bytes)=1248374784
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
Test log
20/11/24 15:17:51 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
20/11/24 15:17:51 INFO fs.TestDFSIO: Date & time: Tue Nov 24 15:17:51 CST 2020
20/11/24 15:17:51 INFO fs.TestDFSIO: Number of files: 10
20/11/24 15:17:51 INFO fs.TestDFSIO: Total MBytes processed: 917.9
20/11/24 15:17:51 INFO fs.TestDFSIO: Throughput mb/sec: 21.73
20/11/24 15:17:51 INFO fs.TestDFSIO: Average IO rate mb/sec: 31.66
20/11/24 15:17:51 INFO fs.TestDFSIO: IO rate std deviation: 32.76
20/11/24 15:17:51 INFO fs.TestDFSIO: Test exec time sec: 84.11
20/11/24 15:17:51 INFO fs.TestDFSIO:
Statistical analysis after running the same test 10 times on the company's test cluster:
Delete the test data once testing is finished:
[bduser101@node101 mapreduce]$ hadoop jar hadoop-mapreduce-client-jobclient-2.7.6-tests.jar TestDFSIO -clean
20/11/24 15:00:47 INFO fs.TestDFSIO: TestDFSIO.1.8
20/11/24 15:00:47 INFO fs.TestDFSIO: nrFiles = 1
20/11/24 15:00:47 INFO fs.TestDFSIO: nrBytes (MB) = 1.0
20/11/24 15:00:47 INFO fs.TestDFSIO: bufferSize = 1000000
20/11/24 15:00:47 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
20/11/24 15:00:48 INFO fs.TestDFSIO: Cleaning up test files
2.2 nnbench
nnbench tests the load on the NameNode: it generates a large volume of HDFS-related requests that put the NameNode under considerable pressure. The test can simulate creating, reading, renaming, and deleting files on HDFS.
nnbench is invoked as follows:
$>hadoop jar hadoop-mapreduce-client-jobclient-2.7.6-tests.jar nnbench
NameNode Benchmark 0.4
Usage: nnbench <options>
Options:
-operation <Available operations are create_write open_read rename delete. This option is mandatory>
* NOTE: The open_read, rename and delete operations assume that the files they operate on, are already available. The create_write operation must be run before running the other operations.
-maps <number of maps. default is 1. This is not mandatory>
-reduces <number of reduces. default is 1. This is not mandatory>
-startTime <time to start, given in seconds from the epoch. Make sure this is far enough into the future, so all maps (operations) will start at the same time>. default is launch time + 2 mins. This is not mandatory
-blockSize <Block size in bytes. default is 1. This is not mandatory>
-bytesToWrite <Bytes to write. default is 0. This is not mandatory>
-bytesPerChecksum <Bytes per checksum for the files. default is 1. This is not mandatory>
-numberOfFiles <number of files to create. default is 1. This is not mandatory>
-replicationFactorPerFile <Replication factor for the files. default is 1. This is not mandatory>
-baseDir <base DFS path. default is /becnhmarks/NNBench. This is not mandatory>
-readFileAfterOpen <true or false. if true, it reads the file and reports the average time to read. This is valid with the open_read operation. default is false. This is not mandatory>
-help: Display the help statement
2.2.1 Creating 1000 files with 12 mappers and 6 reducers
hadoop jar hadoop-mapreduce-client-jobclient-2.7.6-tests.jar nnbench -operation create_write -maps 12 -reduces 6 -blockSize 1 -bytesToWrite 0 -numberOfFiles 1000 -replicationFactorPerFile 3 -readFileAfterOpen true -baseDir /benchmarks/NNBench-`hostname -s`
20/11/24 16:28:56 INFO hdfs.NNBench: Test Inputs:
20/11/24 16:28:56 INFO hdfs.NNBench: Test Operation: create_write
20/11/24 16:28:56 INFO hdfs.NNBench: Start time: 2020-11-24 16:30:56,726
20/11/24 16:28:56 INFO hdfs.NNBench: Number of maps: 12
20/11/24 16:28:56 INFO hdfs.NNBench: Number of reduces: 6
20/11/24 16:28:56 INFO hdfs.NNBench: Block Size: 1
20/11/24 16:28:56 INFO hdfs.NNBench: Bytes to write: 0
20/11/24 16:28:56 INFO hdfs.NNBench: Bytes per checksum: 1
20/11/24 16:28:56 INFO hdfs.NNBench: Number of files: 1
20/11/24 16:28:56 INFO hdfs.NNBench: Replication factor: 3
20/11/24 16:28:56 INFO hdfs.NNBench: Base dir: /benchmarks/NNBench-hostname -s
20/11/24 16:28:56 INFO hdfs.NNBench: Read file after open: true
20/11/24 16:28:59 INFO hdfs.NNBench: Deleting data directory
20/11/24 16:28:59 INFO hdfs.NNBench: Creating 12 control files
The same exception seen earlier shows up again and can be skipped: WARN hdfs.DFSClient: Caught exception java.lang.InterruptedException at java.lang.Object.wait(Native Method)
20/11/24 16:29:00 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
20/11/24 16:29:00 INFO client.RMProxy: Connecting to ResourceManager at node101/192.168.159.101:8032
20/11/24 16:29:00 INFO client.RMProxy: Connecting to ResourceManager at node101/192.168.159.101:8032
20/11/24 16:29:01 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
20/11/24 16:29:01 INFO mapred.FileInputFormat: Total input paths to process : 12
20/11/24 16:29:01 INFO mapreduce.JobSubmitter: number of splits:12
20/11/24 16:29:01 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
20/11/24 16:29:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1606189764138_0006
20/11/24 16:29:02 INFO impl.YarnClientImpl: Submitted application application_1606189764138_0006
20/11/24 16:29:02 INFO mapreduce.Job: The url to track the job: http://node101:8088/proxy/application_1606189764138_0006/
20/11/24 16:29:02 INFO mapreduce.Job: Running job: job_1606189764138_0006
20/11/24 16:29:14 INFO mapreduce.Job: Job job_1606189764138_0006 running in uber mode : false
20/11/24 16:29:15 INFO mapreduce.Job: map 0% reduce 0%
20/11/24 16:30:10 INFO mapreduce.Job: map 67% reduce 0%
20/11/24 16:31:16 INFO mapreduce.Job: map 75% reduce 0%
20/11/24 16:31:17 INFO mapreduce.Job: map 83% reduce 0%
20/11/24 16:31:18 INFO mapreduce.Job: map 89% reduce 0%
20/11/24 16:31:19 INFO mapreduce.Job: map 100% reduce 0%
20/11/24 16:31:29 INFO mapreduce.Job: map 100% reduce 17%
20/11/24 16:31:39 INFO mapreduce.Job: map 100% reduce 33%
20/11/24 16:31:40 INFO mapreduce.Job: map 100% reduce 50%
20/11/24 16:31:48 INFO mapreduce.Job: map 100% reduce 100%
20/11/24 16:31:49 INFO mapreduce.Job: Job job_1606189764138_0006 completed successfully
20/11/24 16:31:50 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=2232
FILE: Number of bytes written=2282144
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=3076
HDFS: Number of bytes written=171
HDFS: Number of read operations=66
HDFS: Number of large read operations=0
HDFS: Number of write operations=12012
VIEWFS: Number of bytes read=0
VIEWFS: Number of bytes written=0
VIEWFS: Number of read operations=0
VIEWFS: Number of large read operations=0
VIEWFS: Number of write operations=0
Job Counters
Launched map tasks=12
Launched reduce tasks=6
Data-local map tasks=12
Total time spent by all maps in occupied slots (ms)=1447578
Total time spent by all reduces in occupied slots (ms)=130151
Total time spent by all map tasks (ms)=1447578
Total time spent by all reduce tasks (ms)=130151
Total vcore-milliseconds taken by all map tasks=1447578
Total vcore-milliseconds taken by all reduce tasks=130151
Total megabyte-milliseconds taken by all map tasks=1482319872
Total megabyte-milliseconds taken by all reduce tasks=133274624
Map-Reduce Framework
Map input records=12
Map output records=84
Map output bytes=2028
Map output materialized bytes=2628
Input split bytes=1586
Combine input records=0
Combine output records=0
Reduce input groups=7
Reduce shuffle bytes=2628
Reduce input records=84
Reduce output records=7
Spilled Records=168
Shuffled Maps =72
Failed Shuffles=0
Merged Map outputs=72
GC time elapsed (ms)=19664
CPU time spent (ms)=62030
Physical memory (bytes) snapshot=1698738176
Virtual memory (bytes) snapshot=37147820032
Total committed heap usage (bytes)=1589854208
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
Test log:
Log from running the same test on the company's test cluster
2.3 mrbench
mrbench runs a small job repeatedly, checking whether small jobs on the cluster run repeatably and efficiently.
mrbench is invoked as follows:
$>hadoop jar hadoop-mapreduce-client-jobclient-2.7.6-tests.jar mrbench -help
MRBenchmark.0.0.2
Usage: mrbench [-baseDir <base DFS path for output/input, default is /benchmarks/MRBench>] [-jar <local path to job jar file containing Mapper and Reducer implementations, default is current jar file>] [-numRuns <number of times to run the job, default is 1>] [-maps <number of maps for each run, default is 2>] [-reduces <number of reduces for each run, default is 1>] [-inputLines <number of input lines to generate, default is 1>] [-inputType <type of input to generate, one of ascending (default), descending, random>] [-verbose]
2.3.1 Running a small job 50 times
$>hadoop jar hadoop-mapreduce-client-jobclient-2.7.6-tests.jar mrbench -numRuns 50
The per-run output repeats 50 times; the final run looks like this:
20/11/24 17:13:48 INFO mapred.MRBench: Running job 49: input=viewfs://my-cluster/benchmarks/MRBench/mr_input output=viewfs://my-cluster/benchmarks/MRBench/mr_output/output_1197878541
20/11/24 17:13:48 INFO client.RMProxy: Connecting to ResourceManager at node101/192.168.159.101:8032
20/11/24 17:13:48 INFO client.RMProxy: Connecting to ResourceManager at node101/192.168.159.101:8032
20/11/24 17:13:48 INFO mapred.FileInputFormat: Total input paths to process : 1
20/11/24 17:13:48 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1252)
at java.lang.Thread.join(Thread.java:1326)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:716)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:476)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:652)
20/11/24 17:13:48 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1252)
at java.lang.Thread.join(Thread.java:1326)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:716)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:476)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:652)
20/11/24 17:13:48 INFO mapreduce.JobSubmitter: number of splits:2
20/11/24 17:13:48 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1252)
at java.lang.Thread.join(Thread.java:1326)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:716)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:476)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:652)
20/11/24 17:13:48 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1606189764138_0057
20/11/24 17:13:48 INFO impl.YarnClientImpl: Submitted application application_1606189764138_0057
20/11/24 17:13:48 INFO mapreduce.Job: The url to track the job: http://node101:8088/proxy/application_1606189764138_0057/
20/11/24 17:13:48 INFO mapreduce.Job: Running job: job_1606189764138_0057
20/11/24 17:14:03 INFO mapreduce.Job: Job job_1606189764138_0057 running in uber mode : false
20/11/24 17:14:03 INFO mapreduce.Job: map 0% reduce 0%
20/11/24 17:14:16 INFO mapreduce.Job: map 100% reduce 0%
20/11/24 17:14:26 INFO mapreduce.Job: map 100% reduce 100%
20/11/24 17:14:26 INFO mapreduce.Job: Job job_1606189764138_0057 completed successfully
20/11/24 17:14:26 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=13
FILE: Number of bytes written=375073
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=243
HDFS: Number of bytes written=3
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
VIEWFS: Number of bytes read=0
VIEWFS: Number of bytes written=0
VIEWFS: Number of read operations=0
VIEWFS: Number of large read operations=0
VIEWFS: Number of write operations=0
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=22296
Total time spent by all reduces in occupied slots (ms)=5883
Total time spent by all map tasks (ms)=22296
Total time spent by all reduce tasks (ms)=5883
Total vcore-milliseconds taken by all map tasks=22296
Total vcore-milliseconds taken by all reduce tasks=5883
Total megabyte-milliseconds taken by all map tasks=22831104
Total megabyte-milliseconds taken by all reduce tasks=6024192
Map-Reduce Framework
Map input records=1
Map output records=1
Map output bytes=5
Map output materialized bytes=19
Input split bytes=240
Combine input records=0
Combine output records=0
Reduce input groups=1
Reduce shuffle bytes=19
Reduce input records=1
Reduce output records=1
Spilled Records=2
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=500
CPU time spent (ms)=2030
Physical memory (bytes) snapshot=499757056
Virtual memory (bytes) snapshot=6190952448
Total committed heap usage (bytes)=264908800
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
Result: the average job completion time was 37 seconds
DataLines Maps Reduces AvgTime (milliseconds)
1 2 1 37481
Log from running the same test on the company's test cluster
Result: the average job completion time was 23 seconds
DataLines Maps Reduces AvgTime (milliseconds)
1 2 1 23073
2.4 TeraGen-TeraSort-TeraValidate
TeraSort is a commonly used Hadoop benchmark whose goal is to sort data as fast as possible with MapReduce. TeraSort partitions the map output across reduce tasks so that the concatenated reduce outputs form a globally sorted result. The test stresses every stage of the MapReduce framework and provides a reasonable reference point for tuning and configuring a Hadoop cluster.
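The "partitioning ensures global order" idea can be sketched in a few lines: pick split points from a sample of the keys, route each key to the partition whose range covers it, sort each partition independently, and the concatenation is globally sorted. This is a toy illustration of the idea, not Hadoop's actual TotalOrderPartitioner code:

```python
import bisect
import random

def split_points(keys, num_reducers, sample_size=100):
    """Choose num_reducers - 1 split points, evenly spaced through a key sample."""
    sample = sorted(random.sample(keys, min(len(keys), sample_size)))
    return [sample[len(sample) * i // num_reducers] for i in range(1, num_reducers)]

def partition(keys, splits):
    """Route each key to the bucket whose range covers it (range partitioning)."""
    buckets = [[] for _ in range(len(splits) + 1)]
    for k in keys:
        buckets[bisect.bisect_right(splits, k)].append(k)
    return buckets

keys = random.sample(range(1_000_000), 10_000)    # unique "records"
splits = split_points(keys, num_reducers=4)
buckets = partition(keys, splits)
merged = [k for b in buckets for k in sorted(b)]  # each "reducer" sorts its bucket
print(merged == sorted(keys))                     # True: concatenation is globally sorted
```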
2.4.1 Generating test data with TeraGen
Before running TeraSort, its input data must be generated; the teragen command does this. The first argument is the number of records, the second the HDFS output directory. The command below generates 1 GB of data, made up of 10 million records, in the terasort-input directory on HDFS.
$ hadoop jar hadoop-mapreduce-examples-3.1.1.3.1.0.17-1.jar teragen <rows> <outputDir>
- Argument 1: the number of rows; each row is 100 bytes, so 1 GiB of data takes 1024*1024*1024/100 = 10737418 rows
- Argument 2: the output directory
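The row arithmetic above can be checked directly; the only fact this sketch relies on is TeraGen's fixed 100-byte record size:

```python
RECORD_BYTES = 100  # teragen writes fixed-size 100-byte records

def teragen_rows(target_bytes):
    """Rows needed so that rows * 100 bytes does not exceed the target size."""
    return target_bytes // RECORD_BYTES

print(teragen_rows(1024 ** 3))                  # 10737418 rows for 1 GiB, as above
print(teragen_rows(10_000_000 * RECORD_BYTES))  # the run below: 10M rows = 1e9 bytes
```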
$ hadoop jar hadoop-mapreduce-examples-3.1.1.3.1.0.17-1.jar teragen 10000000 /terasortTest/terasort-input
2020-11-27 09:52:43,103 INFO client.AHSProxy: Connecting to Application History server at hebsjzx-hadoop-67-5.bonc.com/10.252.67.5:10200
2020-11-27 09:52:43,399 INFO hdfs.DFSClient: Created token for hadoop: HDFS_DELEGATION_TOKEN [email protected], renewer=hadoop, realUser=, issueDate=1606441963380, maxDate=1607046763380, sequenceNumber=10228, masterKeyId=656 on ha-hdfs:beh001
2020-11-27 09:52:43,432 INFO security.TokenCache: Got dt for hdfs://beh001; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:beh001, Ident: (token for hadoop: HDFS_DELEGATION_TOKEN [email protected], renewer=hadoop, realUser=, issueDate=1606441963380, maxDate=1607046763380, sequenceNumber=10228, masterKeyId=656)
2020-11-27 09:52:43,488 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
2020-11-27 09:52:43,563 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1595472078301_0183
2020-11-27 09:52:43,786 INFO terasort.TeraGen: Generating 10000000 using 2
2020-11-27 09:52:43,845 INFO mapreduce.JobSubmitter: number of splits:2
2020-11-27 09:52:43,881 INFO Configuration.deprecation: yarn.resourcemanager.zk-address is deprecated. Instead, use hadoop.zk.address
2020-11-27 09:52:43,882 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2020-11-27 09:52:43,971 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1595472078301_0183
2020-11-27 09:52:43,972 INFO mapreduce.JobSubmitter: Executing with tokens: [Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:beh001, Ident: (token for hadoop: HDFS_DELEGATION_TOKEN [email protected], renewer=hadoop, realUser=, issueDate=1606441963380, maxDate=1607046763380, sequenceNumber=10228, masterKeyId=656)]
2020-11-27 09:52:44,171 INFO conf.Configuration: resource-types.xml not found
2020-11-27 09:52:44,172 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2020-11-27 09:52:44,201 INFO impl.TimelineClientImpl: Timeline service address: hebsjzx-hadoop-67-5.bonc.com:8190
2020-11-27 09:52:45,023 INFO impl.YarnClientImpl: Submitted application application_1595472078301_0183
2020-11-27 09:52:45,069 INFO mapreduce.Job: The url to track the job: https://hebsjzx-hadoop-67-6.bonc.com:8090/proxy/application_1595472078301_0183/
2020-11-27 09:52:45,070 INFO mapreduce.Job: Running job: job_1595472078301_0183
2020-11-27 09:52:53,212 INFO mapreduce.Job: Job job_1595472078301_0183 running in uber mode : false
2020-11-27 09:52:53,213 INFO mapreduce.Job: map 0% reduce 0%
2020-11-27 09:53:08,296 INFO mapreduce.Job: map 100% reduce 0%
2020-11-27 09:53:09,306 INFO mapreduce.Job: Job job_1595472078301_0183 completed successfully
2020-11-27 09:53:09,395 INFO mapreduce.Job: Counters: 33
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=460804
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=167
HDFS: Number of bytes written=1000000000
HDFS: Number of read operations=12
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Job Counters
Launched map tasks=2
Other local map tasks=2
Total time spent by all maps in occupied slots (ms)=25469
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=25469
Total vcore-milliseconds taken by all map tasks=25469
Total megabyte-milliseconds taken by all map tasks=104321024
Map-Reduce Framework
Map input records=10000000
Map output records=10000000
Input split bytes=167
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=210
CPU time spent (ms)=38190
Physical memory (bytes) snapshot=959725568
Virtual memory (bytes) snapshot=5937156096
Total committed heap usage (bytes)=1221066752
Peak Map Physical memory (bytes)=480038912
Peak Map Virtual memory (bytes)=2998444032
org.apache.hadoop.examples.terasort.TeraGen$Counters
CHECKSUM=21472776955442690
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=1000000000
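As a quick sanity check on the counters above: TeraGen writes fixed-size 100-byte rows, so the record count and the HDFS bytes written should agree exactly. A minimal sketch using the numbers from this run:

```python
# TeraGen rows are a fixed 100 bytes (10-byte key + 90-byte value),
# so bytes written should be exactly 100x the record count.
ROW_BYTES = 100

map_output_records = 10_000_000      # "Map output records" counter above
hdfs_bytes_written = 1_000_000_000   # "HDFS: Number of bytes written" counter

assert map_output_records * ROW_BYTES == hdfs_bytes_written
print(f"{hdfs_bytes_written / 1e9:.1f} GB generated")  # → 1.0 GB generated
```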
2.4.2、TeraSort data sorting
$ hadoop jar hadoop-mapreduce-examples-3.1.1.3.1.0.17-1.jar terasort <arg1> <arg2>
- Argument 1: the directory containing the test data (TeraGen output)
- Argument 2: the output directory for the sorted result
$ hadoop jar hadoop-mapreduce-examples-3.1.1.3.1.0.17-1.jar terasort /terasortTest/terasort-input /terasortTest/terasort-output
2020-11-27 09:59:17,085 INFO terasort.TeraSort: starting
2020-11-27 09:59:18,411 INFO hdfs.DFSClient: Created token for hadoop: HDFS_DELEGATION_TOKEN [email protected], renewer=hadoop, realUser=, issueDate=1606442358393, maxDate=1607047158393, sequenceNumber=10229, masterKeyId=656 on ha-hdfs:beh001
2020-11-27 09:59:18,447 INFO security.TokenCache: Got dt for hdfs://beh001; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:beh001, Ident: (token for hadoop: HDFS_DELEGATION_TOKEN [email protected], renewer=hadoop, realUser=, issueDate=1606442358393, maxDate=1607047158393, sequenceNumber=10229, masterKeyId=656)
2020-11-27 09:59:18,529 INFO input.FileInputFormat: Total input files to process : 2
Spent 467ms computing base-splits.
Spent 2ms computing TeraScheduler splits.
Computing input splits took 469ms
Sampling 4 splits of 4
Making 1 from 100000 sampled records
Computing parititions took 365ms
Spent 837ms computing partitions.
2020-11-27 09:59:19,288 INFO client.AHSProxy: Connecting to Application History server at hebsjzx-hadoop-67-5.bonc.com/10.252.67.5:10200
2020-11-27 09:59:19,352 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
2020-11-27 09:59:19,440 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1595472078301_0184
2020-11-27 09:59:19,589 INFO mapreduce.JobSubmitter: number of splits:4
2020-11-27 09:59:19,622 INFO Configuration.deprecation: yarn.resourcemanager.zk-address is deprecated. Instead, use hadoop.zk.address
2020-11-27 09:59:19,623 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2020-11-27 09:59:19,714 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1595472078301_0184
2020-11-27 09:59:19,716 INFO mapreduce.JobSubmitter: Executing with tokens: [Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:beh001, Ident: (token for hadoop: HDFS_DELEGATION_TOKEN [email protected], renewer=hadoop, realUser=, issueDate=1606442358393, maxDate=1607047158393, sequenceNumber=10229, masterKeyId=656)]
2020-11-27 09:59:19,955 INFO conf.Configuration: resource-types.xml not found
2020-11-27 09:59:19,955 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2020-11-27 09:59:19,989 INFO impl.TimelineClientImpl: Timeline service address: hebsjzx-hadoop-67-5.bonc.com:8190
2020-11-27 09:59:20,759 INFO impl.YarnClientImpl: Submitted application application_1595472078301_0184
2020-11-27 09:59:20,798 INFO mapreduce.Job: The url to track the job: https://hebsjzx-hadoop-67-6.bonc.com:8090/proxy/application_1595472078301_0184/
2020-11-27 09:59:20,799 INFO mapreduce.Job: Running job: job_1595472078301_0184
2020-11-27 09:59:28,922 INFO mapreduce.Job: Job job_1595472078301_0184 running in uber mode : false
2020-11-27 09:59:28,923 INFO mapreduce.Job: map 0% reduce 0%
2020-11-27 09:59:48,008 INFO mapreduce.Job: map 67% reduce 0%
2020-11-27 10:00:01,058 INFO mapreduce.Job: map 83% reduce 0%
2020-11-27 10:00:05,072 INFO mapreduce.Job: map 92% reduce 0%
2020-11-27 10:00:06,075 INFO mapreduce.Job: map 100% reduce 0%
2020-11-27 10:00:23,122 INFO mapreduce.Job: map 100% reduce 93%
2020-11-27 10:00:27,133 INFO mapreduce.Job: map 100% reduce 100%
2020-11-27 10:00:27,137 INFO mapreduce.Job: Job job_1595472078301_0184 completed successfully
2020-11-27 10:00:27,228 INFO mapreduce.Job: Counters: 53
File System Counters
FILE: Number of bytes read=291816558
FILE: Number of bytes written=584792131
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1000000476
HDFS: Number of bytes written=1000000000
HDFS: Number of read operations=17
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=4
Launched reduce tasks=1
Rack-local map tasks=4
Total time spent by all maps in occupied slots (ms)=127073
Total time spent by all reduces in occupied slots (ms)=18762
Total time spent by all map tasks (ms)=127073
Total time spent by all reduce tasks (ms)=18762
Total vcore-milliseconds taken by all map tasks=127073
Total vcore-milliseconds taken by all reduce tasks=18762
Total megabyte-milliseconds taken by all map tasks=520491008
Total megabyte-milliseconds taken by all reduce tasks=76849152
Map-Reduce Framework
Map input records=10000000
Map output records=10000000
Map output bytes=1020000000
Map output materialized bytes=291816558
Input split bytes=476
Combine input records=0
Combine output records=0
Reduce input groups=10000000
Reduce shuffle bytes=291816558
Reduce input records=10000000
Reduce output records=10000000
Spilled Records=20000000
Shuffled Maps =4
Failed Shuffles=0
Merged Map outputs=4
GC time elapsed (ms)=2065
CPU time spent (ms)=175820
Physical memory (bytes) snapshot=4377735168
Virtual memory (bytes) snapshot=14680014848
Total committed heap usage (bytes)=3767533568
Peak Map Physical memory (bytes)=1008046080
Peak Map Virtual memory (bytes)=2935164928
Peak Reduce Physical memory (bytes)=481501184
Peak Reduce Virtual memory (bytes)=2947211264
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1000000000
File Output Format Counters
Bytes Written=1000000000
2020-11-27 10:00:27,229 INFO terasort.TeraSort: done
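From the log timestamps above we can compute a rough end-to-end throughput for this run. Note this spans submission to completion, so it includes scheduling and startup overhead and understates the raw sort speed:

```python
from datetime import datetime

# Timestamps taken from the TeraSort log above
start = datetime.strptime("09:59:20", "%H:%M:%S")  # application submitted
end   = datetime.strptime("10:00:27", "%H:%M:%S")  # job completed

elapsed = (end - start).total_seconds()            # 67 seconds
sorted_bytes = 1_000_000_000                       # "Bytes Written" counter

print(f"{elapsed:.0f} s, ~{sorted_bytes / elapsed / 2**20:.1f} MiB/s end-to-end")
```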
2.4.3、TeraValidate verification
To verify that the TeraSort result is correctly sorted, run the teravalidate program.
$ hadoop jar hadoop-mapreduce-examples-3.1.1.3.1.0.17-1.jar teravalidate <arg1> <arg2>
- Argument 1: the sorted data directory (TeraSort output)
- Argument 2: the output directory for the validation result
$ hadoop jar hadoop-mapreduce-examples-3.1.1.3.1.0.17-1.jar teravalidate /terasortTest/terasort-output /terasortTest/terasort-teravalidate
2020-11-27 10:17:59,401 INFO client.AHSProxy: Connecting to Application History server at hebsjzx-hadoop-67-5.bonc.com/10.252.67.5:10200
2020-11-27 10:17:59,698 INFO hdfs.DFSClient: Created token for hadoop: HDFS_DELEGATION_TOKEN [email protected], renewer=hadoop, realUser=, issueDate=1606443479677, maxDate=1607048279677, sequenceNumber=10230, masterKeyId=656 on ha-hdfs:beh001
2020-11-27 10:17:59,738 INFO security.TokenCache: Got dt for hdfs://beh001; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:beh001, Ident: (token for hadoop: HDFS_DELEGATION_TOKEN [email protected], renewer=hadoop, realUser=, issueDate=1606443479677, maxDate=1607048279677, sequenceNumber=10230, masterKeyId=656)
2020-11-27 10:17:59,788 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
2020-11-27 10:17:59,865 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1595472078301_0185
2020-11-27 10:18:00,123 INFO input.FileInputFormat: Total input files to process : 1
Spent 45ms computing base-splits.
Spent 3ms computing TeraScheduler splits.
2020-11-27 10:18:00,184 INFO mapreduce.JobSubmitter: number of splits:1
2020-11-27 10:18:00,215 INFO Configuration.deprecation: yarn.resourcemanager.zk-address is deprecated. Instead, use hadoop.zk.address
2020-11-27 10:18:00,216 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2020-11-27 10:18:00,301 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1595472078301_0185
2020-11-27 10:18:00,303 INFO mapreduce.JobSubmitter: Executing with tokens: [Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:beh001, Ident: (token for hadoop: HDFS_DELEGATION_TOKEN [email protected], renewer=hadoop, realUser=, issueDate=1606443479677, maxDate=1607048279677, sequenceNumber=10230, masterKeyId=656)]
2020-11-27 10:18:00,503 INFO conf.Configuration: resource-types.xml not found
2020-11-27 10:18:00,503 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2020-11-27 10:18:00,583 INFO impl.TimelineClientImpl: Timeline service address: hebsjzx-hadoop-67-5.bonc.com:8190
2020-11-27 10:18:01,346 INFO impl.YarnClientImpl: Submitted application application_1595472078301_0185
2020-11-27 10:18:01,396 INFO mapreduce.Job: The url to track the job: https://hebsjzx-hadoop-67-6.bonc.com:8090/proxy/application_1595472078301_0185/
2020-11-27 10:18:01,397 INFO mapreduce.Job: Running job: job_1595472078301_0185
2020-11-27 10:18:09,515 INFO mapreduce.Job: Job job_1595472078301_0185 running in uber mode : false
2020-11-27 10:18:09,516 INFO mapreduce.Job: map 0% reduce 0%
2020-11-27 10:18:28,605 INFO mapreduce.Job: map 55% reduce 0%
2020-11-27 10:18:30,622 INFO mapreduce.Job: map 100% reduce 0%
2020-11-27 10:18:38,646 INFO mapreduce.Job: map 100% reduce 100%
2020-11-27 10:18:38,651 INFO mapreduce.Job: Job job_1595472078301_0185 completed successfully
2020-11-27 10:18:38,746 INFO mapreduce.Job: Counters: 53
File System Counters
FILE: Number of bytes read=98
FILE: Number of bytes written=461817
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1000000120
HDFS: Number of bytes written=24
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=18950
Total time spent by all reduces in occupied slots (ms)=4943
Total time spent by all map tasks (ms)=18950
Total time spent by all reduce tasks (ms)=4943
Total vcore-milliseconds taken by all map tasks=18950
Total vcore-milliseconds taken by all reduce tasks=4943
Total megabyte-milliseconds taken by all map tasks=77619200
Total megabyte-milliseconds taken by all reduce tasks=20246528
Map-Reduce Framework
Map input records=10000000
Map output records=3
Map output bytes=82
Map output materialized bytes=90
Input split bytes=120
Combine input records=0
Combine output records=0
Reduce input groups=3
Reduce shuffle bytes=90
Reduce input records=3
Reduce output records=1
Spilled Records=6
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=1301
CPU time spent (ms)=35340
Physical memory (bytes) snapshot=1258213376
Virtual memory (bytes) snapshot=5866979328
Total committed heap usage (bytes)=1422917632
Peak Map Physical memory (bytes)=1001738240
Peak Map Virtual memory (bytes)=2933329920
Peak Reduce Physical memory (bytes)=381505536
Peak Reduce Virtual memory (bytes)=2933649408
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1000000000
File Output Format Counters
Bytes Written=24
Inspect the validation output file. If the sorted data is in order, TeraValidate writes only a checksum line; any out-of-order records would be reported as errors instead.
$ hadoop fs -cat /terasortTest/terasort-teravalidate/part-r-00000
checksum 4c49607ac53602
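TeraGen reports its CHECKSUM counter in decimal while TeraValidate writes the checksum in hex, but they describe the same data, so we can cross-check the two runs above:

```python
# Cross-check the TeraGen CHECKSUM counter (decimal) against the value
# TeraValidate wrote to part-r-00000 (hex) -- same data, same checksum.
teragen_checksum = 21_472_776_955_442_690   # CHECKSUM counter from the TeraGen run
teravalidate_hex = "4c49607ac53602"         # value read back with `hadoop fs -cat`

assert int(teravalidate_hex, 16) == teragen_checksum
print("checksums match")
```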
2.5、SliveTest
SliveTest lives in Hadoop's test jar. Its code is cleanly structured, and its main purpose is to stress-test the NameNode by having a large number of maps issue many kinds of RPC requests. You can configure the number of maps, the number of RPC calls each map issues, the percentage of total operations that each RPC type accounts for, as well as read/write data sizes, block size, and other settings.
The RPC operations slive can issue are listed below:
ls | list all files and directories under a path |
---|---|
append | append to an existing file |
create | create a file |
delete | delete a file |
mkdir | create a directory |
rename | rename a file |
read | read a file's contents |
truncate | truncate a file |
Use the -help option to print the usage information:
hadoop jar hadoop-mapreduce-client-jobclient-2.7.6-tests.jar SliveTest -help
usage: SliveTest 0.1.0
-append <arg> pct,distribution where distribution is one of
beg,end,uniform,mid
-appendSize <arg> Min,max for size to append
(min=max=MAX_LONG=blocksize)
-baseDir <arg> Base directory path
-blockSize <arg> Min,max for dfs file block size
-cleanup <arg> Cleanup & remove directory after reporting
-create <arg> pct,distribution where distribution is one of
beg,end,uniform,mid
-delete <arg> pct,distribution where distribution is one of
beg,end,uniform,mid
-dirSize <arg> Max files per directory
-duration <arg> Duration of a map task in seconds (MAX_INT for no
limit)
-exitOnError Exit on first error
-files <arg> Max total number of files
-help Usage information
-ls <arg> pct,distribution where distribution is one of
beg,end,uniform,mid
-maps <arg> Number of maps
-mkdir <arg> pct,distribution where distribution is one of
beg,end,uniform,mid
-ops <arg> Max number of operations per map
-packetSize <arg> Dfs write packet size
-queue <arg> Queue name
-read <arg> pct,distribution where distribution is one of
beg,end,uniform,mid
-readSize <arg> Min,max for size to read (min=max=MAX_LONG=read
entire file)
-reduces <arg> Number of reduces
-rename <arg> pct,distribution where distribution is one of
beg,end,uniform,mid
-replication <arg> Min,max value for replication amount
-resFile <arg> Result file name
-seed <arg> Random number seed
-sleep <arg> Min,max for millisecond of random sleep to perform
(between operations)
-truncate <arg> pct,distribution where distribution is one of
beg,end,uniform,mid
-truncateSize <arg> Min,max for size to truncate
(min=max=MAX_LONG=blocksize)
-truncateWait <arg> Should wait for truncate recovery
-writeSize <arg> Min,max for size to write
(min=max=MAX_LONG=blocksize)
By default each map performs 1000 operations, with the operation types drawn uniformly at random. The parameters relevant to running SliveTest are listed in the table below:
maps | total number of mappers to run; default 10 |
---|---|
ops | number of operations each map performs; default 1000 |
duration | duration of each map task in seconds; default MAX_INT, i.e. no limit |
exitOnError | whether to exit immediately on the first error; by default it does not |
files | maximum total number of files to generate; default 10 |
dirSize | maximum number of files allowed per directory; default 32 |
baseDir | root directory where SliveTest stores its files; default "/test/slive" |
resFile | result file name; default "part-0000" |
replication | replication factor, settable as min,max; default 3 |
blockSize | file block size; default 64 MB (64*1048576) |
readSize | read size as min,max, e.g. "-readSize 100,1000"; unlimited by default (min=max=MAX_LONG=read entire file) |
writeSize | write size as min,max; defaults to blockSize (min=max=blocksize) |
sleep | range of random sleep inserted between operations, as min,max in milliseconds; default 0 |
appendSize | append size as min,max; defaults to blockSize (min=max=blocksize) |
seed | random number seed |
cleanup | remove the output directory after all operations complete and are reported |
queue | queue name to submit to; default "default" |
packetSize | DFS write packet size |
ls | percentage of total operations that are ls |
append | percentage of total operations that are append |
create | percentage of total operations that are create |
delete | percentage of total operations that are delete |
mkdir | percentage of total operations that are mkdir |
rename | percentage of total operations that are rename |
read | percentage of total operations that are read |
Example:
Run with 100 maps and 50 reduces, make create 50% of all operations, and clean up the results afterwards. (Note the space before each backslash: without it, the line continuations would join into a single broken token.)
hadoop jar hadoop-mapreduce-client-jobclient-3.1.1.3.1.0.17-1-tests.jar \
  SliveTest \
  -maps 100 \
  -reduces 50 \
  -baseDir /tmp/slivetest \
  -create 50 \
  -cleanup true
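A side note on how the percentages play out: each explicitly weighted operation appears to get its share of the per-map operation budget, with the leftover percentage split evenly over the remaining types. The following is a rough model of that behavior, not SliveTest's actual code; the even-split rule and the 2000-ops-per-map figure are assumptions inferred from the report this run produced:

```python
# Hypothetical model of how Slive spreads operations across types:
# types given an explicit pct take their share of the per-map budget,
# and the leftover percentage is split evenly over the rest (rounded down).
ALL_OPS = ["append", "create", "delete", "ls", "mkdir", "read", "rename", "truncate"]

def per_map_counts(ops_per_map, explicit):
    counts = {op: ops_per_map * pct // 100 for op, pct in explicit.items()}
    rest = [op for op in ALL_OPS if op not in explicit]
    leftover = 100 - sum(explicit.values())
    for op in rest:
        counts[op] = ops_per_map * leftover // (100 * len(rest))
    return counts

# With -create 50 and 2000 ops per map: 1000 creates plus 142 of each of
# the other seven types per map; over 100 maps that is 100000 CreateOps
# and 14200 of every other type, matching the op_count values reported.
print(per_map_counts(2000, {"create": 50}))
```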
When the run finishes, a report like the following is produced:
Basic report for operation type AppendOp
-------------
Measurement "bytes_written" = 2348810240
Measurement "failures" = 7621
Measurement "files_not_found" = 6544
Measurement "milliseconds_taken" = 14347
Measurement "op_count" = 14200
Measurement "successes" = 35
Rate for measurement "bytes_written" = 156.13 MB/sec
Rate for measurement "op_count" = 989.754 operations/sec
Rate for measurement "successes" = 2.44 successes/sec
-------------
Basic report for operation type CreateOp
-------------
Measurement "bytes_written" = 10536091648
Measurement "failures" = 99843
Measurement "milliseconds_taken" = 63315
Measurement "op_count" = 100000
Measurement "successes" = 157
Rate for measurement "bytes_written" = 158.699 MB/sec
Rate for measurement "op_count" = 1579.405 operations/sec
Rate for measurement "successes" = 2.48 successes/sec
-------------
Basic report for operation type DeleteOp
-------------
Measurement "failures" = 6490
Measurement "milliseconds_taken" = 12158
Measurement "op_count" = 14200
Measurement "successes" = 7710
Rate for measurement "op_count" = 1167.955 operations/sec
Rate for measurement "successes" = 634.15 successes/sec
-------------
Basic report for operation type ListOp
-------------
Measurement "dir_entries" = 20891
Measurement "files_not_found" = 6
Measurement "milliseconds_taken" = 21439
Measurement "op_count" = 14200
Measurement "successes" = 14194
Rate for measurement "dir_entries" = 974.439 directory entries/sec
Rate for measurement "op_count" = 662.344 operations/sec
Rate for measurement "successes" = 662.064 successes/sec
-------------
Basic report for operation type MkdirOp
-------------
Measurement "milliseconds_taken" = 18552
Measurement "op_count" = 14200
Measurement "successes" = 14200
Rate for measurement "op_count" = 765.416 operations/sec
Rate for measurement "successes" = 765.416 successes/sec
-------------
Basic report for operation type ReadOp
-------------
Measurement "bad_files" = 4190
Measurement "bytes_read" = 8388608000
Measurement "chunks_unverified" = 0
Measurement "chunks_verified" = 1048575750
Measurement "files_not_found" = 6648
Measurement "milliseconds_taken" = 101147
Measurement "op_count" = 14200
Measurement "successes" = 3362
Rate for measurement "bytes_read" = 79.093 MB/sec
Rate for measurement "op_count" = 140.39 operations/sec
Rate for measurement "successes" = 33.239 successes/sec
-------------
Basic report for operation type RenameOp
-------------
Measurement "failures" = 10371
Measurement "milliseconds_taken" = 5921
Measurement "op_count" = 14200
Measurement "successes" = 3829
Rate for measurement "op_count" = 2398.244 operations/sec
Rate for measurement "successes" = 646.681 successes/sec
-------------
Basic report for operation type SliveMapper
-------------
Measurement "milliseconds_taken" = 9595422
Measurement "op_count" = 199400
Rate for measurement "op_count" = 20.781 operations/sec
-------------
Basic report for operation type TruncateOp
-------------
Measurement "bytes_written" = 0
Measurement "failures" = 7666
Measurement "files_not_found" = 6432
Measurement "milliseconds_taken" = 95
Measurement "op_count" = 14200
Measurement "successes" = 102
Rate for measurement "bytes_written" = 0 MB/sec
Rate for measurement "op_count" = 149473.684 operations/sec
Rate for measurement "successes" = 1073.684 successes/sec
-------------
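The SliveMapper section aggregates everything the mappers did, so its op_count should equal the sum of the per-operation op_count values. A quick consistency check on the report above:

```python
# Per-operation op_count values from the report above; their sum should
# equal the aggregate op_count in the SliveMapper section (199400).
op_counts = {
    "AppendOp": 14200, "CreateOp": 100000, "DeleteOp": 14200,
    "ListOp": 14200, "MkdirOp": 14200, "ReadOp": 14200,
    "RenameOp": 14200, "TruncateOp": 14200,
}

assert sum(op_counts.values()) == 199400
print("report is internally consistent")
```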