2020/11/27 [email protected]
Contents
- Hadoop Benchmarking
- 1. Debugging the Cluster
- 2. Benchmark Tools
Hadoop Benchmarking
1. Debugging the Cluster
Before starting any benchmarks, the HDFS and YARN services must be running.
When starting the YARN service, the ResourceManager failed to come up. The logs showed the following error:
2020-11-23 15:56:44,775 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: registered UNIX signal handlers for [TERM, HUP, INT]
2020-11-23 15:56:45,408 INFO org.apache.hadoop.conf.Configuration: found resource core-site.xml at file:/home/bduser101/modules/hadoop/etc/hadoop/core-site.xml
2020-11-23 15:56:45,931 FATAL org.apache.hadoop.conf.Configuration: error parsing conf java.io.BufferedInputStream@74294adb
org.xml.sax.SAXParseException; lineNumber: 19; columnNumber: 38; An 'include' failed, and no 'fallback' element was found.
This error came from the file-inclusion directive added to core-site.xml when configuring HDFS federation. After writing the settings from mountTable.xml directly into core-site.xml and syncing that file to all nodes, the YARN service started normally.
[Source of the error]
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include href="mountTable.xml"/>
After the fix, the xi:include line is gone and the mount-table properties sit directly inside core-site.xml.
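A minimal sketch of what the corrected core-site.xml might look like, assuming the standard ViewFS mount-table property naming; the mount point and target path below are illustrative, not the cluster's real ones:

```xml
<configuration>
  <!-- the original core-site.xml properties stay unchanged -->
  <!-- entries copied in from mountTable.xml; the names/paths here are assumptions -->
  <property>
    <name>fs.viewfs.mounttable.my-cluster.link./user</name>
    <value>hdfs://node101:8020/user</value>
  </property>
</configuration>
```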
2. Benchmark Tools
After deploying a new cluster, upgrading an existing one, or tuning its performance parameters, you need benchmarking tools to observe how the cluster's performance changes.
Hadoop ships with a test jar that bundles many such tools; DFSCIOTest, mrbench, and nnbench are among the most widely used. Running the jar without arguments lists the available programs:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.3.jar
- DFSCIOTest: Distributed i/o benchmark of libhdfs. (libhdfs is a shared library that provides HDFS file services to C/C++ applications.)
- DistributedFSCheck: Distributed checkup of the file system consistency.
- JHLogAnalyzer: Job History Log analyzer.
- MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures.
- NNdataGenerator: Generate the data to be used by NNloadGenerator.
- NNloadGenerator: Generate load on Namenode using NN loadgenerator run WITHOUT MR.
- NNloadGeneratorMR: Generate load on Namenode using NN loadgenerator run as MR job.
- NNstructureGenerator: Generate the structure to be used by NNdataGenerator.
- SliveTest: HDFS Stress Test and Live Data Verification.
- TestDFSIO: Distributed i/o benchmark.
- fail: a job that always fails.
- filebench: Benchmark SequenceFile(Input|Output)Format (block, record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed).
- largesorter: Large-Sort tester.
- loadgen: Generic map/reduce load generator.
- mapredtest: A map/reduce test check.
- minicluster: Single process HDFS and MR cluster.
- mrbench: A map/reduce benchmark that can create many small jobs.
- nnbench: A benchmark that stresses the namenode.
- sleep: A job that sleeps at each map and reduce task.
- testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce.
- testfilesystem: A test for FileSystem read/write.
- testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
- testsequencefile: A test for flat files of binary key value pairs.
- testsequencefileinputformat: A test for sequence file input format.
- testtextinputformat: A test for text input format.
- threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill.
2.1 TestDFSIO
TestDFSIO measures HDFS I/O performance. It uses a MapReduce job to run reads and writes in parallel: each map task reads or writes one file, the map output carries the statistics collected for that file, and the reduce task accumulates those statistics and produces the summary.
TestDFSIO is invoked as follows:
$>:hadoop jar hadoop-mapreduce-client-jobclient-2.7.6-tests.jar TestDFSIO
Usage: TestDFSIO [genericOptions] -read | -write | -append | -clean [-nrFiles N] [-fileSize Size[B|KB|MB|GB|TB]] [-resFile resultFileName] [-bufferSize Bytes] [-rootDir]
When a run finishes, its results are appended to TestDFSIO_results.log in the local working directory, where the run's summary can be reviewed.
2.1.1 Writing 10 files of 100 MB each to HDFS
$>cd /home/bduser101/modules/hadoop/share/hadoop/mapreduce
$>hadoop jar hadoop-mapreduce-client-jobclient-2.7.6-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 100
20/11/23 17:03:48 INFO fs.TestDFSIO: TestDFSIO.1.8
20/11/23 17:03:48 INFO fs.TestDFSIO: nrFiles = 10
20/11/23 17:03:48 INFO fs.TestDFSIO: nrBytes (MB) = 100.0
20/11/23 17:03:48 INFO fs.TestDFSIO: bufferSize = 1000000
20/11/23 17:03:48 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
20/11/23 17:03:50 INFO fs.TestDFSIO: creating control file: 104857600 bytes, 10 files
If you see the warning WARN hdfs.DFSClient: Caught exception java.lang.InterruptedException at java.lang.Object.wait(Native Method), don't panic: judging from most reports online, this is a Hadoop bug and the warning is harmless.
20/11/23 17:04:39 INFO mapreduce.Job: map 0% reduce 0%
20/11/23 17:05:03 INFO mapreduce.Job: map 13% reduce 0%
20/11/23 17:05:20 INFO mapreduce.Job: map 17% reduce 0%
20/11/23 17:05:21 INFO mapreduce.Job: map 20% reduce 0%
20/11/23 17:05:35 INFO mapreduce.Job: map 20% reduce 7%
20/11/23 17:05:44 INFO mapreduce.Job: map 27% reduce 7%
20/11/23 17:05:50 INFO mapreduce.Job: map 30% reduce 10%
20/11/23 17:05:58 INFO mapreduce.Job: map 77% reduce 10%
20/11/23 17:06:52 INFO mapreduce.Job: map 80% reduce 10%
Error: org.apache.hadoop.ipc.RemoteException(java.io.IOException): BP-112231132-192.168.159.101-1584624837518:blk_1073742201_1382 does not exist or is not under Constructionnull
If you hit the error org.apache.hadoop.ipc.RemoteException(java.io.IOException): BP-112231132-192.168.159.101-1584624837518:blk_1073742201_1382 does not exist or is not under Constructionnull, it is a balancer-related bug; for details see https://issues.apache.org/jira/browse/hdfs-8093
Things to rule out:
- Whether the system or HDFS has run out of space. Here, fs.default.name in core-site.xml was changed from viewfs://my-cluser to hdfs://node101:8020 and the cluster was checked with:
  hadoop/bin$>./hdfs dfsadmin -report
  The output showed the cluster still had plenty of free space.
- Whether the number of DataNodes is normal.
- Whether HDFS is in safe mode.
- Whether the firewall is disabled.
- Configuration issues.
- Clearing the NameNode tmp directory and reformatting the NameNode.
20/11/23 17:06:53 INFO mapreduce.Job: map 77% reduce 10%
20/11/23 17:07:14 INFO mapreduce.Job: map 80% reduce 10%
20/11/23 17:07:27 INFO mapreduce.Job: map 90% reduce 13%
Error: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /benchmarks/TestDFSIO/io_data/test_io_6 (inode 16834): File does not exist. Holder DFSClient_attempt_1606119502234_0004_m_000006_0_-1509478354_1 does not have any open files.
If you hit the error Error: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /benchmarks/TestDFSIO/io_data/test_io_6 (inode 16834): File does not exist. Holder DFSClient_attempt_1606119502234_0004_m_000006_0_-1509478354_1 does not have any open files.
- The immediate cause is that the file was deleted in the middle of a data-stream operation, usually because several MapReduce tasks operate on the same file and one task deletes it when it finishes.
- It is tied to a Hadoop feature: instead of trying to diagnose and repair slow-running tasks, Hadoop detects (speculates about) them and launches backup tasks that do the same work. In this case that work was writing data into HDFS. When one of the two identical tasks finishes, it deletes some temporary files; when the other finishes, it tries to delete the same files, which triggers this error.
- The error does not affect the benchmark's results and can be ignored. It can also be avoided by disabling speculative execution in Hadoop (and in Spark, if you use it).
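Disabling speculation can be done with a small mapred-site.xml fragment; the property names below are the standard Hadoop 2.x ones:

```xml
<!-- mapred-site.xml: turn off speculative (backup) task execution -->
<property>
  <name>mapreduce.map.speculative</name>
  <value>false</value>
</property>
<property>
  <name>mapreduce.reduce.speculative</name>
  <value>false</value>
</property>
```

The same properties can also be passed per job as generic options, e.g. -D mapreduce.map.speculative=false.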
When the run completes, the following is printed, including the MapReduce counters, throughput, and I/O rates for the test:
20/11/23 17:07:28 INFO mapreduce.Job: map 87% reduce 13%
20/11/23 17:07:31 INFO mapreduce.Job: map 90% reduce 13%
20/11/23 17:07:32 INFO mapreduce.Job: map 93% reduce 13%
20/11/23 17:07:33 INFO mapreduce.Job: map 97% reduce 13%
20/11/23 17:07:34 INFO mapreduce.Job: map 100% reduce 13%
20/11/23 17:07:36 INFO mapreduce.Job: map 100% reduce 100%
20/11/23 17:07:37 INFO mapreduce.Job: Job job_1606119502234_0004 completed successfully
20/11/23 17:07:38 INFO mapreduce.Job: Counters: 57
File System Counters
FILE: Number of bytes read=857
FILE: Number of bytes written=1377714
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2330
HDFS: Number of bytes written=1048576078
HDFS: Number of read operations=43
HDFS: Number of large read operations=0
HDFS: Number of write operations=12
VIEWFS: Number of bytes read=0
VIEWFS: Number of bytes written=0
VIEWFS: Number of read operations=0
VIEWFS: Number of large read operations=0
VIEWFS: Number of write operations=0
Job Counters
Failed map tasks=2
Killed map tasks=6
Launched map tasks=19
Launched reduce tasks=1
Other local map tasks=1
Data-local map tasks=18
Total time spent by all maps in occupied slots (ms)=1723294
Total time spent by all reduces in occupied slots (ms)=133402
Total time spent by all map tasks (ms)=1723294
Total time spent by all reduce tasks (ms)=133402
Total vcore-milliseconds taken by all map tasks=1723294
Total vcore-milliseconds taken by all reduce tasks=133402
Total megabyte-milliseconds taken by all map tasks=1764653056
Total megabyte-milliseconds taken by all reduce tasks=136603648
Map-Reduce Framework
Map input records=10
Map output records=50
Map output bytes=751
Map output materialized bytes=911
Input split bytes=1210
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=911
Reduce input records=50
Reduce output records=5
Spilled Records=100
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=71333
CPU time spent (ms)=66340
Physical memory (bytes) snapshot=1764884480
Virtual memory (bytes) snapshot=22712225792
Total committed heap usage (bytes)=2045894656
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
Test log
20/11/23 17:07:38 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
20/11/23 17:07:38 INFO fs.TestDFSIO: Date & time: Mon Nov 23 17:07:38 CST 2020
20/11/23 17:07:38 INFO fs.TestDFSIO: Number of files: 10
20/11/23 17:07:38 INFO fs.TestDFSIO: Total MBytes processed: 1000
20/11/23 17:07:38 INFO fs.TestDFSIO: Throughput mb/sec: 2.09
20/11/23 17:07:38 INFO fs.TestDFSIO: Average IO rate mb/sec: 3.48
20/11/23 17:07:38 INFO fs.TestDFSIO: IO rate std deviation: 2.52
20/11/23 17:07:38 INFO fs.TestDFSIO: Test exec time sec: 226.16
20/11/23 17:07:38 INFO fs.TestDFSIO:
Statistical analysis after running the same test 10 times on the company's test cluster:
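Since every run appends one summary block to TestDFSIO_results.log, a multi-run analysis like the one referenced above can be scripted. A small sketch, with made-up numbers standing in for the real log:

```python
import re
import statistics

def parse_throughputs(log_text):
    """Pull the 'Throughput mb/sec' value out of each TestDFSIO summary block."""
    return [float(v) for v in re.findall(r"Throughput mb/sec:\s*([\d.]+)", log_text)]

# Two illustrative runs; the numbers are invented, not the real 10-run data.
log = """\
----- TestDFSIO ----- : write
Throughput mb/sec: 2.09
----- TestDFSIO ----- : write
Throughput mb/sec: 2.41
"""
rates = parse_throughputs(log)
print(statistics.mean(rates))    # average write throughput across runs
print(statistics.pstdev(rates))  # run-to-run spread
```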
2.1.2 Reading 10 files of 1000 MB from HDFS
Run the write test above first, so that the data to read already exists.
$>cd /home/bduser101/modules/hadoop/share/hadoop/mapreduce
$>hadoop jar hadoop-mapreduce-client-jobclient-2.7.6-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
20/11/24 15:16:24 INFO fs.TestDFSIO: TestDFSIO.1.8
20/11/24 15:16:24 INFO fs.TestDFSIO: nrFiles = 10
20/11/24 15:16:24 INFO fs.TestDFSIO: nrBytes (MB) = 100.0
20/11/24 15:16:24 INFO fs.TestDFSIO: bufferSize = 1000000
20/11/24 15:16:24 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
20/11/24 15:16:26 INFO fs.TestDFSIO: creating control file: 104857600 bytes, 10 files
The same exception seen during the write test shows up again: WARN hdfs.DFSClient: Caught exception java.lang.InterruptedException at java.lang.Object.wait(Native Method)
20/11/24 15:16:29 INFO mapreduce.JobSubmitter: number of splits:10
20/11/24 15:16:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1606189764138_0004
20/11/24 15:16:29 INFO impl.YarnClientImpl: Submitted application application_1606189764138_0004
20/11/24 15:16:29 INFO mapreduce.Job: The url to track the job: http://node101:8088/proxy/application_1606189764138_0004/
20/11/24 15:16:29 INFO mapreduce.Job: Running job: job_1606189764138_0004
20/11/24 15:16:43 INFO mapreduce.Job: Job job_1606189764138_0004 running in uber mode : false
20/11/24 15:16:43 INFO mapreduce.Job: map 0% reduce 0%
20/11/24 15:17:18 INFO mapreduce.Job: map 27% reduce 0%
20/11/24 15:17:20 INFO mapreduce.Job: map 40% reduce 0%
20/11/24 15:17:38 INFO mapreduce.Job: map 80% reduce 0%
20/11/24 15:17:43 INFO mapreduce.Job: map 80% reduce 13%
20/11/24 15:17:45 INFO mapreduce.Job: map 97% reduce 13%
20/11/24 15:17:47 INFO mapreduce.Job: map 100% reduce 13%
20/11/24 15:17:49 INFO mapreduce.Job: map 100% reduce 100%
20/11/24 15:17:51 INFO mapreduce.Job: Job job_1606189764138_0004 completed successfully
20/11/24 15:17:51 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=847
FILE: Number of bytes written=1377672
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=962493722
HDFS: Number of bytes written=78
HDFS: Number of read operations=53
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
VIEWFS: Number of bytes read=0
VIEWFS: Number of bytes written=0
VIEWFS: Number of read operations=0
VIEWFS: Number of large read operations=0
VIEWFS: Number of write operations=0
Job Counters
Launched map tasks=11
Launched reduce tasks=1
Data-local map tasks=11
Total time spent by all maps in occupied slots (ms)=522728
Total time spent by all reduces in occupied slots (ms)=23213
Total time spent by all map tasks (ms)=522728
Total time spent by all reduce tasks (ms)=23213
Total vcore-milliseconds taken by all map tasks=522728
Total vcore-milliseconds taken by all reduce tasks=23213
Total megabyte-milliseconds taken by all map tasks=535273472
Total megabyte-milliseconds taken by all reduce tasks=23770112
Map-Reduce Framework
Map input records=10
Map output records=50
Map output bytes=741
Map output materialized bytes=901
Input split bytes=1210
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=901
Reduce input records=50
Reduce output records=5
Spilled Records=100
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=16407
CPU time spent (ms)=19210
Physical memory (bytes) snapshot=1245241344
Virtual memory (bytes) snapshot=22678716416
Total committed heap usage (bytes)=1248374784
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
Test log
20/11/24 15:17:51 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
20/11/24 15:17:51 INFO fs.TestDFSIO: Date & time: Tue Nov 24 15:17:51 CST 2020
20/11/24 15:17:51 INFO fs.TestDFSIO: Number of files: 10
20/11/24 15:17:51 INFO fs.TestDFSIO: Total MBytes processed: 917.9
20/11/24 15:17:51 INFO fs.TestDFSIO: Throughput mb/sec: 21.73
20/11/24 15:17:51 INFO fs.TestDFSIO: Average IO rate mb/sec: 31.66
20/11/24 15:17:51 INFO fs.TestDFSIO: IO rate std deviation: 32.76
20/11/24 15:17:51 INFO fs.TestDFSIO: Test exec time sec: 84.11
20/11/24 15:17:51 INFO fs.TestDFSIO:
Statistical analysis after running the same test 10 times on the company's test cluster:
Delete the test data once testing is finished:
[bduser101@node101 mapreduce]$ hadoop jar hadoop-mapreduce-client-jobclient-2.7.6-tests.jar TestDFSIO -clean
20/11/24 15:00:47 INFO fs.TestDFSIO: TestDFSIO.1.8
20/11/24 15:00:47 INFO fs.TestDFSIO: nrFiles = 1
20/11/24 15:00:47 INFO fs.TestDFSIO: nrBytes (MB) = 1.0
20/11/24 15:00:47 INFO fs.TestDFSIO: bufferSize = 1000000
20/11/24 15:00:47 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
20/11/24 15:00:48 INFO fs.TestDFSIO: Cleaning up test files
2.2 nnbench
nnbench tests the load on the NameNode: it generates a large volume of HDFS-related requests that put the NameNode under considerable pressure. The test can simulate creating, reading, renaming, and deleting files on HDFS.
nnbench is invoked as follows:
$>hadoop jar hadoop-mapreduce-client-jobclient-2.7.6-tests.jar nnbench
NameNode Benchmark 0.4
Usage: nnbench <options>
Options:
-operation <Available operations are create_write open_read rename delete. This option is mandatory>
* NOTE: The open_read, rename and delete operations assume that the files they operate on, are already available. The create_write operation must be run before running the other operations.
-maps <number of maps. default is 1. This is not mandatory>
-reduces <number of reduces. default is 1. This is not mandatory>
-startTime <time to start, given in seconds from the epoch. Make sure this is far enough into the future, so all maps (operations) will start at the same time>. default is launch time + 2 mins. This is not mandatory
-blockSize <Block size in bytes. default is 1. This is not mandatory>
-bytesToWrite <Bytes to write. default is 0. This is not mandatory>
-bytesPerChecksum <Bytes per checksum for the files. default is 1. This is not mandatory>
-numberOfFiles <number of files to create. default is 1. This is not mandatory>
-replicationFactorPerFile <Replication factor for the files. default is 1. This is not mandatory>
-baseDir <base DFS path. default is /becnhmarks/NNBench. This is not mandatory>
-readFileAfterOpen <true or false. if true, it reads the file and reports the average time to read. This is valid with the open_read operation. default is false. This is not mandatory>
-help: Display the help statement
2.2.1 Creating 1000 files with 12 mappers and 6 reducers
hadoop jar hadoop-mapreduce-client-jobclient-2.7.6-tests.jar nnbench -operation create_write -maps 12 -reduces 6 -blockSize 1 -bytesToWrite 0 -numberOfFiles 1000 -replicationFactorPerFile 3 -readFileAfterOpen true -baseDir /benchmarks/NNBench-`hostname -s`
20/11/24 16:28:56 INFO hdfs.NNBench: Test Inputs:
20/11/24 16:28:56 INFO hdfs.NNBench: Test Operation: create_write
20/11/24 16:28:56 INFO hdfs.NNBench: Start time: 2020-11-24 16:30:56,726
20/11/24 16:28:56 INFO hdfs.NNBench: Number of maps: 12
20/11/24 16:28:56 INFO hdfs.NNBench: Number of reduces: 6
20/11/24 16:28:56 INFO hdfs.NNBench: Block Size: 1
20/11/24 16:28:56 INFO hdfs.NNBench: Bytes to write: 0
20/11/24 16:28:56 INFO hdfs.NNBench: Bytes per checksum: 1
20/11/24 16:28:56 INFO hdfs.NNBench: Number of files: 1
20/11/24 16:28:56 INFO hdfs.NNBench: Replication factor: 3
20/11/24 16:28:56 INFO hdfs.NNBench: Base dir: /benchmarks/NNBench-hostname -s
20/11/24 16:28:56 INFO hdfs.NNBench: Read file after open: true
20/11/24 16:28:59 INFO hdfs.NNBench: Deleting data directory
20/11/24 16:28:59 INFO hdfs.NNBench: Creating 12 control files
The same exception seen earlier shows up again and can be skipped: WARN hdfs.DFSClient: Caught exception java.lang.InterruptedException at java.lang.Object.wait(Native Method)
20/11/24 16:29:00 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
20/11/24 16:29:00 INFO client.RMProxy: Connecting to ResourceManager at node101/192.168.159.101:8032
20/11/24 16:29:00 INFO client.RMProxy: Connecting to ResourceManager at node101/192.168.159.101:8032
20/11/24 16:29:01 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
20/11/24 16:29:01 INFO mapred.FileInputFormat: Total input paths to process : 12
20/11/24 16:29:01 INFO mapreduce.JobSubmitter: number of splits:12
20/11/24 16:29:01 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
20/11/24 16:29:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1606189764138_0006
20/11/24 16:29:02 INFO impl.YarnClientImpl: Submitted application application_1606189764138_0006
20/11/24 16:29:02 INFO mapreduce.Job: The url to track the job: http://node101:8088/proxy/application_1606189764138_0006/
20/11/24 16:29:02 INFO mapreduce.Job: Running job: job_1606189764138_0006
20/11/24 16:29:14 INFO mapreduce.Job: Job job_1606189764138_0006 running in uber mode : false
20/11/24 16:29:15 INFO mapreduce.Job: map 0% reduce 0%
20/11/24 16:30:10 INFO mapreduce.Job: map 67% reduce 0%
20/11/24 16:31:16 INFO mapreduce.Job: map 75% reduce 0%
20/11/24 16:31:17 INFO mapreduce.Job: map 83% reduce 0%
20/11/24 16:31:18 INFO mapreduce.Job: map 89% reduce 0%
20/11/24 16:31:19 INFO mapreduce.Job: map 100% reduce 0%
20/11/24 16:31:29 INFO mapreduce.Job: map 100% reduce 17%
20/11/24 16:31:39 INFO mapreduce.Job: map 100% reduce 33%
20/11/24 16:31:40 INFO mapreduce.Job: map 100% reduce 50%
20/11/24 16:31:48 INFO mapreduce.Job: map 100% reduce 100%
20/11/24 16:31:49 INFO mapreduce.Job: Job job_1606189764138_0006 completed successfully
20/11/24 16:31:50 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=2232
FILE: Number of bytes written=2282144
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=3076
HDFS: Number of bytes written=171
HDFS: Number of read operations=66
HDFS: Number of large read operations=0
HDFS: Number of write operations=12012
VIEWFS: Number of bytes read=0
VIEWFS: Number of bytes written=0
VIEWFS: Number of read operations=0
VIEWFS: Number of large read operations=0
VIEWFS: Number of write operations=0
Job Counters
Launched map tasks=12
Launched reduce tasks=6
Data-local map tasks=12
Total time spent by all maps in occupied slots (ms)=1447578
Total time spent by all reduces in occupied slots (ms)=130151
Total time spent by all map tasks (ms)=1447578
Total time spent by all reduce tasks (ms)=130151
Total vcore-milliseconds taken by all map tasks=1447578
Total vcore-milliseconds taken by all reduce tasks=130151
Total megabyte-milliseconds taken by all map tasks=1482319872
Total megabyte-milliseconds taken by all reduce tasks=133274624
Map-Reduce Framework
Map input records=12
Map output records=84
Map output bytes=2028
Map output materialized bytes=2628
Input split bytes=1586
Combine input records=0
Combine output records=0
Reduce input groups=7
Reduce shuffle bytes=2628
Reduce input records=84
Reduce output records=7
Spilled Records=168
Shuffled Maps =72
Failed Shuffles=0
Merged Map outputs=72
GC time elapsed (ms)=19664
CPU time spent (ms)=62030
Physical memory (bytes) snapshot=1698738176
Virtual memory (bytes) snapshot=37147820032
Total committed heap usage (bytes)=1589854208
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
Test log:
Log from running the same test on the company's test cluster
2.3 mrbench
mrbench runs a small job repeatedly, checking whether small jobs on the cluster run repeatably and efficiently.
mrbench is invoked as follows:
$>hadoop jar hadoop-mapreduce-client-jobclient-2.7.6-tests.jar mrbench -help
MRBenchmark.0.0.2
Usage: mrbench [-baseDir <base DFS path for output/input, default is /benchmarks/MRBench>] [-jar <local path to job jar file containing Mapper and Reducer implementations, default is current jar file>] [-numRuns <number of times to run the job, default is 1>] [-maps <number of maps for each run, default is 2>] [-reduces <number of reduces for each run, default is 1>] [-inputLines <number of input lines to generate, default is 1>] [-inputType <type of input to generate, one of ascending (default), descending, random>] [-verbose]
2.3.1 Running a small job 50 times
$>hadoop jar hadoop-mapreduce-client-jobclient-2.7.6-tests.jar mrbench -numRuns 50
The per-run output repeats 50 times; the final run looks like this:
20/11/24 17:13:48 INFO mapred.MRBench: Running job 49: input=viewfs://my-cluster/benchmarks/MRBench/mr_input output=viewfs://my-cluster/benchmarks/MRBench/mr_output/output_1197878541
20/11/24 17:13:48 INFO client.RMProxy: Connecting to ResourceManager at node101/192.168.159.101:8032
20/11/24 17:13:48 INFO client.RMProxy: Connecting to ResourceManager at node101/192.168.159.101:8032
20/11/24 17:13:48 INFO mapred.FileInputFormat: Total input paths to process : 1
20/11/24 17:13:48 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1252)
at java.lang.Thread.join(Thread.java:1326)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:716)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:476)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:652)
20/11/24 17:13:48 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1252)
at java.lang.Thread.join(Thread.java:1326)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:716)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:476)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:652)
20/11/24 17:13:48 INFO mapreduce.JobSubmitter: number of splits:2
20/11/24 17:13:48 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1252)
at java.lang.Thread.join(Thread.java:1326)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:716)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:476)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:652)
20/11/24 17:13:48 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1606189764138_0057
20/11/24 17:13:48 INFO impl.YarnClientImpl: Submitted application application_1606189764138_0057
20/11/24 17:13:48 INFO mapreduce.Job: The url to track the job: http://node101:8088/proxy/application_1606189764138_0057/
20/11/24 17:13:48 INFO mapreduce.Job: Running job: job_1606189764138_0057
20/11/24 17:14:03 INFO mapreduce.Job: Job job_1606189764138_0057 running in uber mode : false
20/11/24 17:14:03 INFO mapreduce.Job: map 0% reduce 0%
20/11/24 17:14:16 INFO mapreduce.Job: map 100% reduce 0%
20/11/24 17:14:26 INFO mapreduce.Job: map 100% reduce 100%
20/11/24 17:14:26 INFO mapreduce.Job: Job job_1606189764138_0057 completed successfully
20/11/24 17:14:26 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=13
FILE: Number of bytes written=375073
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=243
HDFS: Number of bytes written=3
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
VIEWFS: Number of bytes read=0
VIEWFS: Number of bytes written=0
VIEWFS: Number of read operations=0
VIEWFS: Number of large read operations=0
VIEWFS: Number of write operations=0
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=22296
Total time spent by all reduces in occupied slots (ms)=5883
Total time spent by all map tasks (ms)=22296
Total time spent by all reduce tasks (ms)=5883
Total vcore-milliseconds taken by all map tasks=22296
Total vcore-milliseconds taken by all reduce tasks=5883
Total megabyte-milliseconds taken by all map tasks=22831104
Total megabyte-milliseconds taken by all reduce tasks=6024192
Map-Reduce Framework
Map input records=1
Map output records=1
Map output bytes=5
Map output materialized bytes=19
Input split bytes=240
Combine input records=0
Combine output records=0
Reduce input groups=1
Reduce shuffle bytes=19
Reduce input records=1
Reduce output records=1
Spilled Records=2
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=500
CPU time spent (ms)=2030
Physical memory (bytes) snapshot=499757056
Virtual memory (bytes) snapshot=6190952448
Total committed heap usage (bytes)=264908800
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
Result: the average job completion time was 37 seconds
DataLines Maps Reduces AvgTime (milliseconds)
1 2 1 37481
Log from running the same test on the company's test cluster
Result: the average job completion time was 23 seconds
DataLines Maps Reduces AvgTime (milliseconds)
1 2 1 23073
2.4 TeraGen-TeraSort-TeraValidate
TeraSort is a commonly used Hadoop benchmark whose goal is to sort data as fast as possible with MapReduce. TeraSort partitions the map output across reduce tasks so that the concatenated reduce outputs form a globally sorted result. The test stresses every stage of the MapReduce framework and provides a reasonable reference point for tuning and configuring a Hadoop cluster.
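The "partitioning ensures global order" idea can be sketched in a few lines: pick split points from a sample of the keys, route each key to the partition whose range covers it, sort each partition independently, and the concatenation is globally sorted. This is a toy illustration of the idea, not Hadoop's actual TotalOrderPartitioner code:

```python
import bisect
import random

def split_points(keys, num_reducers, sample_size=100):
    """Choose num_reducers - 1 split points, evenly spaced through a key sample."""
    sample = sorted(random.sample(keys, min(len(keys), sample_size)))
    return [sample[len(sample) * i // num_reducers] for i in range(1, num_reducers)]

def partition(keys, splits):
    """Route each key to the bucket whose range covers it (range partitioning)."""
    buckets = [[] for _ in range(len(splits) + 1)]
    for k in keys:
        buckets[bisect.bisect_right(splits, k)].append(k)
    return buckets

keys = random.sample(range(1_000_000), 10_000)    # unique "records"
splits = split_points(keys, num_reducers=4)
buckets = partition(keys, splits)
merged = [k for b in buckets for k in sorted(b)]  # each "reducer" sorts its bucket
print(merged == sorted(keys))                     # True: concatenation is globally sorted
```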
2.4.1 Generating test data with TeraGen
Before running TeraSort, its input data must be generated; the teragen command does this. The first argument is the number of records, the second the HDFS output directory. The command below generates 1 GB of data, made up of 10 million records, in the terasort-input directory on HDFS.
$ hadoop jar hadoop-mapreduce-examples-3.1.1.3.1.0.17-1.jar teragen <rows> <outputDir>
- Argument 1: the number of rows; each row is 100 bytes, so 1 GiB of data takes 1024*1024*1024/100 = 10737418 rows
- Argument 2: the output directory
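The row arithmetic above can be checked directly; the only fact this sketch relies on is TeraGen's fixed 100-byte record size:

```python
RECORD_BYTES = 100  # teragen writes fixed-size 100-byte records

def teragen_rows(target_bytes):
    """Rows needed so that rows * 100 bytes does not exceed the target size."""
    return target_bytes // RECORD_BYTES

print(teragen_rows(1024 ** 3))                  # 10737418 rows for 1 GiB, as above
print(teragen_rows(10_000_000 * RECORD_BYTES))  # the run below: 10M rows = 1e9 bytes
```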
$ hadoop jar hadoop-mapreduce-examples-3.1.1.3.1.0.17-1.jar teragen 10000000 /terasortTest/terasort-input
2020-11-27 09:52:43,103 INFO client.AHSProxy: Connecting to Application History server at hebsjzx-hadoop-67-5.bonc.com/10.252.67.5:10200
2020-11-27 09:52:43,399 INFO hdfs.DFSClient: Created token for hadoop: HDFS_DELEGATION_TOKEN [email protected], renewer=hadoop, realUser=, issueDate=1606441963380, maxDate=1607046763380, sequenceNumber=10228, masterKeyId=656 on ha-hdfs:beh001
2020-11-27 09:52:43,432 INFO security.TokenCache: Got dt for hdfs://beh001; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:beh001, Ident: (token for hadoop: HDFS_DELEGATION_TOKEN [email protected], renewer=hadoop, realUser=, issueDate=1606441963380, maxDate=1607046763380, sequenceNumber=10228, masterKeyId=656)
2020-11-27 09:52:43,488 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
2020-11-27 09:52:43,563 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1595472078301_0183
2020-11-27 09:52:43,786 INFO terasort.TeraGen: Generating 10000000 using 2
2020-11-27 09:52:43,845 INFO mapreduce.JobSubmitter: number of splits:2
2020-11-27 09:52:43,881 INFO Configuration.deprecation: yarn.resourcemanager.zk-address is deprecated. Instead, use hadoop.zk.address
2020-11-27 09:52:43,882 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2020-11-27 09:52:43,971 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1595472078301_0183
2020-11-27 09:52:43,972 INFO mapreduce.JobSubmitter: Executing with tokens: [Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:beh001, Ident: (token for hadoop: HDFS_DELEGATION_TOKEN [email protected], renewer=hadoop, realUser=, issueDate=1606441963380, maxDate=1607046763380, sequenceNumber=10228, masterKeyId=656)]
2020-11-27 09:52:44,171 INFO conf.Configuration: resource-types.xml not found
2020-11-27 09:52:44,172 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2020-11-27 09:52:44,201 INFO impl.TimelineClientImpl: Timeline service address: hebsjzx-hadoop-67-5.bonc.com:8190
2020-11-27 09:52:45,023 INFO impl.YarnClientImpl: Submitted application application_1595472078301_0183
2020-11-27 09:52:45,069 INFO mapreduce.Job: The url to track the job: https://hebsjzx-hadoop-67-6.bonc.com:8090/proxy/application_1595472078301_0183/
2020-11-27 09:52:45,070 INFO mapreduce.Job: Running job: job_1595472078301_0183
2020-11-27 09:52:53,212 INFO mapreduce.Job: Job job_1595472078301_0183 running in uber mode : false
2020-11-27 09:52:53,213 INFO mapreduce.Job: map 0% reduce 0%
2020-11-27 09:53:08,296 INFO mapreduce.Job: map 100% reduce 0%
2020-11-27 09:53:09,306 INFO mapreduce.Job: Job job_1595472078301_0183 completed successfully
2020-11-27 09:53:09,395 INFO mapreduce.Job: Counters: 33
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=460804
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=167
HDFS: Number of bytes written=1000000000
HDFS: Number of read operations=12
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Job Counters
Launched map tasks=2
Other local map tasks=2
Total time spent by all maps in occupied slots (ms)=25469
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=25469
Total vcore-milliseconds taken by all map tasks=25469
Total megabyte-milliseconds taken by all map tasks=104321024
Map-Reduce Framework
Map input records=10000000
Map output records=10000000
Input split bytes=167
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=210
CPU time spent (ms)=38190
Physical memory (bytes) snapshot=959725568
Virtual memory (bytes) snapshot=5937156096
Total committed heap usage (bytes)=1221066752
Peak Map Physical memory (bytes)=480038912
Peak Map Virtual memory (bytes)=2998444032
org.apache.hadoop.examples.terasort.TeraGen$Counters
CHECKSUM=21472776955442690
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=1000000000
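As a quick sanity check on the counters above: TeraGen writes fixed-size 100-byte rows, so the record count and the HDFS bytes written should agree exactly. A minimal sketch using the numbers from this run:

```python
# TeraGen rows are a fixed 100 bytes (10-byte key + 90-byte value),
# so bytes written should be exactly 100x the record count.
ROW_BYTES = 100

map_output_records = 10_000_000      # "Map output records" counter above
hdfs_bytes_written = 1_000_000_000   # "HDFS: Number of bytes written" counter

assert map_output_records * ROW_BYTES == hdfs_bytes_written
print(f"{hdfs_bytes_written / 1e9:.1f} GB generated")  # → 1.0 GB generated
```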
2.4.2、TeraSort data sorting
$ hadoop jar hadoop-mapreduce-examples-3.1.1.3.1.0.17-1.jar terasort <arg1> <arg2>
- Argument 1: the directory containing the test data (TeraGen output)
- Argument 2: the output directory for the sorted result
$ hadoop jar hadoop-mapreduce-examples-3.1.1.3.1.0.17-1.jar terasort /terasortTest/terasort-input /terasortTest/terasort-output
2020-11-27 09:59:17,085 INFO terasort.TeraSort: starting
2020-11-27 09:59:18,411 INFO hdfs.DFSClient: Created token for hadoop: HDFS_DELEGATION_TOKEN [email protected], renewer=hadoop, realUser=, issueDate=1606442358393, maxDate=1607047158393, sequenceNumber=10229, masterKeyId=656 on ha-hdfs:beh001
2020-11-27 09:59:18,447 INFO security.TokenCache: Got dt for hdfs://beh001; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:beh001, Ident: (token for hadoop: HDFS_DELEGATION_TOKEN [email protected], renewer=hadoop, realUser=, issueDate=1606442358393, maxDate=1607047158393, sequenceNumber=10229, masterKeyId=656)
2020-11-27 09:59:18,529 INFO input.FileInputFormat: Total input files to process : 2
Spent 467ms computing base-splits.
Spent 2ms computing TeraScheduler splits.
Computing input splits took 469ms
Sampling 4 splits of 4
Making 1 from 100000 sampled records
Computing parititions took 365ms
Spent 837ms computing partitions.
2020-11-27 09:59:19,288 INFO client.AHSProxy: Connecting to Application History server at hebsjzx-hadoop-67-5.bonc.com/10.252.67.5:10200
2020-11-27 09:59:19,352 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
2020-11-27 09:59:19,440 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1595472078301_0184
2020-11-27 09:59:19,589 INFO mapreduce.JobSubmitter: number of splits:4
2020-11-27 09:59:19,622 INFO Configuration.deprecation: yarn.resourcemanager.zk-address is deprecated. Instead, use hadoop.zk.address
2020-11-27 09:59:19,623 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2020-11-27 09:59:19,714 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1595472078301_0184
2020-11-27 09:59:19,716 INFO mapreduce.JobSubmitter: Executing with tokens: [Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:beh001, Ident: (token for hadoop: HDFS_DELEGATION_TOKEN [email protected], renewer=hadoop, realUser=, issueDate=1606442358393, maxDate=1607047158393, sequenceNumber=10229, masterKeyId=656)]
2020-11-27 09:59:19,955 INFO conf.Configuration: resource-types.xml not found
2020-11-27 09:59:19,955 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2020-11-27 09:59:19,989 INFO impl.TimelineClientImpl: Timeline service address: hebsjzx-hadoop-67-5.bonc.com:8190
2020-11-27 09:59:20,759 INFO impl.YarnClientImpl: Submitted application application_1595472078301_0184
2020-11-27 09:59:20,798 INFO mapreduce.Job: The url to track the job: https://hebsjzx-hadoop-67-6.bonc.com:8090/proxy/application_1595472078301_0184/
2020-11-27 09:59:20,799 INFO mapreduce.Job: Running job: job_1595472078301_0184
2020-11-27 09:59:28,922 INFO mapreduce.Job: Job job_1595472078301_0184 running in uber mode : false
2020-11-27 09:59:28,923 INFO mapreduce.Job: map 0% reduce 0%
2020-11-27 09:59:48,008 INFO mapreduce.Job: map 67% reduce 0%
2020-11-27 10:00:01,058 INFO mapreduce.Job: map 83% reduce 0%
2020-11-27 10:00:05,072 INFO mapreduce.Job: map 92% reduce 0%
2020-11-27 10:00:06,075 INFO mapreduce.Job: map 100% reduce 0%
2020-11-27 10:00:23,122 INFO mapreduce.Job: map 100% reduce 93%
2020-11-27 10:00:27,133 INFO mapreduce.Job: map 100% reduce 100%
2020-11-27 10:00:27,137 INFO mapreduce.Job: Job job_1595472078301_0184 completed successfully
2020-11-27 10:00:27,228 INFO mapreduce.Job: Counters: 53
File System Counters
FILE: Number of bytes read=291816558
FILE: Number of bytes written=584792131
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1000000476
HDFS: Number of bytes written=1000000000
HDFS: Number of read operations=17
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=4
Launched reduce tasks=1
Rack-local map tasks=4
Total time spent by all maps in occupied slots (ms)=127073
Total time spent by all reduces in occupied slots (ms)=18762
Total time spent by all map tasks (ms)=127073
Total time spent by all reduce tasks (ms)=18762
Total vcore-milliseconds taken by all map tasks=127073
Total vcore-milliseconds taken by all reduce tasks=18762
Total megabyte-milliseconds taken by all map tasks=520491008
Total megabyte-milliseconds taken by all reduce tasks=76849152
Map-Reduce Framework
Map input records=10000000
Map output records=10000000
Map output bytes=1020000000
Map output materialized bytes=291816558
Input split bytes=476
Combine input records=0
Combine output records=0
Reduce input groups=10000000
Reduce shuffle bytes=291816558
Reduce input records=10000000
Reduce output records=10000000
Spilled Records=20000000
Shuffled Maps =4
Failed Shuffles=0
Merged Map outputs=4
GC time elapsed (ms)=2065
CPU time spent (ms)=175820
Physical memory (bytes) snapshot=4377735168
Virtual memory (bytes) snapshot=14680014848
Total committed heap usage (bytes)=3767533568
Peak Map Physical memory (bytes)=1008046080
Peak Map Virtual memory (bytes)=2935164928
Peak Reduce Physical memory (bytes)=481501184
Peak Reduce Virtual memory (bytes)=2947211264
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1000000000
File Output Format Counters
Bytes Written=1000000000
2020-11-27 10:00:27,229 INFO terasort.TeraSort: done
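From the log timestamps above we can compute a rough end-to-end throughput for this run. Note this spans submission to completion, so it includes scheduling and startup overhead and understates the raw sort speed:

```python
from datetime import datetime

# Timestamps taken from the TeraSort log above
start = datetime.strptime("09:59:20", "%H:%M:%S")  # application submitted
end   = datetime.strptime("10:00:27", "%H:%M:%S")  # job completed

elapsed = (end - start).total_seconds()            # 67 seconds
sorted_bytes = 1_000_000_000                       # "Bytes Written" counter

print(f"{elapsed:.0f} s, ~{sorted_bytes / elapsed / 2**20:.1f} MiB/s end-to-end")
```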
2.4.3、TeraValidate verification
To verify that the TeraSort result is correctly sorted, run the teravalidate program.
$ hadoop jar hadoop-mapreduce-examples-3.1.1.3.1.0.17-1.jar teravalidate <arg1> <arg2>
- Argument 1: the sorted data directory (TeraSort output)
- Argument 2: the output directory for the validation result
$ hadoop jar hadoop-mapreduce-examples-3.1.1.3.1.0.17-1.jar teravalidate /terasortTest/terasort-output /terasortTest/terasort-teravalidate
2020-11-27 10:17:59,401 INFO client.AHSProxy: Connecting to Application History server at hebsjzx-hadoop-67-5.bonc.com/10.252.67.5:10200
2020-11-27 10:17:59,698 INFO hdfs.DFSClient: Created token for hadoop: HDFS_DELEGATION_TOKEN [email protected], renewer=hadoop, realUser=, issueDate=1606443479677, maxDate=1607048279677, sequenceNumber=10230, masterKeyId=656 on ha-hdfs:beh001
2020-11-27 10:17:59,738 INFO security.TokenCache: Got dt for hdfs://beh001; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:beh001, Ident: (token for hadoop: HDFS_DELEGATION_TOKEN [email protected], renewer=hadoop, realUser=, issueDate=1606443479677, maxDate=1607048279677, sequenceNumber=10230, masterKeyId=656)
2020-11-27 10:17:59,788 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
2020-11-27 10:17:59,865 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1595472078301_0185
2020-11-27 10:18:00,123 INFO input.FileInputFormat: Total input files to process : 1
Spent 45ms computing base-splits.
Spent 3ms computing TeraScheduler splits.
2020-11-27 10:18:00,184 INFO mapreduce.JobSubmitter: number of splits:1
2020-11-27 10:18:00,215 INFO Configuration.deprecation: yarn.resourcemanager.zk-address is deprecated. Instead, use hadoop.zk.address
2020-11-27 10:18:00,216 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2020-11-27 10:18:00,301 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1595472078301_0185
2020-11-27 10:18:00,303 INFO mapreduce.JobSubmitter: Executing with tokens: [Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:beh001, Ident: (token for hadoop: HDFS_DELEGATION_TOKEN [email protected], renewer=hadoop, realUser=, issueDate=1606443479677, maxDate=1607048279677, sequenceNumber=10230, masterKeyId=656)]
2020-11-27 10:18:00,503 INFO conf.Configuration: resource-types.xml not found
2020-11-27 10:18:00,503 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2020-11-27 10:18:00,583 INFO impl.TimelineClientImpl: Timeline service address: hebsjzx-hadoop-67-5.bonc.com:8190
2020-11-27 10:18:01,346 INFO impl.YarnClientImpl: Submitted application application_1595472078301_0185
2020-11-27 10:18:01,396 INFO mapreduce.Job: The url to track the job: https://hebsjzx-hadoop-67-6.bonc.com:8090/proxy/application_1595472078301_0185/
2020-11-27 10:18:01,397 INFO mapreduce.Job: Running job: job_1595472078301_0185
2020-11-27 10:18:09,515 INFO mapreduce.Job: Job job_1595472078301_0185 running in uber mode : false
2020-11-27 10:18:09,516 INFO mapreduce.Job: map 0% reduce 0%
2020-11-27 10:18:28,605 INFO mapreduce.Job: map 55% reduce 0%
2020-11-27 10:18:30,622 INFO mapreduce.Job: map 100% reduce 0%
2020-11-27 10:18:38,646 INFO mapreduce.Job: map 100% reduce 100%
2020-11-27 10:18:38,651 INFO mapreduce.Job: Job job_1595472078301_0185 completed successfully
2020-11-27 10:18:38,746 INFO mapreduce.Job: Counters: 53
File System Counters
FILE: Number of bytes read=98
FILE: Number of bytes written=461817
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1000000120
HDFS: Number of bytes written=24
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=18950
Total time spent by all reduces in occupied slots (ms)=4943
Total time spent by all map tasks (ms)=18950
Total time spent by all reduce tasks (ms)=4943
Total vcore-milliseconds taken by all map tasks=18950
Total vcore-milliseconds taken by all reduce tasks=4943
Total megabyte-milliseconds taken by all map tasks=77619200
Total megabyte-milliseconds taken by all reduce tasks=20246528
Map-Reduce Framework
Map input records=10000000
Map output records=3
Map output bytes=82
Map output materialized bytes=90
Input split bytes=120
Combine input records=0
Combine output records=0
Reduce input groups=3
Reduce shuffle bytes=90
Reduce input records=3
Reduce output records=1
Spilled Records=6
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=1301
CPU time spent (ms)=35340
Physical memory (bytes) snapshot=1258213376
Virtual memory (bytes) snapshot=5866979328
Total committed heap usage (bytes)=1422917632
Peak Map Physical memory (bytes)=1001738240
Peak Map Virtual memory (bytes)=2933329920
Peak Reduce Physical memory (bytes)=381505536
Peak Reduce Virtual memory (bytes)=2933649408
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1000000000
File Output Format Counters
Bytes Written=24
Inspect the validation output file. If the sorted data is in order, TeraValidate writes only a checksum line; any out-of-order records would be reported as errors instead.
$ hadoop fs -cat /terasortTest/terasort-teravalidate/part-r-00000
checksum 4c49607ac53602
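TeraGen reports its CHECKSUM counter in decimal while TeraValidate writes the checksum in hex, but they describe the same data, so we can cross-check the two runs above:

```python
# Cross-check the TeraGen CHECKSUM counter (decimal) against the value
# TeraValidate wrote to part-r-00000 (hex) -- same data, same checksum.
teragen_checksum = 21_472_776_955_442_690   # CHECKSUM counter from the TeraGen run
teravalidate_hex = "4c49607ac53602"         # value read back with `hadoop fs -cat`

assert int(teravalidate_hex, 16) == teragen_checksum
print("checksums match")
```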
2.5、SliveTest
SliveTest lives in Hadoop's test jar. Its code is cleanly structured, and its main purpose is to stress-test the NameNode by having a large number of maps issue many kinds of RPC requests. You can configure the number of maps, the number of RPC calls each map issues, the percentage of total operations that each RPC type accounts for, as well as read/write data sizes, block size, and other settings.
The RPC operations slive can issue are listed below:
ls | list all files and directories under a path |
---|---|
append | append to an existing file |
create | create a file |
delete | delete a file |
mkdir | create a directory |
rename | rename a file |
read | read a file's contents |
truncate | truncate a file |
Use the -help option to print the usage information:
hadoop jar hadoop-mapreduce-client-jobclient-2.7.6-tests.jar SliveTest -help
usage: SliveTest 0.1.0
-append <arg> pct,distribution where distribution is one of
beg,end,uniform,mid
-appendSize <arg> Min,max for size to append
(min=max=MAX_LONG=blocksize)
-baseDir <arg> Base directory path
-blockSize <arg> Min,max for dfs file block size
-cleanup <arg> Cleanup & remove directory after reporting
-create <arg> pct,distribution where distribution is one of
beg,end,uniform,mid
-delete <arg> pct,distribution where distribution is one of
beg,end,uniform,mid
-dirSize <arg> Max files per directory
-duration <arg> Duration of a map task in seconds (MAX_INT for no
limit)
-exitOnError Exit on first error
-files <arg> Max total number of files
-help Usage information
-ls <arg> pct,distribution where distribution is one of
beg,end,uniform,mid
-maps <arg> Number of maps
-mkdir <arg> pct,distribution where distribution is one of
beg,end,uniform,mid
-ops <arg> Max number of operations per map
-packetSize <arg> Dfs write packet size
-queue <arg> Queue name
-read <arg> pct,distribution where distribution is one of
beg,end,uniform,mid
-readSize <arg> Min,max for size to read (min=max=MAX_LONG=read
entire file)
-reduces <arg> Number of reduces
-rename <arg> pct,distribution where distribution is one of
beg,end,uniform,mid
-replication <arg> Min,max value for replication amount
-resFile <arg> Result file name
-seed <arg> Random number seed
-sleep <arg> Min,max for millisecond of random sleep to perform
(between operations)
-truncate <arg> pct,distribution where distribution is one of
beg,end,uniform,mid
-truncateSize <arg> Min,max for size to truncate
(min=max=MAX_LONG=blocksize)
-truncateWait <arg> Should wait for truncate recovery
-writeSize <arg> Min,max for size to write
(min=max=MAX_LONG=blocksize)
By default each map performs 1000 operations, with the operation types drawn uniformly at random. The parameters relevant to running SliveTest are listed in the table below:
maps | total number of mappers to run; default 10 |
---|---|
ops | number of operations each map performs; default 1000 |
duration | duration of each map task in seconds; default MAX_INT, i.e. no limit |
exitOnError | whether to exit immediately on the first error; by default it does not |
files | maximum total number of files to generate; default 10 |
dirSize | maximum number of files allowed per directory; default 32 |
baseDir | root directory where SliveTest stores its files; default "/test/slive" |
resFile | result file name; default "part-0000" |
replication | replication factor, settable as min,max; default 3 |
blockSize | file block size; default 64 MB (64*1048576) |
readSize | read size as min,max, e.g. "-readSize 100,1000"; unlimited by default (min=max=MAX_LONG=read entire file) |
writeSize | write size as min,max; defaults to blockSize (min=max=blocksize) |
sleep | range of random sleep inserted between operations, as min,max in milliseconds; default 0 |
appendSize | append size as min,max; defaults to blockSize (min=max=blocksize) |
seed | random number seed |
cleanup | remove the output directory after all operations complete and are reported |
queue | queue name to submit to; default "default" |
packetSize | DFS write packet size |
ls | percentage of total operations that are ls |
append | percentage of total operations that are append |
create | percentage of total operations that are create |
delete | percentage of total operations that are delete |
mkdir | percentage of total operations that are mkdir |
rename | percentage of total operations that are rename |
read | percentage of total operations that are read |
Example:
Run with 100 maps and 50 reduces, make create 50% of all operations, and clean up the results afterwards. (Note the space before each backslash: without it, the line continuations would join into a single broken token.)
hadoop jar hadoop-mapreduce-client-jobclient-3.1.1.3.1.0.17-1-tests.jar \
  SliveTest \
  -maps 100 \
  -reduces 50 \
  -baseDir /tmp/slivetest \
  -create 50 \
  -cleanup true
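A side note on how the percentages play out: each explicitly weighted operation appears to get its share of the per-map operation budget, with the leftover percentage split evenly over the remaining types. The following is a rough model of that behavior, not SliveTest's actual code; the even-split rule and the 2000-ops-per-map figure are assumptions inferred from the report this run produced:

```python
# Hypothetical model of how Slive spreads operations across types:
# types given an explicit pct take their share of the per-map budget,
# and the leftover percentage is split evenly over the rest (rounded down).
ALL_OPS = ["append", "create", "delete", "ls", "mkdir", "read", "rename", "truncate"]

def per_map_counts(ops_per_map, explicit):
    counts = {op: ops_per_map * pct // 100 for op, pct in explicit.items()}
    rest = [op for op in ALL_OPS if op not in explicit]
    leftover = 100 - sum(explicit.values())
    for op in rest:
        counts[op] = ops_per_map * leftover // (100 * len(rest))
    return counts

# With -create 50 and 2000 ops per map: 1000 creates plus 142 of each of
# the other seven types per map; over 100 maps that is 100000 CreateOps
# and 14200 of every other type, matching the op_count values reported.
print(per_map_counts(2000, {"create": 50}))
```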
When the run finishes, a report like the following is produced:
Basic report for operation type AppendOp
-------------
Measurement "bytes_written" = 2348810240
Measurement "failures" = 7621
Measurement "files_not_found" = 6544
Measurement "milliseconds_taken" = 14347
Measurement "op_count" = 14200
Measurement "successes" = 35
Rate for measurement "bytes_written" = 156.13 MB/sec
Rate for measurement "op_count" = 989.754 operations/sec
Rate for measurement "successes" = 2.44 successes/sec
-------------
Basic report for operation type CreateOp
-------------
Measurement "bytes_written" = 10536091648
Measurement "failures" = 99843
Measurement "milliseconds_taken" = 63315
Measurement "op_count" = 100000
Measurement "successes" = 157
Rate for measurement "bytes_written" = 158.699 MB/sec
Rate for measurement "op_count" = 1579.405 operations/sec
Rate for measurement "successes" = 2.48 successes/sec
-------------
Basic report for operation type DeleteOp
-------------
Measurement "failures" = 6490
Measurement "milliseconds_taken" = 12158
Measurement "op_count" = 14200
Measurement "successes" = 7710
Rate for measurement "op_count" = 1167.955 operations/sec
Rate for measurement "successes" = 634.15 successes/sec
-------------
Basic report for operation type ListOp
-------------
Measurement "dir_entries" = 20891
Measurement "files_not_found" = 6
Measurement "milliseconds_taken" = 21439
Measurement "op_count" = 14200
Measurement "successes" = 14194
Rate for measurement "dir_entries" = 974.439 directory entries/sec
Rate for measurement "op_count" = 662.344 operations/sec
Rate for measurement "successes" = 662.064 successes/sec
-------------
Basic report for operation type MkdirOp
-------------
Measurement "milliseconds_taken" = 18552
Measurement "op_count" = 14200
Measurement "successes" = 14200
Rate for measurement "op_count" = 765.416 operations/sec
Rate for measurement "successes" = 765.416 successes/sec
-------------
Basic report for operation type ReadOp
-------------
Measurement "bad_files" = 4190
Measurement "bytes_read" = 8388608000
Measurement "chunks_unverified" = 0
Measurement "chunks_verified" = 1048575750
Measurement "files_not_found" = 6648
Measurement "milliseconds_taken" = 101147
Measurement "op_count" = 14200
Measurement "successes" = 3362
Rate for measurement "bytes_read" = 79.093 MB/sec
Rate for measurement "op_count" = 140.39 operations/sec
Rate for measurement "successes" = 33.239 successes/sec
-------------
Basic report for operation type RenameOp
-------------
Measurement "failures" = 10371
Measurement "milliseconds_taken" = 5921
Measurement "op_count" = 14200
Measurement "successes" = 3829
Rate for measurement "op_count" = 2398.244 operations/sec
Rate for measurement "successes" = 646.681 successes/sec
-------------
Basic report for operation type SliveMapper
-------------
Measurement "milliseconds_taken" = 9595422
Measurement "op_count" = 199400
Rate for measurement "op_count" = 20.781 operations/sec
-------------
Basic report for operation type TruncateOp
-------------
Measurement "bytes_written" = 0
Measurement "failures" = 7666
Measurement "files_not_found" = 6432
Measurement "milliseconds_taken" = 95
Measurement "op_count" = 14200
Measurement "successes" = 102
Rate for measurement "bytes_written" = 0 MB/sec
Rate for measurement "op_count" = 149473.684 operations/sec
Rate for measurement "successes" = 1073.684 successes/sec
-------------
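The SliveMapper section aggregates everything the mappers did, so its op_count should equal the sum of the per-operation op_count values. A quick consistency check on the report above:

```python
# Per-operation op_count values from the report above; their sum should
# equal the aggregate op_count in the SliveMapper section (199400).
op_counts = {
    "AppendOp": 14200, "CreateOp": 100000, "DeleteOp": 14200,
    "ListOp": 14200, "MkdirOp": 14200, "ReadOp": 14200,
    "RenameOp": 14200, "TruncateOp": 14200,
}

assert sum(op_counts.values()) == 199400
print("report is internally consistent")
```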