Parsing and Reverse-Parsing HDFS fsimage Files

References:
http://hadoop.apache.org/docs/r3.1.1/hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html
https://blog.csdn.net/androidlushangderen/article/details/50937069

Contents

  • Overview
  • Usage
    • Command usage
    • XML Processor
    • FileDistribution Processor
    • Delimited Processor

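Overview

The HDFS tool for parsing an fsimage is the Offline Image Viewer, invoked with the hdfs oiv command. The Offline Image Viewer provides the following processors:

  • XML: dumps the entire contents of the fsimage file as XML. The output can be processed and analyzed automatically with XML tools. Because XML syntax is verbose, the output file is very large.
  • FileDistribution: analyzes the distribution of file sizes in the namespace image.
  • Web: starts an HTTP server that exposes a read-only WebHDFS API. Secure mode is not supported.
  • Delimited (experimental): generates a text file containing all of the elements common to both inodes and inodes-under-construction, separated by a delimiter. The default delimiter is \t, which can be changed with the -delimiter argument.
  • ReverseXML (experimental): the inverse of the XML processor; it converts an XML file back into a binary fsimage. This makes it easy to create fsimages for testing and to edit an fsimage by hand when it is corrupted.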

Usage

Command usage

$ hdfs oiv -h
Usage: bin/hdfs oiv [OPTIONS] -i INPUTFILE -o OUTPUTFILE
Offline Image Viewer
View a Hadoop fsimage INPUTFILE using the specified PROCESSOR,
saving the results in OUTPUTFILE.

The oiv utility will attempt to parse correctly formed image files
and will abort fail with mal-formed image files.

The tool works offline and does not require a running cluster in
order to process an image file.

The following image processors are available:
  * XML: This processor creates an XML document with all elements of
    the fsimage enumerated, suitable for further analysis by XML
    tools.
  * ReverseXML: This processor takes an XML file and creates a
    binary fsimage containing the same elements.
  * FileDistribution: This processor analyzes the file size
    distribution in the image.
    -maxSize specifies the range [0, maxSize] of file sizes to be
     analyzed (128GB by default).
    -step defines the granularity of the distribution. (2MB by default)
    -format formats the output result in a human-readable fashion
     rather than a number of bytes. (false by default)
  * Web: Run a viewer to expose read-only WebHDFS API.
    -addr specifies the address to listen. (localhost:5978 by default)
    It does not support secure mode nor HTTPS.
  * Delimited (experimental): Generate a text file with all of the elements common
    to both inodes and inodes-under-construction, separated by a
    delimiter. The default delimiter is \t, though this may be
    changed via the -delimiter argument.

Required command line arguments:
-i,--inputFile <arg>   FSImage or XML file to process.

Optional command line arguments:
-o,--outputFile <arg>  Name of output file. If the specified
                       file exists, it will be overwritten.
                       (output to stdout by default)
                       If the input file was an XML file, we
                       will also create an <outputFile>.md5 file.
-p,--processor <arg>   Select which type of processor to apply
                       against image file. (XML|FileDistribution|
                       ReverseXML|Web|Delimited)
                       The default is Web.
-delimiter <arg>       Delimiting string to use with Delimited processor.  
-t,--temp <arg>        Use temporary dir to cache intermediate result to generate
                       Delimited outputs. If not set, Delimited processor constructs
                       the namespace in memory before outputting text.
-h,--help              Display usage information and exit

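  Option notes

  • -i,--inputFile: required; the FSImage or XML file to process.
  • -o,--outputFile: optional; the output file. Defaults to stdout.
  • -p,--processor: optional; the processor to use. Defaults to the Web processor.
  • -delimiter: optional; with the Delimited processor, the delimiter string used in the output.
  • -t,--temp: optional; with the Delimited processor, a temporary directory used to cache intermediate results.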
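
To make the option list concrete, here is a minimal Python sketch that assembles an hdfs oiv command line from the flags documented above. The helper name build_oiv_command and its parameter names are hypothetical; only the flags themselves (-p, -i, -o, -delimiter, -t) come from the help output. The sketch only builds the argument list and does not run the tool.

```python
import shlex
from typing import List, Optional

# Processor names as listed in the `hdfs oiv -h` output above.
PROCESSORS = ("XML", "ReverseXML", "FileDistribution", "Web", "Delimited")


def build_oiv_command(
    input_file: str,
    processor: str = "Web",          # the help text says Web is the default
    output_file: Optional[str] = None,
    delimiter: Optional[str] = None,
    temp_dir: Optional[str] = None,
) -> List[str]:
    """Build an argument list for `hdfs oiv` (hypothetical helper)."""
    if processor not in PROCESSORS:
        raise ValueError(f"unknown processor: {processor!r}")
    cmd = ["hdfs", "oiv", "-p", processor, "-i", input_file]
    if output_file is not None:
        cmd += ["-o", output_file]
    if delimiter is not None:
        cmd += ["-delimiter", delimiter]
    if temp_dir is not None:
        cmd += ["-t", temp_dir]
    return cmd


if __name__ == "__main__":
    cmd = build_oiv_command(
        "fsimage_0000000000007905322", processor="XML", output_file="fsimage.xml"
    )
    # shlex.join quotes arguments where needed, so the line is safe to paste.
    print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run on a machine where the hdfs command is installed.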

XML Processor

  Example

bin/hdfs oiv -p XML -i fsimage_0000000000007905322 -o fsimage.xml

  An excerpt of the output file:

<?xml version="1.0"?>
<fsimage><version><layoutVersion>-64</layoutVersion><onDiskVersion>1</onDiskVersion><oivRevision>Unknown</oivRevision></version>
<NameSection><namespaceId>585529621</namespaceId><genstampV1>1000</genstampV1><genstampV2>304682</genstampV2><genstampV1Limit>0</genstampV1Limit><lastAllocatedBlockId>1074043378</lastAllocatedBlockId><txid>7895152</txid></NameSection>
<ErasureCodingSection>
<erasureCodingPolicy>
<policyId>5</policyId><policyName>RS-10-4-1024k</policyName><cellSize>1048576</cellSize><policyState>DISABLED</policyState><ecSchema>
<codecName>rs</codecName><dataUnits>10</dataUnits><parityUnits>4</parityUnits></ecSchema>
</erasureCodingPolicy>
...
...
</ErasureCodingSection>

<INodeSection><lastInodeId>2196642</lastInodeId><numInodes>53934</numInodes>
<inode><id>16385</id><type>DIRECTORY</type><name></name><mtime>1556176976929</mtime><permission>hadoop:hadoop:0755</permission><nsquota>9223372036854775807</nsquota><dsquota>-1</dsquota></inode>
...
</blocks>
...
...
</fsimage>

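  The XML output shows that the fsimage contains several kinds of sections:

  • Namespace section: holds variables such as namespaceId and rollingUpgradeStartTime.
  • INode sections: hold the inode information for files and directories.
  • fileUnderConstructionSection: information about files currently under construction.
  • Snapshot sections: snapshot-related information.
  • SecretManager section: security-management information.
  • CacheManager section: cache-management information.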
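
As an illustration of processing the XML output with standard tools, the following Python sketch lists the top-level sections and counts inodes by type. The embedded sample is a trimmed stand-in for the excerpt above: the second inode (example.txt) and the numInodes value are hypothetical, added only so the example has a file to count. The function names are also assumptions of this sketch.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, List

# Trimmed sample modelled on the excerpt above. The FILE inode is hypothetical.
SAMPLE_XML = """<?xml version="1.0"?>
<fsimage><version><layoutVersion>-64</layoutVersion><onDiskVersion>1</onDiskVersion><oivRevision>Unknown</oivRevision></version>
<NameSection><namespaceId>585529621</namespaceId><txid>7895152</txid></NameSection>
<INodeSection><lastInodeId>2196642</lastInodeId><numInodes>2</numInodes>
<inode><id>16385</id><type>DIRECTORY</type><name></name><mtime>1556176976929</mtime><permission>hadoop:hadoop:0755</permission><nsquota>9223372036854775807</nsquota><dsquota>-1</dsquota></inode>
<inode><id>16386</id><type>FILE</type><name>example.txt</name><mtime>1556176976929</mtime><permission>hadoop:hadoop:0644</permission></inode>
</INodeSection>
</fsimage>
"""


def section_names(root: ET.Element) -> List[str]:
    """Return the tags of the direct children of <fsimage>, in document order."""
    return [child.tag for child in root]


def inode_type_counts(root: ET.Element) -> Dict[str, int]:
    """Count <inode> elements by the text of their <type> child."""
    return dict(Counter(inode.findtext("type") for inode in root.iter("inode")))


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_XML)
    print("sections:", section_names(root))
    print("namespaceId:", root.findtext("NameSection/namespaceId"))
    print("inodes by type:", inode_type_counts(root))
```

A real fsimage.xml can be very large, so loading it whole as done here may not be practical; a streaming parser such as xml.etree.ElementTree.iterparse is likely a better fit in that case.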

FileDistribution Processor

  Example

hdfs oiv -i fsimage_0000000000007905322 -p FileDistribution -format
Processed 0 inodes.
Size Range      NumFiles
[0 B, 0 B]      863
(0 B, 2 MB]     36238
(2 MB, 4 MB]    3072
(4 MB, 6 MB]    1780
(6 MB, 8 MB]    573
(8 MB, 10 MB]   70
(10 MB, 12 MB]  922
(12 MB, 14 MB]  33
(14 MB, 16 MB]  10
...
...
totalFiles = 49974
totalDirectories = 4000
totalBlocks = 54112
totalSpace = 6468980524520
maxFileSize = 19466558832
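
The interval labels in this output suggest how files are assigned to buckets: empty files get their own bucket, and every other file falls into the half-open interval ((i-1)*step, i*step]. The Python sketch below reproduces that grouping for a list of file sizes. It is inferred from the output format rather than taken from the Hadoop source, it ignores the -maxSize limit, and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

MB = 1024 * 1024


def bucket_index(file_size: int, step: int = 2 * MB) -> int:
    """Bucket 0 holds empty files; bucket i holds sizes in ((i-1)*step, i*step]."""
    if file_size < 0 or step <= 0:
        raise ValueError("file_size must be >= 0 and step must be > 0")
    # Integer ceiling of file_size / step, avoiding floating-point rounding.
    return -(-file_size // step)


def bucket_range(index: int, step: int = 2 * MB) -> Tuple[int, int]:
    """Return the (low, high) byte bounds of a bucket; low is exclusive for index > 0."""
    if index == 0:
        return (0, 0)
    return ((index - 1) * step, index * step)


def size_distribution(sizes: Iterable[int], step: int = 2 * MB) -> Dict[int, int]:
    """Map bucket index to the number of files that fall into it."""
    return dict(Counter(bucket_index(size, step) for size in sizes))


if __name__ == "__main__":
    # Made-up file sizes, in bytes, purely for demonstration.
    sizes = [0, 1, 2 * MB, 2 * MB + 1, 5 * MB]
    for index, count in sorted(size_distribution(sizes).items()):
        low, high = bucket_range(index)
        print(f"bucket {index}: {low}..{high} bytes -> {count} file(s)")
```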

Delimited Processor

  Example

hdfs oiv -i current/fsimage_0000000000007961778 -p Delimited 
Path    Replication     ModificationTime        AccessTime      PreferredBlockSize      BlocksCount     FileSize        NSQUOTA DSQUOTA Permission      UserName   GroupName
/       0       2019-04-25 15:22        1970-01-01 08:00        0       0       0       9223372036857     -1      drwxr-xr-x      hadoop  hadoop
/tmp    0       2019-07-15 16:25        1970-01-01 08:00        0       0       0       -1      -1      drwxrwxrwx      hadoop  hadoop
...
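
Because the Delimited output is plain tab-separated text with a header row, it can be loaded with a standard CSV reader. The Python sketch below sums FileSize per user, assuming the default \t delimiter and the column names shown in the header above. The two file rows in the sample are hypothetical (the excerpt above shows only directories), and the function names are assumptions of this sketch.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

HEADER = ["Path", "Replication", "ModificationTime", "AccessTime",
          "PreferredBlockSize", "BlocksCount", "FileSize", "NSQUOTA",
          "DSQUOTA", "Permission", "UserName", "GroupName"]

ROWS = [
    # Directory row taken from the output above.
    ["/tmp", "0", "2019-07-15 16:25", "1970-01-01 08:00", "0", "0", "0",
     "-1", "-1", "drwxrwxrwx", "hadoop", "hadoop"],
    # Hypothetical file rows, added only for illustration.
    ["/tmp/a.log", "3", "2019-07-15 16:25", "2019-07-15 16:25", "134217728",
     "1", "1024", "0", "0", "-rw-r--r--", "hadoop", "hadoop"],
    ["/tmp/b.log", "3", "2019-07-15 16:26", "2019-07-15 16:26", "134217728",
     "1", "2048", "0", "0", "-rw-r--r--", "alice", "hadoop"],
]

SAMPLE = "\n".join("\t".join(fields) for fields in [HEADER] + ROWS) + "\n"


def read_rows(text: str) -> List[Dict[str, str]]:
    """Parse Delimited-processor text into one dict per line, keyed by header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def bytes_per_user(rows: List[Dict[str, str]]) -> Dict[str, int]:
    """Sum FileSize per UserName, skipping rows whose Permission marks a directory."""
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        if row["Permission"].startswith("d"):
            continue
        totals[row["UserName"]] += int(row["FileSize"])
    return dict(totals)


if __name__ == "__main__":
    rows = read_rows(SAMPLE)
    print(bytes_per_user(rows))
```

For a real dump, replace the in-memory sample with the file written via -o and read it line by line rather than loading it all at once.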

Reposted from blog.csdn.net/CPP_MAYIBO/article/details/97313134