[Hadoop shell commands] Handling and repairing bad blocks on HDFS

Scenario: a Spark job fails with an error

1. Error message:
17/05/09 14:30:58 WARN scheduler.TaskSetManager: Lost task 28162.1 in stage 0.0 (TID 30490, 127.0.0.1): java.io.IOException: Cannot obtain block length for LocatedBlock{BP-203532773-dfsfdf-1476004795661:blk_1080431162_6762963; getBlockSize()=411; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[127.0.0.1:1004,DS-e9905a06-4607-4113-b717-709a087b8b96,DISK], DatanodeInfoWithStorage[127.0.0.1:1004,DS-a5046b43-4416-45d9-8ff6-44891bcdf3b8,DISK], DatanodeInfoWithStorage[127.0.0.1:1004,DS-f6b04bbe-9555-4ac8-b06a-3317eb229511,DISK]]}

2. Reference for the fix:
https://community.hortonworks.com/questions/37412/cannot-obtain-block-length-for-locatedblock.html
3. Check the files

hdfs fsck /user/admin/data/cdn/20170509 -locations -blocks -files
 Status: HEALTHY
 Total size:    2115443944 B (Total open files size: 7684855 B)
 Total dirs:    1
 Total files:    67353
 Total symlinks:        0 (Files currently being written: 367)
 Total blocks (validated):    67339 (avg. block size 31414 B) (Total open file blocks (not validated): 357)
 Minimally replicated blocks:    67339 (100.0 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    3
 Average block replication:    3.0
 Corrupt blocks:        0
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        6
 Number of racks:        1

Finding: 367 files are still open for write, and 357 open-file blocks were not validated.
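
The open-file count can be pulled out of the fsck summary automatically instead of being read by eye. The following is a minimal sketch, assuming the summary contains a "Files currently being written: N" fragment as in the output above; the function name open_file_count is hypothetical.

```shell
#!/usr/bin/env bash
# open_file_count is a hypothetical helper, not part of Hadoop.
# It reads an fsck summary on stdin and prints the number that follows
# "Files currently being written:", or 0 if that fragment is absent.
open_file_count() {
  awk 'match($0, /Files currently being written: [0-9]+/) {
         s = substr($0, RSTART, RLENGTH)   # the matched fragment only
         sub(/.*: /, "", s)                # keep just the number
         print s
         found = 1
         exit
       }
       END { if (!found) print 0 }'
}
```

Usage might look like `hdfs fsck /user/admin/data/cdn/20170509 | open_file_count`; a non-zero result suggests that some writers never closed their files.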

4. List the problem files
hdfs fsck /user/admin/data/cdn/20170509 -openforwrite

Total size:    2123128799 B
 Total dirs:    1
 Total files:    67720
 Total symlinks:        0
 Total blocks (validated):    67696 (avg. block size 31362 B)
 ************************
  CORRUPT FILES:    253
  MISSING BLOCKS:    253
  MISSING SIZE:        7473074 B
 ************************
 Minimally replicated blocks:    67443 (99.626274 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    0 (0.0 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    3
 Average block replication:    2.9887881
 Corrupt blocks:        0
 Missing replicas:        0 (0.0 %)
 Number of data-nodes:        6
 Number of racks:        1
FSCK ended at Wed May 10 10:01:56 CST 2017 in 1357 milliseconds

The filesystem under path '/user/admin/data/cdn/20170509' is CORRUPT


(1) Find the problem files (tmp.txt is assumed here to hold the saved output of the fsck command above)

cat tmp.txt |tr '/' '\n' |grep ngaahcs-acc |tr ':' ' '|awk '{print $1}' |sort |uniq |grep -v "2017112318"


(2) The preferred fix: delete the .tmp files
hdfs dfs -rm -r /user/admin/data/cdn/20170509/*.tmp

However, this did not fix the problem.
(3) After deleting the .tmp files, run the check again:
hdfs fsck /user/admin/data/cdn/20170509 -openforwrite


Alternatively, find those files like this:
[root@eeeee spark]# hdfs fsck /user/admin/data/cdn/20170509 -openforwrite |grep "/user/admin/data/cdn//20170509"
Connecting to namenode via http://rrrrrr:50070

/user/admin/data/cdn//20170509/ngaahcs-access.log..201705090002.1494259322790.gz 250 bytes, 1 block(s), OPENFORWRITE:
/user/admin/data/cdn//20170509/ngaahcs-access.log..201705090002.1494259322790.gz: MISSING 1 blocks of total size 250 B.......
/user/admin/data/cdn//20170509/ngaahcs-access.log.705090000.1494259200039.gz 1222 bytes, 1 block(s), OPENFORWRITE:
/user/admin/data/cdn//20170509/ngaahcs-access.log.l4.201705090000.1494259200039.gz: MISSING 1 blocks of total size 1222
/user/admin/data/cdn//20170509/ngaahcs-access.log.C2-3l4.201705090245.1494269103909.gz 211 bytes, 1 block(s), OPENFORWRITE:
/user/admin/data/cdn//20170509/ngaahcs-access.log.CTSX2-3l4.201705090750.1494287404133.gz 1504 bytes, 1 block(s), OPENFORWRITE:
/user/admin/data/cdn//20170509/ngaahcs-access.log.CT-3l4.201705090820.1494289204450.gz 308 bytes, 1 block(s), OPENFORWRITE:
/user/admin/data/cdn//20170509/ngaahcs-access.log.C2-3l4.201705091545.1494315903839.gz 437 bytes, 1 block(s), OPENFORWRITE:
/user/admin/data/cdn//20170509/ngaahcs-access.log.SX3-3l3.201705090002.1494259321230.gz 1075 bytes, 1 block(s), OPENFORWRITE:
/user/admin/data/cdn//20170509/ngaahcs-access.log.CX3-3l4.201705090001.1494259260581.gz 521 bytes, 1 block(s), OPENFORWRITE:
/user/admin/data/cdn//20170509/ngaahcs-access.log.CT-X3-3l4.201705090001.1494259260581.gz: MISSING 1 blocks of total size
/user/admin/data/cdn//20170509/ngaahcs-access.log.CT-SX3-3l4.201705090002.1494259320807.gz 729 bytes, 1 block(s), OPENFORWRITE:
/user/admin/data/cdn//20170509/ngaahcs-access.log.CT-GX-GD-SX4-3l4.201705090001.1494259260236.gz 1138 bytes, 1 block(s), OPENFORWRITE:
/user/admin/data/cdn//20170509/ngaahcs-access.log.CT-3l4.201705090001.1494259260236.gz: MISSING 1 blocks of total size 1138 B.........................
/user/admin/data/cdn//20170509/ngaahcs-access.log.CTX9-3n3.201705090001.1494259260495.gz 2379 bytes, 1 block(s), OPENFORWRITE:
/user/admin/data/cdn//20170509/ngaahcs-access.log.CXq-3k1.201705090002.1494259320204.gz: MISSING 1 blocks of total size 10153
/user/admin/data/cdn//20170509/ngaahcs-access.log.CTXq-3k2.201705090001.1494259260772.gz 539 bytes, 1 block(s), OPENFORWRITE:
/user/admin/data/cdn//20170509/ngaahcs-access.log.CT-GXq-3n1.201705090002.1494259320328.gz 1278 bytes, 1 block(s), OPENFORWRITE:
/user/admin/data/cdn//20170509/ngaahcs-access.log.CT-G-3n2.201705090001.1494259260696.gz 2183 bytes, 1 block(s), OPENFORWRITE:
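
The listing above mixes OPENFORWRITE entries with MISSING lines and connection messages. The following is a minimal sketch for reducing it to a unique list of file paths, assuming each open file is reported on one line that starts with its path and that paths contain no spaces; the function name list_open_files is hypothetical.

```shell
#!/usr/bin/env bash
# list_open_files is a hypothetical helper, not part of Hadoop.
# It reads fsck -openforwrite output on stdin and prints each path that is
# flagged OPENFORWRITE exactly once. Assumes paths contain no spaces.
list_open_files() {
  awk '/^\// && /OPENFORWRITE/ { print $1 }' | sort -u
}
```

Usage might look like `hdfs fsck /user/admin/data/cdn/20170509 -openforwrite | list_open_files > open_files.txt`, where open_files.txt is an arbitrary file name chosen for this example.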

If the files are not important, delete them.
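
Deleting the files one at a time can be scripted. The following is a minimal sketch that reads one HDFS path per line on stdin and removes each file with `hdfs dfs -rm`. The variables HDFS_CMD and DRY_RUN and the function name delete_listed_files are hypothetical knobs for this example, not Hadoop settings; the sketch defaults to a dry run because deletion cannot be undone once the trash is bypassed or emptied.

```shell
#!/usr/bin/env bash
# HDFS_CMD and DRY_RUN are hypothetical knobs for this sketch.
HDFS_CMD="${HDFS_CMD:-hdfs}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print what would be run

# Read one HDFS path per line on stdin and remove each file.
delete_listed_files() {
  local path
  while IFS= read -r path; do
    [ -n "$path" ] || continue            # skip blank lines
    if [ "$DRY_RUN" = "1" ]; then
      echo "would run: $HDFS_CMD dfs -rm $path"
    else
      # </dev/null keeps the command from consuming the path list on stdin
      "$HDFS_CMD" dfs -rm "$path" < /dev/null
    fi
  done
}
```

After reviewing the dry-run output, the same list can be piped in again with `DRY_RUN=0` set in the script's environment to perform the deletion.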
 

Check again:
hdfs fsck /user/admin/data/cdn/20170509 -openforwrite
Total size: 2115004402 B
Total dirs: 1
Total files: 67337
Total symlinks: 0
Total blocks (validated): 67337 (avg. block size 31409 B)
Minimally replicated blocks: 67337 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 6
Number of racks: 1
FSCK ended at Wed May 10 10:16:52 CST 2017 in 1329 milliseconds

The filesystem under path '/user/admin/data/cdn//20170509' is HEALTHY

Then rerun the Spark job.

Note: this is not a final fix; the root cause still needs to be identified.


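If the files are important, they need to be repaired.
Check the status of each file one by one and recover it.
Take this file as an example: /user/admin/data/cdn//20170508/ngaahcs-access.log.3k3.201705081700.1494234003128.gz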


Run the recovery command:

hdfs debug recoverLease -path <path-of-the-file> -retries <retry times>
hdfs debug recoverLease -path /user/admin/data/cdn//20170508/ngaahcs-access.log.C00.1494234003128.gz -retries 10
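
With hundreds of open files, running the recovery command by hand for each one is tedious. The following is a minimal sketch that reads one HDFS path per line on stdin and calls `hdfs debug recoverLease` for each. It assumes the command exits with a non-zero status when recovery fails; HDFS_CMD, RETRIES, and the function name recover_leases are hypothetical names for this example.

```shell
#!/usr/bin/env bash
# HDFS_CMD and RETRIES are hypothetical knobs for this sketch.
HDFS_CMD="${HDFS_CMD:-hdfs}"
RETRIES="${RETRIES:-10}"

# Read one HDFS path per line on stdin and try to recover each lease.
# Returns 1 if any file could not be recovered, 0 otherwise.
recover_leases() {
  local path failed=0
  while IFS= read -r path; do
    [ -n "$path" ] || continue            # skip blank lines
    # Assumption: a non-zero exit status means recovery did not succeed.
    if "$HDFS_CMD" debug recoverLease -path "$path" -retries "$RETRIES" < /dev/null; then
      echo "recovered: $path"
    else
      echo "FAILED: $path" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Paths reported as FAILED on stderr would still need to be examined individually, and fsck with -openforwrite can be rerun afterwards to confirm that no open files remain.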


Hadoop command reference:

https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#fsck


Reprinted from blog.csdn.net/mnasd/article/details/86691093