DataNode fails to start on CDH

After setting up a CDH cluster, I found that one DataNode died right after starting, and it did the same on every retry. The role log showed:

java.net.BindException: Problem binding to [cdh01:50020] java.net.BindException: Address already in use; For more details see:  http://wiki.apache.org/hadoop/BindException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:720)
	at org.apache.hadoop.ipc.Server.bind(Server.java:524)
	at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:723)
	at org.apache.hadoop.ipc.Server.<init>(Server.java:2438)
	at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:1042)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:536)
	at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:511)
	at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:887)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.initIpcServer(DataNode.java:930)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1324)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:465)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2592)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2479)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2526)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2708)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2732)
Caused by: java.net.BindException: Address already in use
	at sun.nio.ch.Net.bind0(Native Method)
	at sun.nio.ch.Net.bind(Net.java:433)
	at sun.nio.ch.Net.bind(Net.java:425)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
	at org.apache.hadoop.ipc.Server.bind(Server.java:507)
	... 14 more

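So the DataNode needs to bind port 50020, but that port is already in use. I ran jps -m to look for the process holding it and found no leftover DataNode process (jps only lists JVMs, not the ports they use):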

[root@cdh01 ~]$ jps -m
4624 Main --pipeline-type SERVICE_MONITORING --mgmt-home /opt/cloudera-manager/cm-5.16.2/share/cmf
8529 Jps -m
5617 NodeManager
4626 AlertPublisher
5622 JobHistoryServer
3801 Main
5673 ResourceManager
5339 QuorumPeerMain /opt/cloudera-manager/cm-5.16.2/run/cloudera-scm-agent/process/73-zookeeper-server/zoo.cfg
4620 Main --pipeline-type HOST_MONITORING --mgmt-home /opt/cloudera-manager/cm-5.16.2/share/cmf
5021 SecondaryNameNode
4622 EventCatcherService
5023 NameNode

ps -ef did not show such a process either. Now what?

Then I remembered the lsof -i command I had used before. Install it first with yum install -y lsof

Run lsof -i:50020:

[root@cdh01 ~]$ lsof -i:50020
COMMAND  PID      USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
java    5339 zookeeper   39u  IPv4  34440      0t0  TCP cdh01:50020->cdh02:macbak (ESTABLISHED)

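That finally exposed the culprit: ZooKeeper (PID 5339, the QuorumPeerMain entry in the jps listing) was holding the port. Note the ESTABLISHED state: ZooKeeper is not listening on 50020. The port is the local end of an outbound connection to cdh02, so it was presumably handed out as a temporary source port before the DataNode tried to bind it.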
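
The lookup above can be wrapped in a small filter. This is only a sketch, not part of the original fix: port_owners is a hypothetical helper name, and it assumes the default lsof column layout shown above (NAME in the ninth column). It keeps the rows where the given port is the local end of the socket and prints PID, command and user.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `lsof -i` style output on stdin and print
# "PID COMMAND USER" for every socket whose LOCAL port equals $1.
port_owners() {
  local port="$1"
  awk -v port="$port" '
    NR == 1 { next }                    # skip the header row
    {
      split($9, ends, "->")             # NAME column: local->remote, or local only
      n = split(ends[1], parts, ":")    # the last ":"-separated piece is the port
      if (parts[n] == port) print $2, $1, $3
    }'
}
```

Typical use would be lsof -nP -i:50020 | port_owners 50020, where -n and -P keep host names and port numbers numeric so the comparison works on the raw port.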

Fix:

     In CDH, stop ZooKeeper first, then start the DataNode that failed, and finally start ZooKeeper again.
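
Restarting in this order frees the port, but it does not stop the same collision from happening again. One likely explanation, stated here as an assumption rather than something the log proves, is that 50020 lies inside the kernel's ephemeral port range, so any outbound connection may be assigned it. A minimal sketch for checking that on a Linux host; in_ephemeral_range is a made-up helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if PORT lies within [LOW, HIGH].
in_ephemeral_range() {
  local port="$1" low="$2" high="$3"
  [ "$port" -ge "$low" ] && [ "$port" -le "$high" ]
}

# Linux exposes the range as two numbers, e.g. "32768 60999".
range_file=/proc/sys/net/ipv4/ip_local_port_range
if [ -r "$range_file" ]; then
  read -r low high < "$range_file"
  if in_ephemeral_range 50020 "$low" "$high"; then
    echo "50020 is inside the ephemeral range $low-$high"
  else
    echo "50020 is outside the ephemeral range $low-$high"
  fi
fi
```

If the port turns out to be in range, one option on kernels that support it is to reserve it with sysctl -w net.ipv4.ip_local_reserved_ports=50020 (as root), which asks the kernel not to hand that port out automatically. Whether this suits a given cluster is a judgment call.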

Reposted from blog.csdn.net/dinghua_xuexi/article/details/105904217