After the NameNode went down due to a network problem, the DataNodes and other services in the cluster dropped off one after another. Once the network was repaired and the cluster was restarted, two DataNodes failed to come up. Their logs showed the following errors:
2017-12-20 23:55:17,542 WARN org.apache.hadoop.hdfs.server.common.Storage: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /data1/data/hadoop/datanode is in an inconsistent state: Root /data1/data/hadoop/datanode: DatanodeUuid=c9ee0ab8-45a3-4709-8fc2-35fe365ed598, does not match b6c8b918-fa63-4812-95bc-c399b4f30031 from other StorageDirectory.
2017-12-20 23:55:17,542 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Setting up storage: nsid=299045286;bpid=BP-1735478683-10.1.0.31-1433992326763;lv=-56;nsInfo=lv=-59;cid=CID-c1b0775b-e6f8-4bf9-bd3c-d3cd953ae8b3;nsid=299045286;c=0;bpid=BP-1735478683-10.1.0.31-1433992326763;dnuuid=b6c8b918-fa63-4812-95bc-c399b4f30031
2017-12-20 23:55:17,554 WARN org.apache.hadoop.hdfs.server.common.Util: Path /data/data/hadoop/datanode should be specified as a URI in configuration files. Please update hdfs configuration.
2017-12-20 23:55:17,555 WARN org.apache.hadoop.hdfs.server.common.Util: Path /data1/data/hadoop/datanode should be specified as a URI in configuration files. Please update hdfs configuration.
2017-12-20 23:55:17,555 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory [DISK]file:/data/data/hadoop/datanode/ has already been used.
2017-12-20 23:55:17,603 INFO org.apache.hadoop.hdfs.server.common.Storage: Analyzing storage directories for bpid BP-1735478683-10.1.0.31-1433992326763
2017-12-20 23:55:17,604 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to analyze storage directories for block pool BP-1735478683-10.1.0.31-1433992326763
java.io.IOException: BlockPoolSliceStorage.recoverTransitionRead: attempt to load an used block storage: /data/data/hadoop/datanode/current/BP-1735478683-10.1.0.31-1433992326763
    at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.loadBpStorageDirectories(BlockPoolSliceStorage.java:210)
    at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:242)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:381)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:462)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1318)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1288)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:320)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:221)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829)
    at java.lang.Thread.run(Thread.java:722)
2017-12-20 23:55:17,606 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to add storage for block pool: BP-1735478683-10.1.0.31-1433992326763 : BlockPoolSliceStorage.recoverTransitionRead: attempt to load an used block storage: /data/data/hadoop/datanode/current/BP-1735478683-10.1.0.31-1433992326763
2017-12-20 23:55:17,644 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /data1/data/hadoop/datanode/in_use.lock acquired by nodename 5774@NDAPP-DATA-11
2017-12-20 23:55:17,645 WARN org.apache.hadoop.hdfs.server.common.Storage: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /data1/data/hadoop/datanode is in an inconsistent state: Root /data1/data/hadoop/datanode: DatanodeUuid=c9ee0ab8-45a3-4709-8fc2-35fe365ed598, does not match b6c8b918-fa63-4812-95bc-c399b4f30031 from other StorageDirectory.
2017-12-20 23:55:17,645 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool (Datanode Uuid unassigned) service to NDAPP-DATA-09/10.1.0.32:9000. Exiting.
java.io.IOException: All specified directories are failed to load.
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:463)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1318)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1288)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:320)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:221)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829)
    at java.lang.Thread.run(Thread.java:722)
2017-12-20 23:55:17,645 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool (Datanode Uuid unassigned) service to NDAPP-DATA-08/10.1.0.31:9000. Exiting.
org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 1, volumes configured: 2, volumes failed: 1, volume failures tolerated: 0
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.<init>(FsDatasetImpl.java:247)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:34)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:30)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1331)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1288)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:320)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:221)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829)
    at java.lang.Thread.run(Thread.java:722)
2017-12-20 23:55:17,645 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool (Datanode Uuid unassigned) service to NDAPP-DATA-09/10.1.0.32:9000
2017-12-20 23:55:17,646 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool (Datanode Uuid unassigned) service to NDAPP-DATA-08/10.1.0.31:9000
2017-12-20 23:55:17,747 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool (Datanode Uuid unassigned)
2017-12-20 23:55:19,747 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2017-12-20 23:55:19,749 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 0
2017-12-20 23:55:19,751 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at XXXXXX/10.1.0.71
************************************************************/
The root cause of the startup failure turned out to be this line:
Root /data1/data/hadoop/datanode: DatanodeUuid=c9ee0ab8-45a3-4709-8fc2-35fe365ed598, does not match b6c8b918-fa63-4812-95bc-c399b4f30031 from other StorageDirectory.
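A quick way to confirm the mismatch is to print the datanodeUuid recorded in each configured storage directory; on this node the two directories disagreed. The sketch below reproduces the check against mock directories under a temp path (the temp paths are hypothetical stand-ins for /data/data/hadoop/datanode and /data1/data/hadoop/datanode from the logs; only the UUIDs are the real ones from this incident):

```shell
# Mock the two DataNode storage directories from the logs.
dirs=$(mktemp -d)
mkdir -p "$dirs/data/current" "$dirs/data1/current"
echo "datanodeUuid=b6c8b918-fa63-4812-95bc-c399b4f30031" > "$dirs/data/current/VERSION"
echo "datanodeUuid=c9ee0ab8-45a3-4709-8fc2-35fe365ed598" > "$dirs/data1/current/VERSION"

# On the real node the equivalent would be something like:
#   grep -H datanodeUuid /data*/data/hadoop/datanode/current/VERSION
# Two different UUIDs in the output means the storage directories disagree.
grep -H datanodeUuid "$dirs"/data*/current/VERSION
```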
Solution:

vim /data1/data/hadoop/datanode/current/VERSION

(Note this is the DataNode storage directory's VERSION file, as its datanodeUuid and storageType=DATA_NODE entries show.) Following the hint in the log, replace the old UUID with the one reported by the other storage directory:
#Thu Dec 21 00:02:59 CST 2017
storageID=DS-14885ae9-613f-4e9d-b9f3-6e672101bcd9
clusterID=CID-c1b0775b-e6f8-4bf9-bd3c-d3cd953ae8b3
cTime=0
datanodeUuid=c9ee0ab8-45a3-4709-8fc2-35fe365ed598
storageType=DATA_NODE
layoutVersion=-56
becomes
#Thu Dec 21 00:02:59 CST 2017
storageID=DS-14885ae9-613f-4e9d-b9f3-6e672101bcd9
clusterID=CID-c1b0775b-e6f8-4bf9-bd3c-d3cd953ae8b3
cTime=0
datanodeUuid=b6c8b918-fa63-4812-95bc-c399b4f30031
storageType=DATA_NODE
layoutVersion=-56
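The same one-line edit can be scripted with sed instead of vim. This is a minimal sketch using a temp file as a stand-in for the real VERSION file (the UUIDs are the ones from this incident; the temp path is not the real one):

```shell
old=c9ee0ab8-45a3-4709-8fc2-35fe365ed598
new=b6c8b918-fa63-4812-95bc-c399b4f30031

# Stand-in for /data1/data/hadoop/datanode/current/VERSION.
version=$(mktemp)
printf 'datanodeUuid=%s\nstorageType=DATA_NODE\n' "$old" > "$version"

# Swap the stale UUID for the one the other storage directory reports.
sed -i "s/datanodeUuid=$old/datanodeUuid=$new/" "$version"
grep datanodeUuid "$version"
```

On a real node, back up the VERSION file before editing it, and run the edit with the DataNode stopped.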
After that, just start the DataNode again and it comes back up. I still don't understand why this UUID changed in the first place, though.