Deploying Hadoop on CentOS 7

Part One: Preparation
Prepare three CentOS 7 machines.

Turn off the firewall on each of them:
      stop the firewall:        systemctl stop firewalld.service
      disable the firewall:     systemctl disable firewalld.service
      check the firewall state: firewall-cmd --state
      reboot:                   reboot

Part Two: Passwordless SSH between the three machines
First, make sure you know the hostname and IP of each of the three machines.
For example, my three are:

10.25.0.165 hadoop01
10.25.0.221 hadoop02
10.25.0.232 hadoop03
1. Check your machine's hostname and IP
Check the machine name:

Log in with the root account, then use the hostname command to check the machine name:

[root@localhost etc]# hostname
localhost.localdomain
[root@localhost etc]#
Change it to the name we want:

hostname hadoop01
Check again after the change; if the change did not take effect, you can edit the name in the configuration file:
vim /etc/hostname   ## edit the name in vi
Do the same on the other two machines, renaming them hadoop02 and hadoop03.
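On CentOS 7 you can also make the change persistent in one step with hostnamectl; run the matching command on each machine:

hostnamectl set-hostname hadoop01    # on the first machine
hostnamectl set-hostname hadoop02    # on the second machine
hostnamectl set-hostname hadoop03    # on the third machine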

Use ifconfig to check the machine's IP:

[root@hadoop01 etc]# ifconfig

If your virtual machine uses a bridged network and you are still unsure of the IP after running ifconfig, try connecting to the machine with a tool such as SecureCRT; the IP that connects successfully is the correct one.
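If ifconfig is not available (a minimal CentOS 7 install does not ship it unless the net-tools package is installed), the ip command shows the same information:

ip addr show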

2. Modify the /etc/hosts file
Modify the /etc/hosts file on all three machines and add the following lines (just append them; nothing needs to be deleted).
How to modify: you can use the vim command, or write a hosts file elsewhere and copy it over the one on Linux.

10.25.0.165 hadoop01
10.25.0.221 hadoop02
10.25.0.232 hadoop03
Tip: your IP addresses do not need to match mine; the point is just to create the hostname-to-IP mapping.
After configuring, use the ping command to check that the three machines can all reach each other (check every pair).

[root@hadoop01 etc]# ping -c 3 hadoop02
PING hadoop02 (10.25.0.221) 56(84) bytes of data.
64 bytes from hadoop02 (10.25.0.221): icmp_seq=1 ttl=64 time=0.416 ms
64 bytes from hadoop02 (10.25.0.221): icmp_seq=2 ttl=64 time=0.431 ms
64 bytes from hadoop02 (10.25.0.221): icmp_seq=3 ttl=64 time=0.458 ms

--- hadoop02 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 0.416/0.435/0.458/0.017 ms
[root@hadoop01 etc]#
If every pair can ping each other, the machines are interconnected and the hosts file is configured correctly.


3. Passwordless SSH (see another blog post for the full walkthrough)

Test it back and forth a few times between the three machines.
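If you do not want to chase down that post, here is a minimal sketch, assuming the root account and the hostnames above. Run it on each of the three machines so that every machine can reach every other one, including itself, without a password:

# generate a key pair (press Enter through all the prompts)
ssh-keygen -t rsa

# copy the public key to all three machines
ssh-copy-id root@hadoop01
ssh-copy-id root@hadoop02
ssh-copy-id root@hadoop03

# verify: this should log in without prompting for a password
ssh hadoop02 hostname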

Part Three: Install the JDK and Hadoop
1. Install the JDK
See the separate tutorial.
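In short, the idea is to unpack a JDK 8 tarball under /opt/java and point JAVA_HOME at it. A sketch only; the archive name below is an assumption, so use whatever you actually downloaded:

mkdir -p /opt/java
tar -xvf jdk-8u171-linux-x64.tar.gz -C /opt/java   # assumed archive name

# append to /etc/profile, then run: source /etc/profile
export JAVA_HOME=/opt/java/jdk1.8.0_171
export PATH=$JAVA_HOME/bin:$PATH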

2. Download Hadoop
http://hadoop.apache.org/releases.html
Select a suitable version to download; the one I use here is 2.9.1:
http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.9.1/hadoop-2.9.1.tar.gz
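If the machine has internet access, you can fetch it directly on the server, assuming that mirror still carries the 2.9.1 release:

wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.9.1/hadoop-2.9.1.tar.gz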

3. Upload and extract the archive
Create a new directory named hadoop under /opt and upload the downloaded hadoop-2.9.1.tar.gz into it.
Enter the directory and extract it:

[root@hadoop01 ~]# cd /opt/hadoop
[root@hadoop01 hadoop]# tar -xvf hadoop-2.9.1.tar.gz
Repeat this operation on all three machines.
Create several new directories under /root; copy and paste the commands below (a one-shot remote version for all three machines is sketched after the list):

mkdir /root/hadoop
mkdir /root/hadoop/tmp
mkdir /root/hadoop/var
mkdir /root/hadoop/dfs
mkdir /root/hadoop/dfs/name
mkdir /root/hadoop/dfs/data
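These directories are needed on all three machines. Once passwordless SSH is working, you could create them everywhere from hadoop01 in one go; a minimal sketch, assuming the hostnames used in this guide:

for h in hadoop01 hadoop02 hadoop03; do
    ssh root@$h "mkdir -p /root/hadoop/{tmp,var,dfs/name,dfs/data}"
done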
4. Modify a batch of configuration files
Go into /opt/hadoop/hadoop-2.9.1/etc/hadoop/; the configuration files we mainly change all live here, and we can take a look first.

[root@hadoop01 hadoop]# ls /opt/hadoop/hadoop-2.9.1/etc/hadoop/
capacity-scheduler.xml  core-site.xml   hadoop-metrics2.properties  hdfs-site.xml            httpfs-signature.secret  kms-env.sh            log4j.properties  mapred-queues.xml.template  slaves                  yarn-env.cmd
configuration.xsl       hadoop-env.cmd  hadoop-metrics.properties   httpfs-env.sh            httpfs-site.xml          kms-log4j.properties  mapred-env.cmd    mapred-site.xml             ssl-client.xml.example  yarn-env.sh
container-executor.cfg  hadoop-env.sh   hadoop-policy.xml           httpfs-log4j.properties  kms-acls.xml             kms-site.xml          mapred-env.sh     mapred-site.xml.template    ssl-server.xml.example  yarn-site.xml
[root@hadoop01 hadoop]#
Modify the configuration files below with the vim command (or replace the files outright, the same way as above).

!!! Important: some of the configuration snippets contain the hostname hadoop01; replace it with your own hostname, do not copy blindly !!!
1) Modify core-site.xml

Add the following inside the <configuration> node:

 <property>
        <name>hadoop.tmp.dir</name>
        <value>/root/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
   </property>
   <property>
        <name>fs.default.name</name>
        <value>hdfs://hadoop01:9000</value>
   </property>
2) Modify hadoop-env.sh

Change the line

export JAVA_HOME=${JAVA_HOME}
to:

export JAVA_HOME=/opt/java/jdk1.8.0_171
Note: use your own JDK path here.

3) Modify hdfs-site.xml

Add the following inside the <configuration> node:

<property>
   <name>dfs.name.dir</name>
   <value>/root/hadoop/dfs/name</value>
   <description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
</property>
<property>
   <name>dfs.data.dir</name>
   <value>/root/hadoop/dfs/data</value>
   <description>Comma separated list of paths on the localfilesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
   <name>dfs.replication</name>
   <value>2</value>
</property>
<property>
      <name>dfs.permissions</name>
      <value>false</value>
      <description>need not permissions</description>
</property>


Explanation (I copied this paragraph and do not fully understand it, but in short, what is written above works): with dfs.permissions set to false, no permission check is done when you create files on DFS, which is convenient; but if you need to guard against accidental deletion, set it to true, or simply remove the property, since the default is true.

4) Create and modify mapred-site.xml

In this version there is a file named mapred-site.xml.template; copy it and rename the copy to mapred-site.xml. The command is:

cp /opt/hadoop/hadoop-2.9.1/etc/hadoop/mapred-site.xml.template /opt/hadoop/hadoop-2.9.1/etc/hadoop/mapred-site.xml
Modify this new mapred-site.xml file and add the following inside the <configuration> node:

<property>
   <name>mapred.job.tracker</name>
   <value>hadoop01:49001</value>
</property>
<property>
      <name>mapred.local.dir</name>
      <value>/root/hadoop/var</value>
</property>
<property>
       <name>mapreduce.framework.name</name>
       <value>yarn</value>
</property>
5) Modify the slaves file

Modify the /opt/hadoop/hadoop-2.9.1/etc/hadoop/slaves file: delete the localhost line inside it and add the hostnames of your DataNodes, which in this guide are:

hadoop02
hadoop03


Note: only the hadoop01 host needs this modification; the other two machines do not!!!

6) Modify the yarn-site.xml file

Modify the /opt/hadoop/hadoop-2.9.1/etc/hadoop/yarn-site.xml file and add the following inside the <configuration> node (note: size the memory settings according to your machines, the bigger the better; I only use 2 GB here because my machines cannot handle more):

<property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop01</value>
   </property>
   <property>
        <description>The address of the applications manager interface in the RM.</description>
        <name>yarn.resourcemanager.address</name>
        <value>${yarn.resourcemanager.hostname}:8032</value>
   </property>
   <property>
        <description>The address of the scheduler interface.</description>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>${yarn.resourcemanager.hostname}:8030</value>
   </property>
   <property>
        <description>The http address of the RM web application.</description>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>${yarn.resourcemanager.hostname}:8088</value>
   </property>
   <property>
        <description>The https adddress of the RM web application.</description>
        <name>yarn.resourcemanager.webapp.https.address</name>
        <value>${yarn.resourcemanager.hostname}:8090</value>
   </property>
   <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>${yarn.resourcemanager.hostname}:8031</value>
   </property>
   <property>
        <description>The address of the RM admin interface.</description>
        <name>yarn.resourcemanager.admin.address</name>
        <value>${yarn.resourcemanager.hostname}:8033</value>
   </property>
   <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
   </property>
   <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>2048</value>
        <description>Available memory per node, in MB; the default is 8192 MB.</description>
   </property>
   <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>2.1</value>
   </property>
   <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2048</value>
</property>
   <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
</property>
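The same configuration is needed on all three machines. Rather than editing every file three times, you could finish the edits on hadoop01 and then push the whole configuration directory to the other two nodes; a minimal sketch, assuming passwordless ssh and the paths used in this guide (copying the slaves file along is harmless, since only the node where you run the start scripts actually reads it):

for h in hadoop02 hadoop03; do
    scp -r /opt/hadoop/hadoop-2.9.1/etc/hadoop/* root@$h:/opt/hadoop/hadoop-2.9.1/etc/hadoop/
done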
Part Four: Start Hadoop
At this point it is finally time to look at the results. Exciting!

1. Initialize the NameNode

Because hadoop01 is the NameNode and hadoop02 and hadoop03 are both DataNodes, the initialization only needs to be performed on hadoop01, i.e. formatting HDFS.

[root@hadoop01 hadoop]# cd /opt/hadoop/hadoop-2.9.1/bin
[root@hadoop01 bin]# ./hadoop namenode -format
...
...
If no errors are reported, the initialization has completed successfully.
After a successful format, you can see that the /root/hadoop/dfs/name/ directory now contains a current directory with a set of files inside:

[root@hadoop01 bin]# cd /root/hadoop/dfs/name/
[root@hadoop01 name]# ls
current  in_use.lock
[root@hadoop01 name]# ls current/
edits_0000000000000000001-0000000000000000002  edits_0000000000000000005-0000000000000000006  fsimage_0000000000000000004      fsimage_0000000000000000006      seen_txid
edits_0000000000000000003-0000000000000000004  edits_inprogress_0000000000000000007           fsimage_0000000000000000004.md5  fsimage_0000000000000000006.md5  VERSION
[root@hadoop01 name]#
2. Execute the start command:
[root@hadoop01 name]# cd /opt/hadoop/hadoop-2.9.1/sbin
[root@hadoop01 sbin]# ./start-all.sh
...
...
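To confirm that the daemons actually came up, you can check with the jps command on every node. Roughly, hadoop01 should show NameNode, SecondaryNameNode and ResourceManager, while hadoop02 and hadoop03 should show DataNode and NodeManager. A small sketch (assuming jps is on the PATH for non-interactive shells; otherwise just run jps on each machine by hand):

for h in hadoop01 hadoop02 hadoop03; do
    echo "== $h =="
    ssh root@$h jps
done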

Part Five: Test Hadoop
My hadoop01 host has the IP 10.25.0.165,
so I visit:
http://10.25.0.165:50070/
http://10.25.0.165:8088/

Well, did it work for you?!
Part Six: Problems
1. The hadoop command is not available

After successfully installing and deploying Hadoop yesterday, I tried to use the

hadoop fs -ls /*
command to look at the HDFS file system, and got an error:

[root@hadoop01 hadoop-2.9.1]# hadoop fs -ls /*
bash: hadoop: command not found...
This is because the environment variables are not configured. Use the vim command to modify the /etc/profile file and add:

export HADOOP_HOME=<your Hadoop installation path>
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
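Then reload the profile and check that the command is found (with the layout used in this guide, HADOOP_HOME would be /opt/hadoop/hadoop-2.9.1):

source /etc/profile
hadoop version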
2. The DataNode does not start

When starting Hadoop with start-all.sh, you may find that the NameNode on the master starts, but the DataNodes on the slave nodes fail to start. Watching more closely, you may find that a DataNode does appear at first, but disappears again on its own after a while.
Some causes suggested on the internet: the firewall was not turned off (note that the commands to turn off the firewall differ between CentOS 6 and CentOS 7).
In my case, after repeated investigation, the reason turned out to be that the clusterID of the DataNode and the clusterID of the NameNode did not match!

Open the DataNode and NameNode directories configured in hdfs-site.xml, and in each one open the VERSION file in the current folder; the clusterID entries are indeed inconsistent, just as the log said.

[root@hadoop01 sbin]# cat /root/hadoop/dfs/name/current/VERSION
#Fri Jul 13 23:04:07 CST 2018
namespaceID=781012180
clusterID=CID-b6934b47-4a9a-4e4c-8291-cd153ef830ba
cTime=1531494247761
storageType=NAME_NODE
blockpoolID=BP-485123232-10.25.0.165-1531494247761
layoutVersion=-63
[root@hadoop01 sbin]# cat /root/hadoop/dfs/data/current/VERSION
#Fri Jul 13 18:27:19 CST 2018
storageID=DS-b3dda351-22bb-4423-b058-df3b5af962ae
clusterID=Hadoop-Federation-ClusterID
cTime=0
datanodeUuid=9241c115-8068-46d4-956b-eb86b8b37b49
storageType=DATA_NODE
layoutVersion=-57
[root@hadoop01 sbin]#
Change the clusterID in the DataNode's VERSION file to match the one in the NameNode's VERSION file, then restart dfs (run start-all.sh); running the jps command afterwards shows that the DataNode has started normally. A sketch of this fix follows below.
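As a concrete sketch, assuming the directory layout used in this guide: read the clusterID from the NameNode's VERSION file on hadoop01 and write it into the DataNode's VERSION file on each DataNode, then restart.

# on hadoop01: read the NameNode's clusterID
NN_CID=$(grep '^clusterID=' /root/hadoop/dfs/name/current/VERSION | cut -d= -f2)

# push it into each DataNode's VERSION file
for h in hadoop02 hadoop03; do
    ssh root@$h "sed -i 's/^clusterID=.*/clusterID=$NN_CID/' /root/hadoop/dfs/data/current/VERSION"
done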

Origin: www.cnblogs.com/leolzi/p/10986337.html