Hadoop Installation
0. Install the Java environment
1. Download Hadoop version 1.2.1
2. Extract Hadoop into the target directory with tar zxvf (a command sketch for these steps follows the file descriptions below)
3. Six files need to be configured: hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, masters, slaves
4. File descriptions:
hadoop-env.sh sets the environment used to start Hadoop
core-site.xml holds the core settings (default filesystem address, working directory)
hdfs-site.xml holds the distributed filesystem (HDFS) settings
mapred-site.xml holds the MapReduce job settings
masters lists the master machines (the master runs the NameNode; note that in Hadoop 1.x this file actually names the SecondaryNameNode host, while the NameNode starts on the machine where the start scripts are run)
slaves lists the slave machines (each slave runs a DataNode and TaskTracker)
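A minimal command sketch for steps 0 through 2; the mirror URL and target directory are assumptions for this walkthrough:
$ java -version                    # confirm the JDK from step 0 is on the PATH
$ wget http://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz
$ tar zxvf hadoop-1.2.1.tar.gz -C /usr/local/hadoop
$ mv /usr/local/hadoop/hadoop-1.2.1 /usr/local/hadoop/hadoop   # match the HADOOP_HOME set below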
5. hadoop-env.sh configuration in detail:
# Set Hadoop-specific environment variables here.
# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.
# The java implementation to use. Required. Set this entry to the directory where the JDK is installed.
export JAVA_HOME=/usr/local/jdk/jdk1.6.0_45
# Extra Java CLASSPATH elements. Optional. Add extra CLASSPATH entries here if needed.
# export HADOOP_CLASSPATH=
# The maximum amount of heap to use, in MB. Default is 1000. Sets the JVM heap size.
# export HADOOP_HEAPSIZE=2000
# Extra Java runtime options. Empty by default. Use -server here: the server JVM is meant for long-running, memory-heavy daemons and tunes its garbage collection accordingly, while -client targets small-memory machines.
# export HADOOP_OPTS=-server
# Command specific options appended to HADOOP_OPTS when specified. Per-daemon JMX management settings.
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
# export HADOOP_TASKTRACKER_OPTS=
# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
# export HADOOP_CLIENT_OPTS
# Extra ssh options. Empty by default.
# export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"
# Where log files are stored. $HADOOP_HOME/logs by default.
# export HADOOP_LOG_DIR=${HADOOP_HOME}/logs
# File naming remote slave hosts. $HADOOP_HOME/conf/slaves by default.
# export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves
# host:path where hadoop code should be rsync'd from. Unset by default.
# export HADOOP_MASTER=master:/home/$USER/src/hadoop
# Seconds to sleep between slave commands. Unset by default. This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HADOOP_SLAVE_SLEEP=0.1
# The directory where pid files are stored. /tmp by default.
# NOTE: this should be set to a directory that can only be written to by
# the users that are going to run the hadoop daemons. Otherwise there is
# the potential for a symlink attack.
# export HADOOP_PID_DIR=/var/hadoop/pids
# A string representing this instance of hadoop. $USER by default.
# export HADOOP_IDENT_STRING=$USER
# The scheduling priority for daemon processes. See 'man nice'.
# export HADOOP_NICENESS=10
Because the settings above refer to ${HADOOP_HOME}, it must be defined in /etc/profile:
export HADOOP_HOME=/usr/local/hadoop/hadoop   # the directory where Hadoop is installed
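After editing /etc/profile, reload it and confirm the variable is set:
$ source /etc/profile
$ echo $HADOOP_HOME                # should print /usr/local/hadoop/hadoop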
6. core-site.xml, the core configuration:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<!-- fs.default.name sets the NameNode address and port; hadoop1 is the local machine name and must be mapped in /etc/hosts (an IP address also works) -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoop1:9000</value>
</property>
<!-- Directory where the filesystem data is kept (base for Hadoop's temporary directories) -->
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/hadoop/tmp</value>
</property>
</configuration>
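The hadoop1 name in fs.default.name must resolve on every node. A sample /etc/hosts mapping, using the cluster addresses from the masters/slaves examples below (the pairing of names to addresses is an assumption):
192.168.100.101  hadoop1
192.168.100.102  hadoop2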
7. Configure hdfs-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Whether permission checking is enabled -->
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<!-- Number of block replicas -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
8. Configure mapred-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- JobTracker address and port; the TaskTrackers on the slave nodes connect to it -->
<property>
<name>mapred.job.tracker</name>
<value>hadoop1:9001</value>
</property>
</configuration>
9. masters configuration:
List the master machines, by hostname or by IP address.
For example:
192.168.100.101
192.168.100.102
10. slaves configuration:
List the slave machines:
192.168.100.101
192.168.100.102
11. Set up passwordless SSH login
Passwordless ssh setup (correct permissions are important).
Log in as the hadoop user and create the ssh directory: mkdir ~/.ssh
Now check whether you can ssh to the local machine without entering a password:
$ ssh namenode
If you cannot ssh to namenode without a password, run the following command:
$ ssh-keygen -t rsa -f ~/.ssh/id_rsa
At the prompt you can enter a passphrase or leave it empty. For safety this example sets the passphrase to hadoop, and ssh-agent is used below so that passwordless login to the other cluster machines still works.
The private key is stored in the file given by the -f option, e.g. ~/.ssh/id_rsa. The public key file has a similar name but ends in '.pub', here ~/.ssh/id_rsa.pub.
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
This stores the ssh public key in the ~/.ssh/authorized_keys file on the namenode machine.
Then use the scp command to distribute authorized_keys and id_rsa.pub to the same directory on the other machines. For example, to send the public key to the .ssh directory on datanode1:
$ scp ~/.ssh/id_rsa.pub hadoop@datanode1:/home/hadoop/.ssh
Then, on datanode1: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
In principle the public key is distributed to every machine and authorized_keys is generated on each one; in practice, distributing the authorized_keys file directly also works.
Finally set the file permissions: the .ssh directory on each machine should be 711 and authorized_keys should be 644. Permissions that are too open make sshd reject the keys; too restrictive and passwordless login fails.
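Since the key above was generated with a passphrase, a minimal ssh-agent session (assuming the key created above at ~/.ssh/id_rsa) looks like:
$ eval `ssh-agent -s`              # start the agent for this shell
$ ssh-add ~/.ssh/id_rsa            # enter the passphrase hadoop once
$ ssh datanode1                    # should now log in without any prompt
One step these notes skip: before the very first start, HDFS must be formatted on the master (run once, on the NameNode only; assumes $HADOOP_HOME/bin is on the PATH):
$ hadoop namenode -format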
12. Run start-all.sh to start the Hadoop cluster. The slave machines are brought up automatically over passwordless ssh: the NameNode host starts the DataNode (and TaskTracker) daemons on the child nodes.
13. Check with jps
Master machine:
[root@hadoop1 conf]# jps
32365 JobTracker
32090 NameNode
25900 Jps
3017 Bootstrap
32269 SecondaryNameNode
Slave machine:
[root@hadoop2 ~]# jps
10009 TaskTracker
9901 DataNode
27852 Jps
Congratulations, the cluster started successfully.
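A few further checks beyond jps; the ports and the examples jar name are the Hadoop 1.2.1 defaults:
$ hadoop fs -ls /                                             # HDFS answers requests
$ hadoop fsck / -files -blocks                                # reports block replication
$ hadoop jar $HADOOP_HOME/hadoop-examples-1.2.1.jar pi 2 10   # runs a small MapReduce job
The NameNode web UI listens on http://hadoop1:50070 and the JobTracker UI on http://hadoop1:50030.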
HBase Installation
HBase and Hadoop versions have to match; for Hadoop 1.2.1 I use hbase-0.94.23.
1. Extract hbase-0.94.23.tar.gz with tar zxvf
2. Edit the configuration files hbase-env.sh and hbase-site.xml
3. hbase-env.sh
#
#/**
# * Copyright 2007 The Apache Software Foundation
# *
# * Licensed to the Apache Software Foundation (ASF) under one
# * or more contributor license agreements. See the NOTICE file
# * distributed with this work for additional information
# * regarding copyright ownership. The ASF licenses this file
# * to you under the Apache License, Version 2.0 (the
# * "License"); you may not use this file except in compliance
# * with the License. You may obtain a copy of the License at
# *
# * http://www.apache.org/licenses/LICENSE-2.0
# *
# * Unless required by applicable law or agreed to in writing, software
# * distributed under the License is distributed on an "AS IS" BASIS,
# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# * See the License for the specific language governing permissions and
# * limitations under the License.
# */
# Set environment variables here.
# This script sets variables multiple times over the course of starting an hbase process,
# so try to keep things idempotent unless you want to take an even deeper look
# into the startup scripts (bin/hbase, etc.)
# The java implementation to use. Java 1.6 required. Point this at the JDK directory.
export JAVA_HOME=/usr/local/jdk/jdk1.6.0_45/
# Extra Java CLASSPATH elements. Optional.
# export HBASE_CLASSPATH=
# The maximum amount of heap to use, in MB. Default is 1000.
# export HBASE_HEAPSIZE=1000
# Extra Java runtime options.
# Below are what we set by default. May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://wiki.apache.org/hadoop/PerformanceTuning
export HBASE_OPTS="-XX:+UseConcMarkSweepGC"
# Uncomment one of the below three options to enable java garbage collection logging for the server-side processes.
# This enables basic gc logging to the .out file.
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"
# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"
# Uncomment one of the below three options to enable java garbage collection logging for the client processes.
# This enables basic gc logging to the .out file.
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"
# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"
# Uncomment below if you intend to use the EXPERIMENTAL off heap cache.
# export HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize="
# Set hbase.offheapcache.percentage in hbase-site.xml to a nonzero value.
# Uncomment and adjust to enable JMX exporting
# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
#
# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"
# File naming hosts on which HRegionServers will run. $HBASE_HOME/conf/regionservers by default.
# export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers
# File naming hosts on which backup HMaster will run. $HBASE_HOME/conf/backup-masters by default.
# export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters
# Extra ssh options. Empty by default.
# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"
# Where log files are stored. $HBASE_HOME/logs by default.
# export HBASE_LOG_DIR=${HBASE_HOME}/logs
# Enable remote JDWP debugging of major HBase processes. Meant for Core Developers
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8070"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8071"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8072"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8073"
# A string representing this instance of hbase. $USER by default.
# export HBASE_IDENT_STRING=$USER
# The scheduling priority for daemon processes. See 'man nice'.
# export HBASE_NICENESS=10
# The directory where pid files are stored. /tmp by default.
# export HBASE_PID_DIR=/var/hadoop/pids
# Seconds to sleep between slave commands. Unset by default. This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HBASE_SLAVE_SLEEP=0.1
# Tell HBase whether it should manage its own instance of Zookeeper or not.
# export HBASE_MANAGES_ZK=true
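Since hbase.zookeeper.quorum is set below and no standalone ZooKeeper is installed in these notes, HBase should manage its own ZooKeeper. That is already the default; uncommenting the line simply makes the intent explicit (an assumption about this setup, not a required change):
export HBASE_MANAGES_ZK=true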
4. Configure hbase-site.xml
<configuration>
<!-- Where HBase stores its data; hbase.rootdir may only appear once, and in distributed mode it must be an HDFS URI matching fs.default.name in core-site.xml (for standalone testing a local path such as file:///usr/local/hadoop/hbase/hbase-0.94.23/data would be used instead) -->
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop1:9000/hbase</value>
</property>
<!-- Run HBase in fully distributed mode -->
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.master</name>
<value>hadoop1:60000</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hadoop1,hadoop2</value>
</property>
</configuration>
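With both files in place, a minimal start-and-check sequence (assuming HBase is extracted to the path used above and Hadoop is already running):
$ cd /usr/local/hadoop/hbase/hbase-0.94.23
$ bin/start-hbase.sh
$ bin/hbase shell
hbase(main):001:0> status
jps on the master should now also show HMaster (plus HQuorumPeer when HBase manages ZooKeeper), and each slave an HRegionServer.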