Hadoop Learning Log (1)

Hadoop Installation

0. Install the Java environment.

1. Download Hadoop, version 1.2.1.

2. Unpack the Hadoop tarball into the target directory with tar zxvf.

3. Six files need to be configured: hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, masters, and slaves.

4. File descriptions:

hadoop-env.sh: environment settings used when starting Hadoop

core-site.xml: core configuration, e.g. the default filesystem URI

hdfs-site.xml: configuration for the distributed file system (HDFS)

mapred-site.xml: configuration for MapReduce jobs

masters: lists the master machine (the master runs the NameNode; the slaves run DataNodes)

slaves: lists the slave machines


5. hadoop-env.sh in detail:

 

# Set Hadoop-specific environment variables here.

 

# The only required environment variable is JAVA_HOME.  All others are

# optional.  When running a distributed configuration it is best to

# set JAVA_HOME in this file, so that it is correctly defined on

# remote nodes.

 

# The java implementation to use.  Required. Set this to the JDK installation directory.

export JAVA_HOME=/usr/local/jdk/jdk1.6.0_45

 

# Extra Java CLASSPATH elements.  Optional. Extra CLASSPATH entries can be configured here.

# export HADOOP_CLASSPATH=

 

# The maximum amount of heap to use, in MB. Default is 1000. Sets the JVM heap size.

# export HADOOP_HEAPSIZE=2000

 

# Extra Java runtime options.  Empty by default. -server selects the server JVM, whose memory management and garbage-collection behavior suit long-running daemons better than the client JVM.

# export HADOOP_OPTS=-server

 

# Command specific options appended to HADOOP_OPTS when specified. Enables JMX management for the daemons below.

export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"

export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"

export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"

export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"

export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"

# export HADOOP_TASKTRACKER_OPTS=

# The following applies to multiple commands (fs, dfs, fsck, distcp etc)

# export HADOOP_CLIENT_OPTS

 

# Extra ssh options.  Empty by default.

# export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"

 

# Where log files are stored.  $HADOOP_HOME/logs by default.

# export HADOOP_LOG_DIR=${HADOOP_HOME}/logs

 

# File naming remote slave hosts.  $HADOOP_HOME/conf/slaves by default.

# export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves

 

# host:path where hadoop code should be rsync'd from.  Unset by default.

# export HADOOP_MASTER=master:/home/$USER/src/hadoop

 

# Seconds to sleep between slave commands.  Unset by default.  This

# can be useful in large clusters, where, e.g., slave rsyncs can

# otherwise arrive faster than the master can service them.

# export HADOOP_SLAVE_SLEEP=0.1

 

# The directory where pid files are stored. /tmp by default.

# NOTE: this should be set to a directory that can only be written to by

#       the users that are going to run the hadoop daemons.  Otherwise there is

#       the potential for a symlink attack.

# export HADOOP_PID_DIR=/var/hadoop/pids

 

# A string representing this instance of hadoop. $USER by default.

# export HADOOP_IDENT_STRING=$USER

 

# The scheduling priority for daemon processes.  See 'man nice'.

# export HADOOP_NICENESS=10

 

 

Because the settings above reference ${HADOOP_HOME}, it must be defined in /etc/profile:

export HADOOP_HOME=/usr/local/hadoop/hadoop   # the directory where Hadoop is installed
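A fuller sketch of the /etc/profile additions, assuming the paths used above (the PATH line is an optional convenience and not part of the original setup):

export JAVA_HOME=/usr/local/jdk/jdk1.6.0_45
export HADOOP_HOME=/usr/local/hadoop/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin

Run source /etc/profile afterwards so the current shell picks up the new variables.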

 

 

6. core-site.xml core configuration:

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

 

<!-- Put site-specific property overrides in this file. -->

<!-- Filesystem URI and port. hadoop1 is this machine's hostname and must be mapped in /etc/hosts; an IP address works as well. -->

<configuration>

  <property>

   <name>fs.default.name</name>

    <value>hdfs://hadoop1:9000</value>

  </property>

<!-- Directory where the filesystem keeps its data -->

  <property> 

  <name>hadoop.tmp.dir</name> 

  <value>/usr/local/hadoop/hadoop/tmp</value>

  </property> 

</configuration>
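Since fs.default.name refers to the hostname hadoop1, every node must be able to resolve that name. A sketch of the /etc/hosts entries, assuming the example addresses used below for masters and slaves:

192.168.100.101   hadoop1
192.168.100.102   hadoop2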

 

7. Configure hdfs-site.xml:

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

 

<!-- Put site-specific property overrides in this file. -->

 

<configuration>

<!-- Whether permission checking is enabled -->

<property>

<name>dfs.permissions</name>

<value>false</value>

</property>

<!-- Number of block replicas -->

<property>

<name>dfs.replication</name>

<value>1</value>

</property>

</configuration>
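Once the cluster is running, the replication state of stored files can be checked with fsck, shown here as a hedged example (assumes the hadoop command is on PATH):

$ hadoop fsck / -files -blocks -locations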

 

8. Configure mapred-site.xml:

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

 

<!-- Put site-specific property overrides in this file. -->

 

<configuration>

<!-- JobTracker address and port; the worker nodes (TaskTrackers) connect to it -->

<property>

<name>mapred.job.tracker</name>

<value>hdfs://hadoop1:9001</value>

</property>

</configuration>

9. masters configuration:

Lists which machines are masters; hostnames or IP addresses both work (in Hadoop 1.x this file actually determines where the SecondaryNameNode is started).

For example:

192.168.100.101

192.168.100.102

10. slaves configuration:

Lists which machines are slaves:

192.168.100.101

192.168.100.102

11. Configure passwordless SSH login

Passwordless ssh setup (correct permissions matter):

Log in as the hadoop account and create the ssh directory:    mkdir ~/.ssh

Now confirm that you can ssh to the local machine without entering a password:
$ ssh namenode

If you cannot ssh to namenode without a password, run the following command:
$ ssh-keygen -t rsa -f ~/.ssh/id_rsa

At the passphrase prompt you may enter a passphrase or leave it empty. For safety, this walkthrough sets the passphrase to hadoop and uses ssh-agent below to keep logins to the rest of the cluster password-free.

The private key is written to the file named by the -f option, ~/.ssh/id_rsa here. The public key goes to a file with the same name plus a .pub suffix, ~/.ssh/id_rsa.pub in this example.

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

The authorized key is now stored in ~/.ssh/authorized_keys on the namenode machine.
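Because the key was given a passphrase (hadoop, above), ssh-agent can cache the decrypted key so that logins stay password-free; a minimal sketch:

$ eval `ssh-agent`
$ ssh-add ~/.ssh/id_rsa

ssh-add prompts once for the passphrase; with an empty passphrase this step is unnecessary.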

Next, use scp to distribute authorized_keys and id_rsa.pub to the same directory on the other machines. For example, to send the public key to the .ssh directory on datanode1:

$ scp ~/.ssh/id_rsa.pub  hadoop@datanode1:/home/hadoop/.ssh

datanode1机器上  cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

In principle the public key is distributed to every machine and an authorized_keys file is then built on each of them; in practice, copying the finished authorized_keys file directly also works.

Then set the file permissions: 711 on each machine's .ssh directory and 644 on authorized_keys. Permissions that are too open fail the ssh-add check; permissions that are too restrictive prevent passwordless login.
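A sketch of the permission commands implied above, run on each machine under the hadoop account:

$ chmod 711 ~/.ssh
$ chmod 644 ~/.ssh/authorized_keys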

 
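Before the very first start, the HDFS NameNode must be formatted; this one-time operation erases any existing HDFS metadata. Run it on the master (assuming the Hadoop bin directory is on PATH):

$ hadoop namenode -format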

12. Run start-all.sh to bring up the Hadoop cluster. The slave machines are started automatically over passwordless ssh: the NameNode host launches the DataNode processes on the child nodes.

13. Check with jps.

Master machine:

[root@hadoop1 conf]# jps

32365 JobTracker

32090 NameNode

25900 Jps

3017 Bootstrap

32269 SecondaryNameNode

Slave machine:

[root@hadoop2 ~]# jps

10009 TaskTracker

9901 DataNode

27852 Jps

Congratulations, the cluster started successfully.
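As a quick smoke test, assuming the hadoop command is on PATH (the /test path is just an illustrative name):

$ hadoop fs -mkdir /test
$ hadoop fs -ls /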

 

 

 

HBase Installation

HBase and Hadoop versions must be matched; since my Hadoop is 1.2.1, the matching HBase release is hbase-0.94.23.

1. Unpack the archive: tar zxvf hbase-0.94.23.tar.gz

2. Edit the configuration files hbase-env.sh and hbase-site.xml.

3. hbase-env.sh:

#

#/**

# * Copyright 2007 The Apache Software Foundation

# *

# * Licensed to the Apache Software Foundation (ASF) under one

# * or more contributor license agreements.  See the NOTICE file

# * distributed with this work for additional information

# * regarding copyright ownership.  The ASF licenses this file

# * to you under the Apache License, Version 2.0 (the

# * "License"); you may not use this file except in compliance

# * with the License.  You may obtain a copy of the License at

# *

# *     http://www.apache.org/licenses/LICENSE-2.0

# *

# * Unless required by applicable law or agreed to in writing, software

# * distributed under the License is distributed on an "AS IS" BASIS,

# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

# * See the License for the specific language governing permissions and

# * limitations under the License.

# */

 

# Set environment variables here.

 

# This script sets variables multiple times over the course of starting an hbase process,

# so try to keep things idempotent unless you want to take an even deeper look

# into the startup scripts (bin/hbase, etc.)

 

# The java implementation to use.  Java 1.6 required. Point this at the JDK directory.

export JAVA_HOME=/usr/local/jdk/jdk1.6.0_45/

 

# Extra Java CLASSPATH elements.  Optional.

# export HBASE_CLASSPATH=

 

# The maximum amount of heap to use, in MB. Default is 1000.

# export HBASE_HEAPSIZE=1000

 

# Extra Java runtime options.

# Below are what we set by default.  May only work with SUN JVM.

# For more on why as well as other possible settings,

# see http://wiki.apache.org/hadoop/PerformanceTuning

export HBASE_OPTS="-XX:+UseConcMarkSweepGC"

 

# Uncomment one of the below three options to enable java garbage collection logging for the server-side processes.

 

# This enables basic gc logging to the .out file.

# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

 

# This enables basic gc logging to its own file.

# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .

# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"

 

# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.

# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .

# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"

 

# Uncomment one of the below three options to enable java garbage collection logging for the client processes.

 

# This enables basic gc logging to the .out file.

# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

 

# This enables basic gc logging to its own file.

# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .

# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"

 

# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.

# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .

# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"

 

# Uncomment below if you intend to use the EXPERIMENTAL off heap cache.

# export HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize="

# Set hbase.offheapcache.percentage in hbase-site.xml to a nonzero value.

 

 

# Uncomment and adjust to enable JMX exporting

# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.

# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html

#

# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"

# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101"

# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102"

# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"

# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"

 

# File naming hosts on which HRegionServers will run.  $HBASE_HOME/conf/regionservers by default.

# export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers

 

# File naming hosts on which backup HMaster will run.  $HBASE_HOME/conf/backup-masters by default.

# export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters

 

# Extra ssh options.  Empty by default.

# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"

 

# Where log files are stored.  $HBASE_HOME/logs by default.

# export HBASE_LOG_DIR=${HBASE_HOME}/logs

 

# Enable remote JDWP debugging of major HBase processes. Meant for Core Developers

# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8070"

# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8071"

# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8072"

# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8073"

 

# A string representing this instance of hbase. $USER by default.

# export HBASE_IDENT_STRING=$USER

 

# The scheduling priority for daemon processes.  See 'man nice'.

# export HBASE_NICENESS=10

 

# The directory where pid files are stored. /tmp by default.

# export HBASE_PID_DIR=/var/hadoop/pids

 

# Seconds to sleep between slave commands.  Unset by default.  This

# can be useful in large clusters, where, e.g., slave rsyncs can

# otherwise arrive faster than the master can service them.

# export HBASE_SLAVE_SLEEP=0.1

 

# Tell HBase whether it should manage its own instance of Zookeeper or not.

# export HBASE_MANAGES_ZK=true

 

4. Configure hbase-site.xml:

<configuration>

 

<!-- Where HBase stores its data. In distributed mode this must point at HDFS;
     the commented-out file:// value is the local-filesystem alternative for
     standalone use. Only one hbase.rootdir may be active, otherwise the later
     value silently overrides the earlier one. -->
<property>
 <name>hbase.rootdir</name>
 <value>hdfs://hadoop1:9000/hbase</value>
</property>

<!--
<property>
<name>hbase.rootdir</name>
<value>file:///usr/local/hadoop/hbase/hbase-0.94.23/data</value>
</property>
-->

 

<!-- Run HBase in distributed mode -->

<property>

<name>hbase.cluster.distributed</name>

<value>true</value>

</property>

 

 

<property>

<name>hbase.master</name>

<value>hdfs://hadoop1:60000</value>

</property>

 

<property>

<name>hbase.zookeeper.quorum</name>

<value>hadoop1,hadoop2</value>

</property>

 

</configuration>
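With the configuration in place, HBase can be started on top of the running Hadoop cluster. A sketch, run from the hbase-0.94.23 directory:

$ bin/start-hbase.sh
$ bin/hbase shell
hbase(main):001:0> status

jps on the master should then also show HMaster (plus HQuorumPeer if HBase manages its own ZooKeeper); the region server machines show HRegionServer.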



Reprinted from hougechuanqi.iteye.com/blog/2119644