Hadoop
1. Installation
Running Hadoop as root is not recommended, so create a dedicated user first:

    [root@localhost ~]# useradd -u 1005 wpf
    ## Switch to the new user
    [root@localhost ~]# su - wpf

Extract Hadoop:

    ## Extract the archive
    [wpf@localhost ~]$ tar zxf hadoop-2.7.3.tar.gz
    ## Enter the configuration directory
    [wpf@localhost ~]$ cd hadoop-2.7.3/
    [wpf@localhost hadoop-2.7.3]$ cd etc/hadoop/

Configure the JDK path:

    ## Set JAVA_HOME in hadoop-env.sh
    [wpf@localhost hadoop]$ vim hadoop-env.sh
    # The java implementation to use.
    export JAVA_HOME=/usr/java/jdk1.8.0_121/
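The JAVA_HOME edit above can also be scripted instead of done in vim. The following is a minimal sketch, assuming hadoop-env.sh contains a line starting with `export JAVA_HOME=` (it appends one if not). `set_java_home` is a hypothetical helper name, and the demo runs against a scratch file rather than a real installation.

```shell
#!/usr/bin/env bash
# set_java_home is a hypothetical helper, not something Hadoop ships.
# Usage: set_java_home <path to hadoop-env.sh> <JDK directory>
set_java_home() {
    local env_file="$1" java_home="$2" tmp
    if grep -q '^export JAVA_HOME=' "$env_file"; then
        # Rewrite the existing export line. '|' is the sed delimiter,
        # so the JDK path itself must not contain '|' or '&'.
        tmp="$(mktemp)"
        sed "s|^export JAVA_HOME=.*|export JAVA_HOME=${java_home}|" "$env_file" > "$tmp"
        cat "$tmp" > "$env_file"
        rm -f "$tmp"
    else
        # No export line found: append one at the end of the file.
        printf 'export JAVA_HOME=%s\n' "$java_home" >> "$env_file"
    fi
}

# Demo on a scratch file so that no real configuration is touched.
demo_file="$(mktemp)"
printf '%s\n' '# The java implementation to use.' 'export JAVA_HOME=${JAVA_HOME}' > "$demo_file"
set_java_home "$demo_file" /usr/java/jdk1.8.0_121/
grep '^export JAVA_HOME=' "$demo_file"
rm -f "$demo_file"
```

On a real installation the call would be `set_java_home etc/hadoop/hadoop-env.sh /usr/java/jdk1.8.0_121/`, run from the hadoop-2.7.3 directory.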
2. Standalone test
In standalone mode the data is searched on the local machine. Big-data tools are generally used to query massive data sets.

Query the data by running the corresponding jar:

    ## Create an input directory and copy every XML file from etc/hadoop into it
    [wpf@localhost hadoop-2.7.3]$ mkdir input/
    [wpf@localhost hadoop-2.7.3]$ cp etc/hadoop/*.xml input
    ## List the available jars
    [wpf@localhost hadoop-2.7.3]$ ls share/hadoop/mapreduce/
    ## Run the examples jar: write every string that starts with dfs to output
    [wpf@localhost hadoop-2.7.3]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
    ## View the result
    [wpf@localhost hadoop-2.7.3]$ cat output/*
    1 dfsadmin
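To make clear what the grep example computes, here is a rough local equivalent built from standard Unix tools. It is only a sketch: `count_matches` is a hypothetical helper, the sample file and its contents are made up, and the ordering of matches with equal counts may differ from the MapReduce job.

```shell
#!/usr/bin/env bash
# count_matches is a hypothetical helper that mimics the result format of the
# grep example: one line per distinct match, "count<TAB>match", with the most
# frequent match first.
count_matches() {
    local dir="$1" pattern="$2"
    # -o prints only the matched text, -h suppresses file names,
    # -E enables the extended regex syntax used by 'dfs[a-z.]+'.
    grep -ohE "$pattern" "$dir"/* \
        | sort | uniq -c | sort -rn \
        | awk '{ print $1 "\t" $2 }'
}

# Demo with a made-up sample file in a scratch directory.
demo_dir="$(mktemp -d)"
printf '%s\n' '<description>ACL for the dfsadmin commands</description>' > "$demo_dir/sample.xml"
count_matches "$demo_dir" 'dfs[a-z.]+'
rm -rf "$demo_dir"
```

Against the real input directory the call would be `count_matches input 'dfs[a-z.]+'`.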
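3. Pseudo-distributed mode
Distributed operation normally needs several hosts. Without that setup available, we run everything on a single host, which is called pseudo-distributed mode.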
Edit two configuration files:

    [wpf@localhost hadoop]$ vim core-site.xml
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://172.25.254.112:9000</value> <!-- IP of this host -->
        </property>
    </configuration>
    ## Configure hdfs-site.xml
    [wpf@localhost hadoop]$ vim hdfs-site.xml
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
    </configuration>
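The two configuration files can also be generated from a script. The sketch below assumes it is acceptable to overwrite both files completely, which discards any license header or other properties they contained. `write_pseudo_configs` is a hypothetical helper, and the port defaults to 9000 as in the text. The replication factor is 1 because a single DataNode can hold only one replica of each block.

```shell
#!/usr/bin/env bash
# write_pseudo_configs is a hypothetical helper, not part of Hadoop.
# Usage: write_pseudo_configs <config dir> <host or IP> [port]
write_pseudo_configs() {
    local conf_dir="$1" host="$2" port="${3:-9000}"

    # core-site.xml: address of the default file system (the NameNode).
    cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://${host}:${port}</value>
    </property>
</configuration>
EOF

    # hdfs-site.xml: one replica per block, since there is one DataNode.
    cat > "$conf_dir/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
EOF
}

# Demo in a scratch directory so that no real configuration is overwritten.
demo_conf="$(mktemp -d)"
write_pseudo_configs "$demo_conf" 172.25.254.112
cat "$demo_conf/core-site.xml"
rm -rf "$demo_conf"
```

On a real installation the first argument would be the etc/hadoop directory.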
Generate an SSH key pair:

    [wpf@localhost ~]$ ssh-keygen
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/wpf/.ssh/id_rsa):
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in /home/wpf/.ssh/id_rsa.
    Your public key has been saved in /home/wpf/.ssh/id_rsa.pub.
    The key fingerprint is:
    a7:f9:2b:0c:b9:be:10:d1:cd:b8:4e:13:81:96:37:00 wpf@localhost
    The key's randomart image is:
    +--[ RSA 2048]----+
    | E..+. |
    | +.o= |
    | ...+.o |
    | . o |
    | . +.S . |
    | +o. + |
    | . .+o |
    | .. o. |
    | .o. .o. |
    +-----------------+

Set the IP address in the slaves file:

    [wpf@localhost ~]$ cd -
    /home/wpf/hadoop-2.7.3/etc/hadoop
    ## Replace the contents of slaves with the IP address
    [wpf@localhost hadoop]$ vim slaves

Connect to the host at this IP. Before doing so, set a password for the user as root:

    ## Connect to the IP
    [wpf@localhost hadoop]$ ssh 172.25.254.112
    wpf@172.25.254.112's password:
    Last login: Thu Jan 11 03:48:29 2018
    [wpf@localhost ~]$ ssh-copy-id 172.25.254.112
    /usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
    /usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
    wpf@172.25.254.112's password:
    Number of key(s) added: 1
    Now try logging into the machine, with: "ssh '172.25.254.112'"
    and check to make sure that only the key(s) you wanted were added.

Start HDFS with start-dfs.sh:

    [wpf@localhost hadoop-2.7.3]$ sbin/start-dfs.sh
    172.25.254.112: starting namenode, logging to /home/wpf/hadoop-2.7.3/logs/hadoop-wpf-namenode-localhost.out
    172.25.254.112: starting datanode, logging to /home/wpf/hadoop-2.7.3/logs/hadoop-wpf-datanode-localhost.out
    Starting secondary namenodes [0.0.0.0]
    The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
    ECDSA key fingerprint is eb:24:0e:07:96:26:b1:04:c2:37:0c:78:2d:bc:b0:08.
    Are you sure you want to continue connecting (yes/no)? yes
    0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
    0.0.0.0: starting secondarynamenode, logging to /home/wpf/hadoop-2.7.3/logs/hadoop-wpf-secondarynamenode-localhost.out
    ## List the running Java processes with jps
    [wpf@localhost hadoop-2.7.3]$ jps
    4072 DataNode
    4394 Jps
    ## Location of jps
    [wpf@localhost hadoop-2.7.3]$ which jps
    /usr/bin/jps
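The start-dfs.sh output above reports three daemons being started (namenode, datanode, secondarynamenode), while the jps listing shows only DataNode. A small check like the following sketch can flag such a gap after startup. `missing_daemons` is a hypothetical helper, and the daemon names are the ones jps prints for HDFS processes. If a daemon is reported missing, the log files named in the start-dfs.sh output are the place to look.

```shell
#!/usr/bin/env bash
# missing_daemons is a hypothetical helper, not part of Hadoop.
# It takes the text printed by jps and prints each expected HDFS daemon
# that does not appear in it, one per line.
missing_daemons() {
    local jps_output="$1" daemon
    for daemon in NameNode DataNode SecondaryNameNode; do
        # Compare the whole second field, so that "SecondaryNameNode"
        # is never counted as a match for "NameNode".
        if ! printf '%s\n' "$jps_output" \
            | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            printf '%s\n' "$daemon"
        fi
    done
}

# Demo with the jps listing quoted in the text.
sample_jps='4072 DataNode
4394 Jps'
missing_daemons "$sample_jps"
```

On a live system the call would be `missing_daemons "$(jps)"`. Empty output means all three daemons are running.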
Access the cluster through a browser, or print a report on the command line (shown as dual-core):

    [wpf@www hadoop-2.7.3]$ bin/hdfs dfsadmin -report
    Configured Capacity: 10725273600 (9.99 GB)
    Present Capacity: 5367533568 (5.00 GB)
    DFS Remaining: 5367529472 (5.00 GB)
    DFS Used: 4096 (4 KB)
    DFS Used%: 0.00%
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    Missing blocks (with replication factor 1): 0

    -------------------------------------------------
    Live datanodes (1):

    Name: 172.25.254.193:50010 (www.westos.org)
    Hostname: www.westos.org
    Decommission Status : Normal
    Configured Capacity: 10725273600 (9.99 GB)
    DFS Used: 4096 (4 KB)
    Non DFS Used: 5357740032 (4.99 GB)
    DFS Remaining: 5367529472 (5.00 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 50.05%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Thu Jan 11 04:20:08 EST 2018
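When the report is consumed by a script rather than read by eye, single values can be pulled out of it with awk. This is a minimal sketch, assuming each field is printed on its own line as `Name: value` starting at the first column. `report_field` is a hypothetical helper, and it returns the first occurrence of a field, which in the report above is the cluster-wide summary.

```shell
#!/usr/bin/env bash
# report_field is a hypothetical helper, not part of Hadoop.
# It reads "hdfs dfsadmin -report" text on stdin and prints the value of the
# first line whose field name equals the argument exactly.
report_field() {
    local field="$1"
    # Split on ": " so that values containing colons (such as host:port
    # or a time of day) stay in one piece.
    awk -F': ' -v f="$field" '$1 == f { print $2; exit }'
}

# Demo with a few lines copied from the report in the text.
sample_report='Configured Capacity: 10725273600 (9.99 GB)
Present Capacity: 5367533568 (5.00 GB)
DFS Remaining: 5367529472 (5.00 GB)
DFS Used: 4096 (4 KB)
DFS Used%: 0.00%'
printf '%s\n' "$sample_report" | report_field 'DFS Remaining'
```

On a live system the call would be `bin/hdfs dfsadmin -report | report_field 'DFS Remaining'`.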