Hadoop Pitfalls


Tags: ubuntu hdfs API


Overview

When accessing the HDFS file system through the Java API, the following error appears:

WARN util.Shell: Did not find winutils.exe: {}
HADOOP_HOME and hadoop.home.dir are unset. -see https://

The code is as follows:

package big.data.hdfs;

import java.io.FileOutputStream;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;

/*
 * Access the HDFS file system through the Java API.
 */
public class TestHDFS {
    static {
        // Register the URL stream handler factory so that Java
        // can recognize the hdfs:// protocol.
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }
    public static void main(String[] args) throws Exception {

        // URL of the source file in HDFS
        String url = "hdfs://s100:8020/user/ubuntu/how.txt";
        // build the URL object
        URL u = new URL(url);
        // open a connection to it
        URLConnection conn = u.openConnection();
        // input stream reading from HDFS
        InputStream is = conn.getInputStream();
        // output stream writing to the local file
        FileOutputStream fos = new FileOutputStream("d:/hello.txt");
        // copy the bytes across
        byte[] buf = new byte[1024];
        int len = -1;
        while ((len = is.read(buf)) != -1) {
            fos.write(buf, 0, len);
        }
        is.close();
        fos.close();
        System.out.println("over");
    }
    }
}
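
Note that URL.setURLStreamHandlerFactory may be called at most once per JVM, so the URL-based approach conflicts with any other code that registers its own factory. The same copy can also be done with Hadoop's FileSystem API; a minimal sketch (the class name TestHDFS2 is mine; the NameNode address and file paths are the ones from the example above):

package big.data.hdfs;

import java.io.FileOutputStream;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

/*
 * Same copy as above, through the FileSystem API instead of
 * a URL stream handler.
 */
public class TestHDFS2 {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // point the client at the NameNode (same address as above)
        conf.set("fs.defaultFS", "hdfs://s100:8020");
        FileSystem fs = FileSystem.get(conf);
        // open the HDFS file and copy it to the local file
        InputStream is = fs.open(new Path("/user/ubuntu/how.txt"));
        FileOutputStream fos = new FileOutputStream("d:/hello.txt");
        IOUtils.copyBytes(is, fos, 1024, true); // true: close both streams
        System.out.println("over");
    }
}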

Cause of the error: Hadoop also needs to be set up on the Windows machine. The setup process is as follows:

Configuring Hadoop

Configure the environment variables: set HADOOP_HOME to D:\hadoop-2.8.5\hadoop-2.8.5, and add D:\hadoop-2.8.5\hadoop-2.8.5 and D:\hadoop-2.8.5\hadoop-2.8.5\sbin to Path.
(When I set HADOOP_HOME to D:\hadoop-2.8.5\hadoop-2.8.5\bin instead, I got an error; I am not sure why.)
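
As the error message itself points out, the hadoop.home.dir Java system property is accepted as a substitute for the HADOOP_HOME environment variable, which helps when you cannot change system settings. A minimal sketch (the class name is mine; the path is the one used throughout this post, and winutils.exe still has to sit under its bin folder):

public class HadoopHomeDemo {
    public static void main(String[] args) {
        // must be set before any Hadoop class is initialized
        System.setProperty("hadoop.home.dir", "D:\\hadoop-2.8.5\\hadoop-2.8.5");
        System.out.println(System.getProperty("hadoop.home.dir"));
    }
}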

Configuring the Hadoop files

All of the configuration files involved live under the installation's etc\hadoop directory (here D:\hadoop-2.8.5\hadoop-2.8.5\etc\hadoop); open each of them with Notepad.
Note: the JAVA_HOME path must not contain spaces.

  • File 1: D:\hadoop-2.8.5\hadoop-2.8.5\etc\hadoop\hadoop-env.cmd
set JAVA_HOME=C:\ProgramFiles\Java\jdk1.8.0_181
(If your JDK sits under C:\Program Files, the 8.3 short name C:\PROGRA~1 usually works around the space.)
  • File 2: D:\hadoop-2.8.5\hadoop-2.8.5\etc\hadoop\core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
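
In Hadoop 2.x, fs.default.name is a deprecated alias of fs.defaultFS; either key works, and a client-side Configuration picks the value up automatically when core-site.xml is on the classpath. A minimal sketch to check what the client actually sees (the class name is mine):

import org.apache.hadoop.conf.Configuration;

public class ShowDefaultFs {
    public static void main(String[] args) {
        // loads core-default.xml and core-site.xml from the classpath;
        // the deprecated fs.default.name is mapped onto fs.defaultFS
        Configuration conf = new Configuration();
        System.out.println(conf.get("fs.defaultFS"));
    }
}
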
  • File 3: D:\hadoop-2.8.5\hadoop-2.8.5\etc\hadoop\hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/hadoop/data/dfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/hadoop/data/dfs/datanode</value>
  </property>
</configuration>
  • File 4: D:\hadoop-2.8.5\hadoop-2.8.5\etc\hadoop\mapred-site.xml (create it by copying mapred-site.xml.template and dropping the .template suffix)
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
  • File 5: D:\hadoop-2.8.5\hadoop-2.8.5\etc\hadoop\yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

Starting Hadoop

In a cmd console, change to D:\hadoop-2.8.5\hadoop-2.8.5\sbin and run:

hadoop namenode -format    (formats HDFS; only needed on first setup)
start-all.cmd

Put the downloaded winutils.exe into D:\hadoop-2.8.5\hadoop-2.8.5\bin, making sure its version matches your Hadoop version.
Restart Eclipse and run again. One warning remains, apparently a 32-bit vs. 64-bit native-library issue, which can safely be ignored:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
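
To confirm that the local single-node HDFS is actually up, a quick listing of the root directory works as a sanity check; a minimal sketch (the class name is mine, the address is the one from core-site.xml above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckLocalHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // NameNode address from core-site.xml above
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);
        // listing the root directory throws if the cluster is unreachable
        for (FileStatus st : fs.listStatus(new Path("/"))) {
            System.out.println(st.getPath());
        }
        System.out.println("HDFS is reachable");
    }
}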

Thanks to: https://www.cnblogs.com/shizhijie/p/9034643.html


Reposted from www.cnblogs.com/zhqin/p/10216790.html