hadoop
Tags: ubuntu hdfs API
Overview
Accessing the HDFS file system through the Java API fails with the warning: WARN util.Shell: Did not find winutils.exe: {}
HADOOP_HOME and hadoop.home.dir are unset. -see https://
The code is as follows:
package big.data.hdfs;

import java.io.FileOutputStream;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;

/*
 * Access the HDFS file system through the API.
 */
public class TestHDFS {
    static {
        // Register the stream handler factory so that Java
        // can recognize the hdfs:// protocol in URLs.
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) throws Exception {
        // URL of the file on HDFS
        String url = "hdfs://s100:8020/user/ubuntu/how.txt";
        URL u = new URL(url);
        // Open a connection and get its input stream
        URLConnection conn = u.openConnection();
        InputStream is = conn.getInputStream();
        // Local output stream
        FileOutputStream fos = new FileOutputStream("d:/hello.txt");
        // Copy with a 1 KB buffer
        byte[] buf = new byte[1024];
        int len = -1;
        while ((len = is.read(buf)) != -1) {
            fos.write(buf, 0, len);
        }
        is.close();
        fos.close();
        System.out.println("over");
    }
}
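The manual copy loop above never closes the streams if an exception is thrown mid-copy. One way to make the cleanup automatic is try-with-resources; here is a minimal sketch that exercises the same buffered-copy pattern against in-memory streams, so it runs without a cluster (`CopyDemo` and its `copy` helper are illustrative names, not part of the original code):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CopyDemo {
    // Same 1 KB buffered copy as in the HDFS example above.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[1024];
        int len;
        while ((len = in.read(buf)) != -1) {
            out.write(buf, 0, len);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello hdfs".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // try-with-resources closes both streams even if copy throws.
        try (InputStream in = new ByteArrayInputStream(data);
             OutputStream out = sink) {
            copy(in, out);
        }
        System.out.println(Arrays.equals(data, sink.toByteArray())); // prints true
    }
}
```

The same try-with-resources shape applies unchanged when `in` is the `URLConnection` input stream and `out` is the local `FileOutputStream`.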
Cause of the error: Hadoop needs to be set up on the Windows machine. The configuration steps are as follows:
Configure hadoop
Configure the environment variables: add D:\hadoop-2.8.5\hadoop-2.8.5 and D:\hadoop-2.8.5\hadoop-2.8.5\sbin to Path, and set HADOOP_HOME to D:\hadoop-2.8.5\hadoop-2.8.5.
(When I set HADOOP_HOME to D:\hadoop-2.8.5\hadoop-2.8.5\bin instead, I got an error; I am not sure why.)
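The warning itself says that either HADOOP_HOME or the hadoop.home.dir Java system property must be set. As an alternative to the environment variable, the property can be set in code before the first Hadoop class is loaded; a minimal sketch (the path is this guide's install directory, so it is an assumption about your machine):

```java
public class HadoopHomeSetup {
    public static void main(String[] args) {
        // Must run before any Hadoop class is touched, since the
        // property is read during Hadoop's static initialization.
        // The path matches the install directory used in this guide;
        // adjust it to your own layout.
        System.setProperty("hadoop.home.dir", "D:\\hadoop-2.8.5\\hadoop-2.8.5");
        System.out.println(System.getProperty("hadoop.home.dir"));
    }
}
```

This only replaces the environment variable; winutils.exe still has to be present under the bin subdirectory of that path.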
Configure the hadoop files
All of the files involved are in the D:\hadoop-2.8.5\hadoop-2.8.5\etc\hadoop directory and can be edited with Notepad.
Note: the JDK path in the environment variables must not contain spaces.
- File 1: D:\hadoop-2.8.5\hadoop-2.8.5\etc\hadoop\hadoop-env.cmd
set JAVA_HOME=C:\ProgramFiles\Java\jdk1.8.0_181
- File 2: D:\hadoop-2.8.5\hadoop-2.8.5\etc\hadoop\core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
- File 3: D:\hadoop-2.8.5\hadoop-2.8.5\etc\hadoop\hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/hadoop/data/dfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/hadoop/data/dfs/datanode</value>
</property>
</configuration>
- File 4: D:\hadoop-2.8.5\hadoop-2.8.5\etc\hadoop\mapred-site.xml (created by copying mapred-site.xml.template and removing the .template suffix)
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
- File 5: D:\hadoop-2.8.5\hadoop-2.8.5\etc\hadoop\yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
Start hadoop
In a cmd console, change to D:\hadoop-2.8.5\hadoop-2.8.5\sbin and run:
hadoop namenode -format    (formats HDFS)
start-all.cmd
Place the downloaded winutils.exe in the D:\hadoop-2.8.5\hadoop-2.8.5\bin directory; note that its version must match your Hadoop version.
Restart Eclipse and run again. One warning remains, apparently a 32-bit vs. 64-bit native library issue, which can safely be ignored:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable