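Contents

Preface
Preparation
Core Code

Creating a directory
Creating a new file
Reading a file
Checking whether a file exists
Downloading a file from a given directory
Uploading a file to a given directory
Deleting a file or directory
Appending content
Renaming a file or directory
Listing the files and directories in a given directory
Listing all files under a given path

Github
Summary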

Preface


The Hadoop family has many important members. Listed below are the ones I plan to work through.

HDFS
HBase
Hive
Sqoop
ZooKeeper
Flume
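The underlying principles are as simple or as complicated as you care to make them.

I won't try to compete with the experts, but I will write about the pitfalls I ran into, along with my thoughts and takeaways.

On to the topic: what is HDFS? HDFS is a distributed file system.

When summarizing a technology I'm learning, I usually cover these points:

Origin, characteristics, the problem it solves, and typical use cases

Wikipedia describes a distributed file system as a file system that allows files to be shared across multiple hosts over a network, so that multiple users on multiple machines can share files and storage space.

This post mainly covers operations through the Java API. A follow-up will cover the HDFS shell commands on a server.

Posts on Spark will be updated in parallel.

I would rather cover the source code and internals in separate articles. Get the code running first, and curiosity about the source and how it is implemented will follow.

Pay attention to the code and to my summary points.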

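Preparation

Development environment
Adding the dependencies
Getting to know the basic HDFS operations

Since HDFS is a file system, open your Windows 10 machine and ask yourself: which file operations do you use most often?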
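The tests in the next section use two objects, configuration and fileSystem, whose setup is not shown in this post. Below is a minimal sketch of how they might be obtained, assuming the hadoop-client dependency is on the classpath. The class name HdfsClientSketch, the NameNode address hdfs://localhost:9000 and the user name hadoop are all placeholders of mine, not values from the original project.

```java
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

/** Hypothetical helper that builds the configuration and fileSystem objects the tests rely on. */
public class HdfsClientSketch {

    // Assumed NameNode address and user; replace with the values of your own cluster.
    static final String HDFS_URI = "hdfs://localhost:9000";
    static final String HDFS_USER = "hadoop";

    /** Creates a client-side configuration that points at the assumed NameNode. */
    static Configuration newConfiguration() {
        Configuration configuration = new Configuration();
        configuration.set("fs.defaultFS", HDFS_URI);
        return configuration;
    }

    /** Opens a FileSystem handle as the assumed user; the caller is responsible for closing it. */
    static FileSystem open(Configuration configuration) throws IOException, InterruptedException {
        return FileSystem.get(URI.create(HDFS_URI), configuration, HDFS_USER);
    }

    public static void main(String[] args) throws Exception {
        Configuration configuration = newConfiguration();
        // FileSystem is Closeable, so try-with-resources releases the connection.
        try (FileSystem fileSystem = open(configuration)) {
            System.out.println(fileSystem.getUri());
        }
    }
}
```

In a JUnit test class, the same two calls would typically go into a setup method that assigns the fields, with fileSystem.close() in the matching teardown method.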

Core Code

Creating a directory

    /**
     * Creates a directory, including any missing parent directories.
     *
     * @throws IOException the io exception
     * @since hui_project 1.0.0
     */
    @Test
    public void testMakeDir() throws IOException {
        fileSystem.mkdirs(new Path("D:\\test\\test"));
    }

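Creating a new file

    /**
     * Creates a file. The second argument, true, means an existing file is overwritten.
     *
     * @throws IOException the io exception
     * @since hui_project 1.0.0
     */
    @Test
    public void testCreateFile() throws IOException {
        // create() returns an output stream; close it so the empty file is finalized.
        fileSystem.create(new Path("D:\\test\\test\\demo.txt"), true).close();
    }

    /**
     * Creates a new file.
     * Unlike create, this first calls exists to check for the file and creates it only if it does not exist yet.
     *
     * @throws IOException the io exception
     * @since hui_project 1.0.0
     */
    @Test
    public void testCreateNewFile() throws IOException {
        fileSystem.createNewFile(new Path("D:\\test\\test\\demo.txt"));
    }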

Reading a file

    /**
     * Reads a file and prints its content.
     *
     * @throws IOException the io exception
     * @since hui_project 1.0.0
     */
    @Test
    public void testReadFile() throws IOException {
        // try-with-resources closes the input stream; passing false keeps System.out open.
        try (FSDataInputStream fsDataInputStream = fileSystem.open(new Path("D:\\test\\test.txt"))) {
            IOUtils.copyBytes(fsDataInputStream, System.out, configuration, false);
        }
    }

Checking whether a file exists

    /**
     * Checks whether a file exists.
     *
     * @throws IOException the io exception
     * @since hui_project 1.0.0
     */
    @Test
    public void testExist() throws IOException {
        boolean exists = fileSystem.exists(new Path("D:\\test"));
        System.out.println(exists);
    }

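Downloading a file from a given directory

    /**
     * Downloads a file from a given directory.
     * The first argument is the source on the file system, the second the local destination.
     *
     * @throws IOException the io exception
     * @since hui_project 1.0.0
     */
    @Test
    public void downLoadFile() throws IOException {
        // Downloading needs copyToLocalFile; copyFromLocalFile is the upload direction.
        fileSystem.copyToLocalFile(new Path("D:\\test\\distance-final.txt"), new Path("D:\\test\\test\\"));
    }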

Uploading a file to a given directory

    /**
     * Uploads a local file to a given directory.
     *
     * @throws IOException the io exception
     * @since hui_project 1.0.0
     */
    @Test
    public void uploadFile() throws IOException {
        fileSystem.copyFromLocalFile(new Path("D:\\test\\demo.txt"), new Path("D:"));
    }

Deleting a file or directory

    /**
     * Deletes a file or directory.
     * The second argument, true, means the deletion is recursive.
     *
     * @throws IOException the io exception
     * @since hui_project 1.0.0
     */
    @Test
    public void deleteFile() throws IOException {
        fileSystem.delete(new Path("D:/test/test"), true);
    }

Appending content

    /**
     * Appends content to an existing file.
     *
     * @throws IOException the io exception
     * @since hui_project 1.0.0
     */
    @Test
    public void testAppendContent() throws IOException {
        configuration.set("dfs.support.append", "true");
        // try-with-resources closes the stream so the appended bytes are flushed.
        try (FSDataOutputStream fsDataOutputStream = fileSystem.append(new Path("D:/test/test/demo.txt"))) {
            fsDataOutputStream.write("test something ".getBytes());
        }
    }

Renaming a file or directory

    /**
     * Renames a file or directory.
     *
     * @throws IOException the io exception
     * @since hui_project 1.0.0
     */
    @Test
    public void testRename() throws IOException {
        fileSystem.rename(new Path("D:/test/test/demo.txt"), new Path("D:/test/test/demo1.txt"));
    }

Listing the files and directories in a given directory

    /**
     * Lists the files and directories directly inside a given directory.
     *
     * @throws IOException the io exception
     * @since hui_project 1.0.0
     */
    @Test
    public void testListStatus() throws IOException {
        FileStatus[] fileStatuses = fileSystem.listStatus(new Path("D:/test"));
        for (FileStatus fileStatus : fileStatuses) {
            System.out.println(fileStatus.getPath().toString());
        }
    }

Listing all files under a given path

    /**
     * Lists all files under a given path.
     * The second argument of listFiles, true, makes the search recursive, so files in subdirectories are returned as well.
     *
     * @throws IOException the io exception
     * @since hui_project 1.0.0
     */
    @Test
    public void testListFile() throws IOException {
        RemoteIterator<LocatedFileStatus> fileStatusRemoteIterator = fileSystem.listFiles(new Path("D:/test"), true);
        while (fileStatusRemoteIterator.hasNext()) {
            LocatedFileStatus next = fileStatusRemoteIterator.next();
            System.out.println(next.getPath());
        }
    }

Github

The common HDFS operations have been pushed to GitHub.

They are in com.hui.bigdata.hadoop.hdfs.HDFSTest

https://github.com/ithuhui/hui-bigdata-hadoop

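Summary

When summarizing a technology I'm learning, I usually cover these points:

Origin, characteristics, the problem it solves, and typical use cases

Why it came about: people talk about big data analysis all the time, but analysis presupposes that you have the data, and all that data has to be stored somewhere.

As the data keeps growing, where does it go? When one machine is not enough, use several.

The core idea is splitting into blocks: however large the file itself is, once it has been cut into blocks it becomes easier to store.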
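The block idea can be sketched in a few lines of plain Java. This only illustrates the arithmetic and is not how HDFS itself is implemented. The class name BlockSplitSketch is hypothetical, and the 128 MB block size is an assumption based on the default dfs.blocksize in Hadoop 2.x and later.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of cutting a file into fixed-size blocks. */
public class BlockSplitSketch {

    // Assumed block size: 128 MB, the default dfs.blocksize in Hadoop 2.x and later.
    static final long BLOCK_SIZE = 128L * 1024 * 1024;

    /** Returns the length in bytes of each block a file of fileSize bytes would be cut into. */
    static List<Long> splitIntoBlocks(long fileSize, long blockSize) {
        List<Long> blocks = new ArrayList<>();
        long remaining = fileSize;
        while (remaining > 0) {
            // Every block is full except possibly the last one.
            long length = Math.min(blockSize, remaining);
            blocks.add(length);
            remaining -= length;
        }
        return blocks;
    }

    public static void main(String[] args) {
        long fileSize = 300L * 1024 * 1024; // a 300 MB file
        List<Long> blocks = splitIntoBlocks(fileSize, BLOCK_SIZE);
        System.out.println(blocks.size() + " blocks: " + blocks);
    }
}
```

With these numbers a 300 MB file becomes three blocks: two full 128 MB blocks and a final 44 MB block. Each block can then be stored on a different machine.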

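The use case needs little explanation: storing data, in large volumes.

Characteristics:

Large storage capacity (storage space)
Runs on cheap commodity servers (lower cost)
Not suitable for systems that require low-latency access (HDFS is designed for high-throughput applications, and that comes at the cost of higher latency)
Not suitable for storing many small files (storage is block-based, so even a small file takes up a block of its own without filling it, and every block is one more entry for the NameNode to track)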
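To make the small-file point concrete, here is a rough sketch that compares how many blocks the same amount of data needs when it is stored as many small files and when it is stored as one large file. The class name SmallFilesSketch and the file sizes are made up for illustration, and the 128 MB block size is again an assumed default.

```java
/** Hypothetical illustration of why many small files mean many blocks. */
public class SmallFilesSketch {

    // Assumed block size: 128 MB.
    static final long BLOCK_SIZE = 128L * 1024 * 1024;

    /** Blocks needed for one file: the file size divided by the block size, rounded up. */
    static long blocksFor(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize;
    }

    /** Blocks needed for fileCount files that are each fileSize bytes long. */
    static long totalBlocks(long fileCount, long fileSize, long blockSize) {
        return fileCount * blocksFor(fileSize, blockSize);
    }

    public static void main(String[] args) {
        long oneMb = 1024L * 1024;
        // 10,000 files of 1 MB each: every file needs a block of its own.
        long manySmall = totalBlocks(10_000, oneMb, BLOCK_SIZE);
        // The same 10,000 MB stored as a single file.
        long oneLarge = totalBlocks(1, 10_000 * oneMb, BLOCK_SIZE);
        System.out.println("small files: " + manySmall + " blocks");
        System.out.println("one large file: " + oneLarge + " blocks");
    }
}
```

Under these assumptions the small files need 10,000 blocks while the single large file needs 79, which is why a pile of small files puts far more bookkeeping on the cluster than the same data in a few large files.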

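Keeping a small blog updated is tiring… I'm starting with the simple things. As I improve and dig a little deeper, I'll post more in-depth content.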