HDFS API development in an Eclipse environment


Read files using IOUtils

1. File preparation

Prepare a README.txt file with custom content and upload it to HDFS.
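
For example, assuming HDFS is already running and README.txt sits in the current directory, the upload could look like this (the target path / matches the URI used in the code later; the file content is arbitrary):

    # create a file with some custom content (any text works)
    echo "hello HDFS, this is my README" > README.txt
    # upload it to the HDFS root directory
    hdfs dfs -put README.txt /
    # confirm the upload
    hdfs dfs -ls /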

2. Download and install Eclipse

  • The first option is the installer package: after downloading, locate the download directory, run eclipse-inst to install, and select the Java IDE.
  • The second option is to download the packaged archive directly. Eclipse supports development in multiple languages; we choose the Java version.
  • With either option, the download link may need a few clicks and can be slow at times.

3. Open Eclipse, create a new Java project, and add the required Hadoop JARs

Create a new Java project; any of the usual entry points (for example, File > New > Java Project) will work.

For the project JRE, choose the Java version installed on the Linux system instead of the one bundled with Eclipse, to avoid problems later (discussed at the end of this article).

Create a package in the project (for example, org.chenqi.hadoop.hdfs.fs, matching the package declaration in the code below).

Add the Hadoop JARs we want to use to the project's build path:

  • The JARs are under the corresponding Hadoop installation directory: /home/chenqi/hadoop-3.3.6/share/hadoop

  • Import the JARs directly under the common subdirectory together with the JARs under its lib subdirectory.

  • The hdfs, mapreduce, and yarn subdirectories hold the related JARs for those modules and are imported in the same way.
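
As a quick check before importing, you can list these directories in a terminal to see which JARs are available (the paths follow the installation directory above):

    # JARs of the common module plus its dependencies under lib
    ls /home/chenqi/hadoop-3.3.6/share/hadoop/common/*.jar
    ls /home/chenqi/hadoop-3.3.6/share/hadoop/common/lib/*.jar
    # the hdfs, mapreduce, and yarn modules follow the same layout
    ls /home/chenqi/hadoop-3.3.6/share/hadoop/hdfs/*.jar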

4. Create a new class in the package for development

Code analysis:

package org.chenqi.hadoop.hdfs.fs;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileSystemCat {

    public static void main(String[] args) {
        // Path of the input file
        String uri = "hdfs://master:9000/README.txt";
        // Configuration object
        Configuration conf = new Configuration();
        // Declare the FileSystem and the FSDataInputStream input stream
        FileSystem fs = null;
        FSDataInputStream in = null;

        try {
            // Obtain the FileSystem object
            fs = FileSystem.get(conf);
            // Open an input stream on the file at uri
            in = fs.open(new Path(uri));
            // Copy the input stream to standard output with the IO utility class,
            // 4096 bytes at a time; false means the streams are not closed automatically
            IOUtils.copyBytes(in, System.out, 4096, false);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Do the cleanup in finally
            if (in != null) {
                // Close the input stream
                IOUtils.closeStream(in);
            }
            if (fs != null) {
                try {
                    // Close the file system
                    fs.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
  • Note: The URI in the code is the HDFS access path (fs.defaultFS) that you set yourself in the core-site.xml file when configuring Hadoop.

  • Call the static FileSystem.get method to obtain the FileSystem object fs (get is overloaded). The first form, used here, takes the file system URI from the configuration. The second form passes the URI explicitly (and additionally requires import java.net.URI):

    FileSystem fs = FileSystem.get(URI.create(uri), conf);
    
  • Calling the open method of fs returns an FSDataInputStream (open is also overloaded). The first form is used here:

    // First form:
    in = fs.open(new Path(uri));

    // Second form: additionally specifies the buffer size
    in = fs.open(new Path(uri), 4096);
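
  • For reference, the same logic can be written more compactly with Java 7 try-with-resources, which closes the stream and the file system automatically even if an exception occurs. This is just a sketch of an alternative (the class name FileSystemCat2 is arbitrary); the version above is the one we package below:

    package org.chenqi.hadoop.hdfs.fs;

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class FileSystemCat2 {

        public static void main(String[] args) throws IOException {
            // Same input file path as before
            String uri = "hdfs://master:9000/README.txt";
            Configuration conf = new Configuration();
            // try-with-resources closes in and fs in reverse declaration order
            try (FileSystem fs = FileSystem.get(conf);
                 FSDataInputStream in = fs.open(new Path(uri))) {
                // copy 4096 bytes at a time to standard output; false means
                // copyBytes itself does not close the streams
                IOUtils.copyBytes(in, System.out, 4096, false);
            }
        }
    }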
    

5. Export the project as a JAR file


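If you prefer the command line, the same JAR can also be built with the JDK's jar tool instead of the Eclipse export wizard (this assumes Eclipse compiled the classes into the project's bin directory, its default output folder):

    # package the compiled .class files under bin/ into FileSystemCat.jar
    jar cf FileSystemCat.jar -C bin .
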
6. Verify code correctness

Run the JAR with the hadoop command (org.chenqi.hadoop.hdfs.fs.FileSystemCat is the fully qualified name of the custom FileSystemCat class) and check the result: the program should print the contents of the README.txt file.

hadoop jar FileSystemCat.jar org.chenqi.hadoop.hdfs.fs.FileSystemCat

The output is correct: it is the file we uploaded to HDFS at the beginning.
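
As a cross-check, the same content can be printed with the HDFS shell and compared with the program's output:

    hdfs dfs -cat /README.txt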

Other issues:

Exception in thread "main" java.lang.UnsupportedClassVersionError

Solution:

The default JDK version Eclipse used for the project is 1.7, while the JDK installed on our Linux system is 1.8; this version mismatch causes the error. Change the JDK version used by the Eclipse project:

  1. Modify Java Build Path

    Right-click the project, select "Properties", then "Java Build Path" -> "Libraries". Select "JRE System Library", click "Edit", choose the JDK under "Alternate JRE" or "Workspace default JRE" (normally they are the same), and click "Finish".

    Since we already chose the system JDK when creating the project, there is usually no problem at this step.

  2. Modify Java Compiler

Select "Java Compiler", check "Enable project specific settings", and set "Compiler compliance level" to a version consistent with jvm (1.8).
