Beginner Hadoop: using the HDFS Java API and installing Eclipse on Linux

Applications interact with the various Hadoop file systems by calling the Java API; the shell commands introduced in Experiment 1 are essentially applications of this Java API.

Hadoop provides official API documentation; you can visit the following website to look up the functions of each API: website link

1. Eclipse installation

To use the Java API, you need a development tool such as Eclipse to write Java programs. Here, Eclipse is installed on Ubuntu.

(1) Open the Software Center in the sidebar of Ubuntu

(2) Enter "ec" in the search bar of the Software Center; it will automatically list matching software. Click the Eclipse entry in the results to install it.

(3) Click the search tool in the Ubuntu sidebar, enter "eclipse", find the newly installed software, and open Eclipse.
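
If the Software Center is unavailable, Eclipse can likely also be installed from a terminal (an alternative sketch; it assumes an Ubuntu release that still ships an eclipse package in its repositories):

sudo apt-get install eclipse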

2. Create a project in Eclipse

(1) When you open Eclipse for the first time, you are asked to choose a workspace, the location where your programs will be saved. You can keep the default without changing it.

Click the "OK" button to enter Eclipse; the main interface appears after a successful startup.

(2) Create a Java project

Select the "File->New->Java Project" menu to start creating a Java project; a project-creation dialog will pop up.

Enter the project name "HDFSExample" after "Project name" and select "Use default location" to save all the files of this Java project to the "/home/hadoop/workspace/HDFSExample" directory. In the "JRE" tab, you can select the JDK that has been installed in the current Linux system, such as java-8-openjdk-amd64. Then, click the "Next>" button at the bottom of the interface to enter the next setting.

3. Add the required JAR packages to the project


In this dialog you load the JAR packages the Java project needs; these packages contain the Java APIs for accessing HDFS. They are located under the Hadoop installation directory of the Linux system, in "/usr/local/hadoop/share/hadoop". Click the "Libraries" tab, and then click the "Add External JARs..." button on the right side of the dialog.


The top row of the file chooser shows directory buttons (i.e. "usr", "local", "hadoop", "share", "hadoop", "mapreduce" and "lib"); clicking a directory button lists that directory's contents below.

To write a Java application that interacts with HDFS, you generally need to add the following JAR packages to the Java project:

(1) hadoop-common-2.7.1.jar and hadoop-nfs-2.7.1.jar in the "/usr/local/hadoop/share/hadoop/common" directory;

(2) All JAR packages in the "/usr/local/hadoop/share/hadoop/common/lib" directory;

(3) hadoop-hdfs-2.7.1.jar and hadoop-hdfs-nfs-2.7.1.jar in the "/usr/local/hadoop/share/hadoop/hdfs" directory;

(4) All JAR packages in the "/usr/local/hadoop/share/hadoop/hdfs/lib" directory.

For example, to add hadoop-common-2.7.1.jar and hadoop-nfs-2.7.1.jar from the "/usr/local/hadoop/share/hadoop/common" directory to the current Java project, click the directory buttons to navigate into the common directory; the dialog then displays everything under it.


In the dialog, select hadoop-common-2.7.1.jar and hadoop-nfs-2.7.1.jar with the mouse, then click the "OK" button in the lower right corner to add the two JAR packages to the current Java project.

You can see that hadoop-common-2.7.1.jar and hadoop-nfs-2.7.1.jar have been added to the current Java project. Following the same procedure, click the "Add External JARs..." button again to add the remaining JAR packages. Note that when you need to select all the JAR packages in a directory, you can press "Ctrl+A" to select them all at once. After all the JAR packages have been added, click the "Finish" button in the lower right corner to complete the creation of the Java project HDFSExample.
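
If you would rather not add the JAR packages through the Eclipse dialog, the hadoop classpath command prints the same set of library paths, so you can also compile and run from the command line. A sketch, assuming Hadoop is installed in /usr/local/hadoop and HDFSFileIfExist.java (written below) is in the current directory:

javac -cp "$(/usr/local/hadoop/bin/hadoop classpath)" HDFSFileIfExist.java
java -cp ".:$(/usr/local/hadoop/bin/hadoop classpath)" HDFSFileIfExist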

4. Write the Java application code

First, write a Java application that checks whether a given file exists in HDFS. In the "Package Explorer" panel on the left side of the Eclipse window, find the newly created project "HDFSExample", right-click the project name, and select the "New->Class" menu item in the pop-up menu.

In this dialog, you only need to enter the name of the new Java class file after "Name"; here the name "HDFSFileIfExist" is used, and the defaults can be kept for everything else. Then click the "Finish" button in the lower right corner of the dialog.

Eclipse automatically creates a source code file named "HDFSFileIfExist.java". Enter the following code in that file:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HDFSFileIfExist {
    public static void main(String[] args) {
        try {
            String fileName = "test";
            Configuration conf = new Configuration();
            // Address of the HDFS NameNode and the file system implementation to use
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
            FileSystem fs = FileSystem.get(conf);
            if (fs.exists(new Path(fileName))) {
                System.out.println("File exists");
            } else {
                System.out.println("File does not exist");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

This program tests whether a given file exists in HDFS. The line String fileName = "test" specifies that the file to be tested is named "test". Because no full path is given, a relative path is used: the program actually tests whether the file exists in the HDFS directory of the currently logged-in Linux user hadoop, i.e., whether the file "test" exists in the "/user/hadoop/" directory in HDFS.
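
To check a file anywhere in HDFS rather than in the current user's directory, you can use an absolute path instead. A minimal sketch (the class name and the path /user/hadoop/test are illustrative; it also shows passing the NameNode URI directly to FileSystem.get as an alternative to setting fs.defaultFS):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HDFSAbsolutePathCheck { // hypothetical example class
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Passing the NameNode URI here replaces conf.set("fs.defaultFS", ...)
        FileSystem fs = FileSystem.get(new URI("hdfs://localhost:9000"), conf);
        // An absolute path is resolved from the HDFS root, not from /user/<user>
        Path path = new Path("/user/hadoop/test");
        System.out.println(path + (fs.exists(path) ? " exists" : " does not exist"));
        fs.close();
    }
}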

5. Compile and run the program

Before compiling and running the program, make sure that Hadoop is up and running. If it is not, open a Linux terminal and enter the following commands to start it:

cd /usr/local/hadoop
./sbin/start-dfs.sh
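
To confirm that HDFS started successfully, you can run the JDK's jps command; NameNode, DataNode, and SecondaryNameNode should all appear in its output:

jps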

Then you can compile and run the code written in Section 4. Click the run shortcut button in the upper part of the Eclipse window, select "Run As" in the pop-up menu, and then select "Java Application".

A class-selection dialog then pops up.

In this dialog, enter "HDFSFileIfExist" in the text box under "Select type"; Eclipse automatically finds the corresponding class "HDFSFileIfExist - (default package)" (note: this class will be used again as the Launch configuration when exporting the JAR package later). Then click the "OK" button in the lower right corner to run the program. When the program finishes, its output appears in the "Console" panel at the bottom. Since there is no file named "test" in the "/user/hadoop" directory of HDFS yet, the program prints "File does not exist". The "Console" panel also shows some warning messages like "log4j:WARN...", which you can safely ignore.
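
If you want to silence the "log4j:WARN" messages instead of ignoring them, one possible approach (a sketch, assuming the log4j 1.x JAR shipped in Hadoop's lib directories is on the classpath; the class name is illustrative) is to configure a default console appender at the start of main:

import org.apache.log4j.BasicConfigurator;

public class QuietHDFSFileIfExist { // hypothetical name; the body mirrors HDFSFileIfExist
    public static void main(String[] args) {
        BasicConfigurator.configure(); // attach log4j's default console appender, silencing the warning
        // ... the rest of the HDFSFileIfExist logic goes here
    }
}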

6. Package and deploy the application

This section shows how to generate a JAR package from the Java application and deploy it to the Hadoop platform to run. First, create a new directory named myapp under the Hadoop installation directory to store our own Hadoop applications, by executing the following commands in a Linux terminal:

cd /usr/local/hadoop
mkdir myapp

Then, in the "Package Explorer" panel on the left side of the Eclipse window, right-click the project name "HDFSExample" and select "Export" in the pop-up menu.

The export wizard then pops up.
In the wizard, select "Runnable JAR file" and click the "Next>" button.
On the next page, "Launch configuration" sets the main class that will run when the generated JAR package is deployed and started; select the class "HDFSFileIfExist - HDFSExample" just configured from the drop-down list. "Export destination" sets where the JAR package is saved; here, set it to "/usr/local/hadoop/myapp/HDFSExample.jar". Under "Library handling", select "Extract required libraries into generated JAR". Then click the "Finish" button.

An information dialog appears; you can ignore it and click the "OK" button to start the packaging process. When packaging finishes, a dialog with warning messages appears.

You can ignore these warnings as well and click "OK". At this point, the HDFSExample project has been packaged into HDFSExample.jar. You can check the generated file from a Linux terminal:

cd /usr/local/hadoop/myapp
ls

As you can see, the "/usr/local/hadoop/myapp" directory now contains HDFSExample.jar. You can run the program with the hadoop jar command:

cd /usr/local/hadoop
./bin/hadoop jar ./myapp/HDFSExample.jar

Or you can use the following command to run the program:

cd /usr/local/hadoop
java -jar ./myapp/HDFSExample.jar

After the command executes, "File does not exist" is printed on the screen.
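
The file simply does not exist yet. If you first create it, for example with the HDFS shell command below, rerunning the program should print "File exists" instead:

cd /usr/local/hadoop
./bin/hdfs dfs -touchz test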

At this point, the program for detecting the existence of HDFS files is successfully deployed.

In the "Package Explorer" panel on the left side of the Eclipse work interface, find the project name "HDFSExample" just created, then right-click on the project name, and select the "New->Class" menu in the pop-up menu.
Insert picture description here

In this dialog, enter the name of the new Java class file after "Name"; here the name "HDFSFileIfWrite" is used, keeping the defaults for everything else. Then click the "Finish" button in the lower right corner of the dialog.

Eclipse automatically creates a source code file named "HDFSFileIfWrite.java". Enter the following code in that file:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;

public class HDFSFileIfWrite {
    public static void main(String[] args) {
        try {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
            FileSystem fs = FileSystem.get(conf);
            byte[] buff = "Hello world".getBytes(); // content to write
            String filename = "test";               // name of the file to write
            FSDataOutputStream os = fs.create(new Path(filename));
            os.write(buff, 0, buff.length);
            System.out.println("Create:" + filename);
            os.close();
            fs.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
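
Note that fs.create(new Path(filename)) silently overwrites the file if it already exists. FileSystem.create also has an overload with an explicit overwrite flag; a sketch of the alternative call (it would replace the fs.create line above):

// With overwrite set to false, create() throws an exception if the file
// already exists, instead of silently replacing it
FSDataOutputStream os = fs.create(new Path(filename), false);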

Compile and run the program

As before, make sure Hadoop is running before compiling and executing the program; if it is not, start it from a Linux terminal:

cd /usr/local/hadoop
./sbin/start-dfs.sh

Then compile and run the code above: click the run shortcut button in the upper part of the Eclipse window, select "Run As" in the pop-up menu, and then select "Java Application".

The class-selection dialog pops up again.

In this dialog, enter "HDFSFileIfWrite" in the text box under "Select type"; Eclipse automatically finds the corresponding class "HDFSFileIfWrite - (default package)" (note: this class will be used again as the Launch configuration when exporting the JAR package later). Then click the "OK" button in the lower right corner to run the program. When the program finishes, its output appears in the "Console" panel at the bottom; a successful run prints "Create:test".

Package and deploy the application

Now generate a JAR package from this application and deploy it to the Hadoop platform to run. The myapp directory created earlier can be reused; if it does not already exist, create it from a Linux terminal:

cd /usr/local/hadoop
mkdir myapp


Then, in the "Package Explorer" panel, right-click the project name "HDFSExample" and select "Export" in the pop-up menu.

The export wizard pops up.
In the wizard, select "Runnable JAR file" and click the "Next>" button.

On the next page, "Launch configuration" sets the main class that will run when the generated JAR package is deployed and started; select the class "HDFSFileIfWrite - HDFSExample" just configured from the drop-down list. In "Export destination", set the directory where the JAR package is saved, for example "/usr/local/hadoop/myapp/HDFSExample1.jar". Under "Library handling", select "Extract required libraries into generated JAR". Then click the "Finish" button.

An information dialog appears; ignore it and click the "OK" button to start the packaging process. When packaging finishes, a dialog with warning messages appears.

Ignore these warnings as well and click "OK". The HDFSExample project has now been packaged into HDFSExample1.jar. Check the generated file from a Linux terminal:

cd /usr/local/hadoop/myapp
ls


As you can see, the "/usr/local/hadoop/myapp" directory now also contains HDFSExample1.jar. Run the program with the hadoop jar command:

cd /usr/local/hadoop
./bin/hadoop jar ./myapp/HDFSExample1.jar

Or you can use the following command to run the program:

cd /usr/local/hadoop
java -jar ./myapp/HDFSExample1.jar

After the command executes, "Create:test" is printed on the screen.
At this point, the program that writes a file to HDFS has been successfully deployed.
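
To double-check the write from outside the program, you can print the file's contents with the HDFS shell; it should show the "Hello world" string written above:

cd /usr/local/hadoop
./bin/hdfs dfs -cat test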

Read the file
Create a new Java class file, name it "HDFSFileIfRead", and enter the following code:



import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FSDataInputStream;

public class HDFSFileIfRead {
    public static void main(String[] args) {
        try {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("test");
            FSDataInputStream getIt = fs.open(file);
            BufferedReader d = new BufferedReader(new InputStreamReader(getIt));
            String content = d.readLine(); // read one line of the file
            System.out.println(content);
            d.close();  // close the file
            fs.close(); // close HDFS
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
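
This program reads only the first line of the file. If the file may span several lines, a small variation of the read logic (a sketch; it replaces the single readLine() call above) reads until the end of the stream:

String content;
while ((content = d.readLine()) != null) { // readLine() returns null at end of stream
    System.out.println(content);
}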


Compile and run it in Eclipse as before; since the test file was written with "Hello world", that is the line the program prints.

Package this Java program into a JAR file following the same export steps as above, and deploy it to the Hadoop platform.

Enter "/usr/local/hadoop/myapp" to see if there is a corresponding jar package:

Insert picture description here

Run the JAR package with the hadoop jar command in the terminal, as was done for the earlier examples.
This article is based on Professor Lin Ziyu's experimental tutorial, which I followed, practiced, and wrote up while learning Hadoop.

Origin: blog.csdn.net/qq_45154565/article/details/109181753