The different file systems of Hadoop interact through calls to the Java API; the shell commands introduced in Experiment 1 are essentially applications of this Java API. You can consult the official Hadoop API documentation to view the functions of each API: website link
1. Eclipse installation
To use the Java API, you need software for writing Java programs; here, Eclipse is installed on Ubuntu.
(1) Open the Software Center in the sidebar of Ubuntu
(2) Enter "ec" in the search bar of the software center, and the software center will automatically search for related software
Click on the eclipse software in the picture to install it.
(3) Click the search tool in the sidebar under Ubuntu, enter "eclipse", search for the installed related software, and open Eclipse.
2. Create a project in Eclipse
(1) When you open Eclipse for the first time, you need to specify a workspace, i.e., the location where programs are saved. The default can be kept here without changes.
Click the "OK" button to enter Eclipse.
The interface after successful startup.
(2) Create a Java project
Select the "File->New->Java Project" menu to start creating a Java project, and an interface as shown in the figure below will pop up.
Enter the project name "HDFSExample" after "Project name" and check "Use default location" so that all files of this Java project are saved in the "/home/hadoop/workspace/HDFSExample" directory. In the "JRE" tab, you can select the JDK already installed in the current Linux system, such as java-8-openjdk-amd64. Then click the "Next>" button at the bottom of the interface to proceed to the next setting.
3. Add the required JAR packages to the project
In this interface, you need to load the JAR packages required by the Java project; these JAR packages contain the Java APIs for accessing HDFS. They are located in the Hadoop installation directory of the Linux system, in the "/usr/local/hadoop/share/hadoop" directory. Click the "Libraries" tab in the interface, and then click the "Add External JARs..." button on the right side of the interface.
The top row of directory buttons (i.e. "usr", "local", "hadoop", "share", "hadoop", "mapreduce" and "lib") navigates the file system; when a directory button is clicked, the contents of that directory are listed below.
In order to write a Java application that can interact with HDFS, you generally need to add the following JAR package to the Java project:
(1) hadoop-common-2.7.1.jar and hadoop-nfs-2.7.1.jar under the "/usr/local/hadoop/share/hadoop/common" directory;
(2) all JAR packages in the "/usr/local/hadoop/share/hadoop/common/lib" directory;
(3) hadoop-hdfs-2.7.1.jar and hadoop-hdfs-nfs-2.7.1.jar in the "/usr/local/hadoop/share/hadoop/hdfs" directory;
(4) All JAR packages in the "/usr/local/hadoop/share/hadoop/hdfs/lib" directory.
For example, to add hadoop-common-2.7.1.jar and hadoop-nfs-2.7.1.jar in the "/usr/local/hadoop/share/hadoop/common" directory to the current Java project, click the directory buttons in the interface to enter the common directory; the interface then displays all the contents of the common directory, as shown below:
In the interface, click to select hadoop-common-2.7.1.jar and hadoop-nfs-2.7.1.jar, and then click the "OK" button in the lower right corner to add these two JAR packages to the current Java project. The resulting interface is as follows:
It can be seen that hadoop-common-2.7.1.jar and hadoop-nfs-2.7.1.jar have been added to the current Java project. Following the same steps, you can click the "Add External JARs..." button again to add the remaining JAR packages. Note that when you need to select all the JAR packages in a directory, you can use the "Ctrl+A" key combination to select them all. After all additions are complete, click the "Finish" button in the lower right corner to complete the creation of the Java project HDFSExample.
4. Write Java application code
Write a Java application to detect whether a file exists in HDFS. In the "Package Explorer" panel on the left side of the Eclipse interface, find the newly created project "HDFSExample", right-click on the project name, and select the "New->Class" menu in the pop-up menu.
In this interface, you only need to enter the name of the new Java class file after "Name"; here the name "HDFSFileIfExist" is used, and the defaults can be kept for everything else. Then click the "Finish" button in the lower right corner of the interface, and the interface shown in the figure below appears:
It can be seen that Eclipse automatically created a source code file named "HDFSFileIfExist.java". Enter the following code in this file:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HDFSFileIfExist {
    public static void main(String[] args) {
        try {
            String fileName = "test";  // relative path, resolved against the HDFS user directory
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
            FileSystem fs = FileSystem.get(conf);
            if (fs.exists(new Path(fileName))) {
                System.out.println("File exists");
            } else {
                System.out.println("File does not exist");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
This program tests whether a file exists in HDFS. The line
String fileName = "test"
indicates that the file to be tested is named "test". Since a full path is not given, a relative path is used: the program actually tests whether the file test exists in the HDFS user directory corresponding to the currently logged-in Linux user hadoop, i.e., whether the test file exists in the "/user/hadoop/" directory in HDFS.
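This resolution rule can be sketched in plain Java (a simplified illustration only; the helper `resolve` is hypothetical and does not use the Hadoop `Path` classes):

```java
public class PathResolutionSketch {
    // Hypothetical helper: mimics how HDFS resolves a path against /user/<username>
    public static String resolve(String path, String user) {
        if (path.startsWith("/")) {
            return path;                      // absolute path: used as-is
        }
        return "/user/" + user + "/" + path;  // relative path: placed under the user's home
    }

    public static void main(String[] args) {
        System.out.println(resolve("test", "hadoop"));      // /user/hadoop/test
        System.out.println(resolve("/tmp/test", "hadoop")); // /tmp/test
    }
}
```

So passing "test" and passing "/user/hadoop/test" to fs.exists() refer to the same file when running as the hadoop user.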
5. Compile and run the program
Before starting to compile and run the program, make sure that Hadoop has been started and running. If it has not been started, you need to open a Linux terminal and enter the following command to start Hadoop:
cd /usr/local/hadoop
./sbin/start-dfs.sh
Then you can compile and run the code written in section 4 above. Click the shortcut button for running a program in the upper part of the Eclipse interface; in the pop-up menu select "Run As", and then select "Java Application", as shown below.
Then, the interface shown below will pop up:
In this interface, enter "HDFSFileIfExist" in the text box under "Select type", and Eclipse will automatically find the corresponding class "HDFSFileIfExist-(default package)" (note: this class will be used later in the "Launch configuration" of the export-JAR operation). Then click the "OK" button at the bottom right corner of the interface to start running the program. After the program runs, the results are displayed in the "Console" panel at the bottom. Since there is no test file in the "/user/hadoop" directory of HDFS, the program prints "File does not exist". Some warning messages similar to "log4j:WARN..." may also appear in the "Console" panel; they can be ignored.
6. Application deployment
This section shows how to generate a JAR package from the Java application and deploy it to run on the Hadoop platform. First, create a new directory named myapp under the Hadoop installation directory to store our own Hadoop applications. Execute the following commands in the Linux terminal:
cd /usr/local/hadoop
mkdir myapp
Then, in the "Package Explorer" panel on the left side of the Eclipse work interface, click the right mouse button on the project name "HDFSExample" and select "Export" in the pop-up menu, as shown in the following figure.
Then, the interface shown below will pop up:
In this interface, select "Runnable JAR file" and click the "Next>" button; an interface as shown in the figure below pops up.
In this interface, "Launch configuration" sets the main class to run when the generated JAR package is deployed and started; select the class "HDFSFileIfExist-HDFSExample" just configured from the drop-down list. In "Export destination", set the path where the JAR package should be saved, for example "/usr/local/hadoop/myapp/HDFSExample.jar". Under "Library handling", select "Extract required libraries into generated JAR". Then click the "Finish" button, and the interface shown in the figure below appears:
You can ignore the information on this interface and directly click the "OK" button in the lower right corner of the interface to start the packaging process. After the packaging process is over, a warning message interface will appear, as shown in the following figure:
You can ignore the information on this interface and directly click the "OK" button in the lower right corner of the interface. So far, the HDFSExample project has been successfully packaged and generated HDFSExample.jar. You can check the generated HDFSExample.jar file in the Linux system. You can execute the following commands in the Linux terminal:
cd /usr/local/hadoop/myapp
ls
As you can see, there is already a HDFSExample.jar file in the "/usr/local/hadoop/myapp" directory. Now, you can use the hadoop jar command to run the program in the Linux system, the command is as follows:
cd /usr/local/hadoop
hadoop jar ./myapp/HDFSExample.jar
Or, because the required libraries were extracted into the generated JAR package, you can also run the program directly with the java command:
cd /usr/local/hadoop
java -jar ./myapp/HDFSExample.jar
After the command is executed, the execution result "File does not exist" will be displayed on the screen.
At this point, the program for detecting the existence of HDFS files is successfully deployed.
Write the file
In the "Package Explorer" panel on the left side of the Eclipse interface, find the project "HDFSExample" created earlier, right-click on the project name, and select the "New->Class" menu in the pop-up menu.
In this interface, you only need to enter the name of the new Java class file after "Name"; here the name "HDFSFileIfWrite" is used, and the defaults can be kept for everything else. Then click the "Finish" button in the lower right corner of the interface, and the interface shown in the figure below appears:
It can be seen that Eclipse automatically created a source code file named "HDFSFileIfWrite.java". Enter the following code in this file:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;

public class HDFSFileIfWrite {
    public static void main(String[] args) {
        try {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
            FileSystem fs = FileSystem.get(conf);
            byte[] buff = "Hello world".getBytes(); // content to write
            String filename = "test";               // name of the file to write
            FSDataOutputStream os = fs.create(new Path(filename));
            os.write(buff, 0, buff.length);
            System.out.println("Create:" + filename);
            os.close();
            fs.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
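The create/write/close sequence above follows the ordinary Java stream pattern. For comparison, here is the same sequence against the local file system using only java.io (a local analogy for illustration; the class name and file name are made up, and this does not touch HDFS):

```java
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class LocalWriteSketch {
    public static void main(String[] args) throws Exception {
        byte[] buff = "Hello world".getBytes();   // content to write
        String filename = "test-local.txt";       // a local file, not an HDFS path
        try (OutputStream os = new FileOutputStream(filename)) {
            os.write(buff, 0, buff.length);       // same write(buff, off, len) call shape
        }
        System.out.println("Create:" + filename);
        // read the file back to verify what was written
        System.out.println(new String(Files.readAllBytes(Paths.get(filename))));
    }
}
```

The HDFS version differs mainly in how the stream is obtained: fs.create(new Path(filename)) returns an FSDataOutputStream backed by the distributed file system instead of a local FileOutputStream.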
Compile and run the program
Before starting to compile and run the program, make sure that Hadoop has been started and running. If it has not been started, you need to open a Linux terminal and enter the following command to start Hadoop:
cd /usr/local/hadoop
./sbin/start-dfs.sh
Then you can compile and run the code written above. Click the shortcut button for running a program in the upper part of the Eclipse interface; in the pop-up menu select "Run As", and then select "Java Application", as shown below.
Then, the interface shown below will pop up:
In this interface, enter "HDFSFileIfWrite" in the text box under "Select type", and Eclipse will automatically find the corresponding class "HDFSFileIfWrite-(default package)" (note: this class will be used later in the "Launch configuration" of the export-JAR operation). Then click the "OK" button at the bottom right corner of the interface to start running the program. After the program runs, the results are displayed in the "Console" panel at the bottom (as shown in the figure below):
Application deployment
Again, generate a JAR package from the Java application and deploy it to run on the Hadoop platform. The myapp directory under the Hadoop installation directory was already created earlier; if it does not yet exist, execute the following commands in the Linux terminal:
cd /usr/local/hadoop
mkdir myapp
Then, in the "Package Explorer" panel on the left side of the Eclipse work interface, click the right mouse button on the project name "HDFSExample" and select "Export" in the pop-up menu, as shown in the following figure.
Then, the interface shown below will pop up:
In this interface, select "Runnable JAR file" and click the "Next>" button; an interface as shown in the figure below pops up.
In this interface, "Launch configuration" sets the main class to run when the generated JAR package is deployed and started; select the class "HDFSFileIfWrite-HDFSExample" just configured from the drop-down list. In "Export destination", set the path where the JAR package should be saved, for example "/usr/local/hadoop/myapp/HDFSExample1.jar". Under "Library handling", select "Extract required libraries into generated JAR". Then click the "Finish" button, and the interface shown in the figure below appears:
You can ignore the information on this interface and directly click the "OK" button in the lower right corner of the interface to start the packaging process. After the packaging process is over, a warning message interface will appear, as shown in the following figure:
You can ignore the information on this interface and directly click the "OK" button in the lower right corner of the interface. So far, the HDFSExample project has been successfully packaged and generated HDFSExample1.jar. You can check the generated HDFSExample1.jar file in the Linux system. You can execute the following commands in the Linux terminal:
cd /usr/local/hadoop/myapp
ls
As you can see, there is already a HDFSExample1.jar file in the "/usr/local/hadoop/myapp" directory. Now, you can use the hadoop jar command to run the program in the Linux system, the command is as follows:
cd /usr/local/hadoop
hadoop jar ./myapp/HDFSExample1.jar
Or you can use the following command to run the program:
cd /usr/local/hadoop
java -jar ./myapp/HDFSExample1.jar
After the command is executed, the result "Create:test" will be displayed on the screen.
So far, the program for writing HDFS files has been successfully deployed.
Read the file
Create a new Java class file, name it "HDFSFileIfRead" and enter the following code:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FSDataInputStream;

public class HDFSFileIfRead {
    public static void main(String[] args) {
        try {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000");
            conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("test");
            FSDataInputStream getIt = fs.open(file);
            BufferedReader d = new BufferedReader(new InputStreamReader(getIt));
            String content = d.readLine(); // read one line of the file
            System.out.println(content);
            d.close();  // close the file
            fs.close(); // close the HDFS handle
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
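Note that readLine() returns only the first line of the file; a loop would be needed to read the rest. The open/wrap/read pattern mirrors ordinary Java I/O; for comparison, a purely local sketch (class name and file name are made up; no HDFS involved):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;

public class LocalReadSketch {
    public static void main(String[] args) throws Exception {
        // Prepare a local two-line sample file standing in for the HDFS file "test"
        try (FileWriter w = new FileWriter("test-local.txt")) {
            w.write("Hello world\nsecond line");
        }
        // Same wrapping as in the HDFS program: raw stream -> reader -> BufferedReader
        try (BufferedReader d = new BufferedReader(new FileReader("test-local.txt"))) {
            String content = d.readLine(); // returns only the first line: "Hello world"
            System.out.println(content);
        }
    }
}
```

In the HDFS version, fs.open(file) supplies the raw FSDataInputStream that takes the place of the local FileReader here.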
Compile and run in Eclipse:
Package this Java program into a JAR package and deploy it to run on the Hadoop platform.
Enter "/usr/local/hadoop/myapp" to check whether the corresponding JAR package exists:
Run the JAR package with the hadoop command in the terminal:
This article is mainly based on teacher Lin Ziyu's experimental tutorial, which I followed while learning Hadoop, compiled myself, and verified in practice.