Hadoop from Entry to Master Series -- 5. HDFS API

Table of Contents

1. Client environment preparation

1.1 Configure environment variables

1.2 Eclipse / IDEA preparation

2. Specific HDFS API operations

2.1 Create an HDFS client object and test creating a folder

2.2 Test file upload

2.3 Test file download

2.4 Test deleting a folder

2.5 Test viewing file details

2.6 Determine whether a path is a folder or a file


1. Client environment preparation

The previous post covered HDFS shell operations. A quick review: you run bin/hadoop fs -command or bin/hdfs dfs -command, and the commands are essentially the same as the corresponding Linux commands. There is another way to work with HDFS, the client approach, which means operating the cluster from code. The idea behind the client approach is also very simple: obtain a client object, then operate the HDFS cluster through that object and the methods it encapsulates. If that is clear, keep reading.

1.1 Configure environment variables

Configure the Hadoop environment variables on your local machine.

It is almost the same as configuring the JDK environment variables: first set HADOOP_HOME, then add the Hadoop bin directory to Path. Once that is done, run hadoop version in a cmd window; if the Hadoop version information is printed, the configuration succeeded.
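For reference, the two variables usually look something like this (the install path below is only an example, not from the original post; use the directory where you unpacked Hadoop):

HADOOP_HOME = D:\hadoop-2.7.2
Path        = %Path%;%HADOOP_HOME%\bin;%HADOOP_HOME%\sbin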

1.2 Eclipse / IDEA preparation

Eclipse and IDEA are actually quite similar, but many people find IDEA more comfortable to use, mostly people who are already working: IDEA's code completion and debugging really are better than Eclipse's. On the other hand, IDEA consumes more resources than Eclipse. If your computer has less than 8 GB of memory, IDEA is really not recommended, and IDEA also has to be cracked; cracking it is not impossible, and I am someone who likes to crack software, but cracked software is indeed less stable. For everyday learning Eclipse is enough, although IDEA is of course better.

  • Create a new Maven project in Eclipse and add the following dependencies to the project's pom.xml file. Maven itself needs no further introduction.
<dependencies>
		<dependency>
			<groupId>junit</groupId>
			<artifactId>junit</artifactId>
			<version>RELEASE</version>
		</dependency>
		<dependency>
			<groupId>org.apache.logging.log4j</groupId>
			<artifactId>log4j-core</artifactId>
			<version>2.8.2</version>
		</dependency>
		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-common</artifactId>
			<version>2.7.2</version>
		</dependency>
		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-client</artifactId>
			<version>2.7.2</version>
		</dependency>
		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-hdfs</artifactId>
			<version>2.7.2</version>
		</dependency>
		<dependency>
			<groupId>jdk.tools</groupId>
			<artifactId>jdk.tools</artifactId>
			<version>1.8</version>
			<scope>system</scope>
			<systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
		</dependency>
</dependencies>

The dependencies above should look familiar: JUnit for testing, log4j for logging, and the Hadoop artifacts. One hint: if an import error is reported, it is because Maven's default JDK level is 1.5; the last dependency above is there to point the project at JDK 1.8.
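If the project still shows compile errors caused by an old language level, another common fix (a standard Maven setting, not something from the original post) is to pin the compiler level explicitly in pom.xml:

<build>
	<plugins>
		<plugin>
			<groupId>org.apache.maven.plugins</groupId>
			<artifactId>maven-compiler-plugin</artifactId>
			<version>3.8.1</version>
			<configuration>
				<source>1.8</source>
				<target>1.8</target>
			</configuration>
		</plugin>
	</plugins>
</build>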

  • One more issue: if the following warnings are printed on the Eclipse / IDEA console, log4j has not been configured yet. Create a new file named log4j.properties in the src/main/resources directory and give it the content shown after the warnings:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

The content of log4j.properties:
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n

With this in place, Eclipse / IDEA can print the logs correctly.

2. Specific HDFS API operations

Before starting the concrete operations, let's first sort out the approach. What operations do we perform on HDFS? Essentially viewing, uploading, downloading, deleting and so on. How do we do them? First there must be a client object; second, that object must provide encapsulated methods; then calling object.method() completes the operation on the cluster. That is the whole flow. Once the approach is sorted out, the programming is easy.

2.1 Create an HDFS client object and test creating a folder

HDFS is a file system, so the cluster is operated through a file system object. To explain the code: the object is created mainly by the FileSystem.get() method, which here takes three arguments: 1. the HDFS cluster address, new URI("hdfs://hadoop102:9000"); 2. a Configuration object; 3. the user name. Fill these in according to your own cluster. With these arguments, FileSystem.get() creates the file system object fs. Calling fs.mkdirs() creates a directory on the cluster, and finally the object is closed.

Summary: remember that operating the HDFS cluster always takes just three steps: 1. get the object (FileSystem.get); 2. operate through the object's methods; 3. close the resource.

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

public class HdfsClient {

	@Test
	public void testMkdirs() throws IOException, InterruptedException, URISyntaxException {

		// 1 Get the file system object
		Configuration configuration = new Configuration();
		FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "wanglei");
		// 2 Create the directory
		fs.mkdirs(new Path("/1108/daxian/banzhang"));
		// 3 Close the resource
		fs.close();
	}
}
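As a side note, FileSystem implements java.io.Closeable, so the get / use / close pattern can also be written with try-with-resources. A minimal sketch of the same mkdirs call (an equivalent variant, not code from the original post):

@Test
public void testMkdirsTryWithResources() throws IOException, InterruptedException, URISyntaxException {

	// Illustrative variant, not from the original post: the file system object
	// is closed automatically when the try block exits.
	try (FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), new Configuration(), "wanglei")) {
		fs.mkdirs(new Path("/1108/daxian/banzhang"));
	}
}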

2.2 Test file upload

Follow the same three steps as above; only the specific operation method needs to be considered. The upload method is copyFromLocalFile, which mirrors the shell operation. The extra line configuration.set("dfs.replication", "2") sets the replication factor on the client side for this upload; more on this right after the code.

@Test
public void testCopyFromLocalFile() throws IOException, InterruptedException, URISyntaxException {

	// 1 Get the file system object
	Configuration configuration = new Configuration();
	configuration.set("dfs.replication", "2");
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "wanglei");
	// 2 Upload the file
	fs.copyFromLocalFile(new Path("e:/banzhang.txt"), new Path("/banzhang.txt"));
	// 3 Close the resource
	fs.close();
}
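About the configuration.set("dfs.replication", "2") line above: for a client-side setting like this, a value set in code takes effect over a configuration file on the client classpath, which in turn takes effect over the defaults shipped inside the Hadoop jars. As an alternative (an illustration, not part of the original post), the same setting could be placed in an hdfs-site.xml file under src/main/resources:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
	<property>
		<name>dfs.replication</name>
		<value>2</value>
	</property>
</configuration>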

2.3 Test file download

@Test
public void testCopyToLocalFile() throws IOException, InterruptedException, URISyntaxException {

	// 1 Get the file system object
	Configuration configuration = new Configuration();
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "wanglei");
	// 2 Perform the download
	// boolean delSrc: whether to delete the source file on HDFS
	// Path src: the HDFS path of the file to download
	// Path dst: the local path to download the file to
	// boolean useRawLocalFileSystem: whether to use the raw local file system
	//   (true means no local .crc checksum file is written)
	fs.copyToLocalFile(false, new Path("/banzhang.txt"), new Path("e:/banhua.txt"), true);
	// 3 Close the resource
	fs.close();
}
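A quick note on the last argument: with useRawLocalFileSystem set to true, as above, no local .crc checksum file is produced alongside the download. The simpler overload below (shown only as an illustration, reusing the same fs object) goes through the checksummed local file system and also writes a .crc file next to the downloaded file:

// Simpler overload: download with checksum verification and a local .crc file
fs.copyToLocalFile(new Path("/banzhang.txt"), new Path("e:/banhua.txt"));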

2.4 Test deleting a folder

@Test
public void testDelete() throws IOException, InterruptedException, URISyntaxException {

	// 1 Get the file system object
	Configuration configuration = new Configuration();
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "wanglei");
	// 2 Perform the deletion (the second argument means recursive)
	fs.delete(new Path("/0508/"), true);
	// 3 Close the resource
	fs.close();
}
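The second argument of fs.delete() means recursive deletion: it must be true to remove a non-empty directory. For a single file, false is enough, and the method returns whether the deletion succeeded. A small sketch (an illustration only, reusing the same fs object and the file uploaded earlier):

// recursive=false is sufficient when the path is a single file
boolean deleted = fs.delete(new Path("/banzhang.txt"), false);
System.out.println("deleted: " + deleted);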

2.5 Test viewing file details

Viewing file details still needs a short explanation. The method used is listFiles, which returns an iterator; traverse the iterator to read each file's length, name, permissions and other information. To also view the stored block information, call getBlockLocations on each status to get an array of block locations and iterate over that array to see where each block is stored, which is a little more involved.

@Test
public void testListFiles() throws IOException, InterruptedException, URISyntaxException {

	// 1 Get the file system object
	Configuration configuration = new Configuration();
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "wanglei");
	// 2 Get the file details
	RemoteIterator<LocatedFileStatus> listFiles = fs.listFiles(new Path("/"), true);
	while (listFiles.hasNext()) {
		LocatedFileStatus status = listFiles.next();
		// Print the details
		// File name
		System.out.println(status.getPath().getName());
		// Length
		System.out.println(status.getLen());
		// Permissions
		System.out.println(status.getPermission());
		// Group
		System.out.println(status.getGroup());

		// Get the block location information
		BlockLocation[] blockLocations = status.getBlockLocations();

		for (BlockLocation blockLocation : blockLocations) {

			// Get the host nodes that store this block
			String[] hosts = blockLocation.getHosts();

			for (String host : hosts) {
				System.out.println(host);
			}
		}

		System.out.println("---------------------");
	}

	// 3 Close the resource
	fs.close();
}
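Besides the host names, each BlockLocation also exposes the block's offset within the file and its length; if needed, one extra line inside the block loop (a small addition, not in the original code) prints them as well:

// Offset of this block within the file and the block's length in bytes
System.out.println(blockLocation.getOffset() + " / " + blockLocation.getLength());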

2.6 Determine whether a path is a folder or a file

@Test
public void testListStatus() throws IOException, InterruptedException, URISyntaxException {

	// 1 Get the file system object
	Configuration configuration = new Configuration();
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "wanglei");

	// 2 Determine whether each entry is a file or a directory
	FileStatus[] listStatus = fs.listStatus(new Path("/"));

	for (FileStatus fileStatus : listStatus) {

		// If it is a file
		if (fileStatus.isFile()) {
			System.out.println("f:" + fileStatus.getPath().getName());
		} else {
			System.out.println("d:" + fileStatus.getPath().getName());
		}
	}

	// 3 Close the resource
	fs.close();
}
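Since listStatus() only looks at a single directory level, the file-or-directory check above is also the building block for a simple recursive walk. A minimal sketch (an illustrative helper, not from the original post; it assumes the same imports plus org.apache.hadoop.fs.FileStatus):

// Illustrative helper, not part of the original post: recursively list a directory tree
private void walk(FileSystem fs, Path dir) throws IOException {
	for (FileStatus fileStatus : fs.listStatus(dir)) {
		if (fileStatus.isFile()) {
			System.out.println("f:" + fileStatus.getPath());
		} else {
			System.out.println("d:" + fileStatus.getPath());
			// Recurse into the sub-directory
			walk(fs, fileStatus.getPath());
		}
	}
}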

Time is limited today, so I will stop here.
