HDFS, the Hadoop Distributed File System, is one implementation of Hadoop's abstract file system. Other implementations back the same abstraction with Amazon S3, the local file system, and so on, and HDFS can even be accessed over HTTP (WebHDFS). HDFS distributes a file's blocks across the machines of a cluster and replicates them for fault tolerance and reliability. Client reads and writes go directly to the machines holding the data, so no single node becomes a performance bottleneck.
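To make the abstraction concrete: the same `FileSystem` API resolves to a different implementation depending on the URI scheme. A minimal sketch (the hostname and bucket name are placeholders, and `s3a` additionally needs the hadoop-aws module and credentials configured):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

Configuration conf = new Configuration();
// the URI scheme selects the concrete file system behind the abstract API
FileSystem hdfs  = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf); // HDFS
FileSystem local = FileSystem.get(URI.create("file:///"), conf);              // local file system
FileSystem s3    = FileSystem.get(URI.create("s3a://my-bucket/"), conf);      // Amazon S3
```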
For the details of HDFS architecture, see my earlier blog posts; today we focus on how to operate HDFS through the Java API and from the command line.
To operate HDFS from Java, first configure the Maven repository and dependency:
```xml
<repositories>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
  </repository>
</repositories>

<!-- import the client dependency -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
</dependency>
```
Example: creating a directory through the API
```java
this.configuration = new Configuration();
// connect to the NameNode as user "hadoop"
this.fileSystem = FileSystem.get(new URI(this.HDFS_PATH), configuration, "hadoop");
Path path = new Path("/hdfsapi/test");
boolean result = fileSystem.mkdirs(path);
```
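The snippets in the rest of this post reuse the `configuration`, `fileSystem`, and `HDFS_PATH` fields from above. For completeness, a minimal runnable harness might look like this (the NameNode address is an assumed placeholder; adjust it to your cluster):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsApiDemo {
    // placeholder NameNode address; replace with your own
    private static final String HDFS_PATH = "hdfs://localhost:8020";

    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        // operate on HDFS as user "hadoop"
        FileSystem fileSystem = FileSystem.get(new URI(HDFS_PATH), configuration, "hadoop");
        boolean result = fileSystem.mkdirs(new Path("/hdfsapi/test"));
        System.out.println(result);
        fileSystem.close();
    }
}
```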
Reading a file from HDFS through the API and writing it back to a local file
```java
Path path = new Path("/gwyy.txt");
FSDataInputStream fsDataInputStream = fileSystem.open(path);
FileOutputStream fileOutputStream = new FileOutputStream(new File("a.txt"));
byte[] buffer = new byte[1024];
int length = 0;
StringBuffer sb = new StringBuffer();
// only use the `length` bytes actually read; buffer.length would append stale data on the last pass
while ((length = fsDataInputStream.read(buffer)) != -1) {
    sb.append(new String(buffer, 0, length));
    fileOutputStream.write(buffer, 0, length);
}
System.out.println(sb.toString());
fileOutputStream.close();
fsDataInputStream.close();
```
Creating a file on HDFS and writing content to it
```java
FSDataOutputStream out = fileSystem.create(new Path("/fuck.txt"));
out.writeUTF("aaabbb");
out.flush();
out.close();
```
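One caveat worth knowing: `writeUTF` prepends a two-byte length header (modified UTF-8), so reading the file back with `hadoop fs -cat` shows two extra leading bytes. To store the raw text only, write the bytes directly; a minimal sketch:

```java
import java.nio.charset.StandardCharsets;

FSDataOutputStream out = fileSystem.create(new Path("/fuck.txt"));
// write raw UTF-8 bytes, with no length prefix
out.write("aaabbb".getBytes(StandardCharsets.UTF_8));
out.close();
```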
Renaming an HDFS file
```java
boolean a = fileSystem.rename(new Path("/fuck.txt"), new Path("/fuck.aaa"));
System.out.println(a);
```
Copying a local file to HDFS
```java
fileSystem.copyFromLocalFile(new Path("a.txt"), new Path("/copy_a.txt"));
```
Uploading a large file to HDFS with a progress callback
```java
InputStream in = new BufferedInputStream(new FileInputStream(new File("hive-1.1.0-cdh5.15.1.tar.gz")));
Path dst = new Path("/hive.tar.gz");
// print a dot on each progress callback as a crude progress bar
FSDataOutputStream out = fileSystem.create(dst, new Progressable() {
    @Override
    public void progress() {
        System.out.print('.');
        System.out.flush();
    }
});
byte[] buffer = new byte[4096];
int length = 0;
// stream the file into HDFS, writing only the bytes actually read
while ((length = in.read(buffer, 0, buffer.length)) != -1) {
    out.write(buffer, 0, length);
}
out.close();
in.close();
```
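Instead of a hand-written copy loop, Hadoop also ships a helper that does the same buffered copy and closes both streams for you; a sketch of the equivalent call:

```java
import org.apache.hadoop.io.IOUtils;

InputStream in = new BufferedInputStream(new FileInputStream(new File("hive-1.1.0-cdh5.15.1.tar.gz")));
FSDataOutputStream out = fileSystem.create(new Path("/hive.tar.gz"));
// copy with a 4096-byte buffer; 'true' closes both streams when done
IOUtils.copyBytes(in, out, 4096, true);
```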
Downloading an HDFS file to the local file system
```java
fileSystem.copyToLocalFile(new Path("/fuck.aaa"), new Path("./"));
```
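If the client machine lacks the native Hadoop libraries (a common situation on Windows), `copyToLocalFile` can fail or leave `.crc` checksum files next to the download. The overload that targets the raw local file system sidesteps this; a hedged sketch:

```java
// delSrc = false, useRawLocalFileSystem = true (skips client-side checksum files)
fileSystem.copyToLocalFile(false, new Path("/fuck.aaa"), new Path("./"), true);
```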
Listing the files in an HDFS directory
```java
FileStatus[] fileStatuses = fileSystem.listStatus(new Path("/"));
for (FileStatus f : fileStatuses) {
    System.out.println(f.getPath());
}
```
Listing HDFS files recursively
```java
RemoteIterator<LocatedFileStatus> remoteIterator = fileSystem.listFiles(new Path("/"), true);
while (remoteIterator.hasNext()) {
    LocatedFileStatus file = remoteIterator.next();
    System.out.println(file.getPath());
}
```
Viewing a file's block locations
```java
FileStatus fileStatus = fileSystem.getFileStatus(new Path("/jdk-8u221-linux-x64.tar.gz"));
BlockLocation[] blockLocations = fileSystem.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());
// print each block replica's host, offset within the file, and length
for (BlockLocation b : blockLocations) {
    for (String name : b.getNames()) {
        System.out.println(name + " " + b.getOffset() + " " + b.getLength());
    }
}
```
Deleting a file or directory
```java
// If the path is a directory, recursive must be true, otherwise an exception is thrown;
// for a plain file, recursive can be either true or false.
boolean a = fileSystem.delete(new Path("/gwyy.txt"), true);
System.out.println(a);
```
Now let's look at the HDFS command-line operations.
List the HDFS root directory

```shell
hadoop fs -ls /
```
Upload a file to the root directory

```shell
hadoop fs -put gwyy.txt /
```
Copy a local file to HDFS

```shell
hadoop fs -copyFromLocal xhc.txt /
```
Move a local file to HDFS (the local copy is deleted)

```shell
hadoop fs -moveFromLocal a.txt /
```
View a file's contents (-text can additionally decode compressed files and SequenceFiles)

```shell
hadoop fs -cat /gwyy.txt
hadoop fs -text /gwyy.txt
```
Fetch a file from HDFS to the local machine

```shell
hadoop fs -get /a.txt ./
```
Create a directory

```shell
hadoop fs -mkdir /hdfs-test
```
Move a file from folder A to folder B

```shell
hadoop fs -mv /a.txt /hdfs-test/a.txt
```
Copy a file

```shell
hadoop fs -cp /hdfs-test/a.txt /hdfs-test/a.txt.back
```
Merge all files in a directory and download the result

```shell
hadoop fs -getmerge /hdfs-test ./t.txt
```
Delete a file

```shell
hadoop fs -rm /hdfs-test/a.txt.back
```
Delete a directory

```shell
# -rmdir only works on empty directories
hadoop fs -rmdir /hdfs-test
# -rm -r deletes the directory and everything in it
hadoop fs -rm -r /hdfs-test
```