Hadoop---Introduction to HDFS

HDFS stands for Hadoop Distributed File System, a distributed (clustered) file storage system. It is well suited to write-once, read-many scenarios.

HDFS does not need to be installed separately. The HDFS system is included when Hadoop is installed.

Hadoop installation can refer to:

  1. If you already have a basic virtual machine installed: Hadoop installation
  2. If you are starting from scratch with no virtual machine installed: Hadoop cluster installation

Advantages and disadvantages of HDFS:

  • Advantages: high fault tolerance, suitable for processing large-scale data, and can be built on cheap commodity machines
  • Disadvantages: not suitable for low-latency data access; cannot efficiently store large numbers of small files; does not support concurrent writes or in-place modification of data.

HDFS file block size:

Files in HDFS are physically stored in blocks, and the block size can be set through the configuration parameter dfs.blocksize. The default is 128 MB in Hadoop 2.x/3.x and 64 MB in 1.x.

Note: As a rule of thumb, the best state is when the seek (addressing) time is about 1% of the transfer time.

Thinking: why can't the block size be set too small or too large?
(1) If the HDFS block size is too small, a file is split into more blocks, which increases the seek time.
(2) If the block size is too large, the time to transfer a block from disk will be far longer than the time needed to seek to its start, and a program that processes one block at a time will be very slow.
Summary: the HDFS block size setting mainly depends on the disk transfer rate.
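A minimal sketch of overriding dfs.blocksize from a client, assuming the Hadoop Java client is on the classpath; the hdfs://namenode:9000 address is a placeholder, and the same key can also be set cluster-wide in hdfs-site.xml.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class BlockSizeConfig {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // dfs.blocksize controls the HDFS block size; 128 MB is the 2.x/3.x default.
        // Here we override it to 256 MB for files written by this client.
        conf.setLong("dfs.blocksize", 256L * 1024 * 1024);

        // hdfs://namenode:9000 is a placeholder; use your cluster's fs.defaultFS.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);
        System.out.println("Effective block size: " + conf.getLong("dfs.blocksize", 0));
        fs.close();
    }
}
```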

HDFS structure: 

HDFS is a master/slave architecture (classic Master and Slave architecture).

HDFS consists of four components: the HDFS Client, the NameNode, the DataNode, and the Secondary NameNode.

Each HDFS cluster includes one NameNode and multiple DataNodes.

1. Client

The file system is accessed through the Client, which communicates with the NameNode and the DataNodes. The Client acts as the interface to the file system:

  • File splitting: when a file is uploaded to HDFS, the Client splits it into data blocks (Blocks) for storage
  • Interacts with the NameNode to obtain the locations of a file's blocks
  • Interacts with DataNodes to read and write the actual block data
  • Provides commands to manage HDFS, such as formatting the NameNode
  • Provides commands to operate on HDFS, such as creating, deleting, modifying, and querying files (see the sketch after this list)
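A minimal client-side sketch of the operations listed above (create a directory, upload a file, list it, delete it), assuming the Hadoop Java client is available; the paths and the hdfs://namenode:9000 address are hypothetical.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);

        // Create a directory
        fs.mkdirs(new Path("/user/demo"));

        // Upload a local file; the client splits it into blocks behind the scenes
        fs.copyFromLocalFile(new Path("/tmp/local.txt"), new Path("/user/demo/remote.txt"));

        // Query: list the directory contents
        for (FileStatus status : fs.listStatus(new Path("/user/demo"))) {
            System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
        }

        // Delete the file (the second argument enables recursive delete for directories)
        fs.delete(new Path("/user/demo/remote.txt"), false);

        fs.close();
    }
}
```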

2. NameNode (NN)

The NameNode is the Master, the manager. It stores and manages file metadata, maintains the file system's directory tree, and records which file each written data block (Block) belongs to.

  • Manages the HDFS namespace
  • Configures the replication policy
  • Manages the mapping information of data blocks
  • Handles client read and write requests (the block mapping can be inspected from the client, as sketched after this list)
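The block-to-DataNode mapping that the NameNode maintains can be observed from the client side; a minimal sketch using FileSystem.getFileBlockLocations, assuming a file already exists at the hypothetical path /user/demo/remote.txt.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationsDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), new Configuration());

        FileStatus status = fs.getFileStatus(new Path("/user/demo/remote.txt"));
        // Ask the NameNode which DataNodes hold each block of the file
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    block.getOffset(), block.getLength(),
                    String.join(",", block.getHosts()));
        }
        fs.close();
    }
}
```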

3. DataNode

The DataNode keeps in contact with the NameNode through heartbeats. It is responsible for storing the file's data blocks, serving block reads and writes, and periodically reporting the blocks it stores to the NameNode.
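A small sketch of reading the heartbeat and block-report intervals from the client configuration; the defaults quoted in the comments (3 s heartbeat, 6 h block report) are the usual hdfs-default.xml values and should be checked against your Hadoop version.

```java
import org.apache.hadoop.conf.Configuration;

public class DataNodeIntervals {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // dfs.heartbeat.interval: seconds between DataNode heartbeats (default 3)
        long heartbeatSeconds = conf.getLong("dfs.heartbeat.interval", 3);
        // dfs.blockreport.intervalMsec: milliseconds between full block reports (default 21600000 = 6 h)
        long blockReportMs = conf.getLong("dfs.blockreport.intervalMsec", 21600000L);

        System.out.println("Heartbeat interval: " + heartbeatSeconds + " s");
        System.out.println("Block report interval: " + blockReportMs + " ms");
    }
}
```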

4. Secondary NameNode

The role of the Secondary NameNode is to consume the EditsLog: it periodically merges the FsImage and the EditsLog, generates a new FsImage file, and pushes it to the NameNode, reducing the pressure on the NameNode. In an emergency it can help recover the NameNode.

SecondaryNameNode mechanism:

  1. The SecondaryNameNode is not a standby that takes over when the NameNode goes down
  2. Its main function is to merge the logs periodically so that the log file does not grow too large
  3. The merged image file is also saved on the NameNode

SecondaryNameNode working process:

  1. The SecondaryNameNode sends a checkpoint request to the NameNode; from that point on, the NameNode writes new operations to a fresh edit log
  2. The SecondaryNameNode downloads the image file and the edit log from the NameNode
  3. The SecondaryNameNode merges the two files and generates a new image file
  4. The SecondaryNameNode sends the new image file back to the NameNode
  5. The NameNode replaces the files currently in use with the new image file and the new edit log

Note: 

1. FsImage (file system image, binary)
  Stores the NameNode's metadata image at a certain point in time (checkpoint).
  Default storage location: /opt/install/hadoop-2.5.2/data/tmp/dfs/name
  Configured by dfs.namenode.name.dir

 2. EditsLog (edit log, binary)
   Records all write operations performed after the last checkpoint.
   Storage location configured by dfs.namenode.edits.dir
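A sketch of the checkpoint-related settings: dfs.namenode.checkpoint.period (how often the SecondaryNameNode checkpoints, 3600 s by default) and dfs.namenode.checkpoint.txns (checkpoint early once this many transactions accumulate, commonly 1,000,000), alongside the FsImage/EditsLog directories. The local paths are placeholders; confirm the defaults against your Hadoop version.

```java
import org.apache.hadoop.conf.Configuration;

public class CheckpointConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Where the NameNode keeps FsImage files (placeholder path)
        conf.set("dfs.namenode.name.dir", "/opt/install/hadoop-2.5.2/data/tmp/dfs/name");
        // Where the NameNode keeps EditsLog files (defaults to dfs.namenode.name.dir if unset)
        conf.set("dfs.namenode.edits.dir", "/opt/install/hadoop-2.5.2/data/tmp/dfs/edits");

        // Checkpoint every hour, or sooner once 1,000,000 transactions have accumulated
        conf.setLong("dfs.namenode.checkpoint.period", 3600);
        conf.setLong("dfs.namenode.checkpoint.txns", 1000000);

        System.out.println("FsImage dir: " + conf.get("dfs.namenode.name.dir"));
        System.out.println("Checkpoint period: "
                + conf.getLong("dfs.namenode.checkpoint.period", 3600) + " s");
    }
}
```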

HDFS high availability design: 

Data storage fault tolerance: 

Disks may corrupt data while storing it. HDFS computes and stores a checksum (CheckSum) for each data block stored on a DataNode. When data is read, the checksum of the data just read is recomputed; if it does not match, an exception is thrown, and after the application catches the exception it reads the replica from another DataNode.
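Block-level checksum verification happens automatically inside the read path, but a client can also ask HDFS for a whole-file checksum; a minimal sketch using FileSystem.getFileChecksum, with a hypothetical path and address.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChecksumDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), new Configuration());

        // HDFS computes per-block checksums on write; this call returns a
        // checksum over the whole file, derived from the block checksums.
        FileChecksum checksum = fs.getFileChecksum(new Path("/user/demo/remote.txt"));
        System.out.println(checksum.getAlgorithmName() + " : " + checksum);

        fs.close();
    }
}
```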

Disk failure tolerance:

If a DataNode detects that a local disk is damaged, it reports the BlockIDs stored on that disk to the NameNode. The NameNode checks where these blocks are replicated and notifies the corresponding DataNodes to copy the data to other servers, so that the number of replicas of each block meets the requirement.

DataNode fault tolerance:

A DataNode keeps in contact with the NameNode through heartbeats. If a DataNode does not send a heartbeat within the timeout, the NameNode considers it down, immediately looks up the blocks stored on that DataNode and the servers holding replicas of those blocks, and then notifies those servers to copy the data to other servers, so that the number of block replicas stored in HDFS meets the requirement.
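The replica count these recovery mechanisms maintain can be inspected and changed from the client; a minimal sketch using getFileStatus and setReplication, with hypothetical paths and address.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), new Configuration());
        Path file = new Path("/user/demo/remote.txt");

        // Current replica count of the file
        short current = fs.getFileStatus(file).getReplication();
        System.out.println("Current replication: " + current);

        // Raise the target replica count to 3; the NameNode schedules the extra copies
        fs.setReplication(file, (short) 3);

        fs.close();
    }
}
```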

NameNode fault tolerance: 

The NameNode is the core of the entire HDFS: it records the allocation information of all files, as well as all file paths and block storage information. If the NameNode fails, the whole HDFS cluster becomes unusable; if the NameNode's data is lost, the data on all the DataNodes in the cluster becomes useless. Therefore the NameNode is deployed in a master/standby hot-standby configuration to provide a highly available service.

HDFS read and write process:

Write process: 

  1. Request upload: the client communicates with the NameNode, requesting to upload a file
  2. The NameNode decides whether the upload is allowed: it checks whether the user has upload permission, whether the target file already exists, and whether the parent directory exists
  3. File splitting: the client divides the file into blocks of 0~128 MB (a logical split)
  4. The client asks the NameNode where the Block should be stored
  5. The NameNode returns the DataNode addresses dn1, dn2, and dn3
  6. The client requests dn1 to receive the upload through the FSDataOutputStream module, establishing a connection pipeline (essentially an RPC call that sets up the pipeline)
  7. When dn1 receives the request, it calls dn2, and dn2 calls dn3, completing the whole communication pipeline; acknowledgements (the ack check) are then returned to the client step by step
  8. The client starts uploading the first Block to dn1 (the data is first read from disk into a local memory cache) in units of Packets (64 KB by default). When dn1 receives a Packet it passes it to dn2, and dn2 passes it to dn3; every Packet dn1 sends is also placed in an acknowledgement queue to wait for a response
  9. When a Block finishes transferring, the client again asks the NameNode for the servers for the next Block (steps 4-8 repeat)
  10. After the transfer is complete, the client closes the stream and tells HDFS that the data transfer is finished; HDFS then records the file's metadata (a minimal client-side write sketch follows this list)
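A minimal client-side sketch of the write path above: the client only opens an FSDataOutputStream and writes bytes, while the splitting into packets and the dn1 to dn2 to dn3 pipeline happen inside the client library. The path and address are hypothetical.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), new Configuration());

        // create() asks the NameNode for permission and block locations,
        // then sets up the DataNode pipeline under the hood
        try (FSDataOutputStream out = fs.create(new Path("/user/demo/write-demo.txt"))) {
            out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
            // hflush pushes buffered packets down the DataNode pipeline
            out.hflush();
        } // close() finishes the last block and reports completion to the NameNode

        fs.close();
    }
}
```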

Read process: 

  1. The Client sends an RPC request to the NameNode to determine the location of the requested file's blocks;
  2. The NameNode returns part or all of the file's block list as appropriate, and for each block it returns the addresses of the DataNodes holding a copy of that block;
  3. The returned DataNode addresses are ranked by their distance to the client in the cluster topology, using two rules: in the network topology, the DataNode closest to the Client is ranked first; DataNodes whose reported heartbeats have timed out are marked STALE and ranked last;
  4. The Client picks the top-ranked DataNode to read each block from. If the client itself is a DataNode, it reads the data locally. Under the hood this establishes a socket stream (FSDataInputStream) and repeatedly calls the read method of the parent class DataInputStream until the block has been read completely;
  5. After reading the blocks in the list, if the file has not been fully read, the client asks the NameNode for the next batch of blocks;
  6. Each block read is verified against its checksum. If an error occurs while reading from a DataNode, the client notifies the NameNode and continues reading from the next DataNode that holds a copy of the block;
  7. The read method fetches blocks in parallel rather than one by one; the NameNode only returns the addresses of the DataNodes holding the requested blocks, never the block data itself;
  8. Finally, all the blocks read are merged into the complete file (a minimal client-side read sketch follows this list).
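A minimal client-side sketch of the read path above: open() fetches the block list from the NameNode, and the returned FSDataInputStream then reads block data directly from the DataNodes. The path and address are hypothetical.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), new Configuration());

        // open() asks the NameNode for the block list and DataNode addresses;
        // the stream then reads each block from the nearest DataNode
        try (FSDataInputStream in = fs.open(new Path("/user/demo/write-demo.txt"))) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }

        fs.close();
    }
}
```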

 HDFS storage model: 

  1. A file is linearly cut into blocks by byte; each block has an offset and an id

  2. Except for the last block in a file, the other blocks have the same size

  3. The block size is adjusted according to the I/O characteristics of the hardware

  4. Blocks are scattered across the nodes of the cluster, and each block has location information

  5. Blocks have replicas (replication); replicas have no master/slave distinction, and replicas of the same block cannot be placed on the same node

  6. Replicas are the key to satisfying reliability and performance

  7. When a file is uploaded, its block size and replica count can be specified; after upload, only the replica count can be changed (see the sketch after this list)

  8. Write once, read many: modification is not supported, only appending data
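A minimal sketch of point 7: block size and replica count can be passed per file at create time, and only the replica count can be changed afterwards. The overload used here is FileSystem.create(path, overwrite, bufferSize, replication, blockSize); paths and address are hypothetical.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StorageModelDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), new Configuration());
        Path file = new Path("/user/demo/model-demo.txt");

        // Create the file with 2 replicas and a 64 MB block size
        try (FSDataOutputStream out =
                 fs.create(file, true, 4096, (short) 2, 64L * 1024 * 1024)) {
            out.write("write once, read many\n".getBytes(StandardCharsets.UTF_8));
        }

        // After upload only the replica count can be changed, not the block size
        fs.setReplication(file, (short) 3);

        fs.close();
    }
}
```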
