Hadoop Series II -- HDFS Concepts

HDFS in Brief

First, it is a file system: it stores files and locates them through a unified namespace, the directory tree.
Second, it is distributed: many servers work together to provide its functionality, and each server in the cluster plays its own role.

HDFS (Hadoop Distributed File System), Hadoop's realization of the ideas of the Google File System (GFS), is a core Hadoop subproject and the foundation of data storage management in distributed computing. It was developed to meet the need of accessing and processing very large files in a streaming-data mode, and it runs on inexpensive commodity servers. With its high fault tolerance, high reliability, scalability, high availability, high throughput, and other characteristics, it provides fault-tolerant storage for massive amounts of data and brings great convenience to applications that process large data sets (Large Data Sets).

Key Concepts

File splitting, replicated block storage, and metadata

HDFS overall operating mechanism

Features HDFS shares with ordinary file systems:
1. It has a directory structure, and the top-level directory is /
2. Files are stored in the file system
3. The system provides file operations: create, delete, modify, view, move, and so on

Differences between HDFS and an ordinary single-machine file system:
1. In a single-machine file system, files are stored on the operating system of one machine
2. The HDFS file system spans N machines
3. A single-machine file system stores files on the disks of one machine
4. Files stored in HDFS land in the local file systems of N machines (HDFS is a file system layered on top of the local Linux file systems)

HDFS Features

(1) Files in HDFS are physically stored in blocks (Block). The block size can be specified through the configuration parameter dfs.blocksize; the default is 128 MB in Hadoop 2.x and 64 MB in older versions.

(2) The HDFS client is given a unified abstract directory tree, and files are accessed by path, in the form hdfs://namenode:port/dir-a/dir-b/dir-c/file.data
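To make this path form concrete, here is a minimal Java sketch using the standard FileSystem API; the NameNode address and the file path are placeholders for illustration, not values taken from any real cluster:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPathExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The URI names the NameNode (host + port); the rest is a path in the abstract directory tree.
        FileSystem fs = FileSystem.get(URI.create("hdfs://hadoop1:9000"), conf);
        Path file = new Path("/dir-a/dir-b/dir-c/file.data"); // hypothetical path
        System.out.println(file + " exists? " + fs.exists(file));
        fs.close();
    }
}
```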

(3) The directory structure and the file block information (metadata) are managed by the NameNode

-- The NameNode is the master node of the HDFS cluster. It maintains the entire HDFS directory tree and, for each path (file), the corresponding block information (block IDs and the DataNode servers on which they reside).

(4) The storage of each file block is handled by the DataNodes

---- DataNodes are the slave nodes of the HDFS cluster. Each block can be stored as multiple replicas on multiple DataNodes (the replica count can be set with the parameter dfs.replication).
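As a hedged illustration of how the replica count can be influenced from a client, the sketch below sets dfs.replication in the client configuration (affecting files this client creates) and then asks the NameNode to change the replication factor of an existing file; the path and the values are made up for the example:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("dfs.replication", "2");        // replicas for files created by this client
        FileSystem fs = FileSystem.get(conf);

        Path existing = new Path("/data/file1"); // hypothetical existing file
        // Ask the NameNode to change the replication factor of an already stored file.
        boolean accepted = fs.setReplication(existing, (short) 3);
        System.out.println("replication change accepted: " + accepted);
        fs.close();
    }
}
```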

(5) HDFS is designed for write-once, read-many scenarios and does not support modifying files in place

(Note: it is suitable for data analysis, not for network-disk style applications, because modification is inconvenient, latency is high, network overhead is large, and the cost would be too high)

HDFS architecture

HDFS consists of four parts: the HDFS Client, the NameNode, the DataNodes, and the Secondary NameNode.
HDFS has a master/slave (Master/Slave) architecture: an HDFS cluster has one NameNode and a number of DataNodes. The NameNode manages the file system metadata, while the DataNodes store the actual data.

HDFS Client: the client.
1. Provides commands to manage and access HDFS, such as starting or shutting down HDFS.
2. Interacts with DataNodes to read and write data. When reading, it interacts with the NameNode to obtain the file's block location information; when writing to HDFS, the client splits the file into blocks and then stores them.

NameNode: the Master, the management node of the entire file system. It maintains the file directory tree of the whole file system, the metadata of each file/directory, and the list of data blocks corresponding to each file. It receives user operation requests.
1. Responds to client requests (read and write requests)
2. Maintains the directory tree structure (metadata management: query, modify)
3. Configures the replica storage policy
4. Manages the cluster and balances block load
Addendum:
Metadata:
    fsimage: the metadata image file (the file system directory tree).
    edits: the metadata operation log (a record of the modification operations performed on the file system).
    fstime: the time of the last checkpoint.
    What the NameNode holds in memory = fsimage + edits.

DataNode: the Slave. The NameNode gives the orders, and the DataNodes carry out the actual operations.
They provide real storage of the file data.
Unlike in an ordinary file system, in HDFS a file that is smaller than one data block does not occupy the whole block's storage space.
Replication: multiple replicas, three by default.

1. Stores the actual data blocks.
2. Performs read/write operations on the data blocks.
Addendum: file block (Block): the most basic unit of storage. The file content is divided, starting from offset 0, into pieces of a fixed size, numbered in order; each such piece is one Block. The default HDFS block size is 128 MB, so a 256 MB file takes 256/128 = 2 blocks.
Parameter: dfs.block.size
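A small sketch of how a client can observe this block layout, assuming a file such as /data/file1 already exists (the path is hypothetical); it prints the block size, the number of blocks the file occupies, and the DataNodes that hold each block:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockInfoExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus status = fs.getFileStatus(new Path("/data/file1")); // hypothetical path

        long blockSize = status.getBlockSize();
        long fileLen = status.getLen();
        long blocks = (fileLen + blockSize - 1) / blockSize; // e.g. 256 MB / 128 MB = 2 blocks
        System.out.println("block size = " + blockSize + ", length = " + fileLen + ", blocks = " + blocks);

        // Which DataNodes hold each block (all replicas included).
        for (BlockLocation loc : fs.getFileBlockLocations(status, 0, fileLen)) {
            System.out.println("offset " + loc.getOffset() + " -> " + String.join(",", loc.getHosts()));
        }
        fs.close();
    }
}
```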

Secondary NameNode: not a hot standby for the NameNode. When the NameNode goes down, it does not immediately take over the NameNode's role and provide service.

What it does: it downloads the metadata (fsimage and edits) from the NameNode, merges the two to generate a new fsimage, stores it locally, and pushes it to the NameNode to replace the old fsimage.
By default it is installed on the same node as the NameNode, but that is unsafe...

1. Assists the NameNode and shares part of its workload.
2. Periodically merges fsimage and edits and pushes the result to the NameNode.
3. In an emergency, can help recover the NameNode.

HDFS overall analysis

From the description above, we can see that HDFS has many notable characteristics:

It keeps multiple replicas and provides fault tolerance; when a replica is lost or a node goes down, the replica is recovered automatically (3 replicas are kept by default).

It runs on inexpensive machines.

It is suitable for handling big data. By default, HDFS splits files into blocks.
  (Figure: HDFS Master/Slave architecture)
  As shown in the figure, HDFS is organized in a Master/Slave configuration, with the roles NameNode, SecondaryNameNode, and DataNode.
    NameNode: the Master node, the one in charge. It manages the mapping of data blocks, handles client write requests, configures the replica policy, and manages the HDFS namespace.
    SecondaryNameNode: the assistant that shares the NameNode's workload; it is a cold backup of the NameNode; it merges fsimage and edits and then sends the result to the NameNode.
    DataNode: a Slave node, the worker. It stores the data blocks sent by the client and performs read/write operations on the data blocks.
    Hot backup: b is a hot backup of a; if a breaks, b immediately runs in a's place.
    Cold backup: b is a cold backup of a; if a breaks, b cannot immediately take over a's work, but b stores some of a's information, which reduces the loss when a breaks.
    fsimage: the metadata image file (the file system directory tree).
    edits: the metadata operation log (a record of the modification operations performed on the file system).
    What the NameNode holds in memory = fsimage + edits.
    The SecondaryNameNode periodically (every 1 hour by default) fetches the fsimage and edits from the NameNode, merges them, and sends the result back to the NameNode, reducing the NameNode's workload.

HDFS limitations

1) Low-latency data access. Interactive applications need responses within a few milliseconds or seconds. Because HDFS is designed for high throughput, it sacrifices fast response times. For low-latency access, consider HBase or Cassandra instead.
2) Large numbers of small files. HDFS uses a large block size (64 MB in older versions, 128 MB by default in 2.x); files are stored on HDFS, and the block metadata is kept in the NameNode's memory as key-value mappings. If there are too many small files, the burden on that memory becomes heavy. Small files do not actually waste storage space, but they undoubtedly increase the metadata held by the NameNode, and a large number of small files affects the performance of the whole cluster.

As we saw earlier, Btrfs optimizes small files by inlining them, which gives good space usage and access times for small files.
3) Multi-user writes and arbitrary modification of files. An HDFS file can have only one writer, and writes can only append at the end of the file. It supports neither multiple concurrent writers nor modifying arbitrary positions in a file after it has been written. But in the big data field, we analyze data that already exists, and data is not modified once it has been produced; these design limitations of HDFS are therefore easy to understand. HDFS provides a very important and very basic file storage capability for data analysis in the big data field.
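To illustrate the write-once, append-only model in code, here is a minimal sketch (the file name is assumed for the example, and append must be supported and enabled by the cluster version in use): it appends bytes to the end of an existing file, which is the only kind of "modification" HDFS allows.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendOnlyExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path log = new Path("/logs/events.log"); // hypothetical existing file

        // Appending to the end of the file is allowed (where the cluster enables append)...
        try (FSDataOutputStream out = fs.append(log)) {
            out.write("one more record\n".getBytes("UTF-8"));
        }
        // ...but there is no seek-and-overwrite on an output stream: HDFS has no random-write API.
        fs.close();
    }
}
```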

HDFS reliability assurance measures

1) Redundancy

Each file is stored as a series of data blocks (Block). For fault tolerance, all the data blocks of a file have replicas (the number of replicas, i.e. the replication factor, is configurable via dfs.replication)

2) Replica placement

A rack-aware placement strategy is used to improve data reliability, availability, and utilization of network bandwidth

3) Heartbeats

The NameNode periodically receives a heartbeat and a block report from each DataNode in the cluster; receiving a heartbeat means the DataNode is working properly
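As a hedged aside, the heartbeat interval is itself a configuration parameter; a minimal sketch for reading it from a client-side configuration (the value comes from hdfs-default.xml only if the HDFS defaults are on the classpath, otherwise this prints null):

```java
import org.apache.hadoop.conf.Configuration;

public class HeartbeatConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Interval (in seconds) at which DataNodes send heartbeats to the NameNode.
        System.out.println("dfs.heartbeat.interval = " + conf.get("dfs.heartbeat.interval"));
    }
}
```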

4) Safe Mode

When the system starts, the NameNode enters safe mode. While in safe mode, no write operations on data blocks take place.

5) Data integrity detection

The HDFS client software implements checksum (Checksum) verification of the contents of HDFS files (dfs.bytes-per-checksum)
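A minimal sketch of how a client can see this checksum machinery, assuming an existing file at a hypothetical path; it prints the configured bytes-per-checksum value and asks the cluster for the file's overall checksum:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChecksumExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        System.out.println("dfs.bytes-per-checksum = " + conf.get("dfs.bytes-per-checksum"));

        FileSystem fs = FileSystem.get(conf);
        // A file-level checksum derived from the per-chunk checksums stored alongside the blocks.
        FileChecksum checksum = fs.getFileChecksum(new Path("/data/file1")); // hypothetical path
        System.out.println("file checksum: " + checksum);
        fs.close();
    }
}
```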

NameNode metadata mechanism

NameNode part
1. The NameNode keeps the complete, up-to-date metadata in memory;
2. The NameNode also keeps on disk (under dfs.namenode.name.dir) an image of the in-memory metadata at some point in time: fsimage, the metadata image file (the file system directory tree);
3. The NameNode appends every client operation that changes the metadata to the edits log file, which is saved to disk;

Secondary NameNode part
1. The SecondaryNameNode periodically downloads the fsimage image and the newly generated edits logs from the NameNode.
2. It then loads the fsimage into memory and replays the edits files sequentially, applying the changes to the in-memory metadata objects (merging).
3. When the merging is complete, it serializes the in-memory metadata into a new fsimage and uploads this fsimage file back to the NameNode (where the fsimage resides on disk).

The process above is called the checkpoint operation:
every so often, the SecondaryNameNode downloads all the edits accumulated on the NameNode, together with the latest fsimage, to its local disk and loads them into memory for merging (this process is called a checkpoint).
(Figure: detailed CheckPoint process)

Tip: does the SecondaryNameNode need to download the latest fsimage image file from the NameNode every time it performs a checkpoint?
Only the first checkpoint needs the download; afterwards it does not, because its own machine already has the fsimage.
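The checkpoint timing mentioned above is driven by configuration; a hedged sketch for inspecting the relevant parameters from a client configuration (values come from hdfs-default.xml if it is on the classpath, otherwise null is printed):

```java
import org.apache.hadoop.conf.Configuration;

public class CheckpointConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Maximum delay between two checkpoints, in seconds (3600 = 1 hour by default).
        System.out.println("dfs.namenode.checkpoint.period = " + conf.get("dfs.namenode.checkpoint.period"));
        // A checkpoint is also triggered once this many uncheckpointed transactions accumulate.
        System.out.println("dfs.namenode.checkpoint.txns   = " + conf.get("dfs.namenode.checkpoint.txns"));
    }
}
```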

HDFS write operation

1. The client initiates a request: hadoop fs -put hadoop.tar.gz /

How does the client know which node should handle this request?

The client tools parse from the URI you specify who the master node of the HDFS cluster is, along with the port number and so on; this is determined mainly by the URI,

for example: hdfs://hadoop1:9000
The current request also carries one very important piece of information: the total size of the data to be uploaded.
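The shell command in step 1 has a straightforward Java equivalent; here is a minimal sketch, assuming the same placeholder NameNode address hdfs://hadoop1:9000 and a local file hadoop.tar.gz in the working directory:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PutExample {
    public static void main(String[] args) throws Exception {
        // The URI tells the client which NameNode to contact; the NameNode never sees the file data itself.
        FileSystem fs = FileSystem.get(URI.create("hdfs://hadoop1:9000"), new Configuration());
        // Equivalent of: hadoop fs -put hadoop.tar.gz /
        fs.copyFromLocalFile(new Path("hadoop.tar.gz"), new Path("/"));
        fs.close();
    }
}
```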

2. The NameNode responds to the client's request

The NameNode's responsibilities:

1. Manage the metadata (the abstract directory-tree structure)

If the file the user is uploading already exists in the target directory, what should the HDFS cluster do? It does not process the upload.

If the directory in which the user wants to store the file does not exist, will it be created for the user?

2. Respond to the request

What it really does: it performs a series of checks,

1. checks whether the client's request is valid
2. checks whether the client has permission to upload

3. If the NameNode's checks pass, it returns the result to the client, that is, the upload is allowed

The NameNode returns to the client the list of nodes that will store the multiple replicas of each data block, i.e. the storage locations of each block, for example:

file1_blk1 hadoop02,hadoop03,hadoop04
file1_blk2 hadoop03,hadoop04,hadoop05

4. After the client gets back from the NameNode the storage locations of all replicas of all data blocks, it can upload the data blocks one by one, in order

5. The data blocks to be uploaded are sliced logically

Slicing consists of two stages:

1. planning how to cut
2. actually cutting

Physical slicing: stages 1 and 2

Logical slicing: stage 1 only

file1_blk1 : file1:0:128
file1_blk2 : file1:128:256

Logical slicing only plans how to cut.

6. Uploading of the first data block begins

7. The client performs a series of preparatory operations

1. It sends requests to connect to the corresponding DataNodes, forming a pipeline:

pipeline: client - node1 - node2 - node3

Data is transmitted in the form of packets.

After each packet is transmitted, every node verifies its copy, and the acknowledgements travel back along the same route to the client.

2. The client also starts a service:

it waits for the acknowledgement information of the packets transmitted on the data pipeline,

so that the client knows whether all the data currently written from the client to the three nodes node1, node2, and node3 has been written correctly and successfully.
8. The client then officially writes all the packets of this block to the corresponding replica nodes

1. The block is the largest unit; it is the granularity at which data is finally stored on the DataNodes, determined by the dfs.blocksize parameter (dfs.block.size in older releases), 128 MB by default in version 2.x. Note: this parameter is determined by the client configuration; for example: System.out.println(conf.get("dfs.blocksize")); // result is 134217728

2. The packet is the medium unit; it is the granularity of the data flowing from the DFSClient to the DataNode, with the dfs.write.packet.size parameter as a reference value, 64 KB by default. Note: this parameter is only a reference value for the data actually transmitted; it is adjusted because a packet has a specific internal structure, and the goal of the adjustment is to exactly accommodate all the members of that structure while also ensuring that the data of the current block written to the DataNode does not exceed the configured block size;

for example: System.out.println(conf.get("dfs.write.packet.size")); // result is 65536
3. The chunk is the smallest unit; it is the granularity of data verification between the DFSClient and the DataNode, determined by the io.bytes.per.checksum parameter, 512 B by default. Note: a chunk actually also carries a 4 B checksum, so when written into a packet a chunk occupies 516 B; the ratio of data to checksum is 128:1, so a 128 MB block has about 1 MB of checksums;
for example: System.out.println(conf.get("io.bytes.per.checksum")); // result is 512
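As a small sanity check of the 128:1 ratio above, here is a minimal Java sketch that uses the default sizes as plain constants (not read from any cluster, and ignoring packet headers) to compute the number of chunks per block and the resulting checksum overhead:

```java
public class ChecksumOverhead {
    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // dfs.blocksize default: 128 MB
        int packetSize = 64 * 1024;          // dfs.write.packet.size reference value: 64 KB
        int chunkSize = 512;                 // io.bytes.per.checksum default: 512 B
        int checksumSize = 4;                // per-chunk checksum: 4 B

        long chunksPerBlock = blockSize / chunkSize;                 // 262,144 chunks per block
        long checksumBytesPerBlock = chunksPerBlock * checksumSize;  // 1,048,576 B = 1 MB of checksums
        long chunksPerPacket = packetSize / (chunkSize + checksumSize); // roughly 127 chunks per packet

        System.out.println("chunks per block       : " + chunksPerBlock);
        System.out.println("checksum bytes / block : " + checksumBytesPerBlock);
        System.out.println("chunks per packet (~)  : " + chunksPerPacket);
    }
}
```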

9. The client verifies the acknowledgements; if the verification passes, the data block has been written successfully

10. Steps 7, 8, and 9 are repeated to upload the remaining data blocks

11. After the client knows that all the data blocks have been written successfully, it sends feedback to the NameNode, telling the NameNode that the current client's data has been uploaded successfully.

HDFS read operation

1. The client calls the open method of a FileSystem instance to obtain the input stream (InputStream) corresponding to the file.

2. Via an RPC call to the remote NameNode, it obtains from the NameNode the storage locations of the data blocks of this file, including the locations of the replicas (mainly the addresses of the DataNodes).

3. After obtaining the input stream, the client calls the read method to read data. It selects the nearest DataNode, establishes a connection, and reads the data.

4. If the client is on the same machine as one of the DataNodes (as is the case, for example, for mappers and reducers during MapReduce), it reads the data directly from the local node.

5. When the end of a data block is reached, the connection to that DataNode is closed, and the next data block is then located.

6. Steps 2 to 5 are performed repeatedly until all the data has been read.

7. The client calls close to close the input stream DFSInputStream.
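Putting the read steps together, here is a minimal client-side sketch (the NameNode address and file path are placeholders); the block-by-block DataNode switching described in steps 3 to 5 happens inside the DFSInputStream returned by open:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ReadExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(URI.create("hdfs://hadoop1:9000"), new Configuration());

        // Step 1: open returns an FSDataInputStream wrapping a DFSInputStream.
        try (FSDataInputStream in = fs.open(new Path("/dir-a/file.data"))) { // hypothetical path
            // Steps 2-6: block locations are fetched from the NameNode and data is streamed
            // from the nearest DataNode, moving on to the next block's DataNode as needed.
            IOUtils.copyBytes(in, System.out, 4096, false);
        } // Step 7: the stream is closed by try-with-resources.
        fs.close();
    }
}
```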
