An introduction to vSAN, for everyone


https://mp.weixin.qq.com/s?__biz=MzUxODgwOTkyMQ==&mid=2247484078&idx=1&sn=2d793bf44a822315a647f001c7b9e628&chksm=f9827280cef5fb96a6be6a48dc83337187a20ac78678067c08bfd8fc3d5aa1376642adf27a21&scene=21#wechat_redirect

I have long wanted to take the time to write about vSAN. I first heard about vSAN in 2015, when VMware was promoting it as the best partner for VDI. In just two years, vSAN has gone through multiple versions. Each iteration has greatly improved both functionality and stability, and its most common use has shifted from VDI to carrying core business workloads.

 

This article summarizes some of my experience learning and using vSAN. It aims to introduce vSAN's architecture, advantages, and points to watch in as few words as possible.

 

The following content represents only my personal views. Please correct me if I have made any mistakes.

 


 

 

01

Virtualized storage

 

An earlier article in the NSX series, NSX From Entry to Proficiency (2): Introduction to NSX-Network Engineers, gave a brief introduction to virtualization and vSphere. This article builds on that background to introduce storage.

 

What matters most to an enterprise is its data, and the device that holds that data is the storage device.

 

image

 

Like personal computers, enterprise storage devices are built from hard drives, but they generally combine multiple drives with RAID cards. A RAID card groups multiple disks into a logical array so that data is spread across several disks. This makes reads and writes more efficient and adds redundancy, so that a single disk failure does not cause data loss.

 

Two RAID modes are commonly used: RAID 1 and RAID 5. The two modes use different algorithms, so their read/write efficiency and capacity utilization differ, but the ultimate goal is the same: to avoid losing data when a single disk fails.
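To make the difference in capacity utilization concrete, here is a minimal sketch. It assumes the textbook definitions (RAID 1 as a two-disk mirror, RAID 5 as striping with one disk's worth of parity); the function name and the 800 GB disk size are only illustrative.

```python
def usable_capacity_gb(level: int, disks: int, disk_size_gb: float) -> float:
    """Usable capacity of a simple RAID 1 or RAID 5 array, in GB."""
    if level == 1:
        # RAID 1 mirrors every block, so two disks hold one disk's worth of data.
        if disks != 2:
            raise ValueError("this sketch models RAID 1 as a two-disk mirror")
        return disk_size_gb
    if level == 5:
        # RAID 5 spends the equivalent of one disk on parity.
        if disks < 3:
            raise ValueError("RAID 5 needs at least three disks")
        return (disks - 1) * disk_size_gb
    raise ValueError("only RAID 1 and RAID 5 are modelled here")


for level, disks in [(1, 2), (5, 4)]:
    usable = usable_capacity_gb(level, disks, 800)
    raw = disks * 800
    print(f"RAID {level} with {disks} x 800 GB: {usable:.0f} GB usable "
          f"({usable / raw:.0%} of raw)")
```

Both arrays survive one disk failure, but the mirror gives up half of its raw capacity while the four-disk RAID 5 array gives up a quarter.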

 

The array formed by the RAID card and disks exists purely at the hardware level. To use it, the operating system formats it with a file system. Windows, for example, formats it as NTFS.

 

After the advent of virtualization, such a hardware architecture can still be used.

 

A very important feature of vSphere is the encapsulation of virtual machines. A virtual machine exists as files and can be copied freely to other locations. To define the resources a virtual machine needs more precisely, several files together represent it, such as the virtual machine configuration file with a .vmx suffix, the data files with a .vmdk suffix, and so on.

 

image

 

Virtualizing a single server and running many workloads on it works fine, but in an enterprise environment server hardware failure must also be considered. This is why vSphere has the concept of a cluster. A cluster is treated as a resource pool. With the help of vSphere's advanced features, workloads can run on any host in the cluster without worrying about the failure of a single host.

 

The following figure shows HA, vSphere's failure recovery mechanism, after a single server fails. HA can "migrate" the virtual machines on the failed host to other hosts and run them there.

 

image

 

This feature has a prerequisite: shared centralized storage. The key word is shared. One storage system can be connected to multiple servers at once, and all of them can read its data at the same time. When any server fails, the data is unaffected, and the other servers can use it to restore the business quickly.

 

image

 

A storage array generally has two controllers, often called heads (the equivalent of the array's brain), to provide redundancy, and each head has multiple ports for port redundancy. Every port can serve data to hosts, but the number of ports is fixed. When there are many servers, storage switches are needed, and at least two of them to keep the paths redundant.

 

image

 

 


 

 

 

02

The arrival of big data

 

The architecture described in Chapter 1 remained stable for a long time, until various new concepts emerged.

 

Compared with a few years ago, the biggest change for individuals is that everyone now carries a device, and the biggest change for enterprises is the surge in data.

 

In just a few years, the data generated by users has increased sharply, and the demand for data storage and analysis has also increased sharply.

 

Look back at the traditional centralized storage from Chapter 1, and many shortcomings become apparent at once:

 

 

  • Capacity has an upper limit

  • All components are proprietary hardware, so once the product is discontinued the hardware can no longer be upgraded

  • Capacity and performance cannot be scaled smoothly as business needs grow

  • Heavy reliance on the storage vendor for services

  • Long deployment cycle

  • Configuration is complicated, has little connection to the business, and is prone to operator error

  • No way to provide differentiated services for different workloads

To solve these problems, three things need to be done: abandon single-point design to avoid single-point performance bottlenecks (no centralized controller); abandon proprietary hardware and build the storage system from standard hardware; and use a flexible storage management program to provide storage services.

 

The end result is software-defined distributed storage based on server clusters.

 

That phrase is a mouthful, but in one sentence it captures three characteristics:

 

  1. Software-defined: The storage management program must be implemented in software. Only software can be open, flexible, and fast enough to adapt to an enterprise's varied storage needs.

  2. Cluster-based: "Cluster" means that building such a storage system requires multiple servers, and these servers should have similar configurations so that they provide uniform, standard capabilities.

  3. Distributed: Distribution spreads data and I/O access across multiple nodes, so that the capacity and performance of the whole storage system grow linearly as nodes are added.



     

image

 

Based on the above conceptual diagram, let's take a look at how to implement it.

 

1. Who provides the capacity?




As shown in the figure, each server contributes part of the capacity of its locally installed hard drives. For better storage performance, SSDs are configured as a read/write cache, and HDDs provide the bulk storage capacity.

 

2. How to connect the hard drives provided by each server?

 

Over the network. To ensure performance, a dedicated 10 Gigabit Ethernet network is generally required. To ensure network redundancy, two switches are also needed, with each server connected to both.

 

One important point here: the two 10 Gigabit switches must be interconnected!

image

 

3. To whom is the service provided?

 

In a virtualized environment, what ultimately gets stored is virtual machines. As mentioned earlier, a virtual machine is saved as files, so the virtualization layer only needs to be able to save files on this storage.

 

image

 

4. How to provide storage?

 

In the traditional storage architecture, all services are delivered through the brain of the storage system, the head.

 

In a distributed design, every node both provides storage resources and accesses storage, so every node runs the relevant components.

 

image

 


 

 

03

VMware vSAN

 

VMware officially launched vSAN in 2013. Its full name is VMware Virtual SAN. Its architecture closely resembles the distributed storage architecture described in Chapter 2, with some design refinements, and it combines enterprise-grade safety with consumer-grade simplicity.

 

image

 

The following three key words explain vSAN:

 

Pooling:

 

The first thing vSAN does is pool the disks of multiple servers in the cluster.

 

1. Connection inside the server

The architecture of an x86 server means that each server needs a RAID card in order to use its hard disks. The RAID card aggregates multiple hard disks and connects them to the motherboard.

 

Each vSAN server needs at least one SSD and one HDD. HDDs provide storage capacity, and the SSD is used only as a read/write cache.

 

When choosing a RAID card, one that supports pass-through mode is recommended. In this mode it is very convenient to replace hard disks and add new ones. If RAID is configured under vSAN instead, you often have to reboot the host and enter the BIOS to make the change.

 

Some people may ask: if RAID is not configured, what happens to the data when a hard disk fails? Don't worry. vSAN has its own redundancy mechanism, which is discussed below.

 

70% of the SSD capacity is used as read cache and 30% as write cache. By default, all written data lands on the SSD first to reduce write latency, and is then gradually written to the HDDs (a process called destaging). For reads, vSAN uses its own algorithms to decide which data is hot and pre-caches it on the SSD to speed up reads.
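As a quick worked example of the 70/30 split described above, the sketch below divides a cache SSD into its read and write portions. The function name and the 400 GB size are only illustrative, and the sketch ignores any formatting overhead.

```python
def split_cache_gb(ssd_gb: float, read_share: float = 0.70) -> tuple[float, float]:
    """Split a cache SSD into (read cache, write buffer) sizes in GB."""
    if not 0.0 <= read_share <= 1.0:
        raise ValueError("read_share must be between 0 and 1")
    read_gb = ssd_gb * read_share
    write_gb = ssd_gb - read_gb  # the remainder absorbs incoming writes
    return read_gb, write_gb


read_gb, write_gb = split_cache_gb(400)
print(f"400 GB SSD: {read_gb:.0f} GB read cache, {write_gb:.0f} GB write cache")
```

For a 400 GB SSD this works out to 280 GB of read cache and 120 GB of write cache.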

 

For this reason, we usually do not recommend using the RAID card's built-in read/write cache, because vSAN already provides its own optimized read and write caching.

 

It is important to note that vSAN requires the SSDs, HDDs, and I/O controller to be on the vSAN compatibility list. Otherwise the system may be unstable.

 

2. Connection between servers

Inside a server, the hard disks are gathered together through a RAID card, and between servers, a network is required to ensure that data can communicate.

 

Of course, Ethernet alone cannot connect multiple hard drives. An intermediary is needed, and that intermediary is the vSAN process embedded in vSphere. For two network nodes to communicate, both must have IP addresses. This is the role of the vSAN VMkernel: each host needs a vSAN VMkernel and a dedicated network to ensure fast, stable data transfer between nodes.

 

image

 

With the above elements, you only need to enable the vSAN function in the vSphere cluster, and the disks of all hosts will form a logical storage pool.

 

Fault domain:

 

1. As mentioned earlier, traditional storage uses RAID 1 or RAID 5 to protect against a single disk failure. Distributed storage must likewise avoid single points of failure.

 

vSphere provides HA to ensure that services can run on other hosts after a single host fails. The unit of failure here is the host, and vSAN inherits this convention. This is reasonable, because hosts also need hardware maintenance, and during maintenance all the storage resources a host provides go offline.

 

Given this convention, data placement has to follow a rule to ensure that data is not lost.

The copies of the same data of a virtual machine must be stored on different hosts.

 

image

 

Combining the architecture above with the vSAN network architecture, what happens if one host's network has a problem? (In that case the host cannot tell whether its own network is down or the other hosts are; it only knows that it cannot communicate with the other nodes.)

 

An arbitration mechanism is then needed to guarantee that only one copy of the data is active and up to date at any given time. Otherwise conflicts will occur.

 

Therefore, in the architecture shown above, an additional arbitration (witness) file is created for each piece of data and stored on a third host.

 

image
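The placement rule above (copies on different hosts, arbitration file on yet another host) can be sketched as follows. This is a deliberately simplified model, not vSAN's actual placement logic: it assumes FTT+1 copies and FTT arbitration files, picks hosts in list order, and uses made-up host names.

```python
def place_object(hosts: list[str], ftt: int = 1) -> dict[str, list[str]]:
    """Assign data copies and arbitration (witness) files to distinct hosts.

    Simplified assumption: ftt + 1 copies and ftt witnesses, each on its own host.
    """
    copies_needed = ftt + 1
    witnesses_needed = ftt
    if len(set(hosts)) != len(hosts):
        raise ValueError("host names must be unique")
    if len(hosts) < copies_needed + witnesses_needed:
        raise ValueError(
            f"FTT={ftt} needs at least {copies_needed + witnesses_needed} hosts"
        )
    return {
        "copies": hosts[:copies_needed],
        "witnesses": hosts[copies_needed:copies_needed + witnesses_needed],
    }


# Hypothetical three-host cluster protecting one object with FTT=1.
print(place_object(["esxi-01", "esxi-02", "esxi-03"], ftt=1))
```

With three hosts and FTT=1, two hosts hold a copy each and the third holds only the arbitration file, which matches the layout in the figure. A real cluster also weighs capacity and balance when choosing hosts.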

 

This is the simplest vSAN architecture. It tolerates the failure of one host. It also tolerates any hardware failure within a host, as long as the failures are confined to a single host.

 

The following figure is a simple representation of the vSAN fault domain. vSAN has a term called FTT (Failures to Tolerate), which is the maximum number of hosts allowed to fail at the same time. FTT determines the data protection level of a virtual machine and also the minimum size of a cluster: the number of hosts in a cluster must be >= 2N+1, where N is the FTT value.

 

image
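The 2N+1 rule is easy to check with a few lines of code. The helper below is only an illustration of the formula from the text; the function name is made up.

```python
def min_hosts(ftt: int) -> int:
    """Minimum number of hosts in a cluster for a given FTT, using 2N+1."""
    if ftt < 0:
        raise ValueError("FTT cannot be negative")
    return 2 * ftt + 1


for ftt in (1, 2, 3):
    print(f"FTT={ftt}: at least {min_hosts(ftt)} hosts")
```

FTT=1 therefore needs at least 3 hosts, and FTT=2 needs at least 5.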

 

2. As mentioned above, each host needs a RAID card to connect multiple hard drives, plus at least one SSD and one HDD. The SSD serves only as a read/write cache. For cost reasons it is not practical to pair every HDD with its own SSD, so multiple HDDs have to share one SSD.

 

This is what the vSAN disk group is for. A disk group is a logical group that lets multiple HDDs share one SSD.

 

A vSAN disk group and a RAID group are therefore entirely different concepts. Data is placed automatically in units of fault domains. You cannot manually choose which host it is saved on, let alone which disk group.

 

vSAN requires each disk group to contain at least one SSD + one HDD and at most one SSD + seven HDDs. A host cannot have more than five disk groups.

 

image
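The disk group limits above can be expressed as a small validation check. This is only a sketch of the rules stated in the text (one SSD per group, one to seven HDDs per group, at most five groups per host); the function and constant names are hypothetical and do not correspond to any vSphere API.

```python
MAX_HDDS_PER_GROUP = 7
MAX_GROUPS_PER_HOST = 5


def validate_host(disk_groups: list[tuple[int, int]]) -> list[str]:
    """Check a host's disk groups, given as (ssd_count, hdd_count) pairs.

    Returns a list of human-readable problems; an empty list means valid.
    """
    problems = []
    if len(disk_groups) > MAX_GROUPS_PER_HOST:
        problems.append(
            f"{len(disk_groups)} disk groups exceeds the limit of {MAX_GROUPS_PER_HOST}"
        )
    for index, (ssds, hdds) in enumerate(disk_groups, start=1):
        if ssds != 1:
            problems.append(f"group {index}: needs exactly one SSD, found {ssds}")
        if not 1 <= hdds <= MAX_HDDS_PER_GROUP:
            problems.append(
                f"group {index}: needs 1 to {MAX_HDDS_PER_GROUP} HDDs, found {hdds}"
            )
    return problems


print(validate_host([(1, 4)]))          # a valid single-group host
print(validate_host([(1, 8), (0, 2)]))  # too many HDDs, and a group with no SSD
```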

 

Letting multiple HDDs share one SSD saves cost, but it also carries some risk. For example, if the SSD fails, the data of the entire disk group becomes inaccessible. It is therefore generally recommended to spread data across multiple disk groups to reduce the impact of such a failure.

 

For example: 

Cluster 1: One disk group per host, consisting of one 400 GB SSD + four 800 GB HDDs

Cluster 2: Two disk groups per host, each consisting of one 200 GB SSD + two 800 GB HDDs

 

If one SSD in cluster 1 fails, the vSAN storage pool loses 3.2 TB of raw capacity.

If one SSD in cluster 2 fails, the vSAN storage pool loses 1.6 TB of raw capacity.
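The arithmetic behind these two numbers is simply the HDD capacity of the affected disk group, since the SSD contributes cache rather than capacity. A minimal sketch, with an illustrative function name:

```python
def raw_capacity_lost_gb(hdds_in_group: int, hdd_size_gb: int) -> int:
    """Raw capacity that goes offline when a disk group's cache SSD fails."""
    # Every HDD behind the failed SSD becomes inaccessible.
    return hdds_in_group * hdd_size_gb


cluster_1 = raw_capacity_lost_gb(hdds_in_group=4, hdd_size_gb=800)
cluster_2 = raw_capacity_lost_gb(hdds_in_group=2, hdd_size_gb=800)
print(f"Cluster 1 loses {cluster_1 / 1000:.1f} TB, cluster 2 loses {cluster_2 / 1000:.1f} TB")
```

Both hosts offer the same total raw capacity (3.2 TB), but splitting it across two disk groups halves the blast radius of a single SSD failure.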

 

Differentiated Services:

 

As long as there are different types of services, there must be differentiated services.


In traditional storage, services are differentiated at the storage volume level. If a volume is built on RAID 10, the workloads on top of it get the protection level and performance of RAID 10. If a volume is built on RAID 5, they get the protection level and performance of RAID 5.

 

1. As mentioned in the fault domain discussion, vSAN needs at least three nodes to ensure that data is not lost when a single node fails.

 

If you want the data to remain accessible even when two nodes fail (that is, FTT=2), how should the data be stored?

 

Someone may rush to answer: keep three copies of the data plus one arbitration file, stored on four nodes, as shown in the following figure:

 

image

 

Unfortunately, this architecture carries a risk of split-brain. If the hosts happen to be isolated in pairs, there is no way to determine which pair holds the active data and which pair's storage resources should be suspended.

 

Split-brain can therefore be avoided by adding another witness node (in practice, data is not necessarily placed exactly as in the following figure).

 

To achieve the redundancy level shown in the figure below, you only need to modify the virtual machine's storage policy and change FTT=1 to FTT=2.

 

image
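Why four nodes can split-brain while five cannot comes down to majority voting. The sketch below assumes a simplified model in which each node holding a copy or an arbitration file has exactly one vote and a network partition keeps its data active only if it holds a strict majority. Real vSAN vote assignment may differ.

```python
def has_quorum(votes_reachable: int, votes_total: int) -> bool:
    """A partition may stay active only with a strict majority of votes."""
    return votes_reachable * 2 > votes_total


def active_partitions(partition_sizes: list[int]) -> list[int]:
    """Return the sizes of the partitions that keep quorum after a network split."""
    total = sum(partition_sizes)
    return [size for size in partition_sizes if has_quorum(size, total)]


print(active_partitions([2, 2]))  # four nodes isolated in pairs
print(active_partitions([3, 2]))  # five nodes split into 3 and 2
```

With four votes split 2/2, neither side has a majority, so no side can safely claim the data. With an odd total of five votes, any split into two groups leaves exactly one side with a majority.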

 

 

2. vSAN provides differentiated services for different objects through storage policies.

 

For example:

Set storage policy A for virtual machine 1 (FTT=1, no cache reserved, IOPS limited to 100)

Set storage policy B for virtual machine 2 (FTT=2, 10% of SSD cache reserved, no IOPS limit)
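One way to picture a storage policy is as a small record of settings attached to a virtual machine. The sketch below models the two example policies as plain data. The class and field names are hypothetical and are not the vSphere API; they only mirror the three settings mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StoragePolicy:
    """Illustrative model of a storage policy (not a real vSphere type)."""
    name: str
    ftt: int                            # failures to tolerate
    cache_reservation_pct: float = 0.0  # share of SSD cache reserved
    iops_limit: Optional[int] = None    # None means no limit


policy_a = StoragePolicy(name="A", ftt=1, iops_limit=100)
policy_b = StoragePolicy(name="B", ftt=2, cache_reservation_pct=10.0)

# Hypothetical assignment of policies to virtual machines.
assignments = {"vm-1": policy_a, "vm-2": policy_b}

for vm, policy in assignments.items():
    limit = "unlimited" if policy.iops_limit is None else str(policy.iops_limit)
    print(f"{vm}: policy {policy.name}, FTT={policy.ftt}, "
          f"cache reserved={policy.cache_reservation_pct}%, IOPS limit={limit}")
```

The point of the model is that protection and performance are chosen per virtual machine, not per volume as in traditional storage.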

 

Wait! A new term appeared above: object. What is an object?

 

A file, such as vmdk, is an object in vSAN;

A snapshot file is an object;

The virtual machine swap file is an object;

Virtual machines generally have a folder to store related vmx files, log files, etc., and this folder (VM home) is also an object under vSAN.

 

In other words, in the vSAN world a virtual machine is made up of multiple objects.

 

The object is not the smallest unit. An object is made up of multiple components. How an object is divided into components is determined by the storage policy.

 

image

image

 

Each component has a size limit of 255 GB, so objects larger than 255 GB are forcibly split into multiple components.

 

image

image
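The effect of the 255 GB limit can be estimated with a ceiling division. The sketch below counts only the split forced by size for a single copy of an object. It ignores any further splitting or replication that the storage policy may cause, and the function name is illustrative.

```python
import math

MAX_COMPONENT_GB = 255


def components_per_copy(object_gb: float) -> int:
    """Minimum number of components one copy of an object is split into."""
    if object_gb <= 0:
        raise ValueError("object size must be positive")
    return math.ceil(object_gb / MAX_COMPONENT_GB)


for size_gb in (100, 255, 600):
    print(f"{size_gb} GB object: {components_per_copy(size_gb)} component(s) per copy")
```

A 600 GB virtual disk, for instance, needs at least three components per copy, because two components could hold only 510 GB.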


 

That covers the main implementation principles of vSAN.

 

To learn more about vSAN, I recommend reading the vSAN-related materials, especially the vSAN Design Guide.

 

How is vSAN actually configured? Open the demo below on a computer and follow the instructions to create it step by step.


Origin blog.csdn.net/z136370204/article/details/115160219