VMware’s HCI (Hyper-Converged Infrastructure) solution is VSAN, or Virtual SAN. It’s a kernel-based solution built right into the ESXi hypervisor that allows each server to pool its local storage and share it back out to the VMware cluster. A minimum of 2 hosts plus a witness appliance is required, or 3 or more hosts, to maintain quorum. 4 hosts are typically recommended in order to maintain availability when 1 host is in maintenance mode or offline. Local disks in each host are combined into Disk Groups, each of which consists of 1 cache SSD and 1 or more capacity drives (SSD or HDD).
Data protection is configured via storage policies, which are applied at the per-VM object level. We specify the number of failures to tolerate (FTT) and the stripe width for the object. FTT determines how many hosts the data will reside on, and striping determines how many capacity disks each copy of the data is striped across for increased IO.
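To make the FTT setting more concrete, here is a minimal sketch of how the host count grows with FTT when objects are protected by mirroring. It assumes the usual rule of thumb of FTT + 1 full copies and 2 x FTT + 1 hosts (the extra hosts hold witness components for quorum); the function name is made up for illustration and is not a VMware API.

```python
def mirroring_layout(ftt: int) -> dict:
    """Copies and hosts needed for a mirrored object at a given FTT (rule of thumb)."""
    if ftt < 0:
        raise ValueError("ftt must be non-negative")
    copies = ftt + 1         # full replicas of the data
    min_hosts = 2 * ftt + 1  # replicas plus witnesses, so a majority survives
    return {"copies": copies, "min_hosts": min_hosts}


for ftt in (1, 2, 3):
    print(f"FTT={ftt}: {mirroring_layout(ftt)}")
```

With FTT=1 this gives 2 copies on a minimum of 3 hosts, which lines up with the 3-host minimum mentioned above.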
VSAN 6.x is offered in 2 types of deployments: Hybrid, and All-Flash.
Hybrid:
In a hybrid VSAN deployment, the disk groups will contain HDDs for the capacity layer and storage of VM files. The SSD is used as a read-write cache, with 30% of it used for write cache, and 70% used for read caching. This SSD should be sized to at least 10% of the capacity drives. By utilizing the SSD as a caching layer, the slower HDDs are given a performance boost, and the cluster storage delivers far more performance than HDDs alone could accomplish.
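As a quick sanity check on the sizing rules above, the sketch below splits a cache SSD 30/70 and checks it against the 10% guideline. The drive sizes (one 480 GB SSD in front of four 1 TB HDDs) are illustrative values, and the helper names are hypothetical.

```python
def split_hybrid_cache(cache_gb: float) -> tuple:
    """Return (write_cache_gb, read_cache_gb) using the 30/70 split."""
    write_gb = cache_gb * 30 / 100
    read_gb = cache_gb * 70 / 100
    return write_gb, read_gb


def minimum_cache_gb(capacity_gb: float) -> float:
    """Smallest cache SSD under the 10%-of-capacity guideline."""
    return capacity_gb * 10 / 100


capacity_gb = 4 * 1000  # four 1 TB HDDs in one disk group (illustrative)
cache_gb = 480          # one 480 GB cache SSD (illustrative)

print(split_hybrid_cache(cache_gb))                # (144.0, 336.0)
print(minimum_cache_gb(capacity_gb))               # 400.0
print(cache_gb >= minimum_cache_gb(capacity_gb))   # True
```

So in this disk group, roughly 144 GB buffers writes and 336 GB caches reads, and the 480 GB SSD clears the 400 GB minimum.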
All Flash:
In an All Flash configuration, there are no HDDs in the hosts, as the capacity drives are SSD. There is 1 SSD (per disk group) still used for caching; however, the entire drive is used for write cache, and not utilized for read caching, since reading from the SSDs in the capacity layer is fast. It’s recommended to use a faster SSD for the caching drive, with consistent write performance. These drives are usually more expensive than the larger capacity enterprise SSDs which will be in the capacity layer.
Technical benefits of All-Flash VSAN
Erasure coding – Erasure coding is a method of striping data across the VSAN nodes and maintaining parity. Basically, it is like a network RAID 5 for your data. This assists in capacity savings: in an FTT=1 scenario with mirroring, a 100GB data object would consume 200GB of space in order to withstand a host failure. With erasure coding, that 100GB is striped across the nodes, with parity information accompanying it, and consumes only 1.33 times the space instead of 2 times. An option for RAID 6 (double parity) also exists, and would consume 1.5 times the size of the data object.
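The multipliers above fall straight out of the ratio of total pieces to data pieces. Here is a small sketch of that arithmetic, assuming a 3+1 layout for RAID 5 and a 4+2 layout for RAID 6; the function and dictionary names are just for illustration.

```python
def space_multiplier(data_parts: int, parity_parts: int) -> float:
    """Raw space consumed per unit of data: (data + parity) / data."""
    return (data_parts + parity_parts) / data_parts


SCHEMES = {
    "Mirror, FTT=1": space_multiplier(1, 1),  # one extra full copy
    "RAID 5 (3+1)": space_multiplier(3, 1),
    "RAID 6 (4+2)": space_multiplier(4, 2),
}

object_gb = 100
for name, multiplier in SCHEMES.items():
    raw_gb = object_gb * multiplier
    print(f"{name}: {multiplier:.2f}x, {raw_gb:.1f} GB raw for a {object_gb} GB object")
```

For the 100GB object this works out to 200 GB mirrored, about 133 GB with RAID 5, and 150 GB with RAID 6.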
Deduplication and Compression – If enabled, VSAN will deduplicate data as it leaves the caching tier and is de-staged into the capacity tier. Data is deduplicated at a block size of 4k. During the de-staging process, the deduplicated blocks are then compressed only if they can reach a target block size of 2k. By only compressing blocks that can reach the target size, less CPU time is spent on the compression process, speeding up the IO, as blocks that cannot reach the target would not benefit enough from compression to justify the CPU cycles. These 2 processes allow the data to consume much less storage on the capacity tier. While results vary depending on data sets, customers can expect to average between a 3x and 7x deduplication rate.
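The de-staging logic described above can be sketched in a few lines. This is only a toy model of the idea, not how VSAN is implemented: SHA-256 fingerprints and zlib are stand-ins chosen because they ship with Python, and all names here are hypothetical.

```python
import hashlib
import os
import zlib

BLOCK_SIZE = 4096   # deduplication granularity from the text (4k)
TARGET_SIZE = 2048  # a block is stored compressed only if it fits in 2k


def destage(data: bytes):
    """Deduplicate 4k blocks, then compress the unique ones that reach the target."""
    store = {}   # fingerprint -> (is_compressed, payload)
    layout = []  # ordered fingerprints needed to rebuild the data
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        fingerprint = hashlib.sha256(block).hexdigest()
        if fingerprint not in store:  # only unseen blocks cost any space
            packed = zlib.compress(block)
            if len(packed) <= TARGET_SIZE:
                store[fingerprint] = (True, packed)
            else:
                store[fingerprint] = (False, block)  # not worth compressing
        layout.append(fingerprint)
    return store, layout


def stored_bytes(store) -> int:
    """Total payload bytes that would land on the capacity tier."""
    return sum(len(payload) for _, payload in store.values())


zero_block = bytes(BLOCK_SIZE)         # highly compressible
noisy_block = os.urandom(BLOCK_SIZE)   # effectively incompressible
store, layout = destage(zero_block * 3 + noisy_block)
print(len(layout), "logical blocks,", len(store), "unique blocks,",
      stored_bytes(store), "bytes stored")
```

Three identical blocks collapse into one stored block that compresses well, while the random block is kept as-is because it cannot reach the 2k target. That random block behaves much like encrypted data would.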
Financial Benefits of All-Flash VSAN:
While building out a greenfield All-Flash VSAN environment appears to be more expensive than a Hybrid deployment, the opposite is actually true! If cost is measured against the raw capacity of the drives, then the Hybrid array will be cheaper. However, by utilizing erasure coding and deduplication, you effectively change the equation: raw capacity no longer matters, and the focus shifts to usable capacity, which will be greater.
Example: (Just an example – in a real-world scenario, more hosts would be needed to satisfy feature requirements and failover capacity)
Hybrid VSAN:
- 3 x hosts (each)
- 1 x 480 GB SSD (cache)
- 4 x 1 TB HDD (capacity)
- Total cache – 1.44TB
- Total capacity – 12 TB
All-Flash VSAN:
- 3 x hosts (each)
- 1 x 480 GB SSD (cache)
- 4 x 480 GB SSD (capacity)
- Total cache – 1.44 TB
- Total capacity – 5.76 TB
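The totals in both lists are just per-host figures multiplied by the host count. A minimal sketch of that arithmetic, using decimal units (1 TB = 1000 GB) and a hypothetical helper name:

```python
def cluster_totals(hosts: int, cache_gb: int, capacity_drives: int, drive_gb: int):
    """Return (total_cache_tb, total_capacity_tb) for identical hosts."""
    total_cache_tb = hosts * cache_gb / 1000
    total_capacity_tb = hosts * capacity_drives * drive_gb / 1000
    return total_cache_tb, total_capacity_tb


hybrid = cluster_totals(hosts=3, cache_gb=480, capacity_drives=4, drive_gb=1000)
all_flash = cluster_totals(hosts=3, cache_gb=480, capacity_drives=4, drive_gb=480)

print("Hybrid:    cache %.2f TB, capacity %.2f TB" % hybrid)
print("All-Flash: cache %.2f TB, capacity %.2f TB" % all_flash)
```

Both clusters end up with 1.44 TB of cache, while raw capacity is 12 TB for Hybrid and 5.76 TB for All-Flash.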
Virtual Machines provisioned – 5TB
In the hybrid array, we will assume FTT=1 with mirroring, so our VMs will consume 2x the storage space to maintain availability. In the All-Flash VSAN, we will assume that erasure coding is enabled (RAID 5) and that deduplication and compression are also turned on. [Note that this is an example to show space saving benefits. Erasure coding requires 3+1 hosts (RAID 5), or 4+2 hosts (RAID 6)].
Hybrid Array:
- 2x consumption: 5TB x 2 = 10TB of space utilized.
All-Flash VSAN:
- 1.33x consumption from erasure coding: 5TB x 1.33 = 6.65TB of space utilized (3.35TB savings compared to Hybrid)
- Deduplication and compression at an estimated 3.5x (conservative): 6.65TB / 3.5 = 1.9TB of space utilized (additional 4.75TB savings)
Total All-Flash VSAN savings: 8.1 TB
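For anyone who wants to plug in their own numbers, the comparison above can be reproduced with a short script. It uses the same rounded 1.33 multiplier and the 3.5x reduction estimate from the example; the variable names are only for illustration.

```python
VM_TB = 5.0              # virtual machines provisioned
MIRROR_MULTIPLIER = 2.0  # Hybrid, FTT=1 with mirroring
RAID5_MULTIPLIER = 1.33  # rounded figure used in the example
DEDUP_RATIO = 3.5        # estimated deduplication and compression ratio

hybrid_tb = VM_TB * MIRROR_MULTIPLIER
erasure_coded_tb = VM_TB * RAID5_MULTIPLIER
all_flash_tb = erasure_coded_tb / DEDUP_RATIO

print(f"Hybrid consumption:        {hybrid_tb:.2f} TB")
print(f"After erasure coding:      {erasure_coded_tb:.2f} TB "
      f"(saves {hybrid_tb - erasure_coded_tb:.2f} TB)")
print(f"After dedup + compression: {all_flash_tb:.2f} TB "
      f"(saves another {erasure_coded_tb - all_flash_tb:.2f} TB)")
print(f"Total savings:             {hybrid_tb - all_flash_tb:.2f} TB")
```

Changing `DEDUP_RATIO` is the quickest way to see how sensitive the result is to the data set, since that estimate drives most of the savings.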
So, even when purchasing smaller-capacity SSDs at a price point more comparable to the larger HDDs, we are still able to better utilize the storage. In fact, in the hybrid example above, we could expand our VMs by no more than 1 TB (ignoring space for snapshots and swap), since only 2 TB of raw capacity remains and every VM consumes 2x its size. However, with All-Flash, we are able to expand our VM workload by 300% (3.8TB).

Cost is still a big factor here – hybrid is still cheaper. You fail to mention that enterprise VSAN licensing is needed to make use of dedupe, compression and EC. Not to mention that in a lot of circumstances larger workloads can’t make use of the storage efficiency. For SQL, for example, EC is a performance killer, and most SQL workloads these days are also encrypted, making dedupe and compression ineffective. We are moving towards a time when all flash will be more economical, but I think we’re 2 years away yet – especially with a global shortage on SSD!
Peter,
You bring up great points, and yes – it all depends on what VM workloads are deployed at the client site. Obviously, heavy SQL workloads may not benefit from Erasure Coding, and any encrypted VMs will cause issues with any deduplication system. However, this article is aimed at the majority of customers as a whole, and the generalized workloads running.