Challenging the Dell/HPE Duopoly with Storage Spaces Direct

I've recently spent a fair amount of time looking at hyperconverged solutions based on Windows Server 2016's Storage Spaces Direct - aka S2D.

To quote Microsoft, in case you're not familiar with S2D:
"Storage Spaces Direct uses industry-standard servers with local-attached drives to create highly available, highly scalable software-defined storage at a fraction of the cost of traditional SAN or NAS arrays."

You can read about it in more detail here, but to summarise: S2D runs on 2-16 servers, keeping either 2 or 3 copies of the data, in either a hybrid configuration (flash cache, HDD capacity) or all-flash (cache optional).

We can also reserve capacity as spare so the system can self-heal if a drive or server fails. (This means you could design the system in such a way that it starts to look very much like EMC ScaleIO - i.e. 2 copies of the data with a full node's worth of spare capacity.)
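To make the trade-off concrete, here's a rough sketch of the capacity arithmetic for that ScaleIO-like design. The node count, drive count and drive size below are illustrative assumptions, not figures from any specific deployment:

```python
# Back-of-envelope usable capacity for a cluster that mirrors data and
# holds back a full node's worth of capacity as spare for self-healing.
# All figures are hypothetical examples.

def usable_capacity_tb(nodes, drives_per_node, drive_tb, copies, spare_nodes=1):
    """Usable capacity (TB) after mirroring and spare reservation."""
    raw = nodes * drives_per_node * drive_tb
    spare = spare_nodes * drives_per_node * drive_tb  # reserved for rebuilds
    return (raw - spare) / copies

# ScaleIO-like design: 2 copies of the data, one node's worth of spare.
print(usable_capacity_tb(nodes=10, drives_per_node=12, drive_tb=1.92, copies=2))
```

The same function with copies=3 shows the obvious cost of the extra resilience discussed below: usable capacity drops by a third again.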

Overall it looks like quite an attractive proposition, particularly if you're an all Microsoft / Hyper-V shop and are going to have to buy the licenses for this stuff anyway.

This commodity-based, distributed, highly redundant, self-healing architectural approach allows us to challenge the assumptions we make around our support requirements when choosing a hardware vendor, opening up our options beyond the world of Dell and HPE. (And saving us a whole heap of cash)

There are two main reasons for this:

First of all, we no longer have to worry about proprietary hardware with limited availability, like a blade centre or a traditional monolithic SAN, since both our storage and compute now run on standard form-factor x86 servers.

Whichever vendor we choose, it’s the same Intel chipsets, the same Intel CPUs, the same SSDs and HDDs, the same NICs, etc. These components are widely available from a myriad of suppliers.

Secondly, assuming we choose to store 3 copies of all the data (which will always be on different physical disks, in different physical servers), even if an entire server fails we still have 2 copies of all the data.

Add to this enough reserved capacity that we can automatically restore that 3rd copy of the data with no user intervention required, and we can effectively remove the need for same-day hardware support.
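The placement guarantee above is what makes this work. This toy sketch (a round-robin placement, not S2D's actual algorithm) illustrates the invariant: if every extent's 3 copies sit on 3 different servers, losing any single server still leaves 2 copies of everything:

```python
# Toy illustration of 3-way mirroring across distinct servers.
# This is NOT the real S2D placement logic - just a demonstration of
# why a single node failure never drops below 2 surviving copies.

nodes = [f"node{i}" for i in range(1, 11)]  # a 10-node cluster

def place(extent_id, copies=3):
    """Pick 'copies' distinct nodes for an extent (round-robin sketch)."""
    return [nodes[(extent_id + k) % len(nodes)] for k in range(copies)]

placements = {e: place(e) for e in range(100)}

# Fail one entire server and count the surviving copies of each extent.
failed = "node3"
survivors = {e: [n for n in p if n != failed] for e, p in placements.items()}
assert all(len(s) >= 2 for s in survivors.values())  # 2 copies everywhere
```

The assertion holds for any single failed node, because no extent ever has two copies on the same server.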

Consider a 10 node all flash S2D cluster, with 12 disks per server. (Like one I recently looked at for a client)

When you look at the actual hardware, the component most prone to failure is a disk.

If I choose to maintain 3 copies of the data and reserve enough capacity to withstand the failure of an entire node, then even if I'm unlucky enough that my disks fail at a rate of one per week (which seems incredibly unlikely), it would take almost 3 months of inaction before my ability to maintain 3 copies of the data was compromised.

With the 10 node all flash cluster I used in my example, the Dell and HPE options came in nearly 50% more expensive than the same configuration based on Quanta.

That's 50% more cost for the sake of getting a disk changed the same day, when getting it changed the same quarter would have been fine.