Software Defined Storage (SDS) is one of those terms that has been readily hijacked by vendors over the past few years.  The term developed from the adoption of Software Defined Networking (SDN), which describes the separation of the control and data planes in the networking world, providing the abstraction needed to deliver more efficient network management and to virtualise network functionality.  Where SDN was reasonably easy to define, SDS has been less clear.  Looking at the SDS Wikipedia page, there is far less detail there than for SDN, with only a vague description of what the characteristics of SDS should be.

I’ve attempted to add my own definition and discussed the subject at TechUnplugged in Austin earlier this year (see slide 13 in this deck).  Part of the problem with finding an adequate definition is that data storage has two components: a persistence side, covering how data is stored and recalled, and a transmission side, covering how data passes from host to external storage.  SDN, by contrast, only has to worry about data in transit, so has fewer concerns around performance and throughput as far as an individual host is concerned.  To add to the confusion, storage is moving back into the server with hyper-converged solutions, making it even more difficult to come up with a consistent definition.

Object Storage as a Bridgehead

Looking at how object storage has developed over the last 6-7 years, we’ve seen many entrants come to market that are purely software-based.  NooBaa is one of the newest start-ups in this market, launching at VMworld in August 2016.  However, Scality, Cleversafe, Caringo, Cloudian, OpenIO and Ceph are all purely software-based (even if they are resold with hardware by some vendors).  There are few vendors with products that are hardware-focused (notably EMC & DDN, although DDN has a software offering too).  So why has object storage been a more natural fit for SDS?  Here are some thoughts:

  • Performance – object stores are less dependent on the performance and latency of each individual I/O.  Storing and retrieving objects is focused more on data throughput than latency, which is much easier to achieve in a scale-out model: data can be scattered over many nodes, with any individual failure having less impact on the overall performance of a request.  There also aren’t usually multiple tiers of storage in an object store, so data can be widely distributed across nodes without direct concern for individual I/O performance.
  • Capacity – object stores are designed for very large capacity and that by nature implies commodity hardware.  No-one wants to pay standard block-based vendor pricing for object storage systems.  The economics of the data access profile and the data itself mean much of the data may be inactive and not justify expensive storage.
  • Management – object stores are almost exclusively driven using web-based protocols (HTTP/S & REST) and managed with web GUIs, which fits the SDS management definitions nicely (see the short sketch after this list).
  • Improvements in Technology – the evolution of server components (processors, memory, bus speeds) means object stores can be built reliably from commodity components and implement fault tolerance at the disk and/or node level.  Processor improvements mean functions like compression and erasure coding can be achieved with standard x86 CPUs rather than dedicated hardware.
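
To make the management point concrete, here is a minimal sketch of driving an object store purely over HTTP/S and REST.  It uses Python with the boto3 SDK against an S3-compatible API, which most of the object stores mentioned above expose; the endpoint URL, bucket name and credentials are placeholder assumptions, not details of any particular product.

```python
# Minimal sketch: an object store is driven entirely over HTTP/S via a REST
# API, here the S3-compatible interface.  The endpoint, bucket and credentials
# below are placeholders for illustration only.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://objectstore.example.com",  # hypothetical S3-compatible endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Store an object: a single HTTP PUT, no LUNs to provision, no filesystem to mount.
s3.put_object(Bucket="demo-bucket", Key="reports/2016/q3.csv", Body=b"some,data\n1,2\n")

# Retrieve it again: a single HTTP GET.
response = s3.get_object(Bucket="demo-bucket", Key="reports/2016/q3.csv")
print(response["Body"].read())
```

The same two calls work whether the target is a single test node or a multi-petabyte cluster, which is precisely the management abstraction SDS is meant to provide.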

SDS for All

Looking at wider storage usage, file- and block-based SDS solutions already exist, but perhaps don’t have the same adoption rates as object storage.  Unfortunately there aren’t any figures to corroborate this, other than looking at how the hyper-converged market has grown, with Gartner predicting a market size of $2 billion this year and $5 billion by 2019.  SDS underpins hyper-converged solutions, and many SDS vendors have pivoted to cover hyper-convergence in their offerings.  So SDS appears to be on the cusp of widespread adoption.

The Architect’s View

There’s an assumption that hyper-converged solutions will subsume the traditional storage market; however, I think we’re more likely to see a big uptake in SDS.  Dedicated hardware will (eventually) become as niche as the mainframe, but won’t totally disappear.  The more interesting trend will be how pricing changes for storage.  The existing model of $/GB charged differently by tier is hard to justify (and police) for a pure software solution, so pricing will move to a flat rate per node, per GB and/or per feature.  Essentially, SDS pricing will move to align with that of the public cloud.

 

Comments are always welcome; please read our Comments Policy first.  If you have any related links of interest, please feel free to add them as a comment for consideration.  

Copyright (c) 2009-2016 – Chris M Evans, first published on https://blog.architecting.it, do not reproduce without permission.

 


Written by Chris Evans

  • Well, it looks like there will only be two types of storage in the not-too-distant future. Flash arrays for hot or transactional data and object storage for everything else. When the price of SSDs gets down to $0.15/GB it will start replacing some uses for HDDs in object storage clusters. Eventually every use case for object storage will use flash. Flash SSDs have already won the capacity competition with HDDs. Price remains the last barrier but probably not for much longer.

    • Tim, agreed that prices will continue to drop – I guess the question is whether pricing will drop to the point that SSDs do become cheaper than HDDs, given that HDDs continue to grow in capacity and drop in price too.  Then there’s the problem of manufacturing capacity.  There’s still masses of capacity to make HDDs; I wonder whether we will be able to keep up with SSD demand.  This has been a problem for some time.

      • Chris, the price of SSDs does not actually have to fall below the price of HDDs for the wholesale switch to begin. When you factor in the costs for electricity, cooling and real estate to operate a “fleet” of HDDs, SSDs don’t have to be cheaper than HDDs in order to be competitive. I’ve noticed that the price per GB for HDDs stopped falling a while ago, and now the production of HDDs has started falling. This would indicate that there is an excess of manufacturing capacity for HDDs. I agree that flash manufacturing capacity is a short term constraint on a major shift to SSDs, but manufacturers who see the growing demand for flash will create the supply, even if it takes a couple of years to bring new fabs online.

        • Tim, true, fair point on the power/cooling, however I think there’s a psychological barrier on price that has to be broken too, irrespective of the TCO-based cost, because people tend to price in broad terms based on pure $/GB. I couldn’t say where that is, but it will be slightly more than the cost of disk, but not much. The fab development will be interesting; there’s a massive latent demand once SSD costs come down a little further, so big opportunities for Samsung, Micron, Toshiba, etc.

          • Chris, well it is a guess, but I think when the price per GB for SSDs is $0.10 more than the price per GB for HDDs, the wholesale shift to SSDs will begin. For example, if the price of a 1TB HDD is $0.05 per GB, a price of $0.15 per GB for a 1TB SSD will trigger the decision to go with SSDs over HDDs. This is more than a penny or two of difference, but if you factor in the operating costs I mentioned, I think you will be able to cover the gap between the two when it gets to about $0.10 per GB.

          • Shanmuga Sundaram

            I think SSD is already even with HDD when you include the de-dupe/compression offered by most all-flash array vendors. A 3:1 ratio seems entirely possible, so even now the cost is not a huge factor.
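
For what it’s worth, Tim’s break-even argument above can be sketched as simple arithmetic.  The acquisition prices below are the example figures from the thread ($0.05/GB for HDD, $0.15/GB for SSD); the operating costs and the five-year service life are purely illustrative assumptions, not real market data.

```python
# Rough sketch of the $/GB break-even discussed in the comments above.
# Acquisition prices are the example figures from the thread; the opex
# figures and service life are illustrative assumptions only.

def lifetime_cost_per_gb(acquisition, opex_per_gb_per_year, years):
    """Total cost per GB over the assumed service life of the media."""
    return acquisition + opex_per_gb_per_year * years

YEARS = 5  # assumed service life

# Assumed power/cooling/real-estate cost per GB per year (higher for HDD:
# spinning media draws power constantly and packs fewer GB per rack unit).
hdd = lifetime_cost_per_gb(acquisition=0.05, opex_per_gb_per_year=0.025, years=YEARS)
ssd = lifetime_cost_per_gb(acquisition=0.15, opex_per_gb_per_year=0.005, years=YEARS)

print(f"HDD lifetime cost/GB: ${hdd:.3f}")  # $0.175
print(f"SSD lifetime cost/GB: ${ssd:.3f}")  # $0.175
# With these assumptions, the ~$0.10/GB purchase-price gap is absorbed by
# operating costs, which is the break-even point described above.
```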