The term Software Defined Storage (SDS) has been generally accepted within IT as a way of delivering storage without vendor hardware lock-in. How has our understanding of the term evolved over recent years?
It’s easy to use the examples of server virtualisation and networking as reasons why storage virtualisation for SDS is only just around the corner. However, those assumptions are incorrect; as usual, storage is a special case that doesn’t quite fit the mould of compute and network.
Whether we like it or not, storage is different. In a virtual server environment, the image of the server is held in memory, using a data image on disk as the means of maintaining state. Only changes are committed to disk, and these can be asynchronous in nature in order to improve performance. If the physical server is rebooted and the in-memory copy is lost, it is simply reconstituted from the disk image and off we go again. Moving a virtual server around the physical infrastructure is simply a matter of managing data in flight.
In networking, data is transient; it doesn’t reside in the switch other than temporarily, as it moves between compute and compute, or between compute and permanent storage. The data is ethereal, and the network was designed to be just that: capable of losing data during transmission, with higher-level protocols designed to manage that scenario.
Storage arrays (and storage in general) serve a different purpose. Storage is the permanent record of data. It has to be the part of the computing infrastructure that maintains state, even when the power is off. That, as we know, presents special challenges. This has been generally accepted, even for application deployments such as containers.
For networking, “software defined” means splitting the control plane from the data plane. Simply put, the silicon in the network switch responds to management from an external device, directing packets as it is instructed to do so. To carry this analogy into the storage world, we have to look at two pieces: the transmission of data and the storage of data. For transmission, we expect to use Fibre Channel (FC), Fibre Channel over Ethernet (FCoE), iSCSI (SCSI over IP), or perhaps something more bespoke like InfiniBand or more proprietary like FICON. With the move to disaggregated storage, NVMe and NVMe over Fabrics (NVMe-oF) are starting to become popular.
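The control/data plane split described above can be illustrated with a minimal sketch. The classes and names here are entirely hypothetical, not any vendor’s API: the “switch” only consults a forwarding table, while a separate controller is the only component allowed to program that table.

```python
# Minimal, illustrative sketch of control/data plane separation.
# None of these classes correspond to a real product's API.

class Switch:
    """Data plane: forwards traffic purely by table lookup."""
    def __init__(self):
        self.table = {}  # forwarding table, programmed externally

    def forward(self, dst):
        # The switch makes no routing decisions of its own
        return self.table.get(dst, "drop")


class Controller:
    """Control plane: the only component that programs the switch."""
    def __init__(self, switch):
        self.switch = switch

    def install_rule(self, dst, port):
        self.switch.table[dst] = port


sw = Switch()
ctrl = Controller(sw)
ctrl.install_rule("10.0.0.5", "port2")

print(sw.forward("10.0.0.5"))  # forwarded per the controller's rule
print(sw.forward("10.0.0.9"))  # no rule installed, so dropped
```

The point of the sketch is that forwarding behaviour changes only when the external controller rewrites the table; the data path itself stays dumb and fast.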
Static Storage Networking
The problem is, technologies such as Fibre Channel weren’t designed to expect a dynamic network. Source and especially target devices were expected to be static, joining a network and staying there (or at worst, very rarely leaving). This was because the protocol was an extension of existing logic that expected a fixed storage network topology. A Fibre Channel network change was a disruptive event: Registered State Change Notifications (RSCNs) initially notified all devices in the fabric, which, if occurring too often, could cause fabric issues (over time, vendors have minimised the effects of RSCNs). For the networking component, SDS could mean more flexible routing of data across the SAN.
However, the second thing we need to consider is the requirement to store data permanently, a concept that doesn’t exist in Ethernet networking. It isn’t that simple to decide that a data volume or LUN now resides at the end of another connection and for traffic to be routed there. What happens to the existing data? How would data be moved, and how would data integrity be maintained? Most importantly, what happens in a disaster scenario? This is the hardest part of trying to work out what “software defined” means in a storage context. Some vendors have used the idea of storage virtualisation, or of running as a virtual machine, to represent this part of SDS.
So does Software Defined Storage exist today? In limited ways, I think it does. One example is Hitachi’s Universal Volume Manager feature within the VSP platform, also known as external storage virtualisation. This allows data to be written to an abstracted device (which could be an internal disk or an external array) and the control and data paths to be treated separately. The array receives and writes data to the target device, but can be directed to write data to another device through the separate control plane.
This can even include (with Hitachi Availability Manager) redirecting I/O to a secondary device without requiring host interaction, by spoofing WWN addresses. It can also mean redirection within the array using Tiered Storage Manager. Incidentally, the image presented here is a diagram I produced for Hitachi over three years ago. It shows how the technology (which has changed names in some instances) can be placed into layers, in a similar fashion to the one Chuck uses in his presentation. Good ideas never go out of fashion, don’t you think?
VPLEX is another platform that I think fits what SDS could mean. Data can be stored across multiple nodes rather than statically in one place, with the ability to manage that control separately from the data path.
There are also other vendors offering products that fit some aspects of SDS. SolidFire, for instance, creates a cluster of nodes that are managed using REST APIs. The data and the control are separated from each other, with provisioning and management handled separately via API. Other platforms like Nutanix do this too, although they add compute into the mix.
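To make the API-driven provisioning idea concrete, here is a short sketch of what a control-plane request to create a volume might look like. The method and field names are illustrative assumptions, not SolidFire’s actual API; the point is that provisioning is a management-plane JSON call, entirely separate from the data path.

```python
import json

# Hypothetical provisioning payload; field names are illustrative
# and do not reflect any vendor's real API schema.
def build_create_volume_request(name, size_gb, max_iops):
    """Build a JSON payload asking the management plane to provision a volume."""
    return json.dumps({
        "method": "CreateVolume",          # assumed method name
        "params": {
            "name": name,
            "totalSize": size_gb * 1024 ** 3,  # size in bytes
            "qos": {"maxIOPS": max_iops},      # per-volume QoS limit
        },
    })

payload = build_create_volume_request("db-vol01", 100, 15000)
# This payload would be POSTed to the cluster's management endpoint;
# host I/O (e.g. iSCSI to the storage nodes) is untouched by the call.
print(payload)
```

The separation is the interesting part: the same nodes that serve I/O never see this request directly; only the management service interprets it and reconfigures the cluster.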
The Architect’s View
Software Defined Storage is a difficult term to pin down. Today’s storage protocols, along with the need to ensure persistent storage of data whilst maintaining data integrity, mean that the dynamic nature of SDS is hard to achieve. There’s still a way to go before storage can be considered abstracted enough to be termed “software defined”.
Copyright (c) 2009-2018 – Post #9A48 – Chris M Evans, first published on https://blog.architecting.it, do not reproduce without permission. Photo credit iStock.