When object stores first started to appear on the market, the aim was to target a very specific niche.  Unstructured data growth demanded better ways of storing data than file servers.  Object stores offered the ability to scale to billions of objects, without the issues of managing a file system.  However, rewriting applications for REST-based APIs is time-consuming.  Wouldn’t it be nice to have the benefits of object scalability, with the functionality of file?  This is what many vendors now offer, as we see more file and object integration and file protocols on object stores.

Object Scalability

As a physical storage medium, object stores have some great benefits.  They are highly scalable, reaching petabytes of capacity and billions of objects.  With erasure coding, object stores become more reliable as capacity increases.  Data can be dispersed and made accessible across multiple geographies without having to create full replicas that need to be kept in sync.  However, the flexibility of object stores comes at a cost.  An object store offers basic file read/write operations.  Typically these are CRD (Create, Read, Delete), but sometimes CRUD (Create, Read, Update, Delete) where Update can simply be a combination of the other functions.
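To make the CRD/CRUD distinction concrete, here is a minimal sketch of an in-memory object store.  The ObjectStore class and its method names are invented for illustration and don't correspond to any vendor's API; the point is simply that Update can be built from Delete and Create, and that changing part of an object means rewriting all of it.

```python
class ObjectStore:
    """Toy in-memory object store exposing only Create, Read and Delete."""

    def __init__(self):
        self._objects = {}  # key -> immutable bytes

    def create(self, key, data):
        if key in self._objects:
            raise KeyError(f"object {key!r} already exists")
        self._objects[key] = bytes(data)

    def read(self, key):
        return self._objects[key]

    def delete(self, key):
        del self._objects[key]

    def update(self, key, data):
        # No in-place modification: Update is Delete followed by Create.
        self.delete(key)
        self.create(key, data)


store = ObjectStore()
store.create("video/cam1.mp4", b"frame-1")

# Changing one byte still means reading the whole object and rewriting it.
whole = store.read("video/cam1.mp4")
store.update("video/cam1.mp4", whole.replace(b"1", b"2"))
print(store.read("video/cam1.mp4"))
```

A file system would let an application seek to an offset and overwrite a few bytes; in this sketch there is no operation that touches less than the whole object.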

Object stores don’t implement features like file locking, the ability to update parts of an object, caching or complex security models.  These are, in effect, the features many users and developers have come to expect from POSIX-compliant file systems.

 

POSIX-compliant file system.  A file system that meets the internationally agreed POSIX standards for file and directory operations.  In effect, POSIX-compliant systems will operate and respond in a known and expected way.

The Benefits of File

Using a file system for storing data has been wildly successful, with companies like NetApp founding their business on this part of the market.  File systems have more structure than object stores or block devices, implementing directory hierarchies, complex security models (including LDAP- and AD-based) and data integrity.  A file server adds the intelligence that block devices never had.  As a side comment, it’s interesting to note that file was the primary protocol for data on the mainframe, although the physical storage was divided into volumes/LUNs.  Looking at the two protocols, it seems obvious that combining the physical storage benefits of object with the logical accessibility benefits of file would result in a great combination.  Naturally that’s what vendors have been doing.

Implementations

There are two main ways that vendors have been merging object and file:

  • File on object – file services are provided on a platform that uses object storage at the back-end.  Data isn’t accessible through both protocols; in some cases it can be reached only through file.
  • File with object – file services allow data to be read/written interchangeably between file and object.  Users can (for example) write with file and read with object.

Both solutions exploit the benefit of running as an object store, including features that can’t easily be implemented in a traditional file platform.  For example, data can be distributed geographically in an object store without having to entirely replicate the data.  Placing a file system on top allows the file system to be made accessible in multiple locations much more cost-effectively and with reduced complexity.
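As a rough illustration of why dispersal costs less than replication, the sketch below uses the simplest possible erasure code: two data fragments plus one XOR parity fragment, one per site.  The site names and the encode/decode functions are hypothetical, and production object stores generally use more sophisticated codes with more fragments, so treat this as the idea only.

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data):
    """Split data into two fragments plus one XOR parity fragment."""
    if len(data) % 2:
        data += b"\0"  # pad to even length; the caller keeps the true length
    half = len(data) // 2
    d1, d2 = data[:half], data[half:]
    return {"site_a": d1, "site_b": d2, "site_c": xor_bytes(d1, d2)}


def decode(fragments, length):
    """Rebuild the object from any two of the three fragments."""
    if len(fragments) < 2:
        raise ValueError("need at least two fragments to rebuild")
    d1 = fragments.get("site_a")
    d2 = fragments.get("site_b")
    parity = fragments.get("site_c")
    if d1 is None:
        d1 = xor_bytes(d2, parity)
    elif d2 is None:
        d2 = xor_bytes(d1, parity)
    return (d1 + d2)[:length]


payload = b"cctv-footage-0001"
placed = encode(payload)

# Simulate losing an entire site, then rebuild from the other two.
surviving = {site: frag for site, frag in placed.items() if site != "site_a"}
restored = decode(surviving, len(payload))

stored = sum(len(frag) for frag in placed.values())
print(restored == payload, stored, len(payload))
```

The three fragments together take roughly 1.5 times the size of the original, whereas keeping a full copy at each of the three sites would take three times.  This particular scheme survives the loss of any one site.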

File On Object

Running a file server on an object store provides the back-end scaling features of an object store, with the normal usability features of file.  In many cases the end user doesn’t know or need to care that the physical storage medium is object. OneBlox from Exablox is one example of this kind of solution.  A OneBlox implementation consists of a cluster or ring of nodes that together operate as a large object and key-value store.  Data is split, de-duplicated and compressed as it is stored across the ring.
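The split, de-duplicate and compress sequence can be sketched with a content-addressed chunk store.  This is not how OneBlox itself is implemented (those details aren't covered here); ChunkStore, the tiny chunk size and the choice of SHA-256 and zlib are all assumptions made for illustration, and the distribution of chunks across a ring of nodes is left out.

```python
import hashlib
import zlib

CHUNK_SIZE = 4  # unrealistically small, so the example is easy to follow


class ChunkStore:
    """Toy key-value store: chunk digest -> compressed chunk."""

    def __init__(self):
        self.chunks = {}

    def put(self, data):
        """Store data and return the list of digests needed to rebuild it."""
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]       # split
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:                  # de-duplicate
                self.chunks[digest] = zlib.compress(chunk) # compress
            recipe.append(digest)
        return recipe

    def get(self, recipe):
        """Rebuild the original data from its recipe."""
        return b"".join(zlib.decompress(self.chunks[d]) for d in recipe)


store = ChunkStore()
original = b"ABCDABCDABCDEFGH"
recipe = store.put(original)
print(len(recipe), len(store.chunks), store.get(recipe) == original)
```

The file is described by four chunk references, but only two distinct chunks are actually stored because "ABCD" repeats.  The file system layer keeps the recipe; the key-value layer only ever sees opaque chunks, which is why the stored data isn't meaningful as objects in its own right.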

The open source Ceph file system platform is another example where POSIX-compliant file systems are created and stored on top of an object store (RADOS).  The RADOS component focuses on managing the physical resources, while the file system layer manages logical access, security and data integrity.

With solutions like OneBlox and Ceph, data on the object store is held in a format internal to the platform, so it can’t be accessed directly as objects.  Cloudian recently announced HyperFile, an appliance that integrates with HyperStore to provide file services and uses HyperStore as the secure repository.  CTERA partners with companies like IBM Cleversafe and DDN to offer global file storage, backed by either public or on-premises object stores.

File With Object

Accessing file with object is the scenario where data can be stored and retrieved through either object or file protocols interchangeably.  Data could, for example, be written by a traditional application with NFS or SMB and then analysed using the object store interface.  Why is this useful?  Well, it means applications that already store data on a file system don’t have to be rewritten.  Object interfaces can be used to perform tasks that would otherwise slow down a traditional file system, such as high-performance scanning of data.  Processing data for analytics purposes is usually a read-only process, so all of the file locking and security issues are simplified or not relevant.
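One way to picture interchangeable access is a single set of stored objects with two views: a file view that turns paths into keys, and an object view that uses the keys directly.  The sketch below is a toy, with DualAccessStore and its methods invented for the purpose; a real product would put NFS or SMB in front of the file view and a REST API in front of the object view, and would also have to deal with locking and permissions.

```python
import posixpath


class DualAccessStore:
    """Toy store where a file path and an object key name the same data."""

    def __init__(self):
        self._objects = {}  # object key -> bytes

    @staticmethod
    def _key_for(path):
        # "/cctv/cam1/0001.mp4" -> "cctv/cam1/0001.mp4"
        return posixpath.normpath(path).lstrip("/")

    # File-style view, standing in for an NFS or SMB client.
    def write_file(self, path, data):
        self._objects[self._key_for(path)] = bytes(data)

    def list_dir(self, path):
        key = self._key_for(path)
        prefix = key + "/" if key else ""
        names = {k[len(prefix):].split("/", 1)[0]
                 for k in self._objects if k.startswith(prefix)}
        return sorted(names)

    # Object-style view, standing in for a REST GET and a bucket listing.
    def get_object(self, key):
        return self._objects[key]

    def list_objects(self, prefix=""):
        return sorted(k for k in self._objects if k.startswith(prefix))


store = DualAccessStore()
store.write_file("/cctv/cam1/0001.mp4", b"video-bytes")  # written as a file
store.write_file("/cctv/cam2/0001.mp4", b"more-video")

print(store.list_dir("/cctv"))                # directory view
print(store.list_objects("cctv/cam1/"))       # flat key view of the same data
print(store.get_object("cctv/cam1/0001.mp4")) # read back as an object
```

The application writing the video never changes, while an analytics job can list and read the same content by key prefix without going through the file system at all.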

Vendor File With Object Solutions

Support for File With Object is now quite widespread in the industry.  SwiftStack recently announced SwiftStack 6, which includes bi-directional file/object support.  OpenIO provides a FUSE connector that maps files to objects.  Data can be read/written to the file system, with object support restricted to read-only for data integrity purposes.  Scality RING provides the capability to access data stored on SOFS (Scale-out File System Connector) using SMB, NFS or FUSE protocols with the S3 API.  Hitachi Vantara’s HCP platform has the ability to read content stored with object protocols through standard file APIs, including NFSv3, SMB 3 and WebDAV.  Caringo provides multi-protocol access to data in Swarm through SwarmNFS, implemented as a lightweight stateless Linux process.

The Architect’s View

Blurring the lines between object and file provides some scalability and efficiency benefits to the enterprise.  Where POSIX compliance isn’t required, data can be accessed quickly and efficiently as objects, making analytics easier to integrate, especially with public cloud.  Imagine today’s network of CCTV devices that write video to a file share, where that video could then be analysed in the public cloud as object content.  There are some issues to consider, such as how file locking and data integrity are managed on a global basis.  These are non-trivial issues to solve (but have been resolved already).

Again, we’re seeing the abstraction of the underlying storage media and a greater focus on the data and what can be done with it.

Copyright (c) 2009-2017 – Post #66E6 – Chris M Evans, first published on http://blog.architecting.it, do not reproduce without permission.

Written by Chris Evans

  • Travis

    Very useful summary. Would be easier to understand & apply if it included more use cases.

    • Travis, agreed. I am following up with vendors mentioned to get some more concrete examples. I’ve seen a few being mentioned, but some additional detail, I’m sure, would be useful.

  • Well, I agree that in one form or another, OBS vendors have attempted to build in or add on file access protocol support to their object storage clusters. While the AWS S3 API is supported in whole or part by every OBS software vendor, there seems to be a fair amount of variation in how vendors provide “file-on-object” support for customers whose applications rely on legacy file access methods like SMB and NFS.

    I recall reading a comment from a storage industry pundit that object storage has hit a wall and this will sideline it to niche market status. I don’t agree because it would mean that the growth in data itself would be ending or declining and AFAIK no one is proclaiming the end of data growth. I will grant that the one major thing OBS vendors uniformly lack is lots of customers. Most OBS vendors don’t have enough customers because in my opinion, they spent too much time explaining what OBS is when most customers don’t need to fully understand how it works. If customers have pain points around data growth, storage, and management, they want solutions that will integrate with their current workflows.

    This is why support for “file-on-object” is now being “re-introduced” or “improved” by OBS vendors. One OBS vendor (SwiftStack) recently proclaimed that they are no longer an OBS software vendor. Well, you have to give them credit for trying out that marketing message. The point is every OBS vendor who has been flogging the technology minutiae of their OBS software has wasted a lot of time doing that in front of potential customers who really have more important problems to solve. Don’t get me wrong, OBS has to work properly and every vendor needs to meet certain requirements and have certain features (table stakes) in order to be considered by a potential customer. But in the end storage solutions that integrate will be preferred over storage solutions that don’t do that so well. Time for OBS vendors to bust a move and get down with providing solutions customers will recognize as being beneficial and cost-efficient to their operations.

  • neilwlevine

    The Ceph project recently introduced a File with Object option by offering a FSAL for Ganesha that interacts with the RADOS Gateway (RGW), the component that provides an S3 endpoint. While not as performant as CephFS (the Ceph on Object solution), it is pretty well featured (NFS v3 and v4): http://ceph.com/planet/ceph-rados-gateway-and-nfs/

    • Chris Evans

      Neil, wasn’t aware of this, I will check it out, thanks.