Back in August 2018, Pure Storage announced the acquisition of StorReduce, a start-up company with a nifty data de-duplication engine. That technology has been put to good use and now appears in ObjectEngine, a backup-to-flash data protection solution.
As previously highlighted, I looked at StorReduce in 2015 and was quite positive about the technology. The StorReduce solution tackles a potentially big problem in the public cloud. Data ingested into platforms like S3 has no client-facing de-duplication applied. This means if you store ten copies of the same data, you’re charged ten times. This can be quite an issue for backup, where data can be based off a single static O/S image or taken as full snapshots.
Obviously, the hyperscalers like Google and AWS could well be de-duplicating behind the scenes (and would be mad not to), but these benefits are not passed on to the customer. So, StorReduce was developed to sit as a gateway to S3, de-duplicating the data as it is written and rehydrating it on the way out. Savings can be significant, depending on the type of data being stored; those savings help justify investing in the software.
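To make the gateway idea concrete, here is a minimal sketch of a de-duplicating store in Python. It is not StorReduce's or ObjectEngine's actual algorithm, which isn't described here; the `DedupStore` class, the fixed 4 KiB chunk size and the in-memory dictionaries are all assumptions made purely for illustration. It simply shows why ten identical backups need only consume the space of one.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real engines often use variable-size chunks


class DedupStore:
    """Toy in-memory de-duplicating store (hypothetical, for illustration only)."""

    def __init__(self):
        self.chunks = {}   # SHA-256 digest -> chunk bytes, each unique chunk stored once
        self.objects = {}  # object key -> ordered list of digests (the "recipe")

    def put(self, key, data):
        """De-duplicate on the way in: store only chunks not seen before."""
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).digest()
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        self.objects[key] = recipe

    def get(self, key):
        """Rehydrate on the way out: reassemble the object from its recipe."""
        return b"".join(self.chunks[d] for d in self.objects[key])

    def physical_bytes(self):
        return sum(len(c) for c in self.chunks.values())


# Build a deterministic 16 KiB "image" whose four chunks are all different.
image = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(512))

store = DedupStore()
for i in range(10):
    store.put(f"backup-{i}", image)  # ten full copies of the same data

logical_bytes = 10 * len(image)
print(f"logical:  {logical_bytes} bytes")
print(f"physical: {store.physical_bytes()} bytes")
```

Without the gateway, the ten copies would be billed as ten objects; with it, only the unique chunks (plus the small recipes) reach the object store. A production engine also has to handle persistence, index scale and hash-collision policy, none of which this sketch attempts.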
Rewind a little to Pure Accelerate May 2018, and in a 1:1 session, Brian Schwarz explained that many customers have started using FlashBlade as a backup target because of the fast backup it offers and, possibly more importantly, fast restore times. The problem here, though, is that FlashBlade has no native de-duplication and if your backup platform doesn’t offer it, then that’s a lot of additional expensive storage being used.
The answer is clearly to build in de-duplication. By acquiring StorReduce, Pure gets the technology to solve the data optimisation problem. It also provides the capability to extend the paradigm a little further and offer data offload to the public cloud. Thus we see the premise of ObjectEngine: a flash-to-flash-to-cloud solution that “modernises” the existing data model of disk-to-disk-to-tape. The ObjectEngine appliance acts as a gateway and de-duplication engine for backup applications that talk S3. Data is written to a local object store for fast backup/restore and can also be written to the public cloud with a cloud instantiation of the software.
Fast Backup & Restore
Historically, the performance levels seen when writing to and reading from de-duplicating devices were pretty asymmetric. Writing could be fast, whereas restores could be slow. IBM tried to tackle this problem about five years ago with a combination of ProtecTIER software (the Diligent acquisition) and FlashSystem hardware. HPE also talks a lot about restore performance in relation to their StoreOnce platform, which aims to address the same performance problems.
De-duplicating and rehydrating data needs to take into consideration how the data will be used. There’s no point implementing the most efficient and 100% guaranteed collisionless de-duplication algorithm if the performance characteristics make it unusable. Remember also that de-duplication and encryption processes can make backup data highly random in I/O profile. So, any underlying storage solution needs to be able to cope with that random I/O. High parallelism is a key value of FlashBlade, so it’s easy to see why Pure Storage paired ObjectEngine with FlashBlade as a backup solution.
Cloud and On-Premises
ObjectEngine comes in two flavours. There’s OE//A270 – a 4-node cluster with up to 25TB/hour backup capability and 15TB/hour restore. Alternatively, OE//Cloud runs in AWS, scales to 100+ TB/hour and can protect 100+ PB of capacity in a single global namespace. In both instances, customers need a backup solution that can write to the S3 API. Data can, of course, come from either a Pure Storage array or third-party solution; it depends on what the backup software supports.
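As a rough guide to what the OE//A270 figures mean in practice, the arithmetic below converts the quoted peak rates into backup and restore windows. The 150TB dataset is a hypothetical example, and the calculation assumes the "up to" rates are sustained for the whole job, which real workloads may not achieve.

```python
BACKUP_RATE_TB_PER_HOUR = 25.0   # OE//A270 quoted peak backup rate
RESTORE_RATE_TB_PER_HOUR = 15.0  # OE//A270 quoted peak restore rate


def hours_needed(data_tb, rate_tb_per_hour):
    """Best-case transfer time, assuming the peak rate is sustained throughout."""
    return data_tb / rate_tb_per_hour


dataset_tb = 150.0  # hypothetical amount of data to protect

backup_hours = hours_needed(dataset_tb, BACKUP_RATE_TB_PER_HOUR)
restore_hours = hours_needed(dataset_tb, RESTORE_RATE_TB_PER_HOUR)

print(f"Backup window:  {backup_hours:.1f} hours")
print(f"Restore window: {restore_hours:.1f} hours")
```

For this example, a full backup takes 6 hours at best and a full restore takes 10 hours, so restore time, not backup time, is the figure to size against a recovery time objective.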
Was the acquisition of StorReduce about saving customers money or is there a more opportunistic move in play here? ObjectEngine could be a way to drive more adoption of FlashBlade as an object store, especially for backup data. However, Pure Storage specifically calls out Data Domain as a competitor to ObjectEngine. This makes me wonder if there is more of a competitive play going on. Pure can now offer primary storage (block and file – FlashArray and FlashBlade) and also meet the data protection needs of prospective customers who currently run traditional backup software with Data Domain. Even the migration process is simple. ObjectEngine is simply added as a new storage target within the backup software. Data can be migrated over time or allowed to age out through natural attrition.
The Architect’s View
OK, it may be slightly cynical to assume that StorReduce was acquired simply to build a competitive product. However, as a solution, ObjectEngine is merely a de-duplication target and not a backup platform. Data is stored in the format specified by the backup software, which makes it impossible to use the data for other purposes without using the backup software to recover it. This raises the question of how ObjectEngine should be compared to the likes of Rubrik, Cohesity or even NDAS. When the backup software vendor still owns the data format, that vendor is still needed to make that data available for other purposes.
You can see perhaps why I’m curious as to whether this is simply a product for competitive take-out or whether it forms the basis for future data re-use. Hopefully, some of this may be explained at Pure Accelerate later in the year.
Copyright (c) 2007-2019 – Post #D899 – Brookend Ltd, first published on https://blog.architecting.it, do not reproduce without permission.