Synchronous replication is a key feature that many enterprises depend on for data protection. It ensures every write has been committed at a remote site before it is acknowledged, so no acknowledged data is lost in the event of a disaster. On flash storage, sync replication may seem a little at odds with the idea of providing high performance, because every write has to wait for the round trip to the remote array. That performance trade-off is essential where a recovery point objective (RPO) of zero is needed.
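To make the RPO point concrete, here is a minimal toy model in Python, assuming a "site" is just a list of committed writes. It is not how any array implements replication. The synchronous pair acknowledges a write only after both sites hold it. The asynchronous pair acknowledges after the local commit and ships writes later. Losing the local site then costs the async pair whatever was still queued. All class and function names are hypothetical.

```python
from __future__ import annotations

from collections import deque


class Site:
    """A toy storage site holding committed writes as (block, data) records."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log: list[tuple[int, str]] = []

    def commit(self, block: int, data: str) -> None:
        self.log.append((block, data))


class SyncPair:
    """Acknowledge a write only after BOTH sites have committed it."""

    def __init__(self) -> None:
        self.local, self.remote = Site("local"), Site("remote")
        self.acked: list[tuple[int, str]] = []

    def write(self, block: int, data: str) -> None:
        self.local.commit(block, data)
        self.remote.commit(block, data)   # the remote round trip is the latency cost
        self.acked.append((block, data))  # ack only after the remote commit


class AsyncPair:
    """Acknowledge after the local commit; ship writes to the remote site later."""

    def __init__(self) -> None:
        self.local, self.remote = Site("local"), Site("remote")
        self.acked: list[tuple[int, str]] = []
        self.pending: deque[tuple[int, str]] = deque()

    def write(self, block: int, data: str) -> None:
        self.local.commit(block, data)
        self.pending.append((block, data))
        self.acked.append((block, data))  # acked before the remote has it

    def drain(self, n: int) -> None:
        """Ship up to n queued writes to the remote site."""
        for _ in range(min(n, len(self.pending))):
            self.remote.commit(*self.pending.popleft())


def lost_on_disaster(pair: SyncPair | AsyncPair) -> int:
    """Acknowledged writes missing remotely if the local site is lost right now."""
    return len(pair.acked) - len(pair.remote.log)


if __name__ == "__main__":
    sync, asyn = SyncPair(), AsyncPair()
    for i in range(10):
        sync.write(i, f"v{i}")
        asyn.write(i, f"v{i}")
    asyn.drain(7)  # the async link has only caught up on 7 of the 10 writes
    print("sync lost:", lost_on_disaster(sync))   # sync lost: 0
    print("async lost:", lost_on_disaster(asyn))  # async lost: 3
```

The synchronous pair's zero loss is exactly the RPO of zero, paid for by putting the remote commit inside every write.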

So, finally, Pure Storage has brought synchronous replication to the FlashArray product line with a new feature called ActiveCluster. In contrast to traditional replication, where one volume is active and the other passive (typically read/write and read-only respectively), ActiveCluster presents two fully active copies of a volume, one on each of two FlashArray systems. Each copy can service I/O, so the feature can be used in configurations such as a VMware metro cluster.
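One way to picture "two fully active copies" is the toy Python model below, assuming a host can write through either array. In this sketch, each write is given a sequence number by a single ordering point and applied to both arrays before it is acknowledged, so a read through either array returns the same data. The names and the single-sequencer design are illustrative assumptions, not a description of ActiveCluster internals.

```python
from __future__ import annotations

import itertools
from typing import Optional


class Array:
    """One array in the pair, modelled as a simple block -> data map."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: dict[int, str] = {}

    def apply(self, block: int, data: str) -> None:
        self.blocks[block] = data

    def read(self, block: int) -> Optional[str]:
        return self.blocks.get(block)


class ActiveActivePair:
    """Hypothetical active/active pair: a host may write through either array.
    Each write gets a sequence number from ONE ordering point and is applied to
    both arrays before it is acknowledged, so the two copies never diverge."""

    def __init__(self) -> None:
        self.arrays = {"A": Array("A"), "B": Array("B")}
        self._next_seq = itertools.count(1)  # the single ordering point (assumption)
        self.history: list[tuple[int, str, int, str]] = []  # (seq, via, block, data)

    def write(self, via: str, block: int, data: str) -> int:
        if via not in self.arrays:
            raise ValueError(f"unknown array {via!r}")
        seq = next(self._next_seq)
        for array in self.arrays.values():  # synchronous: both copies updated
            array.apply(block, data)
        self.history.append((seq, via, block, data))
        return seq  # the acknowledgement returned to the host

    def read(self, via: str, block: int) -> Optional[str]:
        return self.arrays[via].read(block)


if __name__ == "__main__":
    pair = ActiveActivePair()
    pair.write("A", 7, "host-1 data")
    pair.write("B", 7, "host-2 data")  # a second writer to the same block
    print(pair.read("A", 7), pair.read("B", 7))  # host-2 data host-2 data
```

The two copies stay identical, but the usage lines also show the catch: if two hosts that are not cluster-aware write the same block through different arrays, the later write simply replaces the earlier one on both copies.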

Active/Active

Prior to the official announcement, bloggers were pre-briefed on ActiveCluster (as part of a Storage Field Day Extra event), which offered a chance to discuss some of the implementation detail. Managing concurrent writes to two copies of a volume isn’t a trivial task, and there’s a risk of a “split brain” scenario if the network between the arrays is interrupted. For this reason, Pure offers a cloud-based arbiter (called the Pure1 Cloud Mediator) to manage network disruption scenarios. However, even with this in place, an application needs to be cluster-aware. If not, there’s the obvious risk of corruption, with the “latest writer” taking priority. Incidentally, if you watch the first of the SFD videos, you can see me ask how I/O updates are applied in the event of a non-cluster-aware update. The point I was trying to get to was that one array or the other has to be the leader, because one of them has to make the “latest writer” decision.
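The role of an arbiter can be sketched as a generic "race to the witness" in Python. This assumes that when the replication link fails, each array asks the mediator for the right to keep the pod online, and only the first request is granted. The loser stops serving I/O so the two copies cannot diverge. The class names and the rule that an array which cannot reach the mediator stops serving I/O are assumptions for illustration, not Pure’s actual protocol.

```python
from __future__ import annotations

import threading
from typing import Optional


class Mediator:
    """Hypothetical cloud mediator: grants the pod to the first array that asks
    after a replication-link failure and refuses every other array."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._winner: Optional[str] = None

    def request_pod(self, array: str) -> bool:
        with self._lock:  # the lock makes "first request wins" race-safe
            if self._winner is None:
                self._winner = array
            return self._winner == array


class ArrayNode:
    def __init__(self, name: str) -> None:
        self.name = name
        self.serving_io = True

    def on_link_failure(self, mediator: Optional[Mediator]) -> None:
        """Called when this array loses contact with its peer array."""
        if mediator is None:
            # Assumption: without the mediator the safe choice is to stop I/O,
            # since the peer might still be serving writes (split brain).
            self.serving_io = False
            return
        self.serving_io = mediator.request_pod(self.name)


if __name__ == "__main__":
    med = Mediator()
    a, b = ArrayNode("A"), ArrayNode("B")
    workers = [threading.Thread(target=n.on_link_failure, args=(med,)) for n in (a, b)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # Exactly one array keeps serving I/O, whichever reached the mediator first.
    print(a.serving_io, b.serving_io)
```

Note that the mediator only resolves the network-partition case. Ordering concurrent writes while the link is healthy is a separate job, which is why the application still needs to be cluster-aware.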

A Single View of Data

Having true active/active means that on platforms like VMware vSphere, a datastore really does look like a single entity. As a result, when moving virtual machines around the infrastructure, only the VM’s running state in memory needs to be moved (vMotion rather than Storage vMotion). As far as any cluster member is concerned, the array pair looks like a single array, just with more paths in place. Naturally, it makes sense to favour the path nearest to the host, and there are configuration options within Purity to do that for a replicated volume or group (in Pure’s terms, a Pod). Once the administrator has configured this, the array simply advertises the shortest paths from array to host via ALUA (see the ActiveCluster demo by Larry Touchette for more details).
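ALUA (Asymmetric Logical Unit Access, from the SCSI standard) lets a target report each path as Active/Optimized or Active/Non-optimized. Host multipathing software then prefers the optimized paths. The Python sketch below assumes a simple rule: paths to the host’s preferred (nearest) array are optimized and paths to the other array are not. The names `Path`, `alua_state` and `usable_paths` are hypothetical and do not correspond to Purity settings or a real multipath driver.

```python
from __future__ import annotations

from dataclasses import dataclass

ACTIVE_OPTIMIZED = "active/optimized"
ACTIVE_NON_OPTIMIZED = "active/non-optimized"


@dataclass(frozen=True)
class Path:
    array: str       # array the path terminates on
    port: str        # target port on that array (illustrative name)
    up: bool = True  # whether the path is currently usable


def alua_state(path: Path, preferred_array: str) -> str:
    """Hypothetical rule: paths to the host's preferred (nearest) array report
    Active/Optimized; paths to the other array report Active/Non-optimized."""
    return ACTIVE_OPTIMIZED if path.array == preferred_array else ACTIVE_NON_OPTIMIZED


def usable_paths(paths: list[Path], preferred_array: str) -> list[Path]:
    """Mimic a typical host multipath policy: send I/O down live optimized paths
    and fall back to live non-optimized paths only if none remain."""
    live = [p for p in paths if p.up]
    optimized = [p for p in live if alua_state(p, preferred_array) == ACTIVE_OPTIMIZED]
    return optimized or live


if __name__ == "__main__":
    paths = [Path("A", "port1"), Path("A", "port2"),
             Path("B", "port1"), Path("B", "port2")]
    print([f"{p.array}:{p.port}" for p in usable_paths(paths, "A")])
    # ['A:port1', 'A:port2']
    local_down = [Path(p.array, p.port, up=(p.array != "A")) for p in paths]
    print([f"{p.array}:{p.port}" for p in usable_paths(local_down, "A")])
    # ['B:port1', 'B:port2']
```

Both arrays stay active throughout; ALUA only steers which paths the host uses by default, and the remote paths take over if the local ones fail.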

The Architect’s View

I have two views on the implementation of ActiveCluster. My technical head says this is a great implementation, sorely needed by customers, and that doing active/active is much more interesting than traditional replication. I would, however, like to have the option to run active/passive for environments where I don’t have a cluster-aware setup, just in case.

My more cynical side says that the decision to build an active/active cluster could equally be about matching and leapfrogging the competition. Dell EMC VMAX needs VPLEX to add active/active to SRDF. XtremIO X2 now has native replication but not active/active; again, VPLEX would be needed. HPE 3PAR has metro cluster support, but the replication isn’t active/active. ActiveCluster addresses customer needs and gives sales and marketing a leap over the vendors most likely to be in a bake-off with Pure. None of this is a bad thing. In fact, if that was part of the plan, it’s simply good product strategy and worth applauding.

You can listen to me talking to Ivan Iannaccone, Director of Product Management at Pure, on the Storage Unpacked podcast, where we chat about the latest software updates, including ActiveCluster.


Disclaimer: I was personally invited to attend Pure Accelerate 2017. My flights, accommodation and meals were paid for by Pure Storage. However, there is no commitment for me to blog on any subject, and Pure has no editorial rights over content before it is published.

Comments are always welcome; please read our Comments Policy first.  If you have any related links of interest, please feel free to add them as a comment for consideration.  

Copyright (c) 2009-2017 – Chris M Evans, first published on https://blog.architecting.it, do not reproduce without permission.


Written by Chris Evans

  • Here’s a thought. Storage vendors should consider NOT doing any replication. After all, the database transaction, the business transaction, the everyday processes of a business are not atomic with a disk block write or a file close operation. The “transaction” is rarely at the storage layer. It usually involves far less data and far fewer operations than replicating an entire block volume. Think about database temporary partitions, for one example.

    Anyone relying solely on storage replication is kidding themselves that they have DR, and they do not understand the technology running their business. Strong words, but true all the same.

    Active/Active vs Active/Passive has been a long-running feature battle in the storage vertical. It seems to be the default option for businesses unwilling or unable to identify their true business transactions and the systems that underpin them. Application tier, database tier, storage tier: you may be able to solve the problem at any of those tiers. So how do you reconcile things when you solve it at more than one of them? Which one is correct? How do you resolve conflicts? All good fun. The right answer is “it depends”, and “horses for courses”. My 2c: spend the time to identify your critical processes, applications and transactions. Then solve that problem. It’s often simpler than you think and can save you real money on storage and network bandwidth.

    • Rob, I think you’re right that there’s a need for transaction-based replication; however, I think so much effort has gone into over-simplifying IT that IT organisations use tools like sync-rep and then deal with the consequences afterwards. Admittedly, in the case of BA that recent consequence cost them £80m, but I doubt they had ever identified the risk of the problem that occurred. For most orgs, replication fits the bill and they can fix up the data later.

      • Not arguing: the “near enough, good enough” use of storage replication for the DR problem is undeniably popular. Rather than using Pure’s replication or anybody else’s, there is an opportunity to reduce costs and improve accuracy by looking for the business transaction. That’s my point. If the risk of losing a few transactions on a borked journaled filesystem or a ZIL is acceptable, fill your boots. You do still need to reconcile the mess post-disaster and understand which block devices are critical and which are not, which you can only do if you understand where your applications are active and where transactions happen (BA didn’t know enough). Otherwise, you are stuck with “replicate everything” and can’t risk leaving anything out. That is often an expensive proposition.

        Organisations that understand where their transactions are in the tiers above storage have a much improved outlook for moving to cloud-native platforms. Those stuck with legacy “replicate everything” approaches, not so much. It’s the difference between lift-and-shift and re-engineering.

        As a bootnote: I not-so-fondly recall the days (around 2000AD) of buying licences for Open File Manager agents to make sure filesystem snapshots included files like normal.dot, kept perpetually open by user login sessions. Microsoft’s VSS providers largely kicked this into touch, but those were interesting times. Lots of coffee.
