DataCore has been busy over recent months publishing benchmarks of their new SANsymphony Parallel Server offering.  The most recent of these claims 5.1 million SPC-1 IOPS at $0.08/SPC-1 IOPS with a 0.32 millisecond response time.  Other vendors are crying foul on these results, claiming they don’t represent a true test because all of the data is held in memory.  So, is it fair to put all of your data in DRAM or is this simply gaming the test?
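
SPC-1 price-performance is the total price of the tested configuration divided by the SPC-1 IOPS achieved, so the two headline numbers together imply a rough system price. The sketch below simply multiplies them back. It assumes the published figures are rounded (5.1 million IOPS, $0.08 to the cent), so it gives an estimate and a range, not the exact price in the full disclosure report.

```python
def implied_total_price(iops: float, price_per_iops: float) -> float:
    """Price-performance = total price / IOPS, so total price = IOPS x price-performance."""
    return iops * price_per_iops


REPORTED_IOPS = 5_100_000        # headline figure, rounded
REPORTED_PRICE_PER_IOPS = 0.08   # $/SPC-1 IOPS, rounded to the cent

estimate = implied_total_price(REPORTED_IOPS, REPORTED_PRICE_PER_IOPS)
# Rounding to the cent means the true ratio lies somewhere in [0.075, 0.085).
low = implied_total_price(REPORTED_IOPS, 0.075)
high = implied_total_price(REPORTED_IOPS, 0.085)
print(f"~${estimate:,.0f} (range ${low:,.0f} to ${high:,.0f})")
```

The point of the exercise is scale: a configuration of a few hundred thousand dollars is what makes the $/IOPS figure so striking.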

In this discussion I can see a number of clear issues:

  • Putting all of your data in cache isn’t cheating.  In fact, if cost/benefit analysis can justify it, we should be caching as much data as possible.  In-memory databases and products like PernixData FVP and Infinio Accelerator specifically aim to keep as much data as possible in the cache (as DRAM or flash) rather than write to external storage.
  • Cache Miss Is an Issue.  What we have to look at is what happens to I/O response time for data not in the cache or when the cache becomes fully loaded.  If we never reach this point though, then who cares if all the data is in memory?  This would be a good testing point for the DataCore solution.
  • Caching isn’t persistent storage.  In general, caching I/O isn’t the same as serving it off persistent storage.  Cache is volatile and needs warmup time as well as additional protection.  If data isn’t in cache and has to be retrieved from the backing store, then that I/O could suffer.  If I/O response time has to be 100% guaranteed, then data should sit on flash.
  • With Benchmarks, Caveat Emptor.  All benchmarks can be gamed in one way or another.  Benchmark workload profiles rarely match real world applications and there’s no replacement for running proofs of concept to validate vendor claims (check out my posts on storage performance).
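
The cache-miss point can be made concrete with the usual two-tier model: average response time is the hit ratio times the cache latency plus the miss ratio times the backing-store latency. The latencies below are hypothetical round numbers chosen for illustration, not measurements of any product, and the model ignores queueing and tail latency.

```python
def effective_latency_ms(hit_ratio: float, hit_ms: float, miss_ms: float) -> float:
    """Average response time when hits come from cache and misses go to the backing store."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms


# Hypothetical latencies: 0.1 ms from DRAM cache, 5 ms from a disk-based backing store.
for h in (1.0, 0.99, 0.95, 0.90):
    print(f"hit ratio {h:.0%}: {effective_latency_ms(h, 0.1, 5.0):.3f} ms")
```

Even a 1% miss rate raises the average by roughly half in this example, and the individual misses are 50 times slower than the hits, which is exactly the behaviour a 100% in-cache benchmark never exercises.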

In an ideal world, all of our data would sit on the fastest media possible.  However, compromises have to be made: servers will only hold a certain amount of DRAM; DRAM is volatile; DRAM is (relatively) expensive; we like persistence in our data; and we have mobility requirements for our data.  For all of these reasons, keeping everything in DRAM and nowhere else isn’t practical.  That said, if we can serve the vast majority of I/O requests from cache, then we’re in a good place.  This is what storage arrays have been doing since EMC introduced the ICDA (Integrated Cached Disk Array, i.e. Symmetrix) in the early 1990s.
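
Whether "the vast majority" of I/O comes from cache depends on how the working set compares with the cache. A minimal LRU simulation shows the effect; it assumes uniformly random reads over a fixed set of blocks, which is a simplification (real workloads are usually skewed, so hit ratios for a given cache size are typically better than the uniform case). The class and function names are hypothetical.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Minimal LRU block cache that counts hits and misses."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block number -> None, ordered oldest to newest
        self.hits = 0
        self.misses = 0

    def access(self, block: int) -> None:
        if block in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block)  # mark as most recently used
        else:
            self.misses += 1  # in a real system: a read from the backing store
            self.blocks[block] = None
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict the least recently used block


def hit_ratio(working_set: int, cache_blocks: int, requests: int, seed: int = 42) -> float:
    """Uniform random reads over a working set; hit ratio measured after a warm-up pass."""
    rng = random.Random(seed)
    cache = LRUCache(cache_blocks)
    for block in range(working_set):  # warm-up: touch every block once
        cache.access(block)
    cache.hits = cache.misses = 0     # count steady-state behaviour only
    for _ in range(requests):
        cache.access(rng.randrange(working_set))
    return cache.hits / requests


for cache_blocks in (1000, 750, 500, 250):
    print(cache_blocks, round(hit_ratio(1000, cache_blocks, 20_000), 3))
```

When the whole working set fits, every request after warm-up is a hit; once it doesn't, the hit ratio under this uniform model falls to roughly cache size divided by working-set size.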

The Architect’s View

Naturally, DataCore is presenting their product in the best light possible.  Every vendor bar none does this, highlighting the benefits of their offerings without discussing the shortcomings.  Benchmarks, including SPC-1, are far from perfect; for example, systems with always-on data optimisation features aren’t supported for testing.  However, it also wouldn’t be practical to continually update the benchmark specification.  Testing is expensive and vendors can’t afford to run benchmarks regularly, which they’d have to do if the specification kept changing.  Results produced against different versions of the specification couldn’t be compared, so there’d be no way to do realistic vendor-to-vendor comparisons.

Just remember, there’s no substitute for doing your own testing, preferably with your own workload.  Use the benchmarks in the way they were intended – as a guideline rather than a definitive statement of capability.

Further Reading

You can find more details on the SPC results from the links below, as well as details from DataCore on their results.  I’ve also included some links to recent posts on performance testing.

Comments are always welcome; please read our Comments Policy first.  If you have any related links of interest, please feel free to add them as a comment for consideration.  

Copyright (c) 2009-2016 – Chris M Evans, first published on https://blog.architecting.it, do not reproduce without permission.


Written by Chris Evans

  • Cleanur

    All vendors will attempt to game the benchmark in some way, but typically these are subtle tweaks that result in marginal performance gains and to be honest are expected to some extent. The problem I see with the DataCore methodology and subsequent result is that it completely devalues SPC as a relative measurement of performance.
    The flip side is that it will also now encourage others to pursue similar shenanigans in order to remain relevant, devaluing SPC-1 further.
    DataCore have done SPC, and more importantly customers in the market for storage, absolutely no favors with this type of testing, and longer term, once the hype has subsided, I doubt the result will do much for DataCore’s credibility either.

  • George

    Based on the questions raised, it seems some have missed a major aspect that contributed to DataCore’s world record storage performance. Contrary to what some may think, it wasn’t just the cache in memory that made the biggest difference to the result. The principal innovation that provided the differentiation is DataCore’s new parallel I/O architecture. I think our Chairman and Technologist, Ziya Aral, says it well in the article below.

    From The Register article by Chris Mellor, “The SPC-1 benchmark is cobblers, thunders Oracle veep”:

    Excerpt below:
    DataCore’s response … Sour grapes

    “The SPC-1 does not specify the size of the database which may be run and this makes the discussion around ‘enormous cache’, etc. moot,” continued Aral. “The benchmark has always been able to fit inside the cache of the storage server at any given point, simply by making the database small enough. Several all-cache systems have been benchmarked over the years, going back over a decade and reaching almost to the present day.”

    “Conversely, ‘large caches’ have been an attribute of most recent SPC-1 submissions. I think Huawei used 4TB of DRAM cache and Hitachi used 2TB. TB caches have become typical as DRAM densities have evolved. In some cases, this has been supplemented by ‘fast flash’, also serving in a caching role.”

    Aral continued:
    In none of the examples above were vendors able to produce results similar to DataCore’s, either in absolute or relative terms. If Mr. Hollis were right, it should be possible for any number of vendors to duplicate DataCore’s results. More, it should not have waited for DataCore to implement such an obvious strategy given the competitive significance of SPC-1. We welcome such an attempt by other vendors.

    “So too with ‘tuning tricks,’” he went on. “One advantage of the SPC-1 is that it has been run so long by so many vendors and with so much intensity that very few such ‘tricks’ remain undiscovered. There is no secret to DataCore’s results and no reason to try to guess how they came about. DRAM is very important but it is not the magnitude of the memory array so much as the bandwidth to it.”
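
Aral’s point about bandwidth can be put in numbers. Theoretical peak DRAM bandwidth per socket is channels × transfer rate × 8 bytes (a standard DDR channel is 64 bits wide). Aral goes on to cite four DRAM channels per socket moving to six; the DDR4-2400 speed used below for both cases is an assumption, chosen only to isolate the effect of the channel count.

```python
BYTES_PER_TRANSFER = 8  # a standard DDR channel is 64 bits (8 bytes) wide


def peak_bandwidth_gb_s(channels: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak DRAM bandwidth per socket in GB/s (10^9 bytes/s).

    MT/s x 8 bytes gives MB/s per channel; dividing by 1000 converts to GB/s.
    Sustained bandwidth in practice is noticeably lower than this peak.
    """
    return channels * mega_transfers_per_s * BYTES_PER_TRANSFER / 1000


# Hypothetical: DDR4-2400 in both cases, so only the channel count changes.
four_channel = peak_bandwidth_gb_s(4, 2400)
six_channel = peak_bandwidth_gb_s(6, 2400)
print(f"4 channels: {four_channel:.1f} GB/s per socket")
print(f"6 channels: {six_channel:.1f} GB/s per socket")
print(f"uplift: {six_channel / four_channel:.2f}x")
```

At the same DIMM speed, two extra channels give 50% more peak bandwidth per socket, independent of how much DRAM is installed.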

    Symmetric multi-processing

    Aral also says SMP is a crucial aspect of DataCore’s technology concerning memory array bandwidth, explaining this at length:

    As multi-core CPUs have evolved through several iterations, their architecture has been simplified to yield a NUMA per socket, a private DRAM array per NUMA and inter-NUMA links fast enough to approach uniform access shared memory for many applications. At the same time, bandwidth to the DRAMs has grown dramatically, from the current four channels to DRAM, to six in the next iteration.

    The above has made Symmetrical Multi-Processing, or SMP, practical again. SMP was always the most general and, in most ways, the most efficient of the various parallel processing techniques to be employed. It was ultimately defeated nearly 20 years ago by the application of Moore’s Law – it became impossible to iterate SMP generations as quickly as uniprocessors were advancing.

    DataCore is the first recent practitioner of the Science/Art to put SMP to work… in our case with Parallel I/O. In DataCore’s world record SPC-1 run, we use two small systems but no less than 72 cores organized as 144 usable logical CPUs. The DRAM serves as a large speed matching buffer and shared memory pool, most important because it brings a large number of those CPUs to ground. The numbers are impressive but I assure Mr. Hollis that there is a long way to go.

    DataCore likes SPC-1. It generates a reasonable workload and simulates a virtual machine environment so common today. But Mr. Hollis would be mistaken in believing that the DataCore approach is confined to this segment. The next big focus of our work will be on analytics, which is properly at the other end of this workload spectrum. We expect to yield a similar result in an entirely dissimilar environment.
    The irony in Mr. Hollis’ comments is that Oracle was an early pioneer and practitioner of SMP programming and made important contributions in that area.
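
Aral describes many CPUs working against one shared DRAM pool but gives no implementation detail, so the following is only a sketch of the general pattern, not DataCore’s design: a pool of worker threads serves independent I/O requests against a shared in-memory block store whose locks are striped to reduce contention. All names are hypothetical. In CPython the global interpreter lock means these threads interleave rather than execute Python code truly in parallel, so the sketch shows the structure, not the speed.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Tuple

Request = Tuple[str, int, bytes]  # (operation, block number, payload)


class StripedBlockStore:
    """Hypothetical shared in-memory block store.

    Blocks are spread over independently locked stripes, so workers touching
    different blocks rarely wait on the same lock.
    """

    def __init__(self, stripes: int = 64):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._stripes = [{} for _ in range(stripes)]

    def _index(self, block: int) -> int:
        return block % len(self._locks)

    def write(self, block: int, payload: bytes) -> None:
        i = self._index(block)
        with self._locks[i]:
            self._stripes[i][block] = payload

    def read(self, block: int) -> Optional[bytes]:
        i = self._index(block)
        with self._locks[i]:
            return self._stripes[i].get(block)


def serve(store: StripedBlockStore, requests: List[Request], workers: int) -> List[Optional[bytes]]:
    """Dispatch independent requests to a pool of worker threads.

    Results come back in request order; writes return None. Requests in one
    batch run concurrently, so a read is not guaranteed to see a write that
    appears earlier in the same batch.
    """

    def handle(request: Request) -> Optional[bytes]:
        op, block, payload = request
        if op == "write":
            store.write(block, payload)
            return None
        if op == "read":
            return store.read(block)
        raise ValueError(f"unknown operation: {op}")

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle, requests))


store = StripedBlockStore()
serve(store, [("write", b, bytes([b % 256])) for b in range(1000)], workers=8)
results = serve(store, [("read", b, b"") for b in range(1000)], workers=8)
```

On a real SMP system the worker count would typically be sized to the logical CPUs available, which on the benchmarked configuration was 144.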

    DRAM usage
    DataCore’s Eric Wendel, Director for Technical Ecosystem Development, added this fascinating fact: “We actually only used 1.25TB (per server node) for the DRAM (2.5TB total for both nodes) to get 5.1 million IOPS, while Huawei used 4.0TB [in total] to get 3 million IOPS.”

    Although 1.536TB of memory was fitted to each server, only 1.25TB was actually configured for DataCore’s Parallel Server (see the full disclosure report), which means DataCore used 2.5TB of DRAM in total for 5 million IOPS compared to Huawei’s 4TB for 3 million IOPS……
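
One crude way to compare the two DRAM claims is IOPS delivered per TB of configured DRAM. This is not an SPC-1 metric, and it ignores CPU cores, backing storage and price; the sketch uses the figures exactly as quoted in this thread (1.25TB configured on each of two DataCore nodes, 4TB for Huawei), and the headline IOPS numbers are themselves rounded.

```python
def iops_per_tb(iops: float, dram_tb: float) -> float:
    """Benchmark IOPS delivered per TB of DRAM (a rough normalisation, not an SPC-1 metric)."""
    return iops / dram_tb


datacore = iops_per_tb(5_100_000, 2 * 1.25)  # 1.25TB configured on each of two nodes
huawei = iops_per_tb(3_000_000, 4.0)         # 4TB in total, as quoted
print(f"DataCore: {datacore:,.0f} IOPS per TB")
print(f"Huawei:   {huawei:,.0f} IOPS per TB")
print(f"ratio:    {datacore / huawei:.2f}x")
```

On these numbers DataCore delivers roughly 2.7 times as many IOPS per TB of DRAM, which supports the argument that cache size alone doesn’t explain the result, although it says nothing about behaviour once the data no longer fits in memory.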