
Is RAID Dead? No – Next Question


I had a small discussion today with a few folks on Twitter as a forerunner to a Wikibon “Peer Incite” discussion on whether RAID is still relevant (hence the title of this post). It seems to me that the Wikibon discussion was perhaps a good way to introduce a “RAID beating” technology to the market (a subject for another post), but regardless it is worth reviewing where we are with RAID and its relevance in the future.

Background

IBM 3380 Model CJ2

The concept of RAID for disk devices was first discussed in a 1987 technical paper called “A Case for Redundant Arrays of Inexpensive Disks (RAID)” by David A. Patterson, Garth Gibson and Randy H. Katz of the University of California at Berkeley. The premise of their paper was to compare the price and performance of SLEDs (Single Large Expensive Disks) with cheaper alternatives arranged in a redundant form, gaining improvements in performance, scalability and reliability for a lower cost. The paper compares the price of inexpensive disks to the IBM 3380, one of the first disks I ever used. These were monster devices the size of fridges, as you can see from this image I’ve borrowed from IBM’s archives (full link here).

I spent many hours dealing with failed devices, having to recover the lost files individually from backups. I’m not saying these devices were unreliable, but individual file restores aren’t a scalable way to manage continually increasing capacities. To give you an idea of scale, I managed a mere 300GB of IBM 3380 & 3390 storage in one installation; what now seems an incredibly small amount.

So, RAID promised to increase reliability significantly by providing a means to recover lost data. RAID-1 is probably the simplest and easiest of the RAID levels to understand: write the data to two separate drives and if one fails, you can read/write from the other. RAID-5 and RAID-6 offer distributed-parity data recovery, with single and dual parity respectively. This reduces the RAID overhead (RAID-1 doubles the disk cost compared to no RAID at all) while retaining the ability to recover. However, RAID is not infallible. Double disk failures (where a second or third drive fails during recovery of the first failed drive) can occur, however small the chances seem. This means RAID should be part of a portfolio of protection measures and not your only one.
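
To make the parity idea concrete, here is a minimal sketch of single-parity (RAID-5 style) recovery using XOR. The four-byte “drives” are purely illustrative; a real controller works on whole sectors and stripes.

```python
# Minimal single-parity (RAID-5 style) sketch: parity is the XOR of the
# data blocks, so any one lost block can be rebuilt from the survivors.
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"AAAA", b"BBBB", b"CCCC"]      # three data "drives"
parity = xor_blocks(data)               # the parity "drive"

# Simulate losing drive 1: XOR the survivors with the parity to restore it.
recovered = xor_blocks([data[0], data[2], parity])
assert recovered == data[1]
```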

Today we see RAID in even the smallest home storage devices, embedded as a software option in operating systems and, of course, as a mandatory component of enterprise storage arrays. However, RAID is suffering from potential scalability issues.

The Scalability Problem

Whilst RAID was good for the hard drives of 20 years ago, we are now starting to see issues with the RAID architecture in a number of areas. These issues are driven by the sheer increase in capacity of modern disk drives; an increase that hasn’t been matched by an equivalent improvement in performance and I/O throughput. Here are some of the challenges:

  • Capacity has increased roughly 100-fold between the IBM 3380 and the latest range of Savvio 15K drives (1.26GB to 146GB)
  • Performance has increased only around 50-fold in the same period
  • Density (capacity per unit of physical volume) has increased 2 million-fold in the same period
  • Price has dropped to 1/27,000th of the cost

With storage arrays that can hold up to 2,000 drives, we can see that I/O isn’t keeping up with capacity even for the fastest hard drives on the market today. The problem is exacerbated when we look at high-capacity SATA drives, currently pushing 3TB and set to go higher very quickly. These drives have slower spin speeds and lower throughput, meaning RAID rebuilds have to be counted in hours (and potentially days) rather than minutes; a rough sketch of the arithmetic follows the list below. This extended rebuild time has a number of implications:

  • There is a greater risk of data loss, as rebuilds are taking substantially longer
  • There is a greater performance impact as rebuilds impact host I/O capacity
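
As a rough illustration of why rebuild windows have stretched, the sketch below estimates best-case rebuild time from drive capacity and sustained transfer rate. The transfer rates and the bandwidth share given over to rebuild are illustrative assumptions, not vendor figures.

```python
# Back-of-the-envelope rebuild time: how long to rewrite a full replacement
# drive, given a sustained transfer rate and the share of drive bandwidth
# the array can dedicate to the rebuild while still serving host I/O.
def rebuild_hours(capacity_tb, sustained_mb_per_s, rebuild_share=1.0):
    capacity_mb = capacity_tb * 1_000_000
    return capacity_mb / (sustained_mb_per_s * rebuild_share) / 3600

print(rebuild_hours(0.146, 150))        # 146GB 15K drive, idle array: ~0.3 hours
print(rebuild_hours(3.0, 115))          # 3TB SATA drive, idle array: ~7 hours
print(rebuild_hours(3.0, 115, 0.25))    # 3TB SATA at 25% bandwidth: ~29 hours
```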

There is one other problem that can’t be ignored and that’s the chance of a non-recoverable bit read error – that is, the chance that your data cannot be re-read from disk. This error is recoverable with RAID, but not if a RAID recovery is already taking place, as the failed read may be essential to rebuilding the missing data. As an example, the latest Seagate Constellation drives have a non-recoverable read error rate of 1 bit in 10^15 (in fact the failure affects a whole sector). A 1TB drive holds roughly 10^13 bits. If we have to recover a disk by reading the whole RAID stripe (imagine it consists of 10 disks) then we have roughly a 1 in 10 chance that we will be unable to recover some of that data. The risk doubles to 1 in 5 with 2TB drives and so on. As we reach 8-10TB drives, rebuilding an entire RAID group in this way means we are more likely than not to fail to recover some data. These values will change with different RAID group sizes and disk capacities, but we are pushing the technology close to the edge at this point.
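
The sketch below works through that read-error arithmetic, using the same round numbers as above (roughly 10^13 bits per terabyte and a 10-drive stripe) and assuming errors are independent, which is a simplification.

```python
# Probability of hitting at least one non-recoverable read error while
# reading an entire RAID stripe during a rebuild. Assumes independent
# errors at the quoted rate of 1 bit in 10^15.
import math

def p_rebuild_read_failure(drive_tb, drives_in_stripe, ure_rate=1e-15):
    bits_read = drive_tb * 1e13 * drives_in_stripe   # ~10^13 bits per TB
    return -math.expm1(bits_read * math.log1p(-ure_rate))

for size_tb in (1, 2, 10):
    print(size_tb, "TB:", round(p_rebuild_read_failure(size_tb, 10), 2))
# Prints roughly 0.1, 0.18 and 0.63 respectively.
```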

The Solutions

We’re already seeing some workarounds to the RAID scalability problems.

  • Block-based RAID – these architectures implement RAID at the block level rather than across whole disks. If a drive has a partial failure, only the failed areas of the disk are rebuilt. In addition, if the RAID group isn’t full, the system doesn’t spend time rebuilding white space.
  • Failure Prediction – Storage arrays like Hitachi’s USP & VSP use SMART data to predict when a drive looks likely to fail. That drive is then “soft failed” and the data copied off it while it is still working. This means data can simply be moved to another drive, rather than rebuilt from the other disks in the parity group. This has a significant impact on improving recovery time and reducing the impact on performance, but can be more expensive in drive costs and maintenance.
  • Data Distribution – The IBM XIV storage array distributes data across many drives. This dramatically improves the time taken to recover failed disks, as all drives participate in recovery (and protection is only RAID-1). The tradeoffs are the increase in cost and the remaining risk of a double disk failure, which would impact every single LUN on the system. The sketch after this list illustrates the effect of the first and third approaches on rebuild time.
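
Here is a hedged sketch of why block-based rebuilds and wide data distribution shorten rebuilds: only allocated blocks need rebuilding, and every surviving drive can contribute bandwidth rather than a single spare absorbing the whole write. The drive counts and rates below are illustrative assumptions, not measurements of any particular array.

```python
# Rebuild time when only used blocks are rebuilt and the work is spread
# across many participating drives (numbers are purely illustrative).
def distributed_rebuild_hours(capacity_tb, used_fraction, per_drive_mb_s, drives):
    used_mb = capacity_tb * 1_000_000 * used_fraction
    return used_mb / (per_drive_mb_s * drives) / 3600

print(distributed_rebuild_hours(2.0, 1.0, 50, 1))    # whole-disk RAID onto one spare: ~11 hours
print(distributed_rebuild_hours(2.0, 0.5, 50, 1))    # block-based, half-full group: ~5.5 hours
print(distributed_rebuild_hours(2.0, 0.5, 50, 120))  # spread across 120 drives: a few minutes
```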

Whilst the above solutions are good, I believe we need to see more from the drive manufacturers themselves. Ultimately RAID has become a problem because of drive reliability and RAID rebuild times. We need a new approach to the way host I/O and rebuild I/O are prioritised and managed by a drive. For instance, with the ability to put large amounts of flash into a drive, flash could be used as a repository for RAID reads. As a drive executes normal read/write operations, it caches any data needed for requested RAID rebuilds into flash. This is then made available to the RAID controller via a separate channel to perform the RAID rebuild. Effectively data is rebuilt on a failed drive only as that data is read/written from the original drive; any unaccessed data is rebuilt as a low-priority task. If this approach were coupled with the ability to highlight a failing sector (and so recover that first) then reliability improves. This idea is only one thought; I expect extending RAID’s useful life will follow a similar path to the increase in drive capacities: lots of incremental improvements that over time move things forward.
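
As a very rough sketch of the “rebuild on access, backfill the rest” idea, the toy scheduler below promotes blocks touched by host I/O to the front of the rebuild queue and works through the remainder as background work. The block granularity and queue discipline are assumptions for illustration only, not how any shipping controller behaves.

```python
# Toy rebuild scheduler: blocks touched by host I/O jump the queue,
# everything else is rebuilt as low-priority background work.
import heapq

HOST, BACKGROUND = 0, 1   # lower number = higher priority

class RebuildScheduler:
    def __init__(self, degraded_blocks):
        self.pending = set(degraded_blocks)
        self.queue = [(BACKGROUND, b) for b in sorted(self.pending)]
        heapq.heapify(self.queue)

    def host_access(self, block):
        """A host read/write of a degraded block promotes its rebuild."""
        if block in self.pending:
            heapq.heappush(self.queue, (HOST, block))

    def next_block(self):
        """Return the next block to rebuild, host-triggered ones first."""
        while self.queue:
            _, block = heapq.heappop(self.queue)
            if block in self.pending:
                self.pending.discard(block)
                return block
        return None

sched = RebuildScheduler(range(8))
sched.host_access(5)        # host touches block 5 mid-rebuild
print(sched.next_block())   # block 5 is rebuilt first
```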

RAID isn’t dead, merely evolving to meet new challenges.  In 20 years’ time I suspect RAID will still exist but will be barely recognisable from the original Berkeley paper.


About Chris M Evans

Chris M Evans has worked in the technology industry since 1987, starting as a systems programmer on the IBM mainframe platform, while retaining an interest in storage. After working abroad, he co-founded an Internet-based music distribution company during the .com era, returning to consultancy in the new millennium. In 2009 Chris co-founded Langton Blue Ltd (www.langtonblue.com), a boutique consultancy firm focused on delivering business benefit through efficient technology deployments. Chris writes a popular blog at http://blog.architecting.it, attends many conferences and invitation-only events and can be found providing regular industry contributions through Twitter (@chrismevans) and other social media outlets.

  • http://www.cinetica.it/blog Enrico Signoretti

    I agree 100% with Chris, but I would like to add my two cents.
    In the recent past I wrote an article ( http://www.cinetica.it/2010/11/15/the-next-unified-storage/ ) about the possible (futuristic? :-) ) evolution of so-called “unified storage” (storage capable of dealing with blocks and files at the same time). I think that in the future we will see the first implementations of a more unified class of storage arrays, capable of delivering services for blocks, files and objects too.

    If these kinds of arrays appear, I think we will see different kinds of protection for different kinds of data:

    * structured data (usually stored in databases): will be placed on high-performance, highly reliable RAID-protected blocks striped across small, high-speed disks (potentially SSDs).
    * unstructured data (usually stored in medium-sized filesystems): will be protected with multiple-parity RAID on SAS/SATA/hybrid disks.
    * objects (data + metadata): will be protected with whole-object replication on very inexpensive, large SATA disks.

    To achieve this goal we will need a strong scale-out virtualization layer at the base of next generation storage architectures but I think that some vendors are still looking in this direction.

    ciao,
    Enrico

    • http://www.brookend.com Chris Evans

      Enrico

      I agree. I think we will see a change to a more unified protocol; perhaps pNFS takes us part of the way there. I’d take it a step further than you’ve said and convert your specifications into a service definition:

      Structured – high availability & performance
      Unstructured – good availability & performance
      Object – reasonable availability & performance

      Obviously the definitions are a bit generic, but the idea would be for the array to meet whatever the definitions of availability and performance are, in whatever manner it chooses. So you don’t need to know whether it’s RAID-1, 5, 6 or JBOD. The system configures the disk to meet the service requirements. Sounds like nirvana I know, but ultimately service levels are the key rather than hardware specifics.

      Chris

  • http://infosmackpodcasts.com/ Greg Knieriemen

    Chris:

    Your statement that “… the Wikibon discussion was a good way to introduce a ‘RAID beating’ technology to the market” was spot on, as is your analysis of RAID. Cheap technology pimps like to use provocative terms to help promote and market companies that pay them retainers and subscriptions to do so, and ultimately lack the real depth that these challenges present. I’m glad you side-stepped the provocative to peel back the real issues and concerns around RAID.

    RAID is indeed evolving but perhaps not as fast as disk drives. I don’t know if the solution lies with the drive manufacturers as much as it lies with the controller manufacturers (and ultimately mass storage vendors).

  • http://www.cleversafe.com Julie Bellanca

    Great article Chris.

    A few thoughts to add to the discussion -

    For your list of solutions, Information Dispersal should be included. Information Dispersal leverages Reed-Solomon encoding and spreads data across multiple storage nodes (versus within a node, as RAID does). You can then place these nodes within a single data center, or across multiple data centers.

    When spread across multiple data centers, you can eliminate replication, making the solution more cost-effective for long-term archives and content depot repositories. Notice I didn’t say transactional / low-latency, as the network will constrain performance. However, network speeds and bandwidth costs are both improving faster than Moore’s Law (Butters’ Law of Photonics: the cost of transmitting a bit over an optical network decreases by half every nine months).

    A typical information dispersal configuration is 10 of 16, meaning the data is sliced into 16 pieces with any 10 needed to recreate it. This approach tolerates up to six simultaneous failures, which addresses the risk of data loss during a rebuild. If a system needs to tolerate more than six losses (a multi-petabyte system, say), dial up the configuration to 20 of 32 and tolerate 12 losses.
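
    As a rough illustration of that tolerance, the sketch below computes the chance of data loss under a k-of-n dispersal, assuming independent node failures and a purely illustrative 5% chance of any given node being unavailable at the same time.

    ```python
    # Probability of data loss for a k-of-n dispersal: loss occurs only when
    # more than n-k nodes fail together. The 5% per-node figure is illustrative.
    from math import comb

    def p_data_loss(n, k, p_node):
        return sum(comb(n, f) * p_node**f * (1 - p_node)**(n - f)
                   for f in range(n - k + 1, n + 1))

    print(p_data_loss(16, 10, 0.05))   # 10 of 16: needs 7+ simultaneous failures
    print(p_data_loss(32, 20, 0.05))   # 20 of 32: needs 13+ simultaneous failures
    ```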

    We also envision the definition of rebuilding expanding. Rebuilding is the set of processes that keep data accurate and reliably stored. Therefore we should expand this definition to include such things as disk scrubbing, integrity detection, proactive scanning for missing, outdated or inconsistent data, and the correction of those errors, not merely the replacement of failed disks.

    At the end of the day rebuilding is all about keeping data in a reliable state. If a system can tolerate 4, 8, or 10 simultaneous failures, then rebuilding as fast as possible is no longer necessary to keep the data reliable. Therefore rebuilds can be run at a rate which will not adversely impact the performance of normal I/O operations.

  • http://www.networkcomputing.com Howard Marks

    Chris,

    While we’re in general agreement I don’t see how your idea of an additional I/O channel for rebuild data would work.

    First, there’s only one active head on a drive, so there is no way to get data other than the current track off the drive and out of the “rebuild channel”.

    With that limitation, the RAID controller should be smart enough to rebuild the replacement drive as it reads data from the rest of the RAID set, as opposed to reading the whole RAID set in order. Even with a rebuild channel that would be required, as drives won’t have enough cache to hold rebuild data for hours.

    Then the only data that might be in the drive’s cache that the RAID controller hasn’t already seen, and used for the rebuild, would be read-ahead data that turned out to be unused. The drive would have to have a way to tell the controller that data was available, making the third channel (SAS and FC already have two, and this enterprise-only feature would never make its way to SATA) complex and expensive for the limited improvement in rebuild time.

    3Par/Compellent-like architectures that use data chunks rather than drives as the unit of redundancy make more sense to me.

    – Howard @deepstoragenet

    • http://www.brookend.com Chris Evans

      Howard

      Perhaps I need to think through my ideas in more detail and do an updated post with graphics. Thanks for the comments.

      Chris

  • http://www.starboardstorage.com StorageOlogist

    It is always worth revisiting these conversations and looking back on what has transpired. I would argue that Chris’s comments have been borne out in the market. I recently wrote a more current perspective on the death of RAID on the Starboard Storage Systems blog: http://blog.starboardstorage.com/blog/bid/148658/the-death-of-raid

    Lee Johns
    VP Product Management Starboard Storage
