
Enterprise Computing: The Long Term Future of Tape


It’s funny how a small comment in a blog post strikes a chord with people in different ways.  In this post on the potential Sun acquisition by IBM, I made the comment “tape doesn’t have a long-term strategic future in anyone’s business”.  D_Ced picked up on this and questioned me about it (see comment). Let me explain…

I’ve been involved in managing tape environments for over 20 years.  I’ve used everything from reel-to-reel 3420 tape drives right up to today’s fastest LTO4 drives.  What’s obvious from my experience is that tape gets used for two primary needs: data loss and archive.

Data Loss

When disk was expensive, tape was the only way to restore lost data.  By today’s standards data quantities were tiny; restoring from backup was also part of the operational process and (certainly in mainframe environments) was integrated into the I/O architecture, so datasets on tape were effectively accessed in the same way as datasets on disk.  However, over time we’ve started storing massive quantities of data on tape.  The technology has improved to help manage the growth; we have incredible capacities on LTO4 tapes today, and robotics and tape library automation mean thousands of individual tape cartridges can be stored and accessed in a completely automated fashion.

Archive

As regulatory regimes have changed and organisations have become highly dependent on electronic forms of data, the need to retain this data at specific points in time has become paramount.  For years, backups have been sequestered for this purpose, retained long after the need to keep the data on tape for restores has passed.

But tape, whilst portable and compact, has problems.  Here are some that can be found in probably every large organisation today:

  • Legacy Tape.  All companies will have tape data across multiple device types, including DAT, LTO, DLT, DDS, 3480, 3490 and many, many more. 
  • Large Historical Span of Data.  Data on tape will go back years, if not in some cases being stored indefinitely.
  • Large Volumes of Replicated Data. The same full backup will have been taken on servers week-in, week-out.  A large proportion of those files will remain unchanged.
  • Unidentifiable Data.  Lots of tapes in the enterprise have lost their labels or lack sufficient documentation.
  • Lack of Hardware Support.  Many tapes are still being retained for which no tape drive or backup environment exists.
  • Multiple Backup Software Products.  These can be standalone or network-based; most are incompatible with each other.

The historical nature of the backup process means that most data on tape represents an image or snapshot of a server, or data from a specific point in time.  The sequential nature of tape means that images are kept separately; duplicate data isn’t removed or simply re-referenced, as would be straightforward in a disk-based system.  As we continue to see data growth and increased rigour in retaining archive copies, something needs to change.  The process of writing the same data to a sequential medium doesn’t scale over the long term, especially as that data is never refreshed onto new technology platforms.
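
To make that concrete, here’s a rough sketch of the idea (purely illustrative: the 64KB block size, SHA-256 hashing and in-memory dictionaries are my own assumptions, not any particular backup product).  On a random-access, content-addressed store, a block that has already been written is simply re-referenced; each weekly full to tape, by contrast, writes it all over again.

    import hashlib

    BLOCK_SIZE = 64 * 1024      # illustrative fixed block size (64 KiB)

    block_store = {}            # content hash -> block data, stored once


    def store_backup_image(data: bytes) -> list:
        """Split a backup image into blocks; return the list of block references."""
        refs = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # An unchanged block from an earlier backup isn't written again;
            # the new image just points at the copy already on disk.
            if digest not in block_store:
                block_store[digest] = block
            refs.append(digest)
        return refs


    # Two "weekly fulls" of a mostly unchanged server share nearly all blocks.
    week1 = store_backup_image(b"A" * 1_000_000)
    week2 = store_backup_image(b"A" * 999_000 + b"B" * 1_000)
    print(len(block_store), "unique blocks held for",
          len(week1) + len(week2), "block references")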

I think we need a number of things:

  • Tools which can interrogate existing tape media and backup software databases and transfer those backups into either a more current version of the same backup product, or ideally, any backup product.  This can help deal with the legacy backlog.
  • A consistent methodology for referencing backup data; this needs to operate at multiple levels: the server, file and block level.  The schema needs to cope with point-in-time images at each of those levels and to accurately identify when two or more objects being stored are the same and therefore don’t need storing again (a rough sketch of this follows the list).
  • The splitting of backup and archive into separate functions.  Archive should become part of application design; backup is retained as an operational need, but should be tied to the recovery requirements of the application.
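
As a sketch of what that second point might look like (the names and fields below are my own illustration, not a real product schema), a point-in-time image can catalogue each server’s files against content hashes, so the same object appearing in two images is recorded twice but stored only once:

    import hashlib
    from dataclasses import dataclass, field
    from datetime import datetime


    @dataclass
    class FileEntry:
        path: str
        content_hash: str                 # identifies the data, not its location


    @dataclass
    class PointInTimeImage:
        server: str
        taken_at: datetime
        files: list = field(default_factory=list)


    object_store = {}                     # content hash -> data, stored once
    catalogue = []                        # every image ever taken


    def capture(server, files):
        """Record a point-in-time image; only previously unseen content is stored."""
        image = PointInTimeImage(server=server, taken_at=datetime.now())
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            object_store.setdefault(digest, data)        # de-duplicated store
            image.files.append(FileEntry(path, digest))  # full catalogue entry
        catalogue.append(image)
        return image


    # The same unchanged file on two servers (or two weeks apart) is stored once.
    capture("web01", {"/etc/hosts": b"127.0.0.1 localhost\n"})
    capture("web02", {"/etc/hosts": b"127.0.0.1 localhost\n"})
    print(len(object_store), "stored objects,", len(catalogue), "images catalogued")

Extending the same references down to the block level (as in the earlier sketch) and up to whole servers gives the multi-level scheme described above.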

If we start storing only the backup and archive data we actually need, then so much more can be retained on disk (or dare I say it, in the cloud).  After all, having it on a random-access medium is always going to be superior.

About Chris M Evans

Chris M Evans has worked in the technology industry since 1987, starting as a systems programmer on the IBM mainframe platform, while retaining an interest in storage. After working abroad, he co-founded an Internet-based music distribution company during the .com era, returning to consultancy in the new millennium. In 2009 Chris co-founded Langton Blue Ltd (www.langtonblue.com), a boutique consultancy firm focused on delivering business benefit through efficient technology deployments. Chris writes a popular blog at http://blog.architecting.it, attends many conferences and invitation-only events and can be found providing regular industry contributions through Twitter (@chrismevans) and other social media outlets.
  • Pete Steege

    I hear a lot of talk about people storing less or pruning their data, but I’m skeptical that there’s been any change in behavior.

    It reminds me of the national savings rate. We all know we should save more, but nobody does.

    Chris, I’d like to see whether any of your readers (or you) have stats that show a trend towards people actually pruning more of their data.

  • http://www.idgt.org Ray Neoh

    Hi Chris,

    I think you are correct in your assumption. We are in the process of selecting an off-line backup system for storing HD video that is being digitized.

    People trying to sell us tape for backup said it is cheaper and more secure than using hard disk.

    We are in Singapore; to store tape I need a special environment and climate control. With hard disk I don’t.

    I am planning to use 2TB eSATA drives (which cost less than US$200 per drive); we use lossless data compression (5:1), and all drives are encrypted for protection. Stored off-line.

    What do you think? Should we use tape or hard disk? I prefer hard disk for my situation. I’d like to hear from you.

    Regards,
    Ray Neoh

  • ced

    Hi Chris,

    I overlooked your article. Sorry for this late answer.

    First, like you I consider archive and backup to be two separate functions (with their own supporting software tools), even if the underlying technologies (disk or tape) might be the same.

    For the backup function, I consider all these questions of legacy tapes and hardware support a non-issue, because after one or two months (depending on your backup policy) you’ve migrated all your assets onto the new media/drive. We (in my organisation) have also solved the question of lost tapes as we don’t do any tape offloading. The archive function is another ball game. Looking at your bullet list, a lot of the points come down to data migration. For me, that capability should be part of any archiving tool, and you will need it whether the archiving system is tape- or disk-based (do you agree on this?).

    On the Large Volumes of Replicated Data, you’re right: dedup on tape is inherently difficult to implement. Nothing more to say about that, except that I don’t buy the current approach from my VTL vendor for dedup.

    I’ve done some CAPEX maths and tape is way, way cheaper than VTL solutions. It’s really impressive. So I’m considering a mixed approach: VTL as the primary, destaging to tape at the local and a foreign site.

    Ced

  • Chris Evans

    Ray

    Makes sense to use disk where you’re likely to use the data again and need it quickly. I once tried to establish how quickly HDDs deteriorate compared to tape: i.e. if you come back and read an HDD two years after it was written, what’s the loss? I never found anything out. It would be interesting to see how that metric compares to tape.

    Chris
