Virtual Tape

What is Virtual Tape?

There are two fundamental types of virtual tape:

  1. Tape Virtualisation or Virtual Tape Augmented (VTA), where the applications write data to virtual volumes on virtual drives in a disk cache, and this data is later consolidated and moved off onto real tape volumes on real tape drives.
  2. Tape Elimination or Virtual Tape Elimination (VTE), where the applications write data to virtual volumes on virtual drives and this data is stored permanently on disk. This type of virtualisation is often combined with data de-duplication.

Some people claim a third type, Virtual Tape Hybrid (VTH), where you have a disk-only solution with the option to add a tape library with real backend tape drives if you want them.

The origins of Virtual Tape

Virtual tape was originally designed to fix some of the issues with IBM mainframe tapes. Mainframes traditionally use tape for all kinds of data: backups, large GDGs, and DFHSM migrated ('ML2') data. While a physical tape can typically hold several TB native, it is quite hard to fill one of these tapes with mainframe data unless you have a specialist application such as DFHSM or TSM, which are designed to pack lots of small files onto a big tape. Several products were introduced to consolidate and stack tape datasets, but these usually required user intervention.

Many mainframe files that are held on tape are live data, as opposed to backups, and unless you make sure that all the files that you store on a tape are going to expire on the same day, then a tape that was initially 100% full of required data will steadily become less full as time goes by. This is because as files are expired, they leave 'holes' on the tape and unlike disk, these holes cannot be filled with new data. As time goes by, the tape will hold less and less active data but it cannot be scratched until the last active file is expired.
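
The effect described above can be illustrated with a minimal Python sketch. The file sizes and expiry dates are invented for illustration, and the function name `tape_status` is hypothetical; the sketch only shows that the active fraction of a tape falls as files expire, while the tape cannot be scratched until its last file has expired.

```python
from datetime import date

def tape_status(files, today):
    """Return (active_fraction, scratch_date) for one tape.

    files: list of (size_gb, expiry_date) tuples written to the tape.
    A file counts as active while its expiry date is after `today`.
    """
    total_gb = sum(size for size, _ in files)
    active_gb = sum(size for size, expiry in files if expiry > today)
    active_fraction = active_gb / total_gb if total_gb else 0.0
    # The tape can only be scratched once its last file has expired.
    scratch_date = max(expiry for _, expiry in files) if files else today
    return active_fraction, scratch_date

# Illustrative data only: three files that filled a 1000 GB tape.
example_files = [
    (400, date(2024, 1, 31)),
    (300, date(2024, 6, 30)),
    (300, date(2026, 12, 31)),
]

fraction, scratch = tape_status(example_files, date(2024, 7, 1))
print(f"Active data: {fraction:.0%}, tape cannot be scratched before {scratch}")
```

In this made-up case the tape is only 30% active by mid 2024, yet it stays tied up until the end of 2026.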

Open Systems Problems with Physical Tape

By contrast, Open Systems operating systems tend to just use tapes for backups, and as they are backing up large disks, they can fill tapes reasonably easily. Way back in time when servers were standalone, each server had its own DAT tape drive that was used for backups, with all the resulting problems of physically changing tapes and taking them off site for disaster recovery. SAN connectivity allowed us to share tape drives and automate the process with tape libraries. However, there were always points in the day when you did not have enough physical drives to cope with your workload, and other times when you were hardly using any drives. It does not make economic sense to buy lots of physical tape drives to manage peak demand if these drives will be unused for most of the day. A VTL allows you to define lots of virtual drives at no extra cost.
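
The peak-versus-average drive problem can be put into numbers with a short Python sketch. The job schedule below is invented, and `drive_demand` is a hypothetical helper rather than part of any backup product; it simply counts how many drives are busy in each hour and compares the peak with the overall usage.

```python
def drive_demand(jobs, hours=24):
    """Return (peak_drives, average_use) for a daily backup schedule.

    jobs: list of (start_hour, end_hour) with 0 <= start < end <= hours;
          each job is assumed to hold one tape drive for its whole run.
    average_use: fraction of drive-hours actually used if you bought
                 enough drives to cover the peak.
    """
    busy_per_hour = [0] * hours
    for start, end in jobs:
        for hour in range(start, end):
            busy_per_hour[hour] += 1
    peak_drives = max(busy_per_hour)
    used_drive_hours = sum(busy_per_hour)
    if peak_drives == 0:
        return 0, 0.0
    return peak_drives, used_drive_hours / (peak_drives * hours)

# Illustrative schedule: most backups run in the early hours.
example_jobs = [(0, 4), (0, 4), (1, 5), (1, 3), (2, 6), (12, 13)]

peak, average = drive_demand(example_jobs)
print(f"Drives needed at peak: {peak}, average use of those drives: {average:.0%}")
```

With a schedule like this, the drives bought to cover the overnight peak sit idle for most of the day, which is the case for defining virtual drives instead.
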
In a disaster you need all your data in your recovery site, including your tape backups. The only safe way to do this is to duplex the data as it is written, but duplexing tape can be difficult. Unless your application provides duplexing facilities, and most applications will not create two copies at write time, you have to copy the tape later.

One reason for installing vtape is to increase recovery speeds, but if this is your objective, then make sure the solution you choose has enough power and bandwidth to do that. In particular, check that the network bandwidth and the input channels to the VTL can handle your requirements. Generally speaking, it is as fast to get data off a physical tape as it is from disk. The problem is that if you use a VTA solution, then before you can start to read the data, the tape has to be located and mounted in a drive, then wound forward to the correct position. This does not take as long as it used to, but it will still add to your restore time.

We have requirements to keep certain data for long periods of time. Physical tape might seem to be the ideal medium for these files, but unless you try reading them from time to time, how do you know the data will still be intact?

Virtual Tape Solutions

Virtual tape solves many of the problems with physical tape, as explained in this page set.
The tapes are virtual so it does not matter how much data is on them.
The drives seen by the servers are virtual so you can have hundreds of them.
If your solution has physical tape at the back end, then the physical tapes are recycled regularly, which gets rid of any holes in them and ensures that they can be read. Holes are not an issue with a disk based solution, as the space can be re-used.
Offsite tapes can be duplexed at source, but see the mirroring page for details. Recovery from VTE systems is fast, and it can be fast from VTA solutions too if the data is still in the disk cache.
NAS support allows vtape to use the CIFS and NFS protocols, so Oracle can dump direct to tape without needing intermediate software.

However, virtual tape introduces new issues. One of these is the effect of deduplication if it is post process, as the data might not be accessible while dedup is running. So if you need to do an urgent restore and dedup is running, you might need to wait for it to finish. You should check this with your vendor.

The future of Virtual Tape

Does Virtual Tape have a future? Some say that the integration between backup products and de-duplication makes it more cost effective now to store backup data permanently on cheaper SAS or SATA disk, especially on disk subsystems where hard drives can spin down when they are not being read, as this saves on power consumption. The argument is, why use a VTS to make a hard drive emulate tape, when the data can simply be stored on disk using more efficient block sizes? Most backup products now support disk to disk, or D2D, backups with de-duplication. On this view, a VTS that just uses disk storage is a waste of time. This is a compelling argument, but there are no immediate signs that disk based VTS devices are disappearing. In fact, pure disk based VTS devices make a lot of sense for a small to medium business.
The advantages are less clear for large enterprises as they tend to have ferocious growth rates. It is relatively easy and cheap to cope with this by adding more physical tape slots to a library, but more expensive to add several extra terabytes of disk storage. Also, VTLs are designed to cope with large amounts of data and this takes the management strain away from the rest of your hardware.

The Cloud has also changed the picture, as many companies use the Cloud for off-site backups, which gives you the double benefit of having the data off-site while it is still accessible for restores. That data could well be stored on virtual tape systems, but as you have handed it over to someone else to manage, the physical storage media is not your problem.

Another emerging strategy is to recognise that backups and archives are two different solutions to different problems. Backups tend to be short retention, say 30 days, while archives are long retention, measured in years. Disks are not really suitable for long term retention, and tapes are arguably a better solution. So a workable strategy could be to use D2D for backups and tape based VTS for archives. A clear advantage of VTS over native tape here is that the physical tapes can be recycled regularly, so proving that they can be read, and faulty tapes can be replaced if they are duplexed.

IBM is now testing a 'flape' solution, a box which contains only flash and tape storage media. Flape is sort of the ultimate data tiering solution, where 'hot data' that needs the best performance and speed of access, is held on the fastest storage medium, Flash disk, while 'cold data' is held on tape, which is cost-effective, has high capacity, and uses less power.
IBM considers that one application could be video streaming, where the first 2 minutes of the video are held on Flash and the rest is kept on tape. The initial data is served from flash, giving instant access to the video, and those 2 minutes give ample time for the tape to mount and serve the rest of the data. The same principle can be used for many other 'big data' applications, where hundreds of terabytes of data need to be stored, but only a small amount is used at a time.
IBM claims that the cheapest way to store large amounts of data is to use an on-premises tape solution, as the TCO is even lower than cloud storage. The combination of tape and flash meets the requirements for both fast access and cost effective data storage.
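
As a rough illustration of the video streaming example above, the Python sketch below estimates how much flash the 2 minute prefix needs and checks that it covers the tape mount time. The bitrate, the mount time and the number of videos are assumptions chosen for the arithmetic, not IBM figures, and the function names are hypothetical.

```python
def flash_prefix_mb(bitrate_mbit_s, prefix_seconds):
    """Flash needed to hold the start of one video, in megabytes."""
    # megabits per second * seconds = megabits; divide by 8 for megabytes
    return bitrate_mbit_s * prefix_seconds / 8

def prefix_covers_mount(prefix_seconds, mount_and_locate_seconds):
    """True if the flash prefix plays for at least as long as the tape
    takes to be mounted and positioned at the rest of the video."""
    return prefix_seconds >= mount_and_locate_seconds

# Assumed values, for illustration only.
BITRATE_MBIT_S = 8          # assumed video bitrate
PREFIX_SECONDS = 120        # the 2 minutes held on flash
MOUNT_SECONDS = 90          # assumed tape mount plus locate time
VIDEO_COUNT = 10_000        # assumed size of the video library

per_video_mb = flash_prefix_mb(BITRATE_MBIT_S, PREFIX_SECONDS)
total_tb = per_video_mb * VIDEO_COUNT / 1_000_000
print(f"Flash per video: {per_video_mb:.0f} MB, for the library: {total_tb:.1f} TB")
print("Prefix covers tape mount:", prefix_covers_mount(PREFIX_SECONDS, MOUNT_SECONDS))
```

Under these assumptions each video needs about 120 MB of flash, so only a small fraction of the library has to sit on the fast tier.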


Lascon updates

I retired 2 years ago, and so I'm out of touch with the latest in the data storage world. The Lascon site has not been updated since July 2021, and probably will not get updated very much again. The site hosting is paid up until early 2023 when it will almost certainly disappear.
Lascon Storage was conceived in 2000, and technology has changed massively over those 22 years. It's been fun, but I guess it's time to call it a day. Thanks to all my readers in that time. I hope you managed to find something useful in there.
All the best