TSM - Freeing up Old Tapes


It's not unusual for TSM to run out of tapes because of the way it is engineered. That makes it vital that tapes are returned to the 'scratch pool' for reuse quickly. So why does TSM seem to hang on to empty tapes?

Reuse delay

If a tape has no active files left on it but still does not become scratch, then the issue might be the REUSEDELAY parameter on the copy storage pool.

A tape is not necessarily released for scratch straight away. Imagine the worst has happened: you've had a disaster and have had to restore your database to a point 48 hours ago. If tapes have been reused in those 48 hours, the restored database will reference data that has since been overwritten, so data will be missing. To prevent this, there is a parameter called REUsedelay, which specifies the number of days that must elapse after all files are deleted from a volume before the volume can be rewritten or returned to the scratch pool. The default value is 0, but it may have been set to 5, say, to avoid problems with database rollback. That's one reason why tapes do not get recycled quickly.

If you add new tapes to a tape library, then they must be checked into the TSM server before it will recognise them as scratch tapes. However a tape must have a magnetic label before it can be checked in. Use the LABEL LIBVOLUME command to do this. Most tapes are supplied pre-labelled these days.

If you run out of scratch volumes but have some spare tapes and empty library slots, load the tapes into your library and then enter the following command, replacing library_name with the name of your library

CHECKIN LIBVOLUME library_name SEARCH=YES STATUS=SCRATCH CHECKLABEL=BARCODE


Tape Mount and Dismount issues

TSM will keep adding data to a 'filling' tape until it is full. However, it will sometimes mount a scratch tape even if there is a 'filling' tape available for that node. This is because TSM will not wait for a tape that is currently dismounting. The logic is that it is faster to ask for a new scratch tape than to wait while a filling tape is dismounted, stored, retrieved then remounted. There is no easy answer to this feature, except to juggle your KEEPMP and MOUNTRetention values to minimise the risk.


Decommissioned Nodes

TSM backups are marked for expiration on the next backup run after the data expiry date is reached. Say you are backing up a Windows server with a 30 day deletion management class, and the server is decommissioned. You remove the client from the TSM schedules and wait 30 days. However, the data will never be expired automatically, because a backup has to run before the data can be expired. You need to run a 'delete filespace' command manually for that client to remove the backup data from TSM and free up the backup storage space.

If you want to retain the data for a while after the server is decommissioned and then have it deleted automatically, you could create a Backupset of the client data, give the Backupset a specific expiration time, then delete the filespaces and remove the node. This retains a copy of all the active (non-archive) files from the client for a set period of time, while still freeing up the server and database space used by the decommissioned node. Should there be a requirement to restore a file, you could restore it from the Backupset.


Maxscratch parameter

The name of this parameter can be a bit confusing, as it limits the total number of scratch tapes that an individual storage pool can claim, not the total number of scratch tapes in the system, or even the number of scratch tapes currently available to that pool. If your tape pool processing starts failing with insufficient space errors, then one cause can be that the maxscratch limit has been reached. You may have plenty of scratch tapes in your library, but TSM will not use them for that tape pool.
Issue the command QUERY STGPOOL FORMAT=DETAIL and compare the values of the 'Maximum Scratch Volumes Allowed' and 'Number of scratch volumes used' fields. The first value has to be higher than the second, or the MAXSCRATCH limit will be reached and backups will fail, even if free scratch tapes exist.
To fix the problem, use the UPDATE STGPOOL poolname MAXSCRATCH=nnnnn command
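
The comparison above is simple arithmetic, and could be sketched like this in Python. This is only an illustration: the function names and the figures are made up, and the two numbers are assumed to have been copied by hand from the QUERY STGPOOL FORMAT=DETAIL output.

```python
# Hypothetical helpers; the two arguments are the 'Maximum Scratch Volumes
# Allowed' and 'Number of scratch volumes used' values read from
# QUERY STGPOOL FORMAT=DETAIL for one storage pool.

def scratch_headroom(max_scratch_allowed: int, scratch_used: int) -> int:
    """Return how many more scratch volumes the pool may still claim."""
    return max_scratch_allowed - scratch_used


def pool_is_capped(max_scratch_allowed: int, scratch_used: int) -> bool:
    """True when the pool has reached its MAXSCRATCH limit."""
    return scratch_headroom(max_scratch_allowed, scratch_used) <= 0


# Made-up example figures for two pools.
print(scratch_headroom(200, 185))
print(pool_is_capped(200, 200))
```

A pool that reports no headroom is a candidate for a higher MAXSCRATCH value, provided the library really does hold spare scratch tapes.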


How to find out how many scratch tapes exist

The q vol command only gives you information about storage pool volumes, so it does not report on scratch volumes, as they are not associated with a storage pool. You need to use the following SQL instead

select count(*) as Scratch_count from libvolumes where status='Scratch'


TSM thinks a tape contains data, but it is empty

You have a CopyPool volume which is EMPTY and OFFSITE, but the tape does not change to scratch as normal. You cannot move the data off the tape because it is empty. You cannot delete the tape either, not even with the discard data option, because TSM thinks it contains data. The tape needs to be audited, but to do this it must be on-site. Recall the tape to your site and run an 'AUDit Volume VolName Fix=Yes'.

Altering the MOUNTABLE state

A volume is empty, but is not in scratch status because the volume STATE is mountablenotinlib. To change the STATE of the volume use the command

MOVE MED vol_name STG=pool_name WHERESTATE=MOUNTABLENOTINLIB

This will move the volume back into the scratch category


Expire Inventory

Expire Inventory deletes unwanted backups from the TSM catalog and marks the backups on tape as expired. It's best to run expire inventory daily. Once the data is expired, reclamation can release partly used tapes. To do this, schedule the following command

EXPIRE INVENTORY

It's best to run expire inventory at a time when no, or few, backups are running, as it waits when it hits a filespace that is being backed up and will hold onto the recovery log, which can cause the log to fill up. You also want to avoid running EXPIRE INVENTORY alongside your TSM database backups.

EXPIRE INVENTORY has two undocumented and unsupported parameters, BEGINNODEID=nn and ENDNODEID=nn, where nn are decimal node numbers. These can be used to limit the amount of work the process does, but please note that these parameters are UNSUPPORTED and you use them at your own risk.

The only really good reason to run a limited EXPIRE INVENTORY is if you've just deleted lots of data from a node and would like to get it deleted quickly from the database. The problem is that you need to find out the node number of the filespace you just deleted the data from.

The only way I know is to use undocumented SHOW commands. You need to start with the object number for the NODE table, then drill down the b-tree to find the Node number of the file space that you are after.
Start by using SHOW OBJDIR to find the node number of the Nodes table. This is normally 38.
Then use SHOW NODE 38 to see the top level tree structure of the NODE table.

SHOW NODE 38

(SERVVM57)
<-    Subtree=<8344593>
Record 9 (KeyLen=9, DataLen=4):

Then finally do a SHOW NODE number for the subtree that contains the node you are interested in.
In the SHOW NODE output for the subkeys, the KEY is the NODE_NAME. Field 1 is the node number and field 2 is PLATFORM_NAME.

SHOW NODE 8344593

Key:
->(SERVVM3V)<-
   Data: ->(00000134)(WinNT)(plus lots more fields..)

So from that you can see that the client node you are interested in, SERVVM3V, has a node number of 134.
Roger Deschner has pointed out an error in the original text of this page: 'The node numbers you get out of the database with the SHOW NODE command are in hexadecimal. However, the node numbers you specify on the EXPIRE INVENTORY command must be decimal. You've got to convert them from hex to decimal yourself. To expire node SERVVM3V you must specify node number 308 (decimal), rather than 134 (hex)'.
So, using the correct decimal number, to expire data from just that node you would use the command

EXPIRE INVENTORY BEGINNODEID=308 ENDNODEID=308
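
The hex to decimal conversion described above can be done with any calculator. As a small illustration, here is one way to do it in Python; the helper name is made up, and the input is assumed to be field 1 copied from the SHOW NODE output, with or without its brackets.

```python
# Hypothetical helper: converts the hexadecimal node number shown by
# SHOW NODE into the decimal value that EXPIRE INVENTORY expects.

def show_node_id_to_decimal(hex_field: str) -> int:
    """Strip any surrounding brackets and parse the value as base 16."""
    return int(hex_field.strip("()"), 16)


node_id = show_node_id_to_decimal("(00000134)")
print(f"EXPIRE INVENTORY BEGINNODEID={node_id} ENDNODEID={node_id}")
```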

You should always use the CANCEL EXPIRATION command to cancel expire inventory, as that will terminate the command cleanly. The next time you run EXPIRE INVENTORY, it will start up from where it left off. If you want to start expiration again from the beginning use EXPIRE INVENTORY RESTART=NO.

You may notice that when expire inventory completes, the reported number of objects inspected is less than the number of objects in the database. This is because TSM optimizes the examination process to check which objects are ready for expiration, so it does not need to check them all.

If the expiration process is working on a node with a large number of grouped objects, a query process can show no progress, which suggests that the expiration has hung. This may not be the case, as during expiration and deletion of grouped objects, the amount of data processed is only updated after processing has completed for the whole group. This means that a 'query process' can report the same number of examined objects and deleted objects for quite some time.

It is possible to investigate further, but to do this you need to log onto native DB2. While the expiration process is running, take a note of the node it is currently processing, then log on to DB2 and run the following commands:

db2 connect to tsmdb1
db2 set schema tsmdb1
db2 "select nodeid from nodes where nodename='NODENAME'"

where you replace NODENAME with your node in uppercase. Take a note of the returned nodeid, then run the command below, replacing NNN with your nodeid.

db2 "select groupid,count(*) from backup_objects where nodeid=NNN group by groupid"

The result will be two columns of numbers, where the first column is the Groupid, and the second column is the number of backup objects. If you then rerun this command several minutes later, you should see the backup objects count decreasing for one of the groups, which means that expiration processing is running as designed.
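
Comparing two runs of that query by eye can be tedious when there are many groups. The following Python sketch shows one way to do the comparison, assuming you have copied each result into a mapping of groupid to object count; the function name and all the sample figures are made up for illustration.

```python
# Hypothetical helper: each dict maps a groupid to its backup object count,
# as returned by the 'group by groupid' query at two different times.

def shrinking_groups(earlier: dict, later: dict) -> dict:
    """Map each groupid to the number of objects removed between two runs."""
    removed = {}
    for groupid, count_before in earlier.items():
        # A group missing from the later run has been deleted completely.
        count_after = later.get(groupid, 0)
        if count_after < count_before:
            removed[groupid] = count_before - count_after
    return removed


# Made-up sample results taken several minutes apart.
first_run = {1001: 50000, 1002: 1200, 1003: 75}
second_run = {1001: 42000, 1002: 1200, 1003: 75}
print(shrinking_groups(first_run, second_run))
```

An empty result after several minutes would suggest that expiration really has stalled and is worth investigating further.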


Reclamation

Reclamation copies half empty tapes onto empty new tapes to consolidate the data and free up tapes. You probably want to control the times when reclamation can run, as it will use 2 tape drives. The reclaim parameter relates to the amount of free space on the tape, and the IBM recommendation is to set it to 60. There's not much point in making it less than 50, as that would just copy 2 old tapes to 2 new tapes and not achieve anything. Schedule the command

UPD STG cartpool RECLAIM=60

to start it, and switch it off again with

UPD STG cartpool RECLAIM=100
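
To see why a threshold below 50 gains nothing, here is a rough Python sketch of the arithmetic. It is a simplification, not how TSM itself calculates anything: the function name is made up, and it assumes every tape has the same capacity and sits exactly at the reclaim threshold.

```python
# Hypothetical model: consolidate tape_count equally sized tapes, each with
# pct_reclaimable percent of its space reclaimable, onto fresh tapes.

def tapes_freed(tape_count: int, pct_reclaimable: int) -> int:
    """Net number of tapes returned to scratch after consolidation."""
    # Live data, measured in hundredths of a tape.
    used_hundredths = tape_count * (100 - pct_reclaimable)
    # Whole output tapes required (ceiling division).
    tapes_needed = -(-used_hundredths // 100)
    return tape_count - tapes_needed


for pct in (40, 50, 60):
    print(pct, tapes_freed(2, pct))
```

With two tapes at 40% reclaimable the data still needs two output tapes, so nothing is gained, whereas at 50% or more the same two tapes fit on one.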

If you run the command Q VOL F=D, it will show which tapes have reclaimable space at or above the reclaim threshold (60% in the example above), and so will be processed. The numbers on this command don't necessarily add up. Estimated Capacity (MB) shows the apparent full capacity of a tape for this devclass, which may be much lower than what can actually be put on the tape. Pct Util shows how much of that estimated capacity is currently used. Pct. Reclaimable Space shows how much of the space that was occupied by data is now free, not the percentage of the ACTUAL capacity of the tape that is free.


Delete Volhist

TSM retains various types of tapes apart from client backups. These tape volumes have a status of PRIVATE, but they do not show up with the command Q VOL volume_name, as QUERY VOLUME only returns information about volumes that belong to stgpools. The volume history keeps a record of all volumes, and you can display these non-stgpool volumes with the following commands:
q volh type=dbb
q volh type=dbs
q volh type=export
q volh type=backupset
q volh type=remote
The three types which can clog up your system are Database Backups, Snapshots and Export volumes. If you are running DRM, then you set a parameter called DRMDBBACKUPEXPIREDAYS, which determines how long you keep database backups. Database backup tapes are then expired when you run your DRM scripts.
If you use DRM, do not use the DELETE VOLHIST command below, as that will interfere with DRM records. If you don't run DRM, then database backup volumes will not expire, as they do not have any retention settings. You have to remove them manually with a delete volhist command. The best way is to schedule a script that executes the command

DELETE VOLHIST TYPE=DBBACKUP TODATE=TODAY-n

where 'n' is the number of days you want to keep your backups. Don't delete them all! If you run database snapshots then you need a similar command

DELETE VOLHIST TYPE=DBSNAPSHOT TODATE=TODAY-n

If you run client backup exports to tape, maybe to move client data between TSM servers, then those tapes also have no way of expiring automatically, as they do not have a retention parameter. For exports, you might want to delete them all using TODATE=TODAY, or you might want to retain the last few days' tapes. The command to clear them out would be

DELETE VOLHIST TYPE=EXPORT TODATE=TODAY-n

If you have a PRIVATE volume that is not part of a stgpool and does not display in any of the Q VOLH commands above, then you can set it to scratch using the command:

UPDATE LIBVOL library_name vol_name STATUS=SCRATCH


Orphaned Volumes

Another issue that can cause empty tapes to get stuck in the system is if a scratch volume is deleted from the storage pool but is not automatically relabeled. This can happen if there is a problem with the VTL tape library and/or VTL tape drives. If this happens, you might see errors like

ANR8758W The number of online drives in the VTL library (libraryname) does not match the number of online drive paths for source TSM.
ANR8840E Unable to open device <device> with error number 78 and PVRRC 2839.
ANR8311E An I/O error occurred while accessing drive (devicename) for OFFL operation, errno = 16, rc = 1.
ANR8779E Unable to open drive <devicename>, error number= 5.

The normal process when you have the RELABELSCRATCH=YES option enabled is for the server to overwrite the label for any volume that is deleted and then to return the volume to scratch status in the library. As part of this operation, volumes are checked out of the library and then checked back in with an immediate LABEL LIBVOLUME command. If the RELABEL operation cannot obtain a drive or fails to relabel a volume for any reason, Tivoli Storage Manager will retry to relabel the volume on each future RELABEL attempt until a RELABEL operation is successful. However the list of volumes that were not successfully relabeled is kept in memory and this list is cleared when the Tivoli Storage Manager server instance is stopped and restarted. Therefore, on a server restart, the server will not relabel volumes that were not automatically relabeled prior to the restart.

If this happens, then you will have 'orphaned' scratch volumes in your tape library. If you think you have this problem, you can follow this process to find and reclaim any orphaned tapes

Audit the library inventory with the following command, replacing library_name with the name of your library

audit library library_name checkl=barcode

Verify the list of tapes currently in the library with the following command

show slots

Verify the list of tapes currently checked in with Tivoli Storage Manager with the following command

query libv

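Compare the output of the show slots command with the output of the query libv command, then use the label libvolume command to relabel any scratch tapes that are in the library but not checked in. Replace library_name with the name of your library and xxx with the list of volumes.

label libvolume library_name search=yes labels=barcode checkin=scratch vollist=xxx overwrite=yes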
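
The comparison between the two listings is a set difference. As an illustration only, the Python sketch below assumes you have already extracted the volume names from each output by hand or with your own parsing; the function name and the volume names are made up.

```python
# Hypothetical helper: find tapes that are physically in the library
# (from 'show slots') but not checked in to TSM (from 'query libv').

def orphaned_volumes(in_library_slots: set, checked_in: set) -> list:
    """Return the volumes present in the library but unknown to TSM, sorted."""
    return sorted(in_library_slots - checked_in)


# Made-up volume names for illustration.
slots = {"A00001L5", "A00002L5", "A00003L5"}
libvols = {"A00001L5", "A00003L5"}
orphans = orphaned_volumes(slots, libvols)
print("vollist=" + ",".join(orphans))
```

The printed value can then be used for the vollist parameter of the label libvolume command, after checking that the listed tapes really are empty scratch tapes.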