TSM Tape Drives and Tapes


TSM supports a wide range of tape devices. IBM publishes a list of supported devices, along with the device drivers that they need.

Finding faulty tapes

Follow these steps to identify and fix faulty tapes.

List unavailable tapes

q volume access=readonly
q volume access=unavailable

These commands will give you a list of tapes that have been put in this state, probably because the system has identified an error. However, a tape will also be marked as unavailable if TSM tried to mount it and it was not in the library.

Look for tapes which have errors

select volume_name, read_errors,write_errors from volumes where (read_errors > 0 or write_errors > 0)

This will give a list of tapes that have reported read or write errors. If you have a lot of these, consider raising the thresholds in the query (for example read_errors > 10) so you can concentrate on the tapes with the most errors first.
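The threshold can also be applied after the event, by post-processing the query output. The following is a minimal sketch only: it assumes that the select output has been captured as comma-separated lines (volume name, read errors, write errors), and the function name, the sample volume names and the threshold of 10 are all hypothetical.

```python
import csv
import io


def worst_tapes(csv_text, threshold=10):
    """Return (volume, read_errors, write_errors) rows whose combined
    error count is at least `threshold`, worst first."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        if len(record) != 3:
            continue  # skip blank or malformed lines
        volume, read_errors, write_errors = (field.strip() for field in record)
        try:
            reads, writes = int(read_errors), int(write_errors)
        except ValueError:
            continue  # skip header lines or non-numeric rows
        if reads + writes >= threshold:
            rows.append((volume, reads, writes))
    # Sort by total errors, highest first
    return sorted(rows, key=lambda row: row[1] + row[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample data, not real server output
    sample = "A00001,2,0\nA00002,40,15\nA00003,0,12\n"
    for volume, reads, writes in worst_tapes(sample):
        print(volume, reads, writes)
```

Tapes at the top of the resulting list are the first candidates for the audit described next.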

To fix a problem, run the audit command

AUDIT VOLUME volser FIX=YES

If a part of a tape is faulty, the audit command will try to fix it. If it cannot fix a problem file, and a copy exists on another tape, then you need to use the RESTORE VOLUME command. If there is no copy, then the AUDIT command just deletes the entry from the database.

If the tape is hopelessly trashed, and you do not have a copy, the only answer is the 'delete volume volser discarddata=yes' command. However, it's always worth trying a MOVE DATA command first to see if you can rescue something from the tape.

What happens if a tape is accidentally overwritten by another application?

If you have a copy pool, you can restore the tape; otherwise you have to tell TSM to throw the data away. If your copy pool is taken offsite, then you need to know which tapes to bring back for the restore.
To find the tapes, use the command

restore volume volname preview=yes

and look at the activity log after this process finishes. It will show you all the copy pool tapes needed to recreate the primary tape. Get all these tapes back from your offsite location, and run the command again without preview=yes. The old tape will be marked as destroyed and the data copied to a new tape. The old volume will then be deleted once all the data is restored.

To discard the data use the command

DELETE VOL volname DISCARDDATA=YES
  or if that fails
AUDIT VOL volname FIX=YES

and the active data will be backed up again on the next run. Of course, if these are older backup versions, then they are gone forever.

Investigating tape drive problems

Problems with tape drives can be a bit tricky to track down, as the issue could be with the TSM definitions, the configuration within the operating system, or with the physical hardware. This section details ways to narrow down the source of the problem, and deals with both AIX and Windows TSM servers.
Some basic questions to ask are:

  • Are the libraries, drives and paths online to TSM?
  • Are the device drivers for the libraries and drives at the correct level?
  • Are the drives listed as available to the operating system?
  • Does the DB2 instance owner have read/write access to the drives?
  • Can you use O/S device utilities to access the drives?
  • Are there any error messages or failure lights on the hardware?

If all else fails, consider reconfiguring the library and/or the drives in both TSM and the O/S and see if that resolves the issue.

Are the Drives / Paths online?

At the TSM Server, check that the drives and paths are online with these commands

q drive
q path

If any drives or paths are not online, use these commands to update them, substituting your own server, library and drive names.

update drive lib-name drive-name online=yes
update path server-name drive-name srct=server destt=drive libr=lib-name online=yes

In AIX, to check that the DB2 instance owner account has read/write access to the tape devices, list the device files with the command

# ls -l /dev

For example:
# ls -l rmt[0-9]
crw-rw-rwT 1 root system 40,192 Mar 08 14:02 rmt1
crw-rw-rwT 1 root system 40,256 Mar 08 14:02 rmt2
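The same check can be scripted. The sketch below is an illustration under stated assumptions: it only tests the 'other' permission bits, as in the rw-rw-rw example above, so a drive that the instance owner reaches through owner or group permissions will be reported as needing a manual check. The function names and the /dev/rmt[0-9]* pattern are hypothetical choices.

```python
import os
import stat


def others_can_read_write(mode):
    """True if the 'other' permission bits allow both read and write."""
    return bool(mode & stat.S_IROTH) and bool(mode & stat.S_IWOTH)


def check_devices(paths):
    """Return {path: bool} for each path that is an existing character device."""
    result = {}
    for path in paths:
        try:
            info = os.stat(path)
        except OSError:
            continue  # device file missing or not accessible
        if stat.S_ISCHR(info.st_mode):
            result[path] = others_can_read_write(info.st_mode)
    return result


if __name__ == "__main__":
    import glob

    # Assumed device naming; adjust the pattern to match your system
    for path, ok in sorted(check_devices(glob.glob("/dev/rmt[0-9]*")).items()):
        print(path, "OK" if ok else "check permissions")
```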

External Tape Problems

It's possible that there is an external problem with the library or with drives. A good place to start on a Windows server is to check that Windows Device Manager lists the library (Medium Changer) and/or Tape Drives. Right click on 'My computer' and select 'Manage' then 'Device Manager'. If Device Manager indicates a problem then TSM will have problems.

If you cannot see any devices in device manager, then right click on the server name in the right side display and select 'Scan for Hardware Changes'. If this does not discover the devices, try reinstalling the device drivers. If that fails then you have a problem with your cabling or SAN ports.

To check device status from AIX, the command depends on which driver you are using. For tape devices using Atape or atldd

# lsdev -Cc tape

and for tape devices using tsmscsi

# lsdev -Cc adsm

Tape devices should be listed as Available.

Device Drivers

If you can see the devices but they are in an incorrect state, it could be that the incorrect device driver is loaded.
Check the device driver version through Windows Device Manager by right-clicking on the device and selecting 'Properties', then the 'Driver' tab.
You will need to check up-to-date TSM and product information to see which is the correct device driver, but generally, IBM tape devices use IBMTape except that 3494 libraries use ibmatl. Non-IBM tape drives and libraries typically use the tsmscsi device driver that comes packaged with TSM Server and Storage Agent. A non-IBM library that uses IBM tape drives will typically use tsmscsi for the library and IBMtape for the drives.

On AIX, you can display the current device driver version with

lslpp -l Atape.*
lslpp -l atldd.* (for IBM 3494 libraries)
lslpp -l tivoli.tsm.devices.* (for non-IBM libraries)

A Windows error that I came across once occurred after the physical tape library had been swapped out, which left incorrect entries in the Windows registry. In this case, I deleted the registry entries and then rebooted the server, so that Windows discovered the new library and drives and created new registry entries with the correct serial numbers and WWNs. The exact registry entries depend on which SCSI port and bus you use, so if you think that this is the problem, check the entries that look like the example below, replacing each lower case 'n' with your numeric value.

HKEY_LOCAL_MACHINE\HARDWARE\DEVICEMAP\Scsi\Scsi Port n\Scsi Bus n\Target Id n\Logical Unit Id n\

Make sure TSM is DOWN before you do anything with the drivers, even on the LAN-free clients. Put the Storage Agent on each LAN-free client in Disabled or Manual mode until all the drivers are in, then return the Storage Agent to Auto when all is well.

To install the IBMtape driver, run install-exclusive.exe.
To install the tsmscsi, from Device Manager, right-click on the device, select Update Driver -> Install from a list or specific location (Advanced) -> Don't search. I will select the driver to install -> Select the IBM Tivoli Storage Manager device driver -> Continue Anyway.

Linux uses the lin_tape driver, which can be downloaded from the IBM Fix Central site. Other useful products are:
lin_taped, a daemon program that can automatically create or delete special files under the /dev directory that correspond to the attached or detached tape devices.
ITDT, a utility program that you can use to exercise or test the functions of the Linux device driver.

The 'IBM Tape Device Drivers Installation and User's Guide' details all these products.

Windows device names incorrect

Windows device names can change after a reboot, so use tsmdlst to check them. tsmdlst runs quicker if the TSM service is down, so it's often a good idea to run it as a matter of routine before you start the service.
Open a command prompt, navigate to C:\Program Files\tivoli\tsm\console, and run tsmdlst.
The output lists each library and drive that Windows can see, together with the device name that TSM expects in its path definitions.

Use the query path command to check if paths are using the wrong devices, and if so then correct them with the update path command.
To query library paths:

q path destt=libr f=d

To query tape paths for one server:

q path server-name f=d

To update a library path:

update path server-name lib-name srct=server destt=libr device=<lb#.#.#.#> online=yes

To update a tape path:

update path server-name drive-name srct=server destt=drive libr=lib-name device=<mt#.#.#.#> online=yes

After a Windows server reboot, many people run the following process as a matter of routine, just to make sure no problems appear. They have most of these actions scripted, so it's a simple case of running the script for each step.

  • Run tsmdlst
  • Start TSM server with client sessions disabled
  • delete tape and library paths
  • delete tape drives
  • delete tape library
  • generate path definitions from tsmdlst output (various ways to do this with scripts or spreadsheets)
  • define tape library
  • define library path
  • define tape drives
  • define tape paths
  • checkin scratch tapes
  • checkin private tapes
  • enable client sessions
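The 'generate path definitions' step can be scripted in many ways. The sketch below is one hedged possibility: it assumes you have already pulled the device names out of the tsmdlst output into a simple mapping, and it builds define path commands in the same form as the update path commands above. The function name and every server, library, drive and device name in the example are hypothetical.

```python
def define_path_commands(server, library, library_device, drive_devices):
    """Build 'define path' commands for one library and its drives.

    drive_devices maps TSM drive names to the device names reported by
    tsmdlst, for example {"DRIVE1": "mt0.2.0.3"}.
    """
    commands = [
        f"define path {server} {library} srct=server destt=libr "
        f"device={library_device} online=yes"
    ]
    # Sort by drive name so the generated script is stable between runs
    for drive, device in sorted(drive_devices.items()):
        commands.append(
            f"define path {server} {drive} srct=server destt=drive "
            f"libr={library} device={device} online=yes"
        )
    return commands


if __name__ == "__main__":
    # Hypothetical names; substitute your own values
    for command in define_path_commands(
        "SERVER1", "LIB1", "lb0.1.0.3",
        {"DRIVE1": "mt0.2.0.3", "DRIVE2": "mt0.3.0.3"},
    ):
        print(command)
```

The printed commands can be reviewed and then saved as a macro, which keeps a human check between the tsmdlst output and the server definitions.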

Tape Drive Persistent Binding

When a server is rebooted, the tape device names can change, and this can make the tape paths in both servers and storage agents incorrect. You can prevent this from happening by using Persistent Binding. The trick is to find an attribute that does not change at reboot time, and the tape drive serial number is usually the best one.

AIX

In AIX, install the IBM Atape driver. This allows you to rename the tape devices in AIX to a standard that suits you, and these names will survive a server reboot. You can make these device names match the TSM server definitions, which makes troubleshooting a bit easier.
Another option is to identify the WWN or the serial number of the tape drive and its name and then use this information when defining paths on the TSM server.
The lsdev command will tell you what drives are defined to the AIX client

# lsdev -Cc tape
3584rmt0 Available 01-08-02 IBM 3580 Ultrium Tape Drive (FCP)
3584rmt1 Available 01-08-02 IBM 3580 Ultrium Tape Drive (FCP)
3584smc0 Available 01-08-02 IBM 3584 Library Medium Changer (FCP)
vtl1rmt0 Available 01-08-02 IBM 3580 Ultrium Tape Drive (FCP)
vtl1rmt1 Available 01-08-02 IBM 3580 Ultrium Tape Drive (FCP)
vtl1smc0 Available 01-08-02 IBM 3584 Library Medium Changer (FCP)

and then you can get the serial number with the lscfg command

# lscfg -vl 3584rmt0
3584rmt0 U5791.001.992029A-P1-C01-T1-W5005076300426C02-L0 IBM 3580 Ultrium Tape Drive (FCP)

Manufacturer................IBM
Machine Type and Model......ULT3580-TD2
Serial Number...............1110027758
Device Specific.(FW)........73V1

Use the q drive f=d command on the TSM server to get the equivalent drive at the TSM side

tsm: TSMPROD>q drive f=d
Library Name: 3584LIB
Drive Name: 3584-003
Device Type: LTO
...
WWN: 5005076300026C02
Serial Number: 1110027758

So here, drive 3584-003 has serial number 1110027758, which matches AIX device 3584rmt0, and we use that information to define the drive path for the storage agent

define path STAGENT01 3584-003 srct=server destt=drive library=3584LIB device=/dev/3584rmt0

WINDOWS

On Windows, you can get persistent binding if you use QLogic host bus adapters. In the QLogic management software, bring up the Fibre Channel Port Configuration dialog box, right click on Host Adapter, device or LUN in the HBA tree, then click on Configure in the drop down menu. Select the BIND box, and that will bind each port to its target ID.

Alternatively, identify the WWN or the serial number of the device and its name so that you can use this information when defining paths on the TSM server. There is more than one way to do this, but one way is to use the IBM Tape Diagnostic Tool (ITDT).

C:\ITDT>itdt scan
Scanning SCSI Bus ...
#0 \\.\Tape4801110 - [ULT3580-TD2]-[73V1] S/N:1110027758 H3-B0-T2-L0 Changer:0000000103640401
#1 \\.\Changer0 - [03584L32]-[0100] S/N:0049663839990402 H3-B0-T0-L0
#2 \\.\Tape4801107 - [ULT3580-TD3]-[5AT0] S/N:4966383000 H3-B0-T0-L1 Changer:0049663839990402
#3 \\.\Tape4801108 - [ULT3580-TD3]-[5AT0] S/N:4966383001 H3-B0-T0-L2 Changer:0049663839990402
#4 \\.\Tape4801109 - [ULT3580-TD2]-[73V1] S/N:1110058875 H3-B0-T1-L0 Changer:0000000103640401
#5 \\.\Changer1 - [03584L32]-[8980] S/N:0000000103640401 H3-B0-T1-L1
Exit with code: 0

This shows that device \\.\Tape4801107 has a serial number of 4966383000, so you use this information to define the drive path at the TSM server. First find the drive name on the TSM server that corresponds to this serial number

tsm: TSMPROD>q drive f=d
Library Name: VTL1-LIB
Drive Name: VTL1-003
Device Type: LTO
...
Allocated to:
WWN: 20050000C97CBAC2
Serial Number: 4966383000

In this case, drive VTL1-003 is the drive with serial number 4966383000 and the path definition would be

define path STAGENT03 VTL1-003 srct=server destt=drive library=VTL1-LIB device=\\.\Tape4801107
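With more than a handful of drives, matching serial numbers by eye becomes error prone. The sketch below is a hedged illustration that pairs the two outputs shown above by serial number and prints the matching define path commands. It assumes the 'q drive f=d' and 'itdt scan' outputs have been saved as text in exactly the layouts shown in this section; the function names are hypothetical.

```python
import re


def drives_by_serial(q_drive_output):
    """Map serial number -> (library name, drive name) from 'q drive f=d' text."""
    result = {}
    library = drive = None
    for line in q_drive_output.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Library Name":
            library = value
        elif key == "Drive Name":
            drive = value
        elif key == "Serial Number" and value:
            result[value] = (library, drive)
    return result


def devices_by_serial(itdt_scan_output):
    """Map serial number -> Windows device name from 'itdt scan' text.

    Only tape drives are returned; medium changers are skipped.
    """
    pattern = re.compile(r"(\\\\\.\\Tape\S+)\s.*?S/N:(\S+)")
    result = {}
    for line in itdt_scan_output.splitlines():
        match = pattern.search(line)
        if match:
            device, serial = match.groups()
            result[serial] = device
    return result


def define_path_commands(agent, q_drive_output, itdt_scan_output):
    """Build one 'define path' command per drive seen on both sides."""
    drives = drives_by_serial(q_drive_output)
    devices = devices_by_serial(itdt_scan_output)
    commands = []
    for serial, (library, drive) in sorted(drives.items()):
        device = devices.get(serial)
        if device is None:
            continue  # drive is not visible from this host
        commands.append(
            f"define path {agent} {drive} srct=server destt=drive "
            f"library={library} device={device}"
        )
    return commands
```

Calling define_path_commands with the storage agent name and the two saved outputs returns the commands as a list of strings, which should be reviewed before they are run.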

LINUX

The process is similar on a Linux server. You query the udev rules file to find which serial numbers are associated with logical tape names.

# more /etc/udev/rules.d/98-lin_tape.rules
KERNEL=="IBMtape*[!n]", SYSFS{serial_num}=="1110058875", SYMLINK="lin_tape/by-id/tapedrive_1110058875"
KERNEL=="IBMtape*[!n]", SYSFS{serial_num}=="1110027758", SYMLINK="lin_tape/by-id/tapedrive_1110027758"
KERNEL=="IBMtape*[!n]", SYSFS{serial_num}=="4966383000", SYMLINK="lin_tape/by-id/tapedrive_4966383000"
KERNEL=="IBMtape*[!n]", SYSFS{serial_num}=="4966383001", SYMLINK="lin_tape/by-id/tapedrive_4966383001"
KERNEL=="IBMchanger*[!n]", SYSFS{serial_num}=="0000000103640401", SYMLINK="lin_tape/by-id/changer_0000000103640401"
KERNEL=="IBMchanger*[!n]", SYSFS{serial_num}=="0049663839990402", SYMLINK="lin_tape/by-id/changer_0049663839990402"

Then use the "ls" command to find the device names.

# ls -la /dev/lin_tape/by-id
...
lrwxrwxrwx 1 root root 17 Feb 11 09:52 changer_0000000103640401 -> ../../IBMchanger0
lrwxrwxrwx 1 root root 17 Feb 11 09:52 changer_0049663839990402 -> ../../IBMchanger1
lrwxrwxrwx 1 root root 14 Feb 11 09:52 tapedrive_1110027758 -> ../../IBMtape1
lrwxrwxrwx 1 root root 14 Feb 11 09:52 tapedrive_1110058875 -> ../../IBMtape0
lrwxrwxrwx 1 root root 14 Feb 11 09:52 tapedrive_4966383000 -> ../../IBMtape2
lrwxrwxrwx 1 root root 14 Feb 11 09:52 tapedrive_4966383001 -> ../../IBMtape3

The above output shows that /dev/lin_tape/by-id/tapedrive_1110058875 is the persistent device name associated with device /dev/IBMtape0. The persistent name includes the serial number, which in this case is 1110058875.
Find the correct drive at the TSM server with the command

tsm: TSMPROD>q drive f=d
Library Name: 3584LIB
Drive Name: 3584-004
Device Type: LTO
...
WWN: 5005076300026C01
Serial Number: 1110058875

In this case, drive 3584-004 is the drive with serial number 1110058875, so use this information to define the drive path

define path MYAGENT 3584-004 srct=server destt=drive library=3584LIB device=/dev/lin_tape/by-id/tapedrive_1110058875
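The serial number lookup can also be taken from the symbolic link listing rather than read by eye. The sketch below is an illustration that assumes the 'ls -la /dev/lin_tape/by-id' output has been saved as text in the layout shown above; the function name is hypothetical.

```python
import posixpath
import re


def persistent_names(ls_output, directory="/dev/lin_tape/by-id"):
    """Map serial number -> (persistent path, current device path).

    Only tapedrive_* links are returned; changer links are ignored.
    """
    result = {}
    for line in ls_output.splitlines():
        match = re.search(r"(tapedrive_(\S+)) -> (\S+)$", line.strip())
        if match:
            name, serial, target = match.groups()
            # Resolve the relative link target against the by-id directory
            device = posixpath.normpath(posixpath.join(directory, target))
            result[serial] = (f"{directory}/{name}", device)
    return result


if __name__ == "__main__":
    sample = (
        "lrwxrwxrwx 1 root root 17 Feb 11 09:52 changer_0000000103640401 -> ../../IBMchanger0\n"
        "lrwxrwxrwx 1 root root 14 Feb 11 09:52 tapedrive_1110058875 -> ../../IBMtape0\n"
    )
    for serial, (persistent, device) in sorted(persistent_names(sample).items()):
        print(serial, persistent, device)
```

The persistent path, not the current device path, is the one to use in the path definition, because it is the name that survives a reboot.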

There may be an easier way to get the tape drive serial numbers by using the Linux 'sginfo' tool. Among other functions, this tool can send the SCSI inquiry command to the device and return its results in full or in part. For example, to get the serial number issue the command:

sginfo -s /dev/tsmscsi/mt1
Serial Number '1110058875'

You can easily script this to get all the serial numbers.

for each in /dev/tsmscsi/mt*; do echo $each $(sginfo -s $each); done 2> /dev/null | sort -n -k1.16 | grep Serial


Reclaiming offsite tapes

You don't have to bring your offsite tapes in to do reclamation.

Set your copypool reclamation to a reasonable level, say 60%. TSM knows what files are still valid on offsite volumes that are to be reclaimed. It finds the copies of those files in the primary storage pool (which is still in the library); it moves a scratch tape to the copy pool and copies the files from the primary tape pool to the new copypool tape. The new copy tape is then marked to go offsite, and the old one marked for return.


Reclaiming tapes that are assigned to another TSM server

Imagine this scenario: you are using one TSM server as a Library Manager, called TSM1, with maybe 3 other instances sharing the library. You decommission one of those instances, say it's called TSM3, reclaim the physical server, then discover that TSM3 still has a lot of tapes allocated in the Library Manager. The data is defunct, it has all been moved to other TSM servers, and you want to reclaim those tapes as scratch. You can't change the tapes to scratch status from TSM1 as it does not own the tapes. If you check them out and back in again, they are still owned by TSM3. You can't change the owner, so what do you do?

The problem is that TSM1 contains records about these tapes in its volhist file. You need to delete the volhist record on TSM1 for each tape and then set the tape to scratch with these commands. Put your own library and volume names in, and be absolutely sure to get the names right or you will delete the wrong tapes!

DEL VOLHIST TODATE=TODAY TYPE=REMOTE VOLUME=volume-name FORCE=YES
UPD LIBV library-name volume-name STAT=SCR
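Because a mistyped volume name is so costly here, it can help to generate the commands from a checked list rather than typing them. The sketch below is a cautious illustration: the label pattern (1 to 8 upper case letters or digits) is an assumption that you should adjust to your own labelling standard, the function and library names are hypothetical, and the status change uses UPDATE LIBVOLUME, abbreviated to UPD LIBV.

```python
import re

# Assumed label format; change this to match your site's volume labels
VOLSER = re.compile(r"^[A-Z0-9]{1,8}$")


def release_commands(library, volumes):
    """Build the volhist delete and scratch update commands for each volume.

    Raises ValueError on any name that does not look like a volume label,
    so that a typo stops the run instead of producing a bad command.
    """
    commands = []
    for volume in volumes:
        name = volume.strip().upper()
        if not VOLSER.match(name):
            raise ValueError(f"suspicious volume name: {volume!r}")
        commands.append(
            f"DEL VOLHIST TODATE=TODAY TYPE=REMOTE VOLUME={name} FORCE=YES"
        )
        commands.append(f"UPD LIBV {library} {name} STAT=SCR")
    return commands


if __name__ == "__main__":
    # Hypothetical library and volume names
    for command in release_commands("LIB1", ["A00001", "A00002"]):
        print(command)
```

Review the printed commands against the list of tapes that TSM3 owned before running any of them on TSM1.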


Reporting on Tape usage

The following query will produce a report of the number of storage pool volumes and their average estimated capacity, summarised by storage pool and status.

SELECT STGPOOL_NAME AS STGPOOL, COUNT(VOLUME_NAME) AS COUNT, STATUS, CAST(AVG(EST_CAPACITY_MB/1024) AS DECIMAL(5,2)) AS GB_PER_VOL FROM VOLUMES GROUP BY STGPOOL_NAME,STATUS

The output has one row for each combination of storage pool and volume status, showing the volume count and the average estimated capacity per volume in GB.


Logical Block Protection

Logical Block Protection is enabled by setting the LBPROTECT parameter on the device class to READWRITE, which means that a CRC will be generated for each block written to tape and checked when the block is read. This only adds an extra two bytes of CRC data at the end of each block of data on tape, but it does mean that the TSM server code goes down a different code path, and this can significantly degrade tape read/write performance.
