TSM Backup Hints



BACKUP TIPS

Selecting and Excluding Drives

You can select or exclude drives for scheduled backups by placing a DOMAIN entry in the DSM.OPT file on the client. For a Windows client, the line looks like this:

DOMAIN c: d:

The problem with this approach is that you need to remember to update the dsm.opt file whenever you add new drives. By contrast, the statement

DOMAIN ALL-LOCAL

will back up everything. A good variant, if you never want to back up your SYS: drive, for example, is

DOMAIN ALL-LOCAL -SYS:

This means that all drives are backed up except the SYS: drive, and you do not need to change the dsm.opt file as drives are added or removed. Specific selection criteria for the Windows System Object, or the Netware eDirectory (NDS) are given below.

There is an issue with Linux file systems as the root (/) file system is automounted and the DOMAIN ALL-LOCAL option causes the client to back up all local file systems except LOFS file systems and LOFS through automounter. As the root file system is automounted, it will not be processed by a simple DOMAIN ALL-LOCAL. Instead, you need to use

DOMAIN ALL-LOCAL /

However, the DOMAIN ALL-LOCAL approach only works if you are backing up at domain level with a scheduled backup. If you manually select a volume that is excluded in the dsm.opt file, then it will still be backed up. For example,

dsmc incremental sys:\ -subdir=yes

will back up the entire SYS: volume, even though it is excluded in the domain statement. This is not necessarily a bad thing. It means that you can exclude SYS: from scheduled backups, but when you really want a backup, you can run one manually. This approach also allows you to back up selected files or directories from an excluded disk:

dsmc selective sys:\tivoli\tsm\* -subdir=yes

If you absolutely never want to let anyone back up the SYS: volume in any circumstances, then use EXCLUDE statements, as these always apply. To exclude an entire disk you need two statements, one to exclude the directories and one to exclude the files in the root folder, as shown below. (Technically, this does not quite exclude the whole disk, as the root of the volume itself will be backed up once.)

exclude sys:\*
exclude.dir sys:\*

or

exclude c:\*
exclude.dir c:\*

So now if you try to backup with the command

dsmc selective sys:\tivoli\tsm\* -subdir=yes

nothing will happen, because those files are always excluded.



Including and Excluding data

There are two types of INCLUDE and EXCLUDE statements: INCLUDE and EXCLUDE, which include or exclude files, and INCLUDE.DIR and EXCLUDE.DIR, which include or exclude directories. The syntax is

INCLUDE c:\file\space\name\..\*
EXCLUDE.DIR c:\file\space\name\temp

Note that the directory exclude does not end with a '\'. The inclexcl list is normally processed from the bottom up but EXCLUDE.DIR is processed before the regular INCLUDE or EXCLUDE statements so the position of EXCLUDE.DIR does not matter. However it is best to put the DIR processing statements at the bottom of the list to make it more obvious how the processing works.

If the data path you are describing includes spaces, you must enclose the path in quotes, e.g.

INCLUDE "C:\Program Files\Microsoft SQL Server\MSSQL\Backup\...\*"

The \...\ means include subdirectories. If you just coded Server\MSSQL\Backup\* then the include would only apply to files within the Backup directory, and not to any files in subdirectories.
Remember that, because the list is processed from the bottom up, the EXCLUDE statements must come before the INCLUDE statements in the file.

EXCLUDE \filespec\*.* will only exclude files which have a period in the name. If you want to exclude all files in a path, you need EXCLUDE \filespec\*, which matches both files with extensions and files with no extensions.
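
The difference between the two patterns can be illustrated with a small Python sketch. It uses the standard-library fnmatch module purely as a stand-in for the client's own pattern matching, which is an assumption, and the file names are invented for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, chosen only to illustrate the point.
names = ["report.doc", "notes.txt", "README", "Makefile"]


def matched(pattern, candidates):
    """Return the names that a shell-style pattern would match."""
    return [n for n in candidates if fnmatchcase(n, pattern)]


with_period = matched("*.*", names)  # needs a literal '.' in the name
everything = matched("*", names)     # matches every name

print(with_period)
print(everything)
```

In this model, README and Makefile slip past the `*.*` pattern, which is why the plain `*` form is the safer way to exclude every file in a path.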

Recent versions of the TSM client software provide an exclude.fs statement that will exclude an entire file system. It is more efficient than using a plain exclude statement. An exclude.fs suppresses any examination of the directories within the file system. An exclude covering an entire file system does not. The TSM client will still read every file name in every directory within the file system. It will check each file name against the include and exclude statements. It will then decide not to back up that file (assuming the exclude for the entire file system is in the right place in the sequence of include and exclude statements).
You can also be very granular with encryption by using the include.encrypt statement. The statement below will encrypt all backup data for the datadirect directory only.

include.encrypt \opt\datadirect\

The query inclexcl command lets you check your syntax. It was introduced in TSM 4.1, although it was still possible to check syntax on older releases using the unsupported show inclexcl command. A standard q inclexcl will not display management class assignments. If you want to see them you need to use

q inclexcl -detail

If you use the backup/archive GUI then you can use that to check whether the expected files are excluded. Open the GUI, select Backup, then navigate through the directories to get to files that you expect to be excluded. If your statements are correct, then these files will be flagged in the bottom right corner with a red circle with a line through it.

Another variant is to use ranges within EXCLUDE statements. For example, say you wanted to back up a large disk with three backup streams, but you did not want to have to change include statements when adding new directories (more likely, you would not know when new directories were added, so they would be missed). You could define three clients, each with its own dsm.opt file, with exclude statements as shown below. You would need to ensure that the ranges cover every possible directory name.

Content of dsm_a.opt
NODENAME Servername_A
EXCLUDE.DIR \\Servername\diskname\[K-O]*
EXCLUDE.DIR \\Servername\diskname\[P-Z0-9$]*

Content of dsm_b.opt
NODENAME Servername_B
EXCLUDE.DIR \\Servername\diskname\[A-J]*
EXCLUDE.DIR \\Servername\diskname\[P-Z0-9$]*

Content of dsm_c.opt
NODENAME Servername_C
EXCLUDE.DIR \\Servername\diskname\[A-J]*
EXCLUDE.DIR \\Servername\diskname\[K-O]*
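
A quick way to sanity-check the ranges is to model them. The Python sketch below is only an approximation: it uses regular-expression character classes as a stand-in for the client's range matching and assumes matching is case-insensitive on Windows. The directory names in the usage note are hypothetical.

```python
import re

# Character classes excluded by each node, taken from the three option files.
EXCLUDES = {
    "Servername_A": ["[K-O]", "[P-Z0-9$]"],
    "Servername_B": ["[A-J]", "[P-Z0-9$]"],
    "Servername_C": ["[A-J]", "[K-O]"],
}


def nodes_backing_up(dirname):
    """Return the nodes whose EXCLUDE.DIR ranges do not match dirname."""
    first = dirname[:1].upper()  # assumption: case-insensitive matching
    return [
        node
        for node, classes in EXCLUDES.items()
        if not any(re.fullmatch(c, first) for c in classes)
    ]
```

Under this model, `nodes_backing_up("Accounts")` returns only Servername_A, while a name such as `_temp`, which falls outside every range, is excluded by nobody and so would be picked up by all three nodes. That is the kind of gap the coverage check is meant to catch.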

You can use multiple wildcards, but only at one level in the file specification. For example, dsmc sel \opt\data\?file* will back up all files in the \opt\data\ directory whose names have exactly one character before the expression 'file' and any number of characters after it. However, the expression dsmc sel \opt\*\file* will fail with ANSA1076E, as wildcards are used on two levels.
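
If you generate file specifications from a script, you could pre-check them against this rule. The helper below is a hypothetical sketch, not part of TSM: it simply counts how many path levels contain a * or ? wildcard.

```python
import re

WILDCARDS = re.compile(r"[*?]")


def wildcard_levels(filespec):
    """Count the path levels of a file specification that contain * or ?."""
    levels = [part for part in re.split(r"[\\/]", filespec) if part]
    return sum(1 for part in levels if WILDCARDS.search(part))


print(wildcard_levels(r"\opt\data\?file*"))  # wildcards on one level
print(wildcard_levels(r"\opt\*\file*"))      # wildcards on two levels
```

A result greater than 1 indicates a specification of the kind that the text above says will be rejected.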



Incremental backup of multiple directories using the objects field

To perform an incremental backup for several directories, code them in the objects field in the schedule like this.

OBJECTS='/path1/ /path2/ /etc/'

The terminating slash at the end of each directory name means that the files within the directory are backed up, and the directory names need to be separated by spaces. If you wish to back up subdirectories within these directories, then you also need

OPTIONS=-SUBDIR=YES



Incremental by Date backups; faster but less secure

You would use this type of backup if your backup window is not long enough during the week, but you have plenty of time at the weekend. An incremental by date backup uses the last updated timestamp on a file to decide if it needs to be backed up or not. The problem is that this field is not one hundred percent reliable on open systems data, as some applications can update data without changing the last update field. A normal TSM backup will compare the attributes of every file with the current active backup and, if they do not match, will take a new backup. Incremental by date simply looks at the last modification date, so it is much faster and uses less memory. The downside is that it might not back up every changed file.

Also, it will not expire backup versions of files that are deleted from the workstation, it will not rebind backup versions to a new management class if the management class has changed, and it ignores the copy group frequency attribute of management classes.

To run an incremental by date backup, you add the parameter '-incrbydate' in the 'OPTIONS' box in the 'Define Client Schedule' GUI window.

If you use incremental by date, then to ensure you do get a full client backup, and correctly manage file expiration and management class changes, you should plan to take at least one full incremental backup every week.
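
The difference between the two selection methods can be sketched in a few lines of Python. This is a simplified model, not the client's actual logic: file attributes are reduced to a modification time and a size, times are plain integers, and all names and values are hypothetical.

```python
def incremental_by_date(files, last_backup_time):
    """Select files whose modification time is later than the last backup."""
    return sorted(
        name for name, attrs in files.items()
        if attrs["mtime"] > last_backup_time
    )


def full_incremental(files, active_backups):
    """Select files whose attributes differ from the active backup copy."""
    return sorted(
        name for name, attrs in files.items()
        if active_backups.get(name) != attrs
    )


# Hypothetical state: what the server holds, and what is on disk now.
active_backups = {
    "a.txt": {"mtime": 100, "size": 10},
    "b.db": {"mtime": 100, "size": 50},
}
files = {
    "a.txt": {"mtime": 250, "size": 12},  # edited normally
    "b.db": {"mtime": 100, "size": 64},   # changed, timestamp untouched
}
last_backup_time = 200

by_date = incremental_by_date(files, last_backup_time)
by_attrs = full_incremental(files, active_backups)
print(by_date)
print(by_attrs)
```

In this model the date-based pass misses b.db because its timestamp never moved, while the attribute comparison catches it. That is the gap the weekly full incremental is there to close.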



Adaptive Subfile Backups

What is the difference between Progressive Incremental and Adaptive Subfile backups?

TSM standard backups use progressive incremental, and IBM recommends that this type of backup should always be used where there is sufficient stable network bandwidth. A progressive incremental backup copies all files to the TSM store the first time a backup is run, then just copies changed files on subsequent backups. Older versions of changed files are retained at the TSM server depending on the management class settings, but when a file changes, the entire file is copied on the next backup run.

Adaptive Subfile Backup is used for limited bandwidth networks or when there is a limited connection. Examples include a modem, wireless, or mobile connection. It backs up only the parts of a file that have changed since the last backup, essentially incremental backup within the file. This reduces the amount of transfer time and data transferred over the network. The TSM Server stores a complete full backup of the original file as a base file, and subsequent changed parts of the file called deltas.

The information required to create these deltas is stored in a subfile cache folder in the \baclient\ directory at the TSM client. Files smaller than 1KB or larger than 2GB are currently not supported by subfile backup. As the base file is required to recreate the current file, it is not deleted from backup when it passes data retention requirements, but older delta files will be deleted from backup to comply with the management class policies.

You invoke Adaptive subfile backup by adding the parameter

SUBFILEBackup yes

in your dsm.opt file. In the same file, you can also specify exactly which files are processed by subfile backup, using include and exclude statements.

include.subfile c:\test\file.txt
exclude.subfile c:\test\file.txt

The subfilebackup option does not work correctly for migrated files. If you use a combination of subfilebackup and non-subfilebackup for migrated files, your data might be corrupted on the server.



Selective backups of Windows directories with embedded spaces

It is possible to select a number of directories from a Windows command line interface by listing them as in the example command below.

dsmc inc c:\dir1\* "d:\dir2\sub dir1\*" d:\dir3\ -subdir=yes

This command will incrementally back up all files in the directories
c:\dir1
d:\dir2\sub dir1\
d:\dir3
and any subdirectories underneath them. Note that the second directory must be enclosed in quotation marks because 'sub dir1' contains an embedded space, and that it must end with an asterisk: the Windows command processor treats a \" combination in a special way, but it will parse a \*" combination as expected.



Using Collocation to speed up restores

When migrating data from disk to tape, or backing up direct to tape, TSM will normally use any tape in that storage pool that is in 'filling' status first, before it asks for a scratch tape. The consequence of this is that data for any given node can be spread over a lot of tapes. This is not an issue for the odd file restore, but it can be a major problem if you want to recover a full disk, or even a large directory, as TSM will spend a lot of time mounting and dismounting tapes. To resolve this issue, IBM introduced Collocation, where every node had its own dedicated set of tapes.
The consequence of this was that TSM then used a lot more tapes than it used to, so then IBM introduced Group Collocation, which is a sort of 'half way house' that is useful for grouping together smaller clients so that they can share tapes, without interfering with the storage for the larger clients. It is also possible to collocate by filespace for very large clients.

Collocation is enabled by storage pool, and the type of collocation specified applies to all the nodes in that pool. To enable collocation, use one of the commands

UPDate STGpool pool_name COLlocate=NODe
UPDate STGpool pool_name COLlocate=FIlespace
UPDate STGpool pool_name COLlocate=GRoup

To turn collocation off, use

UPDate STGpool pool_name COLlocate=No

If you use group collocation then you need to define some collocation groups and add nodes to them with commands

DEFine COLLOCGroup group_name DESCription=description
DEFine COLLOCMember group_name node_name,node_name,...

Collocation groups do let you be a bit granular with nodes in a storage pool. For example say you group nodes into small, medium and large, then you define two collocation groups, one for small and one for medium size nodes and add your small and medium size nodes to the appropriate group. These nodes will then be group collocated, but as the large nodes do not belong to a group they will be collocated onto individual tapes.

However, be aware that collocation affects the way that TSM does disk to tape migration. Migration processes each collocation entity as a separate transaction, so if you use Group Collocation, then all files in a specific group will be migrated in a single transaction. Collocation by Node means a separate transaction for each node, and Collocation by Filespace means a separate migration transaction for each filespace. The processing time for migration depends on the number of files being migrated and also on the number of transactions used, as more transactions mean more commit time. So migration of files that are collocated by filespace could take longer than migration of files that are collocated by group.
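
To get a feel for how the collocation type changes the number of migration transactions, consider the following Python sketch. It follows the one-transaction-per-entity description above literally. The inventory of groups, nodes and filespaces is invented for illustration.

```python
# Hypothetical inventory: (collocation group, node, filespace) per file.
files = [
    ("small", "node1", "/home"),
    ("small", "node1", "/var"),
    ("small", "node2", "/home"),
    ("medium", "node3", "/data"),
    ("medium", "node3", "/logs"),
]


def migration_transactions(files, collocate):
    """Count distinct collocation entities, one migration transaction each."""
    if collocate == "group":
        entities = {group for group, _, _ in files}
    elif collocate == "node":
        entities = {node for _, node, _ in files}
    elif collocate == "filespace":
        entities = {(node, fs) for _, node, fs in files}
    else:
        raise ValueError(f"unknown collocation type: {collocate}")
    return len(entities)


for mode in ("group", "node", "filespace"):
    print(mode, migration_transactions(files, mode))
```

With this sample data the count rises from two transactions by group, to three by node, to five by filespace, which is the extra commit overhead described above.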

There is another potential issue with Filespace Collocation. When TSM initiates a backup, it builds up a list of all the files it needs to back up, but it runs multiple sessions on the client to do this, so the resulting list will not be arranged in filespace order. Instead, the files from different filespaces will be interleaved in the list. Now if you are backing up direct to tape, and are using filespace collocation, TSM needs to write the data from each filespace to a different tape. The result is that TSM will mount a tape for filespace1, write out some files, find the next set is for filespace2, dismount the tape and mount another tape for filespace2, write out some files until it finds files that belong to filespace3, dismount the tape for filespace2, mount a tape for filespace3, write out some files, and keep 'thrashing' around different tape volumes until the backup is complete.

You stop this from happening by simply adding the line

COLLOCATEBYFILESPEC YES

in the dsm.opt file on the client, then TSM will change the way it builds the list, so it lists all backups required for each filespace in turn.
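
The effect of list ordering on tape mounts can be shown with a toy model in Python. It assumes that each filespace writes to its own tape and that only one tape is mounted at a time. Both are simplifications, and the file list is hypothetical.

```python
def tape_mounts(backup_list):
    """Count tape mounts for a list of (filespace, file name) entries."""
    mounts = 0
    current = None
    for filespace, _name in backup_list:
        if filespace != current:  # a different filespace needs its own tape
            mounts += 1
            current = filespace
    return mounts


# Interleaved order, as produced by multiple client sessions.
interleaved = [
    ("fs1", "a"), ("fs2", "b"), ("fs3", "c"),
    ("fs1", "d"), ("fs2", "e"), ("fs3", "f"),
]
# The same entries arranged one filespace at a time.
grouped = sorted(interleaved)

print(tape_mounts(interleaved))
print(tape_mounts(grouped))
```

In this model the interleaved list needs six mounts for six files, while the grouped list needs only three, one per filespace.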

Using Collocation with deduplication is a no-no. A deduplicated backup will have links to bits of data on other volumes and so will call other volumes for a restore. This defeats the purpose of collocation. It is not that it won't work, it is just that there is no point in doing it.



Running backups manually

You can run an incremental backup manually from the command line by simply starting up dsmc, then entering 'i'. This will run an incremental backup of all the domains in the dsm.opt file. The command dsmc i "c:\program files\*" -subdir=yes will do an incremental backup of that directory and its subdirectories only. You need to be logged in as administrator or root to run this command, as you will need access to all the files. If you cannot get those elevated privileges, then you need to run a one-off schedule at the TSM server, as detailed in the section 'Scheduling a one-off backup' below.

TSM 6.4.1 introduced a new option to allow you to force a backup of all files. This is the 'absolute' option, and it does not override exclude statements. It is only valid for full or partial progressive incremental backups of file systems or disk drives. You can use it with snapshot differential backups if you add the createnewbase=yes parameter, and with journal-based backups if you also specify the nojournal parameter.

What if a disk is not specified in your dsm.opt file? Say you have a DOMAIN e: f: statement, and you want to back up the d: drive. A standard incremental will not do this, as the dsm.opt file does not include it, but you can supplement dsm.opt with a domain parameter on your command: dsmc i -domain=d: adds the d: drive to the domain list, so the command will back up d:, and e: and f: too.



Backing up Archived files

TSM will always try to keep a backup of both a stub file and its associated migrated file. In some circumstances, this can mean that you back up a lot more data than you expected, as TSM maintains this position after file changes. You can force TSM to ignore migrated files with the skipmigrated option. The default for this is 'no', but if you set it to 'yes' then TSM will not back up migrated files.

When the skipmigrated option is set to 'no', another parameter comes into play, checkreparsecontent.

If you set checkreparsecontent=yes, TSM compares the content of the local stub file with the content in Tivoli Storage Manager storage. If they are the same, the stub file is not backed up again but it will be if they are different.
If you set checkreparsecontent=no, TSM will not do any stub file comparison and will not back it up if it has changed. This could mean that you do not have a valid stub file backup, but if you need to do a restore you should be able to recover the complete migrated file.

The content of stub files changed with HSM for Windows 6.2 so if you upgrade, you need to redo the backup of the stubs. You would also need to refresh the stubfile backup if you move migrated files with the dsmmove.exe command or you changed the file space that is used for migration. In these cases, you should set checkreparsecontent=yes and skipmigrated=no for the next backup, but consider changing them back once the stub backups are refreshed.

As stated earlier, TSM always wants a complete backup copy of any migrated file. If no backup exists, TSM will temporarily recall the migrated file and back it up. To prevent this process from interfering with the stubs, you can use the TSM client option stagingdirectory to make it write to a temporary directory.

If the backup-archive client cannot create a complete backup copy of the migrated file, the backup-archive client does not back up the stub file. For example, if the stub is an orphan with no migrated copy in Tivoli Storage Manager storage, the stub is not backed up.



Using NetApp Snapdiff backups

Snapshot backups work well with TSM. You take an instant backup to disk, so you get a consistent copy of the data frozen at a point in time, while applications can continue to run and update the live data without affecting the snapshot. You then use TSM to move that frozen data off disk and onto TSM backup media. NetApp snapshots use copy on write; take a look at the snapshot section if you want to understand how snapshots work in detail. The TSM incremental forever philosophy has a drawback: TSM has to scan the filesystem every time a backup is run, to work out which files have changed and need to be backed up, and this can take some time. For NAS and N-Series file servers that are running ONTAP 7.3.0 or later, TSM can use a NetApp feature whereby NetApp tells TSM which files to back up, if a TSM backup is run using the -snapdiff option.

The first time you perform an incremental backup with the snapdiff option, a snapshot is created (the base snapshot) and a traditional incremental backup is run using this snapshot as the source. The name of the snapshot that is created is recorded in the TSM database.
The second time an incremental backup is run with this option, a newer snapshot is either created, or an existing one is used to find the differences between these two snapshots. The second snapshot is called the diffsnapshot. TSM then incrementally backs up the files reported as changed by NetApp to the TSM server.
After backing up data using the snapdiff option, the snapshot that was used as the base snapshot is deleted from the snapshot directory, provided that it was created by TSM.
On Windows systems, the snapshot directory is in ~snapshot. On AIX and Linux systems, the snapshot directory is in .snapshot.

There are a few limitations:
You must configure a user ID and password on the Tivoli Storage Manager client to enable snapshot difference processing.
The filesystem that you select for snapshot difference processing must be mounted to the root of the volume. You cannot use the snapdiff option for any filesystem that is not mounted to the root of the volume.
For Windows operating systems, the snapdiff option can only be used to backup NAS/N-Series file server volumes that are NFS or CIFS attached and none of the NetApp predefined shares can be backed up using the snapdiff option, including C$, because the TSM client cannot determine their mount points programmatically.
For AIX and Linux operating systems, incremental backup using snapshot difference is only available with TSM 64 bit clients.

Because TSM is not deciding which files to back up, there are also some quirks in the way include/exclude processing works. Normally, if you change your exclude definitions, then all the files that are no longer excluded will be backed up the next time you run an incremental. However, NetApp knows nothing of this, so if you are running snapdiff backups and you change the exclude statements, then those files will not be backed up until they are updated.
There are some other reasons why backups might be missed.

If you use the dsmc delete backup command to explicitly delete a file from the TSM inventory, then NetApp does not detect that the file has been manually deleted from TSM.
If you want to run a full backup, and change the TSM policy setting from mode=modified to mode=absolute, then this will not be detected by NetApp and an incremental backup will run.
If the entire file space is deleted from the TSM inventory, this will cause the snapshot difference option to create a snapshot to use as the source, and run a full incremental backup.
To make sure that all these changes are picked up correctly, you need to create a new base snapshot by running a backup with the CREATENEWBASE parameter:

dsmc incremental -snapdiff -createnewbase=yes /netapp/home



Schedmode, Polling or Prompted

SCHEDMODE POLLING means that every now and again the client asks the server if there is a schedule waiting to be started
SCHEDMODE PROMPTED means that the server contacts the client when it is time to start a backup

POLLING seems to work best with Windows clients, and is used with a QUERYSCHEDPERIOD parameter that tells it how often to contact the server to see if a backup is required. Typical parameters are shown below, and mean contact the server every hour.

SCHEDMODE         POLLING
QUERYSCHEDPERIOD    1

SCHEDMODE POLLING must be used if a client is outside a firewall.

SCHEDMODE PROMPTED is best used if you want to tell a client which specific LAN address and port it needs to use for a backup, otherwise it will use the address it used for first contact, every time. By default, TSM uses port 1501. If you find you are having problems with schedules missing with no apparent cause, it is possible that the server is trying to contact the client on the wrong address or port. You can force the server to use a specific ip address and port as shown below.

SCHEDMODE           PROMPTED
TCPCLIENTADDRESS    10.56.21.123
TCPCLIENTPORT       1501

Sometimes you will get backups missing due to port problems with the DSMCAD when you are running with SCHEDMODE=PROMPTED. Typically DSMCAD has to be recycled after each backup or backups will fail. You can check which port DSMCAD is using by recycling it then checking the dsmwebcl.log for an entry like:

(dsmcad) ANS3000I TCP/IP communications available on port XXXXX

DSMCAD should be listening for the server prompt on the port shown. You can check to see if it is listening by running a

netstat -an

command from an operating system command line, and you should see a listener on that port. Next, check that the TSM server can get to that port by opening an operating system command line from there, then running command

telnet client_ip_address port_no

If you get no messages, then the server is connecting OK. If you get errors then one possibility is that you are trying to get through a firewall, and you need that port opened up for both inbound/outbound communication.
Another option is to check that you can run a manual backup from the client. If this works then you could consider changing to schedmode polling.
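
If telnet is not installed on the server, a short Python script can perform a similar reachability check. This is a minimal sketch using only the standard socket module. The function name is hypothetical, and a successful connection shows only that something is listening on the port, not that it is DSMCAD.

```python
import socket


def port_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, `port_reachable("10.56.21.123", 1501)` would test the address and port used in the PROMPTED example above. A False result points to the same firewall or listener problems that a failed telnet would.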



Scheduling a one-off backup

If you want to run an immediate, one-off scheduled backup of a client, the easiest way to do this is to use the DEFINE CLIENTACTION command. The basic syntax is

define clientaction 'node_name' domain='domain_name' action=incremental

where you substitute your own node and domain names. TSM will then define a one-off schedule and associate the client node with that schedule. TSM will generate the schedule name, and return that name to the server console with messages ANR2500I and ANR2510I. You can use most of the parameters that are available to the define schedule command, like the 'OPTIONS' and 'OBJECTS' parameters, if you need to expand on this command.

The term 'immediate' should not be taken too literally, as what happens next depends on how your client scheduling mode is defined. If the schedule mode is PROMPTED, then the schedule is added to the list of schedules waiting to execute, and will normally run within 5 minutes or so.

If the client scheduling mode is POLLING, then the schedule will run the next time the client scheduler polls the server to see if a schedule is waiting, and that could take some time. The best action is to recycle the DSMCAD or the scheduler daemon/service at the client end once you have run the define clientaction command at the server, and that should immediately trigger the schedule.



Controlling how often 'files in use' are retried

This is controlled by the CHANGINGRETRIES parameter, which is usually set at the TSM server for all clients. Values are typically 3-5, with 4 being the default. This is fine for most of your data, but suppose you must take a daytime backup of a user area that you know contains several '.pst' mailbox files, which can be several gigabytes in size and will probably be in use. If TSM needs to retry all these files 4 times before it accepts the failure and moves on, your backup will take hours. You can override this default for a specific client by adding the parameter below to your dsm.opt file, which means retry files in use just once.

CHAngingretries 1



DATA RETENTION

How to determine when TSM will expire a backup

The key to understanding TSM backup retention is to understand the difference between 'active' and 'inactive' backups. The 'active' backup is the most recent backup, while all the older backups are 'inactive' backups. However, once a file is deleted from the client, its most recent backup also becomes inactive after the next incremental backup. The active backup of a file is never deleted, as it is needed to recreate a disk.
TSM uses 4 parameters to retain backups. The versions parameters and the retain extra parameter can take a value of 1-9999 or NOLIMIT, while the retain only parameter can take these values and also a value of 0.
There is a fundamental difference between the versions parameters and the retain parameters. The versions parameters are controlled by the backup function, so changes to versions will not take effect until the next backup runs. The retain parameters are controlled by expiration, so changes to retention parameters take effect immediately.

  • 'Versions Data Exists' is used to determine the maximum number of backup versions that will be retained for files that currently exist on the client.
  • 'Versions Data Deleted' is used to determine the maximum number of retained backups for files that have been deleted from the client.
  • 'Retain Extra Versions' specifies the number of days that inactive backups are kept.
  • 'Retain Only Version' controls how long to keep the last backup of a file that has been deleted from the client.

So the pecking order goes like this:
  • If the file is deleted, then the most recent backup is kept for the number of days specified in 'retain only version', while older backups are kept until whichever of the 'retain extra versions' and 'versions data deleted' limits is reached first.
  • If the file is not deleted, then the most recent backup is kept forever, while older backups are kept until whichever of the 'retain extra versions' and 'versions data exists' limits is reached first.

For example, you have RETEXTRA=31 and VEREXISTS=31. If you create 31 versions in the same day, and then (still on the same day) you create version 32, version 1 will expire, regardless of RETEXTRA, because the VEREXISTS criterion has been exceeded. Likewise, if you create version 1 today, then create version 2 a week later, then never create another version after that, then version 1 will expire 31 days after the creation of version 2, since the RETEXTRA criterion has been exceeded.
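
The interaction between the two limits in this example can be modelled with a short Python sketch. It is a simplified model for a file that still exists on the client, not TSM's actual expiration code. It assumes that a version becomes inactive on the date of the next backup, and that a version expires once it has been inactive for more than RETEXTRA days. The exact boundary is an assumption.

```python
from datetime import date, timedelta


def surviving_versions(backup_dates, today, verexists, retextra):
    """Return the 1-based version numbers still retained for an existing file.

    backup_dates is ordered oldest to newest; the newest is the active one.
    """
    total = len(backup_dates)
    kept = []
    for index in range(total):
        version = index + 1
        if version == total:
            kept.append(version)          # the active backup never expires
            continue
        rank_from_newest = total - index  # 1 = active, 2 = newest inactive
        if rank_from_newest > verexists:
            continue                      # over the version limit
        became_inactive = backup_dates[index + 1]
        if (today - became_inactive).days > retextra:
            continue                      # inactive for too long (assumed '>')
        kept.append(version)
    return kept


start = date(2024, 1, 1)  # arbitrary illustrative date
same_day = surviving_versions([start] * 32, start, verexists=31, retextra=31)
print(same_day[0])  # oldest version still retained
```

With 32 versions taken on the same day, version 1 drops out on the version limit alone. With two versions taken a week apart, version 1 survives until it has been inactive for longer than RETEXTRA days, counted from the date of version 2.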

If you need to GUARANTEE data retention for 31 days you would need to code the parameters as below, but be aware that you could end up keeping a lot of backups.
Versions Data Exists = NOLIMIT
Versions Data Deleted = NOLIMIT
Retain Extra Versions = 31
Retain Only Version = 31



How to list out backups that are marked for expiration

How can you find out which backups have been marked for deletion?
Backups and Archives are marked for deletion when their expiry date or version limit is reached, or if they are deleted manually, but they are not purged from the TSM database until the next Expire Inventory is run.

TSM flags files that are eligible by giving them a special deactivation date, which is 1900-01-01. Then when an Expire Inventory process runs, any objects which have deactivate_date of 1900-01-01 are removed from the database.
It is possible to generate a list of these files with SQL queries, but be aware that the file listing might be very large and generating it might degrade your server performance. The date format is slightly different between TSM 5.x and TSM 6.x servers. Suitable SQL queries are:

TSM 5.x

select node_name,state,backup_date,hl_name,ll_name from backups where deactivate_date='1900-01-01 00:00:00.000000' AND type='FILE'

TSM 6.x

select node_name,state,deactivate_date,backup_date,hl_name,ll_name from backups where deactivate_date='1956-09-05-00.00.00.000000' and type='FILE'

The query output on the TSM 6.x server will still return a deactivate_date of '1900-01-01 00:00:00.000000' even though you specified '1956-09-05-00.00.00.000000' in the SQL query. That is just the way TSM stores the date.



Using Different Management Classes

One way to bind a set of backups to a different management class is to add an include statement to the client options file, like this:

INCLUDE "C:\Program Files\Microsoft SQL Server\MSSQL\Backup\...\*" MCSQLBK

This means bind all files in the Backup directory, and all files in its subdirectories, to the special management class MCSQLBK. The '\...\' means scan subdirectories. If you add this statement, you will also bind all previous backups of these files to the new management class.

The rebind happens the next time a backup runs. It applies to every backup version of the file, not just the active one. The file must be examined again to get a new management class, so you cannot change management classes for files that have been deleted from the client.
Another way is to define a client option set on the TSM server that contains INCLUDE and DIRMC statements that bind all files and directories to the desired management class, then update the client node to use that client option set.
Finally you could define a domain and policy set that contains only the single management class by which you want to manage the client node data, then assign the desired nodes to that domain.


BACKUP SCHEDULING

Scheduling - controlling the start time

Ever been in the situation where you have a tight backup window (and if you haven't, how do you get a job like that?) You schedule 8 backups to start at midnight, but some of them don't start until 01:00? Frustrating or what?

The problem is that by default, TSM tries to spread backups out in a schedule, unless you tell it not to. Issue the command QUERY STATUS on your TSM server, and look for the parameter Schedule Randomisation Percentage. By default, this will be set to 25, which means TSM will spread the start times of all the backups in a schedule, over the first 25% of the schedule window.

If you want all your backups to fire right at the start of the schedule window, then change this parameter to 0 with the SET RANDOMIZE command.
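The arithmetic behind the randomisation percentage can be sketched in a few lines of Python. This is only an illustration, not TSM code: the function name, the date and the 4 hour schedule window are assumptions chosen to match the midnight example above.

```python
from datetime import datetime, timedelta

def randomised_start_window(start, duration, randomise_pct):
    """Return the (earliest, latest) start time for clients in a schedule.

    Start times are spread over the first randomise_pct percent of the
    schedule window, as described in the text above.
    """
    spread = duration * randomise_pct / 100
    return start, start + spread

# Hypothetical schedule: starts at midnight, 4 hour window
start = datetime(2024, 1, 1, 0, 0)
window = timedelta(hours=4)

earliest, latest = randomised_start_window(start, window, 25)
print("25%:", earliest.strftime("%H:%M"), "to", latest.strftime("%H:%M"))

earliest0, latest0 = randomised_start_window(start, window, 0)
print(" 0%:", earliest0.strftime("%H:%M"), "to", latest0.strftime("%H:%M"))
```

With the default of 25 and a 4 hour window, a backup scheduled for midnight may legitimately start as late as 01:00. With 0, every client starts at the beginning of the window.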

If you get a problem with a scheduled event, its status will be either 'Failed' or 'Missed'. 'Failed' means that the scheduled event began, but encountered an error condition that prevented it from completing successfully. 'Missed' means that the scheduled event never began. For example, if the scheduler is not running on the client, then of course it cannot execute its scheduled events; nor can it report that it did not run.

On the client machines that missed their schedule, verify that the scheduler service is started. Check the dsmsched.log and dsmerror.log to verify that the scheduler was able to successfully contact the server and query its next scheduled event.


Scheduling - choosing specific weekdays

If you define your client schedules using the GUI interface, you can choose to run a backup every day, weekdays only, or on a specific day of the week. You can only choose one of these options. If you are running TSM version 5.3 or later and you use the command line, you can be more selective. Say you want to define a schedule to run every Monday, Wednesday and Friday. Use a command like the one below. Note the SCHEDS=E parameter, which means use enhanced schedule style. You need enhanced style to pick out individual weekdays.

DEF SCH domain_name schedule_name T=C ACT=I STARTT=22:00:00 DUR=1 SCHEDS=E DAY=Monday,Wednesday,Friday

Another possible option is to schedule a backup to run on the last Sunday of the month. An appropriate command is

DEF SCH domain_name schedule_name T=C ACT=I STARTT=22:00:00 DUR=1 SCHEDS=E DAY=Sunday WEEK=LAST


BACKING UP CLUSTERED SERVERS

Backing up a Windows Cluster with TSM

This topic is too big to fit in here; take a look at the Windows Cluster Backups page.


Backing up a Netware Cluster with TSM

Netware Clusters have two types of disk, local disks and clustered disks. The SYS: disk will probably be local, and you may have others. The local disks are always attached to one particular server, while the clustered disks can move around the various servers in the cluster. 'Takeover Scripts' are used to make sure that the disks move cleanly between servers. TSM backups can be 'cluster aware', that is they can move with the disks as the disks move between the cluster servers.

Actually, it is not the disks that move around between servers but Netware Partitions. To keep things simple, many sites set up each disk in its own Netware Partition, but your site may have several disks in each partition. When the TSM manuals refer to a 'cluster group' they really mean a Netware partition.

The TSM software has to run on a physical server, but there is normally no way to decide ahead of time which physical server will be hosting a volume.

The key to backing up a cluster volume is that the backup metadata must be available from whichever server is hosting that volume, so the metadata must be held on the cluster volume. The metadata includes the dsm.opt file and the password file. The schedlog, errorlog and webclient log also need to be held on the cluster volume to get continuity between messages as the volume moves between servers. Every Netware partition needs its own dsm.opt file.

Backing up the Local Volumes

Just use a standard TSM install on each of the physical servers. The dsm.opt file should specify CLUSTERNODE NO (or omit the option, as NO is the default). With this setting, a domain of ALL-LOCAL will not see the clustered disks. The NODENAME should be the same as the server name.

Backing up the Clustered Volumes

Each Netware partition must be defined to TSM as a separate node, and must have a unique name that is not the same as any physical server name. As each partition will have a virtual server name, it is easiest to use that as the node name.

Allocate a TSM directory on a volume in the partition and copy a dsm.opt file into it. Assuming that you are storing the TSM information on a disk called CAV1, edit the dsm.opt file with the following settings

NODENAME CAV1_SERVER
DOMAIN CAV1
CLUSTER YES
PASSWORDDIR CAV1:\TSM\PASS\
PASSWORDAccess GENERATE
NWPWFile YES
OPTFILE CAV1:\TSM\DSM.OPT
ERRORLOGName CAV1:\TSM\DSMERROR.LOG
SCHEDLOGName CAV1:\TSM\DSMSCHED.LOG

To set up the passwords, from your first clustered server enter the following commands:-

Unload TSAFS then reload it with TSAFS /cluster=off

dsmc query session -optfile=CAV1:/tsm/dsm.opt
dsmc query tsa -optfile=CAV1:/tsm/dsm.opt
dsmc query tsa nds -optfile=CAV1:/tsm/dsm.opt

Make a copy of dsmcad in the SYS:/Tivoli/tsm/client/ba/ directory and give it a unique name for this volume, say DSMCAD_CAV1 then start the scheduler with

dsmcad_CAV1 -optfile=CAV1:/tsm/dsm.opt

Repeat this for every server in the cluster, and get the DSMCAD command added to the takeover scripts so the correct DSMCAD is started as a volume moves between clustered servers.


BACKING UP DATABASES

Backing up Databases with TSM

Database backups are usually a bit special as a database usually consists of a number of physical files that all need to be backed up as an entity, often with consistent time stamps. Databases also have transaction logs to ensure that the data stored in a database is consistent, even after a hardware failure. Databases usually have internal catalogs which record these files, so when you do a restore, you need to make sure that the catalogs hold the correct information too. To help with this lot, databases have a Database Management System (DBMS), which tracks physical database files, transaction logs and backups. A DBMS will usually be able to run a backup while the database is active, which effectively means no backup window is required.

This means that it is not a good idea to use TSM to backup a database simply as a collection of flat files. Instead, you want TSM to use the DBMS facilities to get a good, secure backup. TSM does this by providing (chargeable) add-on modules called TDPs, or Tivoli Data Protection modules. TDPs exist for Oracle, Informix and MS-SQL databases, and also for Lotus Domino and MS-Exchange e-mail databases. DB2/UDB does not have a TDP but uses the TSM API to get a consistent database backup.

There is a third party product called Repostor that you can use to interface with a number of other databases, including FirebirdSQL, Ingres, MariaDB, MySQL, PostgreSQL, Progress, Sybase ASE and Sybase IQ. MySQL is especially popular as it is the database component of WAMP and LAMP internet servers. Repostor runs on most popular platforms: AIX, HP-UX, Linux (Red Hat and SLES) and Windows.

Repostor comes in two flavours, Snap Protector and Data Protector. As you might expect, Data Protector backs up from the database itself while Snap Protector backs up from a snapshot.

Data Protector integrates between the specific DBMS and TSM to provide a secure database backup. Exactly what it can do depends on the DBMS so you need to check out the Repostor website for specific information. In general you get the normal TSM backup and restore facilities like full and incremental backups, full, incremental and point in time restores, including restore from older backup, restore to new name and restore to new machine. You also get comprehensive reporting facilities, not just on backup versions but also used backup capacity and compression and deduplication savings.

As well as backups from snapshots, Snap Protector lets you quickly roll a database back to any point in time from snapshot plus block changes on the TSM Server and lets you recover to any point in time using backed up transaction logs. After the initial snapshot, only block changes are sent to the TSM Server and after a snapshot restore, only changed blocks will be restored from the TSM Server.

Protector is licensed by each backed up host and it is possible to download a free 30 day evaluation copy - see the Repostor website for details.

It is also possible to interface TSM with DBMS systems for other databases using ADSMPIPE. This is described in the IBM Redpaper at

http://www.redbooks.ibm.com/abstracts/redp3980.html


QUERYING BACKUPS

Querying Backupsets

This will give you the list of backupset volumes from volhistory:

select volume_name from volhistory where type='BACKUPSET'

This will give you the NUMBER of backupset volumes from volhistory:

select count(volume_name) from volhistory where type='BACKUPSET'

This will give you a list of the backupset volumes that are NOT checked in:

select volume_name from volhistory where type='BACKUPSET' and volume_name not in (select volume_name from libvolumes)

This will give you the NUMBER of backupset volumes that are NOT checked in:

select count(volume_name) from volhistory where type='BACKUPSET' and volume_name not in (select volume_name from libvolumes)
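If you want to see how the 'not in' subquery behaves before running it against a production server, the pattern can be tried out on a scratch database. The sketch below uses Python's built-in SQLite module as a stand-in; it is not a TSM server, the two tables hold only the columns used by the queries, and the volume names are invented for the illustration.

```python
import sqlite3

# In-memory stand-in for the two TSM tables (hypothetical sample rows)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE volhistory (volume_name TEXT, type TEXT);
    CREATE TABLE libvolumes (volume_name TEXT);
    INSERT INTO volhistory VALUES ('BS0001', 'BACKUPSET');
    INSERT INTO volhistory VALUES ('BS0002', 'BACKUPSET');
    INSERT INTO volhistory VALUES ('DB0001', 'BACKUPFULL');
    INSERT INTO libvolumes VALUES ('BS0001');
    INSERT INTO libvolumes VALUES ('DB0001');
""")

# Backupset volumes that are not checked in to the library
not_checked_in = [row[0] for row in conn.execute(
    "select volume_name from volhistory where type='BACKUPSET' "
    "and volume_name not in (select volume_name from libvolumes)"
)]

# The same selection, but returning just the number of volumes
count = conn.execute(
    "select count(volume_name) from volhistory where type='BACKUPSET' "
    "and volume_name not in (select volume_name from libvolumes)"
).fetchone()[0]

print(not_checked_in, count)
```

Only BS0002 is reported, because BS0001 is present in libvolumes and DB0001 is not a backupset volume.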


PROBLEMS

Compression space errors

Prior to a client sending a file, the space (same as allocated on client) is allocated in the TSM server's disk storage pool. If caching is active in the disk storage pool, and files need to be removed to make space, they are. But if the file grows in compression (client has COMPRESSIon=Yes and COMPRESSAlways=Yes), the cleared space is insufficient to contain the incoming data.
Typically, this results in an error - 'ANR0534W Transaction failed for session ssss for node nnnn - size estimate exceeded and server is unable to obtain storage'

This commonly happens where client compression is turned on, and client has large or many compressed files: TSM is fooled as compression increases the size of an already-compressed file.
The only resolution is to take client compression off.
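The growth effect is easy to demonstrate. The sketch below uses Python's zlib module purely as a stand-in for client compression (the TSM client uses its own compression code, so the exact numbers will differ), and uses pseudo-random bytes to imitate a file that has already been compressed.

```python
import random
import zlib

# High-entropy bytes stand in for an already-compressed file (assumption:
# compressed data looks statistically random to a second compressor)
rng = random.Random(0)
already_compressed = bytes(rng.getrandbits(8) for _ in range(100_000))

# Repetitive text stands in for an ordinary, compressible file
text_like = b"TSM backup log line\n" * 5_000

def size_after_compression(data):
    """Return the number of bytes left after one compression pass."""
    return len(zlib.compress(data))

for label, data in (("text-like", text_like), ("already compressed", already_compressed)):
    print(label, len(data), "->", size_after_compression(data))
```

The text-like data shrinks, but the high-entropy data comes out slightly larger than it went in. That small growth is enough to exceed the space that the server allocated from the original file size.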


TSM fails with out of memory errors on the backup client.

The first thing that TSM does when it starts a backup is to scan each filespace, then build a list of files that need a backup. It normally does all the calculations and holds the file list in memory. The amount of space used depends on the lengths of the filenames and paths. Sometimes 500,000 files can be a problem and sometimes TSM can cope with millions of files so it's hard to predict exactly when memory problems will start. If you are having problems with TSM running out of client memory then you have a number of options to fix it.

The easiest solution is to use Memory Efficient Backup then probably incrbydate, described next on this page. Other options include Journal Backups, Image backups, or on a UNIX system, define multiple virtual mount points within one file system. Each mount point would be backed up independently.

Two variants of Memory Efficient Backup exist:
The first method changes filespace processing so instead of scanning and building a file list for an entire filespace, TSM scans and processes one directory at a time. This method does not work if all your problem files are concentrated in one directory though, which happens if someone switches on trace logging and forgets about it.
The second method scans the entire filespace, but holds the file list on disk.

To implement the first method, simply place a line

memoryefficientbackup yes

in your dsm.opt file (dsm.sys in UNIX), or if you are backing up by command, add a parameter -memoryef=yes

To use the second method, use INCLUDE.FS statements and tell TSM where to store the file list. For example

INCLUDE.FS e: MEMORYEFFICIENTBACKUP=DISKCACHEMETHOD DISKCACHELOCATION=E:\TSM_cache
INCLUDE.FS f: MEMORYEFFICIENTBACKUP=DISKCACHEMETHOD DISKCACHELOCATION=F:\TSM_cache

The first time that you use a disk cache it will require lots of disk space, potentially several gigabytes, but the following backups will use less space.

You can combine both methods, then filespaces that specifically have INCLUDE.FS statements as above will use the disk cache method, while all other filespaces will use standard memory efficient backup.
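The difference between a standard scan and the first (directory at a time) method can be sketched in Python. This is a rough model of the idea, not the TSM client's actual algorithm: the function names and the sample directory trees are invented, and the 'peak' is simply the largest number of file names held in a list at any one time.

```python
import os
import tempfile

def whole_filespace_peak(root):
    """Standard approach: build one list of every file before processing."""
    all_files = [os.path.join(d, f) for d, _, files in os.walk(root) for f in files]
    return len(all_files)

def per_directory_peak(root):
    """Memory efficient approach: hold one directory's file list at a time."""
    peak = 0
    for d, _, files in os.walk(root):
        batch = [os.path.join(d, f) for f in files]
        peak = max(peak, len(batch))
        # a real client would back up this batch here, then discard it
    return peak

def build_sample_tree(root, dirs, files_per_dir):
    """Create a throwaway tree of empty files for the comparison."""
    for i in range(dirs):
        sub = os.path.join(root, f"dir{i}")
        os.mkdir(sub)
        for j in range(files_per_dir):
            open(os.path.join(sub, f"file{j}.txt"), "w").close()

# 200 files spread over 10 directories
with tempfile.TemporaryDirectory() as spread:
    build_sample_tree(spread, dirs=10, files_per_dir=20)
    spread_result = (whole_filespace_peak(spread), per_directory_peak(spread))

# 200 files all in one directory
with tempfile.TemporaryDirectory() as flat:
    build_sample_tree(flat, dirs=1, files_per_dir=200)
    flat_result = (whole_filespace_peak(flat), per_directory_peak(flat))

print("spread:", spread_result, "flat:", flat_result)
```

When the files are spread over many directories, the per-directory list is much smaller than the full list. When they all sit in one directory, the two peaks are the same, which is why the first method does not help in that case.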


Backups overrunning, TSM continually mounting tapes

If you run a backup of a server where the files are going to different management classes, with some directed to disk and some to tape, then you might see TSM continually mounting and dismounting the tape. The backup runtime will be much longer than expected and you will see lots of messages like the ones below on the TSM server log, and lots of 'Waiting for mount of offline media' messages on the client schedule log.

... 03:40:55 ANR0511I Session 5555 opened output volume T21345.(SESSION: 12345)
... 03:42:29 ANR0514I Session 5555 closed volume T21345. (SESSION:12345)
... 03:43:07 ANR0511I Session 5555 opened output volume T21345.(SESSION: 12345)
... 03:43:08 ANR0514I Session 5555 closed volume T21345. (SESSION:12345)
... 03:43:10 ANR0511I Session 5555 opened output volume T21345.(SESSION: 12345)
... 03:43:12 ANR0514I Session 5555 closed volume T21345. (SESSION:12345)

The problem is that every time TSM switches from backing up data to tape to backing up data to disk, the tape volume is closed and dismounted, so the next time it wants to send a file to tape it has to re-mount the tape and open the tape volume again.
The resolution is quite simple: update the client node definition on the TSM server so that the node keeps its mount point.

update node node_name keepmp=yes
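To see why keeping the mount point matters, here is a small Python model of the behaviour described above. It is a simplification under stated assumptions, not TSM logic: each file is simply labelled as going to 'tape' or 'disk', and the function and variable names are invented for the illustration.

```python
def count_tape_mounts(destinations, keep_mount_point):
    """Count tape mounts for a sequence of file destinations.

    destinations is a list of "tape" / "disk" strings, one per file, in
    backup order. Without keep_mount_point, a switch to disk releases
    the tape, so the next tape file needs a fresh mount.
    """
    mounts = 0
    tape_mounted = False
    for dest in destinations:
        if dest == "tape":
            if not tape_mounted:
                mounts += 1
                tape_mounted = True
        elif not keep_mount_point:
            # switching to disk closes and dismounts the tape volume
            tape_mounted = False
    return mounts

# Hypothetical backup where tape and disk files are interleaved
files = ["tape", "disk", "tape", "disk", "tape", "tape", "disk", "tape"]

print("keepmp=no :", count_tape_mounts(files, keep_mount_point=False))
print("keepmp=yes:", count_tape_mounts(files, keep_mount_point=True))
```

In this model the interleaved sequence needs four mounts without the mount point being kept, and only one with it.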


BACKING UP SYSTEM DATA

Backing up the Windows SystemState

The Windows System state contains all the data needed to recover the operating system from scratch. According to Microsoft, "System state is a collection of several key operating system elements and their files. These elements should always be treated as a unit by backup and restore operations." See the Windows section for more details about the System State. You backup the Windows System State either by using the command

backup systemstate

or by specifying a SYSTEMSTATE domain in the dsm.opt file. The ALL-LOCAL domain includes the system state. Exactly what you backup will depend on the release of Windows, and what Windows components are installed.

The systemstate is a special object type and requires special scheduling. If you are running a full incremental backup of a server, then the system state will be included. However if you want to be selective, then you must schedule a backup with an ACTION type of BACKUP, with SYSTEMSTATE in the OBJECTS field. The systemstate must be backed up on its own with no other objects in the schedule.

A system state backup uses Volume Shadow Copy Service (VSS), where each operating system 'element' is represented by a Microsoft VSS writer of type 'VSS_UT_BOOTABLESYSTEMSTATE'. Exactly which system state writers will be used depends on the Windows operating system. The 'System Writer' will process most of the files needed for the system state, but other writers may include the 'Registry Writer', the 'WMI Writer', the 'Task Scheduler Writer', the 'COM+ REGDB Writer' and the 'ASR Writer'.
The backup process works like this:

  • TSM, acting as a VSS requester, queries VSS for the list of bootable system state writers
  • VSS requester queries each VSS writer for its metadata which includes the files that need to be backed up for that writer
  • The necessary snapshot(s) are created by the VSS provider
  • The data is backed up from the snapshots
  • The snapshots are released
  • The backup is complete

The IBM recommendation is that you use Open File Support for drive backups, and investigate and fix all 'cc=4' open file errors. Do not exclude files unless you are certain they are not needed for restore. Specifically, do not exclude ntuser.dat or usrclass.dat files.

Backing up the system state became much more of a challenge with Windows 2008 onwards, as the number of objects requiring backup was considerably higher: about 8,000 with Windows 2003, maybe 80,000 with Windows 2008. This massive increase affected backup processing and TSM server housekeeping.

The first thing you will notice is that backups run for considerably longer, and will appear to hang for several hours. This is partly because an incremental systemstate backup needs to do a lot of work comparing client data with server data to decide what to backup. The other reason is that systemstate backups are 'grouped' and once the backup is complete the server will regroup the systemstate objects which can take a long time. While TSM is doing this, it holds the client session open, and will not mark the backup as complete until the regrouping is finished.
TSM server expiration will also take a long time, especially if you are retaining a lot of systemstate backup versions.

The first question to ask your server support people is, 'would they actually use a TSM backup to recreate a Windows system, or do they recreate from a standard build?'
If TSM systemstate restores are not required, there is no point in running backups.
If backups are required, then consider that we tend not to do system maintenance on servers every day, so systemstates are usually quite static. We also do not want to backlevel a server by several weeks; if we need to restore, then we usually want the last backup. Based on these facts, it seems reasonable to backup the systemstate just 2-3 times per week, and keep the retention period low; 2 weeks would be more than adequate. This low retention period would limit the impact of systemstate backups on the TSM server.

If you have a large number of Windows clients, then IBM has suggested the following strategy

  • Split the Windows clients into 3 domains, assume Domain1, Domain2, Domain3.
  • Backup each domain twice per week, on separate nights
    • Domain1, Monday/Thursday
    • Domain2, Tuesday/Friday
    • Domain3, Wednesday/Saturday
  • Retain 2 weeks worth of backups, that is, 6 versions.
  • Run expiration by domain using the domain=xxx parameter
    • Expire domain3 systemstate on Monday/Thursday
    • Expire domain1 systemstate on Tuesday/Friday
    • Expire domain2 systemstate on Wednesday/Saturday

Running expiration like this means that it will not cause lock contention with the backups.
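The rota above can be written down as data and checked mechanically, which is handy if you later add a fourth domain or move nights around. The Python sketch below is just a consistency check on the table; the dictionary and function names are invented for the illustration, and it does not talk to a TSM server.

```python
# Nights on which each domain's systemstate is backed up (from the list above)
BACKUP_NIGHTS = {
    "Domain1": {"Monday", "Thursday"},
    "Domain2": {"Tuesday", "Friday"},
    "Domain3": {"Wednesday", "Saturday"},
}

# Nights on which each domain's systemstate is expired (from the list above)
EXPIRE_NIGHTS = {
    "Domain3": {"Monday", "Thursday"},
    "Domain1": {"Tuesday", "Friday"},
    "Domain2": {"Wednesday", "Saturday"},
}

def clashes(backup, expire):
    """Return (domain, night) pairs where backup and expiration coincide."""
    return sorted(
        (domain, night)
        for domain in backup
        for night in backup[domain] & expire.get(domain, set())
    )

print(clashes(BACKUP_NIGHTS, EXPIRE_NIGHTS))

# A deliberately bad rota, to show what a clash looks like
bad_expire = dict(EXPIRE_NIGHTS, Domain1={"Monday", "Friday"})
print(clashes(BACKUP_NIGHTS, bad_expire))
```

An empty list means that no domain is ever backed up and expired on the same night.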

To restrict the number of backups held, you bind the systemstate files to a management class that keeps relatively few versions. You achieve this with the following include statement in the dsm.opt file, or in an include/exclude file if you keep these separate

INCLUDE.SYSTEMSTATE ALL yourmgmtclassname


TSM 6.2.3 introduced the ability to take incremental systemstate backups. Incremental is the default, but if you need to take full backups, this can be controlled using the SYSTEMSTATEBACKUPMETHOD option in the client options file (dsm.opt). The options are FULL, OPPORTUNISTIC and PROGRESSIVE.

As you would expect, FULL means backup all the files belonging to the system state.
OPPORTUNISTIC means that if one or more system writer files have changed since the last backup, the entire system writer is backed up. If no files have changed, then the smaller writers like the registry are still backed up, but the huge system writer is not backed up again.
PROGRESSIVE is standard TSM incremental processing. That is, only those system writer files that have changed since the last system state backup will be backed up. This is the default.
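The three methods can be summarised as a small decision function. The Python sketch below models only the system writer files and follows the descriptions above; it is not the client's real implementation, and the function name and sample file names are invented.

```python
def system_writer_files_to_send(method, all_files, changed_files):
    """Return the set of system writer files sent for a given backup method.

    all_files     - every file belonging to the system writer
    changed_files - the files changed since the last system state backup
    """
    method = method.upper()
    if method == "FULL":
        # everything, every time
        return set(all_files)
    if method == "OPPORTUNISTIC":
        # all or nothing: any change triggers a full system writer backup
        return set(all_files) if changed_files else set()
    if method == "PROGRESSIVE":
        # standard incremental: only what changed
        return set(changed_files)
    raise ValueError(f"unknown method: {method}")

# Hypothetical system writer with three files, one of which has changed
all_files = {"a.dll", "b.dll", "c.sys"}
changed = {"b.dll"}

for method in ("FULL", "OPPORTUNISTIC", "PROGRESSIVE"):
    sent = system_writer_files_to_send(method, all_files, changed)
    print(method, sorted(sent))
```

With one changed file out of three, FULL and OPPORTUNISTIC both send all three files, while PROGRESSIVE sends only the changed one. With no changes, only FULL sends anything from the system writer.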


For systemstate backups to work, the Windows VSS writers must be working successfully. When they are not working, you typically see error messages like 'ANS1950E Backup using Microsoft volume shadow copy failed'. The error message text usually includes 'vss'.

To resolve these errors, first check that the Windows VSS service is in 'Manual' mode and can be started. Its normal state is 'Stopped', as TSM must be able to start it up with the correct set of parameters. Use the Windows command 'vssadmin list writers' to check the status of the writers.

Second, check that the userid that you use to run your backups has the correct permissions to be able to access the writers.

If these both look OK, then check with Microsoft Support for the latest hotfixes for VSS.

There is also a Microsoft utility, VSHADOW.EXE, that can be used to test and report on VSS writers. The following links describe the utility and tell you where the downloads are:
VShadow Tool and Sample (Windows) - MSDN - Microsoft
Using Vshadow.exe to troubleshoot Windows VSS system state backup errors

Another option is to test the VSS writers using a Microsoft tool called DiskShadow.exe. This operates outside of TSM and so is useful to check to see if a problem lies with TSM or with VSS.
Open up a Windows command prompt and start up diskshadow with the command

diskshadow /l c:\diskshadow.log

This will give you a diskshadow prompt, so to get the status of the writers, run the commands

reset
list writers
list writers status
list writers metadata
list writers detailed
list providers
exit

Check the file c:\diskshadow.log, as that will hold the results of the commands, and will hopefully tell you if there are any errors.

You can also create a snapshot of SystemState using diskshadow, which is independent of any snapshot created by TSM. Run the following commands:

reset
set verbose on
set option differential
set context volatile
add volume c:
add volume d: (if the system is on more than one disk, add them one by one)
create
exit

Again, check c:\diskshadowsys.log and see if any errors occurred during the create phase. If you see any, then your issues are with VSS, not TSM.


Changing the Management class for System Objects

If you want to change the management class on System Objects, you need to add a line to dsm.opt. ALL system objects must be bound to the same management class.

include.systemobject ALL new-class-name

If the system object won't rebind to the new management class, try deleting the filespace, and it should rebind on the next backup. The command to do this from the server is

del fi nodename "SYSTEM OBJECT" nametype=unicode

Why would you want to do this? Well the system state is large, so if your default management class holds 40 backup versions say, then you will use lots of TSM backup space for the system state for every server. You can assign a management class to the system state that keeps fewer versions for less time, to save space.


TSM and Windows ASR

Windows ASR applies to Windows 2003 servers and is not applicable to later versions. TSM can work with Windows 2003 ASR to simplify bare metal recovery, provided the TSM client is at 5.2.0 or above.
First, you must configure the TSM backup client for Online Image support and Open File support. Once you do that, you can take an ASR backup. This is not the same as creating an ASR recovery CD, that is discussed in the Windows section.

You take an ASR backup using the TSM client GUI: check the box that says 'Automated System Recovery', then click on the Backup button. Windows will then write system information to ASR state files called NTDLL.asr and SMSS.asr.
Then you need to backup the System State, by checking the boxes against System Services and System State and clicking on Backup.
Next, take an incremental backup of the local drive that contains the Windows operating system. Once the backup is complete, select 'utilities' from the top line menu in the TSM GUI, then select 'Create ASR diskette' from the drop down menu. You will be asked to confirm the floppy drive, and to insert a floppy diskette. If you did not take an ASR backup as in the first step above, then this step will fail.


Backing up the Netware NDS

Novell Directory Services (NDS) is critical for the Netware server environment. You should backup the entire NDS tree periodically, and make sure that the DIRECTORY is included in the domain of at least one of your nightly, scheduled backups.

TSM will not be able to recover every part of an NDS: schema extensions, partition boundary information and replica assignments are not backed up at present. It's best to keep an accurate blueprint of the NDS tree so that you can recreate this information after a disaster.

TSM Version 5.1 and above uses a file space name of NDS to backup the NDS. This means that you cannot call any Netware disks NDS, or they will conflict with NDS backups.

Full NDS backup

To manually backup the NDS, use the command

selective nds:* -subdir=yes -volinfo

Version 5.2.2.0 of the TSM client for NetWare will automatically backup NDS on a server that holds a master replica. With previous versions of the TSM client, you need to add NDS: to the domain line in your dsm.opt file, as it is not included in the ALL-LOCAL domain.

DOMain all-local, NDS

NDS trees

If you have split your NDS tree into several branches, then it is network intensive to gather data from a remote branch. It is best to exclude NDS entries from remote branches, and just backup from the root and the local branch. Say the NDS tree has three branches, US, EU and OZ, and you want to backup the local EU tree. A typical include/exclude file would then be

EXCLUDE "NDS:.O=MYCOMPANY.OU=US.*"
EXCLUDE "NDS:.O=MYCOMPANY.OU=OZ.*"
EXCLUDE.DIR "NDS:.O=MYCOMPANY.OU=US.*"
EXCLUDE.DIR "NDS:.O=MYCOMPANY.OU=OZ.*"

TSM stores NDS object names as the typeful, distinguished object name including '.[Root]'. For example, the object CN=Miller.OU=Operations.O=RBS would have the following Tivoli Storage Manager name:

.[Root].CN=Miller.OU=Operations.O=RBS

This means that if you want to, you can exclude right down to common (leaf) objects. There are some restrictions, for example you cannot use the include and exclude.dir options together.

print queues

IBM recommended that you INCLUDE the directory structure for print queues, and EXCLUDE the print files, as they cannot be backed up anyway. This means that if the server is restored, print pending documents are lost, but the queue structure is available for new requests.

EXCLUDE <vol>:\QUEUES\*.QDR\*
INCLUDE <vol>:\QUEUES\*.QDR\*.Q

If Novell Distributed Print Services is used for printing, then Netware does not use print queues, so both the directory structure and the files can be excluded.

Server Specific information

There is some Netware information which is specific to a server, and which is not backed up by default. If you want to include it, you need to add a domain statement to the dsm.opt file:

DOmain serverspecificinfo

The serverspecificinfo contains the following -

SERVDATA.NDS
Contains the information needed by the NetWare installation procedure to restore the server object to its original state.
DSMISC.LOG
A text file containing various NDS information, including the server's replica list.
VOLSINFO.TXT
A text file which contains the full server name (i.e., if the directory name had been truncated), a list of volumes present on the server at the time of the last server specific information backup, the name spaces and extend file formats (e.g., compression) enabled on the volumes. This file is meant to aid the user during server recovery.
AUTOEXEC.NCF
The server's autoexec.ncf file.
STARTUP.NCF
The server's startup.ncf file.

Also, server centric IDs are unique IDs assigned to each server, and are backed up as part of the server specific information.