Backing up a Linux VCS cluster with TSM

What is a VCS cluster?

Veritas Cluster Server from Symantec is a software-based high-availability cluster service that runs on AIX, Linux and Windows operating systems. It provides near-transparent application failover, similar to HACMP or Microsoft Cluster Server.

The three principal components of VCS are

  • Systems, which are machines or LPARs. In the example below, they are VCSDC1 and VCSDC2. Systems can be located in different machine halls or data centres for cross-site resilience.
  • Resources, which are individual pieces of hardware or software, like a disk or a dsmcad process.
  • Service Groups, which are the collections of resources that must all be available to run an application.

VCS commands for Linux clusters are usually held in /opt/VRTS/bin/, and a few useful ones are:

hastatus -summary  lists the status of systems, groups and resources
hasys -display     displays the attributes of each system and confirms that the cluster is operating
haclus -display    lists cluster-wide information

For backup purposes you will need to know which file systems are owned by VCS and which are just local to the Linux server. Use the mount command or df -h to find this out, as sketched below.
For VCS backups, you will need to identify a directory on a shared VCS file system that you can use to store TSM configuration and log files.
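
As a rough illustration (assuming the shared file systems are VxFS on shared storage, which is typical but not guaranteed), the following commands help tell local and clustered file systems apart; the last one asks the cluster itself which Mount resources it manages:

df -h                                   # everything that is mounted
mount -t vxfs                           # shared VCS file systems are usually VxFS
/opt/VRTS/bin/hares -list Type=Mount    # Mount resources that VCS manages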

Configuration at the TSM Server

Assume we have a two-node cluster called VCS_Cluster, split across two data centres, DC1 and DC2. I'll also assume that we have a single TSM server, called TSM_SERVER, that can fail over between DC1 and DC2. The server IP address is 10.11.12.13 and it uses port 1543.
Also, we will assume that the VCS cluster contains two shared file systems, /VCS and /VCS1, and that the Service Group which owns them is also called VCS_Cluster. All the TSM-related cluster data will be held in the directory /VCS/tsm/

You then need three TSM client nodes defined on TSM_SERVER: one for the 'rootvg' or local data backup in each data centre, and one to back up the shared cluster data. These are called VCSDC1, VCSDC2 and VCSSHARE respectively.
These naming standards would not be suitable for a real environment, but are adequate to illustrate the configuration.
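
On the TSM server side that comes down to three register node commands, along these lines (the passwords are placeholders, and you would add your own policy domain and other site-standard options):

register node VCSDC1   apassword
register node VCSDC2   apassword
register node VCSSHARE apassword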

Configuration at the Client

You need a rootvg option file on each client, usually held in /opt/tivoli/tsm/client/ba/bin

dsm.opt

servername     TSM_SERVER
other parms

and then a dsm.sys file on each client
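
The exact contents will depend on your site, but a minimal dsm.sys sketch using the example values above might look like this on VCSDC1 (VCSDC2 would be identical apart from its nodename; the log and inclexcl file names below are just illustrative):

SERVERNAME          TSM_SERVER
   NODENAME         VCSDC1
   TCPSERVERADDRESS 10.11.12.13
   TCPPORT          1543
   PASSWORDACCESS   GENERATE
   MANAGEDSERVICES  SCHEDULE WEBCLIENT
   DOMAIN           ALL-LOCAL
   DOMAIN           -/VCS -/VCS1
   SCHEDLOGNAME     /opt/tivoli/tsm/client/ba/bin/dsmsched.log
   ERRORLOGNAME     /opt/tivoli/tsm/client/ba/bin/dsmerror.log
   INCLEXCL         /opt/tivoli/tsm/client/ba/bin/inclexcl.local

SERVERNAME          TSM_VCS
   NODENAME         VCSSHARE
   TCPSERVERADDRESS 10.11.12.13
   TCPPORT          1543
   PASSWORDACCESS   GENERATE
   MANAGEDSERVICES  SCHEDULE WEBCLIENT
   DOMAIN           /VCS /VCS1
   SCHEDLOGNAME     /VCS/tsm/dsmsched.log
   ERRORLOGNAME     /VCS/tsm/dsmerror.log
   INCLEXCL         /VCS/tsm/inclexcl.vcs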

Note how the domain statements work. The rootvg backup uses DOMAIN ALL-LOCAL, that is, all local file spaces, then excludes the two cluster file spaces with DOMAIN -/VCS -/VCS1. The '-' in front of the file space means exclude.
These two file spaces are then specifically included in the cluster stanza.

Note that the log files for local backup are held in the local /opt/tivoli/tsm/client/ba/bin directory on each server, but the log files for the shared resource are held in /VCS/tsm/. This ensures that the shared resource log files are continuous and preserved on failover. The inclexcl files are held in the same way. You need to create these files in advance or TSM will not start up.
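
A minimal way to prepare the shared directory, assuming the file names used in the dsm.sys sketch above, is:

mkdir -p /VCS/tsm
touch /VCS/tsm/dsmsched.log /VCS/tsm/dsmerror.log /VCS/tsm/inclexcl.vcs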

You also need a shared dsm.opt file in the /VCS/tsm/ directory

dsm_vcs.opt

servername     TSM_VCS
other parms

DSMCAD Scheduler

You need two dsmcad services defined, a rootvg service that runs on both nodes in the cluster, and a cluster service that runs on whichever node is hosting the cluster. This dsmcad service needs to fail over automatically with the cluster.
There is an IBM technote that describes how to set up a TSM client schedule on a RHEL Linux client. It mentions two ways of setting things up so the scheduler is started automatically at boot time. One way is to place a command entry in /etc/inittab like this

cad::once:/opt/tivoli/tsm/client/ba/bin/dsmcad >/dev/null 2>&1 # TSM Webclient

The other way is to define a /etc/init.d script, and this method simplifies cluster failover, as it gives you easy ways to stop, start, recycle and check dsmcad, with simple commands like

service dsmcad start

service dsmcad stop

service dsmcad restart

service dsmcad status

I recommend you look that technote up and follow the instructions given. Once you have installed the script, use the chkconfig command to set it to run automatically at boot time

/sbin/chkconfig dsmcad on

Use the --list option to check whether a service is set to start at boot time, and the off option to stop a service from starting at boot time

/sbin/chkconfig --list dsmcad

/sbin/chkconfig dsmcad off

Cluster Configuration

The cluster needs to be able to stop the dsmcad on one node and then start it up on a different node. It also needs to be able to query the node that currently owns the cluster resources and check that the dsmcad is running. However, the cluster dsmcad needs a unique name to avoid confusion with the dsmcad used for the rootvg backups. So define a dsmcad with a different name, on both nodes in the cluster, with the command

ln -s /opt/tivoli/tsm/client/ba/bin/dsmcad /opt/tivoli/tsm/client/ba/bin/dsmcad.vcs

Take a copy of the IBM script mentioned above, call it dsmcad.vcs and amend it to manage a service called dsmcad.vcs; a simplified sketch of such a script follows the commands below. The cluster can then stop, start or query dsmcad.vcs with the following commands.

/sbin/service dsmcad.vcs stop

/sbin/service dsmcad.vcs start

/sbin/service dsmcad.vcs status
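
For illustration, a much simplified version of such a script is sketched below. It assumes the dsmcad.vcs symlink and the /VCS/tsm/dsm_vcs.opt option file used in this example; the script from the IBM technote is more robust and should be your starting point.

#!/bin/bash
# Simplified sketch of /etc/init.d/dsmcad.vcs - manages only the cluster dsmcad
# Paths below are the example values assumed in this article

DSMCAD=/opt/tivoli/tsm/client/ba/bin/dsmcad.vcs
OPTFILE=/VCS/tsm/dsm_vcs.opt
export DSM_LOG=/VCS/tsm          # keep dsmcad logs on the shared file system

case "$1" in
  start)
    $DSMCAD -optfile=$OPTFILE    # start the cluster dsmcad against the shared option file
    ;;
  stop)
    pkill -f "dsmcad.vcs -optfile=$OPTFILE"
    ;;
  restart)
    "$0" stop
    sleep 2
    "$0" start
    ;;
  status)
    if pgrep -f "dsmcad.vcs -optfile=$OPTFILE" > /dev/null; then
      echo "dsmcad.vcs is running"
    else
      echo "dsmcad.vcs is not running"
      exit 3
    fi
    ;;
  *)
    echo "Usage: $0 {start|stop|restart|status}"
    exit 1
    ;;
esac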

Add this as a resource to the Service Group, with a dependency on the disk being available. First, to list the Service Group's existing resources, use the command

hagrp -resources VCS_Cluster

Then you may have to set the cluster configuration to read/write before you can change it.

haconf -makerw

Create the dsmcad resource as an Application resource

hares -add dsmcad.vcs Application VCS_Cluster

By default, Critical is set to 1, which means that if dsmcad fails or stops, the whole Service Group fails over. Change it to 0, as we don't want the application to fail over because of a dsmcad failure.

hares -modify dsmcad.vcs Critical 0

Tell the resource what commands to execute to start and stop the application (both are required), and what to look for to check that the resource is active.

hares -modify dsmcad.vcs StartProgram "/sbin/service dsmcad.vcs start"

hares -modify dsmcad.vcs StopProgram "/sbin/service dsmcad.vcs stop"

hares -modify dsmcad.vcs MonitorProcesses "/opt/tivoli/tsm/client/ba/bin/dsmcad.vcs -optfile=/VCS/tsm/dsm_vcs.opt"

Link the dsmcad to the availability of the file system. The syntax is hares -link <parent> <child>, where the parent resource depends on the child, so dsmcad.vcs is the parent and the Mount resource for /VCS (here assumed to be called MNT-VCS; use the name shown by hagrp -resources) is the child.

hares -link dsmcad.vcs MNT-VCS

Enable the resource - this lets the agent track it

hares -modify dsmcad.vcs Enabled 1

Check it out

hares -display dsmcad.vcs

Bring it online and check it came online OK

hares -online dsmcad.vcs -sys VCSDC1
hares -state dsmcad.vcs

Save the configuration to disk and put it back to read-only

haconf -dump -makero

Use the switch command to fail the Service Group over, and check that the dsmcad fails over correctly

hagrp -switch VCS_Cluster -to VCSDC2
hagrp -switch VCS_Cluster -to VCSDC1

You can check the various cluster actions in the log files under /var/VRTSvcs/log/.
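
For example, you can watch the main engine log, normally engine_A.log, while running the switch tests:

tail -f /var/VRTSvcs/log/engine_A.log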