Data Archiving

The Archive Command

Data archiving is intended to preserve a copy of a related set of files as they stood at a point in time for legal or compliance purposes. This set of files might consist of tax records, end-of-project reports or similar.
An Archive will typically be requested by the customer as a one-off process; it would not normally be a scheduled event. An Archive will usually be retained for several years.
If archived data is needed again, the entire set of files is normally brought back to a new location, and this process is called 'Retrieve'.

An Archive is not the same as a Backup, which typically involves copying an unrelated set of changed files to tape and retaining them for a relatively short time. A Backup also leaves the source data untouched, whereas an Archive can delete the source data. Nor is an Archive the same as HSM migration, which involves moving older files off primary disk to cheaper storage. Migration is about managing disk space; Archive is about retaining data.

The TSM ARCHIVE command can be used in the following ways:

ARCHIVE pathname.filename
Will simply archive a file
ARCHIVE pathname/* -deletefiles -subdir=yes
Will archive files in a directory including subdirectories then delete them from source
ARCHIVE -filelist=textfilename
You create a text file that lists the files to be archived. The Archive command reads this list and archives each file in it. Every entry in the file list must include the full path name, so this option can be used to archive selected files from different paths
ARCHIVE -filelist=textfilename -archmc=mc7yrs -deletefiles -description="End of year data for inland revenue, requested by Colin Green"
Archive a list of files, bind the archive to a seven-year management class (one that you have previously set up with a seven-year retention), delete the originals and give the archive a meaningful description (254 characters maximum)
ARCHIVE pathname.filename -v2Archive
Use the v2archive option to archive only the files, without the directory entries in their path, which keeps the number of archive entries in the database down - see the performance section below
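The file list form of the command can be prepared with a small script. The following is a minimal Python sketch, not a definitive implementation: it writes a file list with full path names and assembles the arguments for the ARCHIVE -filelist example above, rejecting descriptions longer than the 254 character limit. The function names are hypothetical, and the executable name dsmc for the command line client is an assumption that is not stated in the text above.

```python
import os

MAX_DESCRIPTION_LEN = 254  # description limit quoted in the text above


def write_filelist(paths, listfile):
    """Write one full path per line, since the file list needs full path names."""
    full_paths = [os.path.abspath(p) for p in paths]
    with open(listfile, "w", encoding="utf-8") as fh:
        for path in full_paths:
            fh.write(path + "\n")
    return full_paths


def build_archive_args(listfile, mgmt_class, description, delete_files=False):
    """Assemble the argument list for an archive run; nothing is executed here."""
    if len(description) > MAX_DESCRIPTION_LEN:
        raise ValueError(
            "description is %d characters, the maximum is %d"
            % (len(description), MAX_DESCRIPTION_LEN)
        )
    # "dsmc" is an assumed name for the command line client executable.
    args = [
        "dsmc",
        "archive",
        "-filelist=" + listfile,
        "-archmc=" + mgmt_class,
        "-description=" + description,
    ]
    if delete_files:
        args.append("-deletefiles")
    return args
```

The returned list can be handed to subprocess.run as it stands. Because each option is passed as a single argument, the description needs no extra quoting. It is probably wise to run once without delete_files and check the result before letting the command remove the source data.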

It is possible to archive data using the Web client GUI, but you get fewer options than with the command line.

The Retrieve Process

The following command line options are available to retrieve the data. The different options can be used in combination.

RETRIEVE pathname.filename
Will simply retrieve an archived file to the original location. You will be prompted if the data already exists
RETRIEVE pathname/* newpathname/
Retrieve all files in a directory to a new location
RETRIEVE pathname/* -pick
Get a pick list of archived files from a specified directory. You can then select those files you want retrieved.
RETRIEVE -filelist=textfilename
Retrieve a list of files that are specified in a file

It is also possible to retrieve files with the GUI as shown below

Finding Archived files

To find individual archive runs you could use an SQL query

select NODE_NAME,ARCHIVE_DATE,CLASS_NAME,DESCRIPTION from ARCHIVES where NODE_NAME='node'
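If the query is run for many nodes, the statement can be generated rather than typed. The following is a rough Python sketch that builds the select statement shown above for a given node. The function name is hypothetical. Two things in it are assumptions rather than facts from this page: that node names are stored in upper case on the server, and that an optional filter on the DESCRIPTION column with LIKE is useful for narrowing the result.

```python
def archives_query(node_name, description_contains=None):
    """Build the ARCHIVES select statement for one node."""

    def quote(value):
        # Double any single quotes so the value stays one SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    # Assumption: the server stores node names in upper case.
    sql = (
        "select NODE_NAME,ARCHIVE_DATE,CLASS_NAME,DESCRIPTION "
        "from ARCHIVES where NODE_NAME=" + quote(node_name.upper())
    )
    if description_contains:
        # Hypothetical extra filter, not part of the query in the text.
        sql += " and DESCRIPTION like " + quote("%" + description_contains + "%")
    return sql
```

The function only returns the statement as a string; how it is submitted to the server is left to whatever administrative interface you normally use.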

One easy way to find archived files is to use the TSM Web Client GUI as shown above. Point your browser to http://servername:1581 and select the retrieve option. This will list all archives for that server, and you can drill down into each archive to find individual files.

If you need to provide a list of archived files, then from the command line you can use the QUERY ARCHIVE command and redirect the output to a file for a user to read.
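Saving the QUERY ARCHIVE output can be scripted. The following is a minimal Python sketch under stated assumptions: the command line client is assumed to be installed as dsmc, the function and parameter names are hypothetical, and the runner parameter exists only so that the function can be exercised without a TSM client.

```python
import subprocess


def save_archive_listing(filespec, outfile, runner=subprocess.run):
    """Run QUERY ARCHIVE for a file specification and save the output to a file."""
    # "dsmc" is an assumed name for the command line client executable.
    args = ["dsmc", "query", "archive", filespec, "-subdir=yes"]
    with open(outfile, "w", encoding="utf-8") as fh:
        # Send both normal output and error messages to the same file.
        result = runner(args, stdout=fh, stderr=subprocess.STDOUT, check=False)
    return result.returncode
```

A call such as save_archive_listing("pathname/*", "archives.txt") would leave the listing in archives.txt, ready to pass on to the user. The return code is handed back so that the caller can decide what to do if the query fails.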

Archive Performance Improvements

The GUI and Web clients use the archive description as the primary way to navigate to a specific archive. These descriptions are in text format and were held in the same primary archive table as the archived file path names, so a search can take a long time. To speed up searches, some of these search items are also held in secondary description tables. Archives that are invoked from the Web Client or GUI always use the secondary tables. Archives run from the command line can be made to use the secondary tables by 'converting' them with the CONVERT ARCHIVE command.

CONVERT ARCHIVE

Use this command to convert archives that were run from the command line, so that their filespace and description data is stored in the secondary tables to speed up searches. This is only appropriate if you run repeated archives over the same set of data, giving each archive a different description. If you use the command line for archives and retrieves, and do not use the description to identify archives, then do not convert the archives, which saves database space, and use the -v2Archive option with subsequent archive requests. Syntax:

CONVERT ARCHIVE nodename

UPDATE ARCHIVE

Use this command to save database space if your database has a large number of archive entries, where large means 100,000 or more. This command should not be used if anyone uses, or may use, the Web Client or GUI to work with archived files.

UPDate ARCHIve node_name -SHOWstats -RESETDescriptions -DELETEDirs

SHOWstats
Displays statistics for the node, including the number of directory and file entries, the number of entries for directories with the same path specification but different descriptions, and whether the node is converted.
RESETDescriptions
Resets the description field to the same description for all archive entries for a node. This means that every archive for a given directory will belong to the same package. Once the descriptions are changed, they cannot be restored.
DELETEDirs
Deletes all archive directory entries for the node. This means that the original access permissions cannot be restored when files are retrieved, which may matter less than the database space saved. Once the directory entries are removed, they cannot be restored.

UNDO ARCHCONVERSION

This command empties the secondary description tables. No archive directory or file data is lost, because that data is held in the primary archive table entries. You can either use this command on its own to free up the database space used by the secondary description tables, or you can follow it with the CONVERT ARCHIVE command to audit and refresh the secondary description tables. The syntax is:

UNDO ARCHConversion node_name