FDRinstant Hints and Tips

Setting up the FDRinstant jobs

If you are converting from existing FDRABR backups then you are probably backing up by volume groups (MOUNT VOLG=PLW) or storage pools (MOUNT STORGRP=PLWSYS). With FDRinstant you have to provide one MOUNT statement for each disk. When using flashcopy, each MOUNT statement takes the form

MOUNT VOL=VOL001,FLASHUNIT=B035

If you use fixed target disks, then you might find the VERIFYVOLSER=YES parameter useful to ensure that you are not overwriting the incorrect target disk. Obviously, you cannot use this parameter for the first backup.

Catalog errors

The flashcopy jobs will fail if they find a dataset with a catalog error. This could be an uncatalogued SMS-managed dataset, or a dataset with a VVDS error. The problem is that the job does not just fail the one dataset; it does not flash the disk at all. If you ignore the error and run an FDRABR copy to tape job, it will put out a message

FDR211   FDR ERROR ON DD=DISKONL2 REASON=F - INSTANT BKUP ALREADY DONE

and will end with condition code 0, but you will not have a backup.
You must rerun the FDRinstant job and get it to work before you run the disk-to-tape copy! However, the whole point of using instant copy is that the flashcopy happens very quickly. You do not want to spend time fixing catalogs in a ten-minute backup window. The answer is to add the parameter SMSPROT=NONE to the FCOPY statement.

FCOPY TYPE=FDR,DSNENQ=USE,SMSPROT=NONE

FDR will then report on uncatalogued datasets, but will flashcopy the disk, and will not fail the job.

Getting exclusive use of datasets

This is really an FDRABR issue, but FDR supplies four levels of dataset enqueue: NONE, TEST, USE and HAVE, in ascending order of severity. The recommendation is to use 'USE'. With this setting, FDR tries to get exclusive use of every file. If any files are in use, FDRinstant will not back up the disk unless you also code ENQERR=NO, in which case you get a 'fuzzy' backup. This is not the same as FDRABR which, if it cannot get exclusive use of some files, takes a 'fuzzy' backup of them anyway. So a final recommendation for your flashcopy jobs is

FCOPY TYPE=FDR,DSNENQ=USE,ENQERR=NO,SMSPROT=NONE

Make sure this is acceptable and will work with your data before putting it into production.

DSNENQ=NONE and TEST are usually used for testing. DSNENQ=HAVE is really useful for annoying your operators. It will issue a WTOR to the system console for any dataset which is in use, and will want to be told what to do about it. For the VTOC, FDRinstant defaults to ENQ=RESERVE, whereas the FDRABR default is ENQ=ON.
If you code DSNENQ= or ENQ= on the ABR dump jobs that copy the flashed image to tape, they are ignored, because they would lock the online data sets and the online volume's VTOC, not the offline point-in-time image.

Unable to find backups?

Another FDRABR tip. FDRABR records backup information in the F1DSCB in the VTOC. If the dataset is deleted, the backup information is transferred to the ABR catalog. If the dataset is allocated again, it gets a fresh F1DSCB with no backup information. Restores of the dataset will fail with

FDR321 unable to restore - reason code L

The original backup information will still be recorded in the scratch catalog, and a scratch report will find it. A restore that specifies the volume, generation and cycle should then work.

RESTORE  TYPE=ABR,DYNTAPE
SELECT   DSN=DARC.CICSVAM.SLT05,GEN=3,
         CYCLE=0,VOL=P9CD00

FDRinstant jobs run for an excessive time

An FDRinstant job should run in under 5 minutes, even if it is doing full dumps of every disk. If FDR is used with Multi Image Manager (MIM) from CA, then the run times can be excessive, 45 minutes or more, due to ENQ contention.
If you see messages in your job logs like

:MIM1038 FLASHJOB CONTENTION WITH DSNDBM1 OWNS SHR ON SYS3

then you should investigate adding an FDR-specific exit to MIM called FDRCONXT.

You can multi-stream the flashcopy process by using multiple TAPEx DD statements. Up to four concurrent processes are supported. Each TAPEx DD statement must have a matching SYSPRINx DD statement, so the JCL to run with the maximum four streams would look like this:

//DUMP01  EXEC PGM=FDRABR,REGION=0M
//SYSPRINT DD SYSOUT=*
//SYSPRIN1 DD SYSOUT=*
//SYSPRIN2 DD SYSOUT=*
//SYSPRIN3 DD SYSOUT=*
//SYSPRIN4 DD SYSOUT=*
//TAPE1    DD DUMMY,LABEL=EXPDT=99000
//TAPE2    DD DUMMY,LABEL=EXPDT=99000
//TAPE3    DD DUMMY,LABEL=EXPDT=99000
//TAPE4    DD DUMMY,LABEL=EXPDT=99000
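The JCL above shows only the DD statements for the streams. As a rough sketch, a two-stream step that also carries the control statements discussed earlier might look like the following. The step name, the volume serials VOL002 to VOL004 and the flash unit addresses B036 to B038 are made up for illustration, and supplying the control statements through a SYSIN DD is an assumption; check the FDR manual for your release before using it.

```jcl
//* Sketch only: step name, volsers and unit addresses are hypothetical
//FLASH01  EXEC PGM=FDRABR,REGION=0M
//SYSPRINT DD SYSOUT=*
//SYSPRIN1 DD SYSOUT=*
//SYSPRIN2 DD SYSOUT=*
//TAPE1    DD DUMMY,LABEL=EXPDT=99000
//TAPE2    DD DUMMY,LABEL=EXPDT=99000
//* Assumption: control statements are read from SYSIN
//SYSIN    DD *
FCOPY TYPE=FDR,DSNENQ=USE,ENQERR=NO,SMSPROT=NONE
MOUNT VOL=VOL001,FLASHUNIT=B035
MOUNT VOL=VOL002,FLASHUNIT=B036
MOUNT VOL=VOL003,FLASHUNIT=B037
MOUNT VOL=VOL004,FLASHUNIT=B038
/*
```

Note that there is one MOUNT statement per source disk, each paired with its own target unit, and one SYSPRINx DD for every TAPEx DD.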

How to access the flashed copy disks for testing or instant recovery

When you create an instant copy of a disk using the FCOPY parameter FDRFLASH of FDRABR, FDRABR will modify the volume label on the target disk so it cannot be brought online. If you have a pool of volumes that are available to FDRinstant for copies, you should also mark them in your IOCP so that they do not come online at IPL time.

If you want to bring the flashcopy volumes online, you must fix the volume label with the FDRVOL1 utility first.

//FDRVOL1 EXEC PGM=FDRVOL1,PARM='UNIT=u*'
//SYSPRINT DD SYSOUT=*
//DISK1    DD UNIT=SYSALLDA,VOL=SER=anyvol,DISP=OLD

You must specify an online volume on the DISK1 DD statement for JCL syntax checking reasons, shown as anyvol above, but this volume will not be altered by FDRVOL1. The volumes that will be changed are specified in the PARM='UNIT=u*' parameter. u* can be any prefix of a four-digit device address, including a fully specified unit. For example, PARM='UNIT=1234' will modify the volume label at address 1234, PARM='UNIT=123*' will modify all offline units starting 123, and PARM='UNIT=1*' will modify all offline units starting 1. Any online disks in those ranges will not be affected.
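As an illustration, suppose the flashcopy targets all sit at addresses B030 to B03F. That range is an assumption made for this example; it simply includes the B035 unit used in the MOUNT statement earlier. A single run with a prefix would then fix the labels on every offline unit in the range. PRODA1 is a placeholder for any volume that is online at your site.

```jcl
//* Sketch only: B03* and PRODA1 are hypothetical values
//* Fixes the label on all offline units B030-B03F; online units are skipped
//FDRVOL1 EXEC PGM=FDRVOL1,PARM='UNIT=B03*'
//SYSPRINT DD SYSOUT=*
//DISK1    DD UNIT=SYSALLDA,VOL=SER=PRODA1,DISP=OLD
```

Keeping the prefix as narrow as possible limits how many duplicate labels you create in one go.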

Once you do this, you are in the perilous position of having two volumes with the same label, both of which are capable of being varied online, although only one can be online at a time. If you manage to get some of the flashed volumes online alongside some production volumes you will end up with some of your data back-levelled, and be in a nasty recovery situation. This is a particular problem at IPL time, as the IPL will spot duplicate volumes and ask your operators to choose the ones that should be mounted. The safest way to ensure that the wrong volumes are not brought online at IPL time is to have two IOCP configurations: a standard one for the normal volumes, and a secondary one that will bring the flashed volumes online.

I can see two scenarios where you might want to vary the flashed disks online: to a different LPAR for testing purposes, or for rapid recovery if all the data on your primary volumes becomes corrupt.
Under normal processing, you will run an FDRABR job to copy the flashed data from disk to tape, and this job will release the flashcopy. By default, the flashcopy runs in NOCOPY mode, so once the flashcopy is released the target disks are unusable. You can specify FCOPY=COPY on the FCOPY statements to make a permanent offline copy of the data. This requires all the data to be copied from the primary to the flashcopy volume, which places an overhead on the disk subsystem.
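Starting from the FCOPY statement recommended earlier, a request for a permanent copy might look something like the sketch below. Placing FCOPY=COPY at the end of the operand list and supplying the statement through a SYSIN DD are assumptions made for illustration, so confirm the syntax against the FDR manual.

```jcl
//* Sketch only: operand position is an assumption
//SYSIN    DD *
FCOPY TYPE=FDR,DSNENQ=USE,ENQERR=NO,SMSPROT=NONE,FCOPY=COPY
MOUNT VOL=VOL001,FLASHUNIT=B035
/*
```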