TDMF Overview

TDMF (Transparent Data Migration Facility) is used to move z/OS mainframe disk volumes without affecting active applications. It is completely independent of any hardware microcode, so it can be used to move disks between different vendors' storage equipment. Provided line speeds are adequate to deal with active I/O rates, it can also be used to move volumes between data centres. TDMF moves volumes; its sister product zDMF moves datasets.


TDMF can run as a started task, but I've only ever seen it used in batch. In brief, the move process involves:

  1. Check that all the move conditions are correct to ensure data integrity
  2. Copy all tracks from the source to the target disk, while recording any tracks that have changed at the source
  3. Copy all changed tracks to the target while still recording changed tracks at the source. Repeat this until the number of changed tracks is less than a threshold number
  4. Temporarily quiesce the I/O from all applications on all LPARS, copy remaining changed tracks
  5. Switch volumes so the source goes offline and the target online
  6. Resume the applications
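The copy/refresh convergence in steps 2-4 can be sketched in a few lines of Python. This is purely illustrative, with invented names; it is not TDMF's actual implementation:

```python
# Illustrative model of the TDMF copy/refresh loop (steps 2-4).
# 'source' maps track -> data; 'change_passes' yields, for each
# pass, the tracks that applications updated during that pass.
def migrate(source, change_passes, threshold=2):
    target = dict(source)                   # step 2: full copy
    changed = set(next(change_passes, ()))  # tracks dirtied meanwhile
    while len(changed) >= threshold:        # step 3: refresh passes
        for trk in changed:
            target[trk] = source[trk]
        changed = set(next(change_passes, ()))
    for trk in changed:                     # step 4: quiesce, final copy
        target[trk] = source[trk]
    return target                           # step 5 would now switch volumes
```

Each refresh pass shrinks the changed-track backlog until it drops below the threshold, at which point application I/O is briefly quiesced for the final copy.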


TDMF can move most z/OS volumes that are in ECKD format; the only exceptions are volumes containing active local page datasets and coupling facility datasets. Native Linux volumes and VM volumes cannot be moved with TDMF as they are not in CKD format. TDMF z/OS v5.5 supports migration of 1 terabyte EAV volumes.

Before you start to use TDMF you need to run a job called SYSOPTN which sets up the security keys, defines some TDMF parameters and sets some default options. If your site has used TDMF previously then you may need to run it in update mode to refresh the licence. If you want to know what those default options are, then go into TDMF and take option 9 from the main panel. You will then see a screen that shows you all the installation options. If you press enter on this screen, you will also see the current TDMF version number.

A TDMF session can be defined as the set of batch jobs, Master and Agents, required to perform a TDMF migration, together with an associated COMMDS file. The following actions are needed to run a TDMF session:

  1. Allocate the COMMDS. The COMMDS is a dataset used to communicate between LPARS.
  2. Set up the Master Job. The Master system runs on one LPAR and controls the migration sessions.
  3. Set up the Agent Jobs. The Agents run on the other LPARS and record the I/O activity on those LPARS. From TDMF version 5.0, up to 63 agents can be defined.
  4. Run the Master Job
  5. Run the Agent Jobs. These jobs must start no more than 15 minutes after the Master.
  6. Monitor

Setting up a TDMF session

The COMMDS

The COMMDS (sometimes called the SYSCOM file as it is pointed to by a SYSCOM DD statement in batch jobs) is used to pass information between the Master task and the Agent tasks on the other LPARS, and contains the status and messages related to a specific session. You also use the COMMDS in the TDMF TSO monitor to get information about current and past sessions. TDMF uses hardware RESERVES to serialise disk I/O and so the COMMDS must be placed on a quiet volume, preferably on a dedicated volume. Use a dataset name that is excluded from SMS just to ensure the COMMDS is placed on the volume that you want, and make sure it is not on a volume that you are moving with TDMF.

It is best to use a separate COMMDS name for each job, so you can check previous TDMF runs. A good standard is to incorporate the job name into the file name, for example TDMF.jobname.COMMDS, as this makes it easy to tie a COMMDS back to a specific job.

The communication dataset must be allocated on a cylinder boundary with contiguous space, and it must be on a CKD/E disk. If you are relocating to a new disk subsystem or Data Center then it is best to allocate a TDMF specific, non-SMS volume on the target subsystem.

The SYSCOM dataset itself is formatted and cannot really be browsed directly; you need to view it with TDMF.

The size (number of cylinders required) is based upon the following formula:

CYLS = V * (S + K)

Where:
V is related to the number of volumes in the session:

  • 64 volumes V = 2.5
  • 128 volumes V = 5.0
  • 256 volumes V = 7.5
  • 512 volumes V = 10.0

S is the number of participating systems or LPARS
K is related to the size of the source volumes involved:

  • 3390-3 K = 4
  • 3390-9 K = 6
  • 3390-27 K = 15

For example, suppose you are moving 128 3390-3 and 128 3390-9 volumes (256 volumes in total) across 8 LPARs. Taking V = 7.5 for 256 volumes, S = 8, and setting K = 6 for the largest device type in the session (the 3390-9):

CYLS = 7.5 * (8 + 6) = 105
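As a sketch, the sizing calculation can be wrapped in a small function. The tier values are those listed above; rounding the result up to a whole cylinder is my assumption:

```python
import math

# COMMDS sizing per the CYLS = V * (S + K) formula above.
V_TIERS = [(64, 2.5), (128, 5.0), (256, 7.5), (512, 10.0)]  # volumes -> V
K_BY_MODEL = {"3390-3": 4, "3390-9": 6, "3390-27": 15}      # model -> K

def commds_cyls(volumes, lpars, largest_model):
    """Cylinders needed for the COMMDS: V for the volume-count tier,
    S participating systems, K for the largest source device type."""
    v = next(v for limit, v in V_TIERS if volumes <= limit)
    return math.ceil(v * (lpars + K_BY_MODEL[largest_model]))
```

For the worked example, commds_cyls(256, 8, "3390-9") gives the same 105 cylinders.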

If you are allocating a lot of COMMDS files up front for a migration, the best way to do it is to use an in-stream JCL procedure like this:

//ALLOCS PROC
//PROC01 EXEC PGM=IEFBR14
//NEWLIB DD  DISP=(,CATLG,DELETE),
//    DSN=&DSN,UNIT=SYSDA,
//    SPACE=(CYL,(75,1),,CONTIG),
//    DCB=(LRECL=4096,BLKSIZE=4096,
//    RECFM=F,DSORG=PS),
//    STORCLAS=NONSMS,VOL=SER=TDMF01
// PEND
//STEP001 EXEC ALLOCS,DSN='TDMF.JOB00001.COMMDS'
//STEP002 EXEC ALLOCS,DSN='TDMF.JOB00002.COMMDS'
//STEP003 EXEC ALLOCS,DSN='TDMF.JOB00003.COMMDS' etc
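If there are dozens of files to allocate, the repetitive EXEC statements can be generated with a short script and pasted into the job. This is just a convenience sketch, nothing TDMF-specific; the naming follows the TDMF.jobname.COMMDS standard suggested earlier:

```python
# Generate the repeated EXEC statements that invoke the ALLOCS
# in-stream procedure, one per COMMDS to be allocated.
def exec_steps(count, hlq="TDMF"):
    return "\n".join(
        f"//STEP{i:03d} EXEC ALLOCS,DSN='{hlq}.JOB{i:05d}.COMMDS'"
        for i in range(1, count + 1)
    )

print(exec_steps(3))
```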

TDMF has one quirk that you should be aware of when using a COMMDS: it bypasses the VTOC and accesses the data by absolute track. Suppose that you are running a lot of TDMF jobs, and you have one disk that you are using to store all your COMMDS files. When you complete a section of the migrations, you may want to wipe that disk and start again, so you delete all the old COMMDS files and allocate a new set. Now, if you go into TDMF and pick up a new COMMDS from a job that has not run yet, say using option 7 - PAST SESSION DISPLAY DETAILS, instead of telling you there is no data to display, it will retrieve the data that still exists on disk from the previous old COMMDS and return it. This can cause a bit of confusion! This is not a bug, it is a feature of TDMF. The only foolproof answer is to always use a new, clean volume if you can.

The Master task

There is only one Master task per session, but a Master task can handle multiple groups of volumes. All active LPARS must be defined to the Master system to prevent data corruption issues. You should place your Master task on the LPAR with the most update activity to the volumes that you are moving. It is possible to run several Master tasks in parallel, but they should all have their own COMMDS and you must run either GRS or MIM to ensure data integrity. The Master task:

  • Initialises the system environment and the COMMDS
  • Establishes the XCF environment
  • Starts and controls all sessions
  • Monitors Source volume I/O activity
  • Monitors Target volume I/O activity
  • Copies data from Source to Target
  • Processes Source updates identified by the Agents

Sample JCL to run a Master task as a batch job is

//TDMF EXEC PGM=TDMFMAIN,PARM=MASTER
//STEPLIB DD DISP=SHR,DSN=SYS3.TDMF.PROCLIB
//SECCOM DD DISP=SHR,DSN=SYS3.TDMF.PROCLIB
//SYSCOM DD DISP=SHR,DSN=TDMF.JOB00001.COMMDS
//SYSPRINT DD SYSOUT=*
//SYSSNAP DD SYSOUT=*
//SYSIN DD *
SESSION XPADISKS
    MASTER(SA00) AGENTS(SB00 SC00 SD00)
    OPTIONS(
      UNIDENTIFIEDSYSTEMS(TERMINATE)
      CHECKTARGET
      CONCURRENT(05 ACTIVE)
      PACING(NORMAL)
      NOPROMPT
      NOPURGE
      FASTCOPY
      TIME(LOCAL)
      RELABEL(TD)
      )
  MIGRATE XPA20A SPA92A
  MIGRATE XPA20B SPA92B

The two MIGRATE statements above identify two volumes, XPA20A/B, that are moving to the addresses used by volumes SPA92A/B. You would normally want more than two volumes in a job. How many volumes should you have? That is really up to you, but bear in mind that if you make your jobs too small, then running and checking them will be very laborious, while if you make them too big, you have to wait a long time for them to finish, especially if you are having problems. It is not a good idea to cancel a migration.

As a rule of thumb, it takes about 2-3 minutes to move a 3390-3 and 5-8 minutes to move a 3390-9. I'd suggest a 30 minute job runtime is reasonable, so if you are running 10 migrates in parallel, then 120 mod3s or 60 mod 9s could be appropriate. The exception is system volumes like SYSRES, PAGE, SPOOL etc, which I'd always run individually.
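Those rule-of-thumb figures make job sizing easy to estimate. The sketch below uses the midpoints of the quoted ranges (2.5 and 6.5 minutes), which is an assumption on my part:

```python
import math

# Midpoints of the rule-of-thumb per-volume move times, in minutes.
MINUTES_PER_VOLUME = {"3390-3": 2.5, "3390-9": 6.5}

def batch_runtime(volumes, model, parallel=10):
    """Rough runtime for a job moving 'volumes' disks of one model,
    'parallel' migrations running at a time."""
    waves = math.ceil(volumes / parallel)
    return waves * MINUTES_PER_VOLUME[model]
```

batch_runtime(120, "3390-3") gives 30 minutes, matching the suggested job size above.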

The OPTIONS can be used to override the global options set by the SYSOPTN job. These are SAMPLE options and may not be valid for your site. I'd suggest that you check the TDMF manual for a full explanation of these options, select the ones that work for you, then TEST THEM on appropriate data and systems.

UNIDENTIFIEDSYSTEMS is used to determine what to do about systems or LPARS that have a route to a volume but no active Agent. Options are IGNORE, WARN, ERROR and TERMINATE. This is not a foolproof way of identifying all systems, as it depends on 3990-6 controller facilities and not all vendors support these.

CHECKTARGET means check that the target volume is empty before proceeding.

NOPROMPT means TDMF will not send out a confirmation message before synchronising source with target.

The RELABEL(TD) option means that when the migrates are complete, the source volumes will be relabelled as TDA92A/B. Alternatively, you can code this explicitly for each volume in the MIGRATE statements as follows:

MIGRATE XPA20A SPA92A TDA92A
MIGRATE XPA20B SPA92B TDA92B

FASTCOPY means that TDMF will copy only the used portion of the disk to the target. This is appropriate for new disks, but may not be a good idea if you are copying over existing data.

PACING means that TDMF will initially move 15 tracks at a time, but will reduce this if it finds that the disk is busy.

NOPURGE means do not delete any existing data on the target that was not overwritten by the source data.

The Agent Tasks

There must be an Agent on every LPAR except the Master LPAR. The Agents:

  • Communicate with the Master for migration requests
  • Monitor Source I/O activity on their LPAR
  • Monitor Target I/O activity on their LPAR
  • Notify the Master about any Source I/O updates

It is possible to run more than one Agent job in an LPAR, as long as each Agent is associated with a different Master task and communicates with that task through a separate and unique COMMDS. All the Agent tasks on every LPAR must be started within 15 minutes of the Master task starting, or the session will time out. However, an Agent can be started before the Master.

Sample JCL for an Agent task is

//STEP1 EXEC PGM=TDMFMAIN,PARM=AGENT
//STEPLIB    DD DISP=SHR,DSN=SYS3.TDMF.PROCLIB
//SECCOM     DD DISP=SHR,DSN=SYS3.TDMF.PROCLIB
//SYSCOM     DD DISP=SHR,DSN=TDMF.JOB00001.COMMDS
//SYSPRINT   DD SYSOUT=*
//SYSUDUMP    DD SYSOUT=*
//SYSSNAP    DD SYSOUT=*
//SYSIN      DD DUMMY
//

If all your participating LPARS are in a single SYSPLEX, then you can easily set up the Master and all the Agents in one PDS member, using
/*JOBPARM S=system
statements to make sure the correct job runs on the correct system. Then you just type 'SUB' once to run all the jobs.

The 7 phases of TDMF

The Master system initiates and controls all migrations/replications. The Master initiates each phase and all Agents must acknowledge this to proceed.

SYSTEM INITIALISATION phase

System initialisation involves the Master task and all the Agent tasks starting up within 15 minutes, and reporting error-free validation for all volumes within a session. Checking includes making sure no other LPARS are accessing those volumes, and if the TDMF session has been set up to use SAF, then the volumes have the correct SAF authorisation.

INITIALISATION phase

This phase confirms that the source and target volumes are valid and if requested, waits for the Operator to reply to the Confirm WTOR. Once this is confirmed, the volume-level control blocks and real storage frames are allocated.

ACTIVATION phase

This phase starts the copy task and enables the monitoring of user I/O activity. While the data is copied from source to target, if updates to a source volume are detected in any participating LPAR, the Master system gathers that information for the REFRESH task.
Once all the tracks are copied by the COPY volume task, the Master then starts the copy REFRESH task. Further updates may happen, so the Master will run multiple refresh tasks until TDMF determines that synchronization of the target volume may be achieved, at which time, the Master system will move on to the Quiesce phase.

QUIESCE phase

The Master system instructs all Agents to stop all I/O activity to the source volume and pass it a final list of all updated tracks. The Master then performs a copy synchronous task to make the target disk a replica of the source.

Volume I/O redirect phase

All I/O is now permanently redirected to the target volume, which is effectively now the source volume. Once the redirect request is successful, the Master rewrites volume labels on both source and target.

Resume phase

The Master initiates a resume request via the Agents, to resume all I/O activity, now directed to the Target volume, and the original Source is varied offline.

Terminate phase

When a volume completes a migration, that volume's fixed storage will be freed for possible re-use within the current session.
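The sequencing across all seven phases can be modelled as a simple state machine. The sketch below is purely illustrative; the function names are invented and this is not how TDMF is implemented:

```python
# The seven phases, in order. The Master drives each phase and every
# Agent must acknowledge it before the session moves on.
PHASES = ["SYSTEM INITIALISATION", "INITIALISATION", "ACTIVATION",
          "QUIESCE", "VOLUME I/O REDIRECT", "RESUME", "TERMINATE"]

def run_session(agents):
    """'agents' maps an Agent name to a callable returning True if
    that Agent acknowledges the named phase."""
    completed = []
    for phase in PHASES:
        if not all(ack(phase) for ack in agents.values()):
            raise RuntimeError(f"an Agent failed to acknowledge {phase}")
        completed.append(phase)
    return completed
```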

Hints and tips

The key to a successful TDMF migration is careful planning up front, which applies to most projects of course. There are a few datasets and volumes that need special treatment and some of them are discussed here, but consult your TDMF manual to get a full picture.

Unidentified LPARS

Many large sites have more than one SYSPLEX, or they have LPARS that are not part of the SYSPLEX. It is probable that some volumes will be shared and online between SYSPLEXES or rogue LPARS. Typically these will be IODF volumes, tape management system volumes or volumes used by Sysprogs for various nefarious purposes.
This means that you have a good chance of a TDMF job failing with an unidentified system message. If you are migrating an entire string of disks, and that string has one volume online to an unidentified system, then the entire string will fail with an error message like

TDM2381I This source volume connected to 1 unidentified system(s).
TDM2382I 8000029880 2094 02/28/2009 01:18:46.

The answer is to find the rogue LPAR, look at the string and check out which volumes are online. You will have already identified these in your detailed planning of course. You can then safely change the UNIDENTIFIEDSYSTEMS(TERMINATE) parameter in the MASTER job to UNIDENTIFIEDSYSTEMS(IGNORE) and rerun the job for those volumes that are NOT online to the rogue LPAR. For the volumes that are online, you need to allocate a COMMDS that can be accessed from every LPAR, then rerun the jobs for those disks with an agent on every LPAR including the rogue one. Alternatively, you can move the data without using TDMF.
The LPAR can be identified from the TDM2382I message: the last four characters "9880" are the CPU number, and the fifth-last character "2" is the LPAR number.
TDMF z/OS now utilizes QHA (Query Host Access) and SPID fence to help ensure that the TDMF Agent is running on all systems which share a device to be migrated when a migration begins.
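The decoding rule for the TDM2382I identifier can be expressed as a one-liner. The field positions follow the description above; the function name is invented:

```python
# Decode the CPU identifier from a TDM2382I message: the last four
# characters are the CPU serial number, the fifth-last is the LPAR.
def decode_tdm2382i(cpu_id):
    return {"cpu_serial": cpu_id[-4:], "lpar": cpu_id[-5]}
```

For the sample message, decode_tdm2382i("8000029880") identifies CPU serial 9880 on LPAR 2.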

Special Datasets and Volumes

Be aware of where your TDMF load library resides and take care not to try to move it with TDMF. When you need to move that volume, move the load library to another disk and temporarily APF authorise it.

Watch out for various control datasets, like DFHSM BCDS / MCDS / OCDS, HSC, RACF and MIM control files. TDMF can move them, but it is recommended that you move them one at a time. For absolute safety, it is recommended that DFHSM and DFRMM be stopped when moving their control datasets.

Several CA products have control files that need special handling. For example, if you migrate the CA7 communications dataset then you should shut down both CA7 and ICOM. See the manual for full details.

If your work volumes are SMS managed and you have sufficient capacity, it is also a good idea to set the volumes to QUINEW in SMS before the move, to prevent new allocations and limit the amount of active I/O to the disks:

V SMS,VOL(xxxxxx,ALL),Q,N

When you are finished, enable them again with

V SMS,VOL(xxxxxx,ALL),E

Note: These SMS commands act over all LPARS and while that is the correct action to disable or quiesce volumes, that might not be the correct enabled configuration for your site.

When you move SYSRES volumes, the unit address will change, and it's the unit address that is used to reference a SYSRES volume at IPL time, not the VOLSER. So when you move a SYSRES volume, you need to let the Operators, System Programmers, and anyone else who might be interested know about the new unit address. It is good practice to move the SYSRES and alternate SYSRES volumes in separate sessions. When you move a SYSRES volume your TDMF job will end CC=4; this is normal, it's just warning you that it's a SYSRES volume.

JES Spool volumes are usually busy, so when you move them, do it in a quiet period, one at a time, and set them to Drain to prevent new access. Remember to change the JES CHKPT addresses in SYS1.PARMLIB(COMMND00).

You cannot move active local page datasets with TDMF; if you try, your job will fail with CC=12 (yes, been there!). It's best to move these volumes one at a time with Sysprog assistance; get them to drain the volumes before you try to move them.

The best way to handle SYSPLEX coupling datasets is to just switch to the alternates, then move the originals when they are not in use. If you use GDPS then you must switch the datasets with a GDPS script or you may cause a system outage.

How to make sure no-one else uses your Targets

The best way to set your targets up is to initialise them as SMS in your ICKDSF job by using the SG parameter, but do not add them to any SMS storage pool. This means that a non-SMS user cannot allocate data on them as they are SMS defined, but they cannot be used by SMS as they are not in a pool. When TDMF migrates a source onto a target it copies the VTOC and the VOLSER from the source, so the target then becomes usable. If you really want your target volumes in an SMS pool, then they must be set to DISNEW. Use the command

V SMS,VOL(xxxxxx,ALL),D,N
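The allocation logic that makes these targets safe can be summarised as a small truth table. This is a simplified model of SMS behaviour, not an authoritative statement of the ACS rules:

```python
# Why an SMS-initialised volume outside any storage pool receives
# no new allocations: non-SMS users avoid SMS volumes, and SMS
# itself only allocates to volumes inside a pool.
def can_allocate(requester_is_sms, vol_is_sms, vol_in_pool):
    if not requester_is_sms:
        return not vol_is_sms
    return vol_is_sms and vol_in_pool
```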

Hardware considerations

If you are a GDPS user, then set Hyperswap to OFF while doing any TDMF moves. TDMF version 5 onwards will automatically issue HYPERSW OFF and HYPERSW ON commands for you to minimise Hyperswap downtime.

The volumes that you are moving cannot be in an active Flashcopy relationship.

If a TDMF job is cancelled for any reason, then the source or target volumes could be left with an invalid DPTSIO pointer. If your COMMDS is intact, you can fix this by running the original TDMF jobs again, with PARM=RECOVERMASTER or PARM=RECOVERAGENT as appropriate.

Softek recommends that you switch DASD fast write cache off while moving work volumes. Your DASD may not allow this, but if it does, the commands are

//S1 EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
  SETCACHE VOLUME(xxxxxx) CACHEFASTWRITE UNIT(3390) OFF
/*

And to switch cache back on again after the move,

SETCACHE VOLUME(xxxxxx) CACHEFASTWRITE UNIT(3390) ON

Note that while the command just references one volume, cache is actually turned off for the whole subsystem.

If you move a smaller disk to a larger disk then you need to rebuild the VTOCIX so it recognises the extra free space. The best way to do this is to use the TDMF parameter EXTVTOC and let TDMF do it as part of the move. Otherwise, you need to vary the volume offline to all other LPARS, then run the following job

//S1 EXEC PGM=ICKDSF,PARM='NOREPLYU'
//SYSPRINT DD SYSOUT=J
//SYSCX1 DD UNIT=3390,VOL=SER=volser,DISP=SHR
//SYSIN DD *
REFORMAT DDNAME(SYSCX1) VERIFY(volser) REFVTOC
/*
//

TDMF will clip the source volume to a different volser at the end of the move. All this does is change the volume label (CLIP stands for Change Label In Place); the original data is still there, and the original VTOC too. If you are not decommissioning the subsystem, then this may cause confusion later, as it looks like the volume is full of data, so it is good practice to re-initialise the disks once the moves are complete. If you are decommissioning the volumes, then you really should be running something like FDRERASE to wipe the data anyway.
TDMF was recently enhanced to take advantage of the Soft Fence capability in z/OS v1.12 and higher. This marks the old volumes as unavailable for access after swap migration so as to prevent any accidental use by the system.

TDMF testing

If you are trying to put together a complex migration plan and you are struggling to get your head around all the variables, then you can validate the correctness of your TDMF job by specifying
EXEC PGM=TDMFMAIN,PARM=(MASTER,SCAN)
in your master JCL. This is an excellent way of finding out any potential problems with a TDMF move before trying to execute it for real.

The output from the job looks something like

TDM1177I The source volume RGS002 is mounted on device 9198 on this system.
TDM1186I The target volume RRDF42 is mounted on device 8941 on this system.
(other volume messages follow here, including any errors)
TDM2405I This volume successfully selected for initialization.
TDM2281I The Master system is starting the initialization process for a volume.
TDM2283I The Master system is starting the migration process for a volume.
TDM2722I Volume termination requested by "SCAN ONLY".
TDM2293I The Master system is starting the termination process for a volume.
TDM2303I The Master system has completed the migration process for a volume.
TDM2410I All storage frames to migrate this volume have been successfully page freed

The source and target volumes are not affected in any way by the test. The only problem with this is that if you check the status of the run through option '6' on the TDMF panels, all the disks are in 'terminated' mode, as they were terminated by the scan. However, if the job ends CC=0, then that is a good indication of success.
