TDMF Overview

TDMF (Transparent Data Migration Facility) is used to move z/OS mainframe disk volumes without affecting active applications. It is completely independent of any hardware microcode, so it can be used to move disks between different vendors' storage equipment. Provided line speeds are adequate to deal with active I/O rates, it can also be used to move volumes between data centres. TDMF moves volumes; its sister product zDMF moves datasets.
TDMF can also be used for a point-in-time (PIT) migration, where the source and target disks are not swapped once all the data is copied. This could be used to create a system copy. Perpetual PIT is another function, used to keep copies of data for recovery purposes. Once all the data is copied, the target disks are frozen, but TDMF continues to record changes to the source disks. You can then reply to a prompt message to create a new PIT, without needing to copy all the data.

TDMF can run as a started task but I've only ever seen it used in batch. In brief, the move process involves:

  1. Check that all the move conditions are correct to ensure data integrity
  2. Copy all tracks from the source to the target disk, while recording any tracks that have changed at the source
  3. Copy all changed tracks to the target while still recording changed tracks at the source. Repeat this until the number of changed tracks is less than a threshold number
  4. Temporarily quiesce the I/O from all applications on all LPARS, copy remaining changed tracks
  5. Switch volumes so the source goes offline and the target online
  6. Resume the applications
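
The six steps above can be pictured with a small toy model. This is only a sketch of the copy, refresh and quiesce sequence as described here, not TDMF's actual implementation. The function name, the `updates_per_pass` input and the threshold value are all hypothetical.

```python
def simulate_migration(total_tracks, updates_per_pass, threshold):
    """Toy model of the copy / refresh / quiesce sequence.

    updates_per_pass is a hypothetical list of sets: the tracks that
    applications change at the source while each successive pass runs.
    """
    log = []
    passes = iter(updates_per_pass)
    # Step 2: full copy, recording tracks changed while it runs.
    changed = set(next(passes, set()))
    log.append(("copy", total_tracks))
    # Step 3: refresh passes until the backlog is below the threshold.
    while len(changed) >= threshold:
        to_copy = len(changed)
        changed = set(next(passes, set()))
        log.append(("refresh", to_copy))
    # Step 4: quiesce I/O and copy whatever is still outstanding.
    log.append(("quiesce+sync", len(changed)))
    # Steps 5 and 6: swap the volumes, then resume the applications.
    log.append(("swap", 0))
    log.append(("resume", 0))
    return log
```

For example, with 50, then 20, then 3 tracks changed during successive passes and a threshold of 5, the model runs two refresh passes before it quiesces with 3 tracks left to copy.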

TDMF can move most z/OS volumes that are in ECKD format. The only exceptions are volumes containing active local page datasets, swap datasets and active coupling facility datasets. Native Linux volumes and VM volumes cannot be moved with TDMF as they are not in CKD format. TDMF z/OS v5.5 supports migration of 1 terabyte EAV volumes.

Before you start to use TDMF you need to run a job called SYSOPTN which sets up the security keys, defines some TDMF parameters and sets some default options. If your site has used TDMF previously then you may need to run it in update mode to refresh the licence. If you want to know what those default options are, then go into TDMF and take option 9 from the main panel. You will then see a screen that shows you all the installation options. If you press enter on this screen, you will also see the current TDMF version number.

A TDMF session can be defined as the set of batch jobs, Master and Agents, required to perform a TDMF migration, together with an associated COMMDS file. The following actions are needed to run a TDMF session:

  1. Allocate the COMMDS. The COMMDS is a dataset used to communicate between LPARS, and it must be unique for each TDMF session.
  2. Set up the Master Job. The Master system runs on one LPAR and controls the migration sessions.
  3. Set up the Agent Jobs. The Agents run on the other LPARS and record the I/O activity on those LPARS. From TDMF version 5.0, up to 63 agents can be defined.
  4. Run the Master Job
  5. Run the Agent Jobs. These jobs must start no more than 15 minutes after the Master.
  6. Monitor

Setting up a TDMF session

First, make sure the TDMF software is up to date, with all current PTFs applied.

The COMMDS

The COMMDS (sometimes called the SYSCOM file as it is pointed to by a SYSCOM DD statement in batch jobs) is used to pass information between the Master task and the Agent tasks on the other LPARS, and contains the status and messages related to a specific session. You also use the COMMDS in the TDMF TSO monitor to get information about current and past sessions. TDMF uses hardware RESERVES to serialise disk I/O and so the COMMDS must be placed on a quiet volume, preferably on a dedicated volume. Use a dataset name that is excluded from SMS just to ensure the COMMDS is placed on the volume that you want, and make sure it is not on a volume that you are moving with TDMF.

You must have a different COMMDS for each active TDMF session, but it is best to use a separate COMMDS name for each job so you can check previous TDMF runs. A good standard is to incorporate the job name into your file name, as this makes it easy to tie a COMMDS back to a specific job, for example TDMF.jobname.COMMDS.

The communication dataset must be allocated on a cylinder boundary with contiguous space, and it must be on a CKD/E disk. If you are relocating to a new disk subsystem or Data Center then it is best to allocate a TDMF specific, non-SMS volume on the target subsystem.

The syscom dataset itself is formatted and cannot really be browsed directly; you need to view it with TDMF.

The size (number of cylinders required) is based upon the following formula:

CYLS = V * (S + K)

Where:
V is related to the number of volumes in the session:

  • 64 volumes V = 2.5
  • 128 volumes V = 5.0
  • 256 volumes V = 7.5
  • 512 volumes V = 10.0

S is the number of participating systems or LPARS
K is related to the size of the source volumes involved:

  • 3390-3 K = 4
  • 3390-9 K = 6
  • 3390-27 K = 15

For example, suppose you are moving 128 3390-3 and 128 3390-9 volumes across 8 LPARs. That is 256 volumes, so V = 7.5 and S = 8. Set K for the largest device type in the session, here the 3390-9, so K = 6:

CYLS = 7.5 * (8 + 6) = 105
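
If you need to size several COMMDS files, the formula is easy to script. The sketch below is a minimal Python version using only the V and K values listed above. Two things are assumptions of mine rather than anything TDMF states: a volume count that falls between two tiers is rounded up to the next tier, and a fractional result is rounded up to a whole cylinder. The function and table names are made up for illustration.

```python
import math

# (maximum volumes in the session, V) as listed above
V_BY_VOLUMES = [(64, 2.5), (128, 5.0), (256, 7.5), (512, 10.0)]
# K for each source device type as listed above
K_BY_DEVICE = {"3390-3": 4, "3390-9": 6, "3390-27": 15}


def commds_cylinders(volumes, systems, device_types):
    """Return CYLS = V * (S + K), rounded up to a whole cylinder."""
    for limit, v in V_BY_VOLUMES:
        if volumes <= limit:  # assumption: round up to the next tier
            break
    else:
        raise ValueError("no V value listed for more than 512 volumes")
    # K is taken from the largest device type in the session.
    k = max(K_BY_DEVICE[d] for d in device_types)
    return math.ceil(v * (systems + k))


print(commds_cylinders(256, 8, ["3390-3", "3390-9"]))
```

Run against the worked example (256 volumes, 8 LPARs, largest device 3390-9), this gives the same 105 cylinders.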

If you are allocating a lot of COMMDS files for a migration, the best way to do it is to use an in-stream JCL procedure like this.

//ALLOCS PROC
//PROC01 EXEC PGM=IEFBR14
//NEWLIB DD  DISP=(,CATLG,DELETE),
//    DSN=&DSN,UNIT=SYSDA,
//    SPACE=(CYL,(75,1),,CONTIG),
//    DCB=(LRECL=4096,BLKSIZE=4096,
//    RECFM=F,DSORG=PS),
//    STORCLAS=NONSMS,VOL=SER=TDMF01
// PEND
//STEP001 EXEC ALLOCS,DSN='TDMF.JOB00001.COMMDS'
//STEP002 EXEC ALLOCS,DSN='TDMF.JOB00002.COMMDS'
//STEP003 EXEC ALLOCS,DSN='TDMF.JOB00003.COMMDS' etc
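
For a long list of jobs, the EXEC statements can be generated rather than typed. This is a hypothetical helper, not part of TDMF. It assumes the ALLOCS procedure and the TDMF.jobname.COMMDS naming standard shown above, and it checks only two basic z/OS limits: a dataset qualifier is 1 to 8 characters, and a job holds at most 255 steps.

```python
def commds_steps(job_names, hlq="TDMF", proc="ALLOCS"):
    """Build one EXEC statement per job, in the style of the sample above."""
    if len(job_names) > 255:
        raise ValueError("a single job can contain at most 255 steps")
    lines = []
    for n, job in enumerate(job_names, start=1):
        # The job name becomes a dataset qualifier, so it must fit in 8 characters.
        if not 1 <= len(job) <= 8:
            raise ValueError(f"{job!r} is not a 1-8 character qualifier")
        dsn = f"{hlq}.{job}.COMMDS"
        lines.append(f"//STEP{n:03d} EXEC {proc},DSN='{dsn}'")
    return lines


for line in commds_steps(["JOB00001", "JOB00002", "JOB00003"]):
    print(line)
```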

TDMF has one quirk that you should be aware of when using a COMMDS: it bypasses the VTOC and accesses the data by absolute track. Suppose that you are running a lot of TDMF jobs, and you have one disk that you are using to store all your COMMDS files. When you complete a section of the migrations, you may want to wipe that disk and start again, so you delete all the old COMMDS files and allocate a new set. Now, if you go into TDMF and pick up a new COMMDS from a job that has not run yet, say using option 7 - PAST SESSION DISPLAY DETAILS, instead of telling you there is no data to display, it will retrieve the data that still exists on disk from the previous old COMMDS and return it. This can cause a bit of confusion! This is not a bug, it is a feature of TDMF. The only foolproof answer is to always use a new, clean volume if you can.

The Master task

There is only one master task per session or batch job, but a Master task can handle multiple groups of volumes. All active LPARS must be defined to the Master system to prevent data corruption issues. You should place your Master task on the LPAR with the most update activity to the volumes that you are moving. It is possible to run several Master tasks in parallel, but they should all have their own COMMDS and you must run either GRS or MIM to ensure data integrity. The Master Task:

  • Initialises the system environment and the COMMDS
  • Establishes the XCF environment
  • Starts and controls all sessions
  • Monitors Source volume I/O activity
  • Monitors Target volume I/O activity
  • Copies data from Source to Target
  • Processes Source updates identified by the Agents

Sample JCL to run a Master task as a batch job is

//TDMF EXEC PGM=TDMFMAIN,PARM=MASTER
//STEPLIB DD DISP=SHR,DSN=SYS3.TDMF.PROCLIB
//SECCOM DD DISP=SHR,DSN=SYS3.TDMF.PROCLIB
//SYSCOM DD DISP=SHR,DSN=TDMF.JOB00001.COMMDS
//SYSPRINT DD SYSOUT=*
//SYSSNAP DD SYSOUT=*
//SYSIN DD *
SESSION XPADISKS
    MASTER(SA00) AGENTS(SB00 SC00 SD00)
    OPTIONS(
      UNIDENTIFIEDSYSTEMS(TERMINATE)
      CHECKTARGET
      CONCURRENT(05 ACTIVE)
      PACING(NORMAL)
      NOPROMPT
      NOPURGE
      FASTCOPY
      TIME(LOCAL)
      RELABEL(TD)
      )
  MIGRATE XPA20A SPA92A
  MIGRATE XPA20B SPA92B

The two MIGRATE statements above identify two volumes, XPA20A/B, that are moving to the addresses used by volumes SPA92A/B. You would normally want more than two volumes in a job. How many volumes should you have? That is really up to you, but bear in mind that if you make your jobs too small, then running and checking them will be very laborious, while if you make them too big, you have to wait a long time for them to finish, especially if you are having problems. It is not a good idea to cancel a migration. The source volume should not be affected by a cancel, but you might have to re-initialise the target.

As a rule of thumb, it takes about 2-3 minutes to move a 3390-3 and 5-8 minutes to move a 3390-9. I'd suggest a 30 minute job runtime is reasonable, so if you are running 10 migrates in parallel, then 120 mod 3s or 60 mod 9s could be appropriate. The exception is system volumes like SYSRES, PAGE, SPOOL etc, which should always run individually. You can define a maximum of 512 volumes to the Master task, but the total number of volumes that can be defined to both Master and Agents is 2048. So if you have 3 agent tasks, you can move 512 volumes at once.
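
The arithmetic behind those figures can be sketched in a few lines of Python. The function names are hypothetical. The second function rests on my reading of the limits above, namely that every volume counts once for the Master and once for each Agent against the 2048 total. Treat that as an assumption and check the TDMF manual for your release.

```python
def volumes_per_job(runtime_minutes, concurrent, minutes_per_volume):
    """Volumes that fit in one job: moves per stream times parallel streams."""
    return int(runtime_minutes / minutes_per_volume) * concurrent


def max_volumes_per_session(agents):
    """Assumed reading: volumes * (Master + Agents) <= 2048, capped at 512."""
    return min(512, 2048 // (1 + agents))


print(volumes_per_job(30, 10, 2.5))   # mod 3s at about 2.5 minutes each
print(volumes_per_job(30, 10, 5))     # mod 9s at about 5 minutes each
print(max_volumes_per_session(3))
```

With a 30 minute target and 10 concurrent migrates, this reproduces the 120 and 60 volume figures, and 3 agents still leave room for the full 512 volumes.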

The OPTIONS can be used to override the global options set by the SYSOPTN job. These are SAMPLE options and may not be valid for your site. I'd suggest that you check the TDMF manual for a full explanation of these options, select the ones that work for you then TEST THEM on appropriate data and systems.

UNIDENTIFIEDSYSTEMS is used to determine what to do about systems or LPARS that have a route to a volume but have no active Agent. Options are IGNORE, WARN, ERROR and TERMINATE. This is not a foolproof way of identifying all systems as it depends on 3990-6 controller facilities and not all vendors support these.

CHECKTARGET means check that the target volume is empty before proceeding.

NOPROMPT means TDMF will not send out a confirmation message before synchronising source with target.

The RELABEL(TD) option means that when the migrates are complete, the source volumes will be relabelled as TDA92A/B. Alternatively, you can code this explicitly for each volume in the migrate statements as follows:

MIGRATE XPA20A SPA92A TDA92A
MIGRATE XPA20B SPA92B TDA92B

FASTCOPY means that TDMF will just copy used bytes to the target disk. This is appropriate for copying to new disks, but may not be a good idea if you are copying over existing data.

PACING means that TDMF will initially move 15 tracks at a time, but will reduce this if it finds that the disk is busy.

NOPURGE means do not delete any existing data on the target that was not overwritten by the source data.

The Agent Tasks

There must be an agent on every LPAR except the Master LPAR. The Agents:

  • Communicate with the Master for migration requests
  • Monitor Source I/O activity on their LPAR
  • Monitor Target I/O activity on their LPAR
  • Notify the Master about any Source I/O updates

It is possible to run more than one Agent job in an LPAR, as long as each Agent is associated with a different Master task and communicates with that task through a separate and unique COMMDS. All the Agent tasks on every LPAR must be started within 15 minutes of the Master task starting or the session will time out. However, an Agent can be started before the Master.
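
If you collect job start times from your scheduler or from the job logs, the 15 minute rule is simple to check. The sketch below is a hypothetical helper with made-up system names and times. It flags only Agents that started more than 15 minutes after the Master. Treating any Agent that started early as acceptable is my assumption.

```python
from datetime import datetime, timedelta

START_WINDOW = timedelta(minutes=15)


def late_agents(master_start, agent_starts):
    """Return the names of Agents that started too long after the Master."""
    return sorted(
        name
        for name, started in agent_starts.items()
        # Agents that started before the Master give a negative difference
        # and are treated as on time (assumption).
        if started - master_start > START_WINDOW
    )


master = datetime(2009, 2, 28, 1, 0)
agents = {
    "SB00": datetime(2009, 2, 28, 0, 55),   # started before the Master
    "SC00": datetime(2009, 2, 28, 1, 10),
    "SD00": datetime(2009, 2, 28, 1, 16),   # outside the window
}
print(late_agents(master, agents))
```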

Sample JCL for an Agent task is

//STEP1 EXEC PGM=TDMFMAIN,PARM=AGENT
//STEPLIB    DD DISP=SHR,DSN=SYS3.TDMF.PROCLIB
//SECCOM     DD DISP=SHR,DSN=SYS3.TDMF.PROCLIB
//SYSCOM     DD DISP=SHR,DSN=TDMF.JOB00001.COMMDS
//SYSPRINT   DD SYSOUT=*
//SYSUDUMP   DD SYSOUT=*
//SYSSNAP    DD SYSOUT=*
//SYSIN      DD DUMMY
//

If all your participating LPARS are in a single SYSPLEX then you can easily set up the Master and all the Agents in one PDS member, using
/*JOBPARM S=system
statements to make sure each job runs on the correct system. You then just type 'SUB' once to run all the jobs.

Once you submit all your jobs, you can check progress from option 1 of the TDMF ISPF monitor. You need to point it to the correct COMMDS, then you will get a list of all volumes being moved with a percent complete action bar for each one.

The 7 phases of TDMF

The Master system initiates and controls all migrations/replications. The Master initiates each phase and all Agents must acknowledge this to proceed.

SYSTEM INITIALISATION phase

System initialisation involves the Master task and all the Agent tasks starting up within 15 minutes, and reporting error-free validation for all volumes within a session. Checking includes making sure no other LPARS are accessing those volumes, and if the TDMF session has been set up to use the System Authorisation Facility (SAF), then the volumes must have the correct SAF authorisation. For Migration, you need ALTER authority for both Source and Target volumes. If you use SAF you also need to set VOLUME SECURITY = YES in the SYSOPTN batch job as a default TDMF value.

INITIALISATION phase

This phase confirms that the source and target volumes are valid and if requested, waits for the Operator to reply to the Confirm WTOR. Once this is confirmed, the volume-level control blocks and real storage frames are allocated. It also confirms that all source and target volumes are online to all systems.

ACTIVATION phase

This phase starts the copy task and enables the monitoring of user I/O activity. While the data is copied from source to target, if updates to a source volume are detected in any participating LPAR, the Master system gathers that information for the REFRESH task. All the volumes are also allocated to prevent them from being taken offline.
Once all the tracks are copied by the COPY volume task, the Master then starts the copy REFRESH task. Further updates may happen, so the Master will run multiple refresh tasks until TDMF determines that the target volume can be synchronised, at which point the Master system will move on to the Quiesce phase. Copy and Refresh are sometimes recorded as separate phases.

QUIESCE phase

The Master system instructs all Agents to stop all I/O activity to the source volume and pass it a final list of all updated tracks. The Master then performs a copy synchronous task to make the target disk a replica of the source.

VOLUME I/O REDIRECT phase

All I/O is now permanently redirected to the target volume, which is effectively now the source volume. Once the redirect request is successful, the Master rewrites volume labels on both source and target.

RESUME phase

The Master initiates a resume request via the Agents, to resume all I/O activity, now directed to the Target volume, and the original Source is varied offline.

TERMINATE phase

When a volume completes a migration, that volume's fixed storage will be freed for possible re-use within the current session.

Hints and tips

The key to a successful TDMF migration is careful planning up front, which applies to most projects of course. There are a few datasets and volumes that need special treatment and some of them are discussed here, but consult your TDMF manual to get a full picture.

Be aware of the impact that TDMF might have on your disk channels, especially if your system is busy. Monitor a small test run and try to keep channel utilisation below 75%. When you set your jobs up, spread the workload so the jobs run over several channels. If you are migrating off old ESCON channels then a good rule of thumb is no more than 2 jobs per channel.

Unidentified LPARS

Many large sites have more than one SYSPLEX, or they have LPARS that are not part of the SYSPLEX. It is probable that some volumes will be shared and online between SYSPLEXES or rogue LPARS. Typically these will be IODF volumes, tape management system volumes or volumes used by Sysprogs for various nefarious purposes.
This means that you have a good chance of a TDMF job failing with an unidentified system message. If you are migrating an entire string of disks, and that string has one volume online to an unidentified system, then the entire string will fail with an error message like

TDM2381I This source volume connected to 1 unidentified system(s).
TDM2382I 8000029880 2094 02/28/2009 01:18:46.

The answer is to find the rogue LPAR, look at the string and check out which volumes are online. You will have already identified these in your detailed planning of course. You can then safely change the UNIDENTIFIEDSYSTEMS(TERMINATE) parameter in the MASTER job to UNIDENTIFIEDSYSTEMS(IGNORE) and rerun the job for those volumes that are NOT online to the rogue LPAR. For the volumes that are online, you need to allocate a COMMDS that can be accessed from every LPAR, then rerun the jobs for those disks with an agent on every LPAR including the rogue one. Alternatively, you can move the data without using TDMF.
The LPAR can be identified from the TDM2382I message. In the first field of the message, the last four characters "9880" are the CPU number, and the fifth from last, "2", is the LPAR number.
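
When a long job log contains many of these messages, the identifier can be decoded with a short script. This sketch assumes the message has exactly the layout of the sample above, with the system identifier as the first field after the message number. The function name is hypothetical.

```python
def decode_tdm2382i(message):
    """Pick the CPU number and LPAR number out of a TDM2382I message."""
    fields = message.split()
    if len(fields) < 2 or fields[0] != "TDM2382I":
        raise ValueError("not a TDM2382I message")
    system_id = fields[1]
    if len(system_id) < 5:
        raise ValueError("system identifier is too short to decode")
    # Last four characters are the CPU number, the one before is the LPAR.
    return {"cpu": system_id[-4:], "lpar": system_id[-5]}


print(decode_tdm2382i("TDM2382I 8000029880 2094 02/28/2009 01:18:46."))
```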
TDMF z/OS now utilizes QHA and SPID fence to help ensure that the TDMF agent is running on all systems which share a device to be migrated when a migration begins. SPID prevents DASD that is intentionally offline from being brought online. If a volume is in a SPID fence status, then z/OS systems cannot bring it online. This means that if you have z/OS LPARS that were not online to the disks, and so do not have agent tasks, then they cannot bring the disks online in the middle of the migration and corrupt the data. You can use DEVSERV to see if volumes are in SPID fenced status and ICKDSF to clear a SPID fence. The 'SPF' value in the Extended Function Checking field below means this volume is SPID fenced.

DS QD,1234
IEE495I 09.22.18 DEVSERV QDASD 136
UNIT VOLSER SCUTYPE DEVTYPE CYL   SSID SCU-SERIAL DEV-SERIAL EFC
1234 ABC123 2107961 2107900 30051 0078 0175-W3107 0175-W3107 SPF

//STEP EXEC PGM=ICKDSF,PARM='NOREPLYU'
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
CONTROL -
UNIT(1234) -
CLEARFENCE -
SERIAL(ABC123) -
SCOPE(DEV)

Special Datasets and Volumes

Be aware of where your TDMF load library resides and take care not to try to move it with TDMF. When you need to move that volume, move the load library to another disk and temporarily APF authorise it.

Watch out for various control datasets, like DFHSM BCDS / MCDS / OCDS, HSC, RACF and MIM control files. TDMF can move them, but it is recommended that you move them one at a time. For absolute safety, it is recommended that DFHSM and DFRMM be stopped when moving their control datasets.

Several CA products have control files that need special handling. For example, if you migrate the CA7 Commsds then you should shut down both CA7 and ICOM. See the manual for full details.

If your work volumes are SMS managed and you have sufficient capacity, it is also a good idea to QUINEW the volumes in SMS before the move, to prevent new allocations and limit the amount of active I/O to the disks:

V SMS,VOL(xxxxxx,ALL),Q,N

When you are finished, enable them again with

V SMS,VOL(xxxxxx,ALL),E

Note: These SMS commands act over all LPARS and while that is the correct action to disable or quiesce volumes, that might not be the correct enabled configuration for your site.

When you move SYSRES volumes, the unit address will change, and it is the unit address, not the VOLSER, that is used to reference a SYSRES volume at IPL time. So when you move a SYSRES volume, you need to let the Operators, the System programmers and anyone else who might be interested know about the new unit address. It is good practice to move the SYSRES and alternate SYSRES volumes in separate sessions. When you move a SYSRES volume your TDMF job will end CC=4. This is normal; it is just warning you that it's a SYSRES volume.

JES Spool volumes are usually busy, so move them in a quiet period, one at a time, and set them to Drain to prevent new access. Remember to change the JES CHKPT addresses in SYS1.PARMLIB(COMMND00).

You cannot move active local page datasets with TDMF; if you try, your job will fail CC=12 (yes, been there!). It's best to move these volumes one at a time with sysprog assistance, and get the sysprogs to drain the volumes before you try to move them. You can move volumes that contain PLPA and Common Page datasets, but they should be run with 1 volume per session.

The best way to handle SYSPLEX coupling datasets is to just switch to the alternates, then move the originals when they are not in use. If you use GDPS then you must switch the datasets with a GDPS script or you may cause a system outage.

How to make sure no-one else uses your Targets

The best way to set your targets up is to initialise them as SMS in your ICKDSF job by using the SG parameter, but do not add them to any SMS storage pool. This means that a non-SMS user cannot allocate data on them as they are SMS defined, but they cannot be used by SMS as they are not in a pool. When TDMF migrates a source onto a target it copies the VTOC and the VOLSER from the source, so the target then becomes usable. If you really want your target volumes in an SMS pool, then they must be set to DISNEW. Use the command

V SMS,VOL(xxxxxx,ALL),D,N

Modern disk subsystems have the ability to isolate disks by using a process called 'soft fencing'. TDMF will use soft fencing by default if it is available. Once a migration is complete and processing swaps to the target volume, TDMF will put the original source volume into soft fence status, so ensuring that no I/O can go to the original disk. If you want to know if a volume is in soft fenced status, use the DEVSERV command. In the example below, the value SOF in the EFC field means that this volume is in soft fenced status.

DS QD,1234
IEE495I 09.22.18 DEVSERV QDASD 136
UNIT VOLSER SCUTYPE DEVTYPE CYL   SSID SCU-SERIAL DEV-SERIAL EFC
1234 ABC123 2107961 2107900 30051 0078 0175-W3107 0175-W3107 SOF
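
If you capture DEVSERV output for a whole string of devices, a short script can pick out the fenced volumes. The sketch below is not an IBM or TDMF tool. It assumes the output has the same whitespace-separated layout as the sample above, with EFC as the last column and left blank when nothing is set. It recognises only the SOF value and the SPF value used for a SPID fence.

```python
import string

FENCE_CODES = {"SOF": "soft fenced", "SPF": "SPID fenced"}


def fence_status(devserv_output):
    """Map each VOLSER to its fence status, or None if no fence code is shown."""
    header = None
    result = {}
    for line in devserv_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "UNIT":
            header = fields  # column names, ending in EFC
            continue
        # Data rows start with a hexadecimal device number.
        is_device = all(c in string.hexdigits for c in fields[0])
        if header is None or len(fields) < 8 or not is_device:
            continue
        row = dict(zip(header, fields))  # EFC is absent when the column is blank
        result[row["VOLSER"]] = FENCE_CODES.get(row.get("EFC"))
    return result


sample = """DS QD,1234
IEE495I 09.22.18 DEVSERV QDASD 136
UNIT VOLSER SCUTYPE DEVTYPE CYL   SSID SCU-SERIAL DEV-SERIAL EFC
1234 ABC123 2107961 2107900 30051 0078 0175-W3107 0175-W3107 SOF
"""
print(fence_status(sample))
```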

Hardware considerations

If you are a GDPS user then set Hyperswap to OFF while doing any TDMF moves. TDMF version 5 onwards will automatically issue HYPERSW OFF and HYPERSW ON commands for you to minimise hyperswap downtime.

The volumes that you are moving cannot be in an active Flashcopy relationship.

If a TDMF job is cancelled for any reason then the source or target volumes could have an invalid DPTSIO pointer. If your COMMDS is intact, you can fix this by running the original TDMF jobs again, with PARM=RECOVERMASTER or RECOVERAGENT as appropriate.

Softek recommends that you switch DASD fast write cache off while moving work volumes. Your DASD may not allow this, but if it does, the commands are

//S1 EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
SETCACHE VOLUME(xxxxxx) CACHEFASTWRITE UNIT(3390) OFF

And to switch cache back on again after the move,

SETCACHE VOLUME(xxxxxx) CACHEFASTWRITE UNIT(3390) ON

Note that while the command just references one volume, cache is actually turned off for the whole subsystem.

If you move a smaller disk to a larger disk then you need to rebuild the VTOCIX so it recognises the extra free space. The best way to do this is to use the TDMF parameter EXTVTOC and let TDMF do it as part of the move. Otherwise, you need to vary the volume offline to all other LPARS, then run the following job

//S1 EXEC PGM=ICKDSF,PARM='NOREPLYU'
//SYSPRINT DD SYSOUT=J
//SYSCX1 DD UNIT=3390,VOL=SER=volser,DISP=SHR
//SYSIN DD *
REFORMAT DDNAME(SYSCX1) VERIFY(volser) REFVTOC
/*
//

TDMF will clip the source volume to a different volser at the end of the move. All this does is change the volume label (Change Label In Place); the original data is still there, and the original VTOC too. If you are not decommissioning the subsystem, then this may cause confusion later as it looks like the volume is full of data, so it is good practice to re-initialise the disks once the moves are complete. If you are decommissioning the volumes, then you really should be running something like FDRERASE to wipe the data anyway.
TDMF was recently enhanced to take advantage of the Soft Fence capability in z/OS v1.12 and higher. This marks the old volumes as unavailable for access after swap migration so as to prevent any accidental use by the system.

TDMF and disk mirroring

Disk volumes are often mirrored for disaster recovery purposes, and different hardware vendors use different mirroring products, like SRDF, TruCopy and PPRC. If you are migrating to a different type of disk, the source and target disks could be using different types of remote mirroring and this could affect, or even destroy, your DR position. If the disaster recovery or mirroring type is not compatible between any source and target volumes then TDMF will fail the whole session, unless you specify the ALLOWmirrorchange option. You can use ALLOWmirrorchange(NOACK), in which case TDMF ignores any incompatibility issues for this session. The alternative is ALLOWmirrorchange(ACK), in which case, if TDMF detects an incompatibility, it will wait for an acknowledgment that you must send through the TDMF monitor, a TDMF batch monitor, or the system console. The default is NOALLOWmirrorchange.

TDMF testing

If you are trying to put together a complex migration plan and you are struggling to get your head around all the variables, then you can validate the correctness of your TDMF job by specifying
EXEC PGM=TDMFMAIN,PARM=(MASTER,SCAN)
in your master JCL. This is an excellent way of finding out any potential problems with a TDMF move before trying to execute it for real.

The output from the job looks something like

TDM1177I The source volume RGS002 is mounted on device 9198 on this system.
TDM1186I The target volume RRDF42 is mounted on device 8941 on this system.
other volumes lines, including any errors
TDM2405I This volume successfully selected for initialization.
TDM2281I The Master system is starting the initialization process for a volume.
TDM2283I The Master system is starting the migration process for a volume.
TDM2722I Volume termination requested by "SCAN ONLY".
TDM2293I The Master system is starting the termination process for a volume.
TDM2303I The Master system has completed the migration process for a volume.
TDM2410I All storage frames to migrate this volume have been successfully page freed

The source and target volumes are not affected in any way by the test. The only problem with this is that if you check the status of the run through option '6' on the TDMF panels, all the disks are in 'terminated' mode, as they were terminated by the scan. However, if the job ends CC=0 then that is a good indication of success.
