SRDF

SRDF mirroring modes

SRDF has 4 modes. The choice basically depends on whether you want the best possible performance, or to be absolutely sure that your data is consistent between sites.

Synchronous (SRDF/S)

In this mode, a copy of the data must be stored in cache in both the local and remote machines before the calling application is signalled that the I/O is complete. This means that data consistency between sites is guaranteed. If the remote Symmetrix is more than 15 km away, then this can significantly degrade performance.
When SRDF mirroring is running in SYNC mode, it is also possible to switch on the 'domino effect'. If you then get a problem with a disk or the SRDF links so that mirroring cannot proceed, the Symm places the other disk into 'not ready' mode, so it cannot be accessed by the host until the problem is fixed.
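
As a quick sketch, assuming a device group called r1_devg_001 like the ones defined further down this page, you would put that group into synchronous mode with the domino option switched on like this:

symrdf -g r1_devg_001 set mode sync
symrdf -g r1_devg_001 set domino on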

Semi-synchronous (SRDF/A)

The data on a secondary logical volume can be one write I/O behind the primary, which may sound almost as good as Synchronous, but Semi-synch will not give you I/O consistency across volumes. The local Symmetrix will return Channel End/Device End once a write I/O is safely in the local cache, and then it sets the logical volume to busy status, so it will not accept any more writes. SRDF then passes the write I/O to the remote Symmetrix, and once it is safely stored in cache there, the busy flag is removed from the logical volume.
The advantage of Semi-synch is that the application does not have to wait for the remote I/O to complete, so performance does not suffer.
The disadvantage is that in a disaster there is no guarantee that all the I/Os an application thinks it completed actually made it to the remote site. There could be several write I/Os queued up in the local controller (one for each logical disk), and these are processed in FIFO order. If an application is sending I/Os to more than one controller, there is no FIFO synchronisation between controllers, so the remote data could be inconsistent.
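
Again as a sketch, using the same assumed device group name, you would select semi-synchronous mode with:

symrdf -g r1_devg_001 set mode semi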

Adaptive copy - write pending (SRDF/AR)

Data is written asynchronously to the secondary device and can be up to 65,535 I/Os behind the primary. Data that has not yet been copied is referred to as 'dirty tracks', and the number of dirty tracks permitted is set by a 'skew value' parameter. If the skew value is exceeded, the mode switches to Synchronous or Semi-synchronous until the remote Symm catches up, at which point it switches back to adaptive copy - write pending mode. Adaptive copy is useful where sites are too far apart for synchronous operation, and some data loss is acceptable.
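
As an illustration, assuming the underscore keyword acp_wp that the Solutions Enabler CLI uses for this mode, you would switch our example device group over with a skew value of 1000 like this:

symrdf -g r1_devg_001 set mode acp_wp skew 1000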

Adaptive Copy - Disk (SRDF/DM)

This mode is intended for electronically moving data between sites. There is no I/O consistency across volumes; data is simply moved without any acknowledgment.

SRDF STAR

Some sites have a requirement for 'belt and braces' disaster recovery, where they need three operational sites, so that if one site is lost they can still run with mirroring protection between the remaining two, and if a metropolitan disaster takes out two sites, there is still a third, remote site. SRDF STAR is intended to provide that level of resilience.

Let us call the production site SiteA, the 'local' secondary site SiteB, and the remote third site SiteC. By local we mean SiteA and SiteB must be within 200 km of each other, while SiteC can be an unlimited distance away.
SiteB is then a synchronous mirror of SiteA, while SiteC asynchronously mirrors the production data, either directly from SiteA or from SiteB, depending on the configuration. SRDF STAR comes in two flavours:

CONCURRENT SRDF/STAR: the data is synchronously mirrored between SiteA and SiteB, and asynchronously mirrored between SiteA and SiteC. Links exist between SiteB and SiteC, so that if SiteA is lost, differential mirroring can be established between SiteB and SiteC, and so a DR position can be maintained. Production systems will have to be switched to SiteB, so there will be some impact on services.

CASCADED SRDF/STAR: the data is synchronously mirrored between SiteA and SiteB, and asynchronously mirrored between SiteB and SiteC. Links exist between SiteA and SiteC, so that if SiteB is lost, differential mirroring can be established between SiteA and SiteC, and so a DR position can be maintained. As SiteA is not affected, there should be no service impact.
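
SRDF/STAR environments are managed with their own symstar command set, rather than the plain symrdf commands described below. As a rough sketch, assuming a STAR composite group that we will call star_cg, you would check the state of the three sites with something like:

symstar -cg star_cg query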

SRDF Volume terminology and Device Groups

R1, Source volume, the production volume that is accessed by the user, equivalent to a PPRC primary volume

R2, Target volume, the mirrored copy of a source volume, equivalent to a PPRC secondary volume

Local volume, simply a non-mirrored volume. (EMC often use the term 'mirroring' to describe RAID1 protection, which can cause confusion, as a local volume can be RAID1 mirrored. In this context, mirroring means remote mirroring between symms.)

SRDF volumes must be formed into device groups; a device group is just a set of volumes that need to all be handled the same way. There are three different types of device group, corresponding to the three types of device above. The RDF group types are RDF1 and RDF2, and normal disk groups are type REGULAR. You define device groups with the commands below, where r1_devg_001 and r2_devg_001 are just names; you can call yours whatever you like. If you do not specify a -type parameter then the group type defaults to REGULAR.

symdg create r1_devg_001 -type RDF1
symdg create r2_devg_001 -type RDF2

You then add devices to the correct device groups. The devices must be of the correct type (local, R1 or R2) and must all be in the same Symm. The devices themselves can be standard, RAID or BCV, as long as they match the group type.

symdg list
symld -g r1_devg_001 add dev 01c
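
To check what ended up where, you can display a device group and list the devices in it:

symdg show r1_devg_001
symld -g r1_devg_001 list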

Volumes can be in three possible states.

  • Not ready (NR) - can't be accessed by the host at all
  • Write Disabled (RO) - can be accessed by the host for read only
  • Write enabled (RW) - can be accessed by the host for read and write

The actual status of a volume depends on its SRDF state and its channel interface (CI) state. A source volume has six different possible combinations of states, and a target volume has nine.

The desirable state for a source volume is SRDF state=RW and CI state=RW, so volume state=RW. If a primary volume's CI state is RW but the SRDF state is NR, then it may be possible to access the data from the target volume, if it is in the correct state.

The desirable state for a target volume is SRDF state=RO and CI state=RW, so volume state=RO.
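
The easiest way to see these states for all the volumes in a device group is to run a query against the group, for example:

symrdf -g r1_devg_001 query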

SRDF and Consistency Groups

An SRDF group is basically a set of SRDF director ports in a local Symm that are configured to connect to another set of SRDF director ports in a remote Symm. SRDF groups can be static or dynamic. Static group definitions are held in the bin file and are usually maintained by EMC staff. Dynamic RDF groups are maintained using CLI commands. They are often called RA groups or RDF groups, and the three terms mean more or less the same thing. The command to create a new SRDF group looks like this, but many of the parameter values will depend on your site. Note that the -rdfg and -remote_rdfg parameters take an RDF group number, not a device group name.

symrdf addgrp -label your_name -sid 1234 -rdfg 10 -dir 4C -remote_sid 4567 -remote_rdfg 10 -remote_dir 2C
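
You can then check that the new group exists by listing the RDF groups on the box; assuming a reasonably current version of Solutions Enabler, the command looks like this:

symcfg -sid 1234 list -rdfg all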

To add devices to your SRDF group, create a file that contains devices in pairs, where the first device is the R1 and the second device the corresponding R2, like the example below. It's always best if you can arrange devices so there is a straight correspondence between R1 and R2. Then you run the createpair command below to pick them up, using the RDF group number (10 in our example above) rather than a group name. My text file is called rdf_list.txt

  0220    0320
  0221    0321
  0222    0322
  0223    0323

  symrdf createpair -sid 1234 -rdfg 10 -type RDF1 -file rdf_list.txt -g group_name -establish
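
Once the initial copy has completed, you can check that all the pairs have reached the Synchronized state with the query and verify commands, for example:

symrdf -g group_name query
symrdf -g group_name verify -synchronized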

Once your devices are in a device group, you can use composite SRDF commands to control all the disks in that group. For example

symrdf -g group-name failover

You can use this command to fail an entire consistency group over to the DR site. It will Write Disable the source volumes, set the link to Not Ready, and Write Enable the target volumes.

To fail back, that is to restore service to your primary site, use the command

symrdf -g group-name failback

This will write disable the target (remote) disks, suspend the RDF link, merge changed disk tracks, resume the link, then write enable the source disks.

While failback is in progress, you do not have a remote DR position. You can speed the failback operation up by copying invalid tracks before write disabling any disks with the command

symrdf -g group-name update

If you want to split the SRDF managed disks, that is stop mirroring and allow the disks at both sites to be updated independently, then you need the split command. This suspends the RDF link and write-enables the target disks.

symrdf -g group-name split

And once you do this, you will probably want to go back to an SRDF mirrored state again, so you need the establish command

symrdf -g group-name -full establish

This write-disables the target disks, suspends the RDF link, copies the data from source to target, then resumes the RDF link.

The restore command does this the other way around. It will copy the data from the target disk back to the source. The command is

symrdf -g group-name -full restore

This write-disables both source and target disks, suspends the RDF link, merges the track tables, resumes the RDF link, then write-enables the R1.

Other useful commands, which should be self-explanatory, are:

symrdf -g group-name suspend
symrdf -g group-name resume
symrdf -g group-name set mode sync
symrdf -g group-name set domino on
symrdf -g group-name set mode acp_disk skew 1000

A Consistency Group is a collection of volumes, in one or more Symmetrix boxes, that need to be kept in a consistent state. If a write to a Symmetrix cannot be propagated to the remote site, the Symmetrix will hold the I/O for a fixed period of time. At the same time it presents a SIM (Service Information Message) back to the host. The ConGroup started task will detect the SIM and issue the equivalent of a PPRC FREEZE to all the other Symmetrix boxes online to that host. All volumes in that consistency group will then be suspended. Once they are all suspended, the equivalent of a PPRC RUN is issued and I/O can complete, including the first I/O that triggered the SIM.
Consistency Group processing with SRDF does not lose data because it employs a FREEZE/RUN approach similar to PPRC FREEZE/RUN.

To create a consistency group, add devices to it, and enable it, you use the following commands:

symcg create r1_cg001 -type RDF1
symcg -cg r1_cg001 -sid 1234 add dev 0220
symcg -cg r1_cg001 -sid 0011 add dev 001C
symcg -cg r1_cg001 enable
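
You can then check the consistency group definition and its contents with:

symcg list
symcg show r1_cg001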

SRDF data replication software from EMC arguably has better functionality than PPRC, but it used to have one major failing when used on an IBM mainframe: its command set was totally different. Why does that matter? Well, SRDF commands only work on EMC disks. Other vendors such as HDS took the IBM PPRC command set and interpreted it to run their own replication software, so the underlying code is different, but the command set is the same. This meant that you could run a disk farm of IBM and HDS disks and control all the mirroring using one set of commands. EMC did have a half-way solution; you could run a mainframe started task that intercepted the PPRC commands and converted them to SRDF commands before passing them down the channel. This was far from ideal, and prone to error. However, EMC have now joined the fold; the Symmetrix will now accept native PPRC commands and convert them into SRDF commands in the microcode.
