GDPS Overview

Mirroring data to a remote site is a good starting point for coping with a disaster, but much more is needed than simple mirroring. How do you detect that problems are developing at your primary site? How do you suspend mirroring if problems occur, so that your remote site data is in a consistent state? What about switching applications and services? To some extent, this can be managed with 'stretched' clusters, for example a Windows server cluster with some of the servers in a remote site, but this does not cater for managing services that run on different platforms.

GDPS (Geographically Dispersed Parallel Sysplex) is about Business Resilience: the ability of IT systems to recover quickly from a disaster, or to switch services quickly between datacenters to reduce the downtime incurred by planned outages. GDPS should provide almost continuous application availability. The Recovery Time Objective (RTO), the time it takes to recover services, is very short for GDPS, say between 1 and 2 hours. The Recovery Point Objective (RPO), the amount of data lost expressed as time, ranges between zero and a few minutes, depending on the GDPS configuration used. GDPS simplifies the management of data mirroring, as you can process whole groups of disks or systems with a single command.

GDPS was originally developed for IBM mainframes using PPRC or Metro Mirror between two sites. However, mainframe data is a relatively small part of the picture now compared to 10 years ago, and applications run on different platforms. There is also a requirement for 3-site solutions, where two datacenters run at Metropolitan distances, but a third, very remote site exists for extreme DR. GDPS now comes in the following flavours:

  • GDPS/PPRC; Metropolitan distances, about 30km. Zero RPO, about 1 hour RTO, Mainframe plus limited Open Systems support
  • GDPS/XRC; Unlimited distance between 2 sites. 1-2 minutes RPO, about 1 hour RTO, Mainframe support, plus limited Open Systems if hosted on CKD disks
  • GDPS/Global Mirror; Unlimited distance between 2 sites. 1-2 minutes RPO, about 1 hour RTO, Mainframe plus limited Open Systems support
  • GDPS/Metro + Global Mirror; 3-site solution, Metropolitan distance between 2 sites with unlimited distance to third site. Zero RPO and about 1 hour RTO for 'local' recovery, 1-2 minutes RPO and about 1 hour RTO for remote site. Mainframe plus limited Open Systems support
  • GDPS/Metro + z/OS Global Mirror; 3-site solution, Metropolitan distance between 2 sites with unlimited distance to third site. Zero RPO and about 1 hour RTO for 'local' recovery, 1-2 minutes RPO and about 1 hour RTO for remote site. Mainframe plus limited Open Systems support if hosted on CKD disks
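The trade-offs in the list above can be captured in a small lookup table. The Python sketch below is illustrative only: the `Flavour` class, its field names and the `candidates` helper are invented for this example and are not part of any GDPS interface. The values are transcribed from the list.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Flavour:
    """One GDPS flavour, described only by the attributes listed above."""
    name: str
    sites: int
    unlimited_distance: bool  # some site may be at unlimited distance
    zero_rpo: bool            # zero RPO is available (3-site: 'local' recovery only)
    open_needs_ckd: bool      # Open Systems data must be hosted on CKD disks


# Hypothetical table; values transcribed from the list of flavours.
FLAVOURS = [
    Flavour("GDPS/PPRC", 2, False, True, False),
    Flavour("GDPS/XRC", 2, True, False, True),
    Flavour("GDPS/Global Mirror", 2, True, False, False),
    Flavour("GDPS/Metro + Global Mirror", 3, True, True, False),
    Flavour("GDPS/Metro + z/OS Global Mirror", 3, True, True, True),
]


def candidates(need_zero_rpo: bool, need_unlimited_distance: bool) -> List[str]:
    """Return the names of flavours that satisfy both requirements."""
    return [
        f.name
        for f in FLAVOURS
        if (f.zero_rpo or not need_zero_rpo)
        and (f.unlimited_distance or not need_unlimited_distance)
    ]


print(candidates(need_zero_rpo=True, need_unlimited_distance=True))
```

Asking for both zero RPO and unlimited distance leaves only the two 3-site flavours, and even there the zero RPO applies to the 'local' recovery only.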

GDPS supports non-IBM storage devices, as long as they are compatible with PPRC.

GDPS Components

GDPS consists of a number of components: the base component, plus RCMF and PSMF.

RCMF (Remote Copy Management Facility) was originally ISPF panel driven, but now also has a GUI, which makes it easier to use. RCMF simplifies the management of PPRC disks, as we can manage the whole configuration with single commands, instead of working with one volume at a time.

PSMF (Parallel Sysplex Management Facility) is a panel driven system that allows you to swap between parallel sysplex configurations.

GDPS in Operation

GDPS makes it easy to control site management. It goes way beyond what a Storage person would normally do, and it automates complete site switching using simple panel options. Two main scenarios are offered -

A planned outage, where GDPS will

  1. Perform a controlled shutdown of the applications on the production site.
  2. Freeze a consistent set of updates to all the subsystems in the recovery site.
  3. Remove the systems from the Parallel Sysplex cluster.
  4. If the secondary disks are to be used, swap the mirroring, so the secondaries become the primaries, and the primaries become the secondaries.
  5. If peer to peer virtual tape systems (PtPVTS) are under GDPS control, then they will be swapped, so the secondary PtPVTS becomes the primary.

The recovery site can then be "IPLed," and the network switched to make the applications available from the recovery site.
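The planned outage sequence can be sketched as an ordered list of actions. This is a minimal illustration in Python, not GDPS code: the function name, its two parameters and the step wording are assumptions made for this example, and real GDPS drives these actions from its panels and automation.

```python
from typing import List


def planned_outage_steps(use_secondary_disks: bool, ptpvts_managed: bool) -> List[str]:
    """Return the ordered actions for a planned site switch (illustrative only)."""
    steps = [
        "shut down applications on the production site",
        "freeze a consistent set of updates at the recovery site",
        "remove the systems from the Parallel Sysplex",
    ]
    if use_secondary_disks:
        # Step 4 only applies when the secondary disks are to be used.
        steps.append("swap mirroring: secondaries become primaries")
    if ptpvts_managed:
        # Step 5 only applies when PtPVTS is under GDPS control.
        steps.append("swap PtPVTS: secondary becomes primary")
    # Follow-up actions once the steps above have completed.
    steps.append("IPL the recovery site")
    steps.append("switch the network to the recovery site")
    return steps


for number, step in enumerate(planned_outage_steps(True, False), start=1):
    print(number, step)
```

The two flags make the conditional steps explicit: the mirroring swap and the PtPVTS swap appear only when they apply, while the shutdown, freeze and sysplex removal always run first and in that order.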

GDPS principles are illustrated in the GIF below

GDPS and Freeze

If your production site goes belly up, GDPS will freeze the remote copy to maintain data integrity. It is crucial that the data used after a disaster is consistent. If parts of databases are out of step with each other, then they will have to be recovered from backups and logs. This takes a long time. GDPS will freeze all data on all storage systems at the same point, so a GDPS recovery should typically take 30-60 minutes.

The disks will be swapped to the subsystems in the secondary site, and, from this point, the recovery will continue as described in the planned reconfiguration.

A freeze command will be issued if GDPS detects any hardware errors in the system. Exactly what happens next will depend on what you ask GDPS to do. You can ask for -

  • FREEZE AND GO, freeze the secondary volumes, but allow processing to continue on the primary volumes.
    This prevents errors at the recovery site from affecting production service, but it also means that in a real disaster you might lose a few transactions if work can continue at the primary site after the freeze.
  • FREEZE AND STOP, just that.
    All processing on both Primary and Secondary disks will stop if any error is detected. This sounds ideal, as it means no data loss. It also means your production site will get locked out every time a problem is detected. One disk has an error, all processing stops! Some people will want the guarantee of no data loss, and will be willing to take the hit on site freezing.
  • FREEZE AND STOP CONDITIONAL
    This one sounds a bit better. GDPS will try to work out if a real disaster is happening or not, then decide whether to use FREEZE AND STOP or FREEZE AND GO. In other words, system availability and data integrity are both only as good as your automation.
  • FREEZE AND SWAP,STOP - HyperSwap to second site and stop processing.
  • FREEZE AND SWAP,GO - HyperSwap to second site and continue processing.
    A HyperSwap always invokes a freeze to keep the data consistent. The controlling system will attempt to HyperSwap to the secondary devices, and if it is successful, it will then swap all other systems. Any systems which cannot swap correctly will be put in system reset status so they do not continue accessing the primary disks.
    There is no HyperSwap equivalent for FREEZE AND STOP CONDITIONAL.
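The five freeze policies differ in only two respects: whether a HyperSwap is attempted, and whether processing continues afterwards. The Python sketch below models just that distinction as described in the list above. The names `FreezePolicy`, `Outcome`, `react` and the `looks_like_disaster` flag are hypothetical; in particular, the flag stands in for whatever assessment the GDPS automation actually makes, which is not described here.

```python
from dataclasses import dataclass
from enum import Enum


class FreezePolicy(Enum):
    GO = "FREEZE AND GO"
    STOP = "FREEZE AND STOP"
    STOP_CONDITIONAL = "FREEZE AND STOP CONDITIONAL"
    SWAP_STOP = "FREEZE AND SWAP,STOP"
    SWAP_GO = "FREEZE AND SWAP,GO"


@dataclass(frozen=True)
class Outcome:
    frozen: bool                # secondary copy is frozen at a consistent point
    hyperswap: bool             # a HyperSwap to the second site is attempted
    processing_continues: bool  # production work carries on after the freeze


def react(policy: FreezePolicy, looks_like_disaster: bool = False) -> Outcome:
    """Map a policy to its outcome, following the descriptions in the list."""
    if policy is FreezePolicy.STOP_CONDITIONAL:
        # Conditional resolves to STOP for a real disaster, otherwise to GO.
        policy = FreezePolicy.STOP if looks_like_disaster else FreezePolicy.GO
    hyperswap = policy in (FreezePolicy.SWAP_STOP, FreezePolicy.SWAP_GO)
    continues = policy in (FreezePolicy.GO, FreezePolicy.SWAP_GO)
    # Every policy freezes the secondary copy first.
    return Outcome(frozen=True, hyperswap=hyperswap, processing_continues=continues)


for p in FreezePolicy:
    print(p.value, react(p))
```

Note that the conditional policy has no outcome of its own. It always collapses into one of the two plain policies, which is why its quality depends entirely on how well the disaster assessment works.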

GDPS now has a set of z/OS dialogs which are used to create GDPS policy options. One of these is the GEOPLEX OPTIONS, which controls various functions within GDPS, including whether or not HyperSwap can be invoked automatically, and if tape errors can cause a DASD freeze.
